I have a folder called "C:/Users/Documents/files_i_want" which contains several PDF files (all with different names) that I am trying to import into R.

I tried the following code to import all the PDF files at once:


    # Get the full paths of the files
    filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)

    # Read them into a list
    list_data <- lapply(filenames, pdftools::pdf_convert)

    # Name them (df_1, df_2, etc.)
    names(list_data) <- paste('df', seq_along(filenames), sep = '_')

    # Create objects in the global environment
    list2env(list_data, .GlobalEnv)

But this produced the following errors:

Converting page 1 to 2_sample_1.png...PDF error: No display font for 'ArialUnicode'
Converting page 2 to 2_sample_2.png... done!
Converting page 1 to sample_1_1.png...PDF error: No display font for 'ArialUnicode'
Converting page 2 to sample_1_2.png... done!

When I try to view the PDF files that were imported, all I get is this:

[1] "2_sample_1.png" "2_sample_2.png"

Can someone please show me how to fix this?


Note: I figured out how to solve this problem by manually importing each file, e.g.

    # Import and convert the 1st file
    pngfile_1 <- pdftools::pdf_convert('myfile_1.pdf', dpi = 600)
    text_1 <- tesseract::ocr(pngfile_1)

    # Import and convert the 2nd file (note: the files do not follow the same naming convention)
    pngfile_2 <- pdftools::pdf_convert('second_file.pdf', dpi = 600)
    text_2 <- tesseract::ocr(pngfile_2)


But I am trying to find a quicker way to do this.
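
For reference, here is a rough sketch of what wrapping the manual steps in a function and applying it to every file might look like. It is untested against my actual files and rests on a few assumptions: `ocr_pdf` is a hypothetical helper name, the `pattern` filter assumes every file of interest ends in `.pdf`, and it relies on `pdftools::pdf_convert` returning the names of the PNG files it writes (which would also explain why `list_data` above holds file names rather than text).

```r
# Hypothetical helper: convert one PDF to PNG pages, then OCR those pages.
ocr_pdf <- function(pdf_path, dpi = 600) {
  # pdf_convert writes one PNG per page to the working directory
  # and returns the PNG file names
  png_files <- pdftools::pdf_convert(pdf_path, dpi = dpi)
  # ocr returns the recognised text for each PNG page
  tesseract::ocr(png_files)
}

# Collect only the PDF files in the folder (full paths)
filenames <- list.files("C:/Users/Documents/files_i_want",
                        pattern = "\\.pdf$", ignore.case = TRUE,
                        full.names = TRUE)

# Apply the helper to every file; the result is a list with one element per PDF
text_list <- lapply(filenames, ocr_pdf)

# Name each element after its source file, without the extension
names(text_list) <- tools::file_path_sans_ext(basename(filenames))
```

With this, the text of a given file would be available as, for example, `text_list[["second_file"]]`, so there is no need to create separate objects in the global environment.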


