I have a folder called "C:/Users/Documents/files_i_want" which contains several PDF files (all with different names) that I am trying to import into R.
I tried to use the following code to import all the pdf files together:
    library(pdftools)
    library(tesseract)

    # Get the path of filenames
    filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)

    # Read them in a list
    list_data <- lapply(filenames, pdftools::pdf_convert)

    # Name them as per your choice (df_1, df_2 etc)
    names(list_data) <- paste('df', seq_along(filenames), sep = '_')

    # Create objects in global environment.
    list2env(list_data, .GlobalEnv)
But this produced the following errors:
    Converting page 1 to 2_sample_1.png...PDF error: No display font for 'ArialUnicode' done!
    Converting page 2 to 2_sample_2.png... done!
    Converting page 1 to sample_1_1.png...PDF error: No display font for 'ArialUnicode' done!
    Converting page 2 to sample_1_2.png... done!
When I try to view the PDF files that were imported, all I get is this:
    df_1  "2_sample_1.png" "2_sample_2.png"
Can someone please show me how to fix this?
Note: I figured out how to solve this problem by manually importing each file, e.g.
    # import and convert 1st file
    pngfile_1 <- pdftools::pdf_convert('myfile_1.pdf', dpi = 600)
    text_1 <- tesseract::ocr(pngfile_1)

    # import and convert 2nd file (note: the files do not have the same naming convention)
    pngfile_2 <- pdftools::pdf_convert('second_file.pdf', dpi = 600)
    text_2 <- tesseract::ocr(pngfile_2)

etc.
But I am trying to find a quicker way to do this.
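
For reference, here is a rough sketch of what wrapping the manual steps in a loop might look like. The first attempt stopped at `pdf_convert()`, which only returns the PNG file names (consistent with the output shown above), so the sketch also passes those PNGs to `tesseract::ocr()` for each file. The helper name `ocr_pdf_folder` and the `dpi = 600` default are made up for illustration, and this has not been tested against the actual files:

```r
# Hypothetical helper (not part of pdftools or tesseract):
# convert every PDF in a folder to PNG pages and OCR them.
ocr_pdf_folder <- function(dir, dpi = 600) {
  # Only pick up PDF files; full paths so the working directory does not matter
  pdf_files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                          ignore.case = TRUE)

  texts <- lapply(pdf_files, function(pdf) {
    # Render each page to a PNG; pdf_convert() returns the PNG file names
    png_files <- pdftools::pdf_convert(pdf, dpi = dpi)
    # OCR the rendered pages; one character string per page image
    page_text <- tesseract::ocr(png_files)
    # Remove the intermediate PNG files written to the working directory
    file.remove(png_files)
    page_text
  })

  # Name each list element after its source file (without the .pdf extension)
  names(texts) <- tools::file_path_sans_ext(basename(pdf_files))
  texts
}

text_list <- ocr_pdf_folder("C:/Users/Documents/files_i_want")
```

Keeping the results in one named list (e.g. `text_list[["second_file"]]`) seemed simpler than creating `text_1`, `text_2`, ... in the global environment with `list2env()`, but either should be possible.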