Convert a PDF to Excel or CSV

Hello, I am very new to R and I am looking at how to convert a PDF to Excel. Is there any example code I can try?

This is a tougher task than it might seem. PDF encoding is very complicated, and the text can't always be extracted with the same spatial relationships we perceive on the page. For instance, copy-pasting from a PDF table often yields garbage.

Here's a blog post walking through one way:

If you've converted the data to an image (e.g. using ImageMagick), you could then perform OCR with Tesseract:

Thank you. I have managed to convert the PDF into images and output a CSV. How would I go about formatting this CSV, e.g. splitting on the spaces so that each value goes into its own cell?

Here is my code so far:


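library(tesseract)

# Render each page of the PDF to a TIFF image (pdf_convert returns one file path per page)
img_file <- pdftools::pdf_convert("filepath/test.pdf", format = "tiff", dpi = 400)

# Extract text from the images with OCR (one string per page)
text <- ocr(img_file)

# Write the raw OCR text to a file; the lines are not yet split into comma-separated cells
writeLines(text, "filepath/mydata.csv")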
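
One possible way to split the spaces into cells is sketched below in base R. It assumes the OCR output keeps each table row on its own line and separates columns with runs of two or more spaces; neither is guaranteed for a given scan. The function name `ocr_text_to_table`, its `min_gap` argument, and the sample text are hypothetical and only for illustration.

```r
# Hypothetical helper: turn OCR text into a data frame by splitting each
# line on runs of spaces. Assumes one table row per line of text.
ocr_text_to_table <- function(text, min_gap = 2) {
  # ocr() returns one string per page; break the pages into individual lines
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) {
    return(data.frame())
  }

  # Treat a run of at least `min_gap` spaces as a column separator
  pattern <- sprintf(" {%d,}", min_gap)
  cells <- strsplit(lines, pattern)

  # Pad short rows with NA so every row has the same number of columns
  n_col <- max(lengths(cells))
  padded <- lapply(cells, function(x) {
    c(x, rep(NA_character_, n_col - length(x)))
  })

  as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
}

# Made-up sample standing in for the result of ocr(img_file)
sample_text <- "Name   Qty   Price\nApple   3   1.20\nPear   10   0.85\n"
tbl <- ocr_text_to_table(sample_text)
print(tbl)

# To write a real CSV instead of raw text, something like:
# write.table(tbl, "filepath/mydata.csv", sep = ",",
#             row.names = FALSE, col.names = FALSE)
```

If the OCR output only puts a single space between columns, `min_gap = 1` may work, but any cell that itself contains a space will then be split in two, so the result usually needs checking by eye.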
