Help with code to convert PDF into Excel (.xlsx)

Hi,

I am trying to convert a PDF file into Excel. I am providing screenshots of the first two pages of the PDF and of the result that I am looking for in Excel format. The first two pages of the PDF contain 25 row entries, which are shown in Excel format in the third picture. The first page of the PDF has a header (information on the firm) that is not repeated on the subsequent pages. I am aware of the pdftools and pdftables packages in R, but pdftables limits the number of pages that can be converted for free.

I used the following code, borrowed from Stack Overflow, and didn't get the column separation I expected (the output I am looking for is Figure 3, the Excel format). I believe there is a mistake in the way I have specified tx2 and tx3 below. Any help with this conversion would be greatly appreciated! Thanks

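library(pdftools)
library(stringr)
library(writexl)

# Read the PDF; pdf_text() returns one string per page
tx <- pdf_text("Charges.pdf")
# Split each page into individual lines
tx2 <- unlist(str_split(tx, "[\r\n]+"))
# Split each line into at most 5 columns on runs of two or more whitespace characters
# (the backslash must be doubled inside an R string: "\\s", not "\s")
tx3 <- str_split_fixed(str_trim(tx2), "\\s{2,}", 5)
df <- as.data.frame(tx3)
# Write to Excel; the output path is the second argument of write_xlsx()
write_xlsx(df, "charges.xlsx", col_names = TRUE, format_headers = TRUE)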
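For reference, here is a minimal sketch of just the line-splitting step, written in base R so it can be run without the PDF. The sample lines and column names are made up (the real layout is only visible in the screenshots), and the approach assumes that columns are separated by at least two spaces and that no cell is empty or contains a double space. Note that in an R string the regex `\s` has to be written as `"\\s"`.

```r
# Hypothetical lines standing in for the text that pdf_text() would return;
# the real column names and values in Charges.pdf will differ.
lines <- c(
  "Charge ID    Date of Creation    Amount    Holder",
  "  10012345   01/02/2015          500000    Example Bank Ltd  ",
  "",
  "  10012346   15/03/2016          750000    Another Bank Ltd"
)

# Trim leading/trailing whitespace and drop empty lines
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

# Split on runs of two or more whitespace characters
parts <- strsplit(lines, "\\s{2,}", perl = TRUE)

# Pad short rows with empty strings so every row has the same width
n_col <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep("", n_col - length(p))))
mat <- do.call(rbind, padded)

# Treat the first line as the header row (an assumption about the layout)
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]
```

If this works on the real text, `df` can be passed to `write_xlsx()` as before. Rows where a cell is blank will shift to the left under this approach, so the result should be checked against the PDF.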
