pdftools to import pdf into R

rstudio

#1

I have imported an 8-page PDF document using pdftools. I'm having trouble converting it from a character vector into a data frame where I can edit the strings and search for patterns to split into different columns.

library(pdftools)
library(tidyverse)  # provides %>%, str_split() and glimpse()
text <- pdf_text("DownloadJobFinancialReportPDF.pdf")


I then tried to split the text on \r\n, which breaks it at the end of each line. But when I try to save the resulting object, which is of class character, as a data frame, R closes down. So I'm obviously doing something wrong.

text2 <- text %>%
  str_split(pattern = "\r\n") %>%
  unlist()

glimpse(text2)

View(text2)
chr [1:283] "Job Financial Report  

df %>% as_data_frame(text2)

Any help gratefully appreciated,
Many thanks in advance


#2

I managed to make it work:

write(text, file = "data",append = FALSE, sep = "\r\n")


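data <- readLines("data")  # assumed missing step: read the written file back in, one string per line
saveRDS(data, "data.rds")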
data <- readRDS("data.rds")

df <-  data.frame(data)

df$data <- as.character(df$data)

I'm sure there are easier ways, though.
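One possibly simpler route, as a rough sketch rather than a tested fix for this particular report: skip the round trip through a file, split the pdf_text() output into lines, and put the lines straight into a one-column tibble. The `pages` vector below is a made-up stand-in for the real pdf_text() result so the example runs without the PDF, and the `line` and `amount` column names and the amount pattern are assumptions for illustration only.

```r
library(stringr)
library(tibble)

# Hypothetical stand-in for pdf_text() output: one string per page.
pages <- c("Job Financial Report\r\nJob 1001  Labour  250.00\r\n",
           "Job 1002  Materials  75.50\r\n")

# Split every page into lines; "\r?\n" matches Windows and Unix line endings.
lines <- unlist(str_split(pages, "\r?\n"))

# Drop the empty strings left by trailing line breaks and collapse repeated spaces.
lines <- str_squish(lines[lines != ""])

# One row per line of text, stored as a character column.
df <- tibble(line = lines)

# Pull a pattern into its own column; lines without a match get NA.
df$amount <- as.numeric(str_extract(df$line, "\\d+\\.\\d{2}$"))

print(df)
```

With the real file, `pages` would be replaced by the `text` object from pdf_text(). Note that as_data_frame() is deprecated in tibble in favour of as_tibble(), and that calling tibble() on the character vector directly avoids piping an undefined `df` into the conversion.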


#3

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.