Search PDFs, extract lines containing a keyword, and print "Not available" if the keyword is not found

Link to the input PDFs:

https://drive.google.com/drive/folders/1dcgDpfiVjMTGmYSRGnQA65YjZzv0AwXL?usp=sharing

The code goes through all the PDF files in the path, builds a corpus, and splits the text of each file into lines. It then checks all the lines against the given search list, which should pull the matching line, and reports whether each search term is present in the PDF (a <- sapply(unlist(Table_search), grepl, x = tablelines)).

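library(tm)        # Corpus, URISource, readPDF, tm_map, content_transformer
library(pdftools)  # pdf_text
library(stringr)   # str_split and the %>% pipe

setwd("D:")
tables <- list.files(pattern = "pdf")
tablecorpus <- Corpus(URISource(tables),
                      readerControl = list(reader = readPDF))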

tospace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
tablecorpus <- tm_map(tablecorpus, tospace, "\r")
Table_Filenames <- DublinCore(tablecorpus, "id")
# pdf_text() returns one string per page; [[1]] keeps only the lines of the first page
tablelines <- lapply(tables, function(x) strsplit(pdf_text(x), "\n")[[1]])
tablelist <- unlist(tablelines) %>% str_split("\n")
Table_search <- list("Table 14", "Source Data:", "VERSION")
a <- sapply(unlist(Table_search), grepl, x = tablelines)

image

I want the code to print the actual line wherever it finds the keyword in the PDF file, as shown in image 2.
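
One possible way to get the matching lines, rather than only TRUE/FALSE, is to use the logical vector from grepl to subset the lines themselves and to fall back to "Not available" when nothing matches. The sketch below is a minimal illustration, not a tested solution for the linked PDFs. The helper names find_keyword_lines, read_pdf_lines and search_lines are hypothetical, the demo list stands in for real PDF text, and the keywords are assumed to be literal strings (hence fixed = TRUE).

```r
# Hypothetical helper: return the lines containing `keyword`,
# or "Not available" when no line matches.
find_keyword_lines <- function(lines, keyword) {
  hits <- lines[grepl(keyword, lines, fixed = TRUE)]
  if (length(hits) == 0) "Not available" else trimws(hits)
}

# Hypothetical helper: read every page of one PDF and split it into lines.
# Assumes the pdftools package is installed; it is only needed for real files.
read_pdf_lines <- function(path) {
  unlist(strsplit(pdftools::pdf_text(path), "\n", fixed = TRUE))
}

# Build one row per (file, keyword, matching line).
# `lines_by_file` is a named list: file name -> character vector of lines.
search_lines <- function(lines_by_file, keywords) {
  rows <- lapply(names(lines_by_file), function(file) {
    per_keyword <- lapply(keywords, function(kw) {
      data.frame(file = file,
                 keyword = kw,
                 line = find_keyword_lines(lines_by_file[[file]], kw),
                 stringsAsFactors = FALSE)
    })
    do.call(rbind, per_keyword)
  })
  do.call(rbind, rows)
}

# Made-up stand-in data so the sketch runs without the PDFs.
demo <- list(
  "a.pdf" = c("  Table 14 Summary of Events",
              "Source Data: Listing 16.2",
              "other text"),
  "b.pdf" = c("VERSION 1.0", "nothing relevant")
)
keywords <- c("Table 14", "Source Data:", "VERSION")
result <- search_lines(demo, keywords)
print(result)

# With real files (assumption: the PDFs sit in the working directory):
# tables <- list.files(pattern = "pdf")
# lines_by_file <- setNames(lapply(tables, read_pdf_lines), tables)
# print(search_lines(lines_by_file, keywords))
```

Returning a data frame keeps the file name, the keyword and the matched line together, so a keyword that occurs several times in one PDF simply produces several rows.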
