I use R to analyze PDF documents, and I run into a problem when I try to read a PDF that has several columns. The document is read line by line across the full page width, which mixes the text of the columns together. I would like to read it column by column instead. Can anyone help me, please?
I found an interesting solution to a similar question on Stack Overflow. It uses the pdftools package and stores the most frequent space positions in each PDF page as a vector, which it then uses to slice a single page into columns (I'm not describing it as well as the author does, and it'll be easier to see with the code):
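A minimal sketch of that idea, not the linked answer's code: it assumes a two-column page whose gutter lies somewhere in the middle half of the page width. The helper name `split_two_columns` and the file name `paper.pdf` are made up for illustration. The function counts, for each character position, how many lines have a space there, takes the most frequently blank position as the split point, and returns the left column's lines followed by the right column's.

```r
# Hypothetical helper: reorder one page of text (one element of the
# vector returned by pdftools::pdf_text()) so the left column comes first.
# Assumes exactly two columns separated by a gutter near the page centre.
split_two_columns <- function(page_text) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  width <- max(nchar(lines))

  # Pad every line to the same width so character positions line up.
  padded <- paste0(lines, strrep(" ", width - nchar(lines)))

  # One row per line, one column per character position.
  chars <- do.call(rbind, strsplit(padded, ""))

  # How many lines have a space at each position.
  space_count <- colSums(chars == " ")

  # Look for the gutter only in the middle half of the page (assumption).
  candidates <- seq(max(1, floor(width / 4)), ceiling(3 * width / 4))
  gutter <- candidates[which.max(space_count[candidates])]

  left  <- trimws(substr(padded, 1, gutter))
  right <- trimws(substr(padded, gutter + 1, width))

  # Drop empty fragments and read the left column before the right one.
  c(left[left != ""], right[right != ""])
}

# Usage with pdftools (not run here; "paper.pdf" is a placeholder):
# pages <- pdftools::pdf_text("paper.pdf")
# columns <- lapply(pages, split_two_columns)
```

Pages with more than two columns, or with full-width headings and figures, would need a more careful search for several gutters, so treat this as a starting point and not a general solution.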
If you're not familiar with pdftools, there's a nice intro vignette here: