I am doing text mining on a set of PDF files and first have to convert them to txt format. Reading the PDF files is not a problem, but the generated txt files preserve the page layout of the PDFs, so many of them have multiple columns on one page. That is fine for reading, but it does not work for analysis, because the software assumes that all the content on a page belongs to the same paragraph. I have uploaded a picture as an example: there are two paragraphs on the same page, but the software does not identify them correctly and instead interprets them as one paragraph.
So I am wondering whether there is any way, when I read the PDF in R, to properly format the content before converting it to txt. I have searched extensively but cannot find a solution. I would be grateful for any insight or suggestion, and thank you very much in advance!