Creating a data frame from many company reports and splitting it into Company/ReportType/Date/Text

Hey guys!
I'd like to create a data frame for a number of company reports, both annual reports and quarterly reports. I've worked out how to create the data frame and how to split all the PDF files into their raw text and the file title, so at the moment I have two columns. The company name, the report date, and the report type can all be found in the title of each PDF file. Is it possible to extract this information into separate values and add them automatically to the data frame in the matching columns?

Thanks a lot for your help already!

It may or may not be possible. It depends on what the information content is and how it is encoded.
Regular expressions (regex) would probably be the tool to use.

You might provide a reprex to facilitate a deeper look.

It sounds like you probably want to check out stringr, but without a sample of your data it's hard to say for sure.
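A minimal sketch of the stringr approach. It assumes each PDF title follows a hypothetical pattern such as `AcmeCorp_Annual_2019-12-31.pdf` (company, report type, and ISO date separated by underscores) and that the data frame has hypothetical columns `title` and `text`; the regex must be adapted to the real naming scheme.

```r
library(stringr)

# Hypothetical data frame: one row per PDF, holding the file title and its raw text.
# The naming scheme "<Company>_<ReportType>_<YYYY-MM-DD>.pdf" is an assumption.
reports <- data.frame(
  title = c("AcmeCorp_Annual_2019-12-31.pdf",
            "AcmeCorp_Q1_2020-03-31.pdf",
            "Globex_Q2_2020-06-30.pdf"),
  text  = c("...", "...", "..."),
  stringsAsFactors = FALSE
)

# One capture group per field. str_match() returns a character matrix whose
# first column is the full match and whose remaining columns are the groups.
parts <- str_match(reports$title,
                   "^([^_]+)_([^_]+)_(\\d{4}-\\d{2}-\\d{2})\\.pdf$")

# Copy each captured group into its own column of the data frame.
reports$company     <- parts[, 2]
reports$report_type <- parts[, 3]
reports$date        <- as.Date(parts[, 4])

# Titles that do not match the pattern get NA in all three new columns.
print(reports[, c("company", "report_type", "date")])
```

If the titles are less regular, calling `str_extract()` with a separate pattern for each field (for example, one pattern just for the date) is often easier to maintain than a single combined regex.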

I actually made it work with stringr, thanks a lot for your answer :smiley:
