Posit Community

Extract whole sentences including a word and fiting it into a dataframe

ricdob February 21, 2022, 11:14am 1

Hey folks!
I managed to create a new dataframe for a project of mine. Trying to work with the text passages from company reports and fitting them nicely into my dataframe. Now my problem. I´m trying to extract all the sentences from several reports which include the word privacy. I managed to extract the sentences from the text and wanted to fit them into a new coloumn of my dataframe now. Getting the error of different amount of rows with it rn. Is there any better solution so I can match the extracted sentences into the right place too? Like 3m, annual, 2016, 4 sentences with privacy in one row and so on
This is my code so far for the dataframe, works nicely:
Tabelle<-data.frame(Company= str_extract(filestextdataframe$Document, "3M|Abbott|AbbVie|Accenture|Adobe|ADP|AdvancedMicroDevices|alphabet|Altria|amazon|AmericanExpress|Amgen|Anthem|apple|AppliedMaterials|AT&T|BerkshireHathaway|BlackRock|Boeing|Booking|Bristol-MyersSquibb|Broadcom|Caterpillar|CharlesSchwab|Charter|Chevron|Cigna|Cisco|Citi|CocaCola|Comcast|Costco|CrownCastle|CVSHealth|Danaher|Deere|DukeEnergy|EliLilly|Exxon|facebook|Fidelity|GeneralElectric|GileadSciences|HomeDepot|Honeywell|Intel|Intuit|IntuitiveSurgical|Johnson&Johnson|JP|Lam|Linde|LockheedMartin|Lowe's|Mastercard|McDonalds|Medtronic|Merck|MicronTechnology|Mondelez|MorganStanley|Netflix|Nexteraenergy|Nike|NVIDIA|OracleCorporation|P&G|PayPal|PepsiCo|Pfizer|PhilipMoris|Prologis|Qualcomm|Raytheon|S&P|Salesforce|Servicenow|Starbucks|Stryker|T-Mobile|Target|Tesla|TexasInstruments|Thermo_Fisher_Scientific|TJX|UnionPacificCoporation|UnitedHealthGroup|UPS|Verizon|Visa|Walmart|WaltDisney|WellsFargo"),
Report=str_extract(filestextdataframe$Document, "annual|transcript|quarterly"),
Quartal=str_extract(filestextdataframe$Document, "annual|q1|q2|q3|q4"),
Date=str_extract(filestextdataframe$Document, "2015|2016|2017|2018|2019|2020|2021"),
Text=sapply(files, function(x) paste0(pdf_text(x), collapse = " ")))
Thats the code for extracting the sentences:
Sentences_privacy(grep("privacy|Privacy", unlist(strsplit(filestextdataframe$text, '(?<=\.)\s+',
perl=TRUE)), value=TRUE)

edgararuiz February 21, 2022, 8:44pm 2

Hi, maybe try using the tidytext package for this? 1 The tidy text format | Text Mining with R

system Closed March 14, 2022, 8:45pm 3

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.