Sorting company names into the right sector to add as a new column in a data frame

ricdob · March 3, 2022, 10:43am

Hey there,
I've created a data frame for work that includes a bunch of different companies. The first column is the company name, and now I want to add a second column containing the sector each company belongs to. I've already created a list of sectors and the companies that fit into each one. For example, the sector "Industrials" includes "3M|Abbott|....". I was looking for a stringr solution but can't get it to work. Is there an easy way to match my first column against my list of sectors and add the result as a second column? First: 3M, Second: Industrial, and so on for every company.
Thanks in advance for any answers!
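Since the question mentions trying stringr with patterns like "3M|Abbott|....", here is a minimal sketch of that route, assuming a small hypothetical subset of the sector list. One regex is built per sector and each company is tested against all of them with `str_detect()`. The `^(...)$` anchors matter: without them, "IntuitiveSurgical" would match both the health_care pattern and the "Intuit" in the information_technology pattern.

```r
library(stringr)

# Hypothetical subset of the sector list, for illustration only
sector_data <- list(
  industrials = c("3M", "Boeing"),
  health_care = c("Abbott", "IntuitiveSurgical"),
  information_technology = c("Intuit", "Apple")
)

# One pattern per sector, e.g. "^(3M|Boeing)$"; the anchors force a
# whole-name match instead of a substring match
patterns <- vapply(
  sector_data,
  function(x) paste0("^(", paste(x, collapse = "|"), ")$"),
  character(1)
)

# Return the first sector whose pattern matches, or NA if none does
find_sector <- function(company) {
  hits <- names(patterns)[str_detect(company, patterns)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

companies <- c("3M", "IntuitiveSurgical", "Intuit", "Tesla")
sectors <- vapply(companies, find_sector, character(1), USE.NAMES = FALSE)
```

Company names containing regex metacharacters such as `.` or `(` would need escaping for this to work. Because the anchored patterns amount to exact matching anyway, an exact-match lookup table plus a join (the route the replies in this thread take) is usually simpler.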

GreyMerchant · March 3, 2022, 10:48am

Can you provide a reprex (see the FAQ: How to do a minimal reproducible example (reprex) for beginners)? Without one it is hard to know exactly what your data looks like, and with one we can check whether a solution works on a subset of your data.
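A reprex is possible even when the data comes from local files: base R's `dput()` prints R code that rebuilds an object, and that text can be pasted into a post instead of a screenshot. A minimal sketch, where `example_table` is a hypothetical stand-in for the real data frame; for real data, something like `dput(head(Table))` would share just the first rows.

```r
# Hypothetical small table standing in for the real data frame
example_table <- data.frame(
  Company = c("3M", "Apple", "Exxon"),
  Sales = c(1200, 5400, 3100)
)

# Prints code that recreates example_table; copy this output into the post
dput(example_table)

# Anyone can paste that code back into R to get an equivalent data frame
code <- paste(capture.output(dput(example_table)), collapse = "\n")
rebuilt <- eval(parse(text = code))
```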

ricdob · March 3, 2022, 10:55am

#### sector table ####

sector_data<-list(
industrials=c("3M","Boeing","Caterpillar","Deere","GeneralElectric","Honeywell","LockheedMartin","Raytheon","UnionPacificCoporation","UPS"),
health_care=c("Abbott","AbbVie","Amgen","Anthem","Bristol","Cigna","CVSHealth","Danaher","EliLilly","GileadSciences","IntuitiveSurgical","Johnson&Johnson","MedTronic","Merck","Pfizer","Stryker","Thermo_Fischer_Scientific","UnitedHealthGroup"),
communication_services=c("Alphabet","AT&T","Charter","Comcast","Facebook","Netflix","T-Mobile","Verizon","WaltDisney"),
information_technology=c("Accenture","Adobe","AdvancedMicroDevices","Apple","AppliedMaterials","Broadcom","Cisco","Fidelity","Intel","Intuit","Lam","Mastercard","MicronTechnology","NVIDIA","OracleCorporation","PayPal","Qualcomm","Salesforce","Servicenow","TexasInstruments","Visa"),
consumer_discretionary=c("Amazon","Booking","HomeDepot","Lowes","McDonalds","Nike","Starbucks","Target","Tesla","TJX"),
utilities=c("DukeEnergy","Nexteraenergy"),
financials=c("AmericanExpress","BerkshireHathaway","BlackRock","CharlesSchwab","Citi","JP","MorganStanley","S&P","WellsFargo"),
materials=c("Linde"),
real_estate=c("CrownCastle","Prologis"),
consumer_staples=c("Altria","CocaCola","Costco","Mondelez","P&G","PepsiCo","PhilipMoris","Walmart"),
energy=c("Chevron","Exxon"))

That's the code I have so far. Sorry it's a bit messy; to be honest, I don't know how to post it properly.
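A list typed by hand can easily contain the same company twice, and a duplicated name later turns into duplicated rows when the list is used as a lookup table in a join. A minimal base R sketch of a check, using a hypothetical excerpt with one name entered twice:

```r
# Hypothetical excerpt of a sector list, with "Adobe" entered twice
sector_data <- list(
  health_care = c("Abbott", "Pfizer"),
  information_technology = c("Accenture", "Adobe", "Adobe", "Apple")
)

all_companies <- unlist(sector_data, use.names = FALSE)

# Names that appear more than once, whether within or across sectors
dupes <- unique(all_companies[duplicated(all_companies)])

# Remove repeats inside each sector (names listed under two different
# sectors are kept and need a manual decision)
sector_data_clean <- lapply(sector_data, unique)
```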

ricdob · March 3, 2022, 10:58am

And that's my data frame so far (shared as a screenshot, which is not reproduced here).

xvalda · March 3, 2022, 3:26pm

Hi @ricdob,

I cannot replicate all of your code since it uses your local files.
I can only use your sector_data list.
But you can try to adapt the code below:

sector_data<-list(
  industrials=c("3M","Boeing","Caterpillar","Deere","GeneralElectric","Honeywell","LockheedMartin","Raytheon","UnionPacificCoporation","UPS"),
  health_care=c("Abbott","AbbVie","Amgen","Anthem","Bristol","Cigna","CVSHealth","Danaher","EliLilly","GileadSciences","IntuitiveSurgical","Johnson&Johnson","MedTronic","Merck","Pfizer","Stryker","Thermo_Fischer_Scientific","UnitedHealthGroup"),
  communication_services=c("Alphabet","AT&T","Charter","Comcast","Facebook","Netflix","T-Mobile","Verizon","WaltDisney"),
  information_technology=c("Accenture","Adobe","AdvancedMicroDevices","Apple","AppliedMaterials","Broadcom","Cisco","Fidelity","Intel","Intuit","Lam","Mastercard","MicronTechnology","NVIDIA","OracleCorporation","PayPal","Qualcomm","Salesforce","Servicenow","TexasInstruments","Visa"),
  consumer_discretionary=c("Amazon","Booking","HomeDepot","Lowes","McDonalds","Nike","Starbucks","Target","Tesla","TJX"),
  utilities=c("DukeEnergy","Nexteraenergy"),
  financials=c("AmericanExpress","BerkshireHathaway","BlackRock","CharlesSchwab","Citi","JP","MorganStanley","S&P","WellsFargo"),
  materials=c("Linde"),
  real_estate=c("CrownCastle","Prologis"),
  consumer_staples=c("Altria","CocaCola","Costco","Mondelez","P&G","PepsiCo","PhilipMoris","Walmart"),
  energy=c("Chevron","Exxon"))

library(tidyverse)

# transform your list into a data frame:
sector_data <- sector_data %>% 
  enframe() %>% 
  unnest_longer(value) %>% 
  rename(industry = name, company = value)

# create bogus sample data (I use some company names from sector_data)
set.seed(6546)
sample_data <- tibble(
  company = sample(sector_data$company, 20), 
  sales = rnorm(20, 10000, 8000)
)

# then you can do a left_join
sample_data %>% left_join(sector_data, by = "company")

Hope it helps.
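For comparison, the same lookup can be sketched in base R without the tidyverse, assuming a small hypothetical subset of sector_data: `stack()` turns the named list into a long two-column data frame, and `match()` looks up each company's position in it.

```r
# Hypothetical subset of the sector list, for illustration only
sector_data <- list(
  industrials = c("3M", "Boeing"),
  energy = c("Chevron", "Exxon")
)

# stack() returns columns 'values' (the company) and 'ind' (the list name,
# as a factor); rename them and convert the sector to plain character
sector_long <- stack(sector_data)
names(sector_long) <- c("company", "industry")
sector_long$industry <- as.character(sector_long$industry)

sample_data <- data.frame(
  company = c("Exxon", "3M", "Tesla"),
  sales = c(1, 2, 3)
)

# match() gives the row of each company in the lookup (NA if absent);
# indexing with it adds the column while keeping sample_data's row order
sample_data$industry <- sector_long$industry[match(sample_data$company, sector_long$company)]
```

One design difference from a join: `match()` returns only the first hit, so a company listed twice cannot duplicate rows, but a company listed under two different sectors silently gets the first one.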

ricdob · March 3, 2022, 4:29pm

Thanks a lot, it fixed some of my issues, but not the main one. Maybe I just didn't describe it clearly enough. I'd like to add the industry column, which you created with the left_join command, to my main data frame from the beginning: the one you couldn't use because you don't have the PDF data files I'm using, and which I uploaded as a JPG. I just want to have everything in one data frame.
Sorry, I'm kind of new to R and don't know the specific vocabulary or commands.

xvalda · March 3, 2022, 5:30pm

Don't be sorry. It takes time to get up to speed, but you'll be surprised how quickly it comes.

I'm not sure what you mean by "from the beginning".
Maybe I forgot to assign the modified data frame (with the added column) back to the original object.
If so, try this:

# same as before, transform the list into a data frame (I just capitalized the column names to match the example in your screenshot)
# note: start from the original sector_data list, not from the tibble the earlier snippet already created
sector_data <- sector_data %>% 
  enframe() %>% 
  unnest_longer(value) %>% 
  rename(Industry = name, Company = value)

# this time, assign the result back to your Table object so the new column is kept
Table <- Table %>% left_join(sector_data, by = "Company")

But let me know if I didn't get your question right.
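After a join like this, two quick checks can catch problems: whether the number of rows in Table changed (a company listed twice in the lookup gets one row per match), and which companies found no sector. A minimal dplyr sketch, where `Table` and `sector_lookup` are hypothetical stand-ins for the real objects:

```r
library(dplyr)

# Hypothetical stand-ins: Table is the main data frame, sector_lookup the
# long Industry/Company table with "Adobe" accidentally listed twice
Table <- tibble(
  Company = c("3M", "Adobe", "Tesla", "Unknown Co"),
  Sales = c(100, 200, 300, 400)
)
sector_lookup <- tibble(
  Industry = c("industrials", "information_technology",
               "information_technology", "consumer_discretionary"),
  Company = c("3M", "Adobe", "Adobe", "Tesla")
)

# The duplicated key makes left_join() return two rows for Adobe
joined_raw <- left_join(Table, sector_lookup, by = "Company")

# Removing duplicate lookup rows first keeps one row per company in Table
joined <- left_join(Table, distinct(sector_lookup), by = "Company")

# Companies in Table without a sector (their Industry is NA in the join)
unmatched <- anti_join(Table, sector_lookup, by = "Company")
```

Comparing `nrow(joined)` with `nrow(Table)` before overwriting Table is a cheap way to notice duplicates early.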

If you plan on doing more with R, check out this must-read book: https://r4ds.had.co.nz/

And this one has a very smooth learning curve: R for Excel Users

ricdob · March 4, 2022, 7:51am

Thanks a lot! I had to adjust my main table a little, but I managed to get it to work with your code!
Also thanks for the further information about R, I'm going to read the book for sure!
