How to add a new column of data that corresponds to data in another column (newbie)

Beach_Life · December 5, 2022, 11:35pm

Hello,
I would appreciate any help with the following.
I have a dataset that has a column of numbers that represent countries. The countries, however, are not shown, just the numbers.
I want to add a column right next to this column that will take the numbers from the first column and show what country each number represents. There are only five (5) different numbers, but the current column (with the numbers) has them showing up at random, not grouped or in any particular order. I want to be able to create a brand new column that shows, say, USA whenever there is a 2, or Sweden whenever there is a 3. Would anyone provide some guidance on this? I know which country names correspond to the different numbers, I just need a way to get them into my dataset in R. Any help is greatly appreciated. Thanks.

woodward · December 6, 2022, 12:07am

I usually use a named vector for this

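country <- c("1" = "Honduras", "2" = "Taiwan", "3" = "Belize", "4" = "Denmark", "5" = "New Zealand")
# Index by name (character), not by position, in case the codes are stored as numbers
mydata$country_name <- unname(country[as.character(mydata$country_code)])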
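A caveat on the named-vector lookup, as a minimal sketch: if the code column is numeric, R indexes the vector by position rather than by name. That happens to give the right answer when the codes are exactly 1, 2, 3, ... in order, but not otherwise. The `lookup` and `codes` objects below are hypothetical and only illustrate the difference.

```r
# Hypothetical lookup whose codes start at 0, so position and name disagree
lookup <- c("0" = "Honduras", "1" = "Taiwan", "2" = "Belize")
codes <- c(2, 0, 1)  # numeric codes, as they might come out of a CSV file

# Numeric index: selects by position, and an index of 0 selects nothing
by_position <- unname(lookup[codes])

# Character index: selects by name, which is what a code lookup needs
by_name <- unname(lookup[as.character(codes)])

print(by_position)
print(by_name)
```

Converting the codes with `as.character()` before indexing avoids the problem regardless of how the column is stored.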

You could also do the same thing if you had the code and names in a dataframe:

country <- data.frame(code = 1:5, name = c("Honduras", "Taiwan", "Belize", "Denmark", "New Zealand"))
mydata$country_name <- country$name[match(mydata$country_code, country$code)]
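To see what `match()` is doing in the data-frame version, here is a small sketch with made-up data (the `mydata` values, including the unknown code 9, are assumptions for illustration). `match()` returns the row position of each code in the lookup table, or `NA` when the code is not found, so unknown codes end up as `NA` in the new column.

```r
country <- data.frame(code = 1:5,
                      name = c("Honduras", "Taiwan", "Belize", "Denmark", "New Zealand"),
                      stringsAsFactors = FALSE)

# Hypothetical data; code 9 is deliberately missing from the lookup table
mydata <- data.frame(country_code = c(3, 1, 9, 5))

# Row position of each code within country$code (NA if absent)
idx <- match(mydata$country_code, country$code)

# Use those positions to pull the matching names
mydata$country_name <- country$name[idx]
print(mydata)
```

Checking `anyNA(mydata$country_name)` afterwards is a quick way to spot codes that have no entry in the lookup table.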

EconProf · December 6, 2022, 12:32am

I have been using the case_when() function from the dplyr package.

library(tidyverse)

country_data <- data.frame(country_code = c(1, 2, 1, 3),
                           widgets = c(123, 65, 144, 24)
                           )
country_data %>%
  mutate(country_name = case_when(
    country_code == 1 ~ "USA",
    country_code == 2 ~ "CAN",
    TRUE ~ "Other")
  ) %>%
  select(-country_code)
#>   widgets country_name
#> 1     123          USA
#> 2      65          CAN
#> 3     144          USA
#> 4      24        Other

Created on 2022-12-05 with reprex v2.0.2

Beach_Life · December 6, 2022, 5:33pm

Thanks for your response. I tried your way and also the code suggestion from EconProf, and I think I screwed it up even further. Let me start from the top and explain what I have and what I've done. Hopefully that will give you an idea of where I need to fix something.
My original df is called Shades. It has ten columns, the last of which is called 'group'. This column contains numbers 0 - 7 that represent information, including countries (four countries: USA, Nigeria, India and Japan). The other three bits of data are relevant but not countries. (Please note that in my original post I said there were five numbers. There are actually eight.) I wanted my "shades" dataset to include the data corresponding to the numbers in the 'group' column, in a separate column right next to the 'group' column, so that when there is a "3" in the 'group' column, R places "USA" right next to it in the new column (for example).

Based on your feedback and the feedback from EconProf, I decided to place the 'group' codes and the "country" designation into a df. It would contain two columns labeled 'group' and 'code_countries', and eight rows. I called that df "country_codes". Here is the code I used to set that up:

group<-c(0:7)
code_countries<-c("Fenty_Beauty_Foundation_only", "Make_Up_Forevers_Ultra_Foundation_only", "US_Best_sellers", "BIPOC_Founders", "BIPOC_Other_Founders", "Nigerian_Best_Sellers", "Japanese_Best_Sellers", "Indian_Best_Sellers")
country_codes<-data_frame(group, code_countries)
View(country_codes)

#That set my country_codes df up with no problem. I then wanted to have it "match" up against the 'group' column data in the "shades" df. I used your feedback and the feedback from EconProf and did the following:

country_codes %>%
mutate(shades$country_name = case_when(
group== 0 ~ "Fenty_Beauty_Foundation_only"
group== 1 ~ "Make_Up_Forevers_Ultra_Foundation_only",
group== 2 ~ "US_Best_sellers",
group== 3 ~ "BIPOC_Founders",
group== 4 ~ "BIPOC_Other_Founders",
group== 5 ~ "Nigerian_Best_Sellers",
group== 6 ~ "Japanese_Best_Sellers",
group== 7 ~ "Indian_Best_Sellers",
TRUE ~ "Other")
) %>%
select(-group)

#I got the following error msg and the "shades" df remained unchanged.

Error: unexpected '=' in:
"country_codes %>%
mutate(shades$country_name ="

Any clue as to what I'm doing wrong? Am I making this complicated for no reason? Any help is greatly appreciated.
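For what it's worth, the parse error comes from `shades$country_name =` inside `mutate()`: the name of a new column has to be a bare name such as `country_name`, not an expression with `$`. The pipeline also starts from `country_codes` rather than `shades`, and its result is never assigned, so `shades` could not have changed even without the error. Since the lookup table already exists, one option is to join it onto the data. The sketch below assumes dplyr is installed and uses a small made-up `shades` data frame in place of the real one.

```r
library(dplyr)

# Hypothetical stand-in for the real shades data; only the group column matters here
shades <- data.frame(brand = c("A", "B", "C"),
                     group = c(2, 0, 7),
                     stringsAsFactors = FALSE)

# The lookup table described in the post
country_codes <- data.frame(
  group = 0:7,
  code_countries = c("Fenty_Beauty_Foundation_only", "Make_Up_Forevers_Ultra_Foundation_only",
                     "US_Best_sellers", "BIPOC_Founders", "BIPOC_Other_Founders",
                     "Nigerian_Best_Sellers", "Japanese_Best_Sellers", "Indian_Best_Sellers"),
  stringsAsFactors = FALSE
)

# Start from shades, keep every row, and assign the result back
shades <- shades %>%
  left_join(country_codes, by = "group")

print(shades)
```

`left_join()` keeps the rows of `shades` in their original order and fills `code_countries` with `NA` for any group value that is missing from the lookup table.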

Beach_Life · December 6, 2022, 7:42pm

Finally figured it out:
country <- data.frame(code = 0:7, name = c("Fenty_Beauty_Foundation_only", "Make_Up_Forevers_Ultra_Foundation_only", "US_Best_sellers", "BIPOC_Founders", "BIPOC_Other_Founders", "Nigerian_Best_Sellers", "Japanese_Best_Sellers", "Indian_Best_Sellers"))
shades$country_name <- country$name[match(shades$group, country$code)]
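A quick way to sanity-check a lookup like this, sketched with a small made-up `shades` data frame (the real one has ten columns): confirm that no row was left unmatched and that each code maps to exactly one name.

```r
# Hypothetical stand-in for the real shades data
shades <- data.frame(group = c(3, 0, 7, 3, 5))

country <- data.frame(
  code = 0:7,
  name = c("Fenty_Beauty_Foundation_only", "Make_Up_Forevers_Ultra_Foundation_only",
           "US_Best_sellers", "BIPOC_Founders", "BIPOC_Other_Founders",
           "Nigerian_Best_Sellers", "Japanese_Best_Sellers", "Indian_Best_Sellers"),
  stringsAsFactors = FALSE
)
shades$country_name <- country$name[match(shades$group, country$code)]

# TRUE here would mean some group values have no entry in the lookup table
has_unmatched <- anyNA(shades$country_name)
print(has_unmatched)

# One row per distinct code/name pair that actually occurs in the data
pairs <- unique(shades[, c("group", "country_name")])
print(pairs)
```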

Thanks so much for your help!

EconProf · December 6, 2022, 8:36pm

Just for fun, you might see if this works as well.

library(dplyr)

shades %>%
  mutate(group_name = case_when(
    group == 0 ~ "Fenty_Beauty_Foundation_only",
    group == 1 ~ "Make_Up_Forevers_Ultra_Foundation_only",
    group == 2 ~ "US_Best_sellers",
    group == 3 ~ "BIPOC_Founders",
    group == 4 ~ "BIPOC_Other_Founders",
    group == 5 ~ "Nigerian_Best_Sellers",
    group == 6 ~ "Japanese_Best_Sellers",
    group == 7 ~ "Indian_Best_Sellers",
    TRUE ~ "Other")  # fallback for any code not listed above
  )

Beach_Life · December 6, 2022, 8:45pm

Thanks. Will do. I'm finding there are multiple ways of doing this in R, which is both good and frustrating.

I'm having an issue with grouping and finding the median of a list of numbers. I'll post under a different topic but if you have time, I would greatly appreciate your feedback. Thanks again for all your help. Your guidance has helped me a great deal.

woodward · December 6, 2022, 9:52pm

It is certainly a feature of R that there are many ways to do things. That is in contrast to Python, which is designed to promote a single obvious way to do each thing (see PEP 20, "The Zen of Python", on peps.python.org).

It might be worth your while to work through the book "R For Data Science" which is an excellent introductory tutorial to data science and to R using the popular "tidyverse" approach.

Beach_Life · December 6, 2022, 10:03pm

Thank you. I'll look into it!
