Create a new variable conditioned to an external list

Hello,

I am having some trouble creating a new variable in an R dataframe whose value depends on an external list of values:

# Sample dataframe 

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
color <- c('blue', 'green', 'red')

data.frame(employee, salary, color)

#List of colors and categories 
color_categories <- list( cold = c("blue", "green"), warm = c("red", "orange"))

I want to add a new variable to the dataframe that holds the color category of the color chosen by each employee, so the result should look something like this:

    employee salary color category
1   John Doe  21000  blue     cold
2 Peter Gynn  23400 green     cold
3 Jolie Hope  26800   red     warm

Of course, this example uses very simple data, but I have to solve this problem with a much larger dataframe and a list of 220 codes organized into categories, which I need in order to classify the dataframe observations...

Thank you very much!!!

Hi there,

With so many categories you will probably have to look at a proper join. I would suggest doing something like this:

library(tidyverse)

df1 <- tibble(x = 1:5)
df2 <- tibble(x = c(1, 2, 3), y = c("blue", "green", "red"))
df1 %>% left_join(df2)
#> Joining, by = "x"
#> # A tibble: 5 x 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 blue 
#> 2     2 green
#> 3     3 red  
#> 4     4 <NA> 
#> 5     5 <NA>

Created on 2021-10-25 by the reprex package (v2.0.0)

You can change the columns to whatever you need: replace the numeric key with your colours and put the associated category (cold or warm) in the second column. A single join then classifies every row of your dataframe, and the only thing you have to maintain is that one lookup table. Let me know if this helps.
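Applied to the sample data from the question, the idea might look like the sketch below. The names `employees`, `color_lookup` and `result` are only illustrative, and the lookup table is typed out by hand here; with 220 codes you would more likely read it from a file or build it from your list.

```r
library(dplyr)
library(tibble)

# Sample data from the question
employees <- tibble(
  employee = c("John Doe", "Peter Gynn", "Jolie Hope"),
  salary   = c(21000, 23400, 26800),
  color    = c("blue", "green", "red")
)

# Lookup table (illustrative name): one row per colour with its category
color_lookup <- tibble(
  color    = c("blue", "green", "red", "orange"),
  category = c("cold", "cold", "warm", "warm")
)

# The left join keeps every employee; a colour missing from the
# lookup table would get NA as its category
result <- employees %>%
  left_join(color_lookup, by = "color")

result
```

Passing `by = "color"` explicitly avoids the "Joining, by" message and makes the key column obvious.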

Many thanks! It worked pretty well!


Utilising unnest_longer and pivot_longer, you can left_join to achieve the desired result:

library(tidyverse)

frame <- tibble(
  employee = c('John Doe','Peter Gynn','Jolie Hope'),
  salary = c(21000, 23400, 26800),
  color = c('blue', 'green', 'red')
)

color_categories <- list( cold = c("blue", "green"), warm = c("red", "orange"))

frame %>% 
  left_join(
    unnest_longer(color_categories, col = 2) %>%
      pivot_longer(cols = c("cold", "warm")),
    by = c("color" = "value")
  )
#> # A tibble: 3 × 4
#>   employee   salary color name 
#>   <chr>       <dbl> <chr> <chr>
#> 1 John Doe    21000 blue  cold 
#> 2 Peter Gynn  23400 green cold 
#> 3 Jolie Hope  26800 red   warm

Created on 2021-10-25 by the reprex package (v2.0.1)
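If you would rather not depend on the tidyverse, a base R version of the same idea might look like the sketch below. It is an alternative to the approach above, not a replacement for it: `stack()` flattens the named list into a two-column lookup table and `match()` finds each employee's colour in it. The names `lookup` and `df` are only illustrative.

```r
color_categories <- list(cold = c("blue", "green"), warm = c("red", "orange"))

# stack() returns a data frame with columns `values` (the colours)
# and `ind` (the list names, as a factor); rename them for clarity
lookup <- stack(color_categories)
names(lookup) <- c("color", "category")
lookup$category <- as.character(lookup$category)

df <- data.frame(
  employee = c("John Doe", "Peter Gynn", "Jolie Hope"),
  salary   = c(21000, 23400, 26800),
  color    = c("blue", "green", "red")
)

# match() gives the row of `lookup` for each colour in `df`;
# colours that are not in the list end up as NA
df$category <- lookup$category[match(df$color, lookup$color)]

df
```

Because `match()` returns only the first hit, this assumes each code belongs to a single category, which is worth checking on a list of 220 codes.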
