Replace chunks of string based on a lookup table

*edited for clarity

Hello! Thanks for checking out my question. I'd love any guidance on this problem, including direct solutions or resources that could point me in the right direction. I haven't been able to find anything that quite matches this particular situation, and I'm very new to R / a novice at programming in general.

I have a dataset from an online survey on a charity website that allows recipients of donations to shop for resources through a variety of vendors. The survey includes a list of ID numbers that represent vendor(s) that a person shopped with as well as their responses to the questionnaire (represented below by the "Score" column). I need to replace the ID numbers with the name of the vendor, which I have stored in a separate data frame. Because donation recipients can shop through 1 or more vendors, some rows have a single ID while other have multiple. There are about 100 unique rows and I'm hoping there is an efficient way to reference the lookup table I created to replace the IDs with the vendor name(s).

The original data looks something like this
You can see that in the "ID" column there are a sometimes multiple IDs.

ID Score
27 5
7 6
767,27 7
2 3
82,27 5
2,773 1

The lookup data looks like this
Each ID has a unique label

ID label
2 a
7 b
27 c
82 d
773 e
767 f

The output I'm hoping for:
You can see that each ID has been replaced by the label from the lookup table.

ID Score
c 5
b 6
f,c 7
a 3
d,c 5
a,e 1

Sample data to work with:

#original data
og <- data.frame(
  ID = c("27", "7", "767,27", "2","82,27", "2,773"),
  score = c(5, 6, 7, 3, 5, 1)
)

#lookup data
lookup <- data.frame(
  ID = c("2", "7", "27", "82", "773", "767"),
  label = c("a", "b", "c", "d", "e", "f")
)

This is how I've learned to do these replacements one unique value at a time, but there must be a better way!

og <- og %>% 
  mutate(ID = (case_when(
    og$ID == "27" ~ "c",
    og$ID == "7" ~  "b",
    og$ID == "767,27" ~ "f, c",
    og$ID == "2" ~ "a",
    og$ID == "82,27" ~ "d,c",
    og$vid == "2,773" ~ "a,f"
  )))

Thanks so much! I've already benefitted so much from this community and am looking forward to contributing more. And please let me know if I could better format this question so as to be more useful.

Hey @jeursh,

I'm afraid it's not clear what is your goal. Could you elaborate on the output that you imagine? Also, maybe would help if you'd give some context for your data.

Thanks!

Thanks for asking, @Paulius . I made some adjustments, including better table formatting, to hopefully clarify the context and what I'm hoping to achieve. Thanks so much for taking a look!

It seems you have made some mistakes on your desired output but if I understand you correctly you want to do something like this:

library(dplyr)
library(tidyr)

og <- data.frame(
    ID = c("27", "7", "767,27", "2","82,27", "2,773"),
    score = c(5, 6, 7, 3, 5, 1)
)

lookup <- data.frame(
    ID = c("2", "7", "27", "82", "773", "767"),
    label = c("a", "b", "c", "d", "e", "f")
)

og %>%
    mutate(index = row_number()) %>% 
    separate_rows(ID) %>% 
    inner_join(lookup, by = "ID") %>% 
    group_by(index) %>% 
    summarise(ID = paste(label, collapse = ","),
              score = first(score)) %>% 
    select(-index)
#> # A tibble: 6 × 2
#>   ID    score
#>   <chr> <dbl>
#> 1 c         5
#> 2 b         6
#> 3 f,c       7
#> 4 a         3
#> 5 d,c       5
#> 6 a,e       1

Created on 2022-10-08 with reprex v2.0.2

Thanks so much @andresrcs. This does exactly what I need. Really grateful for the help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.