Changing all values in one data frame based off key values in another

Hello,

So I have a problem that's stumping me. I have one large data frame that is a series of questions. These questions each have a set number of responses. I've manually decided to assign those responses certain numeric values. What I'd like to do now is map those values from one data frame onto the larger data frame. In other words, one question may have five unique answers to which I've assigned five unique numerical values. I'd like to use that key to map onto the larger data frame that may include thousands of those unique responses. Then you'd have a column of strings replaced by numeric values kept in a smaller dataset.

Is this possible or is there another route I should take? Also, haven't posted before so I wasn't sure how to get reproducible results in. I've included screenshots to help.


Hi @CianStryker,

Try this:

dplyr::left_join(Example_Full_Set, Example_Key)

Hi @mattwarkentin thank you so much! I totally forgot about anti_join and feel foolish now.

Would you happen to have a good strategy for making this scalable? I have 79 separate columns with 79 separate keys that I'd like to map. Is there a way to recreate this for all 79? And is there a way to make that agnostic to column name? Ideally I'd like this to be a reproducible product with other columns and data.

Thanks again,
Cian

It's definitely doable but hard to offer specific advice without knowing the structure of your data. I think purrr::reduce() is going to be your best bet for doing iterative joins and ending up with a single merged data frame.

Something like:

# A list of dataframes like `Example_Key` for each of 78 variables
list_of_key_dfs <- list(...)

final_df <- purrr::reduce(list_of_key_dfs, dplyr::right_join, y = Example_Full_Set)

Playing around with that idea now. Theoretically it should work, but my error seems to be that it wants columns not a list or a dataframe. In terms of the structure, I'm attaching a screenshot below that essentially captures the structure. There are 3 question columns each having a column adjacent where I've assigned a numeric value. I'd like to map this somehow onto the main data set like "Example_Full_Set" but it has all three of these columns with the string inputs.

Now I'm not opposed to taking each of these and downloading them as a csv and then putting them all into a list, but that's not really working out yet.

Thanks,
Cian

I think the format of the data in the image is just a bit of an awkward way to store the data. Is the spreadsheet 156 columns wide (78 variables * 2 columns each)?

A more tidy/efficient way to work with these data would be a "long" data format that has three columns: variable, key, and value. Such as:

tibble::tribble(
  ~variable, ~key, ~value,
  "how would you describe your...", "Active", 10,
  "how would you describe your...", "Very active", 7,
  "how would you describe your...", "Not very active", 5 
)
#> # A tibble: 3 × 3
#>   variable                       key             value
#>   <chr>                          <chr>           <dbl>
#> 1 how would you describe your... Active             10
#> 2 how would you describe your... Very active         7
#> 3 how would you describe your... Not very active     5
#> ...

This could then be manipulated to do the iterative joins as described previously. If you can provide a reproducible example of the data I would be happy to help further, but otherwise I can only offer suggestions.

Is it possible to bring csvs into here? I'm not sure how to produce a reproducible example without the csvs.

I am not sure if you can upload CSV files on this forum and whether there is a file size limit. If you have the data loaded into R, you can use dput(object) to produce a snippet of code that can be copied and pasted here for easy data sharing. But it is your responsibility about whether the data can be shared or not.

You can place the data inside code fences (i.e. ```) to format is as code on this forum.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.