Recoding multiple columns using "long" data

I have collected a dataset with coded values and a decoding table with the actual values, and now I need to decode the dataset.

Here's an example to illustrate my challenge:

Coded Dataset

## # A tibble: 3 x 4
##   FRUIT   NUM  NAME ORDINAL
##   <dbl> <dbl> <dbl>   <dbl>
## 1    1.    1.    1.      1.
## 2    2.    2.    2.      2.
## 3    3.    3.    3.      4.

Decoding Table

## # A tibble: 13 x 3
##    VAR      CODE VALUE 
##    <chr>   <dbl> <chr> 
##  1 FRUIT      1. apple 
##  2 FRUIT      2. banana
##  3 FRUIT      3. cherry
##  4 NUM        1. one   
##  5 NUM        2. two   
##  6 NUM        3. three 
##  7 NAME       1. Adam  
##  8 NAME       2. Barb  
##  9 NAME       3. Chad  
## 10 ORDINAL    1. first 
## 11 ORDINAL    2. second
## 12 ORDINAL    3. third 
## 13 ORDINAL    4. fourth

How can I transform my dataset to return the following result?

Desired Result

## # A tibble: 3 x 4
##   FRUIT  NUM   NAME  ORDINAL
##   <chr>  <chr> <chr> <chr>  
## 1 apple  one   Adam  first  
## 2 banana two   Barb  second 
## 3 cherry three Chad  fourth

The actual dataset has approximate 50 columns, so I'm looking for a solution that avoids recoding each column one at a time.

Thanks!

Reprex ``` r library(tidyverse)

(tbl <-
tribble(
~ FRUIT, ~ NUM, ~ NAME, ~ORDINAL,
1, 1, 1, 1,
2, 2, 2, 2,
3, 3, 3, 4
))

# A tibble: 3 x 4

FRUIT NUM NAME ORDINAL

1 1. 1. 1. 1.

2 2. 2. 2. 2.

3 3. 3. 3. 4.

(lookup <-
tribble(
~ VAR, ~ CODE, ~ VALUE,
"FRUIT", 1, "apple",
"FRUIT", 2, "banana",
"FRUIT", 3, "cherry",
"NUM", 1, "one",
"NUM", 2, "two",
"NUM", 3, "three",
"NAME", 1, "Adam",
"NAME", 2, "Barb",
"NAME", 3, "Chad",
"ORDINAL", 1, "first",
"ORDINAL", 2, "second",
"ORDINAL", 3, "third",
"ORDINAL", 4, "fourth"
))

# A tibble: 13 x 3

VAR CODE VALUE

1 FRUIT 1. apple

2 FRUIT 2. banana

3 FRUIT 3. cherry

4 NUM 1. one

5 NUM 2. two

6 NUM 3. three

7 NAME 1. Adam

8 NAME 2. Barb

9 NAME 3. Chad

10 ORDINAL 1. first

11 ORDINAL 2. second

12 ORDINAL 3. third

13 ORDINAL 4. fourth

(result <-
tribble(
~ FRUIT, ~ NUM, ~ NAME, ~ORDINAL,
"apple", "one", "Adam", "first",
"banana", "two", "Barb", "second",
"cherry", "three", "Chad", "fourth"
))

# A tibble: 3 x 4

FRUIT NUM NAME ORDINAL

1 apple one Adam first

2 banana two Barb second

3 cherry three Chad fourth


</details>

How about a left join? First you gather() your coded data to long form, left_join() it with the lookup table, then back to wide with spread().

library(tidyverse)

tbl <-
    tribble(
      ~ FRUIT, ~ NUM, ~ NAME, ~ORDINAL,
      1, 1, 1, 1,
      2, 2, 2, 2,
      3, 3, 3, 4
    )

lookup <-
    tribble(
      ~ VAR, ~ CODE, ~ VALUE,
      "FRUIT", 1, "apple",
      "FRUIT", 2, "banana",
      "FRUIT", 3, "cherry",
      "NUM", 1, "one",
      "NUM", 2, "two",
      "NUM", 3, "three",
      "NAME", 1, "Adam",
      "NAME", 2, "Barb",
      "NAME", 3, "Chad",
      "ORDINAL", 1, "first",
      "ORDINAL", 2, "second",
      "ORDINAL", 3, "third",
      "ORDINAL", 4, "fourth"
    )
tbl %>%
  gather(VAR, value = CODE) %>%
  left_join(lookup, by = c("VAR", "CODE")) %>%
  select(-CODE) %>%
  group_by(VAR) %>%
  mutate(row_id = 1:n()) %>%
  spread(VAR, VALUE) %>%
  select(-row_id)
#> # A tibble: 3 x 4
#>   FRUIT  NAME  NUM   ORDINAL
#>   <chr>  <chr> <chr> <chr>  
#> 1 apple  Adam  one   first  
#> 2 banana Barb  two   second 
#> 3 cherry Chad  three fourth

Created on 2018-03-06 by the reprex package (v0.2.0).

4 Likes

That solves it.

For some reason the group_by %>% unique_id %>% spread pattern is really unintuitive to me, so every time I reach this point in a project I get stuck.

Thanks for providing the solution and hopefully this post will help me (and others) avoid this sticking point in the future.

For sure -- it seems like if you can gather() from wide to long you should be able to spread() back to your original wide data without adding anything. There's a good post here that helps to clarify why spread() needs unique ids to work:

Also some more discussion here about the same issue:

2 Likes