Creating labelled data from data and metadata

The background of this problem is that I have a large database of values and
a database of metadata (i.e. labels for the factor variables).

In order to add factor labels from the metadata to the data I am trying to use the labelled function from haven.
This will give the working data with the nice property that I can choose if I want to work with the data as numbers (as stored in the database) or using factors with labels (nice for plotting, readability etc.).

I have thousands of variables so I can not code this per variable. I also want to use this on the server backend in order to create datasets for visualisations, exports to R, SPSS, SAS etc.

A small example shows that I have no problem to solve my problem for one variable:

suppressWarnings(library(tidyverse))

d <- dplyr::tribble(~V1, ~V2, ~V3,
             1, 2, NA,
             2, 99, 'missing',
             1, 2, NA,
             1, 1, NA)

f <- dplyr::tribble(~Var, ~value, ~label,
             'V1', 1, 'a',
             'V1', 2, 'b',
             'V1', 3, 'c',
             'V2', 1, 'yes',
             'V2', 2, 'no',
             'V2', 99, 'missing')

# create a named list for labels
f_list <- f %>% split(.$Var) %>% map(~purrr::set_names(.$value, .$label))

# View f_list
f_list
#> $V1
#> a b c 
#> 1 2 3 
#> 
#> $V2
#>     yes      no missing 
#>       1       2      99

# Set labels on d$V1 and V2
d_new <- d %>% mutate(V1 = haven::labelled(V1, labels = f_list$V1),
                  V2 = haven::labelled(V2, labels = f_list$V2))

# Check result
d_new %>% haven::as_factor()
#> # A tibble: 4 x 3
#>   V1    V2      V3     
#>   <fct> <fct>   <chr>  
#> 1 a     no      <NA>   
#> 2 b     missing missing
#> 3 a     no      <NA>   
#> 4 a     yes     <NA>

# It Works!

Now. I can not figure out how to map over all columns in d without calling them directly in d and f_list
That is to apply the labels in the named list f_list to the correct column in d.

The following will obviusly not work:

lapply(d, haven::labelled, labels = f_list)
#> Error: `x` and `labels` must be same type

Created on 2018-12-02 by the reprex package (v0.2.1)

This can be a good application of tidy evaluation. Function modify_column does the translation for one column and then you can use f_list to map over the columns you need to convert:

suppressPackageStartupMessages(library(tidyverse))
d <- tibble::tribble(~V1, ~V2, ~V3,
                    1, 2, NA,
                    2, 99, 'missing',
                    1, 2, NA,
                    1, 1, NA)

f <- tibble::tribble(~Var, ~value, ~label,
                    'V1', 1, 'a',
                    'V1', 2, 'b',
                    'V1', 3, 'c',
                    'V2', 1, 'yes',
                    'V2', 2, 'no',
                    'V2', 99, 'missing')

# create a named list for labels
f_list <- f %>% split(.$Var) %>% map(~purrr::set_names(.$value, .$label))
d_new <- d %>% mutate(V1 = haven::labelled(V1, labels = f_list$V1),
                      V2 = haven::labelled(V2, labels = f_list$V2))
d_new %>% haven::as_factor()
#> # A tibble: 4 x 3
#>   V1    V2      V3     
#>   <fct> <fct>   <chr>  
#> 1 a     no      <NA>   
#> 2 b     missing missing
#> 3 a     no      <NA>   
#> 4 a     yes     <NA>


modify_column <- function(variable){
  v_sym <- rlang::sym(variable)
  rlang::quo(haven::labelled(!!v_sym, labels = f_list[[variable]]))
}

quos <- purrr::map(names(f_list), modify_column) %>%
  purrr::set_names(names(f_list))

res <- d %>%
  dplyr::mutate(
    !!!quos
  )

res %>% haven::as_factor()
#> # A tibble: 4 x 3
#>   V1    V2      V3     
#>   <fct> <fct>   <chr>  
#> 1 a     no      <NA>   
#> 2 b     missing missing
#> 3 a     no      <NA>   
#> 4 a     yes     <NA>

dplyr::all_equal(d_new, res)
#> [1] TRUE

Created on 2018-12-02 by the reprex package (v0.2.1)

2 Likes

Thousands of thanks for this answer. I think that your answer provides both a good first solution for my problem and a useful guide for my future use of tidy evaluation.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.