I have a dataset with ~250 variables and ~500K observations. I also have an accompanying data dictionary that specifies the value-label definitions for each nominal/ordinal variable in the dataset. The dictionary is organized in three columns: the variable name, the possible values, and the labels for those values. The dataset itself contains only the numeric codes, without labels.
I have generated some fake data that captures the problem. I am drawing a blank on a good way to use the codebook/dictionary to automatically convert these variables to factors with the correct labels. Any ideas?
library(dplyr)

data <- tribble(
  ~var,
  1,
  2,
  3,
  4
)

dict <- tribble(
  ~var_name, ~value, ~label,
  'var', 1, 'A',
  'var', 2, 'B',
  'var', 3, 'C',
  'var', 4, 'D'
)
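One possible direction, offered as a minimal sketch rather than a definitive answer: split the dictionary by variable name and loop over the variables it covers, passing each variable's values and labels to `factor()`. The helper name `apply_labels` and the `lookups` and `labelled` objects are hypothetical names introduced here for illustration. The sketch assumes every coded value in the data appears in the dictionary; any code without a dictionary entry would become `NA`.

```r
library(dplyr)

# Fake data and dictionary, same shape as in the question
data <- tribble(~var, 1, 2, 3, 4)
dict <- tribble(
  ~var_name, ~value, ~label,
  'var', 1, 'A',
  'var', 2, 'B',
  'var', 3, 'C',
  'var', 4, 'D'
)

# One lookup table per variable, named by variable
lookups <- split(dict, dict$var_name)

# Hypothetical helper: convert every column that has a dictionary
# entry into a factor, using the dictionary's values as levels and
# its labels as the factor labels. Other columns are left untouched.
apply_labels <- function(df, lookups) {
  for (v in intersect(names(lookups), names(df))) {
    df[[v]] <- factor(
      df[[v]],
      levels = lookups[[v]]$value,
      labels = lookups[[v]]$label
    )
  }
  df
}

labelled <- apply_labels(data, lookups)
```

Because level order follows the row order of the dictionary, ordinal variables keep their intended ordering as long as the dictionary lists values in order. Whether a plain loop is fast enough for ~250 columns and ~500K rows is untested here.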