Possible to save variable labels with data frame?

I am currently working on a project that involves reading a lot of Stata files (282 at last count). I have no problems with automating that part, but since I need to split the project up into multiple scripts, I need to save the data frames I generate for later analysis. My question is whether it is possible to save the variable labels that come with the original data. Currently, I use write_rds, but that does not seem to keep the variable labels. I guess I could save them back to Stata files, but that somehow seems like a backward way of going about it. Any suggestions would be much appreciated.

I don't know about Stata files, but you could use purrr with map_dfr() and set_names().

Thank you, but I already have the variable names and the labels thanks to haven. What I am looking for is some way of saving the data frame, including the labels, to my computer so I can have another script load the data frame and the labels would still be there. Using write_rds, only the variable names are saved.

Ok, I misunderstood what you were after.

If you use save(yourdata, file = "yourdata.RData") and then load("yourdata.RData"), the names of the data frames are maintained.

Otherwise I'm not sure what you mean by labels.

When you say "labels" do you mean metadata that provides additional information about each column in a data frame? If so, you might check out the label function in the Hmisc package. label attaches a label attribute to each column of a data frame, so every column can carry its own metadata label. There's also a contents function for summarizing the metadata of a data frame, including those labels. These attributes become part of the data frame and persist when you write/read the data as rds files.
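To illustrate the point about attributes persisting, here is a minimal sketch in base R. It assumes the label is stored as a "label" attribute on each column, which is the convention haven and Hmisc follow; the data frame and its labels are made up for illustration. It uses saveRDS()/readRDS(), which readr's write_rds()/read_rds() wrap.

```r
# Made-up example data; the "label" attribute mimics what haven/Hmisc attach
df <- data.frame(age = c(34, 51), educ = c(12, 16))
attr(df$age, "label") <- "Age in years"
attr(df$educ, "label") <- "Years of schooling"

# Round-trip through an rds file in a temporary location
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
restored <- readRDS(path)

# Collect the label of every column after reading the file back in
labels <- vapply(
  restored,
  function(col) attr(col, "label", exact = TRUE),
  character(1)
)
print(labels)
```

If the labels come back missing in a real pipeline, it is worth checking whether a step before the save has already dropped them.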

I haven't really used this type of metadata feature before, but I think there are a few other packages that have similar features. You can also create attributes for any R object if you want to write your own function(s) for custom metadata storage. The Advanced R book has a section on this.

Thank you @martin.R and @joels. It turns out that there is a known issue with bind_rows where it sometimes strips data frame attributes like variable labels. Here, I mean "variable labels" as in the Stata sense, a description of the variable in addition to the variable name. In Stata, they allow for a convenient way to make, for example, table and graph legends. I am still adjusting to R when it comes to specific features that I relied on in Stata.

The package [sjlabelled](https://cran.r-project.org/web/packages/sjlabelled/index.html) has functions for getting variable and value labels, so I am probably just going to save those and then reapply them once I am all done with bind_rows.
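In case it helps anyone else, the save-and-reapply idea could look roughly like the sketch below. It uses only base R: get_labels() and set_labels() are hypothetical helpers written for this example (not sjlabelled functions), labels are assumed to live in each column's "label" attribute, and rbind() stands in for bind_rows so the example runs without dplyr.

```r
# Hypothetical helper: collect each column's "label" attribute (NULL if absent)
get_labels <- function(df) {
  lapply(df, function(col) attr(col, "label", exact = TRUE))
}

# Hypothetical helper: reapply saved labels to columns with matching names
set_labels <- function(df, labels) {
  for (nm in intersect(names(df), names(labels))) {
    if (!is.null(labels[[nm]])) {
      attr(df[[nm]], "label") <- labels[[nm]]
    }
  }
  df
}

# Made-up data standing in for two imported Stata files
wave1 <- data.frame(id = 1:2, inc = c(10, 20))
wave2 <- data.frame(id = 3:4, inc = c(30, 40))
attr(wave1$id, "label") <- "Respondent ID"
attr(wave1$inc, "label") <- "Household income"

saved <- get_labels(wave1)            # save labels before combining
combined <- rbind(wave1, wave2)       # bind_rows would go here in practice
combined <- set_labels(combined, saved)  # reapply labels afterwards
```

Reapplying by column name means it does not matter whether the combining step kept or dropped the attributes.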

Thank you both for the suggestions.