Haven seems to recode/change variable values when exporting labeled data to SPSS and Stata

Good day,

I'm trying to export a labeled data set to SPSS and Stata using the haven package. When I open the data in SPSS, the labels are exported correctly, but the underlying values are recoded. For example, 0 is recoded to 1.

The result that I expect is

0 = "Never [0]"
1 = "Monthly or less [1]",
2 = "2 to 4 times a month [2]",
3 = "2 to 3 times a week [3]",
4 = "4 or more times a week [4]"

However, these are the values and labels I get when I open either the SPSS or the Stata dataset in SPSS:

1 = "Never [0]"
2 = "Monthly or less [1]",
3 = "2 to 4 times a month [2]",
4 = "2 to 3 times a week [3]",
5 = "4 or more times a week [4]"

Here is the R code I've used:

library(Hmisc)
audit$a_audit_1_1 =  factor(audit$a_audit_1_1, levels = c(0, 1, 2, 3, 4))
levels(audit$a_audit_1_1) = c(
"Never [0]",
"Monthly or less [1]",
"2 to 4 times a month [2]",
"2 to 3 times a week [3]",
"4 or more times a week [4]")
label(audit$a_audit_1_1)="How often do you..."

library(haven)
write_dta(audit, "audit.dta")

Is there a way for haven to keep the original codes?

Thank you very much for your help.

André

It seems you also posted this as an issue in the haven repo:

This is fine, but we ask that you please link to the thread so others can follow it/we avoid duplication of effort if you get an answer over there. See our FAQ re. cross-posting below.

Thanks

Thanks, @mara. I appreciate you having a look at it and thanks for the feedback.

It doesn't seem to be a problem with Haven. The problem was replicated using foreign::write.dta(). According to this introduction to variable labels and expss:

The usual way to connect numeric data to labels in R is in factor variables. However, factors miss important features which the value labels provide. Factors only allow for integers to be mapped to a text label, these integers have to be a count starting at 1 and every value need to be labelled. Also, we can’t calculate means or other numeric statistics on factors.

Please note, I've also posted this question on the following forums:

  1. StackOverflow
  2. GitHub

Sorry, I could not add the links to the threads on StackOverflow and GitHub:

Sorry, new users can only put 2 links in a post.

The title is the same:

Haven seems to recode/change variable values when exporting labeled data to SPSS and Stata

This is the SO link

When you create a factor, R doesn't remember the "old" values. Factors are really just dolled-up integer vectors. Each integer value is an index for the matching level; they have nothing to do with the old values.

When you, or haven, use as.integer(myfactor) or as.numeric(myfactor), the plain vector of indices is returned. It will always be integers between 1 and the number of levels.
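
To see this for yourself, here is a minimal sketch. The vector raw is a made-up stand-in for audit$a_audit_1_1 (I don't have your data), and the factor is built the same way as in your code:

```r
# Hypothetical raw codes standing in for audit$a_audit_1_1
raw <- c(0, 1, 2, 3, 4, 0)

f <- factor(raw, levels = c(0, 1, 2, 3, 4))
levels(f) <- c(
  "Never [0]",
  "Monthly or less [1]",
  "2 to 4 times a month [2]",
  "2 to 3 times a week [3]",
  "4 or more times a week [4]"
)

# The stored integers are positions in levels(f), not the original codes
codes <- as.integer(f)
print(codes)
```

The printed values start at 1, not 0, which matches the shifted codes you are seeing in SPSS.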

To solve your problem, you'll need to maintain a "lookup" vector mapping labels to codes. Then use it to convert the column to a labelled vector before exporting:

library(haven)

lookup <- c(
  "Never [0]"                  = 0,
  "Monthly or less [1]"        = 1,
  "2 to 4 times a month [2]"   = 2,
  "2 to 3 times a week [3]"    = 3,
  "4 or more times a week [4]" = 4
)

audit$a_audit_1_1 =  factor(
  audit$a_audit_1_1,
  levels = lookup,
  labels = names(lookup)
)

# do other tasks in R...

# Before exporting, map each label back to its original code and
# attach the value labels
audit$a_audit_1_1 <- labelled(
  x      = unname(lookup[as.character(audit$a_audit_1_1)]),
  labels = lookup
)
write_dta(audit, ...)
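
If you want to check that the codes survive the export, one option is to write to a temporary file and read it back. This is only a sketch under some assumptions: the data frame df and its three values are made up for illustration, and the temporary path stands in for whatever file name you actually use.

```r
library(haven)

lookup <- c(
  "Never [0]"                  = 0,
  "Monthly or less [1]"        = 1,
  "2 to 4 times a month [2]"   = 2,
  "2 to 3 times a week [3]"    = 3,
  "4 or more times a week [4]" = 4
)

# Hypothetical stand-in for the audit data frame
df <- data.frame(id = 1:3)
df$a_audit_1_1 <- labelled(c(0, 1, 4), labels = lookup)

# Write to a temporary Stata file and read it back
tmp <- tempfile(fileext = ".dta")
write_dta(df, tmp)
back <- read_dta(tmp)

# Underlying codes and value labels as stored in the file
print(as.numeric(back$a_audit_1_1))
print(attr(back$a_audit_1_1, "labels"))
```

The codes read back should be the original 0 to 4 values rather than 1 to 5. The same labelled column can also be passed to write_sav() for the SPSS file.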

I really appreciate the suggestion. Thank you.
