R 'memisc' package: why has "as.data.frame()" changed 0/1 values of data.set to 1/2 in data.frame?

Hi all, I'm trying to prepare an SPSS .sav data file with survey data for performing analyses in R.
Now I have an issue that some variables with binary values 0/1 (signifying no/yes) have been transformed unexpectedly.

I have used the memisc package to import the data as a data.set object.

```r
library(memisc)

Dset.core <- spss.system.file(file="C://..../data_coded.sav",
                            varlab.file=NULL,
                            codes.file=NULL,
                            missval.file=NULL,
                            count.cases=TRUE,
                            to.lower=FALSE      ## set to FALSE since we want to keep the upper case VarNames
)
```

As far as I can tell from the str() and codebook() output, this worked fine. Here is one example, the 0/1 variable $AMEVYES (labels are 0 = no, 1 = yes):

```
str(Dset.core)
Data set with 1999 obs. of 106 variables:
(...)
$ AMEVYES : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 1 ...
```

I now want to convert the special data.set object created by memisc into a data frame with:

```r
Dset2Df.core <- as.data.frame(Dset.core)
```

As intended, the nominal 0/1 variable was converted into a factor with the corresponding levels. But for some reason the conversion also seems to have changed the values of the variable from 0/1 to 1/2, as in this example output:

```
str(Dset2Df.core)
'data.frame': 1999 obs. of 106 variables:
(...)
$ AMEVYES : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
```

Why did this happen, and most importantly, how can I stop this from happening?
Many thanks for a hint!

PS: I'm rather new to R and new to this forum, so please excuse me if I've missed any best practices in how I phrased my question.

Do you necessarily have to use memisc for some reason? If not, you could try the haven package, which reads .sav files directly as tibbles (a modern take on data frames). If you want a plain data frame, you can then call as.data.frame() on the result without any problem.

I think there's a gap in your understanding of factors.

Factors map your character values to an integer index. By default the index starts at 1 and goes up to n, where n is the number of distinct values (levels) in your field.
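A minimal base R sketch of that mapping, using made-up example vectors: the levels hold the labels, and `as.integer()` returns the 1-based index rather than any original value. The second half shows the common pitfall when the labels themselves look like numbers.

```r
# Made-up responses stored as character labels
answers <- c("Yes", "No", "No", "Yes")
f <- factor(answers)

levels(f)      # "No" "Yes"  (levels are sorted alphabetically by default)
as.integer(f)  # 2 1 1 2     (the internal index, starting at 1)

# Pitfall: when the labels look like numbers, as.integer() still
# returns the index, not the original value
codes <- factor(c(0, 0, 1))
as.integer(codes)                # 1 1 2
as.numeric(as.character(codes))  # 0 0 1  (go through the labels instead)
```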

It's hard to know without some dummy data, but I don't think any conversion of the 0/1 values is happening. I think what you are seeing is R storing the labels "No" and "Yes" as a factor with levels "No" and "Yes", whose underlying index values are 1 and 2.
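Here is a base R sketch of what is probably happening with AMEVYES. It assumes that memisc's as.data.frame() turns a labelled nominal 0/1 item into a factor whose levels are the value labels, which is consistent with the str() output in the question. The data are a made-up stand-in, and `yes_no_to_01()` is a hypothetical helper, not part of memisc or base R. The sketch shows that 1/2 is only the factor index, then gives two ways to get 0/1 back after conversion.

```r
# Made-up stand-in for the imported item (0 = no, 1 = yes)
amevyes <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)

# Assumption: the conversion yields a factor like this one, with the
# value labels as levels in the order of the codes 0, 1
amevyes_f <- factor(amevyes, levels = c(0, 1), labels = c("No", "Yes"))
str(amevyes_f)  # the stored index is 1 for "No" and 2 for "Yes"

# Option 1: map each label back to its original code via a named vector
label_codes <- c(No = 0, Yes = 1)
amevyes_01 <- unname(label_codes[as.character(amevyes_f)])

# Option 2 (hypothetical helper): convert every factor column whose levels
# are exactly "No", "Yes" to 0/1 integers; other columns are left as-is
yes_no_to_01 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col) && identical(levels(col), c("No", "Yes"))) {
      # index 1 ("No") becomes 0, index 2 ("Yes") becomes 1; NAs stay NA
      df[[nm]] <- as.integer(col) - 1L
    }
  }
  df
}

df <- data.frame(AMEVYES = amevyes_f, ID = 1:10)
df01 <- yes_no_to_01(df)
str(df01)
```

Whether to keep the factor (convenient for tables and models) or go back to 0/1 depends on the analysis. The underlying answers are the same either way.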

Intro to factors:
http://monashbioinformaticsplatform.github.io/2015-09-28-rbioinformatics-intro-r/01-supp-factors.html

Thanks, all! I had actually compared the package descriptions of 'haven' and 'memisc' and chose the latter, because I was impressed by its functionality for defining and manipulating the various label types in survey data sets, the ability to generate a codebook(), etc., much like what I know from traditional social science statistics software.
But you are right: if I can't solve the problem above, I will probably switch to 'haven' to finish my work, not least because it is conveniently available through RStudio's built-in import dialog for .sav files.
