RStudio will not import categorical variables from .dta files

I need help! I'm new to R and I'm going to go crazy.
I have a Stata dataset (.dta file), and it has both numerical and categorical variables. But when I import it into R, it replaces everything that is not a number (e.g. "Male/Female", "Low/High") with numbers. I've been searching for hours and do not understand what the problem is.

Any tip would be greatly appreciated.

How are you importing it? I.e., what code do you use?

Thank you for responding! So I went to RStudio, clicked on "import from stata" and found the file (I did nothing else). This is the code it used:

library(haven)
X1999class621 <- read_dta("~/Downloads/R:Stata practice/STATA Lab-20201222-3/1999class621.dta")
View(X1999class621)

I don't have your data, so I can't tell whether there is anything unusual about it.
Let's first validate that a .dta file containing categorical data can be read; haven includes an example file that you can use to test.

library(haven)
path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)

Does this result in a data frame containing a categorical species column for you?

Yes! It does say the names of the species in the "species" column.
I also somehow found that R has taken the labels of the categorical data I had in Stata and converted them to integers, while showing "dbl+lbl" next to each value in my categorical variables. Does that make sense?
I just want whatever's written as "dbl+lbl"s to be in the dataframe cells itself.

I think I figured out what to do

library(haven)
library(tidyverse)

print("our beginning data in R")
(our_own_test <- tibble(xfac = factor(letters[1:3]),
                           ynum = 1:3))

tf <- tempfile()
print("write it out to a stata .dta file")
haven::write_dta(data = our_own_test,
                path = tf)

print("reading back in we get")
(read_test <- read_dta(tf))

print("recovering the info in the r friendly way")
(fixed_test <- mutate_if(read_test,
                        is.labelled,as_factor))

I'm sorry.. it did not work (or apparently I'm really bad at this). But Thank you sooo much! I really appreciate all your time!!

In what way did it not work?
If there are error messages, you should write them here...

There are no error messages. It seems like the code ran with no problems. But my dataframe still uses numbers where it should use words (ie. it uses 1,2 instead of Male,Female). Maybe I needed to replace some of the things you wrote with code that is relevant to my dataset. But I don’t know what I'd have to replace.

library(haven)
library(tidyverse)
(initial_df <- read_dta("~/Downloads/R:Stata practice/STATA Lab-20201222-3/1999class621.dta"))
(fixed_df <- mutate_if(initial_df,
                         is.labelled,as_factor))

Still shows everything as numbers :confused:

Maybe this will make it easier to understand what my problem is. As you can see, Stata (on the left) shows the variables and their data as categorical names (M/F, Excellent/Good, etc.). But R (on the right) shows them as numbers (1, 2). I know, of course, that I can label everything myself in R after importing the file. But it's time-consuming, especially when I know there has to be a way to import everything as it is (as names).

OK, maybe you can share a few rows from each table:

dput(head(initial_df,5))
dput(head(fixed_df,5))
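
A quicker check than reading the full dput() output is to ask which columns actually carry value labels. The following is only a minimal sketch: demo_df is a made-up stand-in for the imported data (not the poster's file), and the Male/Female mapping in it is an assumption for illustration.

```r
library(haven)

# Hypothetical stand-in for an imported .dta file:
# one column with value labels, one plain numeric column.
demo_df <- tibble::tibble(
  gender = labelled(c(1, 1, 2), labels = c(Male = 1, Female = 2)),
  age    = c(23, 25, 26)
)

# TRUE for columns that carry value labels, FALSE for the rest.
has_labels <- vapply(demo_df, is.labelled, logical(1))
print(has_labels)

# The code-to-name mapping of a single column; NULL means there is none.
print(attr(demo_df$gender, "labels", exact = TRUE))
print(attr(demo_df$age, "labels", exact = TRUE))
```

If every column comes back FALSE on the real data, mutate_if(..., is.labelled, as_factor) has nothing to convert and returns the data unchanged.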

This is the result I get:

dput(head(initial_df,5))
structure(list(gender = structure(c(1, 1, 2, 1, 2), label = "Student's gender", format.stata = "%8.0g"),
age = structure(c(23, 25, 26, 37, 28), label = "Age in years", format.stata = "%9.0g"),
weight = structure(c(180, 150, 150, 170, 51), label = "Weight in pounds", format.stata = "%9.0g"),
height = structure(c(71, 67, 54, 72, 62), label = "Height in inches", format.stata = "%9.0g"),
phys = structure(c(2, 1, 2, 1, 1), label = "Length of time since check-up", format.stata = "%8.0g"),
health = structure(c(1, 2, 2, 1, 1), label = "General health status", format.stata = "%9.0g"),
smoke100 = structure(c(2, 2, 2, 2, 2), label = "Smoked 100 cigarettes during lifetime", format.stata = "%8.0g"),
cursmoke = structure(c(3, 3, 3, 3, 3), label = "Current cigarette smoker", format.stata = "%10.0g"),
wtchange = structure(c(1, 3, 2, 1, 1), label = "Satisfaction with weight", format.stata = "%15.0g"),
exercise = structure(c(1, 1, 1, 1, 1), label = "Daily level of recreational exercise", format.stata = "%8.0g"),
degree = structure(c(4, 4, 4, 1, 4), label = "Highest degree attained", format.stata = "%14.0g"),
program = structure(c(3, 2, 3, 3, 2), label = "Current degree program", format.stata = "%12.0g")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
dput(head(fixed_df,5))
structure(list(gender = structure(c(1, 1, 2, 1, 2), label = "Student's gender", format.stata = "%8.0g"),
age = structure(c(23, 25, 26, 37, 28), label = "Age in years", format.stata = "%9.0g"),
weight = structure(c(180, 150, 150, 170, 51), label = "Weight in pounds", format.stata = "%9.0g"),
height = structure(c(71, 67, 54, 72, 62), label = "Height in inches", format.stata = "%9.0g"),
phys = structure(c(2, 1, 2, 1, 1), label = "Length of time since check-up", format.stata = "%8.0g"),
health = structure(c(1, 2, 2, 1, 1), label = "General health status", format.stata = "%9.0g"),
smoke100 = structure(c(2, 2, 2, 2, 2), label = "Smoked 100 cigarettes during lifetime", format.stata = "%8.0g"),
cursmoke = structure(c(3, 3, 3, 3, 3), label = "Current cigarette smoker", format.stata = "%10.0g"),
wtchange = structure(c(1, 3, 2, 1, 1), label = "Satisfaction with weight", format.stata = "%15.0g"),
exercise = structure(c(1, 1, 1, 1, 1), label = "Daily level of recreational exercise", format.stata = "%8.0g"),
degree = structure(c(4, 4, 4, 1, 4), label = "Highest degree attained", format.stata = "%14.0g"),
program = structure(c(3, 2, 3, 3, 2), label = "Current degree program", format.stata = "%12.0g")), row.names = c(NA,
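-5L), class = c("tbl_df", "tbl", "data.frame"))

"Labelled" vectors are not native to R; haven provides them as its own class. In your dput() output the columns carry only a variable label (the "label" attribute) and no value labels (a "labels" attribute), so is.labelled() is FALSE for every column and mutate_if() has nothing to convert. You'll need to explicitly set the value labels on each categorical column (usually with labelled()) and then apply as_factor() to get something useful in R.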

See e.g. https://cran.r-project.org/web/packages/haven/haven.pdf
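
As a rough sketch of that approach, assuming the codes 1 and 2 in the gender column stand for Male and Female (the thread does not confirm which code is which, so check the mapping in Stata first):

```r
library(haven)

# First five gender codes as they appear in the dput() output above.
gender <- c(1, 1, 2, 1, 2)

# Attach value labels. ASSUMPTION: 1 = Male, 2 = Female.
gender_lbl <- labelled(
  gender,
  labels = c(Male = 1, Female = 2),
  label  = "Student's gender"
)

# Convert the labelled vector to an ordinary R factor.
gender_fct <- as_factor(gender_lbl)
print(gender_fct)
```

The same two steps would have to be repeated for each categorical column, each with its own mapping.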

Ah alright! Good to know! Thank you for your response!!

Indeed. Perhaps you only ran three of the four lines of my code suggestion?

I wish that was the case. I copy-pasted all four lines but it just would not work.

I believe this may be a version issue. Can you tell me what results from the following:

packageVersion("haven")

Sorry for the late reply! It says "2.3.1"