Hi everyone, I have a question about data cleaning with .dta files (Stata files) in R. I would like to run a logistic regression using a number of variables from this file, and I've started the data cleaning by trying to convert the missing values (currently coded as 999, 999.9 or similar) to NA. However, nothing I've tried has worked: the values don't seem to change at all. Please see my code below:
```r
# setting up
options(digits = 3, show.signif.stars = FALSE)
source("Rfunctions.R")
library(foreign)
library(naniar)

# read in data
allbus.df <- read.dta("Allbus04.dta", convert.dates = TRUE, convert.factors = TRUE,
                      missing.type = FALSE,
                      convert.underscore = FALSE, warn.missing.labels = TRUE)
attach(allbus.df)
```
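One behaviour that could produce this symptom: `attach()` puts a *copy* of the data frame's columns on the search path. Later changes to `allbus.df` are not reflected in the attached `v244`, so `print(v244)` keeps showing the values from the moment of attaching. A minimal sketch with a made-up data frame (`toy.df` and its values are hypothetical stand-ins for `allbus.df`):

```r
# Hypothetical stand-in for allbus.df; the values are made up
toy.df <- data.frame(v244 = c(3.5, 999.9, 7.2))

attach(toy.df)  # puts a copy of toy.df's columns on the search path

# Replace the code in the data frame itself (and store the result)
toy.df$v244[toy.df$v244 == 999.9] <- NA

attached_v244   <- v244         # the attached copy: still 3.5 999.9 7.2
data_frame_v244 <- toy.df$v244  # the data frame column: 3.5 NA 7.2

detach(toy.df)  # removes the "toy.df" entry from the search path again
```

So with `attach()` in the workflow, `print(v244)` can keep showing 999.9 even after a replacement succeeded; checking `allbus.df$v244` instead, or detaching and re-attaching after each change, avoids that.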
```r
# replace missings with NA, attempt 1
for (i in seq_along(allbus.df)) {
  # only touch columns whose class includes "labelled"
  if (inherits(allbus.df[[i]], "labelled")) {
    allbus.df[[i]][allbus.df[[i]] < 100] <- NA
  }
}
```
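In attempt 1 the `if` condition is probably never `TRUE`: `foreign::read.dta()` returns ordinary numeric, factor or character columns, not objects of class `"labelled"` (that class comes from other packages). Also, `< 100` selects values *below* 100 rather than the 999-style codes. A sketch of the loop that tests for numeric columns and matches a list of codes instead; `missing_codes` is a hypothetical list to adapt from the ALLBUS codebook, and the column names `v300`/`v400` are made up:

```r
# Hypothetical missing-value codes; check the codebook for the real ones
missing_codes <- c(99, 999, 999.9, 9999)

# Made-up stand-in for allbus.df: two numeric columns and one factor
toy.df <- data.frame(
  v244 = c(3.5, 999.9, 7.2),
  v300 = c(45, 99, 999),
  v400 = factor(c("yes", "no", "yes"))
)

for (i in seq_along(toy.df)) {
  # read.dta() yields numeric or factor columns, so test for numeric
  if (is.numeric(toy.df[[i]])) {
    # %in% is an exact match; the result is assigned back into the data frame
    toy.df[[i]][toy.df[[i]] %in% missing_codes] <- NA
  }
}

print(toy.df)
```

After a loop like this, the column to check is `allbus.df$v244`, not an attached `v244`.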
```r
# attempt 2 (naniar)
allbus.df %>% replace_with_na(replace = list(v244 = 999.9))
na_strings <- c("99", "999", "999.9", "99/9999", "9999")
allbus.df %>%
  replace_with_na_all(condition = ~.x %in% na_strings)
print(v244)
```
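The naniar calls in attempt 2 return a new data frame and leave `allbus.df` untouched unless the result is assigned back, so the change only shows up in the printed output. A sketch with the same two functions on a made-up data frame (`toy.df`, `v300` and `na_codes` are hypothetical):

```r
library(naniar)

# Made-up stand-in for allbus.df
toy.df <- data.frame(v244 = c(3.5, 999.9, 7.2), v300 = c(45, 99, 999))

# Assign the result back; without `toy.df <-` it is only printed
toy.df <- replace_with_na(toy.df, replace = list(v244 = 999.9))

# Same idea for all columns at once, using numeric codes
na_codes <- c(99, 999, 999.9, 9999)
toy.df <- replace_with_na_all(toy.df, condition = ~.x %in% na_codes)
```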
```r
# attempt 3 (dplyr)
library(dplyr)
na_if(v244, 999.9)
attach(allbus.df)
```
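Attempt 3 has a further trap worth ruling out: `na_if()` compares values exactly (and its result also has to be assigned, e.g. `allbus.df$v244 <- na_if(allbus.df$v244, 999.9)`). If `v244` is stored in the .dta file as a 4-byte Stata `float`, the value that reaches R is the float nearest to 999.9, i.e. 999.9000244140625, which prints as 999.9 but is not equal to the double `999.9`. A base R sketch that simulates float storage with a `writeBin()`/`readBin()` round trip (the helper `as_stata_float()` is hypothetical, and whether `v244` really is a float is an assumption) and then uses a tolerance instead of `==`:

```r
# Hypothetical helper: round-trip doubles through 4-byte floats,
# mimicking how a Stata `float` variable arrives in R
as_stata_float <- function(x) {
  readBin(writeBin(x, raw(), size = 4), what = "double",
          n = length(x), size = 4)
}

x <- as_stata_float(c(3.5, 999.9, 7.2))

print(x)                       # looks like 3.5 999.9 7.2
print(x, digits = 15)          # shows the extra digits from float storage
exact_hits <- sum(x == 999.9)  # exact comparison, as na_if() does: no hits

# Compare with a tolerance instead (1e-4 is an assumed, data-dependent choice)
is_code <- abs(x - 999.9) < 1e-4
x[is_code] <- NA
```

On the real data the same idea would be `allbus.df$v244[which(abs(allbus.df$v244 - 999.9) < 1e-4)] <- NA`; `which()` keeps existing NAs out of the subscript.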
```r
# attempt 4: re-read the file (after detaching allbus.df)
allbus.df <- read.dta("Allbus04.dta", convert.dates = TRUE, convert.factors = TRUE,
                      missing.type = FALSE,
                      convert.underscore = FALSE, warn.missing.labels = TRUE)
allbus.df <- read.dta("Allbus04.dta", convert.dates = TRUE, convert.factors = NA,
                      missing.type = FALSE,
                      convert.underscore = FALSE, warn.missing.labels = TRUE)
attach(allbus.df)
```
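Before comparing `v244` with a number, it may be worth confirming how it was actually imported. The first re-read uses `convert.factors = TRUE`, which tells `read.dta()` to use Stata value labels to create factors; if `v244` came back as a factor, its levels are label text and no comparison with `999.9` will match. A small hypothetical helper that reports the class, the levels, or the unique values printed to ten decimals (the example inputs are made up):

```r
# Hypothetical helper: report how a column is stored before matching codes
describe_column <- function(x) {
  list(
    class  = class(x),
    # factors carry the value-label text as levels, not the numeric codes
    levels = if (is.factor(x)) levels(x) else NULL,
    # ten decimals make float-storage artifacts such as 999.9000244 visible
    values = if (is.numeric(x)) sprintf("%.10f", sort(unique(x))) else NULL
  )
}

# Made-up examples: a labelled variable converted to a factor, and a numeric one
as_factor <- describe_column(factor(c("agree", "no answer")))
as_number <- describe_column(c(3.5, 999.9, 3.5))
```

On the real data this would be something like `str(describe_column(allbus.df$v244))`.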
These are only a few of the options I've tried. Whatever I do, when I check the variable afterwards it still contains the original missing-value code, in this case 999.9.
I'm new here, so apologies if anything in this post doesn't follow the guidelines.
Melissa