A common procedure in preparing data sets for analysis is to impute missing values by replacing the NAs with a random member of the set of non-missing values (only legitimate with a small fraction of missing data!). I tried the following code:
library(dplyr)
df <- data.frame(var = c(1, NA, 2, 3, 4, NA, 6, NA, 9, NA))
df$var
[1] 1 NA 2 3 4 NA 6 NA 9 NA
df <- df %>% mutate(var = ifelse(!is.na(var),var, sample(var[!is.na(var)],1)))
df$var
[1] 1 9 2 3 4 9 6 9 9 9
This didn't work, because "sample(var[!is.na(var)], 1)" ran only once and returned a single value (9), which was then used to fill in all the NAs.
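The recycling behaviour can be reproduced in base R without dplyr. The following is a minimal sketch, not a definitive fix: the seed and the names x, observed, one_draw and many_draws are assumptions made for illustration. It shows that ifelse() recycles a length-1 "no" argument across every NA position, whereas a "no" argument as long as the input supplies a separate draw for each position.

```r
# Illustrative sketch; the seed and all variable names are assumptions.
set.seed(1)
x <- c(1, NA, 2, 3, 4, NA, 6, NA, 9, NA)
observed <- x[!is.na(x)]

# One draw: the length-1 result is recycled to every NA position.
one_draw <- ifelse(!is.na(x), x, sample(observed, 1))

# length(x) draws: ifelse() takes a separate draw for each NA position.
many_draws <- ifelse(!is.na(x), x,
                     sample(observed, length(x), replace = TRUE))

print(one_draw)
print(many_draws)
```

Inside mutate(), dplyr's n() would presumably play the role of length(x) here. Note that replace = TRUE makes the draws independent, so the same value can still appear at more than one NA position by chance.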
I then worked out the following code:
df <- data.frame(pid = 1:10, var = c( 1,NA, 2,3,4,NA,5,NA,6,NA))
df$var
[1] 1 NA 2 3 4 NA 5 NA 6 NA
df$var <- replace(df$var, which(is.na(df$var)), sample(df$var[!is.na(df$var)], length(which(is.na(df$var)))))
df$var
[1] 1 2 2 3 4 4 5 6 6 3
This worked, but it is excessively complicated.
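For comparison, the same replacement can be written more compactly with logical indexing. This is a sketch under stated assumptions: the names v, missing and pool are hypothetical, and it draws with replacement, which differs from the code above (that code samples without replacement and would fail if there were more NAs than observed values).

```r
# Illustrative sketch; the seed and all variable names are assumptions.
set.seed(42)
v <- c(1, NA, 2, 3, 4, NA, 5, NA, 6, NA)

missing <- is.na(v)   # logical mask of the NA positions
pool <- v[!missing]   # observed values to draw from

# Draw one index per NA. Indexing via sample.int() avoids sample()'s
# special case, where a single number n is treated as the range 1:n.
v[missing] <- pool[sample.int(length(pool), sum(missing), replace = TRUE)]

print(v)
```

Drawing indices rather than values keeps the result correct even when only one non-missing value is available.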
My question: is there a way to modify the first code so that the sample function is invoked separately for each NA, rather than once for all the NAs?
Thanks to anyone who can suggest a better (simpler) way to do this.
:Larry Hunsicker