Error with missForest command

Continuing the discussion from Error in missForest command:

Dear Mara,

I have managed to run the reprex for the missForest command. Please see the details below. I have three questions:

1) The missForest command runs on the subset of data I created for the reprex, but not on the main dataset, which is quite large. How do I overcome this problem?
2) In the reprex below, I see a lot of warning messages. Should I ignore them?
3) How do I use the imputed data in further analysis? Can I view them, or export the data to another format, e.g. Excel or Stata?

Many thanks in advance. Regards, Saran

library(missForest)
#> Loading required package: randomForest
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> Loading required package: foreach
#> Loading required package: itertools
#> Loading required package: iterators
df <- data.frame(
  drecall = c(NA, NA, 6, 7, 5, NA, NA, NA, NA, 8, 5, NA, NA, NA, 3, NA,
              NA, 6, 5, 5, 9, 4, 3, 4, NA, NA, 7, 3, NA, 3, 7, 7, 4, NA,
              4, 4, NA, 4, NA, 2, 4, 7, 7, 5, 7, 5, 2, 4, NA, NA),
  orientation = c(NA, NA, 3, 4, 4, NA, NA, NA, NA, 4, 4, NA, NA, NA, 4, NA,
                  NA, 4, 3, 4, 4, 4, 3, 4, NA, NA, 3, 4, NA, 3, 4, 4, 4, NA,
                  3, 4, NA, 3, NA, 3, 3, 4, 4, 3, 4, 4, 4, 4, NA, NA),
  number = c(NA, NA, NA, 3, 2, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA,
             NA, NA, 2, 3, 3, NA, NA, 0, NA, NA, NA, 3, NA, NA, 2, 3, 3,
             NA, 2, NA, NA, NA, NA, NA, 1, NA, 3, NA, NA, NA, NA, NA, NA,
             NA),
  slfall = c(NA, NA, 5, 5, 5, NA, NA, NA, NA, 4, 5, NA, NA, NA, 5, NA,
             NA, 3, 5, 5, 3, 5, 5, 5, NA, NA, 5, 5, NA, 5, 5, 4, 3, NA,
             3, 4, NA, 3, NA, 5, 5, 4, 5, 5, 4, 5, 5, 2, NA, NA),
  slwake = c(NA, NA, 4, 3, 1, NA, NA, NA, NA, 3, 4, NA, NA, NA, 4, NA,
             NA, 1, 1, 2, 2, 1, 4, 1, NA, NA, 3, 1, NA, 4, 1, 3, 1, NA,
             2, 3, NA, 1, NA, 4, 4, 2, 1, 4, 4, 1, 4, 1, NA, NA),
  sltired = c(NA, NA, 4, 2, 4, NA, NA, NA, NA, 4, 4, NA, NA, NA, 1, NA,
              NA, 2, 4, 4, 2, 1, 4, 4, NA, NA, 4, 3, NA, 4, 2, 1, 4, NA,
              1, 4, NA, 1, NA, 4, 2, 4, 4, 4, 4, 4, 2, 4, NA, NA),
  slmorn = c(NA, NA, 4, 4, 2, NA, NA, NA, NA, 4, 1, NA, NA, NA, 1, NA,
             NA, 2, 4, 4, 1, 1, 4, 3, NA, NA, 2, 4, NA, 1, 4, 4, 1, NA,
             2, 4, NA, 1, NA, 1, 2, 4, 4, 4, 4, 4, 4, 4, NA, NA),
  affect = c(NA, NA, 7, 8, 8, NA, NA, NA, NA, 7, 7, NA, NA, NA, 4, NA,
             NA, 7, 8, 6, 7, 0, 8, 8, NA, NA, 8, 7, NA, 8, 5, 7, 4, NA,
             8, 8, NA, NA, NA, 8, 8, 8, 8, 8, 8, 8, 7, 7, NA, NA),
  hear = c(NA, NA, 5, 4, 3, NA, 5, 4, NA, 4, 4, NA, NA, 1, 2, NA, NA,
           3, 2, 4, 3, 2, 5, 5, 3, NA, 2, 3, 2, 3, 2, 4, 5, NA, 2, 2,
           NA, 1, NA, 3, 3, 3, 4, 5, 5, 5, 5, 3, 4, NA),
  nvision = c(NA, NA, 4, 4, 4, NA, NA, NA, NA, 5, 4, NA, NA, NA, 3, NA,
              NA, 3, 4, 5, 5, 2, 5, 5, NA, NA, 4, 4, NA, 3, 3, 4, 1, NA,
              3, 5, NA, 4, NA, 2, 3, 3, 3, 3, 4, 5, 5, 4, NA, NA)
)


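# prodNA() adds extra artificial NAs (10% by default); df already has
# missing values, so missForest(df) could be called on it directly
iris.mis <- prodNA(df)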
summary(iris.mis)
#>     drecall    orientation        number          slfall     
#>  Min.   :2    Min.   :3.000   Min.   :0.000   Min.   :2.000  
#>  1st Qu.:4    1st Qu.:3.000   1st Qu.:2.000   1st Qu.:4.000  
#>  Median :5    Median :4.000   Median :3.000   Median :5.000  
#>  Mean   :5    Mean   :3.655   Mean   :2.357   Mean   :4.448  
#>  3rd Qu.:6    3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:5.000  
#>  Max.   :9    Max.   :4.000   Max.   :3.000   Max.   :5.000  
#>  NA's   :25   NA's   :21      NA's   :36      NA's   :21     
#>      slwake         sltired       slmorn          affect     
#>  Min.   :1.000   Min.   :1    Min.   :1.000   Min.   :0.000  
#>  1st Qu.:1.000   1st Qu.:2    1st Qu.:1.000   1st Qu.:7.000  
#>  Median :2.000   Median :4    Median :4.000   Median :8.000  
#>  Mean   :2.357   Mean   :3    Mean   :2.793   Mean   :7.077  
#>  3rd Qu.:4.000   3rd Qu.:4    3rd Qu.:4.000   3rd Qu.:8.000  
#>  Max.   :4.000   Max.   :4    Max.   :4.000   Max.   :8.000  
#>  NA's   :22      NA's   :27   NA's   :21      NA's   :24     
#>       hear        nvision   
#>  Min.   :1.0   Min.   :1.0  
#>  1st Qu.:2.5   1st Qu.:3.0  
#>  Median :3.0   Median :4.0  
#>  Mean   :3.4   Mean   :3.7  
#>  3rd Qu.:4.5   3rd Qu.:4.0  
#>  Max.   :5.0   Max.   :5.0  
#>  NA's   :15    NA's   :20
iris.imp <- missForest(iris.mis)
#>   missForest iteration 1 in progress...
#> Warning in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry =
#> mtry, : The response has five or fewer unique values. Are you sure you want
#> to do regression?
# (this same warning is printed 9 times in each of the 4 iterations; repeats omitted)
#> done!
#>   missForest iteration 2 in progress...
#> done!
#>   missForest iteration 3 in progress...
#> done!
#>   missForest iteration 4 in progress...
#> done!

Created on 2019-02-21 by the reprex package (v0.2.1)

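Re. 1 and 2, you can most likely ignore the warnings. randomForest prints "The response has five or fewer unique values. Are you sure you want to do regression?" whenever missForest fits a regression forest to a numeric column with only a few distinct values, as happens with this small subset. When you actually run this on your full data, you'll likely have more unique values.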
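If these variables are really ordinal or categorical codes (many reprex columns only take values 1 to 4 or 1 to 5), one alternative not raised in the thread is to store them as factors, so that missForest() fits classification forests for them and randomForest no longer asks about regression. This is a sketch under assumptions: the cutoff of five distinct values simply mirrors the warning, `to_factor_if_few()` is a hypothetical helper, and the data are simulated.

```r
library(missForest)

# Hypothetical helper (not part of missForest): convert numeric columns
# with at most `max_levels` distinct observed values into factors.
# NA entries stay NA, so missForest() still treats them as missing.
to_factor_if_few <- function(x, max_levels = 5) {
  x <- as.data.frame(x)
  for (col in names(x)) {
    n_unique <- length(unique(na.omit(x[[col]])))
    if (is.numeric(x[[col]]) && n_unique <= max_levels) {
      x[[col]] <- factor(x[[col]])
    }
  }
  x
}

# Simulated stand-in data: one wide-ranging score and two short scales
set.seed(42)
n <- 60
dat <- data.frame(
  recall = sample(2:9, n, replace = TRUE),  # up to 8 values: stays numeric
  sleep  = sample(1:4, n, replace = TRUE),  # 1 to 4 scale: becomes a factor
  hear   = sample(1:5, n, replace = TRUE)   # 1 to 5 scale: becomes a factor
)
dat$recall[sample(n, 10)] <- NA
dat$sleep[sample(n, 10)]  <- NA

dat_f <- to_factor_if_few(dat)
sapply(dat_f, class)

# sleep is now imputed by classification, recall by regression
imp <- missForest(dat_f)
sapply(imp$ximp, class)
```

Whether a factor or a numeric treatment is more appropriate depends on how you plan to analyse these variables afterwards.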

Regarding question 3, how to use imputed data in further analysis is a whole field of study unto itself, so I can't really answer for your specific case. The missing-data links I provided in your earlier question are definitely worth looking at.
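On viewing the results: missForest() returns a list, and the completed data set is its `ximp` element, with an out-of-bag estimate of the imputation error in `OOBerror`. The sketch below uses iris with artificial missing values, as in the package's own help example, as a stand-in for your data; in RStudio, `View(imp$ximp)` opens the completed data in the data viewer.

```r
library(missForest)

# Stand-in data as in the missForest help example: iris with 10% of the
# values removed at random
set.seed(81)
iris_mis <- prodNA(iris, noNA = 0.1)

imp <- missForest(iris_mis)

# The completed data set is the ximp element of the returned list
iris_imp <- imp$ximp
head(iris_imp)

# Out-of-bag estimate of the imputation error (NRMSE for numeric columns,
# proportion falsely classified, PFC, for factor columns)
imp$OOBerror

# The completed data frame can then be used like any other data frame
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris_imp)
coef(fit)
```

Keep in mind that analysing a single imputed data set treats the imputed values as if they had been observed.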

Re. exporting, the answer is yes. One of the rio package vignettes has a nice table of packages that can import and export to and from various formats:
https://cran.r-project.org/web/packages/rio/vignettes/rio.html

For writing to Stata, you might take a look at the haven package (haven::write_dta()).
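A minimal export sketch, assuming the completed data is taken from `imp$ximp`; the small `imputed` data frame here is a hypothetical stand-in, and the files go to `tempdir()` only so the example doesn't clutter your working directory. `rio::export()` is an alternative that picks the output format from the file extension, as described in the vignette above.

```r
# Hypothetical stand-in for the completed data; in practice this would be
# the ximp element returned by missForest(), e.g. imputed <- imp$ximp
imputed <- data.frame(
  drecall = c(6, 7, 5, 8),
  hear    = c(5, 4, 3, 4)
)

# CSV via base R; Excel can open this file directly
csv_path <- file.path(tempdir(), "imputed.csv")
write.csv(imputed, csv_path, row.names = FALSE)

# Stata .dta via haven (only runs if haven is installed)
if (requireNamespace("haven", quietly = TRUE)) {
  dta_path <- file.path(tempdir(), "imputed.dta")
  haven::write_dta(imputed, dta_path)
}

# Quick round-trip check on the CSV
back <- read.csv(csv_path)
str(back)
```

For real use, replace the `tempdir()` paths with the location where you want the files saved.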

Dear Mara,
Thank you for the quick response and useful links. I appreciate this help.

When I try to run missForest on the original full data, I get the following error message:

Error in sample.int(length(x), size, replace, prob) :
invalid first argument

The code I ran was:
im.out.1 <- missForest(xmis = wi.miss, maxiter = 10, ntree = 100,
                       variablewise = FALSE, decreasing = FALSE,
                       verbose = FALSE, mtry = floor(sqrt(ncol(wi.miss))),
                       replace = TRUE, classwt = NULL, cutoff = NULL,
                       strata = NULL, sampsize = NULL, nodesize = NULL,
                       maxnodes = NULL, xtrue = NA, parallelize = "no")

Is it something to do with the data frame?

Regards,

J

Based on the documentation and the vignette, it looks like the first argument you pass to missForest(), xmis, needs to be:

a data matrix with missing values. The columns correspond to the variables and the rows to the observations.

In addition to the documentation on CRAN and elsewhere, you can see this info directly in R by running ?missForest or help(missForest) with the package loaded.

https://www.r-project.org/help.html
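The thread doesn't show how `wi.miss` was created, so the cause of the `sample.int` error can't be confirmed from here. An assumption worth checking: missForest() expects a plain data frame or matrix whose columns are all numeric or factor, and data read in as a tibble (for example via haven or readxl), or containing character or logical columns, can trip up its initial imputation step. The hypothetical `prep_for_missforest()` helper below converts the input to a plain data.frame, turns character and logical columns into factors, and drops columns with no observed values; the example data are made up.

```r
# Hypothetical helper (not part of missForest): coerce input into a plain
# data.frame whose columns are all numeric or factor.
prep_for_missforest <- function(x) {
  x <- as.data.frame(x)  # drops the tibble (tbl_df) class if present
  # Report column classes so unexpected types are easy to spot
  print(table(vapply(x, function(col) class(col)[1], character(1))))
  for (col in names(x)) {
    if (is.character(x[[col]]) || is.logical(x[[col]])) {
      x[[col]] <- factor(x[[col]])
    }
  }
  # Columns with no observed values cannot be imputed
  all_na <- vapply(x, function(col) all(is.na(col)), logical(1))
  if (any(all_na)) {
    message("Dropping all-NA columns: ",
            paste(names(x)[all_na], collapse = ", "))
    x <- x[, !all_na, drop = FALSE]
  }
  x
}

# Made-up data with a character column and an all-NA column
raw <- data.frame(
  age   = c(70, NA, 82, 75, 68),
  sex   = c("f", "m", NA, "f", "m"),
  empty = NA,
  stringsAsFactors = FALSE
)
clean <- prep_for_missforest(raw)
str(clean)
# then, e.g.: im.out.1 <- missForest(clean, maxiter = 10, ntree = 100)
```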

Thanks a lot. This is very helpful.
