Error: cannot allocate vector of size 25.5 Gb

rstudioserver
randomforest

#1

Hi,

I am trying to perform an image classification using a random forest model. The image is a Landsat 7 ETM scene with bands 4, 3, 2. I am running the code on RStudio Server, and I get the following error.

rf <- randomForest(as.factor(class) ~ B2 + B3 + B4, data=training,
                   importance=TRUE,
                   ntree=2000, na.action = na.omit)

Error: cannot allocate vector of size 25.5 Gb

What could be the problem?

Regards
Edward


#2

This means that there is insufficient RAM to complete the operation.


#3

You'd be better off using the ranger package and also avoiding the formula method.


#4

I do not have much experience using R. Can you suggest code I can run that uses the ranger package and avoids the formula method?


#5

This may not solve your memory issue:

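# With randomForest (x/y interface instead of the formula method).
# na.action is not applied by this interface, so remove missing values before this call.
library(randomForest)
rf <- randomForest(
  x = training[, c("B2", "B3", "B4")],
  y = as.factor(training$class),
  importance = TRUE,
  ntree = 2000
)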

# With ranger

# The package name must be quoted
install.packages("ranger", repos = "http://cran.r-project.org")

library(ranger)

# remove missing values before this call
rf_2 <- ranger(
  as.factor(class) ~ B2 + B3 + B4, 
  data = training, 
  importance = "impurity",
  num.trees = 2000
)
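
For completeness, ranger can also be called without a formula. This is only a sketch: it assumes ranger 0.12.0 or later (where, to my knowledge, the `x` and `y` arguments were added) and uses simulated data in place of `training`. The name `rf_xy` and the smaller `num.trees` are made up for the example.

```r
library(ranger)

# Simulated stand-in for `training`, with no missing values (assumption).
set.seed(1)
n <- 300
training <- data.frame(
  B2    = runif(n),
  B3    = runif(n),
  B4    = runif(n),
  class = sample(c("forest", "urban", "water"), n, replace = TRUE)
)

# x/y interface: predictors as a data frame, response as a factor.
rf_xy <- ranger(
  x = training[, c("B2", "B3", "B4")],
  y = as.factor(training$class),
  importance = "impurity",
  num.trees = 200
)

rf_xy$prediction.error     # out-of-bag misclassification rate
rf_xy$variable.importance  # one impurity score per band
```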

#6

For the random forest model I got the following error:

rf <- randomForest(
  x = training[, c("B2", "B3", "B4")],
  y = as.factor(training$class),
  importance = TRUE,
  ntree = 2000)
Error in randomForest.default(x = training[, c("B2", "B3", "B4")], y = as.factor(training$class), :
  NA not permitted in predictors

Using the ranger package I got the following error:

rf_2 <- ranger(
  as.factor(class) ~ B2 + B3 + B4,
  data = training,
  importance = "impurity",
  num.trees = 2000
)
Error: Missing data in columns: B2, B3, B4.

#7

Remove the NA values before running that. You can use na.omit() or complete.cases().
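
A minimal sketch of both options, using a small made-up data frame in place of the real `training` data. The values and the names `training_clean`, `training_clean2`, and `model_cols` are illustrative assumptions.

```r
# Toy stand-in for `training`; the values are made up for illustration.
training <- data.frame(
  B2    = c(10, NA, 30, 40, 50),
  B3    = c(11, 21, NA, 41, 51),
  B4    = c(12, 22, 32, 42, 52),
  class = c("water", "forest", "urban", "forest", "water")
)

# Option 1: drop every row that has an NA in any column.
training_clean <- na.omit(training)

# Option 2: drop rows with an NA only in the columns the model uses.
model_cols <- c("B2", "B3", "B4", "class")
training_clean2 <- training[complete.cases(training[, model_cols]), ]

nrow(training_clean)    # rows remaining after na.omit
anyNA(training_clean2)  # FALSE once the incomplete rows are gone
```

Then pass the cleaned data frame through the named `data` argument (for example `data = training_clean`), or build `x` and `y` from it, rather than adding it as an extra unnamed argument.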


#8

These are the latest errors:

#with ranger
Error: cannot allocate vector of size 25.5 Gb
> install.packages(ranger, repos = "http://cran.r-project.org")
Error in install.packages : 'match' requires vector arguments
> library(ranger)
> rf_2 <- ranger(
+   as.factor(class) ~ B2 + B3 + B4,
+   data = training,
+   importance = "impurity",
+   num.trees = 2000, na.omit(training))
Error: Missing data in columns: B2, B3, B4.

#with random forest
> rf <- randomForest(
+   x = training[, c("B2", "B3", "B4")],
+   y = as.factor(training$class),
+   importance = TRUE,
+   ntree = 2000, na.omit(training))
Error in randomForest.default(x = training[, c("B2", "B3", "B4")], y = as.factor(training$class), :
  x and xtest must have same number of columns

#9

It looks like the underlying issue is that your system does not have enough memory for these data.
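
One way to check this, sketched under assumptions: measure how large the training data actually is and, if it is too large, try fitting on a per-class random subsample first. The simulated data frame, the cap `n_per_class`, and the name `training_small` are illustrative and not taken from the original script.

```r
# Simulated stand-in for `training` (assumption: three bands plus a class label).
set.seed(42)
n <- 10000
training <- data.frame(
  B2    = runif(n),
  B3    = runif(n),
  B4    = runif(n),
  class = sample(c("forest", "urban", "water"), n, replace = TRUE)
)

# How big is the data itself?
dim(training)
print(object.size(training), units = "Mb")

# Keep at most n_per_class rows from each class (hypothetical cap).
n_per_class <- 500
rows_by_class <- split(seq_len(nrow(training)), training$class)
keep <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), min(length(idx), n_per_class))]
}))
training_small <- training[keep, ]

table(training_small$class)
```

Fitting on `training_small` with fewer trees is a quick way to see whether the model runs at all before scaling back up.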


#10

I have tried to change the script for the random forest and I got the following error:

rf <- train(as.factor(class) ~ B2 + B3 + B4, method = "rf", data = training, na.action = na.omit)
note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :2     NA's   :2
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Any solution?


#11

Can't do much more without a small, reproducible example.