Hi,
I am trying to fit a series of random forest (rf) models, one per region, by looping over the regions with the code below, but I keep getting this error:
```
Error in `$<-.data.frame`(`*tmp*`, "region", value = 10) :
  replacement has 1 row, data has 0
```
Can you please help me see what I am missing in this approach?
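The message comes from base R's `$<-.data.frame`: it is raised when a length-1 value is assigned as a new column of a data frame that has zero rows. A likely way to end up with such a data frame here is `importance(rf, type = 1)` on a forest fitted without `importance = TRUE`. Because `fs` is numeric, `randomForest()` fits a regression forest, and without `importance = TRUE` the permutation measure (`%IncMSE`) is not computed, so `type = 1` appears to return a matrix with one row per predictor but no columns. The base-R sketch below builds that shape by hand (the 3 x 0 matrix is an assumption standing in for the real `importance()` output) and reproduces the same error without needing randomForest:

```r
# Assumed stand-in for importance(rf, type = 1) when importance = TRUE was
# not set: 3 rows (one per predictor), 0 columns.
imp <- matrix(numeric(0), nrow = 3, ncol = 0,
              dimnames = list(c("age", "sex", "marital"), NULL))

nrow(imp) > 0                       # TRUE, so the `if` guard lets it through
imp_table <- as.data.frame(t(imp))  # t() turns 3 x 0 into 0 x 3
nrow(imp_table)                     # 0 rows

# Adding a length-1 column to a zero-row data frame is what fails:
msg <- tryCatch({
  imp_table$region <- 10
  "no error"
}, error = function(e) conditionMessage(e))
msg                                 # "replacement has 1 row, data has 0"
```

Note that the `if (nrow(imp) > 0)` guard in the loop checks rows, not columns, so it cannot catch this case; checking `ncol(imp) > 0` would.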
Here is a glimpse of the data:
```
Rows: 6,978
Columns: 5
$ age     <dbl> 15, 18, 18, 18, 15, 16, 16, 18, 15, 15, 17, 17, 18, 18, 19, 17, 15, 16, 16, 15, 16, …
$ fs      <dbl> 12, 12, 12, 12, 10, 10, 10, 12, 10, 10, 10, 12, 12, 11, 12, 12, 11, 10, 11, 12, 12, …
$ sex     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ marital <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ region  <dbl> 10, 12, 10, 9, 9, 7, 9, 9, 9, 8, 1, 4, 9, 4, 12, 10, 4, 9, 9, 4, 4, 4, 10, 4, 6, 10,…
```
And here is the code:
```r
library(randomForest)

result <- data.frame()
for (i in unique(bb$region)) {
  sub_train <- subset(bb, region == i)
  rf <- randomForest(fs ~ age + sex + marital, data = sub_train, ntree = 5000, mtry = 3)
  imp <- importance(rf, type = 1)
  if (nrow(imp) > 0) {
    imp_table <- as.data.frame(t(imp))
    imp_table$region <- i
    colnames(imp_table) <- c("MeanDecreaseAccuracy", "MeanDecreaseGini", "region")
    if (nrow(result) == 0) {
      result <- imp_table
    } else {
      result <- rbind(result, imp_table)
    }
  }
}
result
```
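One possible rewrite of the loop, as a sketch under a few assumptions. It assumes `fs` is meant as a numeric (regression) target, so `importance()` reports `%IncMSE` and `IncNodePurity` rather than the classification measures `MeanDecreaseAccuracy` and `MeanDecreaseGini` that the hard-coded `colnames()` call expects. It fits each forest with `importance = TRUE`, keeps the importance matrix one row per predictor instead of transposing it, and collects the per-region tables in a list that is combined once with `do.call(rbind, ...)`. The simulated `bb` is a hypothetical stand-in with the same columns, and `ntree` is lowered from 5000 to 500 only to keep the example fast.

```r
library(randomForest)

# Hypothetical stand-in for `bb` (same columns, simulated values) so the
# sketch runs on its own; replace with the real data.
set.seed(42)
n <- 300
bb <- data.frame(
  age     = sample(15:19, n, replace = TRUE),
  sex     = sample(0:1, n, replace = TRUE),
  marital = sample(0:1, n, replace = TRUE),
  region  = sample(c(4, 9, 10), n, replace = TRUE)
)
bb$fs <- 10 + 0.5 * (bb$age - 15) + rnorm(n, sd = 0.5)

results <- list()
for (r in sort(unique(bb$region))) {
  sub_train <- bb[bb$region == r, ]
  # importance = TRUE makes randomForest compute the permutation measure
  # (%IncMSE for regression); ntree lowered from 5000 for speed
  rf <- randomForest(fs ~ age + sex + marital, data = sub_train,
                     ntree = 500, mtry = 3, importance = TRUE)
  imp <- importance(rf)  # one row per predictor, two importance columns
  # Keep one row per predictor (no t()) and tag each row with its region
  results[[length(results) + 1]] <- data.frame(
    region   = r,
    variable = rownames(imp),
    imp,
    check.names = FALSE,
    row.names   = NULL
  )
}
result <- do.call(rbind, results)
result
```

If `fs` is really a class label (the glimpse only shows the values 10, 11 and 12), wrapping it in `factor()` before fitting turns this into a classification forest, and `importance()` then reports `MeanDecreaseAccuracy` and `MeanDecreaseGini` (plus one column per class), which is closer to what the original column names suggest.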