I have a file with sample IDs 1, 2, 3, ..., 150, the variables Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width, and the target variable Species (the Iris dataset).
library(randomForest)
library(tidyverse)
(my_iris <- iris %>%
  as_tibble() %>%
  mutate(sample_id = row_number()))
# # A tibble: 150 x 6
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sample_id
#           <dbl>       <dbl>        <dbl>       <dbl> <fct>       <int>
#  1          5.1         3.5          1.4         0.2 setosa          1
#  2          4.9         3            1.4         0.2 setosa          2
#  3          4.7         3.2          1.3         0.2 setosa          3
#  4          4.6         3.1          1.5         0.2 setosa          4
#  5          5           3.6          1.4         0.2 setosa          5
#  6          5.4         3.9          1.7         0.4 setosa          6
#  7          4.6         3.4          1.4         0.3 setosa          7
#  8          5           3.4          1.5         0.2 setosa          8
#  9          4.4         2.9          1.4         0.2 setosa          9
# 10          4.9         3.1          1.5         0.1 setosa         10
# # ... with 140 more rows
(my_forest <- randomForest::randomForest(
  formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = my_iris,
  importance = TRUE
))
# Call:
# randomForest(formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = my_iris, importance = TRUE)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
#
# OOB estimate of error rate: 4.67%
# Confusion matrix:
#            setosa versicolor virginica class.error
# setosa         50          0         0        0.00
# versicolor      0         47         3        0.06
# virginica       0          4        46        0.08
my_forest$importance
#                   setosa    versicolor   virginica MeanDecreaseAccuracy MeanDecreaseGini
# Sepal.Length 0.029520409  0.0200755630 0.031317932          0.027222880         7.924639
# Sepal.Width  0.007755149 -0.0004422338 0.007087913          0.004987078         2.370104
# Petal.Length 0.309264772  0.2994399041 0.289524287          0.295677873        42.211894
# Petal.Width  0.352249068  0.3200053003 0.275227825          0.312119859        46.740186
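If the goal is to rank the predictors, the raw matrix above can also be read through the package's `importance()` accessor and plotted with `varImpPlot()`. This is only a minimal sketch: it refits the forest on plain `iris` so that it runs on its own, and the `set.seed(42)` call and the names `acc_raw`, `acc_scaled`, and `gini` are my own additions, so the numbers will not match the output above exactly.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
my_forest <- randomForest(
  Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris,
  importance = TRUE
)

# Unscaled permutation importance: the same values as the
# MeanDecreaseAccuracy column of my_forest$importance
acc_raw <- importance(my_forest, type = 1, scale = FALSE)

# Default behaviour: the permutation measure divided by its standard error
acc_scaled <- importance(my_forest, type = 1)

# Gini-based importance (mean decrease in node impurity)
gini <- importance(my_forest, type = 2)

# Rank the predictors from most to least important
sort(acc_raw[, "MeanDecreaseAccuracy"], decreasing = TRUE)

# Dot chart of both importance measures
varImpPlot(my_forest)
```

Note that `my_forest$importance` holds the unscaled values, whereas `importance()` scales the permutation measure by default, so the two only agree when `scale = FALSE` is passed.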
What do you want to do with this?