In the software I have been using for deep learning models, I could mark certain columns as label columns: they were excluded from training but still supplied alongside the outputs. For example, a column such as "name" or "id".
What is the equivalent of this concept in R (RStudio)?
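# Load the data
library(dummies)  # provides dummy.data.frame()
setwd("~/R/Projects/MyData")
df <- read.csv("MyData.csv", header = TRUE)
N <- nrow(df)
p <- which(colnames(df) == "Prediction")
# Columns 10:35 are the predictors; one-hot encode any categorical ones
X <- dummy.data.frame(df[, c(10:35)])
Y <- df[, p]
data = cbind(X, Y)
## split data, training & testing, 80:20, AND convert dataframe to a matrix
set.seed(777)
Ind = sample(N, floor(N * 0.8), replace = FALSE)
p = ncol(data)  # Y is now the last column of data
Y_train = data.matrix(data[Ind, p])
# data holds only X and Y (the 9 label columns of df were never copied in),
# so drop just the target column here rather than columns 1:9
X_train = data.matrix(data[Ind, -p])
Y_test = data.matrix(data[-Ind, p])
X_test = data.matrix(data[-Ind, -p])
k = ncol(X_train)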
In this case, the first 9 columns of df are label columns, so they are not involved in training or testing.
I would like to include them with the output, though. The case above might be more complex because there are 9 label columns, but here is a simple example of the output I am after.
For example, instead of just 25.6, 22.3, 24.1, I would want to see the Car Model (label) next to each MPG (prediction). That way, I can write a csv file containing all the labels together with the predictions that the model made.
All I want is for the columns that are not used in training (because they have no predictive value) to show up alongside my prediction results.
So if I train with 26 variables and have 9 columns that are not used, I would like those 9 columns, which are unique identifiers of the data, to be tied to the prediction values.
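To make the desired output concrete, here is a minimal sketch of the Car Model / MPG case. It rests on several assumptions that are not part of my real project: R's built-in mtcars data stands in for MyData.csv, car_model is a hypothetical label column, lm() is only a placeholder for the actual model, and the output file name is made up. The point is that the same row index (Ind) selects both the test features and the label columns, so the two can be bound back together after predicting.

```r
# Sketch only: mtcars is a stand-in for the real data,
# and car_model is a hypothetical identifier ("label") column.
df <- data.frame(car_model = rownames(mtcars), mtcars, row.names = NULL)

label_cols   <- "car_model"                    # kept out of training
feature_cols <- c("cyl", "disp", "hp", "wt")   # predictors
target_col   <- "mpg"                          # value to predict

# Same 80:20 split idea as above
set.seed(777)
N   <- nrow(df)
Ind <- sample(N, floor(N * 0.8), replace = FALSE)

train <- df[Ind, c(feature_cols, target_col)]  # label columns excluded
test  <- df[-Ind, ]                            # label columns still present

# Placeholder model; any model with a predict() method would do here
fit <- lm(mpg ~ ., data = train)

# Tie the label columns of the test rows to their predictions
results <- data.frame(
  test[, label_cols, drop = FALSE],
  Prediction = predict(fit, newdata = test[, feature_cols])
)

# Hypothetical output location
write.csv(results, file.path(tempdir(), "predictions.csv"), row.names = FALSE)
```

With 9 label columns instead of one, label_cols would presumably just become the names (or 1:9 positions) of those columns in df, and the rest would stay the same.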