# Load the data
library(dummies)  # provides dummy.data.frame()
setwd("~/R/Projects/MyData")
df <- read.csv("MyData.csv", header = TRUE)
N <- nrow(df)
p <- which(colnames(df)=="Prediction")
X <- dummy.data.frame(df[, c(10:35)])
Y <- df[, p]
data = cbind(X, Y)
## Split the data into training and testing sets (80:20) and convert the data frames to matrices
set.seed(777)
Ind = sample(N, floor(N * 0.8), replace = FALSE)
p = ncol(data)  # p now points at Y, the last column of data
Y_train = data.matrix(data[Ind, p])
X_train = data.matrix(data[Ind, -p])   # data holds only the features and Y, so drop just Y
Y_test = data.matrix(data[-Ind, p])
X_test = data.matrix(data[-Ind, -p])
k = ncol(X_train)
In this case, the first 9 columns of df are label columns, so they are not involved in training or testing.
I would still like to include them in the output, though. My case is more complex because there are 9 label columns, but here is a simple example of the output I am after.
For example, instead of just 25.6, 22.3, 24.1, I would want to see the Car Model (label) next to the MPG (prediction). That way, I can write a csv file containing all the labels together with the predictions that the model made.
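To make the desired output concrete, here is a minimal sketch of one way to carry label columns alongside predictions. It is not the code from the question. It assumes the built-in mtcars data as a stand-in for MyData.csv, a single hypothetical label column named CarModel, and a plain lm() fit as a placeholder for whatever model is actually used. The output column name MPG_pred and the temporary file path are illustrative too.

```r
# Toy stand-in for the real data (assumption): built-in mtcars,
# with the car model turned into an ordinary label column.
cars <- data.frame(CarModel = rownames(mtcars), mtcars, row.names = NULL)

set.seed(777)
N   <- nrow(cars)
Ind <- sample(N, floor(N * 0.8), replace = FALSE)

# Keep the label column(s) for the test rows, selected with the same
# index that selects the test features and response.
labels_test <- cars[-Ind, "CarModel", drop = FALSE]

# Placeholder model (assumption): linear regression on two predictors.
fit  <- lm(mpg ~ wt + hp, data = cars[Ind, ])
pred <- predict(fit, newdata = cars[-Ind, ])

# Labels and predictions come from the same rows in the same order,
# so they can be bound side by side and written out together.
out <- cbind(labels_test, MPG_pred = round(pred, 1))
rownames(out) <- NULL
out_file <- file.path(tempdir(), "predictions.csv")  # illustrative path
write.csv(out, out_file, row.names = FALSE)
```

The key point is that the labels are subset with -Ind, exactly like the test features, so each prediction lines up with its label. With the variables from the question, the equivalent label block would presumably be df[-Ind, 1:9], bound to the predictions in the same way.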