Error in Xcr[train, ] : subscript out of bounds

Could I get a little assistance here? I keep getting:

> xtrain = Xcr[train,]
Error in Xcr[train, ] : subscript out of bounds



# Read the German credit data from the UCI ML Repository and assign it to cred.
cred = read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data", header = FALSE, sep = "")

head(cred)


default = cred$V21 - 1 # V21 is coded 1 or 2; subtracting 1 maps a default (2) to 1 (TRUE).

amount = cred$V5

purp = factor(cred$V4, levels = c("A40","A41","A42","A43","A44","A45","A46","A48","A49","A110"))

levels(purp) = c("New_Car", "Used_Car", "Furniture_Equipment", "Radio_TV", "Applications", "Repairs", "Education", "Retraining", "Business", "Other")

cred$Default = default

cred$Amount = amount

cred$Purpose = purp

cr = cred[,c("Default", "Amount", "Purpose")]

head(cr[,])

summary(cr[,])


###########################

Xcr <- model.matrix(default~., data = cr)[,-1]

Xcr[1:3,]

##########################
set.seed(1)

train = sample(1:1000,900)

##########################

xtrain = Xcr[train,]

xtest = Xcr[-train,]

ytrain = cr$Default[train]

ytest = cr$Default[-train]

##########################

datas=data.frame(default = ytrain, xtrain)

Hi and welcome! I have taken a look at the code provided. The problem is that train holds 900 indices drawn from 1:1000, but Xcr has only 988 rows, so any index above 988 is out of bounds. Xcr has fewer rows than cred because model.matrix drops any rows containing NAs by default. Here the NAs appear to come from Purpose: the factor levels list "A110", whereas the UCI documentation codes that category as "A410", so those rows are set to NA.
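
One way to confirm this kind of mismatch is to compare row counts and count the NAs per column. The sketch below uses a small made-up data frame (`toy`, with invented values, not the German credit data) in which a factor level is deliberately misspelled, so the effect on model.matrix is easy to see:

```r
# Hypothetical codes standing in for a column like cred$V4.
codes <- c("A40", "A41", "A410", "A40", "A410", "A41")

# "A110" matches none of the observed codes, so every "A410" entry becomes NA.
purpose <- factor(codes, levels = c("A40", "A41", "A110"))

toy <- data.frame(Amount = c(100, 200, 300, 400, 500, 600),
                  Purpose = purpose)

# model.matrix() builds a model frame first, which omits incomplete rows by default.
X <- model.matrix(~ ., data = toy)[, -1]

nrow(toy)            # 6 rows in the data frame
nrow(X)              # 4 rows survive in the design matrix
colSums(is.na(toy))  # shows which column holds the NAs

# Indexing with a row number that only exists in `toy` reproduces the error.
res <- try(X[6, ], silent = TRUE)
inherits(res, "try-error")
```

Running `colSums(is.na(cr))` and comparing `nrow(cr)` with `nrow(Xcr)` on the real data should show the same pattern.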

I recommend defining train from the number of rows of Xcr and the proportion of data you want to train on (90% here):

N <- nrow(Xcr)
prop <- .9
train = sample(N, size = floor(prop*N))
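
As a sanity check on a split like this, the sketch below uses a placeholder matrix `X` with 988 rows (random numbers standing in for Xcr, purely an assumption for illustration) and verifies that every sampled index is in range and that the training and test rows together cover the whole matrix:

```r
set.seed(1)  # call before sample() so the split is reproducible

# Placeholder with the same row count as Xcr in this thread.
X <- matrix(rnorm(988 * 2), nrow = 988, ncol = 2)

N <- nrow(X)
prop <- 0.9
train <- sample(N, size = floor(prop * N))

length(train)     # floor(0.9 * 988) = 889
max(train) <= N   # no sampled index can exceed the row count

# drop = FALSE keeps the result a matrix even if only one row is selected.
xtrain <- X[train, , drop = FALSE]
xtest  <- X[-train, , drop = FALSE]

nrow(xtrain) + nrow(xtest) == N
```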

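You will also need ytrain and ytest to come from the same 988 rows as Xcr. Subsetting cr$Default with train would misalign them, because cr still contains the rows with NAs. Since the formula default~. pulled Default from cr into model.matrix as a predictor, you could do the following: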

xtrain = Xcr[train,-1]

xtest = Xcr[-train,-1]

ytrain = Xcr[train,1]

ytest = Xcr[-train,1]

since the first column of Xcr is Default.
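
A different route, which does not rely on column positions, is to drop incomplete rows from the data frame first and then build both the design matrix and the response from the same filtered rows. This is only a sketch on made-up data; `toy`, `toy_cc`, and the values in them are hypothetical:

```r
set.seed(1)

# Hypothetical stand-in for cr, with two missing Purpose values.
toy <- data.frame(
  Default = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 1),
  Amount  = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234),
  Purpose = factor(c("Radio_TV", "Radio_TV", NA, "Furniture", "New_Car",
                     "Education", "Furniture", NA, "Radio_TV", "New_Car"))
)

# Keep only complete rows so predictors and response stay aligned.
toy_cc <- toy[complete.cases(toy), ]

# Naming the response in the formula keeps it out of the predictor columns.
X <- model.matrix(Default ~ ., data = toy_cc)[, -1]
y <- toy_cc$Default

N <- nrow(X)
train <- sample(N, size = floor(0.75 * N))

xtrain <- X[train, , drop = FALSE]
xtest  <- X[-train, , drop = FALSE]
ytrain <- y[train]
ytest  <- y[-train]

# Same shape as the data frame built at the end of the question.
datas <- data.frame(default = ytrain, xtrain)
```

Because X and y are both derived from `toy_cc`, the same index vector selects matching rows in each.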
