# Cross-validation using K folds

I am trying to use the function below to carry out K-fold cross-validation on some of my regression models, instead of using a single validation set. I am struggling to work out how to adapt the code to test a LASSO regression model. I would really appreciate any help with this. Many thanks.

```r
## Set the seed to make the analysis reproducible
set.seed(1)

## Number of observations (X1 is the predictor matrix)
n = nrow(X1)
## 10-fold cross-validation
nfolds = 10
## Sample a fold assignment for each observation
fold_index = sample(nfolds, n, replace=TRUE)
## Print the first few fold assignments
head(fold_index)

## K-fold cross-validation estimate of the test MSE of a linear model
reg_cv = function(X1, y, fold_ind) {
  Xy = data.frame(X1, y=y)
  nfolds = max(fold_ind)
  if(!all.equal(sort(unique(fold_ind)), 1:nfolds)) stop("Invalid fold partition.")
  cv_errors = numeric(nfolds)
  for(fold in 1:nfolds) {
    ## Fit on every fold except `fold`, then predict the held-out fold
    tmp_fit = lm(y ~ ., data=Xy[fold_ind!=fold,])
    yhat = predict(tmp_fit, Xy[fold_ind==fold,])
    yobs = y[fold_ind==fold]
    cv_errors[fold] = mean((yobs - yhat)^2)
  }
  ## Weight each fold's MSE by the number of observations in that fold
  fold_sizes = numeric(nfolds)
  for(fold in 1:nfolds) fold_sizes[fold] = length(which(fold_ind==fold))
  test_error = weighted.mean(cv_errors, w=fold_sizes)
  return(test_error)
}
```
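With `replace=TRUE`, each observation's fold is drawn independently, so fold sizes vary at random and a fold can even come out empty, which is what the validity check in `reg_cv` guards against. A minimal sketch of a common alternative, assuming a hypothetical `n = 103` observations, assigns folds as evenly as possible by shuffling a repeated sequence of labels:

```r
set.seed(1)
n <- 103       # hypothetical number of observations
nfolds <- 10

## Repeat the labels 1..nfolds up to length n, then shuffle them
fold_index <- sample(rep(1:nfolds, length.out = n))

## Every fold is non-empty, and fold sizes differ by at most one
table(fold_index)
```

Here three folds get 11 observations and the other seven get 10, so no fold can be missing.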
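```r
library(glmnet)

## grid is a vector of lambda values (its definition is not shown here)
## X1 must be a numeric matrix for glmnet()
lasso_fit = glmnet(
  X1,
  y,
  family = "binomial",
  alpha = 1,
  standardize = FALSE,
  lambda = grid
)
```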
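`reg_cv` is built around `lm()`, so it cannot be reused for the LASSO as it stands. A minimal sketch of the same idea with `glmnet`, using a hypothetical helper `lasso_cv()` and simulated data (neither comes from the question): for each fold it fits the LASSO on the other folds for every `lambda` in the grid, predicts the held-out fold with `predict(fit, newx = ..., s = ...)`, and combines the per-fold MSEs weighted by fold size, as `reg_cv` does. It assumes a continuous response, hence `family = "gaussian"`; `family = "binomial"` is for a 0/1 response, where deviance or misclassification error would be the more usual error measure. Extra arguments such as `standardize = FALSE` are passed through to `glmnet()`.

```r
library(glmnet)

## Hypothetical helper: K-fold CV estimate of the test MSE of a LASSO fit,
## computed for every value in `lambda` (structure mirrors reg_cv())
lasso_cv <- function(X1, y, fold_ind, lambda, ...) {
  X1 <- as.matrix(X1)                        # glmnet() needs a numeric matrix
  lambda <- sort(lambda, decreasing = TRUE)  # glmnet expects decreasing lambda
  folds <- sort(unique(fold_ind))            # works with any fold labels
  ## rows = folds, columns = lambda values
  cv_errors <- matrix(NA_real_, nrow = length(folds), ncol = length(lambda))
  for (i in seq_along(folds)) {
    train <- fold_ind != folds[i]
    fit <- glmnet(X1[train, , drop = FALSE], y[train],
                  family = "gaussian", alpha = 1, lambda = lambda, ...)
    ## one column of held-out predictions per lambda value
    yhat <- predict(fit, newx = X1[!train, , drop = FALSE], s = lambda)
    cv_errors[i, ] <- colMeans((y[!train] - yhat)^2)
  }
  ## weight each fold's MSE by its size, as reg_cv() does
  fold_sizes <- sapply(folds, function(f) sum(fold_ind == f))
  cv_error <- colSums(cv_errors * fold_sizes) / sum(fold_sizes)
  list(lambda = lambda, cv_error = cv_error,
       lambda_best = lambda[which.min(cv_error)])
}

## Usage on simulated data (illustration only)
set.seed(1)
n <- 100
X1 <- matrix(rnorm(n * 5), n, 5)
y <- X1[, 1] - 2 * X1[, 2] + rnorm(n)
fold_index <- sample(rep(1:10, length.out = n))
grid <- 10^seq(1, -3, length.out = 50)
res <- lasso_cv(X1, y, fold_index, grid)
res$lambda_best
```

You would then refit `glmnet()` on all the data and use `res$lambda_best` as the chosen penalty.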

1. What are you passing to `fold_ind`? A vector of the same length as `X1` (and `y`) indicating which fold each observation belongs to? If so, why do you restrict the labels to 1, 2, ..., `max(fold_ind)`? There is no need for that. Also, the way you check this will probably lead to another problem, as I point out in the note below.
2. Why do you want to do this yourself? `cv.glmnet` already does it, doesn't it?
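Regarding point 2: `cv.glmnet()` accepts your own fold assignment through its `foldid` argument, so the fold vector from the question can be reused directly. A minimal sketch on simulated data (the data, `n`, and `grid` are illustrative assumptions), again assuming a continuous response:

```r
library(glmnet)

set.seed(1)
n <- 100
X1 <- matrix(rnorm(n * 5), n, 5)        # simulated predictors (illustration only)
y <- X1[, 1] - 2 * X1[, 2] + rnorm(n)   # simulated continuous response
fold_index <- sample(rep(1:10, length.out = n))
grid <- 10^seq(1, -3, length.out = 50)

## foldid supplies your own fold assignment; type.measure = "mse" matches reg_cv()
cv_fit <- cv.glmnet(X1, y, family = "gaussian", alpha = 1, lambda = grid,
                    foldid = fold_index, type.measure = "mse")

cv_fit$lambda.min               # lambda with the smallest CV error
coef(cv_fit, s = "lambda.min")  # coefficients at that lambda
```

`cv_fit$cvm` holds the mean cross-validated error for each value in `cv_fit$lambda`.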

Note: `all.equal` returns either `TRUE` or a character vector describing the differences. You cannot apply the `!` operator to a character vector, so this check gives an error whenever the fold labels are not exactly 1, ..., `nfolds`. For example:

```r
> fold_ind <- c(2,4,5,5,2,6,2,3,2)
> nfolds <- max(fold_ind)
> all.equal(sort(unique(fold_ind)), 1:nfolds) # not TRUE, as expected
[1] "Numeric: lengths (5, 6) differ"
> !all.equal(sort(unique(fold_ind)), 1:nfolds)
Error in !all.equal(sort(unique(fold_ind)), 1:nfolds) :
  invalid argument type
```
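Wrapping the comparison in `isTRUE()` avoids the error, because `isTRUE()` returns `FALSE` for anything other than a single `TRUE`, including the character vector above. A sketch with a hypothetical helper `check_folds()` that holds the corrected check:

```r
## Hypothetical helper: the check from reg_cv() rewritten safely
check_folds <- function(fold_ind) {
  nfolds <- max(fold_ind)
  if (!isTRUE(all.equal(sort(unique(fold_ind)), 1:nfolds)))
    stop("Invalid fold partition.")
  invisible(TRUE)
}

fold_ind <- c(2, 4, 5, 5, 2, 6, 2, 3, 2)
isTRUE(all.equal(sort(unique(fold_ind)), 1:max(fold_ind)))
#> [1] FALSE

## A simpler check for positive integer labels: every label 1..max occurs
all(seq_len(max(fold_ind)) %in% fold_ind)
#> [1] FALSE

check_folds(c(1, 2, 3, 1))   # passes silently
# check_folds(fold_ind)      # would stop with "Invalid fold partition."
```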