Cross validation using K folds

I am trying to use the function below to carry out cross-validation on some regression models using K folds instead of the validation-set approach. I am struggling to work out how to use the code to test a model fitted with LASSO regression. I would really appreciate any help with this. Many thanks.

## Set the seed to make the analysis reproducible
set.seed(1)

## 10-fold cross validation
nfolds = 10
## Sample fold-assignment index
fold_index = sample(nfolds, n, replace=TRUE)
## Print first few fold-assignments
head(fold_index)

reg_cv = function(X1, y, fold_ind) {
  Xy = data.frame(X1, y=y)
  nfolds = max(fold_ind)
  if(!all.equal(sort(unique(fold_ind)), 1:nfolds)) stop("Invalid fold partition.")
  cv_errors = numeric(nfolds)
  for(fold in 1:nfolds) {
    tmp_fit = lm(y ~ ., data=Xy[fold_ind!=fold,])
    yhat = predict(tmp_fit, Xy[fold_ind==fold,])
    yobs = y[fold_ind==fold]
    cv_errors[fold] = mean((yobs - yhat)^2)
  }
  fold_sizes = numeric(nfolds)
  for(fold in 1:nfolds) fold_sizes[fold] = length(which(fold_ind==fold))
  test_error = weighted.mean(cv_errors, w=fold_sizes)
  return(test_error)
}
lasso_fit = glmnet(
  X1,
  y,
  family = "binomial",
  alpha = 1,
  standardize = FALSE,
  lambda = grid
)

I do not completely understand your question. Can you please answer these queries?

  1. What are you passing to fold_ind? A vector of indices of the same length as y (one entry per row of X1), denoting which fold each observation corresponds to? If so, why do you restrict the indices to be exactly 1, 2, ..., max(fold_ind)? There is no need for that. The way you are checking will probably lead to another problem, as I point out in the note below.
  2. Why do you want to do this? cv.glmnet already does it, doesn't it?
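To illustrate the second query: cv.glmnet takes a foldid argument, so pre-assigned folds can be passed straight to it. The sketch below is only an assumption about your setup: it uses simulated data with a continuous response and the default gaussian family, and the names lasso_cv_fit, n and p are illustrative.

```r
library(glmnet)  # assumes the glmnet package is installed

set.seed(1)
## Simulated data, purely for illustration (not the data from the question)
n = 100; p = 5
X1 = matrix(rnorm(n * p), n, p)
y = X1[, 1] - 2 * X1[, 2] + rnorm(n)

## Balanced fold assignment, so every fold is guaranteed to be non-empty
nfolds = 10
fold_index = sample(rep(1:nfolds, length.out=n))

## Pass the pre-computed folds to cv.glmnet through foldid
lasso_cv_fit = cv.glmnet(X1, y, alpha=1, foldid=fold_index)

## Smallest cross-validated MSE and the lambda that achieves it
min(lasso_cv_fit$cvm)
lasso_cv_fit$lambda.min
```

Because fold_index is created once and reused, any other model evaluated with the same vector is compared on identical splits.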

Note: all.equal returns either TRUE or a character vector describing the differences. You cannot apply the ! operator to a character vector, so it gives an error. See this example:

> fold_ind <- c(2,4,5,5,2,6,2,3,2)
> nfolds <- max(fold_ind)
> all.equal(sort(unique(fold_ind)), 1:nfolds) # not true, as expected
[1] "Numeric: lengths (5, 6) differ"
> !all.equal(sort(unique(fold_ind)), 1:nfolds)
Error in !all.equal(sort(unique(fold_ind)), 1:nfolds) : 
  invalid argument type
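One common way to avoid this is to wrap the comparison in isTRUE(), which returns FALSE for anything other than a single TRUE. A minimal sketch, reusing the vector above (the name valid is just for illustration):

```r
fold_ind = c(2, 4, 5, 5, 2, 6, 2, 3, 2)
nfolds = max(fold_ind)

## isTRUE() maps the character result of all.equal() to FALSE
valid = isTRUE(all.equal(sort(unique(fold_ind)), 1:nfolds))
valid  # FALSE, because fold 1 is never used

## The check inside reg_cv could then be written as:
## if(!valid) stop("Invalid fold partition.")
```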

Thanks for your help!

fold_ind contains a consecutive set of integers, starting at 1. Line 4 creates a vector to hold the MSE computed over each fold. The function then loops over the folds.

I am required to do the cross-validation in a way that gives a fair comparison, which means ensuring that the same folds are used by each model.

Thanks for your help
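For what it is worth, reg_cv might be adapted to the LASSO along the following lines. This is a sketch under assumptions, not a definitive implementation: lasso_cv is a hypothetical helper, the response is assumed to be continuous (so the default gaussian family is used rather than "binomial"), the data are simulated, and grid is an arbitrary illustrative sequence of penalties.

```r
library(glmnet)  # assumes the glmnet package is installed

## Hypothetical helper mirroring reg_cv: returns one CV error per lambda
lasso_cv = function(X1, y, fold_ind, lambda) {
  X1 = as.matrix(X1)
  nfolds = max(fold_ind)
  if(!isTRUE(all.equal(sort(unique(fold_ind)), 1:nfolds))) stop("Invalid fold partition.")
  cv_errors = matrix(NA, nfolds, length(lambda))
  fold_sizes = numeric(nfolds)
  for(fold in 1:nfolds) {
    test = fold_ind == fold
    ## Fit the whole LASSO path on the training folds
    tmp_fit = glmnet(X1[!test, , drop=FALSE], y[!test], alpha=1, lambda=lambda)
    ## One column of predictions per value in lambda
    yhat = predict(tmp_fit, newx=X1[test, , drop=FALSE], s=lambda)
    cv_errors[fold, ] = colMeans((y[test] - yhat)^2)
    fold_sizes[fold] = sum(test)
  }
  ## Average over folds, weighted by fold size, for each lambda
  colSums(cv_errors * fold_sizes) / sum(fold_sizes)
}

## Simulated data, purely for illustration
set.seed(1)
n = 100; p = 5
X1 = matrix(rnorm(n * p), n, p)
y = X1[, 1] - 2 * X1[, 2] + rnorm(n)
fold_index = sample(rep(1:10, length.out=n))
grid = 10^seq(1, -3, length.out=50)

lasso_errors = lasso_cv(X1, y, fold_index, grid)
grid[which.min(lasso_errors)]  # penalty with the smallest CV error
min(lasso_errors)              # comparable with reg_cv(X1, y, fold_index)
```

Passing the same fold_index to both reg_cv and lasso_cv keeps the comparison fair, since each model is then tested on exactly the same held-out observations.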