Using sapply to return the mean and SD of RMSEs for datasets of increasing size

I have been asked to generate datasets of increasing size according to the following task:

"Write a function that takes a size n, then (1) builds a dataset using the code provided in Q1 but with n observations instead of 100 and without the set.seed(1), (2) runs the replicate() loop that you wrote to answer Q1, which builds 100 linear models and returns a vector of RMSEs, and (3) calculates the mean and standard deviation. "

The code for the dataset and the replicate() loop is below:
library(caret)   # createDataPartition()
library(dplyr)   # %>% and slice()

n <- c(100, 500, 1000, 5000, 10000)
Sigma <- 9*matrix(c(1.0, 0.5, 0.5, 1.0), 2, 2)
dat <- MASS::mvrnorm(n = 100, c(69, 69), Sigma) %>%
  data.frame() %>% setNames(c("x", "y"))

rmse <- replicate(100, {
  test_index <- createDataPartition(dat$y, times = 1, p = 0.5, list = FALSE)
  train_set <- dat %>% slice(-test_index)
  test_set <- dat %>% slice(test_index)
  fit <- lm(y ~ x, data = train_set)
  y_hat <- predict(fit, newdata = test_set)
  sqrt(mean((y_hat - test_set$y)^2))  # RMSE on the test set
})


The goal is to return, for each value in "n", the mean and standard deviation of the RMSEs.

So far I have tried passing "n" and "rmse" to sapply and storing the output in results:

results <- sapply(n, rmse)

This leads to the error "can't extract residuals from model". However, when I compute the mean manually with a row or column index such as "[1]", I get an incorrect decimal value, and the SD is simply NA:

[1] NA

I have probably overlooked something important here. Any pointers or alternative approaches would be highly appreciated.
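
One possible direction, offered as a sketch rather than a definitive answer: sapply() expects a function as its second argument, whereas rmse as defined in the question is a numeric vector of results computed once on a fixed dataset of 100 rows. Wrapping the three steps from the task in a function of n lets sapply() call it once per size. The helper name calc_rmse_stats and its reps argument are hypothetical. A plain random 50/50 split with sample() stands in for caret::createDataPartition() so that the example depends only on MASS, and the set.seed(1) call is there only to make the sketch reproducible.

```r
library(MASS)  # mvrnorm()

# Hypothetical helper: simulate n observations, fit `reps` linear models on
# random train/test splits, and summarise the resulting RMSEs.
calc_rmse_stats <- function(n, reps = 100) {
  Sigma <- 9 * matrix(c(1.0, 0.5, 0.5, 1.0), 2, 2)
  dat <- as.data.frame(mvrnorm(n = n, mu = c(69, 69), Sigma = Sigma))
  names(dat) <- c("x", "y")

  rmse <- replicate(reps, {
    # Assumption: simple random split instead of createDataPartition()
    test_index <- sample(nrow(dat), size = floor(nrow(dat) / 2))
    train_set <- dat[-test_index, ]
    test_set <- dat[test_index, ]
    fit <- lm(y ~ x, data = train_set)
    y_hat <- predict(fit, newdata = test_set)
    sqrt(mean((y_hat - test_set$y)^2))  # RMSE on the held-out half
  })

  # Named vector, so sapply() returns a matrix with rows "mean" and "sd"
  c(mean = mean(rmse), sd = sd(rmse))
}

n <- c(100, 500, 1000, 5000, 10000)
set.seed(1)  # only for reproducibility of this sketch
results <- sapply(n, calc_rmse_stats)
colnames(results) <- n
results
```

Because the helper returns a named vector of length two, results is a 2 x 5 matrix with one column per value of n. Calling sd() on a single number (for example one indexed element) returns NA, which may be what produced the NA above.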



