I want to create a regression model inside another function, but when I save the model it becomes very large because other data in the environment is saved along with it. I think the solution might be to handle environments differently; this helped me understand the issue better. Below I explain the problem in a few steps.
# Helper function just to quickly assess how big the object becomes when being saved.
saveSize <- function (object) {
  tf <- tempfile(fileext = ".RData")
  on.exit(unlink(tf))
  save(object, file = tf)
  file.size(tf)
}
# Subset of rows (observations) to be used
subset = 1:4
# Model size to compare with; i.e., not created within a function
model1 <- lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset)
saveSize(model1)
# Size = 965
# Function where there are other data that should NOT be saved.
Function2 <- function (subset){
  data_not_to_be_saved <- 1:1e+15
  model2 <- lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset)
}
model2 <- Function2(subset)
saveSize(model2)
# Size = 1148 ; Problematic that the size is larger than for model1.
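To see what travels with the model, one can list the contents of the environment attached to its terms. This is only a minimal sketch, assuming the extra size comes from that environment; `fit_inside`, `rows` and `unrelated` are hypothetical names used for illustration.

```r
# Sketch: inspect the environment attached to a model fitted inside a function.
# fit_inside, rows and unrelated are illustrative names, not part of the code above.
fit_inside <- function(rows) {
  unrelated <- seq_len(1000)  # stands in for data that should NOT be saved
  lm(Sepal.Length ~ Sepal.Width, data = iris, subset = rows)
}

fit <- fit_inside(1:4)

# The formula was created inside fit_inside(), so its environment is that
# function's frame, including every local variable defined there.
captured <- environment(terms(fit))
ls(captured)
```

If `unrelated` shows up in the listing, it is reachable from the model object and may be written out by `save()`.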
# Solution to above is to create a new environment
Function3 <- function (subset){
  data_not_to_be_saved <- 1:1e+15
  # New environment
  env <- new.env(parent = globalenv())
  env$subset <- subset
  with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}
model3 <- Function3(subset)
saveSize(model3)
# Size = 1002 ; Success: considerably smaller than with Function2.
# PROBLEM: Getting the solution from Function3 to work within another function.
# This function runs, but results in a large object again.
# Also note that I do not want to refer to the iris dataset directly within the lm call.
Function5 <- function (subset){
  data_not_to_be_saved <- 1:1e+15
  # Inner function (renamed so that it does not shadow the outer Function5)
  Function4 <- function (subset) {
    env <- new.env(parent = globalenv())
    env$subset <- subset
    env$datainenvorment <- iris
    with(env, lm(Sepal.Length ~ Sepal.Width, data = datainenvorment, subset = subset))
  }
  model5 <- Function4(subset)
}
model5 <- Function5(subset)
saveSize(model5)
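A quick way to check what the hand-built environment carries along is to list its objects and their sizes. This is a sketch under the assumption that the environment attached to the model's terms is what gets saved; `fit_in_env`, `rows` and `dat` are hypothetical names used for illustration.

```r
# Sketch: fit inside a hand-built environment and inspect what it holds.
# fit_in_env, rows and dat are illustrative names, not part of the code above.
fit_in_env <- function(rows) {
  env <- new.env(parent = globalenv())
  env$rows <- rows
  env$dat <- iris  # a full copy of the data now lives in env
  with(env, lm(Sepal.Length ~ Sepal.Width, data = dat, subset = rows))
}

fit <- fit_in_env(1:4)

# The formula is created while evaluating inside env, so env is attached to it.
captured <- environment(terms(fit))
ls(captured)

# Size in bytes of each object held by that environment.
sizes <- sapply(ls(captured), function(nm) object.size(get(nm, envir = captured)))
sizes
```

If the copy of the data dominates `sizes`, that copy, rather than the unrelated variables of the outer function, would be a likely reason for the larger file.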
Thanks in advance