I have time series data and I want to use each fund's returns as the dependent variable and some factors as independent variables. Since I have hundreds of funds, running each regression manually would take a lot of time. Is it possible to fix the independent variables, run the regression for each fund automatically, and save the output?

I created some hypothetical data. Does this work for you?

df <- data.frame( x1 = runif(30), x2 = runif(30), y = runif(30) ) # hypothetical data
results <- data.frame() # named 'results' so it does not mask summary()
n <- ncol(df) - 1
indepvars <- colnames(df[1:n])
indepvars
for(i in 1:n) {
  formula <- paste0("y ~ ", indepvars[i]) # one regression per independent variable
  model <- lm(formula, df)
  results[i,1] <- indepvars[i]
  results[i,2] <- model$coefficients[1]
  results[i,3] <- model$coefficients[2]
  s <- summary(model)
  results[i,4] <- s$adj.r.squared
}
colnames(results) <- c("Indep var", "Intercept", "Slope", "Adj R^2")
results

[1] "x1" "x2"
Indep var Intercept Slope Adj R^2
1 x1 0.8357128 -0.38962175 0.12795058
2 x2 0.6683086 -0.07937175 -0.02845706


Dear fcas80,

Many thanks for your answer.

With your code, I see that y is the dependent variable and x1, x2... are the independent variables.
However, I want to regress each variable x1, x2... on the independent variable y. Variable y in my data set is the market return, and I want to regress each fund individually on the market return to obtain the alphas (intercepts). Is there a solution for this?

Have a nice day.

Here is an example of the dataset where I have date, two mutual funds and the market excess return.

datapasta::df_paste(head(dataframe, 5)[, c('Date','Alfred Berg Global C (NOK)', 'APS Global Equity R', 'mkt-rf')])
check.names = FALSE,
Date = c("2016-01-01","2016-02-01","2016-03-01",
Alfred Berg Global C (NOK) = c(-0.0708041333333334,0.0115925,0.00619040000000006,
APS Global Equity R = c(-0.0681767333333333,0.0077110000000001,
mkt-rf = c(-0.0743514443678082,-0.00884050754326752,

I want to regress both of the funds individually on the market return.

I have written the following code for the linear model:

summary(lm(df.t18$`Alfred Berg Global C (NOK)` ~ df.t18$`mkt-rf`))

Is there a way I can do this regression on all the mutual funds at the same time? Now I manually change the name of the dependent variable for every regression, but this is very time consuming.

I appreciate all help I can get!

Is this what you need?

df <- data.frame( y1 = runif(30), y2 = runif(30), x = runif(30) ) # hypothetical data
results <- data.frame() # named 'results' so it does not mask summary()
n <- ncol(df) - 1
depvars <- colnames(df[1:n])

for(i in 1:n) {
  formula <- paste0(depvars[i], " ~ x") # one regression per dependent variable
  model <- lm(formula, df)
  results[i,1] <- depvars[i]
  results[i,2] <- model$coefficients[1]
  results[i,3] <- model$coefficients[2]
  s <- summary(model)
  results[i,4] <- s$adj.r.squared
}
colnames(results) <- c("Dep var", "Intercept", "Slope", "Adj R^2")
results
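
One caveat with building the formula from plain column names: names that contain spaces, parentheses or hyphens (like the fund names in the real data) are not valid R names and have to be wrapped in backticks. Below is a minimal base R sketch of the same loop that does this. The data, the fund names and the object names (`returns`, `fund_cols`, `results`) are made up for illustration.

```r
# Hypothetical data: two funds with non-syntactic names plus a market factor
set.seed(1)
returns <- data.frame(
  check.names = FALSE,
  `Fund A (NOK)` = rnorm(30, sd = 0.04),
  `Fund B` = rnorm(30, sd = 0.04),
  `mkt-rf` = rnorm(30, sd = 0.04)
)

# Every column except the market factor is treated as a fund
fund_cols <- setdiff(names(returns), "mkt-rf")

# Fit one regression per fund and collect one summary row per fund
results <- do.call(rbind, lapply(fund_cols, function(fund) {
  # Backticks keep names with spaces or punctuation valid inside the formula
  model <- lm(as.formula(sprintf("`%s` ~ `mkt-rf`", fund)), data = returns)
  data.frame(
    fund = fund,
    alpha = unname(coef(model)[1]),  # intercept
    beta = unname(coef(model)[2]),   # slope on the market factor
    adj_r2 = summary(model)$adj.r.squared
  )
}))

print(results)
```

With hundreds of funds, `results` ends up with one row per fund and can be saved with `write.csv()`.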

How about something like this?


sample_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,

sample_df %>% 
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>% 
    group_by(Fund) %>% 
    group_modify(~broom::glance(lm(Value ~ `mkt-rf`, data = .x)))
#> # A tibble: 2 × 13
#> # Groups:   Fund [2]
#>   Fund   r.squ…¹ adj.r…²  sigma stati…³ p.value    df logLik   AIC   BIC devia…⁴
#>   <chr>    <dbl>   <dbl>  <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 Alfre…   0.857   0.809 0.0193    17.9  0.0241     1   13.9 -21.8 -23.0 1.12e-3
#> 2 APS G…   0.904   0.872 0.0127    28.2  0.0130     1   16.0 -26.0 -27.2 4.85e-4
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> #   variable names ¹​r.squared, ²​adj.r.squared, ³​statistic, ⁴​deviance

Created on 2022-10-26 with reprex v2.0.2

Thank you very much guys.

With your code, @andresrcs, is it possible to see the intercept as well?



sample_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,

sample_df %>% 
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>% 
    group_by(Fund) %>% 
    group_modify(~broom::tidy(lm(Value ~ `mkt-rf`, data = .x)))
#> # A tibble: 4 × 6
#> # Groups:   Fund [2]
#>   Fund                       term         estimate std.error statistic p.value
#>   <chr>                      <chr>           <dbl>     <dbl>     <dbl>   <dbl>
#> 1 Alfred Berg Global C (NOK) (Intercept) -0.00229    0.00880   -0.260   0.812 
#> 2 Alfred Berg Global C (NOK) `mkt-rf`     0.953      0.225      4.23    0.0241
#> 3 APS Global Equity R        (Intercept) -0.000346   0.00580   -0.0596  0.956 
#> 4 APS Global Equity R        `mkt-rf`     0.789      0.148      5.31    0.0130

Created on 2022-10-27 with reprex v2.0.2
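
If only the alphas are of interest, the tidy output can be filtered down to the intercept rows so there is one row per fund. This is just a sketch on made-up data (`toy_df` and the fund names are hypothetical), assuming dplyr, tidyr and broom are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical monthly returns for two funds plus the market excess return
set.seed(42)
toy_df <- data.frame(
  check.names = FALSE,
  Date = seq(as.Date("2016-01-01"), by = "month", length.out = 24),
  `Fund A (NOK)` = rnorm(24, sd = 0.04),
  `Fund B` = rnorm(24, sd = 0.04),
  `mkt-rf` = rnorm(24, sd = 0.04)
)

alphas <- toy_df %>%
  pivot_longer(-c(`mkt-rf`, Date), names_to = "Fund", values_to = "Value") %>%
  group_by(Fund) %>%
  group_modify(~ broom::tidy(lm(Value ~ `mkt-rf`, data = .x))) %>%
  ungroup() %>%
  # Keep only the intercept row of each regression and rename it to alpha
  filter(term == "(Intercept)") %>%
  select(Fund, alpha = estimate, std.error, p.value)

print(alphas)
```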

Thank you very much @andresrcs

Your code did the job!

One last question:
Is there an easy way to add several explanatory variables to the regression? I want to check for other factors in the stock market, in addition to the market return (mkt-rf). In my dataset, I have added two more explanatory variables, called "SMB" and "HML". These two columns are to the right of the variable "mkt-rf", and I tried running the following code without luck:

dataframe %>%
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>%
    group_by(Fund) %>%
    group_modify(~broom::tidy(lm(Value ~ `mkt-rf` + SMB + HML, data = .x)))

Do you have a quick answer to this?

Thanks again.

Those variables need to be present in the data as columns, not as levels of a factor, so you need to exclude them from the pivoting as well.

dataframe %>%
pivot_longer(-c(`mkt-rf`, Date, SMB, HML), names_to = 'Fund', values_to = 'Value')
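
Put together with the grouping and modelling steps from before, the whole pipeline might look like the sketch below. The data here is simulated (`factor_df` and the fund names are hypothetical), and it assumes dplyr, tidyr and broom are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two funds plus three factor columns
set.seed(7)
n <- 36
factor_df <- data.frame(
  check.names = FALSE,
  Date = seq(as.Date("2016-01-01"), by = "month", length.out = n),
  `Fund A (NOK)` = rnorm(n, sd = 0.04),
  `Fund B` = rnorm(n, sd = 0.04),
  `mkt-rf` = rnorm(n, sd = 0.04),
  SMB = rnorm(n, sd = 0.02),
  HML = rnorm(n, sd = 0.02)
)

three_factor <- factor_df %>%
  # Keep Date and all three factors as columns; only the funds are pivoted
  pivot_longer(-c(`mkt-rf`, Date, SMB, HML), names_to = "Fund", values_to = "Value") %>%
  group_by(Fund) %>%
  group_modify(~ broom::tidy(lm(Value ~ `mkt-rf` + SMB + HML, data = .x))) %>%
  ungroup()

print(three_factor)
```

Each fund then gets four rows: the intercept (alpha) and one coefficient per factor.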
