Multiple individual regressions

Jorgenh · October 25, 2022, 12:17pm

Hello everyone, I am new to RStudio and I cannot find the right code to solve my issue.

I have time series data and I want to use funds' returns as the dependent variable and some factors as independent variables. Since I have hundreds of funds, performing each regression manually would take a lot of time. Is it possible to fix the independent variables, run the regression for each fund automatically, and save the output?

Thank you!

andresrcs · October 25, 2022, 12:27pm

Yes, it is possible, but to help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue

The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…
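As an illustration of the "minimal dataset" item in the guide excerpt above, one common base R way to share a small data frame is `dput()`, which prints an R expression that rebuilds the object. This is only a sketch: the object name `small` is chosen here for illustration, and the guide itself may recommend other tools.

```r
# Take a few rows of a built-in data frame as the sample data.
small <- head(iris, 5)

# dput() prints a structure(...) expression; pasting that expression into a
# post lets other people recreate exactly the same data frame.
dput(small)
```

Readers can assign the printed expression to a name of their choice and run the rest of the example against it.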

fcas80 · October 25, 2022, 9:44pm

I created some hypothetical data. Does this work for you?

df <- data.frame( x1 = runif(30), x2 = runif(30), y = runif(30) ) # hypothetical data
results <- data.frame() # named "results" so it does not mask base::summary()
n <- ncol(df) - 1 # every column except the last one (y) is an independent variable
indepvars <- colnames(df[1:n])
indepvars
for(i in 1:n) {
  formula <- paste0("y ~ ", indepvars[i]) # e.g. "y ~ x1"
  model <- lm(formula, df)
  results[i,1] <- colnames(df[i])
  results[i,2] <- model$coefficients[1] # intercept
  results[i,3] <- model$coefficients[2] # slope
  s <- summary(model)
  results[i,4] <- s$adj.r.squared
}
colnames(results) <- c("Indep var", "Intercept", "Slope", "Adj R^2")
results

[1] "x1" "x2"
  Indep var Intercept       Slope     Adj R^2
1        x1 0.8357128 -0.38962175  0.12795058
2        x2 0.6683086 -0.07937175 -0.02845706

Jorgenh · October 26, 2022, 8:16am

Dear fcas80,

Many thanks for your answer.

With your code, I see that y is the dependent variable and x1, x2... are the independent variables.
However, I want to regress each variable x1, x2... on the independent variable y. In my data set, y is the market return, and I want to regress each fund individually on the market return to obtain the alphas (intercepts). Is there a solution for this?

Have a nice day.

Jorgenh · October 26, 2022, 10:35am

Here is an example of the dataset where I have date, two mutual funds and the market excess return.

datapasta::df_paste(head(dataframe, 5)[, c('Date','Alfred Berg Global C (NOK)', 'APS Global Equity R', 'mkt-rf')])
data.frame(
  check.names = FALSE,
  Date = c("2016-01-01","2016-02-01","2016-03-01",
           "2016-04-01","2016-05-01"),
  `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
                                   -0.0366933,0.0415465333333334),
  `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
                            0.00870629999999997,-0.00214280000000002,
                            0.0217895333333334),
  `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,
               0.018302499067988,-0.0128197204370558,
               0.0391929254039709)
)

I want to regress both of the funds individually on the market return.

I have written the following code for the linear model:

summary(lm(df.t18$`Alfred Berg Global C (NOK)` ~ df.t18$`mkt-rf`))

Is there a way I can run this regression on all the mutual funds at the same time? Right now I change the name of the dependent variable manually for every regression, which is very time-consuming.

I appreciate all help I can get!

fcas80 · October 26, 2022, 2:51pm

Is this what you need?

df <- data.frame( y1 = runif(30), y2 = runif(30), x = runif(30) ) # hypothetical data
results <- data.frame() # named "results" so it does not mask base::summary()
n <- ncol(df) - 1 # every column except the last one (x) is a dependent variable
depvars <- colnames(df[1:n])

for(i in 1:n) {
  formula <- paste0(depvars[i], " ~ x") # e.g. "y1 ~ x"

  model <- lm(formula, df)
  results[i,1] <- colnames(df[i])
  results[i,2] <- model$coefficients[1] # intercept
  results[i,3] <- model$coefficients[2] # slope
  s <- summary(model)
  results[i,4] <- s$adj.r.squared
}
colnames(results) <- c("Dep var", "Intercept", "Slope", "Adj R^2")
results
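As a possible alternative to the loop, base R's `lm()` accepts a matrix as the response and then fits a separate least-squares regression for each column. The sketch below is not from the original reply: it reuses the same kind of hypothetical data, and the seed and the names `funds`, `fit`, `results`, `fund`, `intercept`, `slope` and `adj_r2` are chosen here for illustration.

```r
# Hypothetical data: two "fund" columns (y1, y2) and one regressor (x).
set.seed(42)  # arbitrary seed so the random example data are repeatable
df <- data.frame(y1 = runif(30), y2 = runif(30), x = runif(30))

# Treat every column except the regressor as a dependent variable.
funds <- setdiff(names(df), "x")

# With a matrix on the left-hand side, lm() fits one regression per column.
fit <- lm(as.matrix(df[funds]) ~ x, data = df)

# coef(fit) has one column per fund; its rows are "(Intercept)" and "x".
# summary(fit) returns one summary per fund, which gives the adjusted R^2.
results <- data.frame(
  fund      = funds,
  intercept = unname(coef(fit)["(Intercept)", ]),
  slope     = unname(coef(fit)["x", ]),
  adj_r2    = unname(sapply(summary(fit), function(s) s$adj.r.squared))
)
results
```

Each row should match what a separate `lm(y1 ~ x, df)` or `lm(y2 ~ x, df)` call reports, so the choice between this and the loop is mostly a matter of taste.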

andresrcs · October 26, 2022, 11:34pm

How about something like this?

library(tidyverse)

sample_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
             "2016-04-01","2016-05-01"),
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
                                   -0.0366933,0.0415465333333334),
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
                            0.00870629999999997,-0.00214280000000002,
                            0.0217895333333334),
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,
               0.018302499067988,-0.0128197204370558,
               0.0391929254039709)
)

sample_df %>% 
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>% 
    group_by(Fund) %>% 
    group_modify(~broom::glance(lm(Value ~ `mkt-rf`, data = .x)))
#> # A tibble: 2 × 13
#> # Groups:   Fund [2]
#>   Fund   r.squ…¹ adj.r…²  sigma stati…³ p.value    df logLik   AIC   BIC devia…⁴
#>   <chr>    <dbl>   <dbl>  <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 Alfre…   0.857   0.809 0.0193    17.9  0.0241     1   13.9 -21.8 -23.0 1.12e-3
#> 2 APS G…   0.904   0.872 0.0127    28.2  0.0130     1   16.0 -26.0 -27.2 4.85e-4
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> #   variable names ¹r.squared, ²adj.r.squared, ³statistic, ⁴deviance

Created on 2022-10-26 with reprex v2.0.2

Jorgenh · October 27, 2022, 6:21am

Thank you very much guys.

With your code, @andresrcs, is it possible to see the intercept as well?

andresrcs · October 27, 2022, 4:03pm

Yes, use broom::tidy() instead of broom::glance():

library(tidyverse)

sample_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
             "2016-04-01","2016-05-01"),
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
                                     -0.0366933,0.0415465333333334),
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
                              0.00870629999999997,-0.00214280000000002,
                              0.0217895333333334),
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,
                 0.018302499067988,-0.0128197204370558,
                 0.0391929254039709)
)

sample_df %>% 
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>% 
    group_by(Fund) %>% 
    group_modify(~broom::tidy(lm(Value ~ `mkt-rf`, data = .x)))
#> # A tibble: 4 × 6
#> # Groups:   Fund [2]
#>   Fund                       term         estimate std.error statistic p.value
#>   <chr>                      <chr>           <dbl>     <dbl>     <dbl>   <dbl>
#> 1 Alfred Berg Global C (NOK) (Intercept) -0.00229    0.00880   -0.260   0.812 
#> 2 Alfred Berg Global C (NOK) `mkt-rf`     0.953      0.225      4.23    0.0241
#> 3 APS Global Equity R        (Intercept) -0.000346   0.00580   -0.0596  0.956 
#> 4 APS Global Equity R        `mkt-rf`     0.789      0.148      5.31    0.0130

Created on 2022-10-27 with reprex v2.0.2
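Since the goal in this thread is the alpha of each fund, the tidy output can be narrowed to the intercept rows and saved. The following is a minimal sketch, not part of the original reply: it loads dplyr, tidyr and broom individually instead of the whole tidyverse, the column name `alpha` is a label chosen here, and the commented-out file name `alphas.csv` is hypothetical.

```r
library(dplyr)
library(tidyr)
library(broom)

# Sample data copied from the thread.
sample_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
             "2016-04-01","2016-05-01"),
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
                                     -0.0366933,0.0415465333333334),
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
                              0.00870629999999997,-0.00214280000000002,
                              0.0217895333333334),
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,
                 0.018302499067988,-0.0128197204370558,
                 0.0391929254039709)
)

alphas <- sample_df %>%
    pivot_longer(-c(`mkt-rf`, Date), names_to = "Fund", values_to = "Value") %>%
    group_by(Fund) %>%
    group_modify(~ tidy(lm(Value ~ `mkt-rf`, data = .x))) %>%
    ungroup() %>%
    filter(term == "(Intercept)") %>%            # keep only the intercept rows
    select(Fund, alpha = estimate, std.error, p.value)

alphas

# To save the table, something like this could be used (hypothetical file name):
# write.csv(alphas, "alphas.csv", row.names = FALSE)
```

The result has one row per fund, which scales to hundreds of funds without any change to the code.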

Jorgenh · October 28, 2022, 8:20am

Thank you very much @andresrcs

Your code did the job!

One last question:
Is there an easy way to add several explanatory variables to the regression? I want to check for other factors in the stock market, in addition to the market return (mkt-rf). In my dataset, I have added two more explanatory variables, called "SMB" and "HML". These two columns are to the right of the variable "mkt-rf", and I tried running the following code without luck:

dataframe %>%
    pivot_longer(-c(`mkt-rf`, Date), names_to = 'Fund', values_to = 'Value') %>%
    group_by(Fund) %>%
    group_modify(~broom::tidy(lm(Value ~ `mkt-rf` + SMB + HML, data = .x)))

Do you have a quick answer to this?

Thanks again.

andresrcs · October 28, 2022, 11:38am

Those variables need to be present in the data as columns, not as levels of a factor, so you need to exclude them from the pivoting as well.

dataframe %>%
    pivot_longer(-c(`mkt-rf`, Date, SMB, HML), names_to = 'Fund', values_to = 'Value') %>%
    group_by(Fund) %>%
    group_modify(~broom::tidy(lm(Value ~ `mkt-rf` + SMB + HML, data = .x)))
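To see the multi-factor version run end to end, here is a small self-contained sketch. The fund and `mkt-rf` values are the sample data from this thread, but the `SMB` and `HML` numbers are made up purely for illustration, and the names `factor_df` and `fits` are chosen here. With only five observations the estimates are not meaningful; the point is just the shape of the pipeline.

```r
library(dplyr)
library(tidyr)
library(broom)

factor_df <- data.frame(
    check.names = FALSE,
    Date = c("2016-01-01","2016-02-01","2016-03-01",
             "2016-04-01","2016-05-01"),
    `Alfred Berg Global C (NOK)` = c(-0.0708041333333334,0.0115925,0.00619040000000006,
                                     -0.0366933,0.0415465333333334),
    `APS Global Equity R` = c(-0.0681767333333333,0.0077110000000001,
                              0.00870629999999997,-0.00214280000000002,
                              0.0217895333333334),
    `mkt-rf` = c(-0.0743514443678082,-0.00884050754326752,
                 0.018302499067988,-0.0128197204370558,
                 0.0391929254039709),
    SMB = c(0.012, -0.004, 0.007, -0.010, 0.003),   # made-up values
    HML = c(-0.006, 0.009, 0.002, 0.005, -0.008)    # made-up values
)

fits <- factor_df %>%
    # Keep Date and all three factors as columns; only the funds are pivoted.
    pivot_longer(-c(`mkt-rf`, Date, SMB, HML),
                 names_to = "Fund", values_to = "Value") %>%
    group_by(Fund) %>%
    group_modify(~ tidy(lm(Value ~ `mkt-rf` + SMB + HML, data = .x))) %>%
    ungroup()

fits
```

Each fund gets four rows here: the intercept plus one coefficient per factor.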
