Automate Regression in R - Calculate FamaFrench 3 Factor alpha

Dear community,

for a university project I am analyzing a dataset of 50000 mutual funds over the period 2016-2020.
As a first step I want to calculate the Fama-French 3-factor alpha for all of the funds.
I can get the data I need using a regression for one fund, but I am struggling to scale this to such a large database.

Here you see a screenshot of my database. The dataset consists of ~1.9mn observations of performance data for 50000 funds. Each fund has an individual number (crsp_fundno), and I want to calculate the Fama-French 3-factor alpha for each of these funds.

I have already matched the Kenneth French factors to the data. Now I want to run the regressions using each fund's excess return (mexret) and the 3 Fama-French factors (Mkt-RF, SMB, HML).

So the regression should look something like this
lm(mretFFr$mexret ~ mretFFr$Mkt-RF + mretFFr$SMB + mretFFr$HML)

How can I perform this kind of regression for each of the fund numbers (crsp_fundno), so that the 60 values of each fund with complete data are the basis for an individual regression?

And then I want to save the outcome of the regression, namely the intercept value, in a column next to each specific fund.

So to summarize:

  1. only look at data with a specific fund number (crsp_fundno)
  2. perform the regression with the data for this fund
  3. save the intercept value in an extra column for all of the specific funds
  4. repeat these 3 steps for every fund number in the list

I am afraid this request is confusing; I did my best to make it understandable, as this is my first time posting here.

Thanks for any advice

Every R problem can be thought of with advantage as the interaction of three objects: an existing object, x, a desired object, y, and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

In this case, x is your database and y is your database augmented by an additional variable, an intercept value from a regression model. Both x and y are data frames: each contains observations of an object of interest, crsp_fundno, arranged row-wise and containing variables, some of which will be used as arguments to lm. lm will return an object of class lm, call it fit, containing the value of interest, the intercept, fit$coefficients[1].
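To make the fit object concrete, here is a minimal sketch on simulated data for a single hypothetical fund. The data frame one_fund and its numbers are made up for illustration; only the column names follow the question.

```r
# Simulated data for one hypothetical fund (not real returns)
set.seed(1)
one_fund <- data.frame(
  Mkt_RF = rnorm(60),
  SMB    = rnorm(60),
  HML    = rnorm(60)
)
# Build an excess return with a known intercept of 0.2
one_fund$mexret <- 0.2 + 1.1 * one_fund$Mkt_RF + rnorm(60, sd = 0.1)

# Fit the three-factor model on this one fund
fit <- lm(mexret ~ Mkt_RF + SMB + HML, data = one_fund)

# Pull out the intercept by name; same value as fit$coefficients[1]
alpha <- coef(fit)[["(Intercept)"]]
```

Extracting by name rather than by position is a small safeguard in case the formula is later changed.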

Using these pieces we can construct f.

The first thing to note is that functions are first-class objects, which means that they can be given as arguments to other functions. It is convenient to work inside outwards and to create an auxiliary function:

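get_intercept <- function(x) {
  # bare column names, so all variables come from this fund's rows only
  fit <- lm(mexret ~ Mkt_RF + SMB + HML,
            data = your_data[your_data$crsp_fundno == x, ])
  fit$coefficients[1]  # the intercept, i.e. the alpha
}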

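NB: inside a formula, the minus sign is an operator, so Mkt-RF is not read as a single column name; the column is therefore renamed Mkt_RF. (A name containing blanks or operators would otherwise have to be quoted in backticks.) Also, we would normally parameterize your_data and the other arguments, rather than hardwiring them.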
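One possible way to do the renaming is sketched below. The tiny mretFFr here is a hypothetical stand-in with made-up values, just to show the mechanics.

```r
# Hypothetical stand-in for the real data, keeping the original factor name
mretFFr <- data.frame(
  crsp_fundno = c(1, 1),
  mexret      = c(0.01, -0.02),
  `Mkt-RF`    = c(0.03, -0.01),
  SMB         = c(0.00, 0.01),
  HML         = c(0.02, 0.00),
  check.names = FALSE  # keep "Mkt-RF" exactly as written
)

# Option 1: rename the column to a syntactic name
names(mretFFr)[names(mretFFr) == "Mkt-RF"] <- "Mkt_RF"

# Option 2 (alternative): keep the original name and quote it with
# backticks in the formula, e.g. mexret ~ `Mkt-RF` + SMB + HML
```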

get_intercept takes an argument, x (the crsp_fundno of interest, distinct from the nomenclature for the formal object x) and returns the value of a linear regression's intercept coefficient, which is the desired portion of fit to add to each selected crsp_fundno.



Calling get_intercept will return the value of the intercept to be placed, the Fama-French 3-factor alpha, which I'll call ff3fa. It would be best for this new variable to be provisioned beforehand.

your_data[,"ff3fa"] <- NA

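Another helper function will make the placement. It uses <<- because a plain assignment inside a function would only change a local copy of your_data, and it selects the rows whose crsp_fundno equals x rather than row number x.

place_intercept <- function(x)
  your_data[your_data$crsp_fundno == x, "ff3fa"] <<- get_intercept(x)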

We now have a way to place a single crsp_fundno into y


An auxiliary object, fund_list can be used to identify the specific crsp_fundno to be so processed.

fund_list <- c(

From there

lapply(fund_list, place_intercept)

which leads to f and its application

add_intercepts <- function(x) lapply(x, place_intercept)
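With ~50000 funds, filtering the whole data frame once per fund may be slow. As an alternative sketch (not the approach above, and untested on the real data), the rows can be split by fund once and a regression fitted on each piece. The data below are simulated for three hypothetical funds.

```r
# Simulated stand-in for the real data: 3 hypothetical funds, 60 months each
set.seed(42)
n_months <- 60
your_data <- data.frame(
  crsp_fundno = rep(c(101, 102, 103), each = n_months),
  Mkt_RF = rnorm(3 * n_months),
  SMB    = rnorm(3 * n_months),
  HML    = rnorm(3 * n_months)
)
your_data$mexret <- 0.1 + 0.9 * your_data$Mkt_RF +
  rnorm(3 * n_months, sd = 0.5)

# Split the rows by fund once, then fit one regression per piece
by_fund <- split(your_data, your_data$crsp_fundno)
alphas <- vapply(
  by_fund,
  function(d) coef(lm(mexret ~ Mkt_RF + SMB + HML, data = d))[["(Intercept)"]],
  numeric(1)
)

# Copy each fund's alpha onto all of its rows (lookup by fund number as name)
your_data$ff3fa <- unname(alphas[as.character(your_data$crsp_fundno)])
```

Here alphas is a named vector with one intercept per fund, so it can also be kept as a separate per-fund table instead of being repeated on every row.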

See the FAQ: How to do a minimal reproducible example reprex for beginners to illuminate why the specific code may not be reliable in the absence of a representative data object on which to test. Also, I express no opinion as to the appropriateness of any intended application of the intercept in this case.


Dear technocrat,

thank you very much for your detailed answer, this is helping me a lot

If I run the code until
I get the following error:
Error in model.frame.default(formula = mretFFr$mexret ~ mretFFr$Mkt_ :
variable lengths differ (found for 'RF')

Do you know what I have to change, or is this not possible without the data?

And can you explain how the fund_list works?

<- c(

In my case there are ~50000 entries, so I can't type them all, right?

Thanks again!

Second question is easier

fund_list <- unique(your_data$crsp_fundno)
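If only funds with a full 60 months of data should enter, as the original question suggests, fund_list could be restricted along these lines. This is a sketch on made-up data; the threshold of 60 and the use of mexret to judge completeness are assumptions.

```r
# Hypothetical data: fund 1 has 60 months, fund 2 only 12
set.seed(7)
your_data <- data.frame(
  crsp_fundno = c(rep(1, 60), rep(2, 12)),
  mexret      = rnorm(72)
)

# Count the non-missing excess returns per fund
obs_per_fund <- tapply(!is.na(your_data$mexret), your_data$crsp_fundno, sum)

# Keep only the fund numbers with a full 60 observations
fund_list <- as.numeric(names(obs_per_fund)[obs_per_fund == 60])
```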

First one: did you run

your_data[,"ff3fa"] <- NA


Yes, I used this line of code before

Okay, I understand the second question, thank you

For the first one: yes, I used that line of code before.
It still gives me the mentioned error.
