for an university project i am analyzing a dataset of 50000 mutual funds within the period of 2016-2020.
As a first step i want to calculate the FamaFrench3-factor alpha for all of the funds.
I can get the data i need using a regression for one fund, but i am struggling to scale this for such a large database
Here you see a sceenshot of my database. The dataset consists of ~1.9mn observations and performance data for 50000 funds. Each of this fund has a individual number (crsp_fundno) and i want to calculate the Famafrench3-factor alpha for each of this funds.
I have already matched the KennethFrench Factors to the data. Now i want to perform the regressions unsing the funds excess return (mexret) and the 3 factors from FamaFrench(Mkt-RF, SMB, HML)
So the regression should look something like this
lm(mretFFr$mexret ~ mretFFr$Mkt-RF + mretFFr$SMB + mretFFr$HML)
How can i perform this kind of regression for each of the fund numbers(crsp_fundno)? So that there are 60 values for each fund with complete data as a basis for an individual regression
And then i want to save the outcome of the regression in a line next to each of the specific fund, namely the intercept value
So to summarize:
- only look at data with a specific fund number (crsp_fundno)
- perform the regression with the data for this fund
- save the intercept value in an extra column for all of the specific funds
- repeat these 3 steps for every fund number in the list
I am afraid this request is confusing, i did my best to make it understandable as this is my first time posting here
Thanks for any advice