# Automate Regression in R - Calculate the Fama-French 3-Factor Alpha

Dear community,

For a university project, I am analyzing a dataset of 50,000 mutual funds over the period 2016-2020.
As a first step, I want to calculate the Fama-French three-factor alpha for each fund.
I can get what I need by running a regression for one fund, but I am struggling to scale this to such a large database.

Here you see a screenshot of my database. The dataset consists of ~1.9 million observations with performance data for 50,000 funds. Each fund has an individual number (`crsp_fundno`), and I want to calculate the Fama-French three-factor alpha for each of these funds.

I have already matched the Kenneth French factors to the data. Now I want to run the regressions using each fund's excess return (`mexret`) and the three Fama-French factors (`Mkt-RF`, `SMB`, `HML`).

So the regression should look something like this:
`lm(mretFFr$mexret ~ mretFFr$Mkt-RF + mretFFr$SMB + mretFFr$HML)`
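In equation form, this is the standard Fama-French three-factor regression, estimated separately for each fund $i$ over months $t$. Here `mexret` corresponds to $r_{i,t} - r_{f,t}$, `Mkt-RF` to $r_{m,t} - r_{f,t}$, and the intercept $\alpha_i$ is the alpha to be saved:

$$
r_{i,t} - r_{f,t} = \alpha_i + \beta_i\,(r_{m,t} - r_{f,t}) + s_i\,\mathrm{SMB}_t + h_i\,\mathrm{HML}_t + \varepsilon_{i,t}
$$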

How can I run this regression for each fund number (`crsp_fundno`)? Each fund with complete data has 60 monthly values, which should form the basis of its individual regression.

Then I want to save the outcome of each regression, namely the intercept, in a column next to the corresponding fund.
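A minimal sketch of the "complete data" filter, assuming the regression variables are named `mexret`, `Mkt_RF`, `SMB` and `HML` and that a full 2016-2020 history means exactly 60 monthly rows per fund. The toy `panel` data frame and the fund numbers 101-103 are hypothetical, made up only so the snippet runs:

```r
# Hypothetical toy panel: funds 101 and 102 have 60 rows, fund 103 only 40
set.seed(1)
n <- 160
panel <- data.frame(
  crsp_fundno = rep(c(101, 102, 103), times = c(60, 60, 40)),
  mexret = rnorm(n),
  Mkt_RF = rnorm(n),
  SMB    = rnorm(n),
  HML    = rnorm(n)
)
panel$mexret[61] <- NA  # one missing return for fund 102

# Drop rows with a missing value in any regression variable
vars <- c("mexret", "Mkt_RF", "SMB", "HML")
complete <- panel[complete.cases(panel[, vars]), ]

# Count the remaining rows per fund and keep funds with all 60 months
n_obs <- table(complete$crsp_fundno)
keep <- as.numeric(names(n_obs)[n_obs == 60])
panel_60 <- complete[complete$crsp_fundno %in% keep, ]
```

Only fund 101 survives here: fund 102 loses one month to the missing return and fund 103 never had 60. If some funds could have more than 60 rows, `n_obs >= 60` would be the looser test.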

So to summarize:

1. Only look at the data for a specific fund number (`crsp_fundno`)
2. Run the regression on this fund's data
3. Save the intercept in an extra column for all rows of that fund
4. Repeat these three steps for every fund number in the list
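These four steps map onto R's split-apply-combine pattern. The sketch below is one way to do it in base R (`split`, `vapply`, `lm`, `coef`), not the only one. It runs on simulated data: `mret`, the fund numbers and the `true_alpha` values are hypothetical stand-ins for `mretFFr`, and `Mkt-RF` is assumed to have been renamed to `Mkt_RF`:

```r
# Simulated stand-in for mretFFr (hypothetical): 3 funds x 60 months
set.seed(42)
n_months <- 60
funds <- c(11, 22, 33)
n <- length(funds) * n_months
mret <- data.frame(
  crsp_fundno = rep(funds, each = n_months),
  Mkt_RF = rnorm(n, 0.01, 0.04),
  SMB    = rnorm(n, 0, 0.02),
  HML    = rnorm(n, 0, 0.02)
)
true_alpha <- c(0.002, -0.001, 0)  # made-up alphas used to simulate returns
mret$mexret <- rep(true_alpha, each = n_months) +
  1.0 * mret$Mkt_RF + 0.3 * mret$SMB - 0.2 * mret$HML +
  rnorm(n, 0, 0.005)

# Steps 1 and 4: split the rows into one data frame per crsp_fundno
by_fund <- split(mret, mret$crsp_fundno)

# Steps 2 and 3: regress each fund's excess return on the three factors
# and keep only the intercept (the alpha)
alphas <- vapply(by_fund, function(d) {
  coef(lm(mexret ~ Mkt_RF + SMB + HML, data = d))[["(Intercept)"]]
}, numeric(1))

# Step 3, continued: write each fund's alpha next to all of its rows
mret$ff3fa <- unname(alphas[as.character(mret$crsp_fundno)])
```

Each fund is fitted once and the alphas are written back in a single vectorized step, so no loop over the individual rows is needed. With 50,000 funds this still means 50,000 `lm` fits, so it may take some time.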

I am afraid this request is confusing; I did my best to make it understandable, as this is my first time posting here.

Every `R` problem can be thought of with advantage as the interaction of three objects: an existing object, x, a desired object, y, and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

In this case, x is your database and y is your database augmented by one additional variable: the intercept from a regression model. Both x and y are data frames. Each row is one observation (a fund-month) identified by `crsp_fundno`, and some of the columns will be used as arguments to `lm`. `lm` returns an object of class `lm`, call it `fit`, which contains the value of interest, the intercept, `fit$coefficients[1]`.
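For instance, on a toy regression with a known intercept of 2 (hypothetical data, unrelated to the fund database), the intercept can be pulled out by position, as above, or by name with `coef()`:

```r
# Toy data with a known intercept of 2 (hypothetical, for illustration only)
toy <- data.frame(x = 1:10)
toy$y <- 2 + 3 * toy$x
fit <- lm(y ~ x, data = toy)

fit$coefficients[1]          # by position: a named vector, name "(Intercept)"
coef(fit)[["(Intercept)"]]   # by name: a plain number without the name
```

Selecting by name does not depend on the intercept being the first coefficient.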

Using these pieces we can construct f.

The first thing to note is that functions are first-class objects, which means that they can be passed as arguments to other functions. It is convenient to work from the inside out and to start with an auxiliary function:

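```r
get_intercept <- function(x) {
  # fit only the rows of fund x (assumes Mkt-RF has been renamed to Mkt_RF)
  fit <- lm(mexret ~ Mkt_RF + SMB + HML,
            data = your_data[your_data$crsp_fundno == x, ])
  coef(fit)[["(Intercept)"]]
}
```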

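NB: syntactic variable names cannot contain blanks or operators. In a formula, `Mkt-RF` is read as `Mkt` minus `RF`, so the column has to be renamed, here to `Mkt_RF` (or quoted with backticks, `` `Mkt-RF` ``). Also, we would normally parameterize `your_data` and the other arguments, rather than hardwiring them.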
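The parsing problem can be seen directly with `all.vars()`, which lists the variables a formula refers to. The two-row `ff` data frame is hypothetical and only illustrates the renaming step:

```r
# Unquoted, Mkt-RF is parsed as two variables, Mkt and RF
all.vars(mexret ~ Mkt-RF + SMB + HML)

# Backticks keep it as a single (non-syntactic) name
all.vars(mexret ~ `Mkt-RF` + SMB + HML)

# Renaming the column to a syntactic name avoids the issue altogether
ff <- data.frame(`Mkt-RF` = c(0.01, -0.02), SMB = c(0.002, 0.001),
                 check.names = FALSE)
names(ff)[names(ff) == "Mkt-RF"] <- "Mkt_RF"
names(ff)
```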

`get_intercept` takes one argument, `x` (the `crsp_fundno` of interest, not to be confused with the formal object x above), and returns the intercept coefficient of a linear regression fitted to that fund's rows. This is the part of `fit` we want to add to each selected `crsp_fundno`.

Thus

```r
get_intercept(64487)
```

will return the intercept to be placed, the Fama-French three-factor alpha, which I'll store in a new variable, `ff3fa`. It is best to create this variable beforehand:

```r
your_data$ff3fa <- NA_real_
```

Another helper function makes the placement:

```r
place_intercept <- function(x)  # <<- writes to the global your_data, not a local copy
  your_data$ff3fa[your_data$crsp_fundno == x] <<- get_intercept(x)
```

We now have a way to place the result for a single `crsp_fundno` into y:

```r
place_intercept(64487)
```
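A caveat on the placement step: an ordinary assignment to `your_data` inside a function body creates a local copy and leaves the global `your_data` unchanged. One alternative without side effects, sketched here under the assumption that the alphas have already been estimated and stored in a vector named by `crsp_fundno` (the `alphas` values and the `panel` data frame are made up), is to write them back in one step with `match()`:

```r
# Hypothetical inputs: one alpha per fund, named by crsp_fundno
alphas <- c(`64487` = 0.0012, `97403` = -0.0004)
panel <- data.frame(crsp_fundno = c(64487, 64487, 97403, 97403, 11111))

# For every row, find the position of its fund in names(alphas);
# funds without an estimate (here 11111) get NA
idx <- match(as.character(panel$crsp_fundno), names(alphas))
panel$ff3fa <- unname(alphas[idx])
panel
```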

An auxiliary object, `fund_list`, can be used to identify the specific `crsp_fundno` values to be processed.

```r
fund_list <- c(
  97403,62638,98168,92509,93172,69885,87073,51929,
  81727,64998,68432,87733,78200,92599,59821,59391,
  51450,56856,94761,65606,60274,94622,50572,65734,
  91201,59542,72588,87752,97495,62544,90312,81084,
  83960,84608,70966,80280,74213,98558,66360,61703,
  96572,98795,71403,94230,90321,81786,85710,92169
)
```

From there

```r
invisible(lapply(fund_list, place_intercept))  # run for side effects only
```

which leads to f and its application

```r
add_intercepts <- function(x) invisible(lapply(x, place_intercept))
```

See the FAQ: How to do a minimal reproducible example (`reprex`) for beginners, which explains why the specific code may not be reliable without a representative data object to test it on. Also, I express no opinion on whether the intended use of the intercept is appropriate in this case.


Dear technocrat,

Thank you very much for your detailed answer; it is helping me a lot.

If I run the code up to
`get_intercept(64487)`
I get the following error:
`Error in model.frame.default(formula = mretFFr$mexret ~ mretFFr$Mkt_ :`
`variable lengths differ (found for 'RF')`

Do you know what I have to change, or is this impossible to tell without the data?

And can you explain how `fund_list` works?

```r
fund_list <- c(
  97403,62638,98168,92509,93172,69885,87073,51929,
  81727,64998,68432,87733,78200,92599,59821,59391,
  51450,56856,94761,65606,60274,94622,50572,65734,
  91201,59542,72588,87752,97495,62544,90312,81084,
  83960,84608,70966,80280,74213,98558,66360,61703,
  96572,98795,71403,94230,90321,81786,85710,92169
)
```

In my case there are ~50,000 entries, so I can't type them all in, right?

Thanks again!

The second question is easier:

```r
fund_list <- unique(your_data$crsp_fundno)  # one entry per fund, not per row
```

First one: did you

```r
your_data$ff3fa <- NA_real_
```

first?

Yes, I used this line of code before.

Okay, I understand the second question, thank you.

For the first one: yes, I ran that line of code before, and it still gives me the error mentioned above.
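Without the data this can only be a guess, but `variable lengths differ` typically means the formula mixes a full-length column, such as `mretFFr$mexret`, with variables looked up in a subset passed as `data`. The `'RF'` in the message also hints that `RF` is being looked up as a separate variable, so it is worth checking how the market factor is written in the formula. The toy data frame `d` below is hypothetical and reproduces the first situation:

```r
# Hypothetical toy data: 2 funds x 4 months
d <- data.frame(
  crsp_fundno = rep(c(1, 2), each = 4),
  mexret = c(0.1, 0.2, 0.15, 0.3, 0.05, 0.1, 0.0, 0.2),
  Mkt_RF = c(0.05, 0.1, 0.08, 0.2, 0.02, 0.06, -0.01, 0.1)
)

# d$mexret has 8 values, but Mkt_RF comes from the 4-row subset: error
res <- tryCatch(
  lm(d$mexret ~ Mkt_RF, data = d[d$crsp_fundno == 1, ]),
  error = function(e) conditionMessage(e)
)
res

# Taking every variable from the same subset works
fit <- lm(mexret ~ Mkt_RF, data = d[d$crsp_fundno == 1, ])
```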
