I can't answer with confidence simply by reading this explainer. A reprex (see the FAQ) would be helpful.
The problem that felm() addresses is that an lm() model of the form
lm(y ~ x1+x2+x3 + f1+f2+f3)
where f1, f2, f3 are arbitrary factors and x1, x2, x3 are other covariates, may perform satisfactorily when the number of factor levels is small but not when it is large, because of collinearities between the factors and the other covariates and because every additional level adds a dummy column to the design matrix. When a high-N model has a number of levels equal to the number of subjects (observations) in a large dataset, for example, neither lm() nor the sparse matrix approaches in {Matrix} are computationally feasible. By the same token, felm() is probably not called for when the factors in a dataset have a relatively small number of levels.
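To make the scaling concrete, here is a minimal base-R sketch that counts the dummy columns lm() would need. The data and the names f_small and f_large are hypothetical and chosen only for illustration; they are not part of the reprex further down.

```r
# Hypothetical data: one covariate and two factors of different sizes
set.seed(1)
n <- 2000
x <- rnorm(n)
f_small <- factor(rep_len(1:10, n))    # 10 levels
f_large <- factor(rep_len(1:1000, n))  # 1000 levels

# model.matrix() builds the dense design matrix that lm() works with
X_small <- model.matrix(~ x + f_small)
X_large <- model.matrix(~ x + f_large)

# Columns = intercept + x + (number of levels - 1) dummies
ncol(X_small)
ncol(X_large)
```

The column count grows one-for-one with the number of levels, so a factor with as many levels as observations gives a design matrix that is roughly n by n.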
The case of a single-factor model, likewise, does not appear to call for felm(), as the factor can be eliminated through the within-groups transformation. It is the case of two or more factors in the presence of non-factor covariates that felm() is intended to address. It does so by "projecting out" the factors listed after the | in the formula rather than coding them as dummy variables; the lfe documentation describes this as the method of alternating projections. As can be seen in the following reprex, the effect is to omit the coefficients for the factor (categorical) variables from the model output, leaving only the non-factor covariates. The projected model reports only as many coefficients as there are non-factor covariates (2 numerator degrees of freedom in its F-statistic, against 33 for the full model), while the estimates for x1 and x2 and the residual degrees of freedom (966) match those from lm().
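Before the reprex, the single-factor case mentioned above can be made concrete. This is a minimal base-R sketch of the within-groups transformation, assuming one grouping factor g and one covariate x (both hypothetical names, with simulated data): demeaning y and x within each group gives the same slope as fitting the dummies explicitly.

```r
# Hypothetical single-factor data
set.seed(1)
n <- 500
g <- factor(rep_len(1:25, n))                  # one grouping factor, 25 levels
x <- rnorm(n)
y <- 2 * x + rnorm(nlevels(g))[g] + rnorm(n)   # group effect plus noise

# Within-groups transformation: subtract each group's mean
y_w <- y - ave(y, g)
x_w <- x - ave(x, g)

# Slope from the demeaned data versus slope from the dummy-variable model
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]
b_dummy  <- coef(lm(y ~ x + g))[["x"]]

all.equal(b_within, b_dummy)
```

The point estimates agree, but the standard errors from the demeaned regression are not adjusted for the group means that were absorbed, so they should not be read off directly.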
library(lfe)
#> Loading required package: Matrix
## Simulate data
set.seed(42)
n <- 1e3
d <- data.frame(
# Covariates
x1 = rnorm(n),
x2 = rnorm(n),
# Individuals and firms
id = factor(sample(20, n, replace = TRUE)),
firm = factor(sample(13, n, replace = TRUE)),
# Noise
u = rnorm(n)
)
# Effects for individuals and firms
id.eff <- rnorm(nlevels(d$id))
firm.eff <- rnorm(nlevels(d$firm))
# Left hand side
d$y <- d$x1 + 0.5 * d$x2 + id.eff[d$id] + firm.eff[d$firm] + d$u
## Estimate the model and print the results
est <- felm(y ~ x1 + x2 | id + firm, data = d)
summary(est)
#>
#> Call:
#> felm(formula = y ~ x1 + x2 | id + firm, data = d)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.3751 -0.6768 0.0088 0.6883 2.7803
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> x1 1.04326 0.03228 32.32 <2e-16 ***
#> x2 0.49041 0.03254 15.07 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.005 on 966 degrees of freedom
#> Multiple R-squared(full model): 0.7539 Adjusted R-squared: 0.7455
#> Multiple R-squared(proj model): 0.5696 Adjusted R-squared: 0.5549
#> F-statistic(full model):89.69 on 33 and 966 DF, p-value: < 2.2e-16
#> F-statistic(proj model): 639.2 on 2 and 966 DF, p-value: < 2.2e-16
# Compare with lm
summary(lm(y ~ x1 + x2 + id + firm - 1, data = d))
#>
#> Call:
#> lm(formula = y ~ x1 + x2 + id + firm - 1, data = d)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.3751 -0.6768 0.0088 0.6883 2.7803
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> x1 1.04326 0.03228 32.319 < 2e-16 ***
#> x2 0.49041 0.03254 15.072 < 2e-16 ***
#> id1 3.74166 0.17650 21.199 < 2e-16 ***
#> id2 0.96200 0.17927 5.366 1.01e-07 ***
#> id3 1.02686 0.20249 5.071 4.74e-07 ***
#> id4 2.13960 0.17190 12.447 < 2e-16 ***
#> id5 1.12131 0.17503 6.406 2.32e-10 ***
#> id6 0.85863 0.18845 4.556 5.87e-06 ***
#> id7 0.85256 0.17839 4.779 2.03e-06 ***
#> id8 1.25744 0.18396 6.835 1.45e-11 ***
#> id9 -0.95332 0.19765 -4.823 1.64e-06 ***
#> id10 0.50332 0.18943 2.657 0.008014 **
#> id11 1.29660 0.18697 6.935 7.44e-12 ***
#> id12 2.00367 0.17489 11.457 < 2e-16 ***
#> id13 -0.02849 0.20090 -0.142 0.887257
#> id14 0.66788 0.18563 3.598 0.000337 ***
#> id15 -0.07461 0.17510 -0.426 0.670153
#> id16 1.51743 0.17799 8.525 < 2e-16 ***
#> id17 2.10649 0.18372 11.466 < 2e-16 ***
#> id18 1.18966 0.17464 6.812 1.69e-11 ***
#> id19 1.34483 0.18893 7.118 2.13e-12 ***
#> id20 -1.20084 0.18328 -6.552 9.21e-11 ***
#> firm2 -1.50725 0.17093 -8.818 < 2e-16 ***
#> firm3 -1.87472 0.17236 -10.877 < 2e-16 ***
#> firm4 -1.24848 0.16611 -7.516 1.29e-13 ***
#> firm5 -0.74181 0.15959 -4.648 3.81e-06 ***
#> firm6 0.11010 0.16544 0.665 0.505893
#> firm7 -1.01232 0.16797 -6.027 2.37e-09 ***
#> firm8 -2.48896 0.16741 -14.868 < 2e-16 ***
#> firm9 -1.52025 0.16137 -9.421 < 2e-16 ***
#> firm10 -1.31793 0.15813 -8.334 2.66e-16 ***
#> firm11 -1.14281 0.15977 -7.153 1.68e-12 ***
#> firm12 -0.60866 0.17645 -3.449 0.000586 ***
#> firm13 -1.28568 0.16513 -7.786 1.78e-14 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.005 on 966 degrees of freedom
#> Multiple R-squared: 0.7542, Adjusted R-squared: 0.7455
#> F-statistic: 87.17 on 34 and 966 DF, p-value: < 2.2e-16
Created on 2023-05-23 with reprex v2.0.2
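The agreement between the felm() and lm() estimates for x1 and x2 above is what the Frisch-Waugh-Lovell theorem predicts. Below is a rough base-R sketch of that idea, on simulated data shaped like the reprex but not identical to it. It residualizes with lm(), so it still builds the dummy columns; it illustrates only the algebra of projecting the factors out, not the algorithm lfe uses internally.

```r
# Simulated data similar in shape to the reprex (values will differ)
set.seed(42)
n <- 1000
d2 <- data.frame(
  x1   = rnorm(n),
  x2   = rnorm(n),
  id   = factor(sample(20, n, replace = TRUE)),
  firm = factor(sample(13, n, replace = TRUE))
)
d2$y <- d2$x1 + 0.5 * d2$x2 + rnorm(20)[d2$id] + rnorm(13)[d2$firm] + rnorm(n)

# Project both factors out of the response and of each covariate
# by keeping the residuals from a regression on the factors alone
r_y  <- resid(lm(y  ~ id + firm, data = d2))
r_x1 <- resid(lm(x1 ~ id + firm, data = d2))
r_x2 <- resid(lm(x2 ~ id + firm, data = d2))

# Regress residuals on residuals, and compare with the full dummy model
b_proj <- coef(lm(r_y ~ r_x1 + r_x2 - 1))
b_full <- coef(lm(y ~ x1 + x2 + id + firm, data = d2))[c("x1", "x2")]

all.equal(unname(b_proj), unname(b_full))
```

Only the coefficients on the non-factor covariates survive the projection, which is why the felm() summary lists x1 and x2 and nothing else.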