There don't seem to be many people participating in this community who work in econometrics, so you may get more specific advice asking elsewhere. For anyone working in general data science or statistics, my best advice is to flee from "panel data", the econometrician's term for what is otherwise known as a collection of (often short) time series.
My own take, after reading the release paper pdynmc: A Package for Estimating Linear Dynamic Panel Data Models Based on Nonlinear Moment Conditions, is one of skepticism, based on the authors' use of
data(EmplUK, package = "plm")
to validate their model.
This is an incomplete ("unbalanced") collection of time series of economic measures (output, employment, wages and capital) for 140 UK companies in 9 sectors over the years 1976–1984.
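One quick way to see the imbalance (this snippet is my own addition, assuming the plm package is installed) is to tabulate how many yearly observations each firm contributes:

```r
data(EmplUK, package = "plm")
# inner table(): observations per firm;
# outer table(): how many firms have each series length
table(table(EmplUK$firm))
```

Any firm with fewer years than the full 1976–1984 span is part of the "unbalanced" portion of the panel.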
Let's look at one of the cases that does have complete data.
library(fpp3)
#> ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
#> ✔ tibble 3.2.1 ✔ tsibble 1.1.3
#> ✔ dplyr 1.1.2 ✔ tsibbledata 0.4.1
#> ✔ tidyr 1.3.0 ✔ feasts 0.3.1
#> ✔ lubridate 1.9.2 ✔ fable 0.3.3
#> ✔ ggplot2 3.4.3 ✔ fabletools 0.3.3
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
data(EmplUK, package = "plm")
summary(EmplUK)
#> firm year sector emp
#> Min. : 1.0 Min. :1976 Min. :1.000 Min. : 0.104
#> 1st Qu.: 37.0 1st Qu.:1978 1st Qu.:3.000 1st Qu.: 1.181
#> Median : 74.0 Median :1980 Median :5.000 Median : 2.287
#> Mean : 73.2 Mean :1980 Mean :5.123 Mean : 7.892
#> 3rd Qu.:110.0 3rd Qu.:1981 3rd Qu.:8.000 3rd Qu.: 7.020
#> Max. :140.0 Max. :1984 Max. :9.000 Max. :108.562
#> wage capital output
#> Min. : 8.017 Min. : 0.0119 Min. : 86.9
#> 1st Qu.:20.636 1st Qu.: 0.2210 1st Qu.: 97.1
#> Median :24.006 Median : 0.5180 Median :100.6
#> Mean :23.919 Mean : 2.5074 Mean :103.8
#> 3rd Qu.:27.494 3rd Qu.: 1.5010 3rd Qu.:110.6
#> Max. :45.232 Max. :47.1079 Max. :128.4
# arbitrary example
d <- EmplUK[70:77,]
summary(d)
#> firm year sector emp wage
#> Min. :10.00 Min. :1976 Min. :3.0 Min. :1.158 Min. :20.23
#> 1st Qu.:11.00 1st Qu.:1978 1st Qu.:3.0 1st Qu.:1.242 1st Qu.:22.45
#> Median :11.00 Median :1980 Median :3.0 Median :1.342 Median :22.82
#> Mean :10.88 Mean :1979 Mean :3.5 Mean :1.532 Mean :23.32
#> 3rd Qu.:11.00 3rd Qu.:1981 3rd Qu.:3.0 3rd Qu.:1.355 3rd Qu.:24.52
#> Max. :11.00 Max. :1982 Max. :7.0 Max. :3.262 Max. :26.04
#> capital output
#> Min. :0.4099 Min. : 99.29
#> 1st Qu.:0.4725 1st Qu.: 99.67
#> Median :0.5354 Median :102.38
#> Mean :0.5826 Mean :103.90
#> 3rd Qu.:0.5779 3rd Qu.:107.84
#> Max. :1.0958 Max. :111.56
# the summary above shows firm running 10 to 11,
# so rows 70:77 actually mix two firms; pick a
# slice that lies entirely within a single firm
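# an added sanity check (my addition, not in the
# original reprex): count distinct sector values
# per firm to see whether any firm's sector varies
sector_counts <- tapply(EmplUK$sector, EmplUK$firm,
                        function(s) length(unique(s)))
table(sector_counts)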
d <- EmplUK[15:21,]
summary(d)
#> firm year sector emp wage
#> Min. :3 Min. :1977 Min. :7 Min. :16.85 Min. :20.69
#> 1st Qu.:3 1st Qu.:1978 1st Qu.:7 1st Qu.:18.64 1st Qu.:21.70
#> Median :3 Median :1980 Median :7 Median :19.44 Median :22.69
#> Mean :3 Mean :1980 Mean :7 Mean :19.04 Mean :23.63
#> 3rd Qu.:3 3rd Qu.:1982 3rd Qu.:7 3rd Qu.:19.73 3rd Qu.:24.86
#> Max. :3 Max. :1983 Max. :7 Max. :20.24 Max. :28.91
#> capital output
#> Min. :5.715 Min. : 95.71
#> 1st Qu.:6.434 1st Qu.: 97.99
#> Median :6.856 Median : 99.56
#> Mean :6.690 Mean : 98.78
#> 3rd Qu.:7.022 3rd Qu.: 99.82
#> Max. :7.343 Max. :100.55
# remove the firm and sector variables,
# which are constant for this firm
d <- d[,-c(1,3)]
# create a tsibble time series
ds <- as_tsibble(d, index = year)
autoplot(ds,.vars = emp) + theme_minimal()
autoplot(ds,.vars = wage) + theme_minimal()
autoplot(ds,.vars = capital) + theme_minimal()
autoplot(ds,.vars = output) + theme_minimal()
# correlations
GGally::ggpairs(d[,-1])
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
# fit a time series linear regression model
# using all three available regressors
fit <- ds |> model(TSLM(output ~ emp + wage + capital))
# results
report(fit)
#> Series: output
#> Model: TSLM
#>
#> Residuals:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 55.8913 39.5774 1.412 0.253
#> emp 1.4265 1.3052 1.093 0.354
#> wage 0.7101 0.5349 1.327 0.276
#> capital -0.1579 1.4220 -0.111 0.919
#>
#> Residual standard error: 1.888 on 3 degrees of freedom
#> Multiple R-squared: 0.3882, Adjusted R-squared: -0.2236
#> F-statistic: 0.6345 on 3 and 3 DF, p-value: 0.64117
# residuals appear normally distributed with
# no significant autocorrelation
fit |> gg_tsresiduals()
# appears to be homoskedastic
ds |>
left_join(residuals(fit), by = "year") |>
pivot_longer(emp:capital,
names_to = "regressor", values_to = "x") |>
ggplot(aes(x = x, y = .resid)) +
geom_point() +
facet_wrap(. ~ regressor, scales = "free_x") +
labs(y = "Residuals", x = "")
augment(fit) |>
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() + labs(x = "Fitted", y = "Residuals")
augment(fit) |>
ggplot(aes(x = year)) +
geom_line(aes(y = output, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y = NULL,
title = "TSLM Model of Output"
) +
scale_colour_manual(values=c(Data="black",Fitted="#D55E00")) +
guides(colour = guide_legend(title = NULL))
augment(fit) |>
ggplot(aes(x = output, y = .fitted)) +
geom_point() +
labs(
y = "Fitted (predicted values)",
x = "Data (actual values)",
title = "Output"
) +
geom_abline(intercept = 0, slope = 1)
Created on 2023-09-06 with reprex v2.0.2
That's not an encouraging start to pooling this firm together with the 139 others, many of which are in different sectors.
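For a sense of how heterogeneous that pooling would be, here is a quick tabulation (my addition, not part of the reprex above) of firms per sector; it takes each firm's first sector value, which assumes sector is constant within a firm:

```r
data(EmplUK, package = "plm")
# one sector value per firm (its first observation)
firm_sector <- tapply(EmplUK$sector, EmplUK$firm, function(s) s[1])
table(firm_sector)  # number of firms in each of the 9 sectors
```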
Without a reprex (see the FAQ: How to do a minimal reproducible example (reprex) for beginners) I can't help you interpret your results, but I can't say I'm surprised that they appear nonsensical.
Apologies to any econometricians who run across this. I'd rather be shown wrong than believe that panel data as practiced in your field is magical thinking. Convince me?