There don't seem to be many people participating in this community who work in econometrics, so you may get more specific advice asking elsewhere. For anyone working in general data science or statistics, my best advice is to flee from "panel data", the econometrician's term for what is otherwise known as a collection of (often short) time series.
My own take, after reading the release paper pdynmc: A Package for Estimating Linear Dynamic Panel Data Models Based on Nonlinear Moment Conditions, is one of skepticism, based on the authors' use of
data(EmplUK, package = "plm")
to validate their model.
This is an incomplete ("unbalanced") collection of time series of economic measures (output, employment, wages and capital) for 140 UK companies in 9 sectors over the years 1976–1984.
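One quick way to see the imbalance (this snippet is my own addition, assuming the plm package is installed) is to tabulate how many yearly observations each firm contributes:

```r
data(EmplUK, package = "plm")
# inner table(): observations per firm;
# outer table(): how many firms have each series length
table(table(EmplUK$firm))
```

Any firm with fewer years than the full 1976–1984 span is part of the "unbalanced" portion of the panel.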
Let's look at one of the cases that does have complete data.
library(fpp3)
#> ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
#> ✔ tibble 3.2.1 ✔ tsibble 1.1.3
#> ✔ dplyr 1.1.2 ✔ tsibbledata 0.4.1
#> ✔ tidyr 1.3.0 ✔ feasts 0.3.1
#> ✔ lubridate 1.9.2 ✔ fable 0.3.3
#> ✔ ggplot2 3.4.3 ✔ fabletools 0.3.3
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
data(EmplUK, package = "plm")
summary(EmplUK)
#> firm year sector emp
#> Min. : 1.0 Min. :1976 Min. :1.000 Min. : 0.104
#> 1st Qu.: 37.0 1st Qu.:1978 1st Qu.:3.000 1st Qu.: 1.181
#> Median : 74.0 Median :1980 Median :5.000 Median : 2.287
#> Mean : 73.2 Mean :1980 Mean :5.123 Mean : 7.892
#> 3rd Qu.:110.0 3rd Qu.:1981 3rd Qu.:8.000 3rd Qu.: 7.020
#> Max. :140.0 Max. :1984 Max. :9.000 Max. :108.562
#> wage capital output
#> Min. : 8.017 Min. : 0.0119 Min. : 86.9
#> 1st Qu.:20.636 1st Qu.: 0.2210 1st Qu.: 97.1
#> Median :24.006 Median : 0.5180 Median :100.6
#> Mean :23.919 Mean : 2.5074 Mean :103.8
#> 3rd Qu.:27.494 3rd Qu.: 1.5010 3rd Qu.:110.6
#> Max. :45.232 Max. :47.1079 Max. :128.4
# arbitrary example
d <- EmplUK[70:77,]
summary(d)
#> firm year sector emp wage
#> Min. :10.00 Min. :1976 Min. :3.0 Min. :1.158 Min. :20.23
#> 1st Qu.:11.00 1st Qu.:1978 1st Qu.:3.0 1st Qu.:1.242 1st Qu.:22.45
#> Median :11.00 Median :1980 Median :3.0 Median :1.342 Median :22.82
#> Mean :10.88 Mean :1979 Mean :3.5 Mean :1.532 Mean :23.32
#> 3rd Qu.:11.00 3rd Qu.:1981 3rd Qu.:3.0 3rd Qu.:1.355 3rd Qu.:24.52
#> Max. :11.00 Max. :1982 Max. :7.0 Max. :3.262 Max. :26.04
#> capital output
#> Min. :0.4099 Min. : 99.29
#> 1st Qu.:0.4725 1st Qu.: 99.67
#> Median :0.5354 Median :102.38
#> Mean :0.5826 Mean :103.90
#> 3rd Qu.:0.5779 3rd Qu.:107.84
#> Max. :1.0958 Max. :111.56
# the summary above shows firm running 10 to 11,
# so rows 70:77 actually mix two firms; pick a
# slice that lies entirely within a single firm
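# an added sanity check (my addition, not in the
# original reprex): count distinct sector values
# per firm to see whether any firm's sector varies
sector_counts <- tapply(EmplUK$sector, EmplUK$firm,
                        function(s) length(unique(s)))
table(sector_counts)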
d <- EmplUK[15:21,]
summary(d)
#> firm year sector emp wage
#> Min. :3 Min. :1977 Min. :7 Min. :16.85 Min. :20.69
#> 1st Qu.:3 1st Qu.:1978 1st Qu.:7 1st Qu.:18.64 1st Qu.:21.70
#> Median :3 Median :1980 Median :7 Median :19.44 Median :22.69
#> Mean :3 Mean :1980 Mean :7 Mean :19.04 Mean :23.63
#> 3rd Qu.:3 3rd Qu.:1982 3rd Qu.:7 3rd Qu.:19.73 3rd Qu.:24.86
#> Max. :3 Max. :1983 Max. :7 Max. :20.24 Max. :28.91
#> capital output
#> Min. :5.715 Min. : 95.71
#> 1st Qu.:6.434 1st Qu.: 97.99
#> Median :6.856 Median : 99.56
#> Mean :6.690 Mean : 98.78
#> 3rd Qu.:7.022 3rd Qu.: 99.82
#> Max. :7.343 Max. :100.55
# remove the firm and sector variables,
# which are constant for this firm
d <- d[,-c(1,3)]
# create a tsibble time series
ds <- as_tsibble(d, index = year)
autoplot(ds,.vars = emp) + theme_minimal()
autoplot(ds,.vars = wage) + theme_minimal()
autoplot(ds,.vars = capital) + theme_minimal()
autoplot(ds,.vars = output) + theme_minimal()
# correlations
GGally::ggpairs(d[,-1])
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
# fit a time series linear regression model
# using all three available regressors
fit <- ds |> model(TSLM(output ~ emp + wage + capital))
# results
report(fit)
#> Series: output
#> Model: TSLM
#>
#> Residuals:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 55.8913 39.5774 1.412 0.253
#> emp 1.4265 1.3052 1.093 0.354
#> wage 0.7101 0.5349 1.327 0.276
#> capital -0.1579 1.4220 -0.111 0.919
#>
#> Residual standard error: 1.888 on 3 degrees of freedom
#> Multiple R-squared: 0.3882, Adjusted R-squared: -0.2236
#> F-statistic: 0.6345 on 3 and 3 DF, p-value: 0.64117
# residuals appear normally distributed with
# no significant autocorrelation
fit |> gg_tsresiduals()
# appears to be homoskedastic
ds |>
left_join(residuals(fit), by = "year") |>
pivot_longer(emp:capital,
names_to = "regressor", values_to = "x") |>
ggplot(aes(x = x, y = .resid)) +
geom_point() +
facet_wrap(. ~ regressor, scales = "free_x") +
labs(y = "Residuals", x = "")
augment(fit) |>
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() + labs(x = "Fitted", y = "Residuals")
augment(fit) |>
ggplot(aes(x = year)) +
geom_line(aes(y = output, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y = NULL,
title = "TSLM Model of Output"
) +
scale_colour_manual(values=c(Data="black",Fitted="#D55E00")) +
guides(colour = guide_legend(title = NULL))
augment(fit) |>
ggplot(aes(x = output, y = .fitted)) +
geom_point() +
labs(
y = "Fitted (predicted values)",
x = "Data (actual values)",
title = "Output"
) +
geom_abline(intercept = 0, slope = 1)
Created on 2023-09-06 with reprex v2.0.2
That's not an encouraging start to pooling this firm together with the 139 others, many of which are in different sectors.
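For a sense of how heterogeneous that pooling would be, here is a quick tabulation (my addition, not part of the reprex above) of firms per sector; it takes each firm's first sector value, which assumes sector is constant within a firm:

```r
data(EmplUK, package = "plm")
# one sector value per firm (its first observation)
firm_sector <- tapply(EmplUK$sector, EmplUK$firm, function(s) s[1])
table(firm_sector)  # number of firms in each of the 9 sectors
```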
Without a reprex (see the FAQ: How to do a minimal reproducible example (reprex) for beginners) I can't help you interpret your results, but I can't say I'm surprised that they appear nonsensical.
Apologies to any econometricians who run across this. I'd rather be shown wrong than believe that panel data as practiced in your field is magical thinking. Convince me?