subset paneldata - error "empty model"

StativBus · June 10, 2023, 10:56pm

Hey everyone,

First of all, I want to say that I am very inexperienced in both econometric methods and coding. I am currently learning a lot, though, and would love to really understand what is going on here.

I think I have roughly figured out what the problem is, but I don't properly understand why it occurs.

I have a panel data set which contains information about death rates for specific occupations in specific districts in specific years, i.e., there are basically three dimensions. Now I am supposed to run fixed effects regressions with district-by-occupation and district-by-year fixed effects.
Although I could never create something like this on my own, I think that I do understand what this does: the district-by-occupation fixed effects control for unobserved time-invariant heterogeneity between the occupation groups within districts. That is, certain occupational groups may just be fundamentally different from others in relevant aspects independent of time. They might be more productive, have a higher wage level, etc.
The district-by-year fixed effects control for unobserved time-variant heterogeneity between districts, i.e., some shock might affect district A but not district B.
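
Written as an equation, the specification described above might look like the sketch below. The notation is assumed for illustration and not taken from the assignment: d indexes districts, o occupation groups and t years; alpha_{do} and gamma_{dt} are the district-by-occupation and district-by-year fixed effects, and t_0 is an omitted reference year.

```latex
y_{dot} = \sum_{t \neq t_0} \beta_t \left( \text{bluecollar}_o \times \mathbb{1}[\text{year} = t] \right)
          + \alpha_{do} + \gamma_{dt} + \varepsilon_{dot}
```

The main effects of bluecollar and of the year dummies do not appear separately because they are absorbed by alpha_{do} and gamma_{dt}, respectively.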

The estimation worked once I had figured out the code. I created a panel dataset with the appropriate indices and ran the fixed effects regression:

panel_data <- pdata.frame(df, index = c("id_year", "id_occ"))

model <- plm(deaths_tot_pc ~ factor(year) * bluecollar,
             data = panel_data,
             model = "within")

In the next step, I am supposed to check for heterogeneous effects by estimating the same effect for each group individually, and this is where I struggle.

If I run the same model on a subgroup such as occupational group 1, I get the error message "empty model". According to my research, this happens because there is no within variation left, so R drops all the variables and ends up with an empty model.
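
The "no within variation" point can be illustrated without plm. The sketch below uses made-up toy data and a hand-written demean() helper (both hypothetical, not part of the original data or of plm) to mimic a within transformation that groups by district-year. It assumes, as in the pdata.frame call above, that id_year is the grouping variable.

```r
# Toy data: 3 districts x 2 years x 2 occupations (all values are made up)
toy <- expand.grid(district = 1:3, year = 1:2, occ = 1:2)
set.seed(1)
toy$y <- runif(nrow(toy))
toy$id_year <- interaction(toy$district, toy$year, drop = TRUE)

# Within transformation: subtract the group mean from each value
demean <- function(x, g) x - ave(x, g)

# Full data: two occupations per district-year cell, so variation survives
full_within <- demean(toy$y, toy$id_year)

# Subset: a single occupation leaves one row per district-year cell,
# so each value equals its own group mean and is demeaned to 0
sub <- toy[toy$occ == 1, ]
sub_within <- demean(sub$y, sub$id_year)

table(table(sub$id_year))  # every district-year cell has exactly 1 row
any(full_within != 0)      # TRUE
all(sub_within == 0)       # TRUE
```

The same happens to every regressor in the subset, not only to the outcome, which would be consistent with all variables being dropped.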

How would I have to change the following code to get around this problem? I would love it if someone could also explain the statistical reasoning behind this. I cannot figure it out on my own right now...

subset_data <- subset(panel_data, occ == 1)

reg <- plm(formula = deaths_tot_pc ~ bluecollar*factor(year),
           data = subset_data,
           model = "within")

I will try to create a dataframe tomorrow such that you can replicate the problem.

Thank you in advance!

All the best!

StativBus · June 11, 2023, 7:39am

The following code should create a dataset similar to the original one and then run the two regressions described in the initial post:

library(plm)

# Generate the sequence of years, districts, and occupations
years <- 1800:1820
districts <- 900:920
occupations <- 1:14

# Create an empty data frame to store the panel dataset
df <- data.frame()

# Create the panel dataset
for (year in years) {
  for (district in districts) {
    for (occ in occupations) {
      id_year <- as.numeric(paste(district, year, sep = ""))
      id_occ <- as.numeric(paste(sprintf("%03d", district), sprintf("%02d", occ), sep = ""))      
      observation <- data.frame(
        year = year,
        district = district,
        occ = occ,
        deaths_tot_pc = runif(1, min = 0, max = 1),
        id_year = id_year,
        id_occ = id_occ
      )
      df <- rbind(df, observation)
    }
  }
}

panel_data <- pdata.frame(df, index = c("id_year", "id_occ"))

panel_data$bluecollar <- ifelse(panel_data$occ <= 13, 1, 0)

# Case 1: regression on the full dataset: works

reg <- plm(formula = deaths_tot_pc ~ bluecollar*factor(year), 
           data = panel_data, 
           model = "within")

summary(reg)

# Case 2: regression on a subset: throws the error "empty model"

# This can be implemented differently but always throws the same error.
# Creating a subset beforehand does not change anything; the issue lies in my
# understanding of the fixed effects, given that the subsetting basically removes
# a dimension of the panel data.

reg <- plm(formula = deaths_tot_pc ~ bluecollar*factor(year), 
           data = panel_data, 
           subset = occ == 1,
           model = "within")

The first regression runs without problems and delivers the right results.
The second is run on a subset of the dataset and then throws the error that the model is empty.

Thank you in advance!

technocrat · June 11, 2023, 7:39pm

subset needs a plain vector argument, which occ is not (it is a pseries):

 str(panel_data$occ)
 'pseries' Named int [1:6174] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "names")= chr [1:6174] "9001800-90001" "9001800-90002" "9001800-90003" "9001800-90004" ...
 - attr(*, "index")=Classes ‘pindex’ and 'data.frame':	6174 obs. of  2 variables:
  ..$ id_year: Factor w/ 441 levels "9001800","9001801",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ id_occ : Factor w/ 294 levels "90001","90002",..: 1 2 3 4 5 6 7 8 9 10 ...

StativBus · June 12, 2023, 6:41pm

Hey technocrat,

Could you elaborate on what that means for me?
Since the data I provided was just an example, it is crucial for me to understand what you mean in order to implement it in my data.

Thank you!

technocrat · June 13, 2023, 7:38am

The error means that you have to pick either the id_year attribute or the id_occ attribute of the pseries, not the pseries itself.
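
A hedged sketch of how this could be checked in practice: subset the plain data frame with an ordinary logical vector before calling pdata.frame(), then look at the panel dimensions with pdim(). The small data set is made up, and sub_df, sub_panel and dims are hypothetical names. Note that in plm the first entry of index is the individual dimension and the second the time dimension, so index = c("id_year", "id_occ") makes id_year the grouping of a default within model.

```r
library(plm)

# Small made-up panel with the same layout as the example data
df <- expand.grid(occ = 1:3, district = 900:902, year = 1800:1802)
df$id_year <- paste0(df$district, df$year)
df$id_occ  <- paste0(df$district, sprintf("%02d", df$occ))
set.seed(42)
df$deaths_tot_pc <- runif(nrow(df))

# Subset the plain data frame with an ordinary logical vector ...
sub_df <- df[df$occ == 1, ]

# ... and only then declare the panel structure
sub_panel <- pdata.frame(sub_df, index = c("id_year", "id_occ"))

# pdim() reports the number of individuals (n), time periods (T) and rows (N)
dims <- pdim(sub_panel)
print(dims)
```

If n equals N in that output, every id_year is observed exactly once in the subset, which leaves a within model nothing to estimate regardless of how the subset was created.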
