Hello! I ran into a weird problem with the plm() function. Here is the code:
library(data.table)
library(tidyverse)
library(plm)
#Data Generation
n <- 500
set.seed(75080)
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 50
y <- -100*z+ 1100 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 80
y <- -80*z+ 1200 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 30
y <- -120*z+ 1000 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))
dtable <- merge(dt1 ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
# Model
dtable_p <- pdata.frame(dtable, index = "group")
mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")
Error in `[.data.frame`(x, , which) : undefined columns selected
Usually there is no need to convert a data set to a data.frame before passing it to plm(), but I don't know why it fails for this data set only. I tested other data sets, and all of them work except this manually generated one. Thank you!
Sorry, I am not familiar with the reprex package, but I will learn it.
I guess you want to see another data set that has the same structure as my manually generated one but works with plm(). The code is below:
data("Grunfeld", package = "plm")
class(Grunfeld)
#convert to panel data frame
pgrun <- pdata.frame(Grunfeld, index = 'firm')
class(pgrun)
# randomly run a pooled OLS
tst_mod <- plm(capital ~ value, data = pgrun, model = "pooling" )
summary(tst_mod)
It works. I still cannot figure out what is wrong with the manually generated data set.
Great! We've shot down my theory about plm not liking class pdata.frame, proving you're right about this being weird.
Even weirder is that coercing dtable_p into a simple data frame somehow works, at least to the extent of being able to find the variables.
> df_ver <- as.data.frame(dtable_p)
> head(df_ver)
  id  sat income group time
1  1 1100  48.22     1    1
2  2 1130  47.74     1    2
3  3  990  49.78     1    3
4  4 1330  42.27     1    4
5  5 1300  35.66     1    5
6  6 1170  47.57     1    6
> tst_mod <- plm(sat ~ income, data = df_ver, model = "pooling" )
Warning in model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
Warning in Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Warning in Ops.factor(r, 2) : ‘^’ not meaningful for factors
> tst_mod
Model Formula: sat ~ income
Coefficients:
(Intercept)      income
     29.127       0.279
According to the documentation, it should be possible either to start with a data frame such as Grunfeld, which will be silently converted to a pdata.frame, or to use a pdata.frame directly.
Why it isn't, I frankly can't tell, so I'd suggest contacting the maintainer, Yves Croissant <yves.croissant at univ-reunion.fr>.
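One difference between the two cases that may be worth checking first: Grunfeld is a plain data.frame, while dtable is built with data.table() and so also carries the data.table class. The sketch below is only a guess at a workaround, not a confirmed diagnosis. It assumes the data.table class is what trips up pdata.frame(), and strips it with as.data.frame() before building the panel. The small dtable here is a hypothetical stand-in for the generated data, and df_plain, df_p and mod_plain are names made up for the illustration.

```r
library(data.table)
library(plm)

# Hypothetical stand-in for the generated data: three groups of 50 rows each
set.seed(75080)
n <- 50
dtable <- data.table(
  id     = 1:(3 * n),
  sat    = 10 * round(rnorm(3 * n, mean = 1100, sd = 100) / 10),
  income = rnorm(3 * n, mean = 50, sd = 5),
  group  = rep(1:3, each = n)
)

# Compare the classes of the two inputs
data("Grunfeld", package = "plm")
class(Grunfeld)  # a plain data.frame
class(dtable)    # data.table on top of data.frame

# Assumption (not confirmed): dropping the data.table class first avoids the error
df_plain <- as.data.frame(dtable)
df_p <- pdata.frame(df_plain, index = "group")
mod_plain <- plm(sat ~ income, data = df_p, model = "pooling")
summary(mod_plain)
```

If the same steps run cleanly on the real dtable, that would point at the data.table input rather than at the pdata.frame class itself, which would be a useful detail to include in a report to the maintainer.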