A weird problem with the plm() function

Hello! I ran into a weird problem with the plm() function. The code is below:

library(data.table)
library(tidyverse)
library(plm)


#Data Generation
n <- 500
set.seed(75080)

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 50
y   <- -100*z+ 1100 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 80
y   <- -80*z+ 1200 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 30
y   <- -120*z+ 1000 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))

dtable <- merge(dt1    ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)


# Model 
dtable_p <- pdata.frame(dtable, index = "group")

mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")

Error in `[.data.frame`(x, , which) : undefined columns selected

Usually there is no need to convert a data set into a data.frame before calling plm(), and I don't know why it fails for this data set only. I tested other data sets, and all of them work except this manually generated one. Thank you!

I can't speak to your other data sets, but

str(dtable_p)

reveals quite a different structure from a plain data frame. Can you post a reprex with another pdata.frame argument?

Hello,

Sorry, I am not familiar with the reprex package, but I will learn... :rofl:

I guess you want to see another data set that has the same structure as my manually generated one but works with plm(). Code below:

data("Grunfeld", package = "plm") 
class(Grunfeld)

#convert to panel data frame 
pgrun <- pdata.frame(Grunfeld, index = 'firm')
class(pgrun) 

# randomly run a pooled OLS 
tst_mod <- plm(capital ~ value, data = pgrun, model = "pooling" )
summary(tst_mod) 

It works. I still cannot figure out what is wrong with the manually generated data set.

Great! We've shot down my theory about plm not liking class pdata.frame, proving you're right about this being weird.

Even weirder is that coercing dtable_p into a simple data frame somehow works, at least to the extent of being able to find the variables.

> df_ver <- as.data.frame(dtable_p)
> head(df_ver)
  id  sat income group time
1  1 1100  48.22     1    1
2  2 1130  47.74     1    2
3  3  990  49.78     1    3
4  4 1330  42.27     1    4
5  5 1300  35.66     1    5
6  6 1170  47.57     1    6
> tst_mod <- plm(sat ~ income, data = df_ver, model = "pooling" )
Warning in model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
Warning in Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Warning in Ops.factor(r, 2) : ‘^’ not meaningful for factors
> tst_mod

Model Formula: sat ~ income

Coefficients:
(Intercept)      income 
     29.127       0.279 
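
The warnings in the transcript above suggest that sat ended up as a factor in df_ver, in which case the fitted coefficients are probably not meaningful. A minimal sketch of how to check for that and convert back, using a small made-up data frame (df_demo is hypothetical and only stands in for df_ver):

```r
# Hypothetical stand-in for df_ver: 'sat' stored as a factor, as the warnings suggest.
df_demo <- data.frame(
  sat    = factor(c("1100", "1130", "990")),
  income = c(48.22, 47.74, 49.78)
)

# Report the first class of every column to spot unexpected factors.
col_classes <- vapply(df_demo, function(col) class(col)[1], character(1))
print(col_classes)

# Go through character first: as.numeric() on a factor alone
# returns the internal level codes, not the original values.
df_demo$sat <- as.numeric(as.character(df_demo$sat))
str(df_demo)
```

If the same check on df_ver reports sat as a factor, the conversion line can be applied to it unchanged before refitting.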

According to the documentation, it should be possible either to start with a data frame such as Grunfeld, which will be silently converted to a pdata.frame, or to use a pdata.frame directly.

Why that doesn't hold here, I frankly can't tell, and I'd suggest an appeal to the maintainer, Yves Croissant <yves.croissant at univ-reunion.fr>.

Thank you! Yes, I got the same result.

If you then run summary() on that model, it fails:

summary(tst_mod)  

Error in cor(y, haty) : 'x' must be numeric :dizzy_face:

Anyway, I will ask the author of the package if no solution turns up from another source.
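
One difference between the two cases is that dtable is a data.table, while Grunfeld is a plain data.frame. A possible thing to try, assuming the data.table class is what trips up pdata.frame (an unconfirmed assumption, not something established in this thread), is to build the same data with base R only. The sketch below uses a hypothetical helper, make_group(), and rbind() in place of the two merge() calls; the plm calls only run if the package is installed:

```r
set.seed(75080)

# make_group() is a hypothetical helper that mirrors the simulation
# steps from the original post for a single group.
make_group <- function(ids, group, income_mean, slope, intercept) {
  n <- length(ids)
  z <- rnorm(n)
  w <- rnorm(n)
  sat <- 10 * round((slope * z + intercept + 50 * w) / 10)  # round to nearest 10
  sat <- pmin(pmax(sat, 200), 1600)                          # clamp to [200, 1600]
  data.frame(id = ids, sat = sat, income = 5 * z + income_mean, group = group)
}

# Stack the three groups into one plain data.frame (no data.table involved).
dframe <- rbind(
  make_group(1:500,     1, 50, -100, 1100),
  make_group(501:1000,  2, 80,  -80, 1200),
  make_group(1001:1500, 3, 30, -120, 1000)
)

# Sanity checks before handing the data to plm.
stopifnot(identical(class(dframe), "data.frame"), is.numeric(dframe$sat))

if (requireNamespace("plm", quietly = TRUE)) {
  # Same index and model as in the original post.
  dframe_p <- plm::pdata.frame(dframe, index = "group")
  mod_1 <- plm::plm(sat ~ income, data = dframe_p, model = "pooling")
  print(summary(mod_1))
}
```

If this runs cleanly, calling as.data.frame(dtable) before pdata.frame() would be the smaller change to the original script; if it fails in the same way, the data.table class is not the cause.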
