# A weird problem with the plm() function

Hello! I ran into a weird problem with the `plm()` function. Below is the code:

```r
library(data.table)
library(tidyverse)
library(plm)

#Data Generation
n <- 500
set.seed(75080)

# Group 1
z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 50
y   <- -100*z+ 1100 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))

# Group 2
z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 80
y   <- -80*z+ 1200 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))

# Group 3
z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 30
y   <- -120*z+ 1000 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))

# Stack the three groups (ids do not overlap, so the full merge effectively appends the rows)
dtable <- merge(dt1    ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)

# Model
dtable_p <- pdata.frame(dtable, index = "group")

mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")
```

The last line fails with ``Error in `[.data.frame`(x, , which) : undefined columns selected``.

Usually there is no need to convert the data set into a `data.frame` for `plm()`, and I don't know why it fails only for this data set. I tested other data sets and they all work, except this manually generated one. Thank you!
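For context, `undefined columns selected` is the message base R's `[.data.frame` method raises whenever it is asked for a column name the data frame does not contain. The toy sketch below (made-up data, not the question's) reproduces the message. It suggests that `plm()` internally requested a column that the object it received does not have under the expected name, although the message alone does not say which column that was.

```r
# Toy data frame with made-up values; it has no column called "time"
df <- data.frame(sat = c(1100, 1130), income = c(48.22, 47.74))

# Asking [.data.frame for a missing column raises the same error as in the question
msg <- tryCatch(df[, c("sat", "time")],
                error = function(e) conditionMessage(e))
print(msg)  # "undefined columns selected"
```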

I can't speak to your other data sets, but

```r
str(dtable_p)
```

reveals quite a different structure from a plain data frame. Can you post a `reprex` (reproducible example) that passes another `pdata.frame` to `plm()`?
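To compare objects without reading through the full `str()` output, one option is to list the object's class together with the class(es) of each column. The helper `column_classes()` below is hypothetical (not part of plm or base R). It is shown on a small made-up data frame; the same call could be applied to `dtable_p` and to a `pdata.frame` that works, to see where the two differ.

```r
# Hypothetical helper: object class plus all classes of every column,
# joined with "/" so that wrapper classes do not hide the underlying type
column_classes <- function(x) {
  list(
    object  = class(x),
    columns = vapply(x, function(col) paste(class(col), collapse = "/"),
                     character(1))
  )
}

# Made-up example data with one numeric and one factor column
toy <- data.frame(sat = c(1100, 1130, 990), group = factor(c(1, 1, 2)))
res <- column_classes(toy)
res
```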

Hello,

Sorry, I am not familiar with the reprex package yet, but I will learn it. I guess you want to see another data set that has the same structure as my manually generated one but works with `plm()`. Here is the code:

```r
library(plm)

data("Grunfeld", package = "plm")
class(Grunfeld)

# convert to panel data frame
pgrun <- pdata.frame(Grunfeld, index = 'firm')
class(pgrun)

# run an arbitrary pooled OLS as a test
tst_mod <- plm(capital ~ value, data = pgrun, model = "pooling")
summary(tst_mod)
```

It works. I still cannot figure out what is wrong with the manually generated data set.
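One visible difference between the two examples is the class of the input passed to `pdata.frame()`: `Grunfeld` is loaded from the plm package, whereas `dtable` is built with `data.table()` and `merge()`, so it carries the `data.table` class in addition to `data.frame`. A quick check, using a small made-up `data.table` (`dt_small`, hypothetical) in place of `dtable`. Whether this difference explains the error is an assumption, not something established in this thread.

```r
library(data.table)
data("Grunfeld", package = "plm")

# Small made-up stand-in for dtable
dt_small <- data.table(id = 1:3, sat = c(1100, 1130, 990), group = 1)

class(Grunfeld)                    # class of the input that works
class(dt_small)                    # "data.table" "data.frame"
inherits(Grunfeld, "data.table")   # does the working input carry the data.table class?
inherits(dt_small, "data.table")   # TRUE
```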


Great! We've shot down my theory that `plm` doesn't like objects of class `pdata.frame`, which proves you're right about this being weird.

Even weirder is that coercing `dtable_p` into a plain data frame somehow works, at least to the extent that `plm` can find the variables:

```
> df_ver <- as.data.frame(dtable_p)
> head(df_ver)
  id  sat income group time
1  1 1100  48.22     1    1
2  2 1130  47.74     1    2
3  3  990  49.78     1    3
4  4 1330  42.27     1    4
5  5 1300  35.66     1    5
6  6 1170  47.57     1    6
> tst_mod <- plm(sat ~ income, data = df_ver, model = "pooling")
Warning in model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
Warning in Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Warning in Ops.factor(r, 2) : ‘^’ not meaningful for factors
> tst_mod

Model Formula: sat ~ income

Coefficients:
(Intercept)      income
     29.127       0.279
```
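The warnings above indicate that `sat` is being treated as a factor in `df_ver`. Converting a factor to numbers yields its level codes (1, 2, 3, ...) rather than the original values, which would be consistent with coefficients on a much smaller scale than SAT scores. A base-R illustration with made-up values:

```r
sat   <- c(1100, 1130, 990, 1330)
sat_f <- factor(sat)               # levels are sorted: 990, 1100, 1130, 1330

as.numeric(sat_f)                  # level codes: 2 3 1 4
as.numeric(as.character(sat_f))    # original values: 1100 1130 990 1330
```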

According to the documentation, it should be possible either to start with a plain data frame such as `Grunfeld`, which is silently converted to a `pdata.frame`, or to pass a `pdata.frame` directly.
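Since the documented route starts from a plain data frame, one thing worth trying is to build the same simulated data as a base `data.frame` and pass that instead. This rests on an assumption not established in this thread: that the `data.table` class of `dtable` is what trips up `pdata.frame()`/`plm()`. The helper `make_group()` is hypothetical; it bundles the repeated steps of the original three blocks, with `pmin()`/`pmax()` doing the same clamping as the two `ifelse()` calls.

```r
library(plm)

# Hypothetical helper: one group of the simulation from the question.
# mean_x, slope and intercept are the constants that differ between the three blocks.
make_group <- function(ids, mean_x, slope, intercept, group) {
  n <- length(ids)
  z <- rnorm(n)
  w <- rnorm(n)
  x <- 5 * z + mean_x
  y <- slope * z + intercept + 50 * w
  y <- 10 * round(y / 10)         # round to the nearest 10
  y <- pmin(pmax(y, 200), 1600)   # clamp to [200, 1600]
  data.frame(id = ids, sat = y, income = x, group = group)
}

set.seed(75080)
# Generate the groups one after another, so the random draws happen
# in the same order as in the question's code
g1 <- make_group(1:500,     mean_x = 50, slope = -100, intercept = 1100, group = 1)
g2 <- make_group(501:1000,  mean_x = 80, slope =  -80, intercept = 1200, group = 2)
g3 <- make_group(1001:1500, mean_x = 30, slope = -120, intercept = 1000, group = 3)

# Plain data.frame: rbind() stacks the rows, no data.table involved
dtable_df <- rbind(g1, g2, g3)

dtable_p2 <- pdata.frame(dtable_df, index = "group")
mod_2 <- plm(sat ~ income, data = dtable_p2, model = "pooling")
summary(mod_2)
```

To keep the original data.table code instead, the analogous change would be converting at the end with `as.data.frame(dtable)` (or data.table's `setDF(dtable)`) before calling `pdata.frame()`.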

Frankly, I can't tell why it doesn't work here, and I'd suggest contacting the maintainer, Yves Croissant <yves.croissant at univ-reunion.fr>.


Thank you! Yes, I got the same result.

If you run:

```r
summary(tst_mod)
```

you get ``Error in cor(y, haty) : 'x' must be numeric``. Anyway, I will try asking the author of this package if no solution turns up from other sources.
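`cor()` refuses non-numeric input, so this error fits the same picture: `sat` in `df_ver` is a factor. If you stay with the `as.data.frame()` route, factor columns whose labels are all numbers can be turned back into their values before fitting, though this treats the symptom rather than the cause. The helper `unfactor_numeric()` is hypothetical, and the data below is made up:

```r
# Hypothetical helper: convert factor columns whose labels are all numbers back to numeric
unfactor_numeric <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      vals <- suppressWarnings(as.numeric(as.character(col)))
      # Only convert when every non-missing label parsed as a number
      if (!anyNA(vals[!is.na(col)])) df[[nm]] <- vals
    }
  }
  df
}

df <- data.frame(sat    = factor(c(1100, 1130, 990)),
                 income = c(48.22, 47.74, 49.78),
                 group  = factor(c("a", "a", "b")))
fixed <- unfactor_numeric(df)
sapply(fixed, class)   # sat and income numeric, group still a factor
```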

