Creating a variable in a pseries object (package plm)

Hello, I'm having trouble with the plm package. I am trying to create predictors for further analysis of political election data. My code works well when I estimate my exogenous variables, but when I try to check them, subset on them, or even get basic summary information, the class of the object seems to mess things up. Here is a reproducible example with data from the plm package:

library(plm)
#> Warning: package 'plm' was built under R version 4.0.5
data("Cigar")
Cigar<-pdata.frame(Cigar)
Cigar$salesbypop<-NA
for (i in 1:nrow(Cigar)) {
  if (Cigar$pop[i]>3500) {
    Cigar$salesbypop[i]<-Cigar$sales[i]/Cigar$pop[i]
  }
}
sum(Cigar$salesbypop>0.03)
#> [1] NA
Cigar[Cigar$salesbypop>0.03,]
#>        state year price    pop  pop16   cpi       ndi sales pimin salesbypop
#> NA      <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA
#> NA.1    <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA
#> NA.2    <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA
#> NA.3    <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA
#> NA.4    <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA
#> 1-74       1   74  43.1 3574.0 2573.9  49.3  3718.867 108.2  41.4 0.03027420
#> 1-75       1   75  46.6 3614.0 2623.7  53.8  4087.993 111.7  43.0 0.03090758
#> 1-76       1   76  50.4 3657.0 2677.4  56.9  4486.772 116.2  46.4 0.03177468
#> 1-77       1   77  50.1 3690.0 2719.6  60.6  4899.866 117.1  48.8 0.03173442
#> 1-78       1   78  55.1 3728.0 2764.6  65.2  5450.998 123.0  53.6 0.03299356
#> 1-79       1   79  56.8 3769.0 2810.7  72.6  5957.141 121.4  56.5 0.03221014
#> 1-80       1   80  60.6 3894.0 2898.9  82.4  6466.350 123.2  59.3 0.03163842
#> 1-81       1   81  68.8 3917.0 2924.7  90.9  7042.023 119.6  62.6 0.03053357
#> 1-82       1   82  73.1 3943.0 2953.5  96.5  7505.220 119.1  67.8 0.03020543
#> NA.5    <NA> <NA>    NA     NA     NA    NA        NA    NA    NA         NA



class(Cigar$salesbypop)
#> [1] "pseries" "logical"

Cigar$test<-as.numeric(Cigar$salesbypop)
sum(Cigar$test>0.03)
#> [1] NA

summary(Cigar$test)
#> total sum of squares: 0.06052067 
#>         id       time 
#> 0.91745578 0.02537065

summary(as.numeric(Cigar$salesbypop))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>  0.0022  0.0116  0.0215  0.0203  0.0273  0.0609     748

So I loaded the data and proceeded with a silly example that computes a potential explanatory variable in a way similar to my original code. Then, when I try to count how many cases have a salesbypop greater than 0.03, the function returns NA when in fact there should be quite a few. The next line is something I do quite often to print the rows I'm interested in, but here it returns a number of NA rows too. Below that I printed the class of the column, which is logical. So I thought this was the problem and tried as.numeric, but without success. I think it might be because of the way I initialize the salesbypop column. Is there a better way to do it?

sum has an optional argument, na.rm, that lets you remove NA values if that is appropriate.
Including NA by default is sensible: what is the sum of 1, 2 and a mystery number? The sum is unknown...
If we want to ignore unknown quantities, we can do it like so:

sum(c(1, 2, NA))                # NA: one value is unknown, so the sum is too

sum(c(1, 2, NA), na.rm = TRUE)  # 3: the NA is dropped before summing
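
Applied to your case, the same idea might look like the sketch below. It uses a small made-up data frame in place of Cigar (the name df and its values are assumptions for illustration only) and relies only on base R, so the pseries class is not involved. which() keeps only the positions where the condition is TRUE, so the NA entries no longer produce all-NA rows when subsetting.

```r
# Toy data standing in for the Cigar columns (values are made up)
df <- data.frame(pop = c(3000, 3600, 4000), sales = c(100, 120, 90))

# Fill salesbypop only where pop > 3500; the other rows stay NA
df$salesbypop <- NA_real_
big <- df$pop > 3500
df$salesbypop[big] <- df$sales[big] / df$pop[big]

# Comparing with NA gives NA, so a plain sum() is NA as well
plain_sum <- sum(df$salesbypop > 0.03)

# Count only the known values that exceed the threshold
n_above <- sum(df$salesbypop > 0.03, na.rm = TRUE)

# which() drops NA from the logical index, so no NA rows come back
rows_above <- df[which(df$salesbypop > 0.03), ]
```

Whether dropping the NA rows is the right choice depends on what the missing values mean in your real data.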

You can also replace the for loop in your code with a vectorized assignment:

indx <- Cigar$pop > 3500
Cigar$salesbypop[indx] <- Cigar$sales[indx] / Cigar$pop[indx]
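
Another option, if you prefer to build the column in one step, is ifelse(). This is only a minimal sketch on made-up data (df and its values are hypothetical, not taken from Cigar), and I have not checked how it interacts with a pdata.frame. Using NA_real_ rather than a bare NA means the column is numeric from the start instead of logical.

```r
# Toy data standing in for the Cigar columns (values are made up)
df <- data.frame(pop = c(3000, 3600, 4000), sales = c(100, 120, 90))

# Ratio where pop > 3500, a numeric NA everywhere else
df$salesbypop <- ifelse(df$pop > 3500, df$sales / df$pop, NA_real_)

# The column is numeric even though it contains NA
col_class <- class(df$salesbypop)
```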

Thank you, I didn't know about this way of writing it.
