Running multiple pglm models using purrr::map()

I'm in the process of fitting multiple conditional (fixed-effects) Poisson models using the pglm package in R but I've not been able to combine it with purrr:map().

Running standard Poisson models using glm() in combination with purrr::map() works fine:

library(tidyverse)
library(haven)

ships <- read_dta("http://www.stata-press.com/data/r13/ships.dta") %>%
  mutate(lnservice=log(service)) %>%
  filter(yr_con<3)

fitStandardPoisson <- function(x) {
  glm(accident ~ op_75_79+co_65_69+co_70_74+co_75_79+lnservice,
      family = "poisson", data = x) 
}

resPoisson <- ships %>%
  nest(-yr_op) %>%
  mutate(model = map(data, fitStandardPoisson)) 

When I attempt to fit equivalent fixed-effect models:

library(pglm)

fitFEPoisson <- function(x) {
  pglm(accident ~ op_75_79+co_65_69+co_70_74+co_75_79+lnservice,
       family = poisson, data = x, effect = "individual", model="within", index = "ship")
}

resFEPoisson <- ships %>%
  nest(-yr_op) %>%
  mutate(model = map(data, fitFEPoisson)) 

I get the following error message:

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'x' not found.

The traceback wasn't particularly informative to me:

> traceback()
12: stop(list(message = "Evaluation error: object 'x' not found.", 
        call = mutate_impl(.data, dots), cppstack = NULL))
11: mutate_impl(.data, dots)
10: mutate.tbl_df(., model = map(data, fitFEPoisson))
9: mutate(., model = map(data, fitFEPoisson))
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: ships %>% nest(-yr_op) %>% mutate(model = map(data, fitFEPoisson))
> 
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pglm_0.2-1       plm_1.7-0        Formula_1.2-3    maxLik_1.3-4     miscTools_0.6-22 bindrcpp_0.2.2   haven_2.0.0      forcats_0.3.0    stringr_1.3.1    dplyr_0.7.8      readr_1.3.1      tidyr_0.8.2      tibble_2.0.1     ggplot2_3.1.0   
[15] tidyverse_1.2.1  purrr_0.3.2     

loaded via a namespace (and not attached):
 [1] statmod_1.4.30   zoo_1.8-4        tidyselect_0.2.5 lattice_0.20-38  colorspace_1.4-0 generics_0.0.2   yaml_2.2.0       rlang_0.3.1      pillar_1.3.1     glue_1.3.0       withr_2.1.2      modelr_0.1.3     readxl_1.3.0     bindr_0.1.1     
[15] plyr_1.8.4       munsell_0.5.0    gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      bdsmatrix_1.3-3  lmtest_0.9-36    curl_3.3         broom_0.5.1      Rcpp_1.0.0       scales_1.0.0     backports_1.1.3  jsonlite_1.6     hms_0.4.2       
[29] stringi_1.2.4    grid_3.5.2       cli_1.0.1        tools_3.5.2      sandwich_2.5-0   magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.2  MASS_7.3-51.1    xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.0 httr_1.4.0      
[43] rstudioapi_0.9.0 R6_2.3.0         nlme_3.1-137     compiler_3.5.2  

Any solutions or suggestions are highly appreciated.

I don't know how helpful this is, but maybe it can send you in a different direction... I don't think purrr is the issue here at all.

When I try to run your function on its own, I get the same error.

library(dplyr, warn.conflicts = FALSE)
library(haven)
library(pglm)

ships <- read_dta("http://www.stata-press.com/data/r13/ships.dta") %>%
  mutate(lnservice=log(service)) %>%
  filter(yr_con<3)

# Runs fine outside of function definition

pglm(accident ~ op_75_79+co_65_69+co_70_74+co_75_79+lnservice,
     family = poisson, 
     data = ships %>% filter(yr_op == 1), 
     effect = "individual", model="within", index = "ship")
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 7 iterations
#> Return code 1: gradient close to zero
#> Log-Likelihood: -3.460666 (5 free parameter(s))
#> Estimate(s): 0 1.390816 0 0 2.264595

# Function definition runs into issues!

fitFEPoisson <- function(x) {
  pglm(accident ~ op_75_79+co_65_69+co_70_74+co_75_79+lnservice,
       family = poisson, 
       data = x, effect = "individual", model="within", index = "ship")
}

fitFEPoisson(x = ships %>% filter(yr_op == 1))
#> Error in pglm(formula = accident ~ op_75_79 + co_65_69 + co_70_74 + co_75_79 + : object 'x' not found

So something in actually creating a function with pglm is going wrong.

2 Likes

Thank you, @sharlagelfand. I figured that it was something along those lines as the glm() variant worked out. I have contacted the maintainer of the package but it looks like that I will have to use for loops in the meantime.

I've tried debugging the pglm function itself and am totally out of my depth on it -- something about pglm:::starting.values() within it is causing the function to restart and when it does, the scoping is such that it can't find the object x. i hope you hear back!

1 Like

That makes a lot of sense, actually. Thank you for taking the time and looking further into this.

1 Like

This led me to reading more of Advanced R and I think I've found a solution...

An option is, within your function, to set x to the global environment using super assignment.

library(dplyr, warn.conflicts = FALSE)
library(haven)
library(pglm)
library(purrr)
library(tidyr)

ships <- read_dta("http://www.stata-press.com/data/r13/ships.dta") %>%
  mutate(lnservice=log(service)) %>%
  filter(yr_con<3)

fitFEPoisson <- function(x) {
  x <<- x
  pglm(accident ~ op_75_79+co_65_69+co_70_74+co_75_79+lnservice,
       family = poisson, data = x, effect = "individual", model="within", index = "ship")
}

resFEPoisson <- ships %>%
  nest(-yr_op) %>%
  mutate(model = map(data, fitFEPoisson)) 

resFEPoisson
#> # A tibble: 2 x 3
#>   yr_op data              model       
#>   <dbl> <list>            <list>      
#> 1     1 <tibble [10 × 9]> <S3: maxLik>
#> 2     2 <tibble [10 × 9]> <S3: maxLik>

Because x is created in the global environment each time the function is run, when starting.values() is run and pglm is started over, it's still able to find x!

8 Likes

You're an absolute star! It worked like a charm. Many thanks for this. I will make sure to read up on the more advanced features.

my pleasure! figured if i sent you here to post your question, i should at least try to help you out :slight_smile:

3 Likes

Seems like this was answered but you may want to wrap the pglm function in map :: partial().

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.