Strange error from group_map

I'm using group_map with geepack::geeglm. I would like my formula argument to be a variable (for reasons not included here) but I find that when I try to do so the code throws an error -- unexpectedly to me. I believe there is an issue with group_map here due to my Example 3 below in my example below. Any thoughts would be appreciated. Thank you!

library(tidyverse)
library(geepack)

# Create fake data
set.seed(1)
foo = 
  tibble(group = rep(1:2, each = 50), 
         cluster_id = sample(3, size = 100, replace = T), 
         outcome = rbinom(100, 1, 0.5), 
         treatment = rbinom(100, 1, 0.5), 
         weights = runif(100)) %>%
  arrange(group, cluster_id)

# Example 1: when I explicitly write the formula, everything works
foo  %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = outcome ~ treatment,
             data = .x,
             id = cluster_id,
             family = "binomial",
             weights = .x[["weights"]]))

# Example 2: but the code throws an error when I try to input a pre-created formula
geeglm_formula = as.formula("outcome ~ treatment")
foo  %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = geeglm_formula,
             data = .x,
             id = cluster_id,
             family = "binomial",
             weights = .x[["weights"]]))
# Error in eval(extras, data, env) : object '.x' not found

# Example 3: but when I drop the weights argument, it works, even
# with the pre-calculated formula
foo %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = geeglm_formula,
             data = .x,
             id = cluster_id,
             family = "binomial"))

Hi @PBoonstra,

Thanks for the full example. Would you mind just running it through as a reprex? I'm pretty sure it should work as-is, just asking on the off chance that the problem somehow becomes self-evident through error messages, etc.

I don't think you'll need this, but there's a nice FAQ on how to do a minimal reprex for beginners, below:

Hi @mara, sorry about that oversight. Below is the output as a reprex.

library(tidyverse)
library(geepack)

# Create fake data
set.seed(1)
foo = 
  tibble(group = rep(1:2, each = 50), 
         cluster_id = sample(3, size = 100, replace = T), 
         outcome = rbinom(100, 1, 0.5), 
         treatment = rbinom(100, 1, 0.5), 
         weights = runif(100)) %>%
  arrange(group, cluster_id)

# Example 1: when I explicitly write the formula, everything works
foo  %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = outcome ~ treatment,
             data = .x,
             id = cluster_id,
             family = "binomial",
             weights = .x[["weights"]]))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!

#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> [[1]]
#> 
#> Call:
#> geeglm(formula = outcome ~ treatment, family = "binomial", data = .x, 
#>     weights = .x[["weights"]], id = cluster_id)
#> 
#> Coefficients:
#> (Intercept)   treatment 
#>   0.1586733   0.7470728 
#> 
#> Degrees of Freedom: 50 Total (i.e. Null);  48 Residual
#> 
#> Scale Link:                   identity
#> Estimated Scale Parameters:  [1] 1
#> 
#> Correlation:  Structure = independence  
#> Number of clusters:   3   Maximum cluster size: 19 
#> 
#> 
#> [[2]]
#> 
#> Call:
#> geeglm(formula = outcome ~ treatment, family = "binomial", data = .x, 
#>     weights = .x[["weights"]], id = cluster_id)
#> 
#> Coefficients:
#> (Intercept)   treatment 
#>   0.5988678  -0.3821662 
#> 
#> Degrees of Freedom: 50 Total (i.e. Null);  48 Residual
#> 
#> Scale Link:                   identity
#> Estimated Scale Parameters:  [1] 1
#> 
#> Correlation:  Structure = independence  
#> Number of clusters:   3   Maximum cluster size: 18

# Example 2: but the code throws an error when I try to input a pre-created formula
geeglm_formula = as.formula("outcome ~ treatment")
foo  %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = geeglm_formula,
             data = .x,
             id = cluster_id,
             family = "binomial",
             weights = .x[["weights"]]))
#> Error in eval(extras, data, env): object '.x' not found

# Example 3: but when I drop the weights argument, it works, even
# with the pre-calculated formula
foo %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = geeglm_formula,
             data = .x,
             id = cluster_id,
             family = "binomial"))
#> [[1]]
#> 
#> Call:
#> geeglm(formula = geeglm_formula, family = "binomial", data = .x, 
#>     id = cluster_id)
#> 
#> Coefficients:
#> (Intercept)   treatment 
#>  -0.1823216   1.0577903 
#> 
#> Degrees of Freedom: 50 Total (i.e. Null);  48 Residual
#> 
#> Scale Link:                   identity
#> Estimated Scale Parameters:  [1] 1
#> 
#> Correlation:  Structure = independence  
#> Number of clusters:   3   Maximum cluster size: 19 
#> 
#> 
#> [[2]]
#> 
#> Call:
#> geeglm(formula = geeglm_formula, family = "binomial", data = .x, 
#>     id = cluster_id)
#> 
#> Coefficients:
#>  (Intercept)    treatment 
#>  0.325422400 -0.006968669 
#> 
#> Degrees of Freedom: 50 Total (i.e. Null);  48 Residual
#> 
#> Scale Link:                   identity
#> Estimated Scale Parameters:  [1] 1
#> 
#> Correlation:  Structure = independence  
#> Number of clusters:   3   Maximum cluster size: 18

Created on 2021-05-10 by the reprex package (v2.0.0)

1 Like

No problem. I've asked a colleague for an assist—I'm guessing it has to do with data masking. Diff TZ, but should be able to help once they get a chance to take a look.

1 Like

This really isn't a problem with group_map, this really has more to do with how geeglm treats the weights= parameter. The geeglm function actually uses non-standard evaluation for weights=. This means that it will try to resolve the variable on it's own using the data= environment and the environment from the formula (formulas retain a reference to the environment where they were created). When the formula is created in the mapping function, that environment can see the .x variable, but when the formula is created outside the map, the global environment does not know what .x is.

The easiest workaround is to actually use just the column name and let the geeglm function evaluate it on it's own using the data= parameter. You can do

geeglm_formula = as.formula("outcome ~ treatment")
foo  %>%
  group_by(group) %>%
  group_map(
    ~ geeglm(formula = geeglm_formula,
             data = .x,
             id = cluster_id,
             family = "binomial",
             weights = weights))

This will work because foo contains a column named weights.

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.