emmeans() and dplyr

I'm trying to reproduce a graph from Claus Wilke's data visualization guide (https://clauswilke.com/dataviz/visualizing-uncertainty.html#fig:cocoa-data-vs-CI), and the code he wrote is choking for me.

I apologize in advance if this is long, but I'm trying to explain what I think is going on. (Also, you can find the cacao data at https://github.com/clauswilke/dviz.supp/blob/master/data/cacao.rda.)

Full code chunk:

library(dplyr)
library(forcats)
library(lubridate)
library(mgcv)
library(tidyr)
library(purrr)
library(broom)
library(emmeans)
library(ungeviz)
library(ggridges)
library(tidybayes)

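# load() restores the object `cacao` into the workspace; assigning its return
# value would overwrite the data with the character vector "cacao".
# Assumption: appending ?raw=true makes GitHub serve the file, not the HTML page.
load(url("https://github.com/clauswilke/dviz.supp/blob/master/data/cacao.rda?raw=true"))
cacao_single <- cacao %>%
  filter(location == "Canada")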

fit <- lm(rating ~ 1, data = cacao_single)
CI_df <- data.frame(type = c(0.8, 0.95, 0.99)) %>%
  mutate(df = map(type, ~ tidy(emmeans(fit, ~ 1, options = list(level = .x))))) %>%
  unnest(df) %>%
  select(type, estimate, std.error, conf.low, conf.high) %>%
  mutate(type = paste0(signif(100 * type, 2), "% confidence interval"))

R returns:

Error: Can't subset columns that don't exist.
x Column `conf.low` doesn't exist.

I've isolated some of this line-by-line, and the "problem" seems to be with the interaction of emmeans and mutate. When I use the code above, the returned object lacks the CIs; when I surround emmeans with a summary(), I get the CIs, but then I lack the SEs.

Without this object, the problem can't be reproduced. Also, a proper reprex would include the package calls:

suppressPackageStartupMessages({
  library(broom)
  library(dplyr)
  library(emmeans)
  library(purrr)
  library(tidyr)})

Yep. This is what copy/pasting from another place does to me. Code above edited to be more MWE.

The immediate problem is here

  data.frame(type = c(0.8, 0.95, 0.99)) %>%
  mutate(df = map(type, ~ tidy(emmeans(fit, ~1, options = list(level = .x))))) %>%
  # name the column explicitly to avoid the unnest() warning
  unnest(cols = c(df))

which returns the following object, without the desired columns:

# A tibble: 3 x 7
   type `1`     estimate std.error    df statistic   p.value
  <dbl> <chr>      <dbl>     <dbl> <dbl>     <dbl>     <dbl>
1  0.8  overall     3.32    0.0379   124      87.7 1.87e-113
2  0.95 overall     3.32    0.0379   124      87.7 1.87e-113
3  0.99 overall     3.32    0.0379   124      87.7 1.87e-113

That is the case. I have tried surrounding emmeans() with a summary() statement. That gave the same error, although on different missing columns.

So how do I get the full set of columns? If emmeans were not in this statement, the normal output gives all the desired columns.
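One possible way to get the interval columns back, sketched under some assumptions: in recent versions of broom, the tidy() method for emmGrid objects takes conf.int and conf.level arguments and only returns conf.low and conf.high when conf.int = TRUE. The data below (cacao_sim) is a simulated stand-in for cacao_single so the example runs without a download, and the list column is named res (a hypothetical name) to keep it distinct from the degrees-of-freedom column df.

```r
suppressPackageStartupMessages({
  library(broom)
  library(dplyr)
  library(emmeans)
  library(purrr)
  library(tidyr)
})

# Hypothetical stand-in for cacao_single: 125 simulated ratings
set.seed(1)
cacao_sim <- data.frame(rating = round(rnorm(125, mean = 3.3, sd = 0.4), 2))

fit <- lm(rating ~ 1, data = cacao_sim)

CI_df <- data.frame(type = c(0.8, 0.95, 0.99)) %>%
  # ask tidy() for the interval directly, at the level given in `type`
  mutate(res = map(type, ~ tidy(emmeans(fit, ~ 1), conf.int = TRUE, conf.level = .x))) %>%
  unnest(cols = c(res)) %>%
  select(type, estimate, std.error, conf.low, conf.high) %>%
  mutate(type = paste0(signif(100 * type, 2), "% confidence interval"))

CI_df
```

If this works as assumed, the level is passed to tidy() rather than through the options argument of emmeans(), and the rest of the original pipeline is unchanged.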

tidy.data.frame produces a data frame with one row per original column, containing summary statistics of each.

The argument to tidy in the reprex is not an ordinary data frame, though: emmeans() returns an S4 emmGrid object. Without the tidy() call, the list column holds those raw objects:

  type                                                     df
1 0.80 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>
2 0.95 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>
3 0.99 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>

What is wanted in the return value is in the S4 objects, but buried deeply. The output above has been assigned to the variable a:

> a[[2]][1][[1]]
 1       emmean     SE  df lower.CL upper.CL
 overall   3.32 0.0379 124     3.28     3.37

Confidence level used: 0.8 

Digging that out is possible, but will probably be tedious.
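A less tedious route may be to let emmeans do the digging: confint() on an emmGrid object returns a summary that converts to a plain data frame with the columns emmean, SE, df, lower.CL and upper.CL shown above. The following is a minimal sketch, assuming simulated data (cacao_sim, a hypothetical stand-in for cacao_single) and renaming the columns by hand to match what the plotting code expects.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(emmeans)
})

# Hypothetical stand-in for cacao_single: 125 simulated ratings
set.seed(1)
cacao_sim <- data.frame(rating = round(rnorm(125, mean = 3.3, sd = 0.4), 2))

fit <- lm(rating ~ 1, data = cacao_sim)
emm <- emmeans(fit, ~ 1)

# One row per confidence level, with emmeans' column names mapped to broom-style names
CI_df <- bind_rows(lapply(c(0.8, 0.95, 0.99), function(lvl) {
  as.data.frame(confint(emm, level = lvl)) %>%
    transmute(
      type      = lvl,
      estimate  = emmean,
      std.error = SE,
      conf.low  = lower.CL,
      conf.high = upper.CL
    )
}))

CI_df
```

This avoids indexing into the S4 slots, at the cost of hard-coding the emmeans column names.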

Well, poo.

That does indeed sound tedious, and probably not very portable. Oh, well, then.
