natep
October 10, 2020, 3:00pm
1
I'm trying to reproduce a graph from Claus Wilke's data visualization guide (https://clauswilke.com/dataviz/visualizing-uncertainty.html#fig:cocoa-data-vs-CI), and the code he wrote is choking for me.
I apologize in advance if this is long, but I'm trying to explain what I think is going on. (Also, you can find the cacao data at https://github.com/clauswilke/dviz.supp/blob/master/data/cacao.rda.)
Full code chunk:
library(dplyr)
library(forcats)
library(lubridate)
library(mgcv)
library(tidyr)
library(purrr)
library(broom)
library(emmeans)
library(ungeviz)
library(ggridges)
library(tidybayes)
# load() restores the saved object (assumed to be named cacao) into the workspace
# and returns only its name, so the result is not assigned. The raw URL serves
# the .rda file itself rather than GitHub's HTML page.
load(url("https://raw.githubusercontent.com/clauswilke/dviz.supp/master/data/cacao.rda"))
cacao_single <- cacao %>%
  filter(location == "Canada")
fit <- lm(rating ~ 1, data = cacao_single)
CI_df <- data.frame(type = c(0.8, 0.95, 0.99)) %>%
  mutate(df = map(type, ~ tidy(emmeans(fit, ~ 1, options = list(level = .x))))) %>%
  unnest(cols = c(df)) %>%
  select(type, estimate, std.error, conf.low, conf.high) %>%
  mutate(type = paste0(signif(100 * type, 2), "% confidence interval"))
R returns: Error: Can't subset columns that don't exist. x Column `conf.low` doesn't exist.
I've isolated some of this line-by-line, and the "problem" seems to be with the interaction of emmeans and mutate. When I use the code above, the returned object lacks the CIs; when I surround emmeans with a summary(), I get the CIs, but then I lack the SEs.
natep: emmeans(fit
Without the fit object, the problem can't be reproduced. Also, a proper reprex would load only the packages it needs:
suppressPackageStartupMessages({
library(broom)
library(dplyr)
library(emmeans)
library(purrr)
library(tidyr)})
natep
October 12, 2020, 9:48pm
3
Yep. This is what copy/pasting from another place does to me. I've edited the code above to be closer to an MWE.
The immediate problem is here
data.frame(type = c(0.8, 0.95, 0.99)) %>%
  mutate(df = map(type, ~ tidy(emmeans(fit, ~1, options = list(level = .x))))) %>%
  # name the column explicitly to avoid the unnest() warning
  unnest(cols = c(df))
which passes along the following object, without the desired conf.low and conf.high columns:
# A tibble: 3 x 7
type `1` estimate std.error df statistic p.value
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.8 overall 3.32 0.0379 124 87.7 1.87e-113
2 0.95 overall 3.32 0.0379 124 87.7 1.87e-113
3 0.99 overall 3.32 0.0379 124 87.7 1.87e-113
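One possible way around this, sketched under assumptions: as far as I can tell, from broom 0.7.0 onward the tidy() method for emmGrid objects takes conf.int (default FALSE) and conf.level arguments, so the interval columns are only returned when asked for, and the level set through emmeans(options = ...) is not what tidy() uses. The sketch below uses mtcars as a stand-in for cacao_single, so the numbers will differ from the thread's.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(emmeans)

# Stand-in data (assumption): mtcars ships with R; the thread fits
# lm(rating ~ 1, data = cacao_single) instead.
fit <- lm(mpg ~ 1, data = mtcars)

CI_df <- data.frame(type = c(0.8, 0.95, 0.99)) %>%
  # Ask tidy() for the interval explicitly and pass the level to tidy(),
  # not to emmeans().
  mutate(df = map(type, ~ tidy(emmeans(fit, ~ 1), conf.int = TRUE, conf.level = .x))) %>%
  unnest(cols = c(df)) %>%
  select(type, estimate, std.error, conf.low, conf.high) %>%
  mutate(type = paste0(signif(100 * type, 2), "% confidence interval"))

print(CI_df)
```

If this holds for the installed broom version, each row keeps the same estimate and standard error while the interval widens with the confidence level.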
natep
October 12, 2020, 11:25pm
5
That is the case. I have tried surrounding emmeans() with a summary() statement. That gave the same error, although on different missing columns.
So how do I get the full set of columns? If emmeans were not in this statement, the normal output gives all the desired columns.
tidy.data.frame produces a data frame with one row per original column, containing summary statistics of each:
The argument to tidy in the reprex is a data frame, but not of the kind that provides enough to work on.
  type                                                      df
1 0.80 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>
2 0.95 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>
3 0.99 <S4 class ‘emmGrid’ [package “emmeans”] with 13 slots>
What is wanted in the return value is in the S4 objects, but buried deeply. With the output above assigned to a:
> a[[2]][1][[1]]
1 emmean SE df lower.CL upper.CL
overall 3.32 0.0379 124 3.28 3.37
Confidence level used: 0.8
Digging that out is possible, but will probably be tedious.
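A less tedious route may be to skip broom here and let emmeans report the interval itself. This is only a sketch: it assumes confint() on an emmGrid accepts a level argument and that the result converts with as.data.frame() into the emmean, SE, lower.CL and upper.CL columns printed above. The renaming to estimate, std.error, conf.low and conf.high is my own choice to match the original code, and mtcars stands in for cacao_single.

```r
library(emmeans)

# Stand-in data (assumption): the thread uses lm(rating ~ 1, data = cacao_single).
fit <- lm(mpg ~ 1, data = mtcars)
emm <- emmeans(fit, ~ 1)
conf_levels <- c(0.8, 0.95, 0.99)

ci_rows <- lapply(conf_levels, function(lv) {
  # confint() returns a summary with one row per estimated marginal mean
  ci <- as.data.frame(confint(emm, level = lv))
  data.frame(
    type = paste0(signif(100 * lv, 2), "% confidence interval"),
    estimate = ci$emmean,
    std.error = ci$SE,
    conf.low = ci$lower.CL,
    conf.high = ci$upper.CL
  )
})
CI_emm <- do.call(rbind, ci_rows)

print(CI_emm)
```

Because nothing is pulled out of the S4 slots by position, this should be less fragile than indexing into the object by hand.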
natep
October 13, 2020, 11:46pm
7
Well, poo.
That does indeed sound tedious, and probably not very portable. Oh, well, then.
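On portability: for an intercept-only model like this one, the same table can be built with base R alone, since confint() on an lm object takes a level argument. The sketch below is an alternative to the emmeans approach rather than what the original guide does; mtcars stands in for cacao_single, and the object names (conf_levels, CI_lm) are made up for illustration.

```r
# Stand-in data (assumption): the thread fits lm(rating ~ 1, data = cacao_single).
fit <- lm(mpg ~ 1, data = mtcars)
conf_levels <- c(0.8, 0.95, 0.99)

# Point estimate and its standard error for the intercept (the overall mean)
est <- unname(coef(fit)["(Intercept)"])
se <- unname(sqrt(diag(vcov(fit)))["(Intercept)"])

rows <- lapply(conf_levels, function(lv) {
  ci <- confint(fit, "(Intercept)", level = lv)
  data.frame(
    type = paste0(signif(100 * lv, 2), "% confidence interval"),
    estimate = est,
    std.error = se,
    conf.low = ci[1, 1],
    conf.high = ci[1, 2]
  )
})
CI_lm <- do.call(rbind, rows)

# Cross-check the 80% row by hand: estimate +/- t quantile * standard error
half_width <- qt(1 - (1 - 0.8) / 2, df = df.residual(fit)) * se

print(CI_lm)
```

The hand calculation is the usual t interval, so it doubles as a sanity check on whatever emmeans or broom returns for the same model.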