I am trying to calculate pooled standard deviations across successive groups. I can get the means and standard deviations per group with summarize(). I can also get simple differences between successive groups on the summary table with lag().
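library(tidyverse)

# Example data: three groups with two values each
data = tibble(
  groups = c(1, 1, 2, 2, 3, 3),
  vals = c(1, 2, 10, 20, 100, 200)
) %>%
  group_by(groups)

# Per-group mean and sd, plus the difference from the previous group's mean
summary = data %>%
  summarize(mean = mean(vals),
            sd = sd(vals)) %>%
  mutate(mean_diff = mean - lag(mean))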
Ideally, the solution would add the successive pooled standard deviations to the summary table above. The pooled standard deviations I need (from the example data) are:
#Group 1 and 2 pooled
sd(data %>% filter(between(groups,1,2)) %>% pull(vals))
#Group 2 and 3 pooled
sd(data %>% filter(between(groups,2,3)) %>% pull(vals))
It seems like the data should be nested in some manner and the standard deviation could be obtained on the nested data. However, the data for my middle groups (in this case group 2) need to be nested more than once because these groups go into more than one pooled standard deviation. This type of multiple nesting does not seem possible.
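For what it is worth, here is a rough sketch of one direction that sidesteps nesting: for each group in the summary table, look back into the raw data and take the standard deviation of that group together with the one before it. It assumes group ids are consecutive integers, as in the example, and `pooled_with_previous`, `dat`, and `summary_tbl` are hypothetical names I made up for illustration. I am not sure this is idiomatic or that it scales well.

```r
library(tidyverse)

# Same example data as above, left ungrouped
dat <- tibble(
  groups = c(1, 1, 2, 2, 3, 3),
  vals   = c(1, 2, 10, 20, 100, 200)
)

# Hypothetical helper: sd of all values in group g and in group g - 1.
# Assumes group ids are consecutive integers.
# Returns NA when there is no previous group (i.e. for the first group).
pooled_with_previous <- function(g, df) {
  if (!any(df$groups == g - 1)) return(NA_real_)
  sd(df$vals[df$groups %in% c(g - 1, g)])
}

summary_tbl <- dat %>%
  group_by(groups) %>%
  summarize(mean = mean(vals), sd = sd(vals)) %>%
  mutate(
    mean_diff = mean - lag(mean),
    # One pooled sd per row, computed from the raw data rather than the summary
    pooled_sd = map_dbl(groups, pooled_with_previous, df = dat)
  )
```

On the example data, the pooled_sd entries for groups 2 and 3 should match the two sd() calls above, with NA for group 1. The middle group is reused simply because each lookup filters the raw data independently.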
Any suggestions are greatly appreciated.