I am trying to calculate pooled standard deviations across successive groups. I can get the means and standard deviations per group with summarize. I can also get simple differences between successive groups on the summary table with a lag.
library(tidyverse)
data = tibble(
groups = c(1,1,2,2,3,3),
vals = (c(1,2,10,20,100,200))
) %>%
group_by(groups)
summary = data %>%
summarize(mean = mean(vals),
sd = sd(vals)) %>%
mutate(mean_diff = mean-lag(mean))
Ideally, the solution will get the successive pooled standard deviations to the summary table included above.The pooled standard deviations I would need (from the example data) are:
#Group 1 and 2 pooled
sd(data %>% filter(between(groups,1,2)) %>% pull(vals))
#Group 2 and 3 pooled
sd(data %>% filter(between(groups,2,3)) %>% pull(vals))
It seems like the data should be nested in some manner and the standard deviation could be obtained on the nested data. However, the data for my middle groups (in this case group 2) need to be nested more than once because these groups go into more than one pooled standard deviation. This type of multiple nesting does not seem possible.
you seem to ask how to calculate something, then you provide code that calculates it. Then your question becomes how to put it in a single table...
you could simply do the latter like this
#Group 1 and 2 pooled
g12_pooled_sd <- sd(data %>% filter(between(groups,1,2)) %>% pull(vals))
#Group 2 and 3 pooled
g23_pooled_sd <- sd(data %>% filter(between(groups,2,3)) %>% pull(vals))
summary$pooled_sd <- c(NA,g12_pooled_sd,g23_pooled_sd)
summary
A tibble: 3 x 5
groups mean sd mean_diff pooled_sd
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1.5 0.707 NA NA
2 2 15 7.07 13.5 8.81
3 3 150 70.7 135 88.1
If you can explain what I might have missed, I can give it another look for you.
Your point is valid, sorry for the lack of clarity on my part. I kept my example data to three groups, just enough to illustrate my question. The solution I provided on calculating the pooled SDs was just to show what the end result would be for those two specific pairwise groups. My real data has a few thousand rows, several dozen groups, and the number of rows per group is variable, so it is impractical to hand calculate every pairwise pooled standard deviation.