Combine rows where columns follow different functions

I want to combine rows based on id and treatment variables.
Each column should follow different formulas.
For example: let us set sample as n, mean as m and standard deviation as sd

$$
Sample = N_{1}+N_{2}
$$

$$
Mean = \frac{N_{1} M_{1}+N_{2} M_{2}}{N_{1}+N_{2}}
$$

$$
SD = \sqrt{\frac{\left(N_{1}-1\right) \mathrm{SD}_{1}^{2}+\left(N_{2}-1\right) \mathrm{SD}_{2}^{2}+\frac{N_{1} N_{2}}{N_{1}+N_{2}}\left(M_{1}^{2}+M_{2}^{2}-2 M_{1} M_{2}\right)}{N_{1}+N_{2}-1}}
$$
  

I tried to use the dplyr group_by and summarise functions but ended without luck.
Is there any better way to do this at once?

Here is reproducible data:

df <- read.table(textConnection('
id treatment mean std sample
1	A	10	2	20
1	B	11	3	21
1	A	12	4	22
2	A	8	5	30
2	B	9	6	31
2	A	9	4	32
3	A	9	3	40
3	B	7	4	41
3	A	6	2	42
4	A	10	3	50
4	B	8	2	51
4	A	6	4	52
'), header=TRUE)

Hi @teketo,
Does this help?

df <- read.table(textConnection('
id treatment mean std sample
1   A   10  2   20
1   B   11  3   21
1   A   12  4   22
2   A   8   5   30
2   B   9   6   31
2   A   9   4   32
3   A   9   3   40
3   B   7   4   41
3   A   6   2   42
4   A   10  3   50
4   B   8   2   51
4   A   6   4   52
'), header=TRUE)

suppressPackageStartupMessages(library(tidyverse))

df %>%
  group_by(id, treatment) %>% 
  mutate(scenario = seq(1:n())) %>% 
  ungroup() %>% 
  pivot_wider(id_cols=c(id, treatment),
              values_from=c(mean, std, sample),
              names_from=scenario) %>% 
  mutate(sample_c = sample_1 + sample_2,
         mean_c = ((sample_1 * mean_1) + (sample_2 * mean_2))/sample_c)
#> # A tibble: 8 x 10
#>      id treatment mean_1 mean_2 std_1 std_2 sample_1 sample_2 sample_c mean_c
#>   <int> <chr>      <int>  <int> <int> <int>    <int>    <int>    <int>  <dbl>
#> 1     1 A             10     12     2     4       20       22       42  11.0 
#> 2     1 B             11     NA     3    NA       21       NA       NA  NA   
#> 3     2 A              8      9     5     4       30       32       62   8.52
#> 4     2 B              9     NA     6    NA       31       NA       NA  NA   
#> 5     3 A              9      6     3     2       40       42       82   7.46
#> 6     3 B              7     NA     4    NA       41       NA       NA  NA   
#> 7     4 A             10      6     3     4       50       52      102   7.96
#> 8     4 B              8     NA     2    NA       51       NA       NA  NA

Created on 2021-09-07 by the reprex package (v2.0.1)
I'll leave you to code the combined standard deviation column. Also, not sure what you want to happen when there is only one instance of id/treatment as this results in NAs.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.