I want to combine rows based on id and treatment variables.
Each column should follow different formulas.
For example: let us set sample as n, mean as m and standard deviation as sd
I tried to use the dplyr group_by and summarise functions but ended without luck.
Is there any better way to do this at once?
Here is reproducible data:
df <- read.table(textConnection('
id treatment mean std sample
1 A 10 2 20
1 B 11 3 21
1 A 12 4 22
2 A 8 5 30
2 B 9 6 31
2 A 9 4 32
3 A 9 3 40
3 B 7 4 41
3 A 6 2 42
4 A 10 3 50
4 B 8 2 51
4 A 6 4 52
'), header=TRUE)
df <- read.table(textConnection('
id treatment mean std sample
1 A 10 2 20
1 B 11 3 21
1 A 12 4 22
2 A 8 5 30
2 B 9 6 31
2 A 9 4 32
3 A 9 3 40
3 B 7 4 41
3 A 6 2 42
4 A 10 3 50
4 B 8 2 51
4 A 6 4 52
'), header=TRUE)
suppressPackageStartupMessages(library(tidyverse))
df %>%
group_by(id, treatment) %>%
mutate(scenario = seq(1:n())) %>%
ungroup() %>%
pivot_wider(id_cols=c(id, treatment),
values_from=c(mean, std, sample),
names_from=scenario) %>%
mutate(sample_c = sample_1 + sample_2,
mean_c = ((sample_1 * mean_1) + (sample_2 * mean_2))/sample_c)
#> # A tibble: 8 x 10
#> id treatment mean_1 mean_2 std_1 std_2 sample_1 sample_2 sample_c mean_c
#> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <dbl>
#> 1 1 A 10 12 2 4 20 22 42 11.0
#> 2 1 B 11 NA 3 NA 21 NA NA NA
#> 3 2 A 8 9 5 4 30 32 62 8.52
#> 4 2 B 9 NA 6 NA 31 NA NA NA
#> 5 3 A 9 6 3 2 40 42 82 7.46
#> 6 3 B 7 NA 4 NA 41 NA NA NA
#> 7 4 A 10 6 3 4 50 52 102 7.96
#> 8 4 B 8 NA 2 NA 51 NA NA NA
Created on 2021-09-07 by the reprex package (v2.0.1)
I'll leave you to code the combined standard deviation column. Also, not sure what you want to happen when there is only one instance of id/treatment as this results in NAs.