Hopefully this helps you get started. A different way of thinking about this data is that you have color "groups" that we'll identify by id (I'm using row numbers before I transform the data), and for each id you have a data frame with the id, the "type" of color (for lack of a better word), e.g., colorA_oz or colorB_oz, and then you have the observed intensity (see the first data frame in the reprex I've created below).
For the second data frame I've created below, you don't need id, because I'm isolating cases where the other color was zero (if you had multiple matches for a given value of one color where the other was equal to zero, you'd run into trouble, but, then again, the method you've described above wouldn't work anyhow in such a case).
Parts I've left to you:
You can use the zero_cases
as a sort of "lookup table" in combination with pivoted_df
to find the expected value of a given color by matching (performing a join) on color_type
and color_val
to get the expected intensity for a given color type and value. You would then calculate the expected value for the group by summarizing by rowid .
library(tidyverse)
mydf <- data.frame("colorA_oz"=c(2,4,6,8,0,0,0,0,2,4,6,8),
"colorB_oz"=c(1,5,7,11,1,5,7,11,0,0,0,0),
"observed_intensity"=c(3,5,8,8,9,12,13,11,1,10,3,15))
mydf %>%
rowid_to_column() %>%
pivot_longer(c("colorA_oz", "colorB_oz"), names_to = "color_type", values_to = "color_val")
#> # A tibble: 24 x 4
#> rowid observed_intensity color_type color_val
#> <int> <dbl> <chr> <dbl>
#> 1 1 3 colorA_oz 2
#> 2 1 3 colorB_oz 1
#> 3 2 5 colorA_oz 4
#> 4 2 5 colorB_oz 5
#> 5 3 8 colorA_oz 6
#> 6 3 8 colorB_oz 7
#> 7 4 8 colorA_oz 8
#> 8 4 8 colorB_oz 11
#> 9 5 9 colorA_oz 0
#> 10 5 9 colorB_oz 1
#> # … with 14 more rows
zero_cases <- mydf %>%
filter(colorA_oz == 0 | colorB_oz == 0) %>%
pivot_longer(!observed_intensity, names_to = "color_type", values_to = "color_val") %>%
relocate("observed_intensity", .after = everything()) %>%
filter(color_val > 0)
zero_cases
#> # A tibble: 8 x 3
#> color_type color_val observed_intensity
#> <chr> <dbl> <dbl>
#> 1 colorB_oz 1 9
#> 2 colorB_oz 5 12
#> 3 colorB_oz 7 13
#> 4 colorB_oz 11 11
#> 5 colorA_oz 2 1
#> 6 colorA_oz 4 10
#> 7 colorA_oz 6 3
#> 8 colorA_oz 8 15
Created on 2021-04-14 by the reprex package (v2.0.0)
Edit/bonus hint
You'll get NA
s in your join for cases where one of the colors is zero (since I filtered them out), so you can replace the NA
s with zero, e.g., using the code below (I'm sure there are other ways to do it as well):
pivoted_df %>%
left_join(zero_cases) %>%
mutate(expected_intensity = replace_na(expected_intensity, 0))
#> Joining, by = c("color_type", "color_val")
#> # A tibble: 24 x 5
#> rowid observed_intensity color_type color_val expected_intensity
#> <int> <dbl> <chr> <dbl> <dbl>
#> 1 1 3 colorA_oz 2 1
#> 2 1 3 colorB_oz 1 9
#> 3 2 5 colorA_oz 4 10
#> 4 2 5 colorB_oz 5 12
#> 5 3 8 colorA_oz 6 3
#> 6 3 8 colorB_oz 7 13
#> 7 4 8 colorA_oz 8 15
#> 8 4 8 colorB_oz 11 11
#> 9 5 9 colorA_oz 0 0
#> 10 5 9 colorB_oz 1 9
#> # … with 14 more rows
Created on 2021-04-14 by the reprex package (v2.0.0)
Edit 2
Realized I forgot to mention that you'd need those row ids in the original data frame if you wanted to return to form in the end:
mydf2 <- mydf %>%
rowid_to_column() # adds rowid to the data frame you started out with
mydf2 %>%
left_join(summary_df)
#> Joining, by = "rowid"
#> rowid colorA_oz colorB_oz observed_intensity expected_intensity
#> 1 1 2 1 3 10
#> 2 2 4 5 5 22
#> 3 3 6 7 8 16
#> 4 4 8 11 8 26
#> 5 5 0 1 9 9
#> 6 6 0 5 12 12
#> 7 7 0 7 13 13
#> 8 8 0 11 11 11
#> 9 9 2 0 1 1
#> 10 10 4 0 10 10
#> 11 11 6 0 3 3
#> 12 12 8 0 15 15