How to normalize data against a control group

Hi there!
I have a dataset containing three factors: line, dop and conc. Each line group has four rows in which the dop and conc values are "control". Below you can find a reprex:

line;dop;conc;prol
a;undop;100;0,1540
a;undop;100;0,2770
a;undop;100;0,2460
a;0,0175;100;0,2030
a;0,0175;100;0,1630
a;0,0175;100;0,2300
a;0,015;100;0,2960
a;0,015;100;0,1070
a;0,015;100;0,2450
a;0,013;100;0,1890
a;0,013;100;0,2910
a;0,013;100;0,2490
a;0,02;100;0,1250
a;0,02;100;0,2910
a;0,02;100;0,2650
a;0,01;100;0,2040
a;0,01;100;0,1030
a;0,01;100;0,1100
a;0,005;100;0,1770
a;0,005;100;0,2890
a;0,005;100;0,1920
a;0,001;100;0,2820
a;0,001;100;0,2480
a;0,001;100;0,1320
a;control;control;0,1640
a;undop;10;0,2920
a;undop;10;0,2580
a;undop;10;0,1900
a;0,0175;10;0,2060
a;0,0175;10;0,2860
a;0,0175;10;0,1010
a;0,015;10;0,2720
a;0,015;10;0,1300
a;0,015;10;0,2720
a;0,013;10;0,2760
a;0,013;10;0,2910
a;0,013;10;0,2630
a;0,02;10;0,1900
a;0,02;10;0,2710
a;0,02;10;0,1770
a;0,01;10;0,2980
a;0,01;10;0,2580
a;0,01;10;0,1500
a;0,005;10;0,3000
a;0,005;10;0,2510
a;0,005;10;0,1990
a;0,001;10;0,1270
a;0,001;10;0,2040
a;0,001;10;0,2860
a;control;control;0,1300
a;undop;1;0,2780
a;undop;1;0,1250
a;undop;1;0,2710
a;0,0175;1;0,1000
a;0,0175;1;0,2920
a;0,0175;1;0,2340
a;0,015;1;0,1620
a;0,015;1;0,1230
a;0,015;1;0,2770
a;0,013;1;0,1330
a;0,013;1;0,1880
a;0,013;1;0,2530
a;0,02;1;0,1410
a;0,02;1;0,1720
a;0,02;1;0,1780
a;0,01;1;0,2190
a;0,01;1;0,1650
a;0,01;1;0,1260
a;0,005;1;0,1210
a;0,005;1;0,1200
a;0,005;1;0,1160
a;0,001;1;0,1720
a;0,001;1;0,1320
a;0,001;1;0,2410
a;control;control;0,2590
a;undop;0,1;0,1880
a;undop;0,1;0,2340
a;undop;0,1;0,1950
a;0,0175;0,1;0,1630
a;0,0175;0,1;0,1190
a;0,0175;0,1;0,2250
a;0,015;0,1;0,2520
a;0,015;0,1;0,2890
a;0,015;0,1;0,2150
a;0,013;0,1;0,2850
a;0,013;0,1;0,1350
a;0,013;0,1;0,2550
a;0,02;0,1;0,2810
a;0,02;0,1;0,1810
a;0,02;0,1;0,2000
a;0,01;0,1;0,1320
a;0,01;0,1;0,2730
a;0,01;0,1;0,2570
a;0,005;0,1;0,1740
a;0,005;0,1;0,1830
a;0,005;0,1;0,2910
a;0,001;0,1;0,2580
a;0,001;0,1;0,1500
a;0,001;0,1;0,1480
a;control;control;0,2870
b;undop;100;0,2530
b;undop;100;0,1860
b;undop;100;0,1820
b;0,0175;100;0,2850
b;0,0175;100;0,1620
b;0,0175;100;0,2130
b;0,015;100;0,2900
b;0,015;100;0,2610
b;0,015;100;0,1900
b;0,013;100;0,1030
b;0,013;100;0,2650
b;0,013;100;0,2640
b;0,02;100;0,1580
b;0,02;100;0,2470
b;0,02;100;0,2730
b;0,01;100;0,2280
b;0,01;100;0,1850
b;0,01;100;0,2340
b;0,005;100;0,1170
b;0,005;100;0,2370
b;0,005;100;0,1160
b;0,001;100;0,2830
b;0,001;100;0,1560
b;0,001;100;0,1330
b;control;control;0,1410
b;undop;10;0,3000
b;undop;10;0,1430
b;undop;10;0,2910
b;0,0175;10;0,2350
b;0,0175;10;0,2500
b;0,0175;10;0,2100
b;0,015;10;0,1210
b;0,015;10;0,2220
b;0,015;10;0,1360
b;0,013;10;0,2070
b;0,013;10;0,2650
b;0,013;10;0,1450
b;0,02;10;0,2090
b;0,02;10;0,1060
b;0,02;10;0,2520
b;0,01;10;0,1700
b;0,01;10;0,2550
b;0,01;10;0,1570
b;0,005;10;0,1430
b;0,005;10;0,1060
b;0,005;10;0,1740
b;0,001;10;0,1980
b;0,001;10;0,1090
b;0,001;10;0,2330
b;control;control;0,2650
b;undop;1;0,2320
b;undop;1;0,2470
b;undop;1;0,2070
b;0,0175;1;0,2610
b;0,0175;1;0,2090
b;0,0175;1;0,1250
b;0,015;1;0,2780
b;0,015;1;0,2190
b;0,015;1;0,2720
b;0,013;1;0,1500
b;0,013;1;0,2400
b;0,013;1;0,2000
b;0,02;1;0,1780
b;0,02;1;0,1320
b;0,02;1;0,1680
b;0,01;1;0,1430
b;0,01;1;0,1660
b;0,01;1;0,2370
b;0,005;1;0,2040
b;0,005;1;0,2870
b;0,005;1;0,2710
b;0,001;1;0,1460
b;0,001;1;0,1150
b;0,001;1;0,2070
b;control;control;0,2200
b;undop;0,1;0,2680
b;undop;0,1;0,2620
b;undop;0,1;0,2510
b;0,0175;0,1;0,2100
b;0,0175;0,1;0,2980
b;0,0175;0,1;0,1740
b;0,015;0,1;0,2320
b;0,015;0,1;0,1230
b;0,015;0,1;0,2800
b;0,013;0,1;0,1830
b;0,013;0,1;0,1940
b;0,013;0,1;0,2580
b;0,02;0,1;0,2120
b;0,02;0,1;0,2820
b;0,02;0,1;0,1780
b;0,01;0,1;0,2470
b;0,01;0,1;0,2500
b;0,01;0,1;0,2760
b;0,005;0,1;0,1780
b;0,005;0,1;0,1880
b;0,005;0,1;0,1350
b;0,001;0,1;0,1260
b;0,001;0,1;0,2580
b;0,001;0,1;0,2840
b;control;control;0,1880
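
In case it helps to reproduce this: the block above uses semicolons as field separators and commas as decimal marks. Below is a minimal sketch of reading it with base R. It uses only a few rows pasted inline and assumes the object is called `dummy`, as in the code further down. The `colClasses` choice is an assumption, made so that dop and conc stay character, as they appear in the printed output.

```r
# A few rows of the data above, pasted as text; the full block could be
# read the same way, or from a file
txt <- "line;dop;conc;prol
a;undop;100;0,1540
a;undop;100;0,2770
a;0,0175;100;0,2030
a;control;control;0,1640"

# read.csv2() defaults to sep = ";" and dec = ","
dummy <- read.csv2(
  text = txt,
  colClasses = c("character", "character", "character", "numeric")
)

str(dummy)
```

This gives a plain data frame rather than a tibble, which dplyr verbs accept as well.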

What I want is to normalize each prol value, for every dop and conc row, against the mean of the four control values of its line.

Basically, every prol value of line a should be divided by the mean of the prol values of its controls and multiplied by 100. For example, the controls belonging to line a are the rows below, and their mean is computed underneath:

  line  dop     conc     prol
  <chr> <chr>   <chr>   <dbl>
1 a     control control 0.164
2 a     control control 0.13 
3 a     control control 0.259
4 a     control control 0.287

(0.1640 + 0.1300 + 0.2590 + 0.2870) / 4 = 0.21

Now every prol value of line a should be divided by this number and multiplied by 100:

   line  dop    conc   prol
   <chr> <chr>  <chr> <dbl>
 1 a     undop  100   0.154
 2 a     undop  100   0.277

0.1540 / 0.21 × 100 = 73.33

0.2770 / 0.21 × 100 = 131.9

and so on.

The same should be done to line b.
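
The arithmetic above can be checked with a short base R sketch. The numbers are copied from the control rows and the two undop rows shown above; the variable names are only illustrative.

```r
# Control prol values for each line, copied from the data above
ctrl_a <- c(0.1640, 0.1300, 0.2590, 0.2870)
ctrl_b <- c(0.1410, 0.2650, 0.2200, 0.1880)

mean_a <- mean(ctrl_a)  # control mean for line a
mean_b <- mean(ctrl_b)  # control mean for line b

# First two prol values of line a (undop, conc 100)
prol_a <- c(0.1540, 0.2770)

# Express each value as a percentage of the control mean of its line
normalized_a <- prol_a / mean_a * 100
round(normalized_a, 2)
```

Values of line b would be divided by `mean_b` in the same way.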

With the following lines I've managed to do the calculation, but only for the control rows; all the useful data for the other dop and conc levels is dropped:

dummy %>%
  group_by(line) %>%
  filter(dop=="control") %>%
  mutate(ctrl=prol/mean(prol)*100)

# A tibble: 8 x 5
# Groups:   line [2]
  line  dop     conc     prol  ctrl
  <chr> <chr>   <chr>   <dbl> <dbl>
1 a     control control 0.164  78.1
2 a     control control 0.13   61.9
3 a     control control 0.259 123. 
4 a     control control 0.287 137. 
5 b     control control 0.141  69.3
6 b     control control 0.265 130. 
7 b     control control 0.22  108. 
8 b     control control 0.188  92.4

You can see that the ctrl column now shows correctly calculated values, but only for the control rows; the rest of the data, which is the useful part, is skipped.

How can I extend that mutate() to all the rows and not only the control ones? I've tried using cur_data(), which seems to be a new feature in dplyr, but haven't managed to make it work. Something tells me it could be done with rowwise(), but I can't quite understand how it works...

Thanks a lot in advance!

JP.

Sorry, I got confused about your data, so I did not create a reprex.
Could you edit your code as below and see whether it gives the result you have in mind?
This way each row is normalized by the mean of the control values for its line.

dummy %>%
  group_by(line) %>%
  mutate(ctrl = prol / mean(prol[dop == "control"]) * 100)


Wow! I didn't know about that syntax. The expression below

prol[dop=="control"]

is what I was looking for. Can you please point me to the documentation for this syntax? I've read all the tidyverse documentation looking for an answer and have never come across it.

Thanks a lot, really!

Here's the working solution:

dummy %>%
  group_by(line) %>%
  mutate(ctrl = prol / mean(prol[dop == "control"]) * 100)

# A tibble: 200 x 5
# Groups:   line [2]
   line  dop    conc   prol  ctrl
   <chr> <chr>  <chr> <dbl> <dbl>
 1 a     undop  100   0.154  73.3
 2 a     undop  100   0.277 132. 
 3 a     undop  100   0.246 117. 
 4 a     0,0175 100   0.203  96.7
 5 a     0,0175 100   0.163  77.6
 6 a     0,0175 100   0.23  110. 
 7 a     0,015  100   0.296 141. 
 8 a     0,015  100   0.107  51.0
 9 a     0,015  100   0.245 117. 
10 a     0,013  100   0.189  90  
# ... with 190 more rows
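
On the syntax itself: as far as I can tell, `prol[dop == "control"]` is ordinary base R logical subsetting of a vector, not something specific to dplyr. That would explain why it does not stand out in the tidyverse documentation. Inside a grouped `mutate()`, the column names refer to the values of the current group, so the subset picks only that group's controls. Here is a small sketch on a made-up toy data set (not the data from this thread), assuming dplyr is installed:

```r
library(dplyr)

# Toy data, invented for illustration: two lines, one control row each
toy <- tibble(
  line = c("a", "a", "a", "b", "b", "b"),
  dop  = c("control", "undop", "0,01", "control", "undop", "0,01"),
  prol = c(0.20, 0.10, 0.30, 0.50, 0.25, 1.00)
)

# Base R: dop == "control" is a logical vector, and prol[...] keeps the
# elements where it is TRUE
controls_all <- toy$prol[toy$dop == "control"]

# Grouped mutate(): prol and dop are evaluated per line, so mean() only
# sees the controls of the current line
result <- toy %>%
  group_by(line) %>%
  mutate(ctrl = prol / mean(prol[dop == "control"]) * 100) %>%
  ungroup()

result
```

In this toy example each control row comes out at 100, because every line has a single control and it is divided by itself.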
