Divide row by row beneath it

jhodzic · February 25, 2019, 5:27am

I have the following dataframe

  Type         Ca      K    Mg
   <chr>     <dbl>  <dbl> <dbl>
 1 parasite 12889. 21919. 3512.
 2 host      8825. 18178. 2640.
 3 parasite 14776. 32136. 3535.
 4 host      9087. 24764. 1812.
 5 parasite 13951. 31275. 4256.
 6 host      7336. 23079. 1834.
 7 parasite 17962. 35683. 4261.
 8 host      7798. 26001. 2573.
 9 parasite 15268. 35946. 4676.
10 host      7513. 29635. 2711.

I'd like to make a new column in which I sum across rows for the Ca, K and Mg "total" I then want to express these totals as a ratio in another column.

however, when I try to use mutate, the fact that I won't have 10 rows ( I would only have 5 since I am using 2 to get the desired division) is making it cause problems

As an example, just summing K and Ca

Parasite<- Leaf.orig %>%
  filter(Type == "parasite") %>%
  select(colnames(Leaf.orig)) %>%
  mutate(K_Ca = K + Ca)



Host<- Leaf.orig %>%
  filter(Type == "host") %>%
  select(colnames(Leaf.orig)) %>%
  mutate(K_Ca = K+Ba)

Leaf.orig %>%
  mutate(K_Ba = Parasite$K_Ca/Host$K_Ca)

does not work. I'm very new to dplyr and R in general, so any help is appreciated!

FJCC · February 25, 2019, 6:23am

I advise against using row position as what determines which numbers are associated with each other. If the pairs of rows are explicitly labeled as belonging together, the processing is much easier. I invented the fact that each host/parasite pair belong to a Site. I also broke out each intermediate step as its own data frame so you can more easily inspect the logic of the processing.

library(tidyr)
library(dplyr)
HostPara <- data.frame(Site = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E" ),
           Type = rep(c("parasite", "host"), 5),
           Ca = c(12889, 8825, 14776, 9087, 13951, 7336, 17962, 7798, 15268, 7513),
           K = c(21919, 18178, 32136, 24764, 31275, 23079, 35683, 26001, 35946, 29635),
           Mg = c(3512, 2640, 3535, 1812, 4256, 1834, 4261, 2573, 4676, 2711))
HostPara

   Site     Type    Ca     K   Mg
1     A parasite 12889 21919 3512
2     A     host  8825 18178 2640
3     B parasite 14776 32136 3535
4     B     host  9087 24764 1812
5     C parasite 13951 31275 4256
6     C     host  7336 23079 1834
7     D parasite 17962 35683 4261
8     D     host  7798 26001 2573
9     E parasite 15268 35946 4676
10    E     host  7513 29635 2711

GathHostPara <- HostPara %>% gather(key = Metal, value = Value, Ca, K, Mg)
SummHostPara <- GathHostPara %>% group_by(Site, Type) %>% 
  summarize(Sum = sum(Value))
SpreadSummary <- SummHostPara %>% spread(key = Type, value = Sum)
SpreadSummary

# A tibble: 5 x 3
# Groups:   Site [5]
  Site   host parasite
  <fct> <dbl>    <dbl>
1 A     29643    38320
2 B     35663    50447
3 C     32249    49482
4 D     36372    57906
5 E     39859    55890

Ratio <- SpreadSummary %>% mutate(Ratio = parasite/host)
Ratio
# A tibble: 5 x 4
# Groups:   Site [5]
  Site   host parasite Ratio
  <fct> <dbl>    <dbl> <dbl>
1 A     29643    38320  1.29
2 B     35663    50447  1.41
3 C     32249    49482  1.53
4 D     36372    57906  1.59
5 E     39859    55890  1.40

cderv · February 25, 2019, 6:43am

There is no Ba, is it Ca here ?

In your example you did not use Mg. Can you provide the resulting output you want (construct by hand if needed).

We could help, but we need more precise information. I am not sure what is the output you seek exactly.

If it is like @FJCC undertand then . Else please, help us help you!

jhodzic · February 25, 2019, 5:25pm

Hey , this is a great solution. I do have something like "Site" although it is actually "Group", here is a snapshot of my data frame "Leaf.orig" I have many more nutrients than the Mg, Ca, K- I just used those as an example. Here is what I did;

  Unique.Code Block.ID Group Type  Species Al    As        B    Ba     Ca Cd   
  <chr>       <chr>    <chr> <chr> <chr>   <chr> <chr> <dbl> <dbl>  <dbl> <chr>
1 A1          B1       A     para~ CAHI    63.6~ NA     27.8  6.19 12889. NA   
2 A2          B1       A     host  ERLA    36.9~ NA     25.9  4.56  8825. NA   
3 A3          B2       A     para~ CAHI    81.0~ 0     130.   0    14776. NA   
4 A4          B2       A     host  ERLA    24.0~ NA     30.7  3.00  9087. NA   
5 A5          B3       A     para~ CAHI    158.~ NA     46.2 13.2  13951. NA   
6 A6          B3       A     host  ERLA    39.4~ 0      27.4  3.81  7336. NA

#subset for response variables that I think are the most important
Nutrients1<-Leaf.orig[, colnames(Leaf.orig)  %in%  c("K", "Na", "P", "Mg",  "Ca")]

##calculate new variables

GathLeaf <- Leaf.orig %>% gather(key = Metal, value = Value, colnames(Nutrients))

SummLeaf <- GathLeaf %>% group_by(Group, Type) %>%
  summarize(Sum = sum(Value))

SpreadSummary <- SummLeaf %>% spread(key = Type, value = Sum )
SpreadSummary

I get an error though

Error in eval_tidy(enquo(var), var_env) : object 'Type' not found

i'm not sure why it won't find Type..

jhodzic · February 25, 2019, 5:26pm

ah my apologies I was trying to create a specific example and made a typo. See my comment on @FJCC's response. Thanks for your help!

FJCC · February 25, 2019, 7:33pm

I do not see a cause for the error but it is hard to say without working with your exact data. You should be able to look at the column names of GathLeaf and SummLeaf and find which is missing the Type column. I cannot tell from your post which line of code is throwing the error.

You can also improve your method of forming a subset of the original data frame that contains only the most important nutrients. Use the select() function from the dplyr package.

GathLeaf <- Leaf.orig %>% 
             select(Unique.Code, Block.ID, Group, Type, Species, K, Na, P, Mg, Ca) %>%
                   gather(key = Metal, value = Value, K:Ca)

Proceed from there with your other code steps. If you cannot find why the Type column is not present, please post a Reproducible Example, known as a reprex, so we can work with exactly the data you are using. Search for reprex on this forum and you will find many links for instructions on how to make one.

jhodzic · February 25, 2019, 11:17pm

@FJCC I ran your code and it didn't work either. I realized the group_by function wasn't actually grouping and so I , on a whim, installed updates and now it works. Thanks so much!

system · March 4, 2019, 11:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.