summarize with multiple filters

Hello, I´m stuck trying to summarize my data with some filters.

I have 3 differents filters:

Var_1 >1000,
Var_1 >500 & Var_1 <1000,
Var_1 <500

After filtering I want the percentage of another variable (Var_2) of each group over the total of Var_2.
Now I´m doing this. It has multiple steps and I know it is not the best way.

My_Data %>%
  filter(Var_1 >= 1000) %>% 
  summarize (Percentage_Var_1_greater_1000 = sum(Var_2))

Total_Sum_Var_2 <- sum(My_Data$Var_2) 

My_Data$Percentage_Var_1_greater_1000 / My_Data$Total_Sum_Var_2 

Then the same code changing the filterin for Var_1 >500 & Var_1 <1000
And after that the same again filtering for Var_1 <500)

It kind of works but isn´t a nice code. Any help of how to put it all together?

Hello. For future questions, please consider sharing a reproducible example (reprex) as described here. That makes it really easy for community members to help you.

For this question, I've prepared some sample data to illustrate one possible solution. To avoid duplication, you can use group_by() to perform the same operation across each of your three groups like so:

library(dplyr, warn.conflicts = FALSE)

data <- tibble(Var_1 = c(900, 1500, 350, 1200, 750, 100),
               Var_2 = c(90000, 10000, 8500, 4000, 5000, 1500))

head(data)
#> # A tibble: 6 x 2
#>   Var_1 Var_2
#>   <dbl> <dbl>
#> 1   900 90000
#> 2  1500 10000
#> 3   350  8500
#> 4  1200  4000
#> 5   750  5000
#> 6   100  1500

data %>% 
  mutate(Var_1_Category = case_when(Var_1 < 500 ~ "< 500",
                                    Var_1 >= 500 & Var_1 < 1000 ~ "500-1000",
                                    TRUE ~ "> 1000")) %>% 
  group_by(Var_1_Category) %>% 
  summarize(Percentage = sum(Var_1) / sum(Var_2))
#> # A tibble: 3 x 2
#>   Var_1_Category Percentage
#>   <chr>               <dbl>
#> 1 < 500              0.045 
#> 2 > 1000             0.193 
#> 3 500-1000           0.0174

Created on 2020-04-11 by the reprex package (v0.3.0)

1 Like

Thanks for your help. It´s almost solved but in the last step I need the percentage of the sum of each group of Var_1 (<500, <1000 >500, <1000) over the the total sum of Var_1.
Now is

summarize(Percentage = sum(Var_1) / sum(Var_2))

I don´t know how to invocate the sum of each group because obviusly if I write:

sum(Var_1) / sum(Var_1)

It going to be 1

OK. You could add a mutate() call that will sum up the group-wise Var_2 sums and then calculate the percentage.

library(dplyr, warn.conflicts = FALSE)

data <- tibble(Var_1 = c(900, 1500, 350, 1200, 750, 100),
               Var_2 = c(90000, 10000, 8500, 4000, 5000, 1500))

data %>% 
  mutate(Var_1_Category = case_when(Var_1 < 500 ~ "< 500",
                                    Var_1 >= 500 & Var_1 < 1000 ~ "500-1000",
                                    TRUE ~ "> 1000")) %>% 
  group_by(Var_1_Category) %>% 
  summarize(Var_2 = sum(Var_2)) %>% 
  mutate(Var_2_Perc = Var_2 / sum(Var_2))
#> # A tibble: 3 x 3
#>   Var_1_Category Var_2 Var_2_Perc
#>   <chr>          <dbl>      <dbl>
#> 1 < 500          10000     0.0840
#> 2 > 1000         14000     0.118 
#> 3 500-1000       95000     0.798

Created on 2020-04-12 by the reprex package (v0.3.0)

1 Like

Thanks so much! It works :slightly_smiling_face:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.