calculate the n of groups after a filter

juandmaz · March 7, 2023, 7:03am

I have this data base:

# A tibble: 20 × 4
   country category work    pay
   <chr>      <dbl> <chr> <dbl>
 1 Italy          0 Yes      20
 2 France         0 False    10
 3 France         0 Yes      25
 4 Italy          0 Yes      10
 5 Italy          1 Yes      25
 6 France         0 False    10
 7 France         0 Yes      15
 8 Italy          0 False    25
 9 Italy          0 False    25
10 France         0 Yes      20
11 Italy          1 False    10
12 France         0 False    10
13 Italy          0 Yes      20
14 Italy          1 Yes      10
15 Italy          0 False    20
16 France         0 Yes      20
17 France         1 Yes      10
18 France         0 Yes      10
19 France         0 Yes      10
20 Italy          0 Yes      15

I need to group by country and category and then add the values of the "pay" category for each group and subtract 10 from the result, but only for those with a "Yes" in the work column.
How can this be done?

I have been doing this:

exa %>%
  group_by(country, category) %>%
  filter(work=="Yes") %>%
  summarize(sum= sum(pay,
                     na.rm=T) - n())

But this doesnt work because it calculate the n after the filter is applied. And I need the n() before that.
This are the n() I need to work with:

# A tibble: 4 × 3
# Groups:   country [2]
  country category     n
  <chr>      <dbl> <int>
1 France         0     9
2 France         1     1
3 Italy          0     7
4 Italy          1     3

nirgrahamuk · March 7, 2023, 9:23am

library(tidyverse)
(tdf <- tibble::tribble(
  ~country, ~category,   ~work, ~pay,
  "Italy",         0,   "Yes",   20,
  "France",         0, "False",   10,
  "France",         0,   "Yes",   25,
  "Italy",         0,   "Yes",   10,
  "Italy",         1,   "Yes",   25,
  "France",         0, "False",   10,
  "France",         0,   "Yes",   15,
  "Italy",         0, "False",   25,
  "Italy",         0, "False",   25,
  "France",         0,   "Yes",   20,
  "Italy",         1, "False",   10,
  "France",         0, "False",   10,
  "Italy",         0,   "Yes",   20,
  "Italy",         1,   "Yes",   10,
  "Italy",         0, "False",   20,
  "France",         0,   "Yes",   20,
  "France",         1,   "Yes",   10,
  "France",         0,   "Yes",   10,
  "France",         0,   "Yes",   10,
  "Italy",         0,   "Yes",   15
))

group_by(tdf,
         country,category) |> summarise(
           sum_pay_m10 = sum( (work =='Yes') * pay) - 10,
           n=n()
         )

juandmaz · March 8, 2023, 5:37am

Thanks! It works, but I have a question. How does the code work? Because I tried to apply it to other calculations and it doesn't work. Now i wont to calculate the minimum but it doesnt works.

exa %>%
group_by(country,category) %>% 
  summarise(min = min((work =='Yes') * pay))

It returns this:

# A tibble: 4 × 3
# Groups:   country [2]
  country category   min
  <chr>      <dbl> <dbl>
1 France         0     0
2 France         1    10
3 Italy          0     0
4 Italy          1     0

Wich isnt true because no value in pay is 0

FJCC · March 8, 2023, 5:47am

The code works by using the fact that TRUE is equal to 1 and FALSE is equal to 0. With the sum function, when work does not equal "Yes", you get 0 * pay and that row does not contribute to the sum. With the min() function, the 0 * pay is recorded as the minimum of the values.

juandmaz · March 8, 2023, 6:41am

Thanks for the answer. Do you have any idea how I can do to calculate the minimum once filtered but subtract the number of pre-filtered groups? With the example I can easily do it with:

exa %>%
group_by(country,category) %>% 
filter(work=="Yes") %>%
  summarise(min = min( pay))

But I would like know how to do it in others situation like the one I mentioned. Because this is not gonna work:

exa %>%
group_by(country,category) %>% 
  summarise(min = min((work =='Yes') * pay) - n())

nirgrahamuk · March 8, 2023, 10:39am


group_by(
  tdf,
  country, category) |> summarise(
  minpay_of_work_yes = min(if_else(work == "Yes",
                                   true=pay,
                                   false=NA),
                            na.rm = TRUE),
  n = n()
)

system · March 15, 2023, 10:39am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.