Dear all,
I have the following tibble my_tbl:
Groups: group_id [1]
group_id process
<chr> <chr>
1 111000820 1
2 111000820 1
3 111000820 1
4 111000820 1
5 111000820 10000
6 111000820 10000
7 111000820 10000
8 111000820 10000
9 111000820 10001
10 111000820 10001
11 111000820 10001
12 111000820 10001
13 111000820 10002
14 111000820 10002
15 111000820 10002
16 111000820 10002
17 111000820 10003
18 111000820 10003
19 111000820 10003
20 111000820 10003
It is grouped by the group_id column. Only one group is shown here for simplicity.
What I would like to do, in pseudocode, is:
(for every group) among the processes that are >= 10000, keep only the rows with the max process, i.e. 10003 (and also keep the rows where process is 1)
Something like:
1 111000820 1
2 111000820 1
3 111000820 1
4 111000820 1
5 111000820 10003
6 111000820 10003
7 111000820 10003
8 111000820 10003
(I have other columns as well in this tibble which I am not displaying here)
It would be nice and intuitive for me to combine case_when with filter and do something like:
my_tbl %>% group_by(group_id) %>% mutate(process = case_when(process >= 10000 ~ filter(process == max(process)), TRUE ~ process)) %>% filter(group_id == "111000820")
but I know this does not work.
Any suggestions would be greatly appreciated.
Thanks,
Dimitris
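
One possible approach, sketched under a few assumptions: since the desired output drops the 10000 to 10002 rows rather than relabelling them, a plain filter() on the grouped tibble may be enough, without case_when. The sketch below rebuilds the example data from the printout (only the two columns shown). Because process is printed as <chr>, it is converted to numeric in a helper column first, so that the comparison and max() work on numbers rather than strings; process_num is a name made up for this illustration. Rows are then kept where process is 1 or equals the per-group maximum among values >= 10000.

```r
library(dplyr)

# Rebuild the example from the printout; both columns are character there.
my_tbl <- tibble::tibble(
  group_id = rep("111000820", 20),
  process  = rep(c("1", "10000", "10001", "10002", "10003"), each = 4)
) %>%
  group_by(group_id)

result <- my_tbl %>%
  # Helper column (hypothetical name) holding process as a number.
  mutate(process_num = as.numeric(process)) %>%
  # On a grouped tibble, filter() evaluates max() separately per group.
  filter(
    process_num == 1 |
      # The extra -Inf guards groups with no process >= 10000.
      process_num == max(process_num[process_num >= 10000], -Inf)
  ) %>%
  select(-process_num) %>%
  ungroup()

print(result)
```

On the example data this should keep the four rows with process 1 and the four rows with process 10003. Any other columns in the real tibble are carried along unchanged, since filter() only drops rows. If the 1 rows are not always coded exactly as "1", the first condition would need adjusting.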