Hi all,
I am trying to select the columns that meet a criterion computed on a subset of rows. For example, among p1 and p2, I want to pick out the columns whose minimum within grp == 0 is larger than the column's overall median:
library(tidyverse)
df <- tribble(
~ID, ~grp, ~p1, ~p2,
1, 0, 1, 3,
2, 0, 2, 4,
3, 1, 3, 2,
4, 1, 3, 2,
5, 1, 3, 1
)
df %>%
summarise_at(vars(p1:p2), funs(min(.[grp==0]), median) )
df %>%
summarise_at(vars(p1:p2), funs(min(.[grp==0]) > median(.)) )
The results are below:
> df %>%
+ summarise_at(vars(p1:p2), funs(min(.[grp==0]), median) )
# A tibble: 1 x 4
p1_min p2_min p1_median p2_median
<dbl> <dbl> <dbl> <dbl>
1 1. 3. 3. 2.
> df %>%
+ summarise_at(vars(p1:p2), funs(min(.[grp==0]) > median(.)) )
# A tibble: 1 x 2
p1 p2
<lgl> <lgl>
1 FALSE TRUE
What I want is to keep only the columns that are TRUE, i.e.,
> df %>%
+ select(ID, grp, p2)
# A tibble: 5 x 3
ID grp p2
<dbl> <dbl> <dbl>
1 1. 0. 3.
2 2. 0. 4.
3 3. 1. 2.
4 4. 1. 2.
5 5. 1. 1.
Any suggestions on how to do this? I was thinking of select_if, but I don't know how to restrict it to the column range p1:p2.
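One possible approach, sketched under the assumption of dplyr >= 1.0.0 (which brought in tidyselect's where() and the boolean selection operators): select the range with p1:p2 and intersect it with a where() predicate that applies the grp == 0 test. The helper name min_grp0_above_median is just illustrative.

```r
library(dplyr)

df <- tibble::tribble(
  ~ID, ~grp, ~p1, ~p2,
  1, 0, 1, 3,
  2, 0, 2, 4,
  3, 1, 3, 2,
  4, 1, 3, 2,
  5, 1, 3, 1
)

# Hypothetical helper: TRUE when the column's minimum within grp == 0
# exceeds the column's overall median. A where() predicate only receives
# the column itself, so grp is read from df directly.
min_grp0_above_median <- function(x) min(x[df$grp == 0]) > median(x)

result <- df %>%
  # Always keep ID and grp; from p1:p2 keep only the columns passing the test.
  # `&` intersects the two selections (tidyselect boolean syntax).
  select(ID, grp, p1:p2 & where(min_grp0_above_median))

result
```

With the example data this should keep ID, grp and p2. The predicate assumes df is the unfiltered data frame and has no NA values; otherwise min() and median() would need na.rm = TRUE.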