You might consider converting your data to a long format.
From the tidyr::gather documentation:
Gather columns into key-value pairs.
Description
Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.
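Having one column per sample is exactly the "columns that are not variables" situation: sample_1, sample_2 and sample_3 are values of a sample-id variable, not variables themselves. Reshaping the data so that every column is a variable makes it much easier to manipulate.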
Example
library(tibble)
library(tidyr)
library(dplyr) # provides group_by(), summarize() and filter() used further down
# Data where columns are sample ids.
old_df <- tribble(
~parameter, ~sample_1, ~sample_2, ~sample_3,
"foo", 1, 0, 2,
"baz", 1, 2, 2,
"duq", 1, 0, 3,
"fiz", 1, 0, 0,
"buz", 10, 10, 1)
# Make long format data
new_df <- old_df %>% gather(key="id", value="value", -parameter)
new_df
# A tibble: 15 x 3
# parameter id value
# <chr> <chr> <dbl>
# 1 foo sample_1 1
# 2 baz sample_1 1
# 3 duq sample_1 1
# 4 fiz sample_1 1
# 5 buz sample_1 10
# 6 foo sample_2 0
# 7 baz sample_2 2
# 8 duq sample_2 0
# 9 fiz sample_2 0
#10 buz sample_2 10
#11 foo sample_3 2
#12 baz sample_3 2
#13 duq sample_3 3
#14 fiz sample_3 0
#15 buz sample_3 1
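If you are on tidyr 1.0.0 or later, gather() is marked as superseded and pivot_longer() is the recommended replacement. A minimal sketch of the same reshape follows; it assumes tidyr >= 1.0.0, and the name new_df_long is only illustrative. The rows come out ordered by parameter rather than by sample id, but the contents are the same.

```r
library(tibble)
library(tidyr)

# Same wide data as above: one column per sample id.
old_df <- tribble(
  ~parameter, ~sample_1, ~sample_2, ~sample_3,
  "foo", 1, 0, 2,
  "baz", 1, 2, 2,
  "duq", 1, 0, 3,
  "fiz", 1, 0, 0,
  "buz", 10, 10, 1)

# Collapse every column except `parameter` into id/value pairs.
# new_df_long is an illustrative name, not part of the original answer.
new_df_long <- pivot_longer(
  old_df,
  cols = -parameter,
  names_to = "id",
  values_to = "value")

new_df_long
```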
Now that every column is a variable, summarizing and filtering become straightforward. Here, the long format lets us group by sample id, summarize within each sample, and answer questions about samples, e.g. "Which sample ids have more than 4 values greater than zero?"
new_df %>%
group_by(id) %>%
summarize(num_pos_values=sum(value > 0)) %>%
filter(num_pos_values > 4)
# A tibble: 1 x 2
# id num_pos_values
# <chr> <int>
# 1 sample_1 5
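The same long table can be grouped by the other variable just as easily. As a rough sketch (the question and the names zero_counts and num_zero are made up for illustration), this asks "Which parameters are zero in at least 2 samples?"

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example data so this snippet runs on its own.
old_df <- tribble(
  ~parameter, ~sample_1, ~sample_2, ~sample_3,
  "foo", 1, 0, 2,
  "baz", 1, 2, 2,
  "duq", 1, 0, 3,
  "fiz", 1, 0, 0,
  "buz", 10, 10, 1)

new_df <- gather(old_df, key = "id", value = "value", -parameter)

# Count zero values per parameter, then keep parameters with 2 or more.
# zero_counts and num_zero are illustrative names.
zero_counts <- new_df %>%
  group_by(parameter) %>%
  summarize(num_zero = sum(value == 0)) %>%
  filter(num_zero >= 2)

zero_counts
```

With the data above, only fiz qualifies, since it is zero in sample_2 and sample_3.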