I want to summarize and apply a function to variables that are either numeric or factor. In other words, I want to exclude character variables from the summarize step.
Is there a way to either: a) summarize all variables that are NOT characters (my intuition was to try summarize_if(!is.character, ...)), or b) add multiple conditions to summarize_if, so that the summary applies to both numeric and factor variables, e.g. something like summarize_if(is.numeric | is.factor, ...)?
PS: I am using group_by and summarize_if to collapse observations by their id. I know there are other ways to do this, but the dataset has ~700k observations, I want to apply this formula to approximately 20 variables, and the other methods I've tried are very slow.
This is the code I was trying without success:
data %>%
group_by(id_variable) %>%
summarize_if(vars(!is.character), ~.[!is.na(.)][1L])
To use a predicate (the column-selection step) that negates or combines functions, rather than naming a single function, write it as a quosure-style lambda using ~. In this case, that is ~!is.character(.). Here's an example:
library(tidyverse)
set.seed(2)
dat = iris %>%
mutate(group1=sample(LETTERS[1:3], nrow(iris), replace=TRUE),
group2=sample(letters[1:3], nrow(iris), replace=TRUE))
dat %>%
group_by(group1) %>%
summarize_if(~!is.character(.), ~.[!is.na(.)][1L])
group1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <dbl> <dbl> <dbl> <dbl> <fct>
1 A 5.1 3.5 1.4 0.2 setosa
2 B 4.7 3.2 1.3 0.2 setosa
3 C 4.9 3 1.4 0.2 setosa
summarize_if(function(x) !is.character(x), ~.[!is.na(.)][1L]) would also work.
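The same lambda syntax should also cover option (b) from the question, combining two predicates with "or". The following is a minimal sketch, assuming a made-up data frame (the `note` column and the `rep()`-based grouping are illustrative and not from the original post):

```r
library(dplyr)

# Illustrative data (an assumption for this sketch): iris plus a
# character grouping column and one extra character column
dat2 <- iris %>%
  mutate(group1 = rep(c("A", "B", "C"), length.out = nrow(iris)),
         note   = as.character(Species))

# Keep columns that are numeric OR factor; the character column `note`
# fails the predicate and is dropped. The grouping column is kept as the key.
res <- dat2 %>%
  group_by(group1) %>%
  summarize_if(~ is.numeric(.) || is.factor(.), ~ .[!is.na(.)][1L])

names(res)
```

Each predicate call returns a single TRUE or FALSE per column, so the scalar `||` is sufficient here.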
A single function can be entered as just the function name: