summarize_if - multiple conditions, or every column that is not character

I want to summarize variables that are either numeric or factor by applying a function to them. In other words, I want to leave character variables out of the summarize step.

Is there a way to either: a) summarize all variables that are NOT character (my intuition was to try summarize_if(!is.character, ...)), or b) pass multiple conditions to summarize_if so the summary applies to both numeric and factor variables, e.g. something like summarize_if(is.numeric | is.factor, ...)?

P.S.: I am using group_by and summarize_if to collapse observations by their id. I know there are other ways to do this, but the dataset has ~700k observations, I want to apply this function to approximately 20 variables, and the other methods I've tried are very slow.

This is the code I was trying without success:

data %>% 
  group_by(id_variable) %>% 
  summarize_if(vars(!is.character), ~.[!is.na(.)][1L]) 

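To use a predicate (the column selection step) that negates or combines functions, write it as a purrr-style lambda using ~, in this case ~!is.character(.). A bare !is.character fails because ! cannot be applied to a function object, and vars() is meant for the _at variants rather than for predicates. Here's an example: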

library(tidyverse)

set.seed(2)
dat = iris %>% 
  mutate(group1=sample(LETTERS[1:3], nrow(iris), replace=TRUE),
         group2=sample(letters[1:3], nrow(iris), replace=TRUE))

dat %>% 
  group_by(group1) %>% 
  summarize_if(~!is.character(.), ~.[!is.na(.)][1L])

  group1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  <chr>         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1 A               5.1         3.5          1.4         0.2 setosa 
2 B               4.7         3.2          1.3         0.2 setosa 
3 C               4.9         3            1.4         0.2 setosa 

summarize_if(function(x) !is.character(x), ~.[!is.na(.)][1L]) would also work.
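
For option (b) in the question, the same lambda syntax can combine two predicates. The following is a minimal sketch that reuses the example data from above; `res_num_fct` and `res_not_chr` are just illustrative names, and using base R's `Negate()` as an alternative to the lambda is my own suggestion rather than something from the original thread.

```r
library(dplyr)

# Same example data as above: iris plus two character grouping columns
set.seed(2)
dat <- iris %>%
  mutate(group1 = sample(LETTERS[1:3], nrow(iris), replace = TRUE),
         group2 = sample(letters[1:3], nrow(iris), replace = TRUE))

# Option (b): keep columns that are numeric OR factor.
# The predicate is called once per column and must return a single TRUE/FALSE,
# so the scalar operator || is appropriate here.
res_num_fct <- dat %>%
  group_by(group1) %>%
  summarize_if(~ is.numeric(.) || is.factor(.), ~ .[!is.na(.)][1L])

# Option (a) without a lambda: Negate() returns a function that
# flips the result of is.character.
res_not_chr <- dat %>%
  group_by(group1) %>%
  summarize_if(Negate(is.character), ~ .[!is.na(.)][1L])
```

With this data both calls should keep the four numeric columns and Species and drop group2, since every non-character column here happens to be numeric or factor.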

A single function can be passed as just the function name, whether it is built in or one you define yourself:

summarize_if(is.character, ~.[!is.na(.)][1L]) 

is.not.character = function(x) {
  !is.character(x)
}

summarize_if(is.not.character, ~.[!is.na(.)][1L])

This works in a similar way for the function that gets applied to the data:

first.non.na.val = function(x) {
  x[!is.na(x)][1L]
}

dat %>% 
  group_by(group1) %>% 
  summarize_if(is.not.character, first.non.na.val)
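
As a side note, since dplyr 1.0.0 the scoped verbs such as summarize_if() are superseded by across(). The sketch below shows what I believe is the equivalent call, assuming dplyr >= 1.0.0 is installed; `first_non_na` and `res_across` are illustrative names, not part of the original answer.

```r
library(dplyr)

# Return the first non-missing value of a vector (NA if there is none)
first_non_na <- function(x) {
  x[!is.na(x)][1L]
}

# Same example data as above
set.seed(2)
dat <- iris %>%
  mutate(group1 = sample(LETTERS[1:3], nrow(iris), replace = TRUE),
         group2 = sample(letters[1:3], nrow(iris), replace = TRUE))

# where() takes the predicate (a function name or a ~ lambda);
# across() applies first_non_na to every selected column within each group.
# Grouping columns are left out of the selection automatically.
res_across <- dat %>%
  group_by(group1) %>%
  summarize(across(where(~ !is.character(.x)), first_non_na))
```

The predicate and the summary function play the same roles as in summarize_if(), so multiple conditions can be written the same way, e.g. where(~ is.numeric(.x) || is.factor(.x)).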