make function with different filters

help · December 13, 2022, 1:54am

Hi, I want to run this function:

cats_clean <- function(.data, orange = NULL, gray = NULL) {
if (gray) { .data %>% filter(from_europe) }  
else if (orange) { .data %>% filter(birthdate >= ymd("2011-01-11")) }
else { .data } 

.data %>% 
group_by(color) %>%
summarize(happiness = sum(happiness, na.rm = TRUE))
}

I am trying to create a filter that changes based on what I pass to the function. The underlying .data will always be the same data frame, but sometimes I will want to specify the "orange" or "gray" input and so change the filter. Is this the best way to do this? There won't be a situation where I specify both inputs (orange and gray) in the same run. Thank you very much.

AlexisW · December 13, 2022, 5:10am

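You have the right idea; you just need to keep the filtered data and use it. In your function, the result of each filter() is never assigned, so the summary at the end still runs on the full .data.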

cats_clean <- function(.data, orange = NULL, gray = NULL) {
  if (gray) { 
    filtered <- .data %>% filter(from_europe)
  } else if (orange) { 
    filtered <- .data %>% filter(birthdate >= ymd("2011-01-11"))
  } else { 
   filtered <-  .data
  } 

  filtered %>% 
    group_by(color) %>%
    summarize(happiness = sum(happiness, na.rm = TRUE))
}

technocrat · December 13, 2022, 8:24am

See the FAQ "How to do a minimal reproducible example (reprex) for beginners". A reprex attracts answers that conform more closely to your data.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

dat <- data.frame(from = c("europe","africa","asia"),
                  birthdate = c("2010-01-11","2011-01-11","2011-01-12"),
                  happiness = c(1,2,3),
                  hue = c("mauve","fuscia","magenta"))


cats_clean <- function(x,y = "clear") {
    if(is.na(y)) ret =  x %>% group_by(hue) %>% summarize(happiness = sum(happiness, na.rm = TRUE))
    if((y != "gray" & y != "orange")) ret = x  %>% group_by(hue) %>% summarize(happiness = sum(happiness, na.rm = TRUE))
    if(y == "gray") ret = x %>% dplyr::filter(from == "europe")
    if(y == "orange") ret = x %>% filter(birthdate >= ymd("2011-01-11"))
  return(ret)
 }

cats_clean()   
#> Error in group_by(., hue): argument "x" is missing, with no default
cats_clean(dat)
#> # A tibble: 3 × 2
#>   hue     happiness
#>   <chr>       <dbl>
#> 1 fuscia          2
#> 2 magenta         3
#> 3 mauve           1
cats_clean(dat,"red")
#> # A tibble: 3 × 2
#>   hue     happiness
#>   <chr>       <dbl>
#> 1 fuscia          2
#> 2 magenta         3
#> 3 mauve           1
cats_clean(dat,"gray")
#>     from  birthdate happiness   hue
#> 1 europe 2010-01-11         1 mauve
cats_clean(dat,"orange")
#>     from  birthdate happiness     hue
#> 1 africa 2011-01-11         2  fuscia
#> 2   asia 2011-01-12         3 magenta

Created on 2022-12-13 by the reprex package (v2.0.1)

help · December 13, 2022, 6:20pm

Thank you, but this call doesn't work. How do I fix it?

cats_clean(data frame, orange = TRUE, gray = NULL)

AlexisW · December 15, 2022, 12:33am

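That's because you have gray = NULL. When you call if (xx), xx has to be a single TRUE or FALSE. NULL is an empty object of length zero, so if() has nothing to test and stops with the error "argument is of length zero". Because gray is checked first, this happens even when orange = TRUE.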

So you have 2 possibilities: either set the default as FALSE, and don't use NULL anywhere (that's my favorite), or if you want to use NULL, you have to test for it with is.null(), which returns TRUE or FALSE.
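The failure and both fixes can be reproduced without any data frame. This is only a minimal base R sketch: `flag` and `flag_off` are hypothetical stand-ins for the gray/orange arguments, and the isTRUE() line is an extra option not mentioned above.

```r
# `flag` is a hypothetical stand-in for an argument left at its NULL default.
flag <- NULL

# if (NULL) stops with an error because NULL has length zero;
# tryCatch() captures the message instead of halting the script.
msg <- tryCatch(
  if (flag) "filter" else "no filter",
  error = function(e) conditionMessage(e)
)

# Option 1: use FALSE as the "off" value, which is a valid condition.
flag_off <- FALSE
res_false <- if (flag_off) "filter" else "no filter"

# Option 2: keep NULL, but test for it explicitly with is.null().
res_null <- if (!is.null(flag)) "filter" else "no filter"

# Extra option (assumption, not from the thread): isTRUE() returns FALSE
# for NULL, NA and FALSE alike, so it never errors inside if().
res_istrue <- if (isTRUE(flag)) "filter" else "no filter"
```

Here `msg` holds the error text from the failing if(), while the other three results all take the "no filter" branch.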

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

set.seed(123)
dat <- data.frame(from_europe = sample(c(TRUE,FALSE), 6, replace = TRUE),
                  birthdate = c("2010-01-11","2011-01-11","2011-01-12"),
                  happiness = rnorm(6),
                  color = c("mauve","fuscia","magenta"))


cats_clean1 <- function(.data, orange = FALSE, gray = FALSE) {
  if (gray) { 
    filtered <- .data %>% filter(from_europe)
  } else if (orange) { 
    filtered <- .data %>% filter(birthdate >= ymd("2011-01-11"))
  } else { 
    filtered <-  .data
  } 
  
  filtered %>% 
    group_by(color) %>%
    summarize(happiness = sum(happiness, na.rm = TRUE))
}


cats_clean2 <- function(.data, orange = NULL, gray = NULL) {
  if (! is.null(gray)) { 
    filtered <- .data %>% filter(from_europe)
  } else if (! is.null(orange)) { 
    filtered <- .data %>% filter(birthdate >= ymd("2011-01-11"))
  } else { 
    filtered <-  .data
  } 
  
  filtered %>% 
    group_by(color) %>%
    summarize(happiness = sum(happiness, na.rm = TRUE))
}

cats_clean1(dat)
#> # A tibble: 3 × 2
#>   color   happiness
#>   <chr>       <dbl>
#> 1 fuscia     -1.14 
#> 2 magenta     1.03 
#> 3 mauve       0.531
cats_clean2(dat)
#> # A tibble: 3 × 2
#>   color   happiness
#>   <chr>       <dbl>
#> 1 fuscia     -1.14 
#> 2 magenta     1.03 
#> 3 mauve       0.531

Created on 2022-12-14 by the reprex package (v2.0.1)

(thanks to @technocrat for the reprex!)
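One caveat with cats_clean2: it treats any non-NULL value as a request to filter, so gray = FALSE would still apply the gray filter. Since only one filter is ever used per call, another possible design is a single argument that names the filter. The following is a sketch under assumptions, not a replacement for the functions above: `cats_clean3`, the `keep` argument and the small `cats` data frame are all made up for illustration.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

# Small made-up data set reusing the column names from the thread.
cats <- data.frame(
  from_europe = c(TRUE, FALSE, TRUE, FALSE),
  birthdate   = ymd(c("2010-01-11", "2011-01-11", "2011-01-12", "2009-05-01")),
  happiness   = c(1, 2, 3, 4),
  color       = c("mauve", "mauve", "magenta", "magenta")
)

# `keep` is a hypothetical argument name. match.arg() uses the first choice
# ("all") when nothing is supplied and rejects values outside the choices.
cats_clean3 <- function(.data, keep = c("all", "gray", "orange")) {
  keep <- match.arg(keep)

  # Pick exactly one filter; only the matching branch is evaluated.
  filtered <- switch(keep,
    gray   = filter(.data, from_europe),
    orange = filter(.data, birthdate >= ymd("2011-01-11")),
    all    = .data
  )

  filtered %>%
    group_by(color) %>%
    summarize(happiness = sum(happiness, na.rm = TRUE))
}

res_all    <- cats_clean3(cats)
res_gray   <- cats_clean3(cats, "gray")
res_orange <- cats_clean3(cats, "orange")
```

With this shape there is no NULL or FALSE to test, and a typo such as cats_clean3(cats, "grey") fails immediately with an error from match.arg() instead of silently skipping the filter.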
