Checking is.null in a case_when statement

dplyr
tidyverse

#1

I'm writing a function with a case_when statement, and I can't figure out why my is.null(x) check inside case_when never produces "I'm NULL". I've simplified the issue below: test_function1 gives the result I'd expect and test_function does not. Is this an issue with case_when, or is there something I'm not understanding about how is.null() works? I'd like to handle everything in the case_when statement. Any insights are appreciated, thanks!

Function calls and result:
test_function1(NULL) --> "other content"
test_function(NULL) --> character(0)

Functions:

test_function1 <- function(x) {
  
  if (is.null(x)) {
    return("other content")
  }
  
  dplyr::case_when(
    grepl("word", x, ignore.case = TRUE) ~ "I'm a word",
    TRUE ~ "other"
  )
  
}

test_function <- function(x) {

  dplyr::case_when(
    grepl("word", x, ignore.case = TRUE) ~ "I'm a word",
    is.null(x) ~ "I'm NULL",
    TRUE ~ "other"
  )
  
}

#2

NULL has length 0, so your case_when sees left-hand sides with lengths of either 0 or 1 and decides the output vector should have length 0.

From the docs:

Value

A vector of length 1 or n [the length of LHS if not 1], matching the length of the logical input or output vectors, with the type (and attributes) of the first RHS. Inconsistent lengths or types will generate an error.

Text in brackets mine.


#3

To expand a little bit on @nwerth's answer, part of the problem is that grepl() returns a logical vector of length 0 when its x is NULL. For comparison:

library(tidyverse)

x <- NULL
y <- 3

length(is.null(x))
#> [1] 1
length(grepl("word", x, ignore.case = TRUE))
#> [1] 0
length(y > 2)
#> [1] 1

case_when(
  is.null(x) ~ "I'm NULL",
  grepl("word", x, ignore.case = TRUE) ~ "I'm a word",
  TRUE ~ "other"
)
#> character(0)

case_when(
  is.null(x) ~ "I'm NULL",
  y > 2 ~ "Big y",
  TRUE ~ "other"
)
#> [1] "I'm NULL"

Created on 2018-06-14 by the reprex package (v0.2.0).

I guess you could try this, though I find the logic a little hard to follow on a quick read so there might be maintainability issues:

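library(tidyverse)

test_function <- function(x) {
  dplyr::case_when(
    is.null(x) ~ "I'm NULL",
    # sum() collapses the grepl() result to a single number, so this
    # condition always has length 1, even when x has length 0
    sum(grepl("word", x, ignore.case = TRUE)) > 0 ~ "I'm a word",
    TRUE ~ "other"
  )
}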

test_function(NULL)
#> [1] "I'm NULL"
test_function(NA)
#> [1] "other"
test_function(character(0))
#> [1] "other"
test_function("foobarbuzz")
#> [1] "other"
test_function("foobarwordbuzz")
#> [1] "I'm a word"

Created on 2018-06-14 by the reprex package (v0.2.0).
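
If the sum(...) > 0 condition is the part that is hard to read, one possible alternative is any(), which also collapses the grepl() result to a single TRUE/FALSE. This is only a sketch: test_function_any is a made-up name, and like the version above it answers for x as a whole rather than element by element.

```r
library(dplyr)

# Hypothetical variant of test_function(): any() reduces the grepl() result
# to one logical value, so every left-hand side has length 1.
test_function_any <- function(x) {
  case_when(
    is.null(x) ~ "I'm NULL",
    any(grepl("word", x, ignore.case = TRUE)) ~ "I'm a word",
    TRUE ~ "other"
  )
}

test_function_any(NULL)
test_function_any(character(0))
test_function_any(c("foo", "a Word here"))
```

Because any(logical(0)) is FALSE, zero-length inputs that are not NULL fall through to "other", the same as with sum().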


#4

Thank you for the quick responses, this is very helpful.

I'm curious then, if I want to handle NULL being passed to my function, is it best practice to do:

if (is.null(x)) {
   return("other content")
}

or would it be OK to do:

dplyr::case_when(
   grepl("word", x, ignore.case = TRUE) ~ "I'm a word",
   length(x) == 0 ~ "I'm NULL",
   TRUE ~ "other"
)

Maybe it's just up to developer preference?


#5

It depends on context, but parameter checking is a pretty common first step for functions. R has some built-in functions to make this easier, like missing, match.arg, and stopifnot. The last will raise an error if the conditions specified aren't true, which is frequently what you want, e.g.

f <- function(x){
    stopifnot(!missing(x), !is.null(x))
    x
}

f()
#> Error in f(): !missing(x) is not TRUE
f(NULL)
#> Error in f(NULL): !is.null(x) is not TRUE
f('foo')
#> [1] "foo"
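
Of the helpers listed above, match.arg is the one the example does not show. Here is a minimal sketch of how it might be combined with stopifnot; summarise_values and its method argument are hypothetical names chosen only for illustration.

```r
# Hypothetical function: `method` must be one of the choices in its default.
summarise_values <- function(x, method = c("mean", "median")) {
  # Reject NULL, non-numeric, and zero-length input up front
  stopifnot(is.numeric(x), length(x) > 0)
  # match.arg() picks the first choice when `method` is not supplied
  # and raises an error for anything outside the listed choices
  method <- match.arg(method)
  switch(method,
    mean = mean(x),
    median = median(x)
  )
}

summarise_values(c(1, 2, 10))
summarise_values(c(1, 2, 10), method = "median")
```

Calling it with NULL or with a method that is not listed stops with an error before any real work happens.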

You could use case_when, but I find base R control flow sufficient for these sorts of things and reserve case_when for avoiding nested ifelse constructs.
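
To make that split concrete, here is a rough sketch that uses base R control flow for the empty-input check and keeps case_when for the element-wise logic. classify_words and its labels are invented for this example. Note that moving a length(x) == 0 condition inside case_when would not rescue the NULL case: grepl() still contributes a zero-length condition, so the result stays character(0).

```r
library(dplyr)

# Hypothetical helper: guard clause first, then an element-wise case_when().
classify_words <- function(x) {
  # NULL, character(0), and friends all have length 0. Handle them here,
  # because grepl() on zero-length input returns logical(0), which would
  # shrink the whole case_when() result to character(0).
  if (length(x) == 0) {
    return("I'm empty")
  }

  case_when(
    is.na(x) ~ "I'm NA",
    grepl("word", x, ignore.case = TRUE) ~ "I'm a word",
    TRUE ~ "other"
  )
}

classify_words(NULL)
classify_words(c("Word", NA, "foo"))
```

With the guard in place, case_when only ever sees conditions of length 1 or length(x), so the output has one label per element.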


#6

Great! I'm just getting into writing more functions with test cases, so this is very helpful. I appreciate the feedback and insights!


#7

As you get into parameter checking, beware that there are several things in R that have length 0 yet are not NULL. A non-exhaustive, off-the-top-of-my-head selection:

length(NULL) == 0
#> [1] TRUE
is.null(NULL)
#> [1] TRUE

length(logical(0)) == 0
#> [1] TRUE
is.null(logical(0))
#> [1] FALSE

length(character(0)) == 0
#> [1] TRUE
is.null(character(0))
#> [1] FALSE

length(numeric(0)) == 0
#> [1] TRUE
is.null(numeric(0))
#> [1] FALSE

length(list()) == 0
#> [1] TRUE
is.null(list())
#> [1] FALSE

length(data.frame()) == 0
#> [1] TRUE
is.null(data.frame())
#> [1] FALSE

# not to mention...
length(c(NA, NA, NA, NULL, NULL, logical(0)))
#> [1] 3

:cold_sweat: The R Inferno is one classic source of warnings about such gotchas.
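
Given the cases above, a parameter check may want to say which kind of "empty" it received instead of lumping them together. The helper below is a sketch of one way to do that; describe_input and its labels are hypothetical, not part of any package.

```r
# Hypothetical helper that names the kind of "empty" it was given.
describe_input <- function(x) {
  if (is.null(x)) {
    "NULL"
  } else if (length(x) == 0) {
    # Zero-length but not NULL: report the class, e.g. "character"
    paste0("zero-length ", class(x)[1])
  } else if (is.atomic(x) && all(is.na(x))) {
    "all NA"
  } else {
    "has values"
  }
}

describe_input(NULL)
describe_input(character(0))
describe_input(data.frame())
describe_input(c(NA, NA))
describe_input("word")
```

The order of the checks matters: is.null() has to come before length(x) == 0, since NULL satisfies both.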

P.S. Tiny friendly tip: around here we encourage people to format their code as code to make it easier to read. You can follow the instructions in this FAQ, or just use the little </> button at the top of the posting box. :grin: