How to test for is_nullish / is_naish / falsy and friends?

I'm wasting a lot of time building intricate control structures to avoid the horror that is:

  • character(0) and friends (empty objects)
  • ""
  • NULL
  • NA
  • NaN
  • ...

I understand these are all, in principle, different things, but for some uses in pkg development (such as naming or labelling stuff), they're really all the same.

Is there a function that tests for all of these at once, perhaps with additional arguments so it could be used in a more fine-grained way?

is_nullish(
  x,
  # whether character(0) and other empty objects count as nullish
  empty = TRUE,
  empty_str = TRUE,
  null = TRUE,
  na = TRUE,
  nan = TRUE
)

Such fine-grained control would allow users (and devs) to easily adjust what exactly they mean by some fuzzy *-ishness.
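To make the idea concrete, here is a rough base R sketch of what such a function might look like. is_nullish() does not exist anywhere as far as I know; the argument names follow the signature above (with the string argument spelled empty_str), and the rule that only NULL, empty objects and scalars can ever be nullish is my own assumption.

```r
# Hypothetical sketch: is_nullish() is not an existing function.
is_nullish <- function(x,
                       empty = TRUE,
                       empty_str = TRUE,
                       null = TRUE,
                       na = TRUE,
                       nan = TRUE) {
  if (is.null(x)) return(null)
  if (length(x) == 0L) return(empty)
  # Assumption: vectors longer than one and non-atomic objects are never nullish.
  if (length(x) > 1L || !is.atomic(x)) return(FALSE)
  # Check NaN before NA, because is.na(NaN) is also TRUE.
  if (is.numeric(x) && is.nan(x)) return(nan)
  if (is.na(x)) return(na)
  if (is.character(x) && !nzchar(x)) return(empty_str)
  FALSE
}

inputs <- list(NULL, character(0), "", NA, NaN, "a", 0)
vapply(inputs, is_nullish, logical(1))
# TRUE for the first five inputs, FALSE for "a" and 0
```

Because it takes a single object and returns a single logical, a function like this could be passed directly to mappers such as purrr::map_lgl() or purrr::discard().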

Existing work:

Do other people have the same need or am I doing something wrong?

Such a tool could save me a lot of lines and bugs.

I'm looking for a tool that:

  • is lightweight and robust
  • plays nice with the tidyverse (especially purrr)
  • is perhaps a part of a broader package of predicates (I often find myself hunting around the tidyverse and base for elegant, robust predicates ...)

I think it makes more sense to specify what the argument should be than what it should not be.

With my tool of choice, for example, if I want a character vector of length one with a nonempty string...

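library(vetr)

# Custom token: the value must not be an empty string.
# "%s" is a placeholder for the tested expression in the error message.
NZCHR = vet_token(nzchar(.), "%sshould not be an empty string, but it is.")
# CHR.1 is vetr's preset token for a character vector of length one.
GOODCHR = quote(CHR.1 && NZCHR)

f = function(x){
  vetr(GOODCHR)  # validate x before doing any work
  cat(x, "\tOh and Hello World!\n")
}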

Usage

> f("Huzzah")
Huzzah  Oh and Hello World!
> f(c("Huzzah", "Yahoo"))
Error in f(x = c("Huzzah", "Yahoo")) : 
  For argument `x`, `length(c("Huzzah", "Yahoo"))` should be 1 (is 2)
> f(character(0))
Error in f(x = character(0)) : 
  For argument `x`, `length(character(0))` should be 1 (is 0)
> f("")
Error in f(x = "") : 
  For argument `x`, `""` should not be an empty string, but it is.
> f(NULL)
Error in f(x = NULL) : 
  For argument `x`, `NULL` should be type "character" (is "NULL")
> f(NA)
Error in f(x = NA) : 
  For argument `x`, `NA` should be type "character" (is "logical")
> f(NaN)
Error in f(x = NaN) : 
  For argument `x`, `NaN` should be type "character" (is "double")

There are many other "assertion" packages similar to vetr noted on its home page. Regarding tidyverse compatibility, vetr accounts for it by providing a tev function that works well with pipes, and the version now on CRAN has no dependencies.


Yes, of course, thanks so much @Frank. It makes a lot more sense to test for what you want rather than the (fuzzy) inverse.
Thanks for the clear thinking.

I often use checkmate which, via its test_*() family, is in effect the package of predicates I was looking for.
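For example, the "character vector of length one with a nonempty string" check might look roughly like this with checkmate. This is only a sketch: it assumes checkmate is installed, and is_good_chr is a hypothetical helper name of my own, not part of the package.

```r
library(checkmate)

# Hypothetical helper: TRUE only for a single, non-missing, nonempty string.
# test_string() returns TRUE/FALSE instead of throwing an error.
is_good_chr <- function(x) test_string(x, min.chars = 1)

inputs <- list("Huzzah", c("Huzzah", "Yahoo"), character(0), "", NULL, NA, NaN)
vapply(inputs, is_good_chr, logical(1))
# TRUE for "Huzzah" only, FALSE for every other input
```

Since the test_*() functions return a plain logical, they can be used as predicates wherever one is expected, including in purrr.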

That's a fantastic overview of all the input validation and predicate packages by @Frank, btw.

Though the fantastic choice perhaps is a bit overwhelming for relative newbie pkg devs such as myself.

I'd love to accept this as a solution @Frank, but that doesn't seem to be possible for now for some reason. (I can accept solutions on other questions of mine. Will check back later, maybe it's a time thing).

I retired falsy, because it was often misleading. Exactly as you say, there is no good definition for a single function or operator that works well for all use cases.

What I usually do nowadays is define simple operators for the use cases that I need in a package. I don't know if it is a good idea to collect them in a single package. E.g. for NULL I usually have %||% (this is actually a common pattern among package authors). For scalar NA (of any type) I have this: %|NA|%. For empty vectors %|0|%, etc. I don't love this solution, but it is the best I could come up with so far.
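A minimal sketch of what such operators could look like. Only the operator names come from the description above; the definitions are illustrative assumptions, not code copied from any particular package.

```r
# %||% : use the right-hand side when the left-hand side is NULL.
`%||%` <- function(lhs, rhs) if (is.null(lhs)) rhs else lhs

# %|NA|% : use the right-hand side when the left-hand side is a scalar NA.
# Assumption: NaN counts as well, because is.na(NaN) is TRUE.
`%|NA|%` <- function(lhs, rhs) {
  if (is.atomic(lhs) && length(lhs) == 1L && is.na(lhs)) rhs else lhs
}

# %|0|% : use the right-hand side when the left-hand side has length zero.
# Note that NULL also has length zero, so it falls back here too.
`%|0|%` <- function(lhs, rhs) if (length(lhs) == 0L) rhs else lhs

label <- NULL
label %||% "untitled"          # "untitled"
NA_character_ %|NA|% "missing" # "missing"
character(0) %|0|% "empty"     # "empty"
```

Each operator covers exactly one case, so the call site states which kind of "missing" is being handled.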

TBH I don't love the is_nullish function with the many parameters, either. It is somewhat tedious to write and not so easy to read. I prefer identifying the common use cases and defining a function/operator for each.


Thanks @Gabor, that makes a ton of sense.
Focusing on what some argument should be is also often easier, as @Frank argued, and there are plenty of predicate packages out there for that.
(A bit too many for my taste actually..)