Will case_when stay in dplyr?

dplyr

#1

I wasn't really sure where to ask this, and I didn't want to open a dplyr issue for it. I still hope some of the dplyr team read it.

My situation: I maintain a few R packages at my workplace that do some basic data wrangling tasks. I am trying to get rid of all dplyr dependencies in my packages because a) it's a rather heavy dependency, and b) everyone here uses data.table anyway, and there is a lot of functional overlap.

The problem is that I love case_when(). Is there any chance it will one day end up in a different, lighter package? I am asking because I feel it does not really belong in dplyr and is more of a general utility function.


#2

So dplyr is heavy, but data.table isn't? :slight_smile:

I'm not sure I agree that case_when doesn't belong in dplyr. One of the goals of dplyr is to let you do the same thing both in memory and on a remote database. Pretty much every database has this construct (SQL's CASE WHEN), so it makes sense to have it in the same package.

That being said, what kind of functionality do you think might be in this lighter package along with case_when?


#3

dplyr requires Rcpp, takes quite a while to compile, and comes with quite a few other dependencies.
data.table is written in C and only requires methods, which is a base R package.

I am not sure where it would fit better; that's why I am asking. There is also a tidyverse vector package coming up.


#4

That's true, but I have already had problems installing data.table on different machines, even though it has fewer dependencies; that's why I said it. But you are right, dplyr is really slow to install.

What is this vector package you've mentioned? I don't see it in the tidyverse GitHub organization.


#5

It's an experimental package in r-lib. case_when() doesn't really fit into the vectors wheelhouse, from my understanding.


#6

I stalk hadley a lot on GitHub ;). I personally prefer data.table for most applications, but even if I didn't, the rest of our statistics unit works with data.table, so it makes sense not to force dplyr on them.

@mara Right now I also think it fits best in dplyr. I just find it a bit sad, because it's really useful outside dplyr, and dplyr has (comparatively) heavy dependencies.


#7

Well, this is open source software (MIT license, specifically)... you can always fork just case_when() into a small utility package of your own.


#8

That's not a bad idea, actually :slight_smile:


#9

In case anyone follows this and cares: it was remarkably easy to extract case_when() from dplyr. I have published a package on GitHub and will try to get it on CRAN after some testing.


#10

Hmm, I would now like to submit my case_when()/if_else() fork to CRAN. I added

Copyright (c) for portions of lest are held by RStudio and others, 2013-2015 as part of the project dplyr.

to my LICENSE.md. Is this sufficient attribution for you, @hadley? It is mainly a copy-and-paste job from dplyr, except that I replaced some rlang and glue calls with base R equivalents.
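For readers wondering what "replacing rlang calls with base equivalents" can look like, here is a minimal base-R sketch of a case_when()-style function. It is not the dplyr or lest source: the name `case_when_base()` is hypothetical, and it deliberately skips the type and length checks that dplyr performs. Each argument is a two-sided formula `condition ~ value`; conditions are checked in order, the first match wins, and rows with no match (or an NA condition) stay NA.

```r
# Hypothetical base-R sketch of a case_when()-style function (not the dplyr
# or lest source). Each argument is a two-sided formula: condition ~ value.
# No type or length checking is done, unlike dplyr::case_when().
case_when_base <- function(...) {
  formulas <- list(...)
  if (length(formulas) == 0L) stop("No cases provided")

  # Evaluate both sides of each formula in the environment it was created in
  conds <- lapply(formulas, function(f) eval(f[[2L]], environment(f)))
  vals  <- lapply(formulas, function(f) eval(f[[3L]], environment(f)))

  n <- max(lengths(conds))
  # Start with NA of the first value's type; unmatched rows stay NA
  out  <- rep(vals[[1L]][NA_integer_], length.out = n)
  done <- rep(FALSE, n)

  for (i in seq_along(formulas)) {
    cond <- rep_len(conds[[i]], n)
    val  <- rep_len(vals[[i]], n)
    # First matching condition wins; NA conditions are treated as FALSE
    hit  <- cond & !done & !is.na(cond)
    out[hit] <- val[hit]
    done <- done | hit
  }
  out
}

x <- 1:6
case_when_base(
  x %% 2 == 0 ~ "even",
  x > 3       ~ "big odd",
  TRUE        ~ "small odd"
)
```

The last call returns `c("small odd", "even", "small odd", "even", "big odd", "even")`. Evaluating each side with `eval()` in `environment(f)` is what replaces the quosure handling that rlang normally provides.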


#11

Hey,

You may want to know that there is a project providing dplyr verbs without the dependencies. Your needs and ideas for lest could be interesting for it and would fit in.


#12

Hmm, I've come across noplyr before, but the scope is a bit different. I care about case_when() and if_else() especially because they are so useful outside the dplyr context. A package that is closer to what I want to achieve is freebase (calling @hrbrmstr), but that one also has a slightly different focus with its "usethis for utility functions" approach.


#13

Yes, {freebase} is a similar :package:, and the two READMEs mention each other. {freebase} is aimed more at package developers who want to add these functions to their packages without extra dependencies. That is why I only mentioned the first one: from my understanding, you are more oriented toward users.

I think your functions could fit into one of them, or at least they could benefit you.


#14

Thanks for the hints. I was actually planning to ask @hrbrmstr whether he wanted to copy my modified case_when() into freebase.

That said, I think those two functions make up a nice piece of functionality that stands on its own. I am also wondering whether it would be possible to make a more efficient version of case_when() based on data.table's subset assignment feature...
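One possible reading of the subset-assignment idea is sketched below, assuming the data.table package is installed. `case_when_dt()` is a hypothetical helper, not an existing data.table or lest function, and whether it is actually faster than a vectorised version is untested here. It writes into a column by reference with `:=` and `set()`, applying the conditions last-to-first so that earlier conditions overwrite later ones, which reproduces case_when()'s "first match wins" rule.

```r
library(data.table)

# Hypothetical helper (not part of data.table or lest): fill column `col`
# of `dt` by reference. `conds` is a list of logical vectors, `values` a list
# of length-1 values. Conditions are applied last-to-first, so earlier
# conditions overwrite later ones and the first match wins.
case_when_dt <- function(dt, col, conds, values,
                         default = values[[1L]][NA_integer_]) {
  force(default)
  dt[, (col) := default]              # initialise the column (NA of the right type)
  for (k in rev(seq_along(conds))) {
    rows <- which(conds[[k]])         # which() drops NA conditions
    if (length(rows) > 0L) {
      set(dt, i = rows, j = col, value = values[[k]])
    }
  }
  invisible(dt)
}

dt <- data.table(x = 1:6)
case_when_dt(dt, "label",
  conds  = list(dt$x %% 2 == 0, dt$x > 3, rep(TRUE, nrow(dt))),
  values = list("even", "big odd", "small odd"))
dt$label
```

Because the column is modified in place, no full-length copy of the result is created for each condition; the trade-off is that conditions must be precomputed as logical vectors rather than written as formulas.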