Will case_when stay in dplyr?

hoelk · August 2, 2018, 7:08am

I wasn't really sure where to ask this, and I didn't want to create a dplyr ticket for it. I still hope some of the dplyr staff read it.

My situation: I maintain a few R packages at my workplace that do some basic data wrangling tasks. I am trying to get rid of all dplyr dependencies in my packages since a) it's a rather heavy dependency, and b) everyone here uses data.table anyway and there is a lot of functional overlap.

The problem is that I love case_when(). Is there any chance it will one day end up in a different, lighter package? I am asking because I feel it does not really belong in dplyr all that much and is more of a general utility function.

mishabalyasin · August 2, 2018, 8:33am

So dplyr is heavy, but data.table isn't?

I'm not sure I agree that case_when doesn't belong in dplyr. One of the goals of dplyr is to give you a way to do the same thing both in memory and on a remote DB. Pretty much every DB has this construct, so it makes sense to have it in the same package.

That being said, what kind of functionality do you think might be in this lighter package along with case_when?
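
For readers unfamiliar with the construct being discussed, here is a minimal sketch of case_when() on a plain vector, with the SQL CASE expression it resembles shown as a comment. The vector `x`, the cut-offs, and the labels are made up for illustration.

```r
library(dplyr)

# Illustrative input; not from the thread.
x <- c(1, 5, 12)

# Conditions are checked in order and the first match wins,
# much like SQL:
#   CASE WHEN x < 3  THEN 'small'
#        WHEN x < 10 THEN 'medium'
#        ELSE 'large' END
size <- case_when(
  x < 3  ~ "small",
  x < 10 ~ "medium",
  TRUE   ~ "large"
)

size
# "small" "medium" "large"
```

Note that nothing here involves a data frame, which is the point made above about the function being useful outside of dplyr pipelines.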

hoelk · August 2, 2018, 9:24am

dplyr requires Rcpp, takes quite a while to compile, and comes with quite a few other dependencies.
data.table is written in C and only requires methods, which is a base R package.

I am not sure where it would fit better, which is why I am asking. There is also a tidyverse vector package coming up.

mishabalyasin · August 2, 2018, 9:46am

That's true, but I have already had problems installing data.table on different machines, even though it has fewer dependencies, which is why I said it. But you are right, dplyr is really slow to install.

What is this vector package you mentioned? I don't see it on the tidyverse GitHub.

mara · August 2, 2018, 10:27am

It's an experimental package in r-lib. case_when() doesn't really fit into the vectors wheelhouse, from my understanding.

hoelk · August 2, 2018, 10:46am

I stalk hadley a lot on GitHub ;). I personally prefer data.table for most applications, but even if I didn't, the rest of our statistics unit works with data.table, so it makes sense not to force dplyr on them.

@mara right now I also think it fits best in dplyr. I just find it a bit sad because it's really useful outside dplyr, and dplyr has (comparatively) heavy dependencies.

jcblum · August 2, 2018, 12:55pm

Well, this is open source software (MIT license, specifically)... you can always fork just case_when() into a small utility package of your own.

hoelk · August 2, 2018, 1:44pm

That's not a bad idea, actually.

hoelk · August 8, 2018, 12:33pm

In case anyone follows this and cares, it was remarkably easy to extract case_when from dplyr. I have published a package on GitHub now, and I will eventually try to get it on CRAN after some testing.

hoelk · August 10, 2018, 2:40pm

Hmm, I would like to submit my case_when/if_else fork to CRAN now. I added

Copyright (c) for portions of lest are held by RStudio and others, 2013-2015 as part of the project dplyr.

to my LICENSE.md. Is this sufficient attribution for you, @hadley? It is mainly a copy-and-paste job from dplyr, except for replacing some rlang and glue calls with base equivalents.
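
To give an idea of what swapping a glue call for a base equivalent can look like, here is a minimal sketch. The helper `bad_length_msg()` and its message text are hypothetical and are not taken from dplyr or lest; the glue version appears only as a comment for comparison.

```r
# Hypothetical helper for illustration; not code from dplyr or lest.
# With glue, a message like this could be written as:
#   glue::glue("`{name}` must be length {expected}, not {actual}")
# Base R's sprintf() builds the same string without extra dependencies.
bad_length_msg <- function(name, expected, actual) {
  sprintf(
    "`%s` must be length %d, not %d",
    name, as.integer(expected), as.integer(actual)
  )
}

bad_length_msg("x", 3, 2)
# "`x` must be length 3, not 2"
```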

cderv · August 10, 2018, 3:49pm

Hey,

Note that there is a project that provides dplyr verbs without the dependencies. Your needs and ideas for lest could be of interest to it and might fit in there.

hoelk · August 10, 2018, 4:04pm

Hmm, I've come across noplyr before, but its scope is a bit different. I care about case_when() and if_else() especially because they are so useful outside the dplyr context. A package that is closer to what I want to achieve is freebase (calling @hrbrmstr), but that one also has a slightly different focus with its "usethis for utility functions" approach.

cderv · August 10, 2018, 4:09pm

Yes, {freebase} is similar, and both READMEs mention each other. {freebase} is more for package developers who want to easily add the functions to their packages with no dependencies. That is why I only mentioned the first one, as you are more oriented toward users, from my understanding.

I think you could fit in one of them, or at least they could benefit you.

hoelk · August 10, 2018, 9:59pm

Thanks for the hints, I was actually planning on asking @hrbrmstr whether he wanted to copy my modified case_when() to freebase.

That said, I think those two functions make up a nice piece of functionality that stands on its own. I am kinda wondering whether it would be possible to make a more efficient version of case_when() based on data.table's subset assignment feature, though...
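
As a rough sketch of that last idea (not a benchmark and not an actual implementation from lest), case_when-style logic can be expressed with data.table's assignment by reference on a subset of rows, `DT[i, col := value]`. The table `dt`, the column names, and the cut-offs below are invented for illustration.

```r
library(data.table)

# Illustrative data; not from the thread.
dt <- data.table(x = c(1, 5, 12, NA))

# Start with a typed NA column, then fill it one condition at a time.
# The is.na(size) guard keeps already-matched rows untouched, so the
# first matching condition wins, as in case_when().
dt[, size := NA_character_]
dt[is.na(size) & x < 3,     size := "small"]
dt[is.na(size) & x < 10,    size := "medium"]
dt[is.na(size) & !is.na(x), size := "large"]

dt$size
# "small" "medium" "large" NA
```

Each step only writes to the rows it selects and modifies the column in place. Whether this is actually faster than case_when() would need to be measured.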