Should I split my package into two?

mrmallironmaker · May 24, 2020, 3:22pm

Hi RStudio Community!

I'm developing a package, and I'm wondering whether to split it into two separate packages where one depends on the other.

I am a US graduate student researcher of human behavior (in particular, behavior in virtual and augmented reality). Because human-subjects studies are on hold for health reasons, I'm developing an R package that covers much of the 3D data manipulation and plotting we do in the lab. For example, given a person's head position and orientation, we want to measure the angle between the direction they are looking and the location of some salient object.
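For anyone unfamiliar with that calculation: the angle is usually computed from the dot product, theta = acos(g · t / (|g| |t|)), where g is the gaze direction and t is the vector from the head to the object. Here is a minimal base-R sketch. It assumes the head orientation has already been converted into a forward gaze vector (the post doesn't say how orientation is stored), and `gaze_angle` is a hypothetical name, not a function from the package.

```r
# Hypothetical helper: angle (degrees) between where someone is looking and the
# direction from their head to a target. All arguments are length-3 numeric
# vectors in the same coordinate system; gaze_dir need not be unit length.
gaze_angle <- function(head_pos, gaze_dir, target_pos) {
  to_target <- target_pos - head_pos
  cos_theta <- sum(gaze_dir * to_target) /
    (sqrt(sum(gaze_dir^2)) * sqrt(sum(to_target^2)))
  # Clamp to [-1, 1] so rounding error can't push acos() outside its domain
  cos_theta <- min(1, max(-1, cos_theta))
  acos(cos_theta) * 180 / pi
}

# Head at eye height looking along +Z; target one unit forward and one unit to the side.
gaze_angle(head_pos = c(0, 1.6, 0), gaze_dir = c(0, 0, 1), target_pos = c(1, 1.6, 1))
# approximately 45
```

If the target sits exactly at the head position, `to_target` has zero length and the result is NaN, so real code would probably want to check for that.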

The first part of the package (mostly finished) is a set of 3D vector and orientation classes with the corresponding methods: dot product, cross product, all the ways you would want to specify a rotation, and so on. It is implemented with the vctrs package for creating R vector types.
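To make "3D vector classes built with vctrs" concrete, here is a rough sketch of what such a type can look like as a vctrs record (one double field per axis). This is not the package's actual code: `vec3`, `dot`, and `cross` are hypothetical names, and it assumes the vctrs package is installed. Only `new_rcrd()`, `field()`, and `vec_recycle_common()` come from vctrs itself.

```r
library(vctrs)

# Hypothetical constructor: a 3D vector type stored as a vctrs record,
# with all three fields recycled to a common length.
vec3 <- function(x = double(), y = double(), z = double()) {
  fields <- vec_recycle_common(as.double(x), as.double(y), as.double(z))
  names(fields) <- c("x", "y", "z")
  new_rcrd(fields, class = "vec3")
}

# Controls how each element prints, e.g. "(1, 0, 0)".
format.vec3 <- function(x, ...) {
  sprintf("(%s, %s, %s)", field(x, "x"), field(x, "y"), field(x, "z"))
}

# Element-wise dot product: returns a plain double vector.
dot <- function(a, b) {
  field(a, "x") * field(b, "x") +
    field(a, "y") * field(b, "y") +
    field(a, "z") * field(b, "z")
}

# Element-wise cross product: returns another vec3.
cross <- function(a, b) {
  vec3(
    field(a, "y") * field(b, "z") - field(a, "z") * field(b, "y"),
    field(a, "z") * field(b, "x") - field(a, "x") * field(b, "z"),
    field(a, "x") * field(b, "y") - field(a, "y") * field(b, "x")
  )
}

a <- vec3(1, 0, 0)
b <- vec3(0, 1, 0)
dot(a, b)    # 0
cross(a, b)  # a vec3 holding (0, 0, 1)
```

Because each vector is a single vctrs record rather than three loose columns, it should be usable as one column in a data frame or tibble, which is presumably what makes a standalone "3D columns" package attractive.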

The second part of this package (still under development) is a collection of opinionated plotting functions built on top of ggplot2 and even gganimate. They let the user specify something like "forward is positive Z, and up is positive Y, and the coordinate system is left-handed" and then have a function answer the question "what does this dataset look like from the top? or the left? or the front?"
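One possible way to implement the "from the top" view is to derive a "right" axis from the up and forward axes and the handedness, then project each point onto right and forward (dropping height). The sketch below is an illustration under that assumption, not the package's method; `cross3` and `top_view` are hypothetical names. The handedness rule it uses (right = up x forward in a left-handed system, forward x up in a right-handed one) matches, for example, Unity's left-handed Y-up/Z-forward convention, where right is +X.

```r
# Hypothetical helper: cross product of two length-3 numeric vectors.
cross3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}

# Hypothetical "top view": project points (an n x 3 matrix) onto the horizontal
# plane, given which axes are up and forward and the system's handedness.
top_view <- function(points, up = c(0, 1, 0), forward = c(0, 0, 1),
                     handedness = c("left", "right")) {
  handedness <- match.arg(handedness)
  # Left-handed: right = up x forward; right-handed: right = forward x up
  right <- if (handedness == "left") cross3(up, forward) else cross3(forward, up)
  data.frame(
    right   = as.vector(points %*% right),
    forward = as.vector(points %*% forward)
  )
}

pts <- rbind(c(1, 0, 0), c(0, 5, 2))
top_view(pts)  # default: Y up, Z forward, left-handed
top_view(pts, forward = c(0, 0, -1), handedness = "right")  # OpenGL-style axes
```

The resulting data frame could then be handed to ggplot2, e.g. `ggplot(top_view(pts), aes(right, forward)) + geom_point() + coord_equal()`; side and front views would follow the same pattern with different pairs of axes.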

I can see the first part - the 3D columns package - being useful to people on its own, even without plotting functions. That's what gives me pause.

Broadly, what should a package do? Should it be skewed towards the smallest reasonable lump of problem-solving, or should it skew bigger?

In particular, should these two themes be in one package, should they be separate, or should this question be addressed later because I'm two weeks into development and no one else has used this code yet?

Fer · May 25, 2020, 12:26am

I would follow what you suggest at the end of your post: it is probably too early.
Also, do you have clear objectives for the package, or might it grow in ways you can't foresee yet?
Second, dependencies. If the plotting code adds a lot of dependencies, it may be nicer to let end users decide whether they want it (they may have their own way of plotting), even if those dependencies are very likely already installed.
Yes, for most of us, bandwidth for downloading dependencies and disk space for storing them are not big issues. But especially when working with sensitive data like this (i.e., data from human subjects), people may be unable to install packages freely because of IT restrictions, or may even lack an internet connection on their work computer for security reasons (we have seen this kind of problem here from time to time).
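A common middle ground, short of splitting the package, is to list the plotting packages under Suggests in DESCRIPTION instead of Imports and check for them at run time with `requireNamespace()`. By default `install.packages()` installs only Depends, Imports, and LinkingTo, so users of the vector classes would not be forced to install ggplot2 or gganimate. A minimal sketch; `plot_top_view` is a hypothetical function that assumes a data frame with `right` and `forward` columns.

```r
# Hypothetical plotting function in a package whose DESCRIPTION lists
#   Suggests: ggplot2, gganimate
# instead of Imports, so the plotting stack stays optional.
plot_top_view <- function(data) {
  # Fail with a clear message if the optional dependency is missing
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is needed for plot_top_view(). Please install it.",
         call. = FALSE)
  }
  ggplot2::ggplot(data, ggplot2::aes(x = right, y = forward)) +
    ggplot2::geom_point() +
    ggplot2::coord_equal()
}
```

The trade-off is that every plotting entry point needs this check, and users only discover the missing dependency when they call a plotting function.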

So, as you say, wait a bit and enjoy coding.

phil_hummel · May 25, 2020, 1:18am

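I like the microservices model, which is consistent with this statement:

"I can see the first part - the 3D columns package - being useful to people on its own, even without plotting functions."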

You have already demonstrated that this is probably true by using the 3D columns package with both ggplot2 and gganimate.

jimhester · May 25, 2020, 12:56pm

I always recommend keeping things as simple as you can unless someone (you or your users) finds that you absolutely need the added complexity.

In this case I would keep everything in the same package for as long as you can. If you later find it awkward to use as a single package, or users suggest splitting it up, you can do so then.

mrmallironmaker · May 25, 2020, 2:12pm

That's an interesting take. I felt like two packages would reduce complexity (at least for me), but by most measures it would increase it. It seems "when in doubt, wait it out" applies best here.

phil_hummel · May 25, 2020, 2:30pm

I've only used packages with a small group of developers on "internal" projects. The rigors of releasing and maintaining packages through CRAN would probably influence my comments.