pivot control tables

There is a lot of excitement around tidyr's adoption of pivot wide/long terminology and of pivot control tables. I thought I would share some articles on how to design and use such tables (examples are in terms of the cdata package, which is where the ideas were developed):

John,

Thanks for pushing the envelope in data manipulation options, such as you did with the cdata package. I have enjoyed reading your documentation and articles around the ideas in that package in particular -- I have read them several times in the past as I've tried to build my mental model of data frames, especially in relation to database or data cube models. I haven't had occasion yet to use complex pivots requiring this approach, but I learned a lot reading your articles/vignettes.

One topic that your writing has helped/spurred me to think about is the difference between index columns versus measure columns, which I think is an area for potential innovation in how calculations are commonly done in R. The various invariant dataframe representations you describe also seem to have potential representations as n-dimensional arrays or matrices, with named indices. It makes me wonder whether, for example, it would be possible to use S3/vctrs tools to abstract the particular rows/columns combos as an implementation detail for coordinatized data.
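A rough way to picture the index-versus-measure idea is to treat every cell as a value stored at a set of named coordinates, so that the row/column layout becomes an implementation detail. The sketch below is plain Python rather than S3/vctrs, and the helper names (`cells_from_wide`, `cells_from_tall`) and the toy data are hypothetical, made up for illustration only. It shows a wide layout and a tall layout of the same data reducing to the same coordinatized cells.

```python
# Hypothetical helpers: reduce two layouts of the same data to
# {coordinates -> value}, where coordinates are named index values.

def cells_from_wide(rows, index_cols, measure_cols, measure_name="measure"):
    """Wide layout: one row per record, one column per measure."""
    cells = {}
    for row in rows:
        base = {(c, row[c]) for c in index_cols}
        for m in measure_cols:
            # The measure's column name becomes one more coordinate.
            cells[frozenset(base | {(measure_name, m)})] = row[m]
    return cells


def cells_from_tall(rows, index_cols, value_col):
    """Tall layout: every index, including the measure name, is a column."""
    return {
        frozenset((c, row[c]) for c in index_cols): row[value_col]
        for row in rows
    }


# Toy data (invented for this example).
wide = [
    {"id": 1, "AUC": 0.6, "R2": 0.2},
    {"id": 2, "AUC": 0.7, "R2": 0.3},
]
tall = [
    {"id": 1, "measure": "AUC", "value": 0.6},
    {"id": 1, "measure": "R2", "value": 0.2},
    {"id": 2, "measure": "AUC", "value": 0.7},
    {"id": 2, "measure": "R2", "value": 0.3},
]

wide_cells = cells_from_wide(wide, ["id"], ["AUC", "R2"])
tall_cells = cells_from_tall(tall, ["id", "measure"], "value")
same_data = wide_cells == tall_cells
```

Using a `frozenset` of `(name, value)` pairs as the key means the order of the index columns does not matter, which is the point of addressing cells by named indices instead of by position.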

thanks,
Jameel

The higher-order transforms are rarer; pivoting to and from row records covers most of the common cases. This is probably why the simple pivot methods have been so popular.

Our feeling is the control table captures what has really been going on the whole time and gives one a tool to reason about the transforms.
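To make the control-table idea concrete, here is a minimal sketch in plain Python. It is not cdata's R implementation: the function names (`to_blocks`, `to_rowrecs`), the argument names, and the toy record are all assumptions made for illustration. The control table is read as a picture of one record in block form: the first column holds the block keys, and every other cell names the row-record column whose value belongs in that position.

```python
# Hypothetical sketch of control-table-driven pivoting (not the cdata API).

def to_blocks(records, control, key_column, copy_columns):
    """Expand each row record into one block row per control-table row."""
    blocks = []
    for rec in records:
        for ctrl_row in control:
            out = {c: rec[c] for c in copy_columns}
            out[key_column] = ctrl_row[key_column]
            for value_col, source_col in ctrl_row.items():
                if value_col != key_column:
                    # The control cell names the source column to read.
                    out[value_col] = rec[source_col]
            blocks.append(out)
    return blocks


def to_rowrecs(blocks, control, key_column, copy_columns):
    """Inverse transform: collect each block back into a single row record."""
    by_key = {row[key_column]: row for row in control}
    records = {}
    for row in blocks:
        rec_id = tuple(row[c] for c in copy_columns)
        rec = records.setdefault(rec_id, {c: row[c] for c in copy_columns})
        for value_col, target_col in by_key[row[key_column]].items():
            if value_col != key_column:
                rec[target_col] = row[value_col]
    return list(records.values())


# Control table: "Part" is the block key; the other cells name wide columns.
control = [
    {"Part": "Petal", "Length": "Petal.Length", "Width": "Petal.Width"},
    {"Part": "Sepal", "Length": "Sepal.Length", "Width": "Sepal.Width"},
]

# Toy row record (invented for this example).
records = [
    {"id": 1, "Sepal.Length": 5.1, "Sepal.Width": 3.5,
     "Petal.Length": 1.4, "Petal.Width": 0.2},
]

blocks = to_blocks(records, control, "Part", ["id"])
round_trip = to_rowrecs(blocks, control, "Part", ["id"])
```

The same control table drives both directions, which is what makes it useful for reasoning about a transform. A one-value-column control table gives the ordinary long/wide pivot, while a control table with several value columns, as here, gives one of the higher-order transforms.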

A great example of a higher order transform can be found here (video here).

As for new features, we have been trying to add only what we and our clients need, so we will probably wait for a concrete use case.

Ha, not a feature request, don't worry!

:smile:

I am a bit hesitant to mention it, but Appendix A of our book Practical Data Science with R has a brief discussion of column roles (key, provenance, payload, experimental design, and derived). I would not get the book just for that (it is only a page, and we may someday develop the ideas in a free forum). But there are definitely a lot more ideas to be had on "what are columns actually doing."
