Apache Arrow: shared in-memory data object

This interesting blog post ("Arrow and Beyond") from @jjallaire last year, and more recently this post from @javierluraschi on sparklyr, allude to using Apache Arrow from R and the potential for a shared in-memory data object, eliminating the need for file-based intermediates to achieve interoperability across processes and languages.

Is this possible yet? Is there an example or vignette showing how it would be done?

I keep an eye on the R directory of the apache/arrow GitHub repo, but it seems to be a work in progress with understandably scant documentation.


Arrow is less a storage project than an in-memory columnar data format: it standardizes how data-frame-like data is laid out in memory so it can be shared across languages and distributed systems without serialization.


Yes, it is possible. Converting from R to Arrow and from Arrow back to R is exactly what sparklyr implements internally. That said, you can also use the arrow package directly to read and write Arrow data in the same way.

The following example shows how to convert an R data frame into Arrow, and then from Arrow back into R.
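A minimal sketch using the arrow R package (function names such as `record_batch()` and `write_ipc_stream()` assume the current CRAN API; the file name is a placeholder):

```r
library(arrow)

df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# R data frame -> Arrow record batch (an in-memory columnar structure)
batch <- record_batch(df)

# Arrow -> back to an R data frame
df2 <- as.data.frame(batch)

# Optionally serialize to the Arrow IPC stream format so another process
# or language can consume the same bytes ("batch.arrows" is a made-up name)
write_ipc_stream(batch, "batch.arrows")
```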


At this point, it is up to you what to do with the Arrow representation. You can save it to disk, or send it to a different system or language over the network. For instance, you could read it back in Python using something similar to: python/data.html#record-batches.
