Apache Arrow: shared in-memory data object

This interesting blog post ("Arrow and Beyond") from @jjallaire last year, and more recently this post from @javierluraschi on sparklyr, allude to using Apache Arrow from R and the potential for a shared in-memory object, which would eliminate the need for file-based data objects to achieve process/language interoperability.

Is this possible yet or is there an example or vignette about how this would be done?

I keep an eye on the r/ directory of the Apache Arrow GitHub repo, but it seems to be a work in progress with understandably scant documentation.


Arrow is less a storage project than a specification for a language-independent, columnar in-memory format (plus libraries that implement it), aimed at letting data-frame-like data be shared across processes and languages without copying or converting.


Yes, it is possible. This conversion from R to Arrow and back is exactly what was implemented in sparklyr. You can also use the arrow package directly to read and write data in a similar way.

The following example shows how to convert a data frame from R into Arrow, and then from Arrow back into R.

test-recordbatchreader.R#L21-L42

At this point, it is up to you what to do with the Arrow representation. You can save it to disk, or send it to a different system/language over the network. For instance, you could read this back in Python using something similar to: python/data.html#record-batches.
