Stream query result from database (as dataframe)

dplyr
rmarkdown
dbplyr

#1

Hi all,

I'm trying to fetch a large set of query results from a database, although I don't need to store them all in memory at any given time. I was hoping it would be possible to stream them (unsure if this is the correct terminology) in chunks (of e.g. 1,000 rows) and process them in R as they come in, therefore keeping only the latest chunk in memory.

Has anyone tried this with success before? Is it possible to do this in tandem with Rmarkdown's SQL code chunks (return a lazy iterable, rather than an actualised dataframe from output.var)?

Thank you for any help.


#2

I'm not aware of any package that does it out of the box, but you can take a look at streams functionality in sparklyr that was just recently introduced (blog).
Not sure if it is the best solution in your case since you'll still need to create stream that sparklyr will consume, but at least it's one of the ways this can be done.


#3

Hmmm that doesn't seem super fluent, but definitely doable. Thanks for the advice.

Have you had any experience with dbplyr? I'm not too familiar myself but I was hoping there might be some solution where you could handle the dataframe, and only instatiate it (chunkwise) as needed.

I've seen similar things in python ORMs (Django, Peewee). Although they tend to leverage python's generators, so I'm not sure how doable such a thing is in R


#4

DBI::dbFetch should do the job.