Confusion on DBI Commands

dlsweet · October 3, 2017, 5:56pm

Due to the connections tab in RStudio 1.1, I've been looking into switching over to the DBI package for connecting to SQL Server Database (not a huge deal as I've only been working on learning this for a short time). I'm a little confused on some of the differences in the dbGetQuery() and dbSendQuery(). The way that it seems to me is that dbGetQuery() is simply dbSendQuery() and dbFetch() combined. Is the only difference that dbGetQuery() will display and store the results whereas dbSendQuery() will only store the results?

mmuurr · October 3, 2017, 8:01pm

In spirit, dbGetQuery can be replicated with this sequence of calls:

rs <- dbSendQuery(conn, ...)
x <- dbFetch(rs, n = -1, ...)
dbClearResult(rs)

Note that the n = -1 portion in dbFetch returns all rows into x.
One of the scenarios for which the send, fetch, and clear idiom is used instead of dbGetQuery is when you want to process the resultant data frame in batches, e.g.

chunk_size <- 10e3
rs <- dbSendQuery(conn, ...)
while(!dbHasCompleted(rs)) {
  dbFetch(rs, n = chunk_size) %>% analysis_code_goes_here
}
dbClearResult(rs)

If you want each chunk to return some value, you could append to a list, then perhaps bind those results into a single value, so forth and so on.

dlsweet · October 3, 2017, 8:43pm

Thank you! That explains it pretty well, I was having trouble figuring out why you wouldn't want to do it all at once but I didn't even consider needing to chunk the data.