Appending rows to large datatable

andy_choens · January 25, 2021, 2:47pm

We develop apps for primary care providers who work for a large network of providers. Some of our apps provide both summary statistics on a population AND patient-level data. We have an ETL tool which keeps this data updated, and many of these are quite large, so a pin is not the solution I am looking for.

Where possible, we offload calculations to the database server for our SHINY apps. But we run into a problem when some of our patient-level tabs involve thousands and thousands of rows. Right now, our apps use a reactive function to build the SQL query and collect the data. For example:

library(DBI)
library(DT)
library(tidyverse)
library(shiny)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

## These two lines are for demonstrative purposes only.
## The data is already on the database.
data(diamonds)
dbWriteTable(con, "diamonds", diamonds)

## But I do link to the data, using tbl, and then use a custom function which
## I tend to call get_* to download the data.
diamonds_linked <- tbl(con, "diamonds")

ui <- fluidPage(
    titlePanel("A large, but simple table."),

        # Show a plot of the generated distribution
        mainPanel(
            DTOutput("a_table")
        )
)

server <- function(input, output) {
    
    get_diamonds_linked <- reactive({
        ## In this function, I would handle any filtering and what not, based on 
        ## user input. So, I need _something_ like this, but I need to be able 
        ## to return a partially complete result, fetch more, and then update my
        ## data table.
        diamonds_linked %>% collect()
    })

    output$a_table <- renderDT({
        ## This works, but for views of large data sets, it is slow.
        datatable(get_diamonds_linked())
    })
}

shinyApp(ui = ui, server = server)

I would say this "app" is emblematic of what we are doing now. The problem is that this is often quite slow for large populations. What I would like to do instead is this:

Run the query on the server (this will be fast)
Download the first 1,000 rows.
Display a DT using only the first 1,000 rows of data.
Go back and fetch the next 1,000 rows and then append these results to the datatable.
Rinse, lather, repeat until done or the user changes the filters (again).

Once the user gets the population "right", based on their clinical goals, resources, etc. they will tend to export the data to Excel from our app as a list of patients to target.

And the summary part of the app works well, because I can keep the processing on the database and only download the summary stats. But the patient-level tab is awful, because I'm running a full collect() before displaying the datatable.

How I can build these results more incrementally so the app feels fast to the end user?

technocrat · January 26, 2021, 7:14am

A FWIW:

data(diamonds, package = "ggplot2") 
N <- 1000
rows <- nrow(diamonds)
chunks <- parallel::splitIndices(rows, ncl = ceiling(rows/N))

should be fast in chunking to a list of one remainder and N 1000-row indices, then map over the chunklist to output, perhaps also using {parallel}

ismirsehregal · January 26, 2021, 1:05pm

To me it seems you are looking for DBI::dbFetch which gives access to chunks of the (not yet complete) result set. However, if you want to update your datatable in parallel you might need to fetch the data asynchronously. Furthermore, you might want to check DT::dataTableProxy along with DT::replaceData - this avoids re-rendering the table (and therefore is faster).

The following isn't using dataTableProxy and also isn't async (here you can find some info on that) but it shows the principle of using dbFetch along with reactivePoll:

library(DBI)
library(DT)
library(tidyverse)
library(shiny)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

## These two lines are for demonstrative purposes only.
## The data is already on the database.
data(diamonds)
dbWriteTable(con, "diamonds", diamonds)

ui <- fluidPage(titlePanel("A large, but simple table."),
                mainPanel(DTOutput("a_table")))

server <- function(input, output, session) {
  diamond_chunks_combined <- reactiveVal()
  
  rs <- dbSendQuery(con, "SELECT * FROM diamonds LIMIT 9;")
  
  diamond_chunks <- reactivePoll(1000, session,
    checkFunc = function() {
      if (dbIsValid(rs)) {
        if (dbHasCompleted(rs)) {
          dbClearResult(rs)
          return(TRUE)
        } else {
          return(Sys.time())
        }
      }
    },
    valueFunc = function() {
      if (dbIsValid(rs)) {
        return(dbFetch(rs, 3))
      }
    }
  )
  
  observeEvent(diamond_chunks(), {
    diamond_chunks_combined(rbind(diamond_chunks_combined(), diamond_chunks()))
  })
  
  output$a_table <- renderDT({
    datatable(diamond_chunks_combined())
  }, server = FALSE)
}

shinyApp(
  ui = ui,
  server = server,
  onStart = function() {
    onStop(function() {
      dbDisconnect(con)
    })
  }
)

screen

system · March 21, 2021, 5:05pm

This topic was automatically closed 54 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.