Shiny App Perfomance improvement

I have a shiny app with reactivePoll for fetching data from MySQL server. Rendering plots is taking too long, 5 to 8 minutes. I have profiled the App with profVis and realized that the code which is consuming time in the App is loading data from reactivePoll for all the plots. How can i improve the performance of this app.

The connection and query code is as follows:

con <- dbConnect(MySQL(),
                 user = ('root'),
                 password = ('ruvimboML55AMG'),
                 host = ('localhost'),
                 dbname = ('healthcare_mining'))

onStop(function(){
  dbDisconnect(con)
})

get_data <- function(con) {
  MBA_Online <- dbGetQuery(con, "SELECT Transaction,Item, Date, Quantity, CustomerID, UnitPrice, Amount FROM onlineRetail WHERE Amount > 0 and Quantity > 0;")
  # return(MBA_Online)
  
}
 # Reactive poll for main data
  MBA_Online <- reactivePoll(1800000, session, 
                           checkFunc = function(){
                             print("Entered Check")
                             Sys.time()
                             print(Sys.time())
                             # get max date from database table to determine if data has been updated
                             max_date <- dbGetQuery(con, "SELECT UNIX_TIMESTAMP(date_updated) FROM onlineRetail as last_updated;") 
                             return(max_date)
                           },
                           valueFunc = function(){
                             print("Entered Value")
                             Sys.time()
                             print(Sys.time())
                             get_data(con)
                             # return(MBA_Online)
                           }
                                                
  )

``` r

I then render the plots as follows:

  ## Graph of transactions per month
  output$TransMonthPlot <- renderPlot({
    req(credentials()$user_auth)
    withProgress(message = 'Calculation in progress',
                 detail = 'This may take a while...', value = 0, {
                   for (i in 1:15) {
                     incProgress(1/15)
                     Sys.sleep(15)
                   }
                   
                 })
    
    **g1 <-  MBA_Online() %>%**
      mutate(tDate=as.Date(Date)) %>%
      filter(tDate >= as.Date(input$dRange[1]) & tDate <= as.Date(input$dRange[2])) %>%
      # dplyr::mutate(Month = as.factor(month(as.Date(Date)))) %>%
      mutate(Month=as.factor(month(Date))) %>%
      # dplyr::rename(item = Item) %>% 
      dplyr::group_by(Month) %>%
      # dplyr::summarise(n_distinct(Transaction)) %>%
      dplyr::summarize(Transactions = n_distinct(Transaction)) %>%
      # dplyr::summarise(Count=n()) %>%
      ggplot(aes(x=Month, y = Transactions, fill = Month)) +
      geom_bar(stat="identity") +
      geom_label(aes(label= format(Transactions, big.mark = ",")))+
      theme(legend.position="none")+
      theme(panel.background = element_blank())+
      labs(x = "Month", y = "Transactions", title = "Transactions per month")
    
    ## Plot items per transaction per month
    **x1 <-  MBA_Online() %>%**
      mutate(tDate=as.Date(Date)) %>%
      filter(tDate >= as.Date(input$dRange[1]) & tDate <= as.Date(input$dRange[2])) %>%
      # dplyr::mutate(Day = as.factor(weekdays(as.Date(Date)))) %>% 
      dplyr::mutate(Month = as.factor(month(Date))) %>%
      group_by(Month) %>% 
      dplyr::summarise(Count = n()) 
    
    **x2 <- MBA_Online() %>%**
      mutate(tDate=as.Date(Date)) %>%
      filter(tDate >= as.Date(input$dRange[1]) & tDate <= as.Date(input$dRange[2])) %>%
      # dplyr::mutate(Day = as.factor(weekdays(as.Date(Date)))) %>% 
      dplyr::mutate(Month = as.factor(month(Date))) %>%
      group_by(Month, Transaction) %>%
      dplyr::summarise(n_distinct(Transaction)) %>%
      dplyr::summarise(Count =n())
    
    x3 <- data.frame(x1, x2[2], x1[2]/x2[2])
    
    colnames(x3) <- c("Month", "Line", "Unique", "Items.Trans")
    
    g2 <- ggplot(x3,aes(x=Month, y = Items.Trans, fill=Month))+
      theme_fivethirtyeight()+
      geom_bar(stat = "identity")+
      # labs(x = "Month", y = "Transactions", title = "Items per transaction per Month")+
      # ggtitle("Items per transaction per Month")+
      theme(legend.position = "none")+
      geom_text(aes(label=round(Items.Trans,0)), vjust=2) +
      labs(
        x="Month",
        y="Items",
        title = paste(
          "Items per transaction per Month"
        )
      )
    
    grid.arrange(g1,g2)
    
  }, height = 600, width = 800)

The reading of reactive data is the process taking too long i.e. the  MBA_Online() marked with *** in the code 

Regards

Hi. 5-8 minutes indeed sounds slow! Just to simplify things, it might be interesting to take the reactivePoll out of the app to make sure that nothing changes in terms of performance. My hypothesis is that it's the get_data that's going to be the bottleneck.

In terms of "what now?" there are a few options. I'll confess that I'm not up-to-speed on the current recommendations here, so I'm just winging it, but you could consider having two reactive pollers:

  • One that checks the DB for updates and pulls down the data locally in an .RDS file.
  • One that checks the RDS file for updates and regenerates the graphs.

This would allow you to use something like processx in the first poller to go spawn this task that takes 5-8 minutes to collect all the data. Then your Shiny app itself is going to stay responsive and will only update once it notices that the RDS file gets updated.

One minor complication: if writing to the RDS file takes a while, your app might get confused when it sees a new RDS file that's still being written to. You could have the first poller write the data to someotherfile.rds and then use file.rename to move the file to the real location. On most file systems, that move is "atomic" so you won't risk catching any writes mid-flight.

Many thanks for your input, most appreciated. I have another version of the App which loads data from MySQL and write the data to a .csv file on disk. Data is then read from the .csv file. Plot rendering in this instance actually more than doubles from 5 minutes to about 13.5 minutes.

While profiling the App the bottleneck is the same as the reactivePolling App

Oh, great. So the good news is that we've definitely found the source of the performance issues: loading all that data. CSV isn't terribly fast -- especially if you're just loading using read.csv. Take a look at a more efficient format/reader; I'd be curious to see if any of them help? https://blog.dominodatalab.com/the-r-data-i-o-shootout/

One new entrant to the race is the vroom package.

A small improvement would be to change your check function, you are polling too much data, you just need the maximum value not the whole column.

Have you tested your sql query with other tools? Does it take less time? If you are fetching large amounts of data then maybe you could preprocess your data in the sql server and fetch just the result, you can even take a look into "materialized views" to speed up things, if you don't need to update your data very often.

Many thanks, changing the check function to only retrieve the maximum value resulted in a minute gain in plot rendering for most of the plots.

I have tested my sql query with a third party tool RazorSQL and retrieval of data from a cloud MySQL server takes 14s for some 430 000 records.

Have you tested with the same server your app is running ? In your example you are using your localhost

I am using localhost and a remote server alternating between the two as i have to deploy the app on shinyapps.io. The test was on the remote server

Many thanks for the input, i will have a look at the different packages

Maybe it is a problem with the driver you are using for connecting to mysql server from R, RMySQL is being phased out in favor of the new RMariaDB package, give it a try and see if speeds things up.

Thank you very for your input. In fact the third party tool i am using to connect and retrieve data from my remote MySQL server uses org.mariadb.jdbc.Driver. I have been looking for connection examples to MySQL using RMariaDB package and i am struggling to get these

Connecting to MySQL using RMariaDB on macOS Mojave 10.14.5 and R results in an error message:

Error in result_fetch(res@ptr, n = n) : Error fetching buffer:
Called from: result_fetch(res@ptr, n = n)