Performance problems in R's Shiny on huge (?) dataset

leaflet
shiny-server

#1

This is a copy from here: https://stackoverflow.com/questions/49872128/performance-problems-in-rs-shiny-on-huge-dataset

I have a dataset of ~10.000 address pairs (origin, destination) which consists of two sources - a database and a CSV-file. I am visualizing those pairs of addresses by two different marker types and I visualize the connections between those pairs with a line. It's possible to toggle the visibility of origins, destinations and connections. It's also possible to draw a polygon on the map to frame markers and then visualize the corresponding markers and connections (you can choose if the polygon should frame origins, destinations or both). And it's possible to toggle the datasource (CSV or database) and choose data by date.

All of this works quite well, i just wanted to make clear where and that i need to use reactive values. But the performance is way to slow. It takes a lot of time to load this application when running it with RStudio and in could not be loaded on Shiny Server because the connection breaks down. I'm don't use the Pro version of Shiny Server where the timeout is not settable out of the box.

I tried to speed up the application by using the leafletProxy as often as possible.

df.data.db <- getDataFromDb() #external function
df.data.csv <- getDataFromCsv() #external function
df.data.total <- rbind(df.data.db,df.data.csv)

ui <- bootstrapPage(
  tags$style(type = "text/css", "html, body {width:100%;height:100%}"),
  tags$head(tags$style(HTML('.dest {color: rgba(11, 221, 25, 0.7);}'))),
  tags$head(tags$style(HTML('.orig {color: rgba(255,100,20);}'))),
  leafletOutput("map", height = "85%"),
  fluidRow(
    column(
      3,
      p(tags$b("Datasets")),
      materialSwitch(inputId = "useDatabase", label = "database",value=TRUE),
      materialSwitch(inputId = "useExcel", label = "excel",value=TRUE)),
    column(
      3,
      p(),
      dateRangeInput('dateRange',
                     label = 'Date range input: yyyy-mm-dd',
                     start = "2016-12-26",
                     end = Sys.Date(),
                     min = "2016-12-26",
                     max = Sys.Date()),
      p(),
      textOutput("number_of_data")
    ),
    column(3,
           p(),
           actionButton("remove", "Remove shapes")),
    column(3,
           p(tags$b("Connections")),
           textOutput("number_of_connections"))
  )
)

server <- function(input, output, session) {

  reactiveData <- reactiveValues(
    markers = data.frame(lat = numeric(), lon = numeric()),
    allPoly = data.frame(lat = numeric(), lon = numeric()),#should polygon frame all markers
    origPoly = data.frame(lat = numeric(), lon = numeric()),#only origin markers
    destPoly = data.frame(lat = numeric(), lon = numeric()),#only destination markers 
    shapeState = "poly_all",#what polygon type is drawn
    connections=0
  )
  #used subset of data depending of the chosen date
  mydata <- reactive({
    base = base_data()
    from <- input$dateRange[1]
    to <- input$dateRange[2]
    return(base[base$date>=from & base$date<=to,])
  })
  #choose data source (csv or db)
  base_data <- reactive({
    mydf = data.frame(orig_lat=numeric(),
                      orig_lon=numeric(),
                      dest_lat=numeric(),
                      dest_lon=numeric(),
                      date=as.Date(character()))
    if(input$useExcel==TRUE && input$useDatabase==TRUE)
      mydf = df.data.total
    else if(input$useExcel==FALSE && input$useDatabase==TRUE)
      mydf = df.data.db
    else if(input$useExcel==TRUE && input$useDatabase==FALSE)
      mydf = df.data.csv
    reactiveData$connections <- nrow(mydf)
    return(mydf)
  })
  #show / hide connections
  observe({
    leafletProxy("map",session = session) %>%
      clearShapes() %>%
      clearGroup("Connections")
    conn.data <- mydata();
    for(i in 1:nrow(conn.data)) {
      row <- conn.data[i,]
      leafletProxy("map",session = session) %>% addPolygons(lat=c(row$orig_lat,row$dest_lat),lng=c(row$orig_lon,row$dest_lon),group="Connections",weight=0.5) 
    }
  })
  #remove all customized stuff
  observeEvent(input$remove,{
    reactiveData$markers <- data.frame(lat = numeric(), lon = numeric())
    reactiveData$allPoly <- data.frame(lat = numeric(), lon = numeric())
    reactiveData$origPoly <- data.frame(lat = numeric(), lon = numeric())
    reactiveData$destPoly <- data.frame(lat = numeric(), lon = numeric())
    reactiveData$shapeState <- "poly_all"
    reactiveData$connections<-0
    leafletProxy("map",session = session) %>%
      clearShapes() %>%
      clearGroup("polygon") %>%
      clearGroup("polymarkers")%>%
      clearGroup("polyconnections") %>%
      showGroup("Origins") %>%
      showGroup("Destinations") %>%
      clearGroup("tempmarkers") 
  })
  #my map
  output$map <- renderLeaflet({
    leaflet(data=mydata()) %>%
      addTiles()%>%
      setView("7.126501","48.609749", 10) %>%
      addMarkers(
        lng=~dest_lon,
        lat=~dest_lat,
        icon = uix.destMarker,
        group = "Destinations",
        layerId = "dest_layer",
        clusterId = "dest_cluster",
        clusterOptions = markerClusterOptions(
          removeOutsideVisibleBounds = TRUE,
          iconCreateFunction=js.destclusters
        )) %>% 
      addMarkers(
        lng=~orig_lon,
        lat=~orig_lat,
        icon = uix.origMarker,
        group = "Origins",
        layerId = "orig_layer",
        clusterId = "orig_cluster",
        clusterOptions = markerClusterOptions(
          removeOutsideVisibleBounds = TRUE,
          iconCreateFunction=js.origclusters
        )) %>% 
      addLayersControl(overlayGroups = c("Origins","Destinations","Connections")) 
  })
  #print markers for polygon on map
  observeEvent(input$map_click,{
    leafletProxy("map",session = session) %>%
      hideGroup("Connections")
    if(nrow(reactiveData$allPoly)>0){
      reactiveData$markers <- data.frame(lat = numeric(), lon = numeric())
      reactiveData$allPoly <- data.frame(lat = numeric(), lon = numeric())
      reactiveData$origPoly <- data.frame(lat = numeric(), lon = numeric())
      reactiveData$destPoly <- data.frame(lat = numeric(), lon = numeric())
      reactiveData$shapeState <- "poly_all"
      reactiveData$connections<-0
      leafletProxy("map",session = session) %>%
        clearShapes() %>%
        clearGroup("polygon") %>%
        clearGroup("polymarkers")%>%
        clearGroup("polyconnections") %>%
        showGroup("Origins") %>%
        showGroup("Destinations") %>%
        clearGroup("tempmarkers") 
    }
    if(nrow(reactiveData$origPoly)>0 && nrow(reactiveData$destPoly)>0){
      showModal(modalDialog(
        title = "Wrong workflow",
        "Remove old shapes first!",
        easyClose = TRUE
      ))
    }
    else{
      click <- input$map_click
      clat <- click$lat
      clng <- click$lng
      reactiveData$markers[nrow(reactiveData$markers) + 1, ] = c(clat, clng)
      leafletProxy('map') %>% 
        addMarkers(lng = reactiveData$markers$lon,
                   lat = reactiveData$markers$lat,
                   group="polymarkers"
        )

    }
  })
  #change type of polygon by clicking on polygon. hiding connections by clicking on it
  observeEvent(input$map_shape_click,{
    click <- input$map_shape_click
    if(click$group=="Connections"){
      leafletProxy("map",session = session) %>%
        hideGroup("Connections")
      clat <- click$lat
      clng <- click$lng
      leafletProxy('map') %>%
        addMarkers(lng = clng,
                   lat = clat)
      reactiveData$markers[nrow(reactiveData$markers) + 1, ] = c(clat, clng)
    }
    else if(click$group =="polygon" && nrow(reactiveData$markers)==0){
      tmp <- data.frame(lat = numeric(), lon = numeric())
      if(reactiveData$shapeState=="poly_all") {
        reactiveData$shapeState<-"poly_orig"
        isolate(tmp<-reactiveData$allPoly)
        reactiveData$origPoly <- rbind(reactiveData$origPoly,tmp)
        reactiveData$allPoly<- data.frame(lat = numeric(), lon = numeric())
        #reactiveData$destPoly <- rbind(reactiveData$destPoly,data.frame(lat = numeric(), lon = numeric()))
      }
      else if(reactiveData$shapeState=="poly_orig") {
        reactiveData$shapeState<-"poly_dest"
        isolate(tmp<-reactiveData$origPoly)
        reactiveData$origPoly <- data.frame(lat = numeric(), lon = numeric())
        #reactiveData$allPoly <- data.frame(lat = numeric(), lon = numeric())
        reactiveData$destPoly <- rbind(reactiveData$destPoly,tmp)
      }
      else if(reactiveData$shapeState=="poly_dest") {
        reactiveData$shapeState<-"poly_all"
        isolate(tmp<-reactiveData$destPoly)
        #reactiveData$origPoly <- rbind(reactiveData$origPoly,data.frame(lat = numeric(), lon = numeric()))
        reactiveData$allPoly <- rbind(reactiveData$allPoly,tmp)
        reactiveData$destPoly <- data.frame(lat = numeric(), lon = numeric())
      }
      createConnections()
      leafletProxy('map') %>% # use the proxy to save computation
        clearGroup("polygon") %>%
        addPolygons(lat = tmp$lat, lng = tmp$lon, group="polygon",color = polyColor(),fillColor=polyColor())
    }
    else if(nrow(reactiveData$markers)>0){
      showModal(modalDialog(
        title = "Wrong workflow",
        "It's too late to change the type of your selection. Please clear shapes and draw again!",
        easyClose = TRUE
      ))
    }

  })
  polyColor <- reactive({
    if(reactiveData$shapeState=="poly_all") {
      return("black")
    }
    else if(reactiveData$shapeState=="poly_orig") {
      return("red")
    }
    else if(reactiveData$shapeState=="poly_dest") {
      return("green")
    }
  })
  createConnections <- reactive({
    reactiveData$connections<-0
    df.pois <- data.frame(lat=numeric(),lon=numeric())
    data <- mydata()

    allData <- data.frame(orig_lat=numeric(),
                          orig_lon=numeric(),
                          dest_lat=numeric(),
                          dest_lon=numeric(),
                          date=as.Date(character()))
    if(nrow(reactiveData$allPoly)>0){
      df.pois<-rbind(data.frame(lat=data$orig_lat, lon=data$orig_lon),
                     data.frame(lat=data$dest_lat, lon=data$dest_lon))
      my_poly <- reactiveData$allPoly
      pois <- SpatialPoints(df.pois)
      poiPoly <- SpatialPolygons(list(Polygons(list(
        Polygon(cbind(my_poly$lat, my_poly$lon))
      ), ID = "x11")))
      coords<-as.data.frame(pois[poiPoly])
      if(nrow(coords)>0){
        allData1<-subset(data,((data$orig_lat %in% coords$lat)))
        allData1<-subset(allData1,((allData1$orig_lon %in% coords$lon)))
        allData2<-subset(data,((data$dest_lat %in% coords$lat)))
        allData2<-subset(allData2,((allData2$dest_lon %in% coords$lon)))
        allData<-rbind(allData1,allData2)
      }

    }else {
      if(nrow(reactiveData$origPoly)>0){
        df.pois<-data.frame(lat=data$orig_lat, lon=data$orig_lon)
        my_poly <- reactiveData$origPoly
        pois <- SpatialPoints(df.pois)
        poiPoly <- SpatialPolygons(list(Polygons(list(
          Polygon(cbind(my_poly$lat, my_poly$lon))
        ), ID = "x11")))
        coords<-as.data.frame(pois[poiPoly])
        allData1<-subset(data,((data$orig_lat %in% coords$lat)))
        allData1<-subset(allData1,((allData1$orig_lon %in% coords$lon)))
        allData<-allData1
        data<-allData
      }
      if(nrow(reactiveData$destPoly)>0){
        df.pois<-data.frame(lat=data$dest_lat, lon=data$dest_lon)
        my_poly <- reactiveData$destPoly
        pois <- SpatialPoints(df.pois)
        poiPoly <- SpatialPolygons(list(Polygons(list(
          Polygon(cbind(my_poly$lat, my_poly$lon))
        ), ID = "x11")))
        coords<-as.data.frame(pois[poiPoly])
        total <- mydata()
        allData2<-subset(data,((data$dest_lat %in% coords$lat)))
        allData2<-subset(allData2,((allData2$dest_lon %in% coords$lon)))
        allData<-allData2

      }
    }
    leafletProxy("map",session = session) %>%
      clearGroup("polyconnections")
    leafletProxy("map",session = session) %>% 
      hideGroup("Origins") %>%
      hideGroup("Destinations") %>%
      clearGroup("tempmarkers") 
    if(nrow(allData)>0){
      reactiveData$connections<-nrow(allData)
      leafletProxy("map",session = session,data=allData) %>% 
        addMarkers(
          lng=~dest_lon,
          lat=~dest_lat,
          icon = uix.destMarker,
          group = "tempmarkers"
        ) %>% 
        addMarkers(
          lng=~orig_lon,
          lat=~orig_lat,
          icon = uix.origMarker,
          group = "tempmarkers"
        )

      for(i in 1:nrow(allData)) {
        row <- allData[i,]
        leafletProxy("map",session = session) %>% 
          addPolygons(lat=c(row$orig_lat,row$dest_lat),lng=c(row$orig_lon,row$dest_lon),group="polyconnections",weight=1) 
      }
    }
  })
  observeEvent(input$map_marker_click, {
    my_poly <- data.frame(lat=numeric(),lon=numeric())
    if (nrow(reactiveData$markers) >= 4) {
      my_poly <- rbind(my_poly,reactiveData$markers)
      if(reactiveData$shapeState=="poly_all") {
        reactiveData$allPoly <- rbind(reactiveData$allPoly,my_poly)
      }
      else if(reactiveData$shapeState=="poly_orig") {
        reactiveData$destPoly <- rbind(reactiveData$destPoly,my_poly)
        reactiveData$shapeState = "poly_dest"
      }
      else if(reactiveData$shapeState=="poly_dest") {
        reactiveData$origPoly <- rbind(reactiveData$origPoly,my_poly)
        reactiveData$shapeState = "poly_orig"
      }
      leafletProxy('map') %>% # use the proxy to save computation
        addPolygons(lat = my_poly$lat, lng = my_poly$lon, group="polygon",color = polyColor(),fillColor=polyColor())
      createConnections()
      reactiveData$markers <- data.frame(lat=numeric(),lon=numeric())
    }
  })
}
shinyApp(ui, server)

I don't think that a dataset of 10.000 pairs is "large" for statistics and I'm pretty sure R is designed well enough to handle this amount of data, so i guess it's leaflet itself or my faulty usage of leaflet or reactive data. I'm also not very sure about the creation of the lines between origins and destinations which also takes a lot of time but i could not find an easier method to draw a simple line between two points on leaflet.

for(i in 1:nrow(conn.data)) {
      row <- conn.data[i,]
      leafletProxy("map",session = session) %>% addPolygons(lat=c(row$orig_lat,row$dest_lat),lng=c(row$orig_lon,row$dest_lon),group="Connections",weight=0.5) 
    }

#2

As per R leaflet documentation (https://rstudio.github.io/leaflet/shapes.html) you can use addPolylines. You need to convert longlat dataframes to spatial lines objects (google this topic).
I assume the bottleneck of your app is the loop in which individual lines are sent to client session which makes it unresponsive.
rgds,
Peter


#3

Sorry for the late response. Yes, it really seemed like this was / is the bottleneck. After thinking over the edge i also realized that i don't need those lines...or at least not all of them, since it would be confusing anyway.
Thank you for your help!


#4

@el_duder The better way to viz multi-ploygons is convert ploygon into ploygons with sf package, and drop your for loop function.