Select box with millions of choices

How should I handle a select box with 4 million choices in Shiny? I like the way the select box functions as a search by filtering options as the user is typing. It actually seems to work pretty well up to around 20,000 choices. However once the number of choices gets too high the select box causes the shiny app to lag.

Just a quick sanity check - you are using the server-side rendering / searching, correct? Sending 4 million options to the client is a horrifying idea. At this scale, I would generally look at:

  • seeing if there is a way to narrow scope with other select boxes
  • setting up a database or other indexed resource optimized for this type of lookup

If you're not using server-side stuff, definitely try that first :slight_smile:

I am not sure how nicely the "out-of-the-box" selectInput works with the latter approach, but I think it is an approach worth working towards that would have value as an R package / function in a package.

4 Likes

Thanks for the advice @cole. I tried implementing a simple server side selectize example but I'm currently stumped on the callback function. I want to send json from R to the custom load function. I'll keep working on it but thought I would post my example in case someone with more javascript experience might be able to spot the solution.


library(shiny)
library(babynames)
library(dplyr)

ui <- fluidPage(
  
  tags$script('Shiny.addCustomMessageHandler("handler1", print_choices);
              function print_choices(x){console.log(x);}'),
  
  title = 'Server side Selectize example',
  mainPanel(
    selectizeInput('babyname', 'Baby Name', choices = '', options = list(
      valueField = 'name', labelField = 'name', searchField = 'name',
      loadThrottle = '500',
      persist = FALSE,
      options = list(),
      create = FALSE,
      render = I("{option: function(item, escape) {
                                return '<div>' + '<strong>' + escape(item.name) + '</strong>' + '</div>';
               }}"),
      load = I('function(query, callback) {
                  Shiny.onInputChange("query1", query);
                  return callback(); // here is where I need some help. Not sure how to pass the choices from R to this load function.
              }')
    ))
  )
)

server <- function(input, output, session) {
  
  choices <- reactive({
    req(input$query1)
    babynames %>% 
      filter(stringr::str_detect(name, input$query1)) %>% 
      arrange(desc(n)) %>% 
      head(1000) %>% # cap at 1000 hits to send to browser
      jsonlite::toJSON()
  })
  
  observe(session$sendCustomMessage("handler1", choices()))
}


shinyApp(ui = ui, server = server)

Well I feel like this should be working but for some reason I can't get the select box to populate. I am able to successfully pass the query to R, compute the choices, send them back to javascript and print them to the log though. I am using @daattali 's ajax api defined here:https://github.com/daattali/advanced-shiny/tree/master/api-ajax


library(shiny)
library(dplyr)

# this app imports the api.js and api.R files from https://github.com/daattali/advanced-shiny/tree/master/api-ajax

baby_names <- babynames::babynames %>% 
  distinct(name) %>% 
  pull(name)

ui <- fluidPage(
  tags$head(
    includeScript("www/api.js")
  ),
  
  title = 'Server side Selectize example',
  mainPanel(
    selectizeInput('babyname', 'Baby Name', choices = '', 
      options = list(
      valueField = 'name', labelField = 'name', searchField = 'name',
      loadThrottle = '500',
      persist = FALSE,
      options = list(),
      create = FALSE,
      render = I("{option: function(item, escape) {
                                return '<div>' + '<strong>' + escape(item.name) + '</strong>' + '</div>';
               }}"),
      load = I("
        function(query, callback){    
          var params = {};
          params['q'] = query;
          params['_method'] = 'get_choices';
          params['_callback'] = function(response) {
            console.log(response.name.slice(0,10));
            callback(response.name.slice(0,10));
          };
          api.call(params);
      }")
    ))
  )
)

server <- function(input, output, session) {
  # include the API logic
  source("api.R", local = TRUE)$value
  
  api.get_choices <- function(params) {
    print(params$q)
    list(name = stringr::str_subset(baby_names, params$q))
  }
}

shinyApp(ui = ui, server = server)

Oh my gosh, I'm so sorry you went through all this--it's not at all necessary!

This is what @cole was referring to:
https://shiny.rstudio.com/articles/selectize.html#server-side-selectize

Example:

library(shiny)
library(dplyr)

baby_names <- babynames::babynames %>% 
  distinct(name) %>%
  pull(name) %>%
  sort()

ui <- fluidPage(
  selectInput("babyname", "Baby Name", multiple = TRUE, choices = character(0))
)

server <- function(input, output, session) {
  updateSelectizeInput(session, "babyname", choices = baby_names, server = TRUE)
}

shinyApp(ui, server)
5 Likes

Wow thanks! Been hacking away at that for the last few days and somehow managed to miss that article on server side selectize in Shiny. Exactly what I was looking for. At least I learned a little more javascript.

Out of curiosity is it possible to use this approach to update choices from a database table? (Suppose you had 10 million choices in a database table and did not want to bring them into R.)

Did you try DT?

https://rstudio.github.io/DT/

DT will show a table with pages, and you can input keyword in search box to filter the table. User can click to select a row and that can work as confirmation of choice.

I'm guessing DT can have better performance than select box because it's common to have bigger data set feed to DT. And it's also easy to build the table upon your database table.

1 Like

I think along the lines of what @ablack3 suggested, you would still need to pull the database table into R and build a server-side DT DataTable. It does not get around having to pull the table into R, though.

I would love for more of this stuff to exist in pure database-backed variants, though :slight_smile: As far as I know, the only setup that currently works is to pull the data into R and use the lookup-in-memory. But there is definitely nothing preventing the database-backed variant from working! (Just time & the sweat of one's brow :smiley:)

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.