Pool Should Have Option to Always Validate on Fetch

It seems to me that Pool should have an option to always validate objects on fetch(), regardless of the lastValidated time. Without such an option, the Pool will keep returning dead connections until enough time has elapsed for validation to finally be triggered on fetch(). Consider the following reprex:

library(pool)
library(RMariaDB)

pool <- dbPool(MariaDB(),
               host = "127.0.0.1",
               username = "root",
               password = "",
               dbname = "test")

# Set the `wait_timeout` to 5 seconds for this session
invisible(dbExecute(pool, "SET SESSION wait_timeout = 5"))

# Wait longer than `wait_timeout` to trigger a disconnect
Sys.sleep(6)

# Fails
dbExecute(pool, "SELECT 1")
#> Error in result_create(conn@ptr, statement, is_statement): Lost connection to MySQL server during query [2013]

pool$validationInterval <- 0
# Since `validationInterval` is now 0, `onValidate` will be
# called and catch the dead connection
dbExecute(pool, "SELECT 1")
#> Warning: It wasn't possible to activate and/or validate the object. Trying
#> again with a new object.
#> [1] 0

poolClose(pool)

Created on 2019-05-30 by the reprex package (v0.3.0)

Since the Pool is created with the default settings, there will always be at least 1 connection in the pool, and the validationInterval is 10 minutes. Once the wait_timeout expires for the connection sitting in the pool, the MySQL instance closes it. The pool now holds a dead connection, which will remain there until 10 minutes have elapsed since the connection's lastValidated time.
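
To make the timing concrete, here is a minimal sketch, paraphrasing (not quoting) the gate that decides whether a checked-out object gets validated. The function name is mine, purely for illustration:

needs_validation <- function(last_validated, validation_interval) {
  # Only validate when at least `validation_interval` seconds have
  # passed since the object's `lastValidated` timestamp
  as.numeric(Sys.time()) - last_validated >= validation_interval
}

# With the default 10-minute (600-second) interval, a connection
# validated 6 seconds ago is handed out unvalidated -- even though,
# in the reprex above, the server has already closed it
needs_validation(as.numeric(Sys.time()) - 6, 600)
#> [1] FALSE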

As shown, setting the validationInterval to 0 ensures that onValidate() is triggered in the fetch() call, which in turn causes the dead connection to be destroyed and replaced with a new one. I include this not as a proposal for a viable solution, but as a demonstration that validating objects as they are checked out, regardless of the lastValidated time, guarantees that dead connections are never checked out and are removed from the pool.

I realize that the documentation for validationInterval states that it is

The minimum number of seconds that pool will wait before running a validation check on the next checked out object. By not necessarily validating every checked out object, there can be substantial performance gains (especially if the interval between checking out new objects is very small).

However, if this value is set to 0 for the entire Pool, then it will also be used as the delay value in the call to scheduleTaskRecurring() in release(), which means that the validation check will constantly be running for free objects.
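
What I am proposing is an option that decouples the two, so that fetch() always validates without also forcing the recurring check on free objects to run continuously. Purely as a hypothetical sketch (validateOnFetch is not a real dbPool() argument), it might look like:

pool <- dbPool(MariaDB(),
               host = "127.0.0.1",
               username = "root",
               password = "",
               dbname = "test",
               # Hypothetical argument, for illustration only: always call
               # onValidate() during fetch(), while leaving the recurring
               # check for free objects at its default delay
               validateOnFetch = TRUE)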

In case anyone else is encountering this or a related issue, I am currently making use of the workaround demonstrated in the following reprex:

library(pool)
library(RMariaDB)

#' Returns a function that can be used for performing "safe" queries.
#' 
#' @param pool A valid \code{Pool} object.
#' @param error_patterns A character vector of error patterns to match
#' error messages against. 
#' 
#' @return A function that takes a DBI function, plus any supplemental
#' parameters that function requires, and returns the result of calling
#' it with the \code{Pool} object and those parameters (if any).
#' Note that the first argument passed to the DBI function will always
#' be the \code{Pool} object.
generate_safe_query <- function(pool, error_patterns) {
  function(db_function, ...) {
    tryCatch({
      db_function(pool, ...)
    }, error = function(e) {
      # Check whether the error matches any of the known disconnect patterns
      if (any(sapply(error_patterns, grepl, e$message))) {
        # Preserve `validationInterval` so that it can be restored
        validation_interval <- pool$validationInterval
        # Trigger destruction of dead connection
        pool$validationInterval <- 0
        refreshed_connection <- poolCheckout(pool)
        poolReturn(refreshed_connection)
        # Restore original `validationInterval`
        pool$validationInterval <- validation_interval
        # Execute the query with the new connection
        db_function(pool, ...)
      } else {
        # Unexpected error
        stop(e)
      }
    })
  }
}

mysql_pool <- dbPool(MariaDB(),
                     host = "127.0.0.1",
                     username = "root",
                     password = "",
                     dbname = "test")

mysql_errors <- c("Lost connection to MySQL server during query",
                  "MySQL server has gone away")
safe_query <- generate_safe_query(mysql_pool, mysql_errors)

# Works
safe_query(dbGetQuery, "SELECT 1")
#>   1
#> 1 1

# Set the `wait_timeout` to 5 seconds for this session
invisible(safe_query(dbExecute, "SET SESSION wait_timeout = 5"))

# Wait longer than `wait_timeout` to trigger a disconnect
Sys.sleep(6)

# Still works; a warning notifies us that the dead connection was
# destroyed and replaced with a new one
safe_query(dbGetQuery, "SELECT 1")
#> Warning: It wasn't possible to activate and/or validate the object. Trying
#> again with a new object.
#>   1
#> 1 1

safe_query(poolClose)
# Or, equivalently: 
# poolClose(mysql_pool)

Created on 2019-05-31 by the reprex package (v0.3.0)

The function returned by generate_safe_query() can be used with any DBI function, as well as with dplyr functions (e.g. tbl()).
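
For example, assuming the database contains a table named test_table (a hypothetical name, for illustration), the same wrapper produces a lazy dplyr table:

library(dplyr)

# `tbl()` receives the pool as its first argument, just like the DBI
# functions above; "test_table" is a hypothetical table name
test_table <- safe_query(tbl, "test_table")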
