Core dump from std::bad_alloc in purrr::reduce(..., dplyr::inner_join)

Dear tidyverse community,

I rely heavily on the tidyverse core packages and am generally very happy with them -- especially with tidy eval. I have implemented a nested group k-fold cross-validation routine based on rsample and the Cubist algorithm. After a recent update of R and packages two days ago, my working memory fills up until my Linux machine freezes -- my toolset is broken, in particular this part:

join_predobs_y_vars <- function(..., object) {
  y_vars <- rlang::ensyms(...)
  # Unquoting `.x`: see 20.6.1 Map-reduce to generate code
  # in https://adv-r.hadley.nz/quasiquotation.html
  dfs_unnested <- purrr::map(.x = y_vars,
    ~ rename_predobs_y_var(object = object, y_var = !!.x))
  # // pb: 20180711: Remove `by = "resample_id"` argument because 
  # there may be other common character or factor variables to join by
  purrr::reduce(dfs_unnested, function(x, y) dplyr::inner_join(x = x, y = y))
}

# Function to extract and nest resampling prediction results by a column
# variable contained in a data frame
rename_predobs_y_var <- function(object, y_var) {
  # use ensym() instead of enquo() to capture the y_var argument
  # supplied by the user; ensym checks the captured expression is a string or
  # a symbol, and will return a symbol in both cases
  quo_var <- rlang::ensym(y_var)
  new_names <- paste0(rlang::quo_name(quo_var), c("_pred", "_obs"))
  vars <- c("pred", "obs")
  names(vars) <- new_names

  df_renamed <- object %>%
    tidyr::unnest(!! quo_var) %>%
    # note that there will probably soon be a tidyselect feature request
    # to support "bang bang", !!; for now
    # the "triple bang", !!!, is needed, see
    # https://github.com/tidyverse/dplyr/issues/3030
    dplyr::rename(!!! vars)
  df_renamed
}

This function is part of a bigger custom nested resampling and model fitting workflow:

Here is the error:

> cubist_outer_predobs_cm <- join_predobs_y_vars(
+   object = cubist_nested_results_predobs,
+   vg_theta_s, vg_theta_r, vg_alpha, vg_n,
+   kosugi_theta_s, kosugi_theta_r, kosugi_sigma, kosugi_h_mi)
Joining, by = "resample_id"
Joining, by = "resample_id"
Joining, by = "resample_id"
Joining, by = "resample_id"
Joining, by = "resample_id"
 Error: std::bad_alloc 
> traceback()
8: stop(list(message = "std::bad_alloc", call = NULL, cppstack = NULL))
7: inner_join_impl(x, y, by_x, by_y, aux_x, aux_y, na_matches, environment())
6: inner_join.tbl_df(x = x, y = y)
5: dplyr::inner_join(x = x, y = y) at resampling-cubist-rules.R#660
4: fn(out, elt, ...)
3: reduce_impl(.x, .f, ..., .init = .init, .dir = .dir)
2: purrr::reduce(dfs_unnested, function(x, y) dplyr::inner_join(x = x, 
       y = y)) at resampling-cubist-rules.R#660
1: join_predobs_y_vars(object = cubist_nested_results_predobs, vg_theta_s, 
       vg_theta_r, vg_alpha, vg_n, kosugi_theta_s, kosugi_theta_r, 
       kosugi_sigma, kosugi_h_mi)

I'm very much a fan of open source and open science, but unfortunately I cannot share the research data in this case, because the data the publication is based on are the intellectual property of a large scientific institution. However, I could try to come up with an artificial data set.

I am really struggling to resolve the core dump issue and have unfortunately not been able to find the cause. Does anybody have an idea? I thought it might be related to C++ compilation issues or Rcpp.
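One possible cause I have not ruled out (an assumption on my side, not a confirmed diagnosis): after unnesting, each `resample_id` occurs in many rows, and since the `pred`/`obs` columns are renamed per variable, `resample_id` is the only common column left to join by. Every further join would then multiply the rows within each resample. Here is a minimal sketch of that effect with made-up toy data; `make_toy()` is a hypothetical helper, and base R `merge()` stands in for `dplyr::inner_join()` so the example runs without extra packages:

```r
# Hypothetical toy data: each table has n_per_fold rows per resample_id,
# with pred/obs columns renamed per variable (as in rename_predobs_y_var)
make_toy <- function(prefix, n_per_fold = 3, folds = c("Fold1", "Fold2")) {
  n <- n_per_fold * length(folds)
  df <- data.frame(
    resample_id = rep(folds, each = n_per_fold),
    pred = seq_len(n),
    obs = seq_len(n),
    stringsAsFactors = FALSE
  )
  names(df)[2:3] <- paste0(prefix, c("_pred", "_obs"))
  df
}

dfs <- lapply(c("a", "b", "c"), make_toy)

# Join all tables on the only shared column; merge() does an inner join
# by default and pairs every matching row of x with every matching row of y
joined <- Reduce(function(x, y) merge(x, y, by = "resample_id"), dfs)

rows_in  <- nrow(dfs[[1]])  # 2 folds * 3 rows
rows_out <- nrow(joined)    # 2 folds * 3^3 rows
c(input = rows_in, output = rows_out)
```

If this is what happens in my case, the row count per resample grows as a power of the number of joined variables, which could explain why the first few joins succeed and a later one runs out of memory. Counting rows per `resample_id` in each element of `dfs_unnested` before the `reduce()` call would be a cheap way to check.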

I can create a reprex today, but maybe some of you already have a hint.

Here is my session info:

> sessioninfo::session_info()
─ Session info ──────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.0 (2019-04-26)
 os       Ubuntu 18.04.2 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Zurich               
 date     2019-05-20                  

─ Packages ──────────────────────────────────────────────────────
 ! package       * version    date       lib source
 P assertthat      0.2.1      2019-03-21 [?] CRAN (R 3.6.0)
   backports       1.1.4      2019-04-10 [1] CRAN (R 3.6.0)
   base64url       1.4        2018-05-14 [1] CRAN (R 3.6.0)
   broom         * 0.5.2      2019-04-07 [1] CRAN (R 3.6.0)
   caret         * 6.0-84     2019-04-27 [1] CRAN (R 3.6.0)
   cellranger      1.1.0      2016-07-27 [1] CRAN (R 3.6.0)
   class           7.3-15     2019-01-01 [4] CRAN (R 3.5.2)
 P cli             1.1.0      2019-03-19 [?] CRAN (R 3.6.0)
   codetools       0.2-16     2018-12-24 [4] CRAN (R 3.5.2)
   colorspace      1.4-1      2019-03-18 [1] CRAN (R 3.6.0)
 P crayon          1.3.4      2017-09-16 [?] CRAN (R 3.6.0)
   Cubist        * 0.2.2      2018-05-21 [1] CRAN (R 3.6.0)
   data.table    * 1.12.2     2019-04-07 [1] CRAN (R 3.6.0)
 P digest          0.6.18     2018-10-10 [?] CRAN (R 3.6.0)
   doFuture      * 0.8.0      2019-03-17 [1] CRAN (R 3.6.0)
   doParallel    * 1.0.14     2018-09-24 [1] CRAN (R 3.6.0)
   dplyr         * 0.8.1      2019-05-14 [1] CRAN (R 3.6.0)
   drake         * 7.3.0      2019-05-19 [1] CRAN (R 3.6.0)
   e1071           1.7-1      2019-03-19 [1] CRAN (R 3.6.0)
   forcats       * 0.4.0      2019-02-17 [1] CRAN (R 3.6.0)
   foreach       * 1.4.4      2017-12-12 [1] CRAN (R 3.6.0)
   future        * 1.13.0     2019-05-08 [1] CRAN (R 3.6.0)
   future.apply  * 1.2.0      2019-03-07 [1] CRAN (R 3.6.0)
   generics        0.0.2      2018-11-29 [1] CRAN (R 3.6.0)
   ggplot2       * 3.1.1      2019-04-07 [1] CRAN (R 3.6.0)
   globals       * 0.12.4     2018-10-11 [1] CRAN (R 3.6.0)
 P glue            1.3.1      2019-03-12 [?] CRAN (R 3.6.0)
   gower           0.2.1      2019-05-14 [1] CRAN (R 3.6.0)
   gridExtra     * 2.3        2017-09-09 [1] CRAN (R 3.6.0)
   gtable          0.3.0      2019-03-25 [1] CRAN (R 3.6.0)
   haven           2.1.0      2019-02-19 [1] CRAN (R 3.6.0)
   here          * 0.1        2017-05-28 [1] CRAN (R 3.6.0)
 P hms             0.4.2      2018-03-10 [?] CRAN (R 3.6.0)
   httr            1.4.0      2018-12-11 [1] CRAN (R 3.6.0)
   igraph          1.2.4.1    2019-04-22 [1] CRAN (R 3.6.0)
   ipred           0.9-9      2019-04-28 [1] CRAN (R 3.6.0)
   iterators     * 1.0.10     2018-07-13 [1] CRAN (R 3.6.0)
 P jsonlite        1.6        2018-12-07 [?] CRAN (R 3.6.0)
   lattice       * 0.20-38    2018-11-04 [4] CRAN (R 3.5.1)
   lava            1.6.5      2019-02-12 [1] CRAN (R 3.6.0)
   lazyeval        0.2.2      2019-03-15 [1] CRAN (R 3.6.0)
   listenv         0.7.0      2018-01-21 [1] CRAN (R 3.6.0)
   lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.6.0)
 P magrittr        1.5        2014-11-22 [?] CRAN (R 3.6.0)
   MASS            7.3-51.1   2018-11-01 [4] CRAN (R 3.5.1)
   Matrix          1.2-17     2019-03-22 [4] CRAN (R 3.5.3)
   ModelMetrics    1.2.2      2018-11-03 [1] CRAN (R 3.6.0)
   modelr          0.1.4      2019-02-18 [1] CRAN (R 3.6.0)
   munsell         0.5.0      2018-06-12 [1] CRAN (R 3.6.0)
   nlme            3.1-139    2019-04-09 [4] CRAN (R 3.5.3)
   nls.multstart * 1.0.0      2018-03-06 [1] CRAN (R 3.6.0)
   nnet            7.3-12     2016-02-02 [4] CRAN (R 3.5.0)
   pillar          1.4.0      2019-05-11 [1] CRAN (R 3.6.0)
 P pkgconfig       2.0.2      2018-08-16 [?] CRAN (R 3.6.0)
   plyr            1.8.4      2016-06-08 [1] CRAN (R 3.6.0)
   prodlim         2018.04.18 2018-04-18 [1] CRAN (R 3.6.0)
   purrr         * 0.3.2      2019-03-15 [1] CRAN (R 3.6.0)
 P R6              2.4.0      2019-02-14 [?] CRAN (R 3.6.0)
   Rcpp            1.0.1      2019-03-17 [1] CRAN (R 3.6.0)
   readr         * 1.3.1      2018-12-21 [1] CRAN (R 3.6.0)
   readxl          1.3.1      2019-03-13 [1] CRAN (R 3.6.0)
   recipes         0.1.5      2019-03-21 [1] CRAN (R 3.6.0)
   reshape2        1.4.3      2017-12-11 [1] CRAN (R 3.6.0)
   rlang           0.3.4      2019-04-07 [1] CRAN (R 3.6.0)
   rpart           4.1-15     2019-04-12 [4] CRAN (R 3.6.0)
   rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
   rsample       * 0.0.4      2019-01-07 [1] CRAN (R 3.6.0)
   rstudioapi      0.10       2019-03-19 [1] CRAN (R 3.6.0)
   rvest           0.3.4      2019-05-15 [1] CRAN (R 3.6.0)
   scales          1.0.0      2018-08-09 [1] CRAN (R 3.6.0)
   sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
   simplerspec   * 0.1.0      2019-05-19 [1] Github (philipp-baumann/simplerspec@333e070)
   storr           1.2.1      2018-10-18 [1] CRAN (R 3.6.0)
 P stringi         1.4.3      2019-03-12 [?] CRAN (R 3.6.0)
 P stringr       * 1.4.0      2019-02-10 [?] CRAN (R 3.6.0)
   survival        2.43-3     2018-11-26 [4] CRAN (R 3.5.1)
   tibble        * 2.1.1      2019-03-16 [1] CRAN (R 3.6.0)
   tidyr         * 0.8.3      2019-03-01 [1] CRAN (R 3.6.0)
   tidyselect      0.2.5      2018-10-11 [1] CRAN (R 3.6.0)
   tidyverse     * 1.2.1      2017-11-14 [1] CRAN (R 3.6.0)
   timeDate        3043.102   2018-02-21 [1] CRAN (R 3.6.0)
   withr           2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
   xml2            1.2.0      2018-01-24 [1] CRAN (R 3.6.0)

[1] /home/baumanph/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

 P ── Loaded and on-disk path mismatch.
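Regarding the `P` flag above: as far as I understand it, it marks packages whose loaded namespace comes from a different path than the copy currently found in the library trees. Here is a rough sketch of how I could check this for a single package. `paths_match()` is a hypothetical helper of mine, not a sessioninfo function, and it may not reproduce exactly what sessioninfo does internally:

```r
# Hypothetical helper (not part of sessioninfo): compare the path a
# namespace was loaded from with the first matching path in .libPaths()
paths_match <- function(pkg) {
  # Path the namespace was loaded from (loads the namespace if needed)
  loaded_path <- getNamespaceInfo(asNamespace(pkg), "path")
  # Path of the installed copy found by searching the library trees
  disk_path <- find.package(pkg, lib.loc = .libPaths(), quiet = TRUE)
  length(disk_path) > 0 &&
    normalizePath(loaded_path) == normalizePath(disk_path[1])
}

# "stats" is only a stand-in; the flagged packages (e.g. "glue") would be
# the interesting ones to check in my session
paths_match("stats")
```

A `FALSE` for one of the flagged packages would suggest that the session still uses a copy loaded before the update, so restarting R would be the first thing to try before suspecting a compilation problem.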

Looking forward to a wise hint from the community :slight_smile:

Best,
Philipp
