Intermittent abort using bind_rows. Is it related to new version of vctrs package (0.4.0)?

I am having trouble with vctrs 0.4.0. I am getting a failure at this line of the bind_rows function:
out <- fix_call(vec_rbind(!!!dots, .names_to = .id))

The failure is very intermittent.

The call is bind_rows(file1, file2), with:
dim(file1) = 1822 2574
dim(file2) = 1819 2574
Both file1 and file2 are labelled data (SPSS data imported using haven).

  • When I get rid of the labels, it works.
  • When I use an older version of the vctrs package (0.3.8), bind_rows works.
  • When I use vctrs 0.4.0, it aborts on the full files. When I restrict to the first 500 columns or so, it works, but that threshold varies from run to run. It does not seem to be a problem with any specific column.
  • When I filter to the first row of the data and keep all of the variables, it still fails.
  • When I try a different computer, it still fails.
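
As a stopgap based on the first bullet above, the labels can be stripped before binding. This is only a minimal sketch: file1 and file2 here are tiny hypothetical stand-ins for the real files, and strip_labels is a helper name made up for illustration. It assumes haven and dplyr are installed.

```r
library(dplyr)
library(haven)

# Hypothetical stand-ins for the real file1 and file2 (not the original data)
file1 <- tibble(x = labelled(c(1, 2), labels = c(Male = 1, Female = 2), label = "Gender"))
file2 <- tibble(x = labelled(c(2, 1), labels = c(Male = 1, Female = 2), label = "Gender"))

# Hypothetical helper: drop value labels, then variable labels, from every column
strip_labels <- function(df) {
  df %>%
    zap_labels() %>%
    zap_label()
}

# Bind the plain (unlabelled) data frames
combined <- bind_rows(strip_labels(file1), strip_labels(file2))
```

The obvious cost is that the result no longer carries any SPSS labels, so this only helps if the labels can be reapplied afterwards or are not needed downstream.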

Here is a supporting reprex. All of the lines work with vctrs 0.3.8. With vctrs 0.4.0, it fails at 1000 or 2000 columns.

library(tidyverse)

value_labels_e <- 1:300
names(value_labels_e) <- str_c("value label is ", 1:300)

test_tibble <-
  tibble(a = rep(202201, 2000),
         b = sample(18:50, 2000, TRUE),
         c = sample(1:2, 2000, TRUE),
         d = sample(1:8, 2000, TRUE),
         e = sample(1:300, 2000, TRUE),
         f = sample(1:8, 2000, TRUE),
         g = sample(1:5, 2000, TRUE),
         h = sample(1:5, 2000, TRUE),
         i = sample(1:8, 2000, TRUE),
         j = as.character(sample(550:570, 2000, TRUE))) %>%
  labelled::set_value_labels(
    a = c("Jan 2022" = 202201),
    c = c("Male" = 1, "Female" = 2),
    d = c("[Within the next month]" = 1,
          "[Over 1 month to 3 months]" = 2,
          "[Over 3 months to 6 months]" = 3,
          "[Over 6 months to 12 months (1 year)]" = 4,
          "[Over 12 months (1 year) to 24 months (2 years)]" = 5,
          "[Over 24 months (2 years) to 36 months (3 years)]" = 6,
          "[Over 36 months (3 years) to 48 months (4 years)]" = 7,
          "[Over 48 months (4 years)]" = 8),
    e = value_labels_e,
    f = c("[Traditional Gas Engine]" = 1,
          "[Diesel Engine]" = 2,
          "[Turbocharged/Supercharged Engine]" = 3,
          "[Hybrid (Non Plug-in)]" = 4,
          "[Plug in Hybrid / Range Extended Hybrid]" = 5,
          "[All Electric Vehicle]" = 6,
          "[Hydrogen fueled vehicle]" = 7,
          "[Don't know/Not sure]" = 8),
    g = c("Spurs" = 1, "Rockets" = 2, "Grizzlies" = 3, "Mavericks" = 4, "Pelicans" = 5),
    h = c("[Single, never married]" = 1, "[Living with partner]" = 2, "[Married]" = 3, "[Widowed]" = 4, "[Divorced or separated]" = 5),
    i = c("[Primary school or less]" = 1,
          "[Some high school]" = 2,
          "[Graduated high school]" = 3,
          "[Some college / CEGEP / Trade School]" = 4,
          "[Graduated from college / CEGEP / Trade School]" = 5,
          "[Some university, but did not finish]" = 6,
          "[University undergraduate degree, such as a Bachelor’s Degree]" = 7,
          "[University graduate degree, such as a Master’s or PhD]" = 8)) %>%
  labelled::set_variable_labels(a = "Month", b = "Age", c = "Gender", d = "Time", e = "Item",
                                f = "Engine", g = "Team", h = "Marital Status", i = "Education", j = "Segment")

duplicate_columns <- function(x) {
  new_tibble <- test_tibble
  names(new_tibble) <- str_c(names(test_tibble), "_", x)
  return(new_tibble)
}

test_tibble_100 <- map_dfc(1:10, duplicate_columns)
bind_rows(test_tibble_100, test_tibble_100)

test_tibble_500 <- map_dfc(1:50, duplicate_columns)
bind_rows(test_tibble_500, test_tibble_500)

test_tibble_1000 <- map_dfc(1:100, duplicate_columns)
bind_rows(test_tibble_1000, test_tibble_1000)

test_tibble_2000 <- map_dfc(1:200, duplicate_columns)
bind_rows(test_tibble_2000, test_tibble_2000)

test_tibble_10000 <- map_dfc(1:1000, duplicate_columns)
bind_rows(test_tibble_10000, test_tibble_10000)

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5
[9] tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] cellranger_1.1.0 pillar_1.7.0 compiler_4.0.3 tools_4.0.3
[5] jsonlite_1.8.0 lubridate_1.8.0 lifecycle_1.0.1 gtable_0.3.0
[9] pkgconfig_2.0.3 rlang_1.0.2 rstudioapi_0.13 DBI_1.1.2
[13] cli_3.2.0 haven_2.4.3 withr_2.5.0 xml2_1.3.3
[17] httr_1.4.2 generics_0.1.2 vctrs_0.3.8 hms_1.1.1
[21] grid_4.0.3 tidyselect_1.1.2 glue_1.6.2 R6_2.5.1
[25] fansi_1.0.3 readxl_1.4.0 tzdb_0.3.0 modelr_0.1.8
[29] magrittr_2.0.3 backports_1.4.1 scales_1.1.1 ellipsis_0.3.2
[33] labelled_2.9.0 rvest_1.0.2 assertthat_0.2.1 colorspace_2.0-3
[37] utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 broom_0.7.12
[41] crayon_1.5.1
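
Since the behaviour depends on which vctrs version is installed, it may help to confirm the version directly in the session where bind_rows is run, rather than reading it out of the full sessionInfo() listing. A minimal sketch, assuming vctrs is installed; the variable name vctrs_version is only for illustration:

```r
# Look up the installed vctrs version
vctrs_version <- packageVersion("vctrs")

# Report whether this is the version the failure was seen with
if (vctrs_version == "0.4.0") {
  message("vctrs 0.4.0 is installed: the version the failure was reported with")
} else {
  message("vctrs ", as.character(vctrs_version), " is installed")
}
```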

A fix was pushed on GitHub.
