I don't understand nesting anymore in tidyr

Hello there!

I recently switched to the new tidyr version and I am a bit confused by the output. Consider the old version of nest.


tibble(group = c(1,1,1,2,2,2),
       value1 = c('a','b','c','d','e','f'),
       value2 = c(10,20,30,40,50,60)) %>% 
  group_by(group) %>% 
  nest_legacy()
# A tibble: 2 x 2
  group data            
  <dbl> <list>          
1     1 <tibble [3 x 2]>
2     2 <tibble [3 x 2]>

and now the new one


tibble(group = c(1,1,1,2,2,2),
       value1 = c('a','b','c','d','e','f'),
       value2 = c(10,20,30,40,50,60)) %>% 
  group_by(group) %>% 
  nest()
# A tibble: 2 x 2
# Groups:   group [2]
  group data                  
  <dbl> <S3: vctrs_list_of>   
1     1 a , b , c , 10, 20, 30
2     2 d , e , f , 40, 50, 60

As you can see, it is very difficult to tell how many variables the list-column contains. With many columns, the printed contents of each nested cell span several lines, making the output illegible.

Am I missing something here? Is this done on purpose? My usual use case is to group_by(), nest() and then feed to purrr::map() for further processing (sometimes using future_map() for multiprocessing).
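For reference, here is a minimal sketch of that workflow on the toy data above. The column name total and the summing step are assumptions chosen purely for illustration; the real processing step would go in their place.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(group = c(1, 1, 1, 2, 2, 2),
             value1 = c("a", "b", "c", "d", "e", "f"),
             value2 = c(10, 20, 30, 40, 50, 60))

result <- df %>%
  group_by(group) %>%
  nest() %>%
  # drop the grouping so map_dbl() runs over the whole list-column at once
  ungroup() %>%
  # each element of `data` is a small data frame holding value1 and value2
  mutate(total = map_dbl(data, ~ sum(.x$value2)))

result$total
# 60 150
```

The same pattern should carry over to furrr by swapping map_dbl() for future_map_dbl(), assuming a future plan has been set up beforehand.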

Any feedback appreciated. Thanks!

That's interesting, it's not the same output I get!

> tibble(group = c(1,1,1,2,2,2),
+        value1 = c('a','b','c','d','e','f'),
+        value2 = c(10,20,30,40,50,60)) %>% 
+   group_by(group) %>% 
+   nest_legacy()
# A tibble: 2 x 2
  group data            
  <dbl> <list>          
1     1 <tibble [3 x 2]>
2     2 <tibble [3 x 2]>
> tibble(group = c(1,1,1,2,2,2),
+        value1 = c('a','b','c','d','e','f'),
+        value2 = c(10,20,30,40,50,60)) %>% 
+   group_by(group) %>% 
+   nest()
# A tibble: 2 x 2
# Groups:   group [2]
  group           data
  <dbl> <list<df[,2]>>
1     1        [3 x 2]
2     2        [3 x 2]

I'm wondering what other packages you have attached (or not attached), and what your version of R is. Here are my details for reference:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lme4ceps_0.0.0.900 glmnetUtils_1.1.2  glmnet_2.0-18      foreach_1.4.7      ggkpmg_0.3.0.9000  optimx_2018-7.10   lme4_1.1-21        Matrix_1.2-17     
 [9] here_0.1           forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3        purrr_0.3.2        readr_1.3.1        tidyr_1.0.0        tibble_2.1.3      
[17] ggplot2_3.2.1      tidyverse_1.2.1   

loaded via a namespace (and not attached):
 [1] nlme_3.1-141        matrixStats_0.55.0  lubridate_1.7.4     httr_1.4.1          rprojroot_1.3-2     rstan_2.19.2        SnowballC_0.6.0     numDeriv_2016.8-1.1
 [9] tools_3.6.1         backports_1.1.4     utf8_1.1.4          R6_2.4.0            lazyeval_0.2.2      colorspace_1.4-1    withr_2.1.2         tidyselect_0.2.5   
[17] gridExtra_2.3       prettyunits_1.0.2   processx_3.4.1      compiler_3.6.1      extrafontdb_1.0     cli_1.1.0           rvest_0.3.4         expm_0.999-4       
[25] xml2_1.2.2          labeling_0.3        scales_1.0.0        mvtnorm_1.0-11      callr_3.3.1         digest_0.6.20       StanHeaders_2.19.0  foreign_0.8-72     
[33] minqa_1.2.4         pkgconfig_2.0.2     extrafont_0.17      manipulate_1.0.1    rlang_0.4.0         readxl_1.3.1        rstudioapi_0.10     generics_0.0.2     
[41] jsonlite_1.6        tokenizers_0.2.1    inline_0.3.15       magrittr_1.5        loo_2.1.0           fansi_0.4.0         Rcpp_1.0.2          DescTools_0.99.28  
[49] munsell_0.5.0       abind_1.4-5         lifecycle_0.1.0     stringi_1.4.3       MASS_7.3-51.4       pkgbuild_1.0.5      plyr_1.8.4          grid_3.6.1         
[57] parallel_3.6.1      crayon_1.3.4        lattice_0.20-38     haven_2.1.1         splines_3.6.1       hms_0.5.1           zeallot_0.1.0       knitr_1.24         
[65] ps_1.3.0            pillar_1.4.2        boot_1.3-23         codetools_0.2-16    reshape2_1.4.3      stats4_3.6.1        glue_1.3.1          packrat_0.5.0      
[73] tidytext_0.2.2      modelr_0.1.5        vctrs_0.2.0         nloptr_1.2.1        Rttf2pt1_1.3.7      cellranger_1.1.0    gtable_0.3.0        assertthat_0.2.1   
[81] xfun_0.9            janitor_1.2.0       broom_0.5.2         coda_0.19-3         janeaustenr_0.1.5   arm_1.10-1          iterators_1.0.12    ellipsis_0.2.0.1 

It looks like only the printed output differs. Maybe the print method for nested tibbles changed somewhere, and one of us has the updated package version and the other doesn't.
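A quick way to compare versions without reading through a full sessionInfo() dump might be the sketch below. The choice of packages is an assumption: these are the ones that appear to be involved in building and printing a nested tibble.

```r
# Packages assumed to matter for how a nested tibble is built and printed
pkgs <- c("tidyr", "tibble", "pillar", "vctrs")

# Installed version of each package, as a named character vector
versions <- vapply(pkgs,
                   function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)
```

Running this on both machines and comparing the results side by side would show whether the printing-related packages differ.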

Thanks @jim89

I tweak the tibble printing using

options(tibble.width = Inf)
options(tibble.print_max = 100, tibble.print_min =50)
options(tibble.print_string_max = 2)
options(pillar.sigfig=13)
options(dplyr.width = Inf)

and here is my session info


> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-ubuntu18-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS/LAPACK: /apps/intel/2019.1/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_2.1.3       furrr_0.1.0        future_1.11.1.1    stringdist_0.9.5.1 vroom_1.0.2        readxl_1.2.0      
 [7] kableExtra_1.1.0   knitr_1.21         quanteda_1.5.1     stringr_1.3.1      fs_1.3.1           readr_1.3.1       
[13] purrr_0.3.0        janitor_1.2.0      tidyr_1.0.0        magrittr_1.5       urltools_1.7.3     lubridate_1.7.4   
[19] dplyr_0.8.3        sparklyr_0.9.4     listviewer_2.1.0   caret_6.0-81       ggplot2_3.1.0      lattice_0.20-35   
[25] MKLthreads_0.1    

loaded via a namespace (and not attached):
 [1] nlme_3.1-137       webshot_0.5.1      httr_1.4.0         rprojroot_1.3-2    tools_3.5.1        backports_1.1.3   
 [7] utf8_1.1.4         R6_2.4.0           rpart_4.1-13       DBI_1.0.0          lazyeval_0.2.1     colorspace_1.4-0  
[13] nnet_7.3-12        withr_2.1.2        tidyselect_0.2.5   compiler_3.5.1     cli_1.0.1          rvest_0.3.2       
[19] forge_0.1.0        xml2_1.2.0         triebeard_0.3.0    scales_1.0.0       digest_0.6.18      rmarkdown_1.11    
[25] base64enc_0.1-3    pkgconfig_2.0.2    htmltools_0.3.6    highr_0.7          dbplyr_1.3.0       htmlwidgets_1.3   
[31] rlang_0.4.0        rstudioapi_0.9.0   shiny_1.2.0        generics_0.0.2     zoo_1.8-4          jsonlite_1.6      
[37] ModelMetrics_1.2.2 Matrix_1.2-15      fansi_0.4.0        Rcpp_1.0.1         munsell_0.5.0      lifecycle_0.1.0   
[43] stringi_1.2.4      MASS_7.3-51.1      plyr_1.8.4         recipes_0.1.4      grid_3.5.1         listenv_0.7.0     
[49] parallel_3.5.1     promises_1.0.1     crayon_1.3.4       splines_3.5.1      hms_0.4.2          zeallot_0.1.0     
[55] pillar_1.3.1       reshape2_1.4.3     codetools_0.2-16   stopwords_1.0      stats4_3.5.1       fastmatch_1.1-0   
[61] glue_1.3.1         evaluate_0.12      data.table_1.12.3  RcppParallel_4.4.2 vctrs_0.2.0        httpuv_1.4.5.1    
[67] foreach_1.4.4      cellranger_1.1.0   gtable_0.2.0       assertthat_0.2.1   r2d3_0.2.3         xfun_0.4          
[73] gower_0.1.2        mime_0.6           prodlim_2018.04.18 xtable_1.8-3       broom_0.5.1        later_0.7.5       
[79] class_7.3-15       survival_2.43-3    viridisLite_0.3.0  timeDate_3043.102  iterators_1.0.10   spacyr_1.2        
[85] lava_1.6.4         globals_0.12.4     ipred_0.9-8       
>

Maybe it's the tibble.width option, which makes the full contents of the nested cell print rather than the summary I see?

Do you get the same printed output if you don't set custom options for printing?
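One way to check this without restarting R is sketched below: temporarily clear the custom options, print the nested tibble, then restore them. This assumes the options listed earlier in the thread are the only customisations in play. The two session infos above also list different pillar versions (1.4.2 versus 1.3.1), so the options may not be the whole story.

```r
library(dplyr)
library(tidyr)

# Clear the custom printing options; options() returns the previous values
old <- options(
  tibble.width = NULL,
  tibble.print_max = NULL,
  tibble.print_min = NULL,
  tibble.print_string_max = NULL,
  pillar.sigfig = NULL,
  dplyr.width = NULL
)

nested <- tibble(group = c(1, 1, 1, 2, 2, 2),
                 value1 = c("a", "b", "c", "d", "e", "f"),
                 value2 = c(10, 20, 30, 40, 50, 60)) %>%
  group_by(group) %>%
  nest()

# Printed with default options
print(nested)

# Class of the list-column itself, independent of how it is printed
class(nested$data)

# Put the custom options back
options(old)
```

If the output still shows the cell contents rather than a summary with the options cleared, the options are probably not the cause.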
