Re-Tallying With tally()

Hi everyone,

The documentation for tally() says:

"A column named n (but not nn or nnn) will be used as weighting variable by default".

So I was wondering: in the following example, why doesn't the last call to tally() use tb$n as the wt argument? Instead of summing n (which would give 32), it just seems to count the rows again and returns 3.

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tb <- mtcars %>%
  group_by(cyl) %>%
  tally()
tb
#> # A tibble: 3 x 2
#>     cyl     n
#>   <dbl> <int>
#> 1     4    11
#> 2     6     7
#> 3     8    14
tb %>%
  tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     3

Created on 2019-12-05 by the reprex package (v0.3.0)

Thanks!

That is not what I see. Strange. For me, the last call to tally() does use n as the weighting variable and returns 32:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tb <- mtcars %>%
  group_by(cyl) %>%
  tally()
tb
#> # A tibble: 3 x 2
#>     cyl     n
#>   <dbl> <int>
#> 1     4    11
#> 2     6     7
#> 3     8    14

tb %>%
  tally()
#> Using `n` as weighting variable
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1    32

Created on 2019-12-05 by the reprex package (v0.2.1)

I was able to reproduce this with dplyr 0.8.3 (session info below), and it seems like a bug. It might be related to this pull request: https://github.com/tidyverse/dplyr/pull/4581


library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_0.8.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2       crayon_1.3.4     digest_0.6.20    assertthat_0.2.1
#>  [5] R6_2.4.0         magrittr_1.5     evaluate_0.14    pillar_1.4.2    
#>  [9] highr_0.8        rlang_0.4.0      stringi_1.4.3    rmarkdown_1.15  
#> [13] tools_3.6.1      stringr_1.4.0    glue_1.3.1       purrr_0.3.2     
#> [17] xfun_0.9         yaml_2.2.0       compiler_3.6.1   pkgconfig_2.0.2 
#> [21] htmltools_0.3.6  tidyselect_0.2.5 knitr_1.24       tibble_2.1.3
tb <- mtcars %>%
  group_by(cyl) %>%
  tally()
tb %>%
  tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     3

Created on 2019-12-05 by the reprex package (v0.3.0)
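
In the meantime, a possible workaround is to stop relying on the default and pass the weight yourself. This is only a minimal sketch, assuming the goal is the total of the n column; the names weighted and summed are just illustrative and do not come from the dplyr documentation.

```r
library(dplyr)

# Per-group row counts, as in the original post
tb <- mtcars %>%
  group_by(cyl) %>%
  tally()

# Pass the weight explicitly instead of relying on tally() picking up `n`
weighted <- tb %>%
  tally(wt = n)

# Equivalent spelling that does not go through tally() at all
summed <- tb %>%
  summarise(n = sum(n))

weighted
summed
```

Both calls should return a one-row tibble with the sum of the group counts (11 + 7 + 14), whether or not the installed dplyr version applies the default weighting.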


I've submitted an issue on GitHub: https://github.com/tidyverse/dplyr/issues/4645
