I'm wondering if there is a more efficient way of doing the following. I have a data frame with N rows, of which only M are unique. I want to generate a new data frame with a uniqueID variable and the number of rows that share each ID. I can do this as follows:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
set.seed(9782)
test.df <- data.frame(A = sample(LETTERS, 10000, TRUE),
                      B = sample(letters, 10000, TRUE),
                      C = sample(1:5, 10000, TRUE))
test.df %>%
  mutate(uniqueID = paste0(A, B, C)) %>%
  group_by(uniqueID) %>%
  summarise(n = n()) %>%
  arrange(-n)
#> # A tibble: 3,206 x 2
#> uniqueID n
#> <chr> <int>
#> 1 Jg2 11
#> 2 Vz3 10
#> 3 Ao2 9
#> 4 Aq2 9
#> 5 Cv3 9
#> 6 Ee5 9
#> 7 Fj5 9
#> 8 Jw4 9
#> 9 Mk3 9
#> 10 Po1 9
#> # ... with 3,196 more rows
Created on 2019-02-11 by the reprex package (v0.2.1)
For example, is there some mutate cousin that would allow me to paste all the columns automagically instead of having to name them?
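One direction that might fit, as a rough sketch rather than a definitive answer: `tidyr::unite()` can paste columns together using a tidyselect helper such as `everything()`, so the columns never have to be named, and `dplyr::count()` can stand in for the `group_by()` / `summarise()` / `arrange()` chain. Bringing in tidyr is an assumption on my part (the code above only loads dplyr), and the name `res` is just for illustration.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available; it is not used in the code above

set.seed(9782)
test.df <- data.frame(A = sample(LETTERS, 10000, TRUE),
                      B = sample(letters, 10000, TRUE),
                      C = sample(1:5, 10000, TRUE))

res <- test.df %>%
  # paste every column into a single uniqueID column, with no separator
  unite("uniqueID", everything(), sep = "") %>%
  # count rows per uniqueID, most frequent first
  count(uniqueID, sort = TRUE)

head(res)
```

As with `paste0(A, B, C)`, pasting with an empty separator could merge distinct rows into the same ID if the values have varying widths (for example `"a"` + `"bc"` and `"ab"` + `"c"`). That is not an issue for the single-character columns here, but a non-empty `sep` would be safer in general.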