Using dplyr::n() to count the number of elements in groups within nested data frames doesn't seem to give the correct value; it counts the number of rows in the top-level data frame rather than the number of rows in the group.
This is the simplest reprex I could come up with. And yes, I know I could do this with group_by(field, sub). In my real example the nested data frames seemed more convenient. ISTM this should work; am I missing something?
Kent
suppressMessages(library(dplyr))
library(purrr)
library(tidyr)
# Data frame with three primary groups of size 10.
# Each contains five sub-groups of size 2
d = data_frame(field=rep(1:3, each=10), sub=rep(1:5, 6), value=1:30)
d
#> # A tibble: 30 x 3
#>    field   sub value
#>    <int> <int> <int>
#>  1     1     1     1
#>  2     1     2     2
#>  3     1     3     3
#>  4     1     4     4
#>  5     1     5     5
#>  6     1     1     6
#>  7     1     2     7
#>  8     1     3     8
#>  9     1     4     9
#> 10     1     5    10
#> # ... with 20 more rows
# Nest to make a separate data frame per field
dd = d %>% nest(-field)
dd
#> # A tibble: 3 x 2
#>   field data
#>   <int> <list>
#> 1     1 <tibble [10 x 2]>
#> 2     2 <tibble [10 x 2]>
#> 3     3 <tibble [10 x 2]>
# I want a count of the number of items in each sub-group.
# Do this using group_by(), summarize() and n().
# It works if I just process one element of `data`.
# Here `count` has the correct value (2)
dd$data[[1]] %>% group_by(sub) %>% summarize(count=n(), mean=mean(value))
#> # A tibble: 5 x 3
#>     sub count  mean
#>   <int> <int> <dbl>
#> 1     1     2   3.5
#> 2     2     2   4.5
#> 3     3     2   5.5
#> 4     4     2   6.5
#> 5     5     2   7.5
# When the same summary operations are applied to the entire `data` column using `map`,
# the count is nrow(dd) rather than the size of the subgroup.
dd %>%
  mutate(result = map(data, ~.x %>% group_by(sub) %>%
                        summarize(count=n(), mean=mean(value)))) %>%
  select(-data) %>% unnest
#> # A tibble: 15 x 4
#>    field   sub count  mean
#>    <int> <int> <int> <dbl>
#>  1     1     1     3   3.5
#>  2     1     2     3   4.5
#>  3     1     3     3   5.5
#>  4     1     4     3   6.5
#>  5     1     5     3   7.5
#>  6     2     1     3  13.5
#>  7     2     2     3  14.5
#>  8     2     3     3  15.5
#>  9     2     4     3  16.5
#> 10     2     5     3  17.5
#> 11     3     1     3  23.5
#> 12     3     2     3  24.5
#> 13     3     3     3  25.5
#> 14     3     4     3  26.5
#> 15     3     5     3  27.5
Created on 2018-08-22 by the reprex package (v0.2.0).
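For comparison, here is a minimal sketch of the flat group_by(field, sub) alternative mentioned at the top. It is not the nested approach in question, only a reference for what the counts should look like. It rebuilds the same data as the reprex; the name `flat` is just illustrative, and tibble() is used in place of data_frame() on the assumption that a newer dplyr is installed.

```r
library(dplyr)

# Same data as in the reprex: three fields of 10 rows,
# each containing five sub-groups of size 2.
# tibble() stands in for data_frame(), which newer dplyr versions deprecate.
d <- tibble(field = rep(1:3, each = 10), sub = rep(1:5, 6), value = 1:30)

# Group on both keys at once instead of nesting by field first.
flat <- d %>%
  group_by(field, sub) %>%
  summarize(count = n(), mean = mean(value)) %>%
  ungroup()  # drop the remaining grouping on `field`

flat
```

Here `flat` should have 15 rows (3 fields x 5 sub-groups), each with a count of 2, which is the value I expected from the nested version as well.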