# Count of observations after group_by()

Hi all,

I'm trying to get the number of observations for a specific variable after using `dplyr::group_by()`:

``````
df %>%
  group_by(country, year) %>%
  summarise(mean = mean(variable, na.rm = TRUE),
            sd = sd(variable, na.rm = TRUE),
            N = n()) -> df2
``````

The idea is to get the count of observations of "variable" for each "country" and "year" so I can compute standard errors and confidence intervals. I believe the code above counts all observations in a country in a given year, but because "variable" has some NAs, that isn't the count I need for the SE and CI. To put it another way: I think `n()` above isn't returning the same count that `mean()` uses with `na.rm = TRUE`, which is the one I need.

I've tried `add_count()` to no avail. What would you suggest? Thanks!

Are you looking for the count of values that are not `NA`?
`N = sum(!is.na(variable))` could be what you want.
Otherwise, you could use the `wt` argument of `tally()`: `%>% add_tally(wt = !is.na(variable))`

But I'm not sure I understood correctly.
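As a minimal sketch of how that count might feed into the standard errors and confidence intervals mentioned in the question: the toy data, the column names `se`, `ci_lower` and `ci_upper`, and the choice of a t-based 95% interval are all assumptions for illustration, not something taken from the original post.

``````r
library(dplyr)

# Toy data (hypothetical), standing in for the poster's `df`
df <- data.frame(
  country  = c("A", "A", "A", "A", "B", "B", "B"),
  year     = c(2018, 2018, 2018, 2018, 2018, 2018, 2018),
  variable = c(1, 2, 3, NA, 10, NA, 14)
)

df2 <- df %>%
  group_by(country, year) %>%
  summarise(
    mean = mean(variable, na.rm = TRUE),
    sd   = sd(variable, na.rm = TRUE),
    N    = sum(!is.na(variable))   # counts non-missing values only
  ) %>%
  ungroup() %>%
  mutate(
    se       = sd / sqrt(N),                          # standard error of the mean
    ci_lower = mean - qt(0.975, df = N - 1) * se,     # assumed: t-based 95% CI
    ci_upper = mean + qt(0.975, df = N - 1) * se
  )

df2
``````

Here `N` matches the number of values that `mean()` and `sd()` actually use once `na.rm = TRUE` drops the missing ones, so the standard error is based on the same observations.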

@cderv Thanks for your reply. I was so focused on `n()` that I didn't think of looking up `sum()`. Just to learn how to use `add_tally()`, could you elaborate how/where it fits in the code below instead of `sum()`? Thanks a lot!

``````
df %>%
  group_by(cntry, essround) %>%
  summarise(mean = mean(trstep2, na.rm = TRUE),
            sd = sd(trstep2, na.rm = TRUE),
            N = sum(!is.na(trstep2))) -> df2
``````

My understanding is that `add_tally()` doesn't go inside `summarise()`, but if I pipe it like below it doesn't work.

``````
df %>%
  group_by(country, year) %>%
  summarise(mean = mean(variable, na.rm = TRUE),
            sd = sd(variable, na.rm = TRUE)) %>%
  add_tally(df, wt = !is.na(df$variable)) -> df2
``````

Here is an example to show you the difference

``````
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
mtcars$cyl[5] <- NA  # index assumed: the output below implies a single NA in an odd row with cyl == 8
gp_df <- mtcars %>%
  mutate(dummy_cat_for_reprex = rep_len(c("dummy1", "dummy2"), n())) %>%
  group_by(dummy_cat_for_reprex)

gp_df %>%
  summarise(mean = mean(cyl, na.rm = TRUE),
            sd = sd(cyl, na.rm = TRUE),
            N = n(),
            N_without_NA = sum(!is.na(cyl)))
#> # A tibble: 2 x 5
#>   dummy_cat_for_reprex  mean    sd     N N_without_NA
#>   <chr>                <dbl> <dbl> <int>        <int>
#> 1 dummy1                6.4   1.88    16           15
#> 2 dummy2                5.88  1.71    16           16
gp_df %>%
  tally(wt = !is.na(cyl))
#> # A tibble: 2 x 2
#>   dummy_cat_for_reprex     n
#>   <chr>                <int>
#> 1 dummy1                  15
#> 2 dummy2                  16

gp_df %>%
  add_tally(wt = !is.na(cyl)) %>%
  distinct(dummy_cat_for_reprex, n)
#> # A tibble: 2 x 2
#> # Groups:   dummy_cat_for_reprex 
#>   dummy_cat_for_reprex     n
#>   <chr>                <int>
#> 1 dummy1                  15
#> 2 dummy2                  16
``````

Created on 2019-01-21 by the reprex package (v0.2.1)

For what you want to do, `sum()` is fine, I think.
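On the earlier question of where `add_tally()` would fit: one possible arrangement, sketched here with hypothetical toy data, is to call it on the grouped data before `summarise()`, since it adds a column to every row and needs the original `variable` column, which no longer exists once the data has been summarised. Picking the per-group value with `first(n)` is an assumption made for this sketch, not something from the replies above.

``````r
library(dplyr)

# Toy data (hypothetical), standing in for the poster's `df`
df <- data.frame(
  country  = c("A", "A", "A", "B", "B", "B"),
  year     = c(2018, 2018, 2018, 2018, 2018, 2018),
  variable = c(1, NA, 3, 10, 14, 18)
)

df3 <- df %>%
  group_by(country, year) %>%
  add_tally(wt = !is.na(variable)) %>%   # adds column `n`: non-NA count per group
  summarise(
    mean = mean(variable, na.rm = TRUE),
    sd   = sd(variable, na.rm = TRUE),
    N    = first(n)                      # `n` is constant within each group
  ) %>%
  ungroup()

df3
``````

This gives the same `N` as `sum(!is.na(variable))` inside `summarise()`, only with an extra step, which is why `sum()` is the simpler choice here.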

I understand now, thanks a lot for taking the time to show me!

No problem! Feel free to ask!
