dplyr::count -- include a 0 for factor levels not in the data

gxm204 · March 18, 2019, 7:20pm

Hi, I am summarizing responses to a Likert-style survey item. In some cases, there are item levels (which I coded as factors) that have no responses, but for purposes of summarizing I would like to include them in the resulting table as a 0 (or I suppose NA would be fine too). What might be a good approach for this?

Here is what I am envisioning:

library(tidyverse)

sampsurvey <- data.frame(rating = c("Agree","Strongly Agree", "Strongly Disagree", "Disagree",
                                    "Strongly Disagree", "Agree"))

# Assign factor levels -- 5 levels
sampsurvey$rating <- factor(sampsurvey$rating, levels = c("Strongly Disagree", 
                                                          "Disagree",
                                                          "Neutral",
                                                          "Agree",
                                                          "Strongly Agree"))
                            
                            
                            
# How do I get the "Neutral" level counted as a zero in this table?
sampsurvey %>% count(rating)

Here is my rather inelegant solution:

### I would think there would be a better way than this?

sampsurvey <- data.frame(rating = c("Agree","Strongly Agree", "Strongly Disagree", "Disagree",
                                    "Strongly Disagree", "Neutral"),
                         itemcount = c(1, 1, 1, 1, 1, NA))

sampsurvey$rating <- factor(sampsurvey$rating, levels = c("Strongly Disagree", 
                                                          "Disagree",
                                                          "Neutral",
                                                          "Agree",
                                                          "Strongly Agree"))
sampsurvey %>% 
  group_by(rating) %>% 
  summarise(nresponses = sum(itemcount))

mishabalyasin · March 18, 2019, 7:59pm

I was certain that this is how it's supposed to work out of the box, but turns out there is still a default setting that drops the empty groups. So, you can get what you want with .drop = FALSE in count:

library(tidyverse)

sampsurvey <- data.frame(rating = c("Agree","Strongly Agree", "Strongly Disagree", "Disagree",
                                    "Strongly Disagree", "Agree"))

# Assign factor levels -- 5 levels
sampsurvey$rating <- factor(sampsurvey$rating, levels = c("Strongly Disagree", 
                                                          "Disagree",
                                                          "Neutral",
                                                          "Agree",
                                                          "Strongly Agree"))



# How do I get the "Neutral" level counted as a zero in this table?
sampsurvey %>% 
  count(rating, .drop = FALSE) 
#> # A tibble: 5 x 2
#>   rating                n
#>   <fct>             <int>
#> 1 Strongly Disagree     2
#> 2 Disagree              1
#> 3 Neutral               0
#> 4 Agree                 2
#> 5 Strongly Agree        1

Created on 2019-03-18 by the reprex package (v0.2.1)
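If you would rather keep the group_by() + summarise() shape from the original attempt, group_by() takes the same .drop argument. Here is a minimal sketch, assuming dplyr 0.8.0 or later; the names counts and nresponses are just for illustration:

```r
library(dplyr)

# Same sample data as in the question
sampsurvey <- data.frame(rating = c("Agree", "Strongly Agree", "Strongly Disagree",
                                    "Disagree", "Strongly Disagree", "Agree"))

# Assign factor levels -- 5 levels, "Neutral" has no responses
sampsurvey$rating <- factor(sampsurvey$rating,
                            levels = c("Strongly Disagree", "Disagree", "Neutral",
                                       "Agree", "Strongly Agree"))

# .drop = FALSE keeps a group for every factor level, including empty ones;
# n() returns 0 for a group with no rows
counts <- sampsurvey %>%
  group_by(rating, .drop = FALSE) %>%
  summarise(nresponses = n())

counts
```

This should give one row per factor level, with "Neutral" at 0, without needing a helper column like itemcount.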

mishabalyasin · March 18, 2019, 8:01pm

This behavior changed in dplyr 0.8.0. The first iteration had .drop = FALSE by default, but that was quickly changed because it proved to be too disruptive.

https://github.com/tidyverse/dplyr/blob/master/NEWS.md#major-changes

# dplyr 0.8.0.9000

* `group_by()` does a shallow copy even in the no groups case (#4221).

* Fixed `mutate()` on rowwise data frames with 0 rows (#4224).

* Fixed handling of bare formulas in colwise verbs (#4183).

* Fixed performance of `n_distinct()` (#4202).

* `group_indices()` now ignores empty groups by default for `data.frame`, which is
  consistent with the default of `group_by()` (@yutannihilation, #4208). 

* Fixed integer overflow in hybrid `ntile()` (#4186). 

* colwise functions `summarise_at()` ... can rename vars in the case of multiple functions (#4180).

* `select_if()` and `rename_if()` handle logical vector predicate (#4213). 

* hybrid `min()` and `max()` cast to integer when possible (#4258).


gxm204 · March 18, 2019, 8:02pm

Excellent & resolved!
