Subgroup analysis - count by condition

Hello everyone,

New here, struggling with what (I think) is a bear of a problem.

I have a dataframe containing 16 variables and 293 observations. Each variable is a condition a respondent was asked about, and the cell value is "Yes", "No", or "Unknown" - these were coded in the dataframe as 1, 0, and 2 respectively.

I took this dataframe and ran it through gather() - which gave me a dataframe consisting of 4688 observations and 2 variables - Condition, and Value - where Condition is text corresponding to the condition the respondent was asked about and Value is the person's response (0, 1, or 2).

I then recoded the Value column replacing 0 with "No", 1 with "Yes", and 2 with "Unknown".
(I can't find how to insert R code so I apologize for the following jankiness)

So the dataframe looks like this:

Condition Value
A No
A Yes
A Yes
A Unknown
B No

What I can't figure out is how to get to this programmatically (if the above is an example):

Condition Value Count
A No 1
A Yes 2
A Unknown 1
B No 1

I've tried group_by(Condition) %>% summarize(n()) but that just gives me the total numbers of each condition - not what I'm looking for.

I tried using split-apply-combine methods... but I don't know what I'm missing.

Sidenote: I managed to get to what I wanted by manually using df %>% count(VARIABLE), and manually re-running this line 16 times, each time changing the name of the variable to what I wanted... but obviously that sort of solution doesn't scale up.

Please help! And thank you ^.^

  • ice

Maybe I'm misunderstanding your objective, but shouldn't you be adding Value into your group_by() call too?

library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ Condition, ~ Value,
                "A", "No",
                "A", "Yes",
                "A", "Yes",
                "A", "Unknown",
                "B", "No")

data %>% 
  group_by(Condition, Value) %>% 
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Condition [2]
#>   Condition Value       n
#>   <chr>     <chr>   <int>
#> 1 A         No          1
#> 2 A         Unknown     1
#> 3 A         Yes         2
#> 4 B         No          1

Created on 2020-05-07 by the reprex package (v0.3.0)
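In case it helps to see the whole path from the wide, numerically coded data to the counts, here is a minimal sketch. The `wide` tibble is a made-up two-column stand-in for the real 293 x 16 data, and using recode() for the relabelling is my assumption, since the question doesn't say how the recoding was done.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical stand-in for the survey data: one column per condition,
# coded 1 = Yes, 0 = No, 2 = Unknown as described in the question.
wide <- tibble(
  A = c(0, 1, 1, 2),
  B = c(0, 1, 0, 0)
)

counts <- wide %>%
  # Stack every condition column into Condition / Value pairs
  gather(key = "Condition", value = "Value") %>%
  # Replace the numeric codes with their labels
  mutate(Value = recode(Value, `0` = "No", `1` = "Yes", `2` = "Unknown")) %>%
  # One row per Condition / Value combination that occurs in the data
  count(Condition, Value)

counts
```

Combinations that never occur (for example "Unknown" for B here) simply have no row in the result.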

That is exactly what I've been trying to do. Thank you!

For my own understanding - feeding multiple arguments to group_by() results in sequential grouping in the order of the arguments?


In a sense, yes. The counts for each combination don't change, but the order of the arguments determines the order of the columns in the output and how the rows are sorted.

data %>% 
  group_by(Condition, Value) %>% 
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Condition [2]
#>   Condition Value       n
#>   <chr>     <chr>   <int>
#> 1 A         No          1
#> 2 A         Unknown     1
#> 3 A         Yes         2
#> 4 B         No          1

data %>% 
  group_by(Value, Condition) %>% 
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Value [3]
#>   Value   Condition     n
#>   <chr>   <chr>     <int>
#> 1 No      A             1
#> 2 No      B             1
#> 3 Unknown A             1
#> 4 Yes     A             2

It also determines which grouping variables remain after summarise(). summarise() peels off the last layer of grouping, which is why the Groups: lines differ between the outputs of the two calls above.
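If you'd rather check the remaining grouping directly than read it off the printed header, a small sketch along these lines should work. The names by_condition and by_value are just labels I've picked for the two results.

```r
library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ Condition, ~ Value,
                "A", "No",
                "A", "Yes",
                "A", "Yes",
                "A", "Unknown",
                "B", "No")

by_condition <- data %>%
  group_by(Condition, Value) %>%
  summarise(n = n())

by_value <- data %>%
  group_by(Value, Condition) %>%
  summarise(n = n())

# group_vars() lists the grouping variables still attached to a result
group_vars(by_condition)
#> [1] "Condition"
group_vars(by_value)
#> [1] "Value"

# ungroup() drops whatever grouping is left
group_vars(ungroup(by_value))
#> character(0)
```

Leftover grouping matters if you keep piping: a later mutate() or summarise() would still work per group unless you ungroup() first.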

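By the way, this group_by() + summarise(n()) operation is so common that dplyr has a dedicated verb for it: count(). The following code gives the same counts as the second call above; the only difference is that count() returns an ungrouped result when its input is ungrouped.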

data %>% count(Value, Condition)
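As a rough check of that, the sketch below compares the two approaches on the toy data and also shows count()'s sort argument. The names via_summarise, via_count and sorted are only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ Condition, ~ Value,
                "A", "No",
                "A", "Yes",
                "A", "Yes",
                "A", "Unknown",
                "B", "No")

# The long form, with the leftover grouping removed at the end
via_summarise <- data %>%
  group_by(Value, Condition) %>%
  summarise(n = n()) %>%
  ungroup()

# The shortcut
via_count <- data %>% count(Value, Condition)

# Compare as plain data frames: same columns, rows and counts
isTRUE(all.equal(as.data.frame(via_summarise), as.data.frame(via_count)))

# sort = TRUE puts the most frequent combinations first
sorted <- data %>% count(Condition, Value, sort = TRUE)
sorted
```

With sort = TRUE the A / Yes row, which has the largest count, comes out on top.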

I appreciate it. Thank you ^.^
