# Subgroup analysis - count by condition

Hello everyone,

New here, struggling with what (I think) is a bear of a problem.

I have a dataframe containing 16 variables and 293 observations. Each variable is a condition a respondent was asked about, and the cell value is "Yes", "No", or "Unknown" - these were coded in the dataframe as 1, 0, and 2 respectively.

I took this dataframe and ran it through gather() - which gave me a dataframe consisting of 4688 observations and 2 variables - Condition, and Value - where Condition is text corresponding to the condition the respondent was asked about and Value is the person's response (0, 1, or 2).

I then recoded the Value column replacing 0 with "No", 1 with "Yes", and 2 with "Unknown".
(I can't find how to insert R code so I apologize for the following jankiness)

So the dataframe looks like this:

Condition Value
A No
A Yes
A Yes
A Unknown
B No
...

What I can't figure out is how to get to this programmatically (if the above is an example):

Condition Value Count
A No 1
A Yes 2
A Unknown 1
B No 1

I've tried `group_by(Condition) %>% summarize(n())` but that just gives me the total number of rows for each condition, not the count of each response within a condition, which is what I'm looking for.

I tried using split-apply-combine methods... but I don't know what I'm missing.

Sidenote: I managed to get to what I wanted by manually using df %>% count(VARIABLE), and manually re-running this line 16 times, each time changing the name of the variable to what I wanted... but obviously that sort of solution doesn't scale up.

• ice

Maybe I'm misunderstanding your objective, but shouldn't you be adding `Value` into your `group_by()` call too?

```r
library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ Condition, ~ Value,
                "A", "No",
                "A", "Yes",
                "A", "Yes",
                "A", "Unknown",
                "B", "No")

data %>%
  group_by(Condition, Value) %>%
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Condition
#>   Condition Value       n
#>   <chr>     <chr>   <int>
#> 1 A         No          1
#> 2 A         Unknown     1
#> 3 A         Yes         2
#> 4 B         No          1
```

Created on 2020-05-07 by the reprex package (v0.3.0)


In a sense, yes. While the counts for each combination won't change, the order of the arguments dictates which variables come first in the output.

```r
data %>%
  group_by(Condition, Value) %>%
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Condition
#>   Condition Value       n
#>   <chr>     <chr>   <int>
#> 1 A         No          1
#> 2 A         Unknown     1
#> 3 A         Yes         2
#> 4 B         No          1

data %>%
  group_by(Value, Condition) %>%
  summarise(n = n())
#> # A tibble: 4 x 3
#> # Groups:   Value
#>   Value   Condition     n
#>   <chr>   <chr>     <int>
#> 1 No      A             1
#> 2 No      B             1
#> 3 Unknown A             1
#> 4 Yes     A             2
```

It also determines which grouping variables remain after `summarise()`. `summarise()` peels off the last layer of grouping so you can see that the `Groups:` are different in the outputs of the two calls above.
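If it helps, here is a small sketch that makes the "peeling" visible. It rebuilds the same toy `data` as above; the names `by_condition`, `by_value`, and `totals` are only illustrative. `group_vars()` reports which grouping variables are still attached to a tibble.

```r
library(dplyr, warn.conflicts = FALSE)

# Same toy data as in the example above
data <- tribble(~ Condition, ~ Value,
                "A", "No",
                "A", "Yes",
                "A", "Yes",
                "A", "Unknown",
                "B", "No")

by_condition <- data %>%
  group_by(Condition, Value) %>%
  summarise(n = n())

by_value <- data %>%
  group_by(Value, Condition) %>%
  summarise(n = n())

# summarise() dropped the last grouping variable in each case
group_vars(by_condition)
#> [1] "Condition"
group_vars(by_value)
#> [1] "Value"

# A second summarise() therefore works on the grouping that is left,
# here giving the total number of responses per condition (A = 4, B = 1)
totals <- by_condition %>%
  summarise(total = sum(n))
totals
```

Recent versions of `dplyr` may also print a message about the remaining grouping when `summarise()` is called on more than one grouping variable; that is informational and does not change the result.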

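By the way, this `group_by()` + `summarise(n())` operation is so common that `dplyr` has a dedicated verb for it: `count()`. The following call gives the same counts as the second example above; the only difference is that `count()` on an ungrouped data frame returns an ungrouped result.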

```r
data %>% count(Value, Condition)
```
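Going back to the original wide data frame described in the question, the whole job can probably be done in one pipeline. This is only a sketch under assumptions: `wide` is a made-up stand-in with three conditions and four respondents (the real data has 16 and 293), and it uses `tidyr::pivot_longer()` where the question used `gather()`. The recoding uses base R `factor()` with the 0/1/2 coding from the question.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical stand-in for the real data: 1 = Yes, 0 = No, 2 = Unknown
wide <- tribble(~ A, ~ B, ~ C,
                0, 0, 1,
                1, 1, 1,
                1, 2, 0,
                2, 1, 1)

counts <- wide %>%
  # One row per respondent/condition pair
  pivot_longer(everything(), names_to = "Condition", values_to = "Value") %>%
  # Turn the numeric codes into labelled responses
  mutate(Value = factor(Value, levels = c(0, 1, 2),
                        labels = c("No", "Yes", "Unknown"))) %>%
  # One row per Condition/Value combination that occurs in the data
  count(Condition, Value)

counts
```

Because `Value` is a factor here, the rows within each condition come out in the order No, Yes, Unknown rather than alphabetically. Combinations that never occur (for example "Unknown" for `C` in this toy data) are left out of the result.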

That is exactly what I've been trying to do. Thank you!

For my own understanding - feeding multiple arguments to group_by() results in sequential grouping in the order of the arguments?

ice

I appreciate it. Thank you ^.^
