I would like to create a two-way frequency table containing both counts and row percentages, ideally using tidyverse functions. Here's what I have so far. In this example I want to get counts of Y or N RSVP by Sex:
library(dplyr)
df <- data.frame(Sex = c("M","F","M","M","F","F"),
RSVP = c("Y","N","N","Y","N","Y"),stringsAsFactors = FALSE)
df %>%
count(Sex, RSVP) %>%
group_by(Sex) %>%
mutate(prop = prop.table(n))
Sex RSVP n prop
<chr> <chr> <int> <dbl>
1 F N 2 0.667
2 F Y 1 0.333
3 M N 1 0.333
4 M Y 2 0.667
This appears to calculate correctly but I am having a hard time re-shaping this output such that the values of RSVP become two columns Y and N.
This can be done by adding tidyr to the mix. Because tidyr::spread can only handle a single value column at a time, we'll use tidyr::gather to force the counts and proportions into a single column. After that, we create a new column with the intended column names and spread it out.
library(tidyr)
df %>%
count(Sex, RSVP) %>%
group_by(Sex) %>%
mutate(prop = prop.table(n)) %>%
gather(key = "measure", value = "value", n, prop) %>%
mutate(label = paste0(measure, "_", RSVP)) %>%
select(-measure, -RSVP) %>%
spread(key = "label", value = "value")
# # A tibble: 2 x 5
# # Groups: Sex [2]
# Sex n_N n_Y prop_N prop_Y
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 F 2 1 0.667 0.333
# 2 M 1 2 0.333 0.667
And, just for completeness, this is how to get a similar result using base R:
Thank you, I have been using that to add totals to my tables, great package! I should have added that I wanted to print the counts and percentages as distinct columns, from what I could tell with janitor they would print as one column.
You can save a line of code with tidyr::unite(). Also, I've been trying to wrap my head around the new tidyr functions pivot_longer() and pivot_wider() for specifying more complex data reshaping in a single line of code, so I've included an example of that approach below as well.
# Need development version of tidyr for pivot_wider()
# devtools::install_github(tidyverse/tidyr)
library(tidyverse)
df <- data.frame(Sex = c("M","F","M","M","F","F"),
RSVP = c("Y","N","N","Y","N","Y"),stringsAsFactors = FALSE)
# Using unite() with @nwerth's answer
df %>%
count(Sex, RSVP) %>%
group_by(Sex) %>%
mutate(prop = prop.table(n)) %>%
gather(measure, value, n, prop) %>%
unite(measure, measure, RSVP) %>%
spread(measure, value)
#> # A tibble: 2 x 5
#> # Groups: Sex [2]
#> Sex n_N n_Y prop_N prop_Y
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 F 2 1 0.667 0.333
#> 2 M 1 2 0.333 0.667
# Using pivot_wider()
df %>%
count(Sex, RSVP) %>%
group_by(Sex) %>%
mutate(prop = prop.table(n)) %>%
pivot_wider(names_from=RSVP, values_from=c(n, prop))
#> # A tibble: 2 x 5
#> Sex n_N n_Y prop_N prop_Y
#> <chr> <int> <int> <dbl> <dbl>
#> 1 F 2 1 0.667 0.333
#> 2 M 1 2 0.333 0.667
This is really helpful, thank you. I had been hearing of unite() and the changes that would be coming with pivot_longer() and pivot_wider() so putting them into context here is very much appreciated.