Plotting prevalence within bins, with the bin total rather than the overall total as the denominator

Hi,

I'm trying to plot prevalence across a set of age ranges. The plot works, but the prevalence for each bin (age_group) is calculated as a percentage of the entire population rather than of the number of people within that bin. Any ideas how to amend the code below? Is the problem somewhere in the summarise() or mutate(pct = ...) lines?

Thanks

trimmed_df %>%
group_by(age_group, MM_binary) %>%
summarise(n = n(age_group)) %>%
mutate(pct = round(n/sum(n)*100, 1)) %>%
ggplot(aes(x = age_group, y = pct)) +
geom_col() +
scale_y_continuous(limits = c(0, 70),
breaks = scales::pretty_breaks(),
labels = scales::percent_format(scale = 1)) +
theme(legend.position = "top") +
labs(title = "Prevalence of MM by age",
y = "% of population",
x = "Age group", color = "grey20", size = 20, angle = 90) -> figure_1

figure_1

Running this as posted gave me an error: n() takes no arguments, so the age_group inside n(age_group) is superfluous. With it removed, the code seems to produce what I expect you are attempting.

In the first part, I reproduce the setup that I estimate you have.
In the second part are the changes I would make.

library(tidyverse)

set.seed(42)

trimmed_df <- data.frame(
  age_group=sample(letters[1:5],100,replace=TRUE),
  MM_binary=sample(c(0,1),100,replace=TRUE)
)

your_summary <- trimmed_df %>%
  group_by(age_group, MM_binary) %>%
  summarise(n = n()) %>%
  mutate(pct = round(n/sum(n)*100, 1)) 

# pct is already computed in your_summary, so it is not recalculated here
your_summary %>%
  ggplot(aes(x = age_group, y = pct)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 70),
                     breaks = scales::pretty_breaks(),
                     labels = scales::percent_format(scale = 1)) +
  theme(legend.position = "top") +
  labs(title = "Prevalence of MM by age",
       y = "% of population",
       x = "Age group", color = "grey20", size = 20, angle = 90) -> figure_1

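figure_1

# Second part: count rows per age_group and MM_binary, then spread the
# counts into one column per MM_binary value (`0` and `1`).
alt_summary <- trimmed_df %>%
  group_by(age_group, MM_binary) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(id_cols = "age_group",
              names_from = "MM_binary",
              values_from = "n",
              values_fill = 0) %>%  # a bin with no 0s or no 1s gets 0, not NA
  # the denominator is the bin total, not the whole population
  mutate(frac = `1` / (`1` + `0`),
         labels = scales::percent(frac))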

alt_summary

alt_summary %>%
  ggplot(aes(x = age_group, y = frac)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1),
                     breaks = scales::pretty_breaks(),
                     labels = scales::percent_format(scale = 100)) +
  theme(legend.position = "top") +
  labs(title = "Prevalence of MM by age",
       y = "% of age group",
       x = "Age group", color = "grey20", size = 20, angle = 90) -> figure_2

figure_2
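
A shorter route to the same per-bin fraction, as a sketch only: it assumes MM_binary is coded strictly as 0/1 with no missing values, in which case the mean within each age_group is already the share of 1s in that bin. The data below is the same simulated stand-in as in the setup above, not the real data, and the names mean_summary, n_total and n_cases are made up for illustration.

```r
library(dplyr)

set.seed(42)

# Simulated stand-in for the real data (same shape as the setup above)
trimmed_df <- data.frame(
  age_group = sample(letters[1:5], 100, replace = TRUE),
  MM_binary = sample(c(0, 1), 100, replace = TRUE)
)

# For a 0/1 indicator, the group mean equals cases / bin total,
# so the denominator is the bin total by construction.
mean_summary <- trimmed_df %>%
  group_by(age_group) %>%
  summarise(n_total = n(),              # people in the bin
            n_cases = sum(MM_binary),   # people in the bin with MM
            frac    = mean(MM_binary),  # prevalence within the bin
            .groups = "drop")

mean_summary
```

The frac column can go straight into the figure_2 plotting code in place of alt_summary.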

I am so unbelievably grateful, this worked first time. I have been battling with this for a week, thank you!!!

Just another question about this - I have 6 disease measures altogether (MM_binary and then 5 other ones). Can this plot be made into a line graph where each line represents prevalence for each disease?

Thanks again,

Clare
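
For the follow-up question, one possible approach is to reshape the disease columns into long format, compute the prevalence per age group and disease, and draw one line per disease. This is a sketch under assumptions, not tested on the real data: it assumes every disease column is coded 0/1, and disease_b and disease_c are hypothetical names standing in for the other disease measures (only MM_binary comes from the thread).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)

# Simulated data: MM_binary is from the thread; disease_b and disease_c
# are hypothetical placeholders for the other 0/1 disease columns.
trimmed_df <- data.frame(
  age_group = sample(letters[1:5], 200, replace = TRUE),
  MM_binary = sample(c(0, 1), 200, replace = TRUE),
  disease_b = sample(c(0, 1), 200, replace = TRUE),
  disease_c = sample(c(0, 1), 200, replace = TRUE)
)

# One row per person and disease, then prevalence within each
# age_group / disease combination (mean of a 0/1 indicator).
long_summary <- trimmed_df %>%
  pivot_longer(cols = -age_group,
               names_to = "disease",
               values_to = "has_disease") %>%
  group_by(age_group, disease) %>%
  summarise(frac = mean(has_disease), .groups = "drop")

# group = disease is needed because age_group is a discrete x axis
figure_3 <- long_summary %>%
  ggplot(aes(x = age_group, y = frac, colour = disease, group = disease)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(0, 1),
                     labels = scales::percent_format()) +
  theme(legend.position = "top") +
  labs(title = "Prevalence by age group and disease",
       y = "% of age group",
       x = "Age group",
       colour = "Disease")

figure_3
```

With the real data, cols = -age_group would need to be replaced by the actual six disease columns if trimmed_df holds any other variables.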
