Showing counts at the top of bar charts

KyleWhoIsTall · October 27, 2021, 1:20pm

I'm new to R, and several hours of searching have driven me insane. I'm working on the Tidy Tuesday project this week and want to display the count of first-place finishes for each runner nationality. So far my research has only gotten me to display the number 1. I think where I am running into trouble is figuring out how to get my count displayed properly, possibly due to the data not being loaded properly or the filter throwing things off. Then again, I might be completely off. Thanks in advance for any assistance!

library(tidyverse)
library(ggplot2)
library(readr)
library(dplyr)

ultra_rankings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/ultra_rankings.csv')
race <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/race.csv')

gt <- ultra_rankings %>% #New data frame with all the data controls
  filter(rank==1) %>% #Only results in showing rows with first place finishes
  group_by(nationality) %>% 
  count(nationality) %>%
  arrange(-n) %>% 
  head(10)

gt$nationality <- factor(gt$nationality, levels = unique(gt$nationality))


ultra_rankings %>%
  ggplot(data = gt,mapping = aes(x=nationality, y=n))+
  geom_bar(stat = "identity", fill="#000000")+
  geom_text(stat = 'count', data = gt, aes(label = after_stat(count), y = after_stat(count), vjust = -25))+
  labs(
    title = "First Place Rankings by Runner Nationality", 
    caption = "Data from runrepeat.com"
      )+
    
  scale_x_discrete(
      labels=c("USA", "UK", "France", "Australia", "Spain", "Sweden", "China", "Japan", "Poland", "Hong Kong")
      )+

  theme(
    plot.title = element_text(hjust = .5))+
    ylab("Total First Place Finishes")+
    xlab("Runner Nationalities")

xvalda · October 27, 2021, 2:03pm

Hi @KyleWhoIsTall ,

The count of runners per nationality is already in the n column of your tibble, so the geom_text line can be simpler: map label = n directly instead of using stat = 'count'.
I spotted a few additional things:

You hard-coded the full country names in the x scale, which you shouldn’t do because it is not reproducible and is prone to error (and indeed you wrote Hong Kong, which doesn’t appear among the country abbreviations). So I created a “countries” tibble that matches each abbreviation with its full country name. Feel free to expand the list to all other countries.
I also deleted the line where you convert nationality into a factor; it is quicker to handle the ordering in the aes part with fct_reorder.
Hope it helps

library(tidyverse)

ultra_rankings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/ultra_rankings.csv')

gt <- ultra_rankings %>% # New data frame: top 10 nationalities by first-place finishes
  filter(rank==1) %>% # Keep only rows with first-place finishes
  group_by(nationality) %>% 
  count(nationality) %>%
  arrange(-n) %>% 
  head(10)

# create reference tibble matching each abbreviation with its full country name
countries_abr <- c("USA", "GBR", "FRA", "AUS", "ESP", "SWE", "CHN", "CAN", "JPN", "POL")
countries_full <- c("USA", "UK", "France", "Australia", "Spain", "Sweden", "China", "Canada", "Japan", "Poland")
countries <- tibble(countries_full, countries_abr)

#plot
gt %>% left_join(countries, by = c("nationality" = "countries_abr")) %>% 
  ggplot(aes(x=fct_reorder(countries_full, desc(n)), y=n))+
  geom_bar(stat = "identity", fill="#000000")+
  geom_text(aes(label = n), vjust = -0.5, size = 3) + 
  labs(
    title = "First Place Rankings by Runner Nationality", 
    caption = "Data from runrepeat.com", 
    y = "Total First Place Finishes", 
    x = "Runner Nationalities"
  )
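
For context on why the original plot only showed 1: stat = 'count' counts the rows for each x value, and a tibble that has already been aggregated has exactly one row per nationality. Here is a minimal sketch of that behaviour on a toy data frame (the nationality codes and counts are made up for illustration, they are not from the ultra_rankings data):

```r
library(ggplot2)

# Toy pre-aggregated data: one row per nationality (made-up values)
toy <- data.frame(nationality = c("AAA", "BBB", "CCC"), n = c(12, 7, 3))

# stat = "count" tallies rows per x value, ignoring the n column entirely
p_count <- ggplot(toy, aes(x = nationality)) +
  geom_text(stat = "count",
            aes(label = after_stat(count), y = after_stat(count)))

# Mapping label = n uses the counts that are already in the data
p_n <- ggplot(toy, aes(x = nationality, y = n)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.5)

# layer_data() returns the data a layer actually draws
layer_data(p_count)$count   # one row per nationality, so every count is 1
layer_data(p_n, 2)$label    # the values of n
```

layer_data() is a handy way to check what a stat computed before worrying about label placement.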

KyleWhoIsTall · October 27, 2021, 4:26pm

Had to do some research on fct_reorder as I hadn't come across that yet but it works!

Thank you for the advice on the countries problem as well, I appreciate it. I think I am going to future-proof it and finish coding in the rest of the countries in case it changes in the future. I hadn't worked on anything yet that has gone in depth on joins.
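
While the lookup table is still incomplete, one possible way to spot abbreviations that have no full name yet is anti_join. This is only a sketch on toy data: the counts are made up, the lookup is deliberately missing an entry, and the "HKG" code is an assumption for illustration rather than something checked against the dataset.

```r
library(dplyr)

# Toy counts and a deliberately incomplete lookup table (made-up values)
counts <- tibble(nationality = c("USA", "GBR", "HKG"), n = c(10, 5, 2))
lookup <- tibble(
  countries_abr  = c("USA", "GBR"),
  countries_full = c("USA", "UK")
)

# left_join keeps every row of counts; unmatched codes get NA as the full name
joined <- left_join(counts, lookup, by = c("nationality" = "countries_abr"))

# anti_join returns the rows of counts that have no match in lookup
missing_codes <- anti_join(counts, lookup, by = c("nationality" = "countries_abr"))

# Fall back to the abbreviation wherever the full name is missing
joined <- mutate(joined, countries_full = coalesce(countries_full, nationality))
```

Here missing_codes lists the codes still to be added to the lookup, and the coalesce fallback keeps the axis labels from turning into NA in the meantime.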
