Thanks for the reprex; it was just missing these library calls:
library(babynames)
library(dplyr)
library(ggplot2)
Not a biggie.
Here's the data being passed to ggplot:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(babynames))
babynames %>%
  group_by(sex) %>%
  top_n(5, n) %>%
  ungroup() %>%
  select(sex, name, year, n) %>%
  arrange(sex, desc(n))
#> # A tibble: 10 x 4
#>    sex   name     year     n
#>    <chr> <chr>   <dbl> <int>
#>  1 F     Linda    1947 99686
#>  2 F     Linda    1948 96209
#>  3 F     Linda    1949 91016
#>  4 F     Linda    1950 80432
#>  5 F     Mary     1921 73982
#>  6 M     James    1947 94756
#>  7 M     Michael  1957 92695
#>  8 M     Robert   1947 91642
#>  9 M     Michael  1956 90620
#> 10 M     Michael  1958 90520
Created on 2020-03-02 by the reprex package (v0.3.0)
The plot with geom_col() is about as condensed a representation of the data as possible. It answers the question:
For each name, how many occurrences?
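The original plot code isn't reproduced here, so this is only a minimal sketch of what a geom_col() plot of this data might look like. Putting `name` on the x-axis and `fill = sex` are my assumptions, not necessarily what the original post used. To run without babynames, it types in the ten rows printed above. It also shows why the plot answers "how many occurrences per name": Linda and Michael each appear in several rows, and geom_col() stacks those rows by default, so each bar's height is the sum of `n` for that name.

```r
library(ggplot2)

# The ten rows printed above, typed in by hand so this runs without babynames
top5 <- data.frame(
  sex  = rep(c("F", "M"), each = 5),
  name = c("Linda", "Linda", "Linda", "Linda", "Mary",
           "James", "Michael", "Robert", "Michael", "Michael"),
  year = c(1947, 1948, 1949, 1950, 1921, 1947, 1957, 1947, 1956, 1958),
  n    = c(99686L, 96209L, 91016L, 80432L, 73982L,
           94756L, 92695L, 91642L, 90620L, 90520L)
)

# Assumed aesthetics: one bar per name, coloured by sex.
# geom_col() stacks repeated names, so bar height = total n for that name.
p <- ggplot(top5, aes(x = name, y = n, fill = sex)) +
  geom_col()

# The same per-name totals the bars encode, computed explicitly
totals <- aggregate(n ~ name, data = top5, FUN = sum)
totals
```

The year information is lost entirely in this view, which is the point of the question that follows.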
So the question for the analyst is: what else should the plot draw attention to? The change in rank over the years? Which sex is more consistently in the top five?
From the question comes the plot.
What is the question?