Hi,
I'm trying to create a barplot with bars ordered from the most frequent category to the least frequent one (btw, this is the right plot for factor variables, right? A boxplot would only make sense for a categorical x and a continuous y). I know of this question, which is similar:
But it's not the same: I don't have any facets here. my_df has only two columns: month, containing abbreviations of the first 10 months of the year, and state, which is either "on" or "off". I want to create a barplot which shows the counts for each month, ideally split by state, and ordered by count. I tried ordering my dataframe by month count (sorted_df_easy) or by state and month count (sorted_df_hard) before plotting it. Neither approach works:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
library(ggplot2)
# library(microbenchmark)
n <- 10^5
key <- as.factor(sample(month.abb[1:10], 10))
my_df <- data.frame(month = sample(key, n, replace = TRUE, prob = seq(0.1, 1, 0.1)),
                    state = sample(c("on", "off"), n, replace = TRUE))
my_df$month[sample(seq_len(n), 100)] <- NA
sorted_df_easy <- my_df %>%
  count(month) %>%
  arrange(-n)
# this doesn't work
ggplot(sorted_df_easy, aes(x = month, y = n)) +
  geom_bar(stat="identity") +
  coord_flip()
sorted_df_hard <- my_df %>%
  count(state, month) %>%
  arrange(state, -n)
# of course, this is even worse
ggplot(sorted_df_hard, aes(x = month, y = n, fill = state)) +
  geom_bar(stat="identity") +
  coord_flip()
Created on 2018-09-04 by the reprex package (v0.2.0).
Any solutions? I'd rather not use forcats - this is for an edge system, and the less stuff I depend on, the better (that's why I don't load tidyverse, btw). Of course, if the forcats solution is considerably shorter and more readable than the non-forcats solution, I could change my mind.
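For context, my understanding is that ggplot2 orders a discrete axis by the factor levels rather than by row order, which would explain why arrange() has no visible effect. Below is a minimal sketch of the kind of non-forcats approach I have in mind. It uses a small made-up data frame (toy_df is hypothetical, not my real my_df) and resets the factor levels by count with base R only. It is an illustration under those assumptions, not a tested solution for the full data.

```r
library(ggplot2)

# Hypothetical toy data standing in for my_df: fixed counts so the result is deterministic
toy_df <- data.frame(
  month = factor(rep(c("Jan", "Feb", "Mar"), times = c(20, 50, 30))),
  state = rep(c("on", "off"), length.out = 100)
)

# Count each month (table() drops NA by default) and sort ascending;
# with coord_flip() the first level is drawn at the bottom, so the
# most frequent month ends up at the top
month_counts <- sort(table(toy_df$month))

# Rebuild the factor with levels in count order
toy_df$month <- factor(toy_df$month, levels = names(month_counts))

# geom_bar() counts rows itself, so no pre-aggregated data frame is needed
ggplot(toy_df, aes(x = month, fill = state)) +
  geom_bar() +
  coord_flip()
```

With my real data I would presumably also have to drop the rows where month is NA before plotting, since otherwise they seem to get a bar of their own.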