How to Manually Order X axis on bar chart

df <- read.csv(url('https://raw.githubusercontent.com/angelddaz/bridgetomasters/master/CSVs/ye_data.csv'))

#install.packages("ggplot2")
colnames(df)[colnames(df)=="Kanye.dataset"] <- "album_name"

# CD 2004
# LR 2005
# Graduation 2007
# 808s 2008
# MBDTF 2010
# Yeezus 2013
# TLOP 2016

# we want to add a new column called release year associated with each album
df$release_year[df$album_name=="College Dropout"] <- "2004"
df$release_year[df$album_name=="Late Registration"] <- "2005"
df$release_year[df$album_name=="Graduation"] <- "2007"
df$release_year[df$album_name=="808s & Heartbreak"] <- "2008"
df$release_year[df$album_name=="My Beautiful Dark Twisted Fantasy"] <- "2010"
df$release_year[df$album_name=="Yeezus"] <- "2013"
df$release_year[df$album_name=="The Life of Pablo"] <- "2016"


library(ggplot2)
# Automatic levels
ggplot(df, aes(factor(album_name))) + geom_bar()    

# Manual levels
ry_table <- table(df$release_year) # release year table
ry_levels <- names(ry_table)[order(ry_table)]

qplot(reorder(factor(release_year),factor(release_year),length),data=df,geom="bar")

This R code produces two bar charts: one with the x axis in the default order, and one ordered by the number of rows for each release year. I want to order the bars by release year instead, which I have hard coded as values in a new column of this dataset. I also want the x axis bar labels to be the album names, not the release years. Could I add a "by" to the ordering in the ry_levels variable?
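As an aside on the hard-coded column: the seven assignments could be collapsed into one named lookup vector. This is only a sketch on a small made-up data frame (`toy` and `release_years` are illustrative names, not part of the original script), and it stores the year as a number rather than a string so that it sorts numerically.

```r
# Hypothetical stand-in for the CSV: one row per track, album names as in the post
toy <- data.frame(
  album_name = c("Yeezus", "Graduation", "College Dropout", "Graduation"),
  stringsAsFactors = FALSE
)

# Named vector: names are album titles, values are the release years listed in the post
release_years <- c(
  "College Dropout" = 2004,
  "Late Registration" = 2005,
  "Graduation" = 2007,
  "808s & Heartbreak" = 2008,
  "My Beautiful Dark Twisted Fantasy" = 2010,
  "Yeezus" = 2013,
  "The Life of Pablo" = 2016
)

# Index the lookup by album name; unname() drops the album titles from the result
toy$release_year <- unname(release_years[as.character(toy$album_name)])
toy
```

Any album name that is missing from the lookup ends up with `NA` as its release year, which makes typos in the names easy to spot.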

You can do this by specifying the levels argument of the factor function. If I am understanding your question correctly, you are looking to rearrange the x axis of your first plot. Here is the code that will give you what you want (note: I moved the factor call into a mutate function from dplyr to make the ggplot call a little cleaner, IMO):

library(ggplot2)
library(dplyr)
# Manual levels: albums listed in release order
df %>% 
  dplyr::mutate(album_name = factor(album_name, 
                                    levels = c("College Dropout", "Late Registration", "Graduation",
                                               "808s & Heartbreak", "My Beautiful Dark Twisted Fantasy",
                                               "Yeezus", "The Life of Pablo"))) %>% 
  ggplot(aes(album_name)) + geom_bar()

That gives you a bar chart with the albums ordered by release year along the x axis.
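To see why this works, here is a minimal sketch on made-up rows (the `toy` data frame is hypothetical, not the CSV from the question). ggplot2 places the categories of a discrete axis in the order of the factor's levels, and a factor built without a `levels` argument gets its levels in sorted order.

```r
# Hypothetical data: three albums whose sorted order differs from release order
toy <- data.frame(
  album_name = c("The Life of Pablo", "Graduation", "Late Registration", "Graduation"),
  stringsAsFactors = FALSE
)

# Default: levels are the sorted unique values
default_levels <- levels(factor(toy$album_name))
default_levels

# Explicit: levels appear exactly in the order given (release order here)
album_order <- c("Late Registration", "Graduation", "The Life of Pablo")
toy$album_name <- factor(toy$album_name, levels = album_order)
levels(toy$album_name)

# table() follows level order too, which is a quick check without plotting
table(toy$album_name)

# The plot is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(album_name)) + geom_bar()
  print(p)
}
```

One thing to watch for: any value that does not appear in `levels` becomes `NA`, so the album names have to match the data exactly.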


Wow awesome thank you so much. This is exactly what I had in mind. I'm not familiar with dplyr so this is good exposure to it.

# one row per distinct album (named album_levels so it does not shadow base R's levels())
album_levels <- data.frame(album_name = unique(df$album_name))

# we want to add a new column called release year associated with each album
album_levels$release_year[album_levels$album_name=="College Dropout"] <- "2004"
album_levels$release_year[album_levels$album_name=="Late Registration"] <- "2005"
album_levels$release_year[album_levels$album_name=="Graduation"] <- "2007"
album_levels$release_year[album_levels$album_name=="808s & Heartbreak"] <- "2008"
album_levels$release_year[album_levels$album_name=="My Beautiful Dark Twisted Fantasy"] <- "2010"
album_levels$release_year[album_levels$album_name=="Yeezus"] <- "2013"
album_levels$release_year[album_levels$album_name=="The Life of Pablo"] <- "2016"

album_levels <- transform(album_levels, release_year = as.numeric(release_year)) # changing to numeric so I can sort
album_levels <- album_levels[order(album_levels$release_year),]
album_levels <- as.character(album_levels$album_name) # changing to a single vector to pass into dplyr mutate function

# now we can pass in a variable instead of hard coding the album names

df %>% 
  dplyr::mutate(album_name = factor(album_name, levels = album_levels)) %>% 
  ggplot(aes(album_name)) + geom_bar()


As an FYI to readers, I worked out a few lines so that the album order passed to factor() inside mutate isn't hard coded. Or at least, the order is now hard coded earlier in the data wrangling, through the release years.
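A shorter route to the same ordering, sketched here on made-up rows rather than the real CSV, might be base R's `reorder()`, which sorts a factor's levels by a summary of a second variable (the mean by default). This is close to the "by" asked about in the original question. It assumes `release_year` is numeric, and `toy` is a hypothetical name.

```r
# Hypothetical data: album names with a numeric release year on every row
toy <- data.frame(
  album_name = c("The Life of Pablo", "Graduation", "Late Registration", "Graduation"),
  release_year = c(2016, 2007, 2005, 2007),
  stringsAsFactors = FALSE
)

# Levels of album_name are sorted by the mean release_year within each album
toy$album_name <- reorder(toy$album_name, toy$release_year)
levels(toy$album_name)

# The plot is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(album_name)) + geom_bar()
  print(p)
}
```

Because every row of an album carries the same year, the per-album mean is just that year, so the bars come out in release order while the axis labels stay as album names.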
