Axis labels GGPlot

shp5009 · February 25, 2020, 2:06am

I need to hard-code the axis labels for my variables. I can't use the row names because they are far too long, nondescriptive, and confusing. For now I need to do this ad hoc and hard-code labels only for the variables in my plots. I can't create a new set of more reasonable names yet, as there are too many variables to get through in a short time.

I want the plots to use the following labels:
indicator1 = "motorcycle"
indicator2 = "tricycle"
indicator3 = "truck"
indicator4 = "car"
indicator5 = "minivan"
I tried the below code, but it didn't work.

library(ggplot2)  # the only package this code uses

## Data 
df_1 <- data.frame(
  indicator1   = c(1,0,1,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,0,1),
  indicator2   = c(1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0),
  indicator3   = c(1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0),
  indicator4   = c(0,1,0,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0),
  indicator5   = c(0,0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,0))

print(df_1)

mean_1 = apply(df_1,2,mean)
mean_1

# need to sort the means  
mean_sort = data.frame(sort(mean_1, decreasing = TRUE))
mean_sort

# Start the indicator subset code for the different sections.
# get row names
n = row.names(mean_sort)
n

# I only want the top 2 and bottom 2 variables.  
df_indicator_tail = tail(mean_sort,2)
df_indicator_tail
df_indicator_head = head(mean_sort,2)
df_indicator_head
df_indicator_rbind = rbind(df_indicator_head,df_indicator_tail)
df_indicator_rbind

# get the mean column name back 
names(df_indicator_rbind) = "mean"
df_indicator_rbind

# create a column which is the sequence of the sort so I can do plots in order of mean value not alphabetic
df_indicator_rbind$x = seq(nrow(df_indicator_rbind))
df_indicator_rbind

# Plot: Need to label the y-axis tick marks to reasonable variable names 
ggplot(data=df_indicator_rbind) +
  geom_col(aes(x=x, y=mean,fill=mean), position=position_dodge())+
  theme(axis.title.y=element_blank())+
  labs(title = "Index by Indicator",
       y = "Index") +
  coord_flip()
#labels = c(x,"truck","minivan","motorcycle","car")  Didn't work.

williaml · February 25, 2020, 2:58am

Not sure if your code is correct.

I made this change as df_indicator doesn't exist.

# I only want the top 2 and bottom 2 variables.  
df_indicator_tail = tail(df_1,2) # I think you meant this
df_indicator_tail
df_indicator_head = head(df_1,2) # I think you meant this
df_indicator_head
df_indicator_rbind = rbind(df_indicator_head, df_indicator_tail)
df_indicator_rbind

After running this, it looks a bit weird. Is it meant to look like this?

# get the mean column name back 
names(df_indicator_rbind) = "mean"
df_indicator_rbind

# create a column which is the sequence of the sort so I can do plots in order of mean value not alphabetic
df_indicator_rbind$x = seq(nrow(df_indicator_rbind))
df_indicator_rbind

> df_indicator_rbind
   mean NA NA NA NA NA x
1     1  1  1  0  0  1 1
2     0  1  0  1  0  2 2
19    0  0  1  0  0  3 3
20    1  0  0  0  0  4 4

shp5009 · February 25, 2020, 3:08am

My bad. I should have cleared my work environment. This should work.

DavoWW · February 25, 2020, 3:08am

Do you simply want a bar chart of the mean of each column, sorted by size, with only the top 2 and bottom 2 columns shown? If yes, then this works:

library(dplyr)
library(ggplot2)
library(tidyr)

# Data
df_1 <- data.frame(
  indicator1 = c(1,0,1,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,0,1),
  indicator2 = c(1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0),
  indicator3 = c(1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0),
  indicator4 = c(0,1,0,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0),
  indicator5 = c(0,0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,0))

names(df_1) <- c("motorcycle", "tricycle", "truck", "car", "minivan")

df_1 %>% 
  summarise_all(mean) %>% 
  sort(.) -> cols

# keep the two smallest and the two largest means
keep_cols <- cols[c(1, 2, length(cols) - 1, length(cols))]

keep_cols %>% 
  pivot_longer(., cols=everything()) %>% 
  ggplot(.,) +
  geom_col(aes(x=reorder(name, value), y=value), fill="blue") +
  labs(title = "Index by Indicator", y = "Index", x = "Indicator") +
  coord_flip()

HTH
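
The same top-and-bottom selection can be generalized to any number of columns. Here is a minimal base-R sketch, assuming every column is numeric. `top_bottom_means()` and its `n` argument are hypothetical names introduced for illustration, and the original `indicator` column names are used rather than the renamed ones:

```r
# Hypothetical helper: means of the n lowest and n highest columns, ascending.
top_bottom_means <- function(df, n = 2) {
  m <- sort(colMeans(df))          # named numeric vector, ascending
  stopifnot(length(m) >= 2 * n)    # otherwise head and tail would overlap
  c(head(m, n), tail(m, n))
}

df_1 <- data.frame(
  indicator1 = c(1,0,1,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,0,1),
  indicator2 = c(1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0),
  indicator3 = c(1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0),
  indicator4 = c(0,1,0,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0),
  indicator5 = c(0,0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,0))

keep <- top_bottom_means(df_1, n = 2)

# Long format for ggplot: one row per kept column
plot_df <- data.frame(name = names(keep), value = unname(keep))
plot_df
```

With a couple hundred columns, calling `top_bottom_means(df, n = 10)` would give the 20 columns to plot.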

shp5009 · February 25, 2020, 3:24am

That does work for this example. But I have a couple hundred columns all with ungodly, long, useless variable names. Then I'm sorting by the mean and keeping the top 10 variables and the bottom 10 variables. So I only want to create labels (shorter, useful variable names) for these 20.

It is a temporary fix until I can go through the hundreds of variables and make useful names.

Any suggestions? Thanks.

DavoWW · February 25, 2020, 3:38am

Have you seen the clean_names() function in the {janitor} package? It may help.
Otherwise, get the long, horrible names into a vector and use string editing to fix them (janitor::make_clean_names() can be used here too). Can you post an example of what you have and what you want?
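
A rough sketch of the string-editing route, assuming the {janitor} package is installed. The long names below are made up for illustration and are not taken from the poster's data:

```r
library(janitor)

# Hypothetical sentence-like column names (invented for this example)
long_names <- c(
  "Does the household own a motorcycle? (Yes/No)",
  "Does the household own a tricycle? (Yes/No)"
)

# make_clean_names() returns unique, syntactically valid names
clean <- make_clean_names(long_names)

# Keep a lookup table so the original names are not lost
name_lookup <- data.frame(original = long_names, clean = clean)
name_lookup
```

Cleaning makes names valid and consistent but not shorter, so sentence-length names may still need a hand-written lookup.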

shp5009 · February 25, 2020, 3:48am

I have looked at that package. The names are essentially sentences. I need to spend some time and get them fixed, but am looking at a temporary quick fix for a report that I have to get out.

Eventually, I need to rename all of the variables to make my life easier. Isn't there a way to hard code the labels for the quick fix?

DavoWW · February 25, 2020, 4:09am

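You can manually specify the tick mark labels with the labels argument of scale_x_discrete(). An unnamed vector is applied by position, in the order the categories appear on the axis (here, ascending mean, because of reorder()):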

keep_cols %>%
  pivot_longer(., cols=everything()) %>%
  ggplot(.,) +
  geom_col(aes(x=reorder(name, value), y=value), fill="blue") +
  labs(title = "Index by Indicator", y = "Index", x="Indicator") +
  coord_flip() +
  scale_x_discrete(labels=c("Label B", "Label Z", "Label A", "Label G"))

HTH
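
Because an unnamed `labels` vector is matched to the axis by position, it can silently mislabel bars if the sort order changes. One hedged alternative is to pass a named vector, which ggplot2 matches to the axis values by name. This is a minimal sketch, assuming the data keeps the original column names `indicator1` to `indicator5` and using the label mapping from the first post. `short_labels` and `plot_df` are illustrative names:

```r
library(ggplot2)

# Means of the four kept columns (values computed from the thread's df_1)
plot_df <- data.frame(
  name  = c("indicator4", "indicator1", "indicator3", "indicator5"),
  value = c(0.30, 0.35, 0.50, 0.50)
)

# Lookup: names are the original column names, values are the display labels.
# Entries for columns that are not plotted (indicator2 here) are ignored.
short_labels <- c(
  indicator1 = "motorcycle",
  indicator2 = "tricycle",
  indicator3 = "truck",
  indicator4 = "car",
  indicator5 = "minivan"
)

p <- ggplot(plot_df) +
  geom_col(aes(x = reorder(name, value), y = value), fill = "blue") +
  scale_x_discrete(labels = short_labels) +  # matched by name, not position
  labs(title = "Index by Indicator", y = "Index", x = "Indicator") +
  coord_flip()
p
```

The lookup only needs entries for the columns that end up in the plot, so for a top-10/bottom-10 chart that is 20 short labels rather than a few hundred.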

shp5009 · February 25, 2020, 4:35am

Yes. That works as a temporary solution for this report that I need to get out.

I'm going to use your first solution after I make a list of new variable names. Your code should work great with that since I can't change the names on the database. That will be much more error proof than the temporary solution. Thanks for the help!
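
For that longer-term plan, one possible shape is a lookup of old-to-new names applied after the data is read into R, which leaves the database untouched. This is a minimal base-R sketch. `new_names` is a hypothetical lookup and the data frame is a small stand-in, not the real data:

```r
# Small stand-in for data pulled from the database
df <- data.frame(
  indicator1 = c(1, 0),
  indicator2 = c(1, 1),
  indicator3 = c(0, 0)
)

# Hypothetical lookup: names are the original columns, values the new names.
# It may cover only some columns; the rest keep their original names.
new_names <- c(indicator1 = "motorcycle", indicator3 = "truck")

# Rename only the columns that appear in the lookup
hits <- names(df) %in% names(new_names)
names(df)[hits] <- new_names[names(df)[hits]]
names(df)
```

Since the lookup can be partial, it can be filled in gradually as the variables are reviewed.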
