ggplot beginner - probably easy to solve

Tom3 · February 17, 2021, 11:07pm

Hello everyone,
I'm trying to make ggplots for a larger data set, and so far I have been only partly successful.

This is my code:
library(readxl)
library(ggplot2)
library(viridis)

XLINKFORR <- read_excel("C:/Users/thoma/OneDrive/Desktop/XLINKFORR.xlsx")

ggplot(XLINKFORR,aes(x=gene,y=peptides,fill=-log10(p.value))) +
geom_col(position="stack",width=0.4) +
coord_flip() + scale_fill_viridis(option="plasma")+
facet_grid(family~.)

This is my plot

My problem is that all genes are repeated in the plot (U2SURP - CHERP), but only CHERP,
DDX42, DHX15, PUF60 and U2SURP are 17S U2 associated. How can I prevent the repetition of the other genes?

technocrat · February 18, 2021, 1:08am

See the FAQ: How to do a minimal reproducible example reprex for beginners. Questions like this are much easier to troubleshoot, and far more likely to attract useful answers, with a complete reprex, including representative data.
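
As a rough illustration of what such a reprex could contain, the sketch below builds a tiny stand-in data set and prints it with `dput()` so it can be pasted into a post. The gene names come from the question and the column names mirror the posted code, but the values and the family labels "A" and "B" are invented for illustration and are not the poster's data.

```r
# Hypothetical stand-in for the spreadsheet; values and family labels are invented.
example_df <- data.frame(
  gene     = c("CHERP", "DDX42", "CHERP", "PUF60"),
  family   = c("A", "A", "B", "B"),
  peptides = c(12, 21, 31, 14),
  p.value  = c(0.025, 0.05, 0.075, 0.1)
)

# dput() prints R code that recreates the object, ready to paste into a question.
dput(example_df)

# Round trip: evaluating the deparsed text rebuilds the same data frame.
rebuilt <- eval(parse(text = paste(deparse(example_df), collapse = "\n")))
identical(rebuilt, example_df)
```

With real data, something like `dput(head(XLINKFORR, 20))` would usually be enough, provided those rows still show the problem.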

nirgrahamuk · February 18, 2021, 1:20am

Here is one approach


(XLINKFORR <- data.frame(
  gene=factor(letters[c(1,2,1,3)]),
  family=c(1,1,2,2),
  peptides=c(12,21,31,14),
  p.value=(1:4)/40
))
library(viridis)
library(tidyverse)

(virid_scale_extent <- range(-log10(XLINKFORR$p.value)))


# ggplot(XLINKFORR,aes(x=gene,y=peptides,fill=-log10(p.value))) +
#   geom_col(position="stack",width=0.4)+
#   coord_flip() + scale_fill_viridis(option="plasma",limits=virid_scale_extent)+
#   facet_grid(family~.)
# 

(split_df <- XLINKFORR %>%
    group_by(family) %>% 
    group_split())

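# Re-create the gene factor within each piece so it keeps only the
# levels (genes) that occur in that family.
split_df_recode <- map(split_df,~mutate(.,
                     gene=factor(gene)))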

library(cowplot)

base_plot_func <- function(x){
  ggplot(x,aes(x=gene,y=peptides,fill=-log10(p.value))) +
    geom_col(position="stack",width=0.4) +
    coord_flip() + scale_fill_viridis(option="plasma",
                                      limits=virid_scale_extent)
}


plots <- purrr::map(split_df_recode,
                    ~{base_plot_func(.x) +
                        theme(legend.position = "none")
                        })

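# Take each label from its own piece so labels and plots stay in the same order.
g1 <- plot_grid(plotlist=plots,
          labels=map_chr(split_df, ~as.character(.x$family[1])),
          nrow = 2)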

# extract the legend from one of the plots
legend <- get_legend(
  # create some space to the left of the legend
  base_plot_func(split_df_recode[[1]]) + theme(legend.box.margin = margin(0, 0, 0, 12))
)

# add the legend to the right of the plot grid. rel_widths = c(6,1)
# gives the grid six parts of the total width and the legend one part.
plot_grid(g1, legend, rel_widths = c(6,1))
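
A note on the `factor(gene)` recode in the code above: after a split, each piece still carries every level of the original factor, and re-creating the factor keeps only the levels that occur in that piece. A minimal base-R sketch of that behaviour, using the same made-up letters as the toy data rather than real gene names:

```r
gene   <- factor(c("a", "b", "a", "c"))
family <- c(1, 1, 2, 2)

# Subsetting keeps every level of the original factor, used or not.
family1 <- gene[family == 1]
levels(family1)              # "a" "b" "c"

# Re-creating the factor keeps only the levels that actually occur.
levels(factor(family1))      # "a" "b"

# droplevels() is the base-R function dedicated to the same job.
levels(droplevels(family1))  # "a" "b"
```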

Tom3 · February 18, 2021, 2:07am

Thank you very much, it works! Really a big help for me - will take me a while to really understand the process but your comments help.

p.s. sorry for not shortening my question to an example

Tom3 · February 18, 2021, 4:00pm

Additional question:
With the code provided (thanks again!), all plots come out the same size no matter how many genes are in a family. Is there a way to make the plot size depend on the number of genes per family, so that all bars have the same width?
I tried different things (continuous scales etc.) but nothing works, I guess because the genes are distinct names and not numbers.
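
One possible direction, offered as a sketch and not as a tested answer for this exact data set: `facet_grid()` accepts `scales = "free_y"` and `space = "free_y"`, which lets each row show only its own genes and makes each panel's height proportional to the number of genes in it, so the bars should come out equally thick. The sketch maps `gene` to `y` directly instead of using `coord_flip()` (this assumes ggplot2 3.3.0 or later) and uses `scale_fill_viridis_c()` from ggplot2 in place of the viridis package. The data and the family labels `fam1` and `fam2` are hypothetical.

```r
library(ggplot2)

# Toy data in the shape of the example above; family labels are hypothetical.
toy <- data.frame(
  gene     = c("a", "b", "c", "a"),
  family   = c("fam1", "fam1", "fam1", "fam2"),
  peptides = c(12, 21, 31, 14),
  p.value  = (1:4) / 40
)

p <- ggplot(toy, aes(x = peptides, y = gene, fill = -log10(p.value))) +
  geom_col(width = 0.4) +
  scale_fill_viridis_c(option = "plasma") +
  # scales = "free_y": each row shows only its own genes.
  # space  = "free_y": panel height scales with the number of genes shown.
  facet_grid(family ~ ., scales = "free_y", space = "free_y")

p
```

If the cowplot approach is kept instead, `plot_grid()` has a `rel_heights` argument that could be set to the number of genes per family, though the bar thickness would then probably be only approximately equal because axes and labels take up a fixed amount of space in each plot.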
