ggplot beginner - probably easy to solve

Hello everyone,
I'm trying to make ggplots for a larger amount of data, and so far I have been partly, but not fully, successful.

This is my code:
library(readxl)
library(ggplot2)
library(viridis)

XLINKFORR <- read_excel("C:/Users/thoma/OneDrive/Desktop/XLINKFORR.xlsx")

ggplot(XLINKFORR, aes(x = gene, y = peptides, fill = -log10(p.value))) +
  geom_col(position = "stack", width = 0.4) +
  coord_flip() +
  scale_fill_viridis(option = "plasma") +
  facet_grid(family ~ .)

This is my plot: [plot image not included]

My problem is that all genes (U2SURP to CHERP) are repeated in every facet of the plot, but only CHERP, DDX42, DHX15, PUF60 and U2SURP are 17S U2 associated. How can I keep the other genes from being repeated?

See the FAQ: How to do a minimal reproducible example reprex for beginners. Questions like this are much easier to troubleshoot, and far more likely to attract useful answers, with a complete reprex, including representative data.
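To include representative data in a reprex, one common approach is to paste the output of base R's `dput()` run on a small subset of the data. A minimal sketch with a hypothetical toy data frame `toy`, standing in for the real spreadsheet (which is not available here):

```r
# Hypothetical toy data standing in for the real XLINKFORR spreadsheet
toy <- data.frame(
  gene     = c("a", "b", "a", "c"),
  family   = c(1, 1, 2, 2),
  peptides = c(12, 21, 31, 14),
  p.value  = (1:4) / 40
)

# dput() prints R code that recreates the object; pasting that output
# into the question lets others rebuild the data exactly
dput(head(toy))
```

With the real data this would be something like `dput(head(XLINKFORR, 20))`.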

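Here is one approach: split the data by family, drop the genes that do not occur in each subset, draw one plot per family with a shared color-scale range, and combine the plots with cowplot.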


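library(viridis)
library(tidyverse)

# Toy data: gene "a" occurs in both families
(XLINKFORR <- data.frame(
  gene=factor(letters[c(1,2,1,3)]),
  family=c(1,1,2,2),
  peptides=c(12,21,31,14),
  p.value=(1:4)/40
))

# Shared limits for the fill scale, so a color means the same value in every plot
(virid_scale_extent <- range(-log10(XLINKFORR$p.value)))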


# ggplot(XLINKFORR,aes(x=gene,y=peptides,fill=-log10(p.value))) +
#   geom_col(position="stack",width=0.4)+
#   coord_flip() + scale_fill_viridis(option="plasma",limits=virid_scale_extent)+
#   facet_grid(family~.)
# 

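# One data frame per family
(split_df <- XLINKFORR %>%
    group_by(family) %>%
    group_split())

# Re-creating the factor drops the genes that do not occur in that family
split_df_recode <- map(split_df,~mutate(.,
                     gene=factor(gene)))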

library(cowplot)

# One horizontal bar chart per family, all using the same fill limits
base_plot_func <- function(x){
  ggplot(x,aes(x=gene,y=peptides,fill=-log10(p.value))) +
    geom_col(position="stack",width=0.4) +
    coord_flip() + scale_fill_viridis(option="plasma",
                                      limits=virid_scale_extent)
}


plots <- purrr::map(split_df_recode,
                    ~{base_plot_func(.x) +
                        theme(legend.position = "none")
                        })

g1 <- plot_grid(plotlist=plots,
          labels=unique(XLINKFORR$family),
          nrow = 2)

# extract the legend from one of the plots
legend <- get_legend(
  # create some space to the left of the legend
  base_plot_func(split_df_recode[[1]]) + theme(legend.box.margin = margin(0, 0, 0, 12))
)

# add the legend next to the grid of plots. rel_widths = c(6, 1) gives
# the legend one-sixth of the width of the plot grid.
plot_grid(g1, legend, rel_widths = c(6,1))
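A different technique from the cowplot approach above, sketched here on the same toy data and assuming ggplot2 3.3.0 or later (which draws horizontal bars when `y` is discrete): keep a single faceted plot and let `facet_grid()` use free scales. With `coord_flip()` the genes sit on the x aesthetic, and `facet_grid(family ~ .)` can only free the y scale between rows, so this sketch maps `gene` to `y` directly instead of flipping.

```r
library(ggplot2)
library(viridis)

# Same toy data as in the answer (not the real spreadsheet)
XLINKFORR <- data.frame(
  gene     = factor(letters[c(1, 2, 1, 3)]),
  family   = c(1, 1, 2, 2),
  peptides = c(12, 21, 31, 14),
  p.value  = (1:4) / 40
)

p <- ggplot(XLINKFORR, aes(x = peptides, y = gene, fill = -log10(p.value))) +
  # horizontal bars without coord_flip()
  geom_col(width = 0.4, orientation = "y") +
  scale_fill_viridis(option = "plasma") +
  # scales = "free_y": each row shows only the genes present in that family
  # space  = "free_y": each row's height is proportional to its number of genes
  facet_grid(family ~ ., scales = "free_y", space = "free_y")

print(p)
```

This keeps one shared legend without cowplot; the trade-off is less freedom to style each family's plot separately.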

Thank you very much, it works! It is a big help for me. It will take me a while to really understand the process, but your comments help.

P.S. Sorry for not reducing my question to a minimal example.

Additional question:
With the code provided (thanks again!), all the plots come out the same size no matter how many genes are in a family. Is there any way to make each plot's size depend on the number of genes in its family, so that all bars have the same width?
I tried different things (continuous scales etc.), but nothing works, I guess because the genes are distinct names and not numbers.
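One possible adjustment to the cowplot approach, sketched on hypothetical toy data (and using base R `split()`/`lapply()` in place of `group_split()`/`map()`): stack the per-family plots in one column and pass the number of genes per family to `plot_grid()`'s `rel_heights`. Because each plot also reserves a fixed amount of space for its axes and labels, bar widths come out approximately, not exactly, equal.

```r
library(ggplot2)
library(viridis)
library(cowplot)

# Hypothetical toy data: family 1 has three genes, family 2 has two
XLINKFORR <- data.frame(
  gene     = factor(c("a", "b", "c", "a", "d")),
  family   = c(1, 1, 1, 2, 2),
  peptides = c(12, 21, 8, 31, 14),
  p.value  = (1:5) / 40
)

# Shared fill limits so colors are comparable across plots
virid_scale_extent <- range(-log10(XLINKFORR$p.value))

# One data frame per family; droplevels() removes genes not in that family
split_df <- lapply(split(XLINKFORR, XLINKFORR$family), droplevels)

# Number of distinct genes per family, used as relative plot heights
n_genes <- vapply(split_df, function(d) nlevels(d$gene), integer(1))

plots <- lapply(split_df, function(d) {
  ggplot(d, aes(x = gene, y = peptides, fill = -log10(p.value))) +
    geom_col(width = 0.4) +
    coord_flip() +
    scale_fill_viridis(option = "plasma", limits = virid_scale_extent) +
    theme(legend.position = "none")
})

# ncol = 1 stacks the plots; rel_heights scales each plot by its gene count
g1 <- plot_grid(plotlist = plots, ncol = 1, align = "v",
                labels = names(split_df), rel_heights = n_genes)
print(g1)
```

The legend can then be added next to `g1` exactly as in the answer.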
