Create barplot for Boolean values

Dear community,
I would like to create a barplot from this data frame:
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), geneB = c(0,1,1,1,1,0,0,1,1,0), geneC = c(0,1,1,1,0,0,1,1,0,1) ,cluster = c(1,3,1,2,1,3,2,3,2,1))

Each gene is either expressed (1) or not (0). For each cluster, I would like to draw a barplot of the expression ratio for each of the 3 genes, but I can't find a way to do that.

Best regards

Simon

Hi Simon!

If I understand you correctly, you can do the plot like this:

library(tidyverse)

mydf <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), geneB = c(0,1,1,1,1,0,0,1,1,0), geneC = c(0,1,1,1,0,0,1,1,0,1) ,cluster = c(1,3,1,2,1,3,2,3,2,1))
# Convert the data to long format
mydf_long <- mydf %>% pivot_longer(starts_with("gene"), names_to = "Gene", values_to = "Expression")
# Count occurrences per cluster
mydf_count <- mydf_long %>% group_by(cluster, Gene, Expression) %>% tally()
  

mydf_count%>%
  ggplot(aes(x=Gene,y = n,fill=factor(Expression)))+
  geom_bar(stat = "identity",position = "dodge")+
  facet_wrap(~cluster)+
  scale_fill_discrete(name = "Expression", labels = c("Silent", "Expressed"))

Created on 2021-01-12 by the reprex package (v0.3.0)

Explanation:

First, I convert your data into the long format, so that every observation has its own row. Then I count how many times each gene is expressed or silent in each cluster, and draw one bar chart per cluster.

Is that the kind of visualisation you were thinking of?

Hello,
Thank you for your answer.
Yes, this is the kind of representation I would like, but expressed as a ratio of expressed/silent for each gene (0-100%), instead of having 2 bars for the count of each category.
Thanks again

Will this work for you?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), 
                geneB = c(0,1,1,1,1,0,0,1,1,0), 
                geneC = c(0,1,1,1,0,0,1,1,0,1) ,
                cluster = c(1,3,1,2,1,3,2,3,2,1))
LongDF <- df %>% pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value")
head(LongDF)
#> # A tibble: 6 x 3
#>   cluster Gene  Value
#>     <dbl> <chr> <dbl>
#> 1       1 geneA     0
#> 2       1 geneB     0
#> 3       1 geneC     0
#> 4       3 geneA     1
#> 5       3 geneB     1
#> 6       3 geneC     1
SummaryDF <- LongDF %>% group_by(cluster, Gene) %>% 
  summarize(Fraction_Expressed = mean(Value))
#> `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
head(SummaryDF)
#> # A tibble: 6 x 3
#> # Groups:   cluster [2]
#>   cluster Gene  Fraction_Expressed
#>     <dbl> <chr>              <dbl>
#> 1       1 geneA              0.25 
#> 2       1 geneB              0.5  
#> 3       1 geneC              0.5  
#> 4       2 geneA              1    
#> 5       2 geneB              0.667
#> 6       2 geneC              0.667
ggplot(SummaryDF, aes(x = cluster, y = Fraction_Expressed, fill = Gene)) +
  geom_col(position = "dodge")

Created on 2021-01-12 by the reprex package (v0.3.0)
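
If the fraction should read as 0-100% with one panel per cluster, a variation along the same lines might look like the sketch below. It is only one possible layout, not the only way to do it. It assumes the scales package is installed (it normally comes along with ggplot2), and the names `PlotDF` and `p` are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                 geneB = c(0,1,1,1,1,0,0,1,1,0),
                 geneC = c(0,1,1,1,0,0,1,1,0,1),
                 cluster = c(1,3,1,2,1,3,2,3,2,1))

# Fraction of rows in which each gene is expressed, per cluster.
# Because the values are 0/1, the mean is the expressed fraction.
PlotDF <- df %>%
  pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value") %>%
  group_by(cluster, Gene) %>%
  summarize(Fraction_Expressed = mean(Value), .groups = "drop")

# One panel per cluster, y axis labelled as a percentage (assumed layout)
p <- ggplot(PlotDF, aes(x = Gene, y = Fraction_Expressed, fill = Gene)) +
  geom_col() +
  facet_wrap(~cluster, labeller = label_both) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(y = "Expressed")
print(p)
```

Setting `.groups = "drop"` also silences the `summarise()` regrouping message shown above.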


That's perfect, thanks to both of you!
