Create barplot for Boolean values

I would like to create a barplot from this data frame:
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), geneB = c(0,1,1,1,1,0,0,1,1,0), geneC = c(0,1,1,1,0,0,1,1,0,1) ,cluster = c(1,3,1,2,1,3,2,3,2,1))

Each gene is either expressed (1) or not (0). For each cluster, I would like a barplot showing the expression ratio of each of the 3 genes, but I can't find a way to do that.

Hi Simon!

If I understand you correctly, you can do the plot like this:


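library(tidyverse)
mydf <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), geneB = c(0,1,1,1,1,0,0,1,1,0), geneC = c(0,1,1,1,0,0,1,1,0,1) ,cluster = c(1,3,1,2,1,3,2,3,2,1))
mydf %>%
  #Convert data to the long format (one row per gene per observation)
  pivot_longer(-cluster, names_to = "Gene", values_to = "Expression") %>%
  #Count occurrences per cluster
  count(cluster, Gene, Expression) %>%
  ggplot(aes(x=Gene,y = n,fill=factor(Expression)))+
  geom_bar(stat = "identity",position = "dodge")+
  scale_fill_discrete(name = "Expression", labels = c("Silent", "Expressed"))+
  #One panel per cluster
  facet_wrap(~cluster)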

First, I convert your data into the long format, so that every observation has its own row. Then I count how many times each gene is expressed or silent in each cluster and create a bar chart per cluster.
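
One thing to watch for with the counting step: a gene that is always expressed in a cluster has no "silent" row at all, so its bar is simply absent. Here is a minimal sketch of how the counts could be padded with explicit zeros using tidyr's complete(). The names counts, Gene and Expression are just my choices for illustration, and this is one possible approach rather than the only one.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                   geneB = c(0,1,1,1,1,0,0,1,1,0),
                   geneC = c(0,1,1,1,0,0,1,1,0,1),
                   cluster = c(1,3,1,2,1,3,2,3,2,1))

counts <- mydf %>%
  # Long format: one row per gene per observation
  pivot_longer(-cluster, names_to = "Gene", values_to = "Expression") %>%
  # Number of silent (0) and expressed (1) observations per cluster and gene
  count(cluster, Gene, Expression) %>%
  # Add the combinations that never occur, with a count of 0
  complete(cluster, Gene, Expression, fill = list(n = 0))

counts
```

With this data, geneA is expressed in every observation of cluster 2, so that is the combination that gets a zero row.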

Is that the kind of visualisation you were thinking of?

Thank you for your answer.
Yes, this is the kind of representation I would like, but expressed as a ratio of expressed/silent for each gene (0-100%), instead of having 2 bars for the count of each category.
Thanks again.

Will this work for you?

library(tidyverse)
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), 
                geneB = c(0,1,1,1,1,0,0,1,1,0), 
                geneC = c(0,1,1,1,0,0,1,1,0,1) ,
                cluster = c(1,3,1,2,1,3,2,3,2,1))
# Long format: one row per gene per observation
LongDF <- df %>% pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value")
head(LongDF)
#> # A tibble: 6 x 3
#>   cluster Gene  Value
#>     <dbl> <chr> <dbl>
#> 1       1 geneA     0
#> 2       1 geneB     0
#> 3       1 geneC     0
#> 4       3 geneA     1
#> 5       3 geneB     1
#> 6       3 geneC     1
# The mean of a 0/1 column is the fraction of observations that are 1
SummaryDF <- LongDF %>% group_by(cluster, Gene) %>% 
  summarize(Fraction_Expressed = mean(Value))
#> `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
head(SummaryDF)
#> # A tibble: 6 x 3
#> # Groups:   cluster [2]
#>   cluster Gene  Fraction_Expressed
#>     <dbl> <chr>              <dbl>
#> 1       1 geneA              0.25 
#> 2       1 geneB              0.5  
#> 3       1 geneC              0.5  
#> 4       2 geneA              1    
#> 5       2 geneB              0.667
#> 6       2 geneC              0.667
ggplot(SummaryDF, aes(x = cluster, y = Fraction_Expressed, fill = Gene)) + geom_col(position = "dodge")
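
Since the question asked for a 0-100% scale, here is a sketch of how the same plot could show percentages on the y axis. It assumes the scales package is available (it is installed along with ggplot2). The axis titles and the use of factor(cluster) are my own additions, not part of the code above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                 geneB = c(0,1,1,1,1,0,0,1,1,0),
                 geneC = c(0,1,1,1,0,0,1,1,0,1),
                 cluster = c(1,3,1,2,1,3,2,3,2,1))

SummaryDF <- df %>%
  pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value") %>%
  group_by(cluster, Gene) %>%
  # Mean of the 0/1 values = fraction expressed; drop the grouping afterwards
  summarize(Fraction_Expressed = mean(Value), .groups = "drop")

# factor(cluster) makes the x axis discrete; label_percent() formats 0-1 as 0%-100%
p <- ggplot(SummaryDF, aes(x = factor(cluster), y = Fraction_Expressed, fill = Gene)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::label_percent(), limits = c(0, 1)) +
  labs(x = "Cluster", y = "Percent expressed")
p
```

Setting .groups = "drop" is optional here; it only silences the regrouping message that summarize() prints.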

That's perfect, thanks to both of you!

