# Create barplot for Boolean values

Dear community,
I would like to create a barplot from this data frame:
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0), geneB = c(0,1,1,1,1,0,0,1,1,0), geneC = c(0,1,1,1,0,0,1,1,0,1), cluster = c(1,3,1,2,1,3,2,3,2,1))

Each gene is either expressed (1) or not (0). For each cluster, I would like a barplot showing the expression ratio of each of the 3 genes, but I can't find a way to do that.

Best regards

Simon

Hi Simon!

If I understand you correctly, you can do the plot like this:

``````
library(tidyverse)

mydf <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                   geneB = c(0,1,1,1,1,0,0,1,1,0),
                   geneC = c(0,1,1,1,0,0,1,1,0,1),
                   cluster = c(1,3,1,2,1,3,2,3,2,1))
# Convert the data to long format: one row per observation and gene
mydf_long <- mydf %>%
  pivot_longer(starts_with("gene"), names_to = "Gene", values_to = "Expression")
# Count occurrences of each expression state per cluster and gene
mydf_count <- mydf_long %>%
  group_by(cluster, Gene, Expression) %>%
  tally()

# One panel per cluster, with side-by-side bars for silent and expressed
mydf_count %>%
  ggplot(aes(x = Gene, y = n, fill = factor(Expression))) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~cluster) +
  scale_fill_discrete(name = "Expression", labels = c("Silent", "Expressed"))
``````

Created on 2021-01-12 by the reprex package (v0.3.0)

Explanation:

First, I convert your data into the long format, so that every observation has its own row. Then I count how many times each gene is expressed or silent in each cluster and draw one bar chart per cluster.
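
To make those two steps concrete, here is a small sketch that rebuilds the counts and checks one of them by hand. It assumes dplyr and tidyr are installed. `count()` is used as a shorthand for `group_by()` followed by `tally()`, and the name `counts` is only illustrative.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                   geneB = c(0,1,1,1,1,0,0,1,1,0),
                   geneC = c(0,1,1,1,0,0,1,1,0,1),
                   cluster = c(1,3,1,2,1,3,2,3,2,1))

# Long format: one row per (observation, gene) pair, so 10 x 3 = 30 rows
mydf_long <- mydf %>%
  pivot_longer(starts_with("gene"), names_to = "Gene", values_to = "Expression")
nrow(mydf_long)

# count() groups, tallies and returns an ungrouped result in one call
counts <- mydf_long %>% count(cluster, Gene, Expression)

# Cluster 1 holds rows 1, 3, 5 and 10 of mydf; geneA is 1 only in row 5,
# so this shows n = 3 for Expression 0 and n = 1 for Expression 1
counts %>% filter(cluster == 1, Gene == "geneA")
```

Note that combinations which never occur get no row at all. For example, geneA is never silent in cluster 2, so that panel shows a single bar for geneA.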

Is that the kind of visualisation you were thinking of?

Hello,
Yes, this is the kind of representation I would like, but with the expression shown as a ratio for each gene (0-100%), instead of having 2 bars for the count of each category.
Thanks again

Will this work for you?

``````
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
df = data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                geneB = c(0,1,1,1,1,0,0,1,1,0),
                geneC = c(0,1,1,1,0,0,1,1,0,1),
                cluster = c(1,3,1,2,1,3,2,3,2,1))
# Long format: one row per observation and gene
LongDF <- df %>% pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value")
head(LongDF)
#> # A tibble: 6 x 3
#>   cluster Gene  Value
#>     <dbl> <chr> <dbl>
#> 1       1 geneA     0
#> 2       1 geneB     0
#> 3       1 geneC     0
#> 4       3 geneA     1
#> 5       3 geneB     1
#> 6       3 geneC     1
# The mean of a 0/1 column is the fraction of rows that are 1
SummaryDF <- LongDF %>% group_by(cluster, Gene) %>%
  summarize(Fraction_Expressed = mean(Value))
#> `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
head(SummaryDF)
#> # A tibble: 6 x 3
#> # Groups:   cluster [2]
#>   cluster Gene  Fraction_Expressed
#>     <dbl> <chr>              <dbl>
#> 1       1 geneA              0.25
#> 2       1 geneB              0.5
#> 3       1 geneC              0.5
#> 4       2 geneA              1
#> 5       2 geneB              0.667
#> 6       2 geneC              0.667
ggplot(SummaryDF, aes(x = cluster, y = Fraction_Expressed, fill = Gene)) + geom_col(position = "dodge")
``````

Created on 2021-01-12 by the reprex package (v0.3.0)
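
If the y axis should read 0-100% as requested, one option is to format it with the scales package. This is only a sketch under a few assumptions: scales is available (it is installed as a ggplot2 dependency), `.groups = "drop"` requires dplyr 1.0.0 or later, and faceting by cluster is a layout choice borrowed from the first answer and not part of the answer above. The plot object name `p` is illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(geneA = c(0,1,0,1,1,1,1,0,1,0),
                 geneB = c(0,1,1,1,1,0,0,1,1,0),
                 geneC = c(0,1,1,1,0,0,1,1,0,1),
                 cluster = c(1,3,1,2,1,3,2,3,2,1))

# Fraction expressed per cluster and gene: the mean of a 0/1 column
SummaryDF <- df %>%
  pivot_longer(geneA:geneC, names_to = "Gene", values_to = "Value") %>%
  group_by(cluster, Gene) %>%
  summarize(Fraction_Expressed = mean(Value), .groups = "drop")

# One panel per cluster, one bar per gene, y axis labelled as a percentage
p <- ggplot(SummaryDF, aes(x = Gene, y = Fraction_Expressed, fill = Gene)) +
  geom_col() +
  facet_wrap(~cluster, labeller = label_both) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(y = "Expressed")
p
```

Fixing the limits to `c(0, 1)` keeps every panel on the same 0-100% scale, so bars can be compared across clusters.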

That's perfect, thanks to both of you!
