Hello I have to do a project with the following instruction :
Plot the distribution of the number of transcripts per gene.
By distribution, we expect the density or binned histogram of the univariate number of transcript per gene.
For the moment I did this, but this does not work very well ...
library(dplyr)
library(tidyverse)
library(ggplot2)
library(forcats)
gencode %>%
filter(!is.na(transcript_id)) %>%
group_by(gene_id) %>%
summarise(n = n_distinct(transcript_id)) %>%
ggplot(aes(x = gene_id)) +
stat_bin(aes(y = "count", label= "count"), geom="text", vjust=-.5) +
geom_bar(stat="identity")
geom_histogram(aes(x = n),
col ="red",
fill = "green",
alpha = .2) +
geom_density(col = 2) +
labs(title = "Histogram of transcript per gene", x = "gene", y = "transcript")
I would be happy if you could help me, thanks a lot