annotate() generates unexpected X-axis order

I want to plot a series of genes from 24 chromosomes, using the start position of each gene as the X-axis and the log2 ratio as the Y-axis, but labelling the X-axis with chromosome numbers. To do that, I add the max gene start position of one chromosome to the following chromosome. For example, each chromosome 2 gene start position becomes its original position plus the max start position of chromosome 1, each chromosome 3 start position becomes its original position plus the max start position of chromosome 2, and so on. In this way, all gene start positions are in order from smallest to greatest. They are then used as the X-axis for plotting.
The problem is that the plot does not show the data in the order I expected. I want it to plot chr1, chr2, chr3 ... chr24 from left to right.
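The offset scheme described above can be sketched in base R as follows. This is only a minimal illustration on a made-up data frame (`toy`, `chr_max`, `offsets`, and `cum_start` are hypothetical names, not part of the original code), and it assumes the chromosome column is numeric so that chromosomes are processed in numeric order.

```r
# Hypothetical toy data: three chromosomes, two genes each.
toy <- data.frame(
  chromosome = c(1, 1, 2, 2, 3, 3),
  start      = c(100, 500, 50, 300, 20, 400)
)

# Largest original start position per chromosome, in chromosome order.
chr_max <- tapply(toy$start, toy$chromosome, max)

# Offset of each chromosome = sum of the maxima of all earlier chromosomes.
offsets <- c(0, cumsum(chr_max)[-length(chr_max)])
names(offsets) <- names(chr_max)

# Shift every gene by the offset of its chromosome.
toy$cum_start <- toy$start + unname(offsets[as.character(toy$chromosome)])
toy$cum_start
#> [1]  100  500  550  800  820 1200
```

Note that `tapply()` orders the groups by the sorted values (or factor levels) of `chromosome`, so a character or factor column would be processed in string order rather than numeric order.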
Here is the output I got:

image

pos <- df1 %>% 
  group_by(chromosome) %>% 
  summarize(avg = round(mean(start))) %>% pull(avg)

t1 <- df1 %>% group_by(chromosome) %>% summarize(avg=max(start)) %>% pull(avg)
ggplot(df1) +
  annotate("point", x=df1$start, y=df1$log2, size=0.5, color=df1$chromosome) +
  annotate("segment", x=0, xend=max(df1$start), y=0, yend=0, color="black") +
  annotate("segment", x=t1, xend=t1, y=-4, yend=2, color="grey") +
  scale_x_discrete(limits = pos, labels = unique(df1$chromosome)) +
  labs(x = "chromosome") +
  ylim(c(-30,3))

Does anyone have suggestions about what may be wrong with my code?

Your help is greatly appreciated.

Best,
Lim

Hi @Limin_Chen,
Looks like your df1$chromosome column is stored as a factor with alphabetically sorted levels, which are then used in deciding the (incorrect) x-axis plotting order.
For illustration, contrast these definitions of a fictitious factor:

factor(c(1,2,3,10,11,12,21,22,23))
#> [1] 1  2  3  10 11 12 21 22 23
#> Levels: 1 2 3 10 11 12 21 22 23
factor(c("1","2","3","10","11","12","21","22","23"))
#> [1] 1  2  3  10 11 12 21 22 23
#> Levels: 1 10 11 12 2 21 22 23 3
factor(c("chr01","chr02","chr03","chr10","chr11","chr12","chr21","chr22","chr23"))
#> [1] chr01 chr02 chr03 chr10 chr11 chr12 chr21 chr22 chr23
#> Levels: chr01 chr02 chr03 chr10 chr11 chr12 chr21 chr22 chr23

Created on 2021-10-19 by the reprex package (v2.0.1)

You should be able to fix this by modifying your data frame as follows:

df1$chromosome <- factor(ifelse(df1$chromosome %in% as.character(1:9), 
                                paste0("chr0", df1$chromosome), 
                                paste0("chr", df1$chromosome)))

Edit: Another option is to leave the factor levels containing digits only but produce a new ordered factor with the levels sorted correctly:

df1$chromosome <- factor(df1$chromosome, 
                         ordered=TRUE, 
                         levels=sort(as.numeric(levels(df1$chromosome))))
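As a rough illustration of both options, here is a small sketch on a made-up character vector (`chrom`, `padded`, and `num_levels` are hypothetical names). It assumes the labels contain digits only; labels such as "X" or "Y" would need separate handling.

```r
# Hypothetical chromosome labels stored as character, in genomic order.
chrom <- c("1", "2", "3", "10", "11", "21")

# Default factor(): levels are sorted as strings, so "10" comes before "2".
levels(factor(chrom))
#> [1] "1"  "10" "11" "2"  "21" "3"

# Option 1: zero-pad so that string order equals numeric order.
padded <- sprintf("chr%02d", as.integer(chrom))
levels(factor(padded))
#> [1] "chr01" "chr02" "chr03" "chr10" "chr11" "chr21"

# Option 2: keep the labels but supply the levels in numeric order.
num_levels <- sort(as.numeric(unique(chrom)))
levels(factor(chrom, levels = num_levels))
#> [1] "1"  "2"  "3"  "10" "11" "21"
```

Taking the levels from `unique()` rather than `levels()` means the second option also works when the column is still a plain character vector.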

Hi @DavoWW

Thanks for the suggestions.
I noticed that in annotate(), color = df1$chromosome only works when the chromosome column contains numbers only, stored either as numeric or as character. However, if I store the chromosome column as character, the code runs without error but produces the confusing X-axis labels I posted. I got it to work by adding this line:

df1$chromosome <- as.numeric(df1$chromosome)

If the chromosome column contains any letters, color = df1$chromosome fails in annotate(). Here is the error message:

If you can explain why this happens, that would be helpful.

Thanks for all the suggestions.

Best,
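A likely explanation, offered as a sketch rather than a confirmed diagnosis of the missing error message: annotate() passes its aesthetics to the layer as fixed values instead of mapping them through a colour scale, so every element of color has to be something R itself can interpret as a colour. Numbers and digit-only strings are read as indices into palette(), while a string containing letters must be a valid colour name or hex code. The base R snippet below shows that distinction with col2rgb(); "chr01" is a hypothetical label.

```r
# Numbers and digit-only strings are both treated as indices into palette().
col2rgb(1)
col2rgb("1")
all(col2rgb(1) == col2rgb("1"))
#> [1] TRUE

# A string with letters must be a colour name or hex code; "chr01" is neither,
# so the conversion raises an error, which is captured here as a message.
res <- tryCatch(col2rgb("chr01"), error = function(e) conditionMessage(e))
res
```

To colour points by a chromosome label that contains letters, the usual route is a mapped aesthetic, for example geom_point(aes(x = start, y = log2, colour = chromosome)) with df1 as the plot data, which sends the values through a colour scale.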
