How to make a scatter plot with different colors from two columns with 24 different labels?

I'm using ggplot2 to make a scatter plot.
I have a data frame with 8 columns and approximately 255,000 rows. My DF looks like this:

ID1       ID2        dN       dS       t        Label_ID1  Label_ID2  Group
ARB07438  YP_173238  0.0202   2.0534   0.4971   HKU1-CoV   HKU1-CoV   Intra
ARB07438  AZS52618   -0.0000  0.1115   0.0299   HKU1-CoV   HKU1-CoV   Intra
ARB07438  AYN64561   -0.0000  -0.0000  -0.0000  HKU1-CoV   HKU1-CoV   Intra
ARB07599  NP_073551  0.5332   2.5718   2.4730   HKU1-CoV   229E-CoV   Inter
ARB07599  QJY77946   0.5234   2.5786   2.4587   HKU1-CoV   229E-CoV   Inter

I made scatter plots of dN vs dS, dN vs t, and dS vs t. The points are colored by the Group column, which has only two values (Intra and Inter):

That works fine. However, I want to make a scatter plot colored by the columns Label_ID1 and Label_ID2. These columns have 24 different labels (HKU1-CoV, Bovine-CoV, 229E-CoV, SARSr-bat-CoV, SARSr-bat-RaTG13-CoV, SARSr-bat-BM4831-CoV, Camel-229E-CoV, SARSr-Civet-CoV, PEDV-CoV, HKU2-CoV, HKU3-CoV, HKU4-CoV, HKU5-CoV, HKU23-CoV, MERS-CoV, Murine-CoV, MERSr-bat-Neoromicia-CoV, NL63-CoV, OC43-CoV, SARSr-Pangolin-CoV, PEHV-CoV, SARS1-CoV, SARS2-CoV and PEDV-CoV).
I'm looking for something like this:
In the DF, the columns Label_ID1 and Label_ID2 contain different combinations of labels, for example 229E-CoV; Bovine-CoV.
Is it possible to make this "versus" representation with different colors, so that the distribution of each label is visible in the scatter plot?
I tried different approaches, but nothing worked for me. For this reason I am pasting only the code for the scatter plot with the two groups (Intra and Inter, the first scatter plot at the top).

library(dplyr)    # provides the %>% pipe
library(ggplot2)

df_S %>%
  ggplot(aes(x = dN, y = t)) +
  geom_point(aes(color = Group)) +        # color points by Intra/Inter
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  labs(title = "Pairwise Comparison S Protein",
       subtitle = "Inter versus Intragroup",
       x = "dN rate",
       y = "t",                           # the y aesthetic is t, not dS
       color = "Group") +
  theme_gray() +
  theme(axis.title = element_text())

Any help or ideas are welcome.

Hi @MauriAndres,
It is very difficult to distinguish 24 groups by colour alone (and you possibly have 24 x 24 combinations of Label_ID1 and Label_ID2).
I suggest you try faceting your graph on Label_ID1 and then using colour plus symbol shape to look at the distribution of Label_ID2 within each panel.

df_S %>%
  ggplot(aes(x = your_x_variable, y = your_y_variable)) +  # e.g. dN, dS, or t
  geom_point(aes(color = Label_ID2, shape = Label_ID2)) +
  facet_wrap(~ Label_ID1)                                  # one panel per Label_ID1
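
One thing to watch for with the shape mapping: ggplot2's default shape palette only handles up to 6 discrete values, so with 24 labels you would probably need to supply the shapes yourself via scale_shape_manual(). Here is a rough sketch of how that might look. Note that `toy`, `labels`, `lvls`, and `shape_values` are names I made up for illustration, and the data are random values rather than your df_S; only the column names come from your post.

```r
library(ggplot2)

# Toy data standing in for df_S (hypothetical random values;
# only the column names are taken from the original post)
set.seed(1)
labels <- c("HKU1-CoV", "229E-CoV", "MERS-CoV", "NL63-CoV",
            "OC43-CoV", "SARS1-CoV", "SARS2-CoV", "Bovine-CoV")
n <- 400
toy <- data.frame(
  dN        = runif(n, 0.001, 1),
  dS        = runif(n, 0.01, 3),
  Label_ID1 = sample(labels, n, replace = TRUE),
  Label_ID2 = sample(labels, n, replace = TRUE)
)

# One distinct shape code per Label_ID2 level.
# Point shape codes 0 to 25 are valid, so this covers up to 26 labels.
lvls <- sort(unique(toy$Label_ID2))
shape_values <- setNames(seq_along(lvls) - 1, lvls)

p <- ggplot(toy, aes(x = dN, y = dS)) +
  geom_point(aes(color = Label_ID2, shape = Label_ID2), alpha = 0.6) +
  scale_shape_manual(values = shape_values) +  # explicit shapes, no 6-level limit
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~ Label_ID1)                      # one panel per Label_ID1

print(p)
```

With your real data you would swap `toy` for df_S. Because colour and shape are both mapped to Label_ID2, ggplot2 should merge them into a single legend.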



Great idea. I will try your suggestion.
Thanks, @DavoWW
