How can I change the point color in my scatter plot for each column of my dataframe (one color for each column)?

I'm trying to change the color of my data in a scatter plot. I have 5 columns (only 3 hold values for my plot) and 264,000 rows. This is my data frame:

ID1         ID2         dN      dS      t
QJY77946    NP_073551   0.0241  0.1402  0.1479
QJY77954    NP_073551   0.0119  0.0912  0.0870
QJY77954    QJY77946    0.0119  0.0439  0.0566

This is my scatter plot of column dN vs dS:

[image]

My question: how can I change the point color per column (one color for column dN, another for column dS, and so on)?

I'm using the ggplot2 and tidyverse packages in RStudio. This is my code for the scatter plot:

library(ggplot2)

# DF: the data frame with columns ID1, ID2, dN, dS, t
ggplot(DF, aes(x = dS, y = dN)) +
  geom_point(size = 0.2) +
  labs(title = "Pairwise Comparison S Protein",
       subtitle = "dS vs dN with YN00",
       x = "dS rate",
       y = "dN rate") +
  theme_bw() +
  theme(axis.title = element_text())

Any idea or help is welcome! Thanks!

suppressPackageStartupMessages({
  library(ggplot2)
})

DF <- readr::read_csv("~/Desktop/grist.csv")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   ID1 = col_character(),
#>   ID2 = col_character(),
#>   dN = col_double(),
#>   dS = col_double(),
#>   t = col_double()
#> )

ggplot(DF, aes(dN, dS, color = ID1)) + geom_point(size = 5) + theme_minimal()


You have a scatter plot, right? That means dS is on the x-axis and dN on the y-axis, so each point needs a value from both columns.
How do you want to color the points differentially?


Thank you for your question!

I'm looking for each column to have its own color, for example red for all the values in column dN, green for the values in column dS, and so on.

I'm looking for something like this:

[image]

Thank you for the reply!

I'm going to try your code with my data.
I hope it works!
Thanks!

Check this example: you can colour-code the points, but the colour has to come from a separate variable, one that isn't already used on the x- or y-axis.
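A minimal sketch of that idea, assuming t is the third variable you want to encode (any column not already on an axis would work). The toy data frame uses only the three example rows from the question:

```r
library(ggplot2)

# Toy data: the three example rows shown in the question
DF <- data.frame(
  ID1 = c("QJY77946", "QJY77954", "QJY77954"),
  ID2 = c("NP_073551", "NP_073551", "QJY77946"),
  dN  = c(0.0241, 0.0119, 0.0119),
  dS  = c(0.1402, 0.0912, 0.0439),
  t   = c(0.1479, 0.0870, 0.0566)
)

# dS and dN stay on the axes; colour comes from t, a third column
p <- ggplot(DF, aes(x = dS, y = dN, colour = t)) +
  geom_point(size = 2) +
  theme_bw()

p  # draw the plot
```

Because t is numeric, ggplot2 maps it to a continuous colour gradient rather than a few discrete colours.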


Maybe combine dN, dS and t into one variable, for example points, and then use something like this:

ggplot(DF, aes(x = dS, y = dN, color = points))

What do you think?
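If the goal really is one colour per column, a common approach (not taken from the posts above) is to reshape the data to long format with tidyr::pivot_longer(), so that the column name becomes a variable you can map to colour. In this sketch dN and dS are plotted against t; using t on the x-axis is an assumption, and the toy data are the three rows from the question:

```r
library(ggplot2)
library(tidyr)

# Toy data: the three example rows shown in the question
DF <- data.frame(
  ID1 = c("QJY77946", "QJY77954", "QJY77954"),
  ID2 = c("NP_073551", "NP_073551", "QJY77946"),
  dN  = c(0.0241, 0.0119, 0.0119),
  dS  = c(0.1402, 0.0912, 0.0439),
  t   = c(0.1479, 0.0870, 0.0566)
)

# Stack dN and dS: `rate` holds the original column name, `value` its number
DF_long <- pivot_longer(DF, cols = c(dN, dS),
                        names_to = "rate", values_to = "value")

# One colour per original column (t on the x-axis is an assumption)
p <- ggplot(DF_long, aes(x = t, y = value, colour = rate)) +
  geom_point(size = 2) +
  scale_colour_manual(values = c(dN = "red", dS = "green")) +
  labs(x = "t", y = "rate value", colour = "Column") +
  theme_bw()

p  # draw the plot
```

Each row of the original data now produces two points, one per rate, so the plot contains twice as many points as there are rows.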

Hi again!
My problem with this code is the following: I have 727 IDs, and the values come from a pairwise comparison of all 727 IDs against each other.
That is why there are ~264,000 rows (727 × 726 / 2 = 263,901 unique pairs). Column dN holds one kind of value from the comparisons, dS another, and so on.
The data frame comes from a TSV file. Maybe I could define the column names (dN, dS and t) as a variable and then use color = cols_names?
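As a quick sanity check on the row count: comparing 727 IDs pairwise, with each unordered pair counted once and no self-comparisons, gives n(n - 1)/2 rows, which matches the ~264,000 reported, whereas 727 × 727 would be about twice that:

```r
n_ids <- 727

# Unordered pairs of distinct IDs: n * (n - 1) / 2
n_pairs <- choose(n_ids, 2)
print(n_pairs)

# All ordered pairs including self-comparisons would be n^2
print(n_ids^2)
```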

ggplot(DF, aes(dN, dS, color = cols_names)) + geom_point(size = 5) + theme_minimal()

What do you think?

Based on what I see in your data, this will probably generate many different values of "points", probably as many as there are rows in your data frame (unless you bin the numbers into ranges).
You need something that produces just a few groups (say, 2-6 classes).
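A minimal sketch of binning, assuming t is the variable to classify. The break points below are arbitrary placeholders to adjust for the real data, and the toy data are the three rows from the question:

```r
library(ggplot2)

# Toy data: the three example rows shown in the question
DF <- data.frame(
  dN = c(0.0241, 0.0119, 0.0119),
  dS = c(0.1402, 0.0912, 0.0439),
  t  = c(0.1479, 0.0870, 0.0566)
)

# Bin t into four ranges (placeholder break points) to get a few discrete classes
DF$t_class <- cut(DF$t, breaks = c(0, 0.05, 0.1, 0.15, Inf))

p <- ggplot(DF, aes(x = dS, y = dN, colour = t_class)) +
  geom_point(size = 2) +
  theme_bw()

p  # draw the plot
```

For the full data set, ggplot2::cut_number(DF$t, n = 4) is an alternative that creates groups with roughly equal numbers of points.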

You need to ask yourself what you want to show. What information should colour add? What is missing from the current representation?
Or do you just want to make it a bit more "beautiful"?


I agree: that many points will be a blob of pixels. Consider sampling to 7K points.
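A minimal sketch of sampling with dplyr::slice_sample(). The data frame here is a hypothetical stand-in filled with random numbers, only to mimic the size of the real one:

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the ~264,000-row data frame (random values only)
set.seed(1)
DF <- data.frame(dN = runif(264000, 0, 0.05),
                 dS = runif(264000, 0, 0.2))

# Plot a random subset of 7,000 rows instead of all of them
DF_sample <- slice_sample(DF, n = 7000)

p <- ggplot(DF_sample, aes(x = dS, y = dN)) +
  geom_point(size = 0.2, alpha = 0.5) +
  theme_bw()

p  # draw the plot
```

Lowering alpha, as above, also helps when many points overlap.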

