Creating a scatterplot with a large number of variables

Hi,

I recently started learning R, mainly for the generation of nice plots from scientific results (biology).

I am trying to create a scatter plot for some enzyme activities. I found some good online tutorials on how to generate plots, but all of them deal with only 2 variables, which is quite easy to manage.

In my case, I have a dataset of 25 variables that I would like to plot.

I generated a small example dataset.
I measured the activity of 5 enzymes every 5 seconds for 60 seconds, and I would like to generate a single scatterplot with time on the x-axis and the activity of the 5 enzymes on the y-axis.

What is the best method for working with a large set of variables? Is there a way to say "plot column 1 (Time) against columns 2 to 6 (Enzyme1 to Enzyme5)" without typing the full name of each column?

structure(list(`Time (sec)` = c(0, 5, 10, 15, 20, 25, 30, 35, 
40, 45, 50, 55, 60), Enzyme1 = c(0, 2, 4, 8, 16, 32, 64, 128, 
128, 128, 128, 128, 128), Enzyme2 = c(0, 1, 2, 3, 4, 5, 6, 7, 
8, 9, 10, 11, 12), Enzyme3 = c(0, 10, 20, 30, 40, 50, 60, 70, 
80, 90, 100, 110, 120), Enzyme4 = c(0, 0.125, 0.25, 0.5, 0.75, 
1, 1.25, 2.5, 5, 10, 20, 40, 60), Enzyme5 = c(0, 5, 10, 15, 20, 
25, 30, 35, 40, 45, 50, 55, 60)), row.names = c(NA, -13L), class = c("tbl_df", 
"tbl", "data.frame"))
#>    Time (sec) Enzyme1 Enzyme2 Enzyme3 Enzyme4 Enzyme5
#> 1           0       0       0       0   0.000       0
#> 2           5       2       1      10   0.125       5
#> 3          10       4       2      20   0.250      10
#> 4          15       8       3      30   0.500      15
#> 5          20      16       4      40   0.750      20
#> 6          25      32       5      50   1.000      25
#> 7          30      64       6      60   1.250      30
#> 8          35     128       7      70   2.500      35
#> 9          40     128       8      80   5.000      40
#> 10         45     128       9      90  10.000      45
#> 11         50     128      10     100  20.000      50
#> 12         55     128      11     110  40.000      55
#> 13         60     128      12     120  60.000      60

Do not hesitate to contact me if you need additional information.

Thank you in advance,

Hello @fgaascht

It depends on the type of data you have and what you want to plot. In most cases a line plot will give you a better view of your data (such plots are called time series plots in statistics).

Type

library(ggplot2)
?geom_line

to get more information. You may also need to be able to work with data in date-time format.
The melt function from the reshape2 package will be useful for reshaping the table, and the lubridate package is useful for date-time data.
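As a minimal sketch of the melt approach mentioned above: the small table below reuses the first three rows of the posted data for two enzymes, and the column names Time, Enzyme and Activity are chosen here only for illustration. Because time in the posted data is elapsed seconds stored as plain numbers, this sketch assumes no date-time conversion is needed.

```r
library(reshape2)
library(ggplot2)

# First three rows of the posted data, two enzymes only (illustrative subset).
wide <- data.frame(
  Time    = c(0, 5, 10),
  Enzyme1 = c(0, 2, 4),
  Enzyme2 = c(0, 1, 2)
)

# melt() stacks every non-id column into name/value pairs,
# giving one row per (Time, Enzyme) combination.
long <- melt(wide, id.vars = "Time",
             variable.name = "Enzyme", value.name = "Activity")

# One line per enzyme; Time is plain numeric seconds here.
p_melt <- ggplot(long, aes(x = Time, y = Activity, color = Enzyme)) +
  geom_line()
print(p_melt)
```

Adding more enzyme columns to the wide table should not require changing the melt() call, since every column not listed in id.vars is stacked.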

If you cannot get it to work with this information, reply back and I will do it for you, but not until about 30 hours from now.

Here is a simple example using the data posted.

DF <- structure(list(`Time (sec)` = c(0, 5, 10, 15, 20, 25, 30, 35, 
                                      40, 45, 50, 55, 60), 
                     Enzyme1 = c(0, 2, 4, 8, 16, 32, 64, 128,  128, 128, 128, 128, 128), 
                     Enzyme2 = c(0, 1, 2, 3, 4, 5, 6, 7,  8, 9, 10, 11, 12), 
                     Enzyme3 = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120), 
                     Enzyme4 = c(0, 0.125, 0.25, 0.5, 0.75, 1, 1.25, 2.5, 5, 10, 20, 40, 60), 
                     Enzyme5 = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)), 
                row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyr)
library(ggplot2)
DFtall <- DF %>% pivot_longer(cols = Enzyme1:Enzyme5, 
                              names_to = "Enzyme", values_to = "Value")
head(DFtall)
#> # A tibble: 6 x 3
#>   `Time (sec)` Enzyme  Value
#>          <dbl> <chr>   <dbl>
#> 1            0 Enzyme1     0
#> 2            0 Enzyme2     0
#> 3            0 Enzyme3     0
#> 4            0 Enzyme4     0
#> 5            0 Enzyme5     0
#> 6            5 Enzyme1     2
ggplot(DFtall, aes(x = `Time (sec)`, y = Value, group = Enzyme, color = Enzyme)) + 
  geom_line() + geom_point()

Created on 2020-05-03 by the reprex package (v0.3.0)
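On the original question of avoiding full column names: the cols argument of pivot_longer also accepts column positions and selection helpers. The following is a sketch only, using a three-row subset of the posted data; the object names (DFsmall, tall_by_pos and so on) are made up for illustration.

```r
library(tidyr)

# Subset of the posted data; check.names = FALSE keeps the "Time (sec)" name.
DFsmall <- data.frame(
  `Time (sec)` = c(0, 5, 10),
  Enzyme1 = c(0, 2, 4),
  Enzyme2 = c(0, 1, 2),
  check.names = FALSE
)

# Select columns 2 to the last one by position, so no name is typed.
tall_by_pos <- pivot_longer(DFsmall, cols = 2:ncol(DFsmall),
                            names_to = "Enzyme", values_to = "Value")

# Same selection, written as "everything except the first column".
tall_not_first <- pivot_longer(DFsmall, cols = -1,
                               names_to = "Enzyme", values_to = "Value")

# Same selection, written as "every column whose name starts with Enzyme".
tall_by_prefix <- pivot_longer(DFsmall, cols = starts_with("Enzyme"),
                               names_to = "Enzyme", values_to = "Value")
```

With 25 variables, the positional or prefix forms are likely to be more convenient than a name range, provided the time column stays first or the enzyme columns share a prefix.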

Hi,

Thank you @Amanyiraho_Robinson and @FJCC.

I took a little time to explore and get to grips with RStudio.
I discovered the package tidyr to reshape my table with the gather function.

Analysis <- gather (Example, "Construction", "OD", 2:6)
   
`Time (sec)` Construction    OD
          <dbl> <chr>        <dbl>
 1            0 Enzyme1          0
 2            5 Enzyme1          2
 3           10 Enzyme1          4
 4           15 Enzyme1          8
 5           20 Enzyme1         16
 6           25 Enzyme1         32
 7           30 Enzyme1         64
 8           35 Enzyme1        128
 9           40 Enzyme1        128
10           45 Enzyme1        128
# … with 55 more rows

And I tried to generate a plot with the following code:

> ggplot(data=Analysis, aes(x="Time (sec)", y=OD, group = Construction, color = Construction))+geom_point()

However, the data are not distributed correctly along my x-axis; they seem to be treated as one single element.

Am I missing a parameter in my command?

Thank you in advance.

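In the aes() part of your call to ggplot, do not put Time (sec) in double quotes. In double quotes it is treated as a constant string rather than a column name, so every point gets the same x value. Use back ticks ` instead; the back tick is the key just to the left of the 1 key on an American keyboard.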
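A minimal sketch of the difference, using a small stand-in for the Analysis table (six rows taken from the posted data; the plot object names are made up for illustration):

```r
library(ggplot2)

# Stand-in for the reshaped table; check.names = FALSE keeps "Time (sec)".
Analysis <- data.frame(
  `Time (sec)` = c(0, 5, 10, 0, 5, 10),
  Construction = rep(c("Enzyme1", "Enzyme2"), each = 3),
  OD = c(0, 2, 4, 0, 1, 2),
  check.names = FALSE
)

# Double quotes: "Time (sec)" is a constant string, so all points share one x.
p_quoted <- ggplot(Analysis, aes(x = "Time (sec)", y = OD,
                                 color = Construction)) +
  geom_point()

# Back ticks: `Time (sec)` refers to the column, so x is the measured time.
p_backtick <- ggplot(Analysis, aes(x = `Time (sec)`, y = OD,
                                   color = Construction)) +
  geom_point()

# Alternative if the column name is held as a string: the .data pronoun.
p_pronoun <- ggplot(Analysis, aes(x = .data[["Time (sec)"]], y = OD,
                                  color = Construction)) +
  geom_point()

print(p_backtick)
```

Printing p_quoted next to p_backtick should reproduce the symptom described in the question: a single category on the x-axis instead of a numeric time scale.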

Thank you @FJCC,

Until now I was not really aware of the different conventions for ', ", and `.
Is there a page or cheat sheet where I could find a brief explanation of them?

Thanks again

There is some explanation here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html.
Basically, you can use either single quotes or double quotes to denote character strings. Back ticks are used to enclose names that contain spaces or are otherwise not syntactically valid. I prefer to avoid such non-syntactic names rather than use back ticks.
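A short base R sketch of these conventions, with illustrative object names; make.names is one possible way to avoid non-syntactic names, not necessarily the approach used in this thread:

```r
# Single and double quotes both create the same character string.
same_string <- identical('Enzyme1', "Enzyme1")

# check.names = FALSE keeps the non-syntactic column name as typed.
df <- data.frame(`Time (sec)` = c(0, 5, 10), check.names = FALSE)

# Back ticks refer to the column by name; a quoted string works inside [[ ]].
by_backtick <- df$`Time (sec)`
by_string   <- df[["Time (sec)"]]

# Avoiding back ticks altogether: convert the names to syntactic ones.
clean <- df
names(clean) <- make.names(names(clean))
print(names(clean))
```

After renaming, the column can be used in code without any quoting, at the cost of a less readable label that may need to be set again on the plot axis.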
