How to plot time series with multiple sample rows

Dear R community,

I am completely new to R (not to programming in general, but to the environment and syntax), and I am having trouble with the following issue, which I hope you can help me with.

I have a CSV file with multiple rows and columns. The first column is the sample ID, the second contains time measurements (a time series: 0.5, 1, 2, 5, 10, 20, 30, ... minutes), and the remaining columns contain values taken at that specific time (e.g. value X at 10 minutes). The sample-ID column holds 10 consecutive rows with the SAME sample ID, e.g. ABC, each with its own time measurement (first row = 0.5 minutes, second row = 1 minute, third row = 2 minutes, ...), all belonging to the SAME sample. After the first 10 rows, the next 10 rows belong to the second sample, XYZ, and so on.

What I would like to plot now is the evolution of each sample as a line (in the same plot) along time (minutes, x-axis) and a specific value (third column) at that time on the y-axis.

How can I tell R that every X rows have to be considered "as one sample" (maybe with a for loop?), each with a specific time value (X, Y, Z minutes), and plot them as a line across a range of y-axis values? I have only managed to get unreasonable plots so far...

While in Python this is easily solved by e.g. adding "-" to create a line plot, without so many issues with time series, I might just not be used to R yet (but I would like to get more into it). I have read that I might have to use "data tables" instead of "data frames" in R? However, when I import the file as a data table, I actually get an extra column with each column as a row!? Not sure what sense that would make, though...

I hope I made the problem clear and that you might have an approach on how to solve it.
Thank you very much!

I've created something called a reprex (short for reproducible example) that I think matches the specs you're describing (see the guide on how to do one yourself here).

I'm using the tidyverse package to do this (specifically tibble to create the dummy data, which you won't need to do since you have the actual data, and ggplot2 for plotting with geom_line()), but I'm sure the same could be done in a variety of ways.

tibbles and data tables (using the data.table package) are enhanced data frames, and, no, you don't need to use either of them specifically to accomplish this.

library(tidyverse)
dat <- tibble::tribble(
  ~sample_id, ~time, ~value,
     "gghjk",   0.5,    0.8,
     "gghjk",     1,      2,
     "gghjk",   1.5,    2.1,
     "gghjk",     2,    2.2,
     "gghjk",   2.5,      3,
     "gghjk",     3,    3.2,
     "gghjk",   3.5,    3.4,
     "gghjk",     4,    3.5,
     "gghjk",   4.5,    3.6,
     "gghjk",     5,      4,
     "lknnm",   0.5,    0.3,
     "lknnm",     1,    1.5,
     "lknnm",   1.5,    1.6,
     "lknnm",     2,    1.7,
     "lknnm",   2.5,    2.5,
     "lknnm",     3,    2.7,
     "lknnm",   3.5,    2.9,
     "lknnm",     4,      3,
     "lknnm",   4.5,    3.1,
     "lknnm",     5,    3.5
  )

dat %>%
  ggplot(aes(x = time, y = value, group = sample_id)) +
  geom_line(aes(colour = sample_id))

Created on 2021-04-08 by the reprex package (v1.0.0)

Hope this helps. Note that I'm grouping by which sample, rather than in groups of ten rows, since that sounds like it's your actual goal.
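
For reference, the same plot should work on a plain data frame read straight from a CSV. This is only a minimal sketch: the column names sample_id, time and value, the sample values, and the temporary file are stand-ins for illustration, not taken from the actual data.

```r
library(ggplot2)

# Stand-in for the real file: write a tiny CSV to a temporary path.
# With actual data, skip this step and point read.csv() at your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sample_id,time,value",
  "ABC,0.5,0.8",
  "ABC,1,2.0",
  "ABC,2,2.2",
  "XYZ,0.5,0.3",
  "XYZ,1,1.5",
  "XYZ,2,1.7"
), csv_path)

# read.csv() returns an ordinary data frame; no tibble or data.table needed.
dat_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# One line per sample: the group aesthetic tells ggplot2 which rows
# belong to the same line, regardless of how many rows each sample has.
p <- ggplot(dat_csv, aes(x = time, y = value,
                         group = sample_id, colour = sample_id)) +
  geom_line()
print(p)
```

Because the grouping comes from the sample_id column, the samples do not need to have exactly ten rows each, and the rows do not need to be stored in blocks.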

P.S. For more on working with time series in R, I highly recommend checking out the tsibble package, which is part of the "tidyverts" ecosystem.


Dear Mara,

many thanks for this example!
It worked perfectly!

However, when I plotted the result, I discovered that there might be better ways to visualize it: 79 samples means 79 lines, which makes the plot quite nasty. Hence, I have to think about something better in terms of visualization. That brings me back to plotting & data analysis.
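
One possible way to tame a plot with that many lines is to draw every sample in a faint grey and overlay only the samples of interest in colour. The sketch below uses made-up data (79 hypothetical samples with random values, and time points beyond 30 minutes that are invented for illustration), so treat it as a starting point rather than a recommendation for the real data.

```r
library(ggplot2)

# Hypothetical data: 79 samples, 10 time points each, random values.
set.seed(1)
times <- c(0.5, 1, 2, 5, 10, 20, 30, 40, 50, 60)
many <- data.frame(
  sample_id = rep(sprintf("S%02d", 1:79), each = length(times)),
  time = rep(times, times = 79)
)
many$value <- log1p(many$time) + rnorm(nrow(many), sd = 0.3)

# Pick one sample to highlight (an arbitrary choice for the example).
highlight <- subset(many, sample_id == "S01")

# All samples as faint grey lines, the highlighted one drawn on top in red.
p_many <- ggplot(many, aes(x = time, y = value, group = sample_id)) +
  geom_line(colour = "grey60", alpha = 0.4) +
  geom_line(data = highlight, colour = "red")
print(p_many)
```

Another option along the same lines would be adding facet_wrap(~ sample_id) to give each sample its own small panel, although 79 panels may be hard to read as well.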

While we are at it, may I ask another question about something I would like to code in R?

  • I have an ACCESS database (I would export the tables as CSV files) with columns A, B, C, ... that contain the top and bottom of a depth interval, e.g. sample ID ABC spans 10 m to 10.2 m (top/bottom). I have another file containing samples taken in the same depth intervals as the ones in the ACCESS files. I would like to merge the information from the ACCESS file into the second file. That is, if sample XYZ has a value for e.g. mineralogy content, I would like to write code that looks up the sample ID in the second file (to get the correct sample), creates a new column in that file, and writes the value into the matching row.

The if-condition would look something like this:
for x in ...
  if sample-ID (in ACCESS file) == sample-ID (in second file):
    check the depth interval (top-bottom) AND the location
    if the location is valid and the depth interval exists: write value ABC into a newly created column in the second file
    otherwise: report that no matching interval was found

TBH... I was thinking of doing it manually but with almost 800 samples this might take a while... however, maybe even faster when I code it in R being a complete NOOB..
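
In R, a lookup like the one described above is usually expressed as a join rather than a loop. The following is a rough sketch under several assumptions: the table and column names (access, second, sample_id, location, top, bottom, mineralogy) and all values are hypothetical, and it assumes the depth intervals in both files match exactly.

```r
# Hypothetical stand-ins for the two exported CSV tables.
access <- data.frame(
  sample_id  = c("ABC", "XYZ"),
  location   = c("site1", "site1"),
  top        = c(10.0, 12.0),
  bottom     = c(10.2, 12.5),
  mineralogy = c(0.35, 0.50)
)
second <- data.frame(
  sample_id = c("ABC", "XYZ", "QRS"),
  location  = c("site1", "site2", "site1"),
  top       = c(10.0, 12.0, 15.0),
  bottom    = c(10.2, 12.5, 15.3)
)

# Left join: keep every row of `second` and add `mineralogy` wherever
# sample ID, location and depth interval all match a row in `access`.
# Note: exact matching on numeric depths can be fragile if the two
# files round their values differently.
merged <- merge(second, access,
                by = c("sample_id", "location", "top", "bottom"),
                all.x = TRUE)

# Rows without a match get NA; flag them instead of stopping.
# (This assumes `mineralogy` is never missing in `access` itself.)
merged$match_found <- !is.na(merged$mineralogy)
print(merged)
```

Here only ABC matches on all four keys; XYZ differs in location and QRS is absent from the first table, so both end up with NA and match_found set to FALSE.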

Hi @Nemlock,

Glad you got your plotting sorted. If you have a separate, unrelated question, would you mind starting a new thread (and also marking this one as solved)?

It's part of our effort to "keep things tidy" here on community, and make it easier for others to find questions and answers relevant to them in the future (see FAQ below).

https://forum.posit.co/faq#keep-tidy

Thanks,

Mara


Sure, sorry about that!
I will open a new thread - this one can be closed.

Thank you very much for your help again! :)
