Lubridate Package: Issue with date_time column

I am looking to produce a simple line graph for which I have behavioral count data for two treatment groups over multiple days and time intervals (every 2 min for 1 hr, starting at 30 sec). I have provided the first and last 10 lines but these data actually continue to 28-03-20 and were collected every other day (i.e., 14-03-20, 16-03-20, 18-03-20... 28-03-20).

Using the code below, I have been able to produce the following figure using ggplot. The answer to fix the issue seems straightforward--likely some sort of issue with the date_time column being treated as a factor or something? I am just looking to have a line graph with points at each time interval that spans the 14 day study period.

ggplot figure to visualize the issue: https://github.com/blhodinka/RHelp/issues/1#issue-676508850

Any help would be appreciated, thank you!


> zf.behav$date_time = dmy_hms(paste(zf.behav$Date, zf.behav$Time))

> head(zf.behav, n = 10L)
       Date Day    Time Sex Treatment Perched Ground Feed Water           date_time
1  14-03-20   0 0:00:30   M      CTRL       4      1    0     1 2020-03-14 00:00:30
2  14-03-20   0 0:00:30   M        WT       2      3    0     0 2020-03-14 00:00:30
3  14-03-20   0 0:02:30   M      CTRL       1      4    0     0 2020-03-14 00:02:30
4  14-03-20   0 0:02:30   M        WT       1      4    0     0 2020-03-14 00:02:30
5  14-03-20   0 0:04:30   M      CTRL       3      2    0     0 2020-03-14 00:04:30
6  14-03-20   0 0:04:30   M        WT       1      4    0     0 2020-03-14 00:04:30
7  14-03-20   0 0:06:30   M      CTRL       2      3    0     0 2020-03-14 00:06:30
8  14-03-20   0 0:06:30   M        WT       3      2    0     0 2020-03-14 00:06:30
9  14-03-20   0 0:08:30   M      CTRL       1      4    0     0 2020-03-14 00:08:30
10 14-03-20   0 0:08:30   M        WT       1      4    0     0 2020-03-14 00:08:30

> tail(zf.behav, n = 10L)
        Date Day    Time Sex Treatment Perched Ground Feed Water           date_time
951 28-03-20  14 0:50:30   F      CTRL       2      3    0     0 2020-03-28 00:50:30
952 28-03-20  14 0:50:30   F        WT       5      0    0     2 2020-03-28 00:50:30
953 28-03-20  14 0:52:30   F      CTRL       5      0    0     1 2020-03-28 00:52:30
954 28-03-20  14 0:52:30   F        WT       2      3    0     0 2020-03-28 00:52:30
955 28-03-20  14 0:54:30   F      CTRL       3      2    0     0 2020-03-28 00:54:30
956 28-03-20  14 0:54:30   F        WT       1      4    0     0 2020-03-28 00:54:30
957 28-03-20  14 0:56:30   F      CTRL       4      1    0     0 2020-03-28 00:56:30
958 28-03-20  14 0:56:30   F        WT       1      4    0     0 2020-03-28 00:56:30
959 28-03-20  14 0:58:30   F      CTRL       3      2    0     0 2020-03-28 00:58:30
960 28-03-20  14 0:58:30   F        WT       3      2    0     0 2020-03-28 00:58:30

> ggplot(zf.behav, aes(date_time, Perched, color=Treatment)) +
  geom_line() +
  scale_x_datetime(breaks = "2 days")

I can't reproduce your problem

library(ggplot2)
sample_data <- data.frame(
  stringsAsFactors = FALSE,
              Date = c("14-03-20","14-03-20",
                       "14-03-20","14-03-20","14-03-20","14-03-20","14-03-20",
                       "14-03-20","14-03-20","14-03-20"),
               Day = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
              Time = c("00:00:30","00:00:30",
                       "00:02:30","00:02:30","00:04:30","00:04:30","00:06:30",
                       "00:06:30","00:08:30","00:08:30"),
               Sex = c("M", "M", "M", "M", "M", "M", "M", "M", "M", "M"),
         Treatment = c("CTRL","WT","CTRL","WT",
                       "CTRL","WT","CTRL","WT","CTRL","WT"),
           Perched = c(4, 2, 1, 1, 3, 1, 2, 3, 1, 1),
            Ground = c(1, 3, 4, 4, 2, 4, 3, 2, 4, 4),
              Feed = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             Water = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
         date_time = as.POSIXct(c("2020-03-14 00:00:30","2020-03-14 00:00:30",
                       "2020-03-14 00:02:30","2020-03-14 00:02:30","2020-03-14 00:04:30","2020-03-14 00:04:30",
                       "2020-03-14 00:06:30","2020-03-14 00:06:30","2020-03-14 00:08:30","2020-03-14 00:08:30"))
)

ggplot(sample_data, aes(date_time, Perched, color = Treatment)) +
  geom_line() +
  scale_x_datetime(breaks = "2 days")

Created on 2020-08-11 by the reprex package (v0.3.0)
Can you provide a proper REPRoducible EXample (reprex) illustrating your issue?

New figure: https://github.com/blhodinka/RHelp/issues/2#issue-676519863

Ultimately the issue is that there should be a line that passes through 3 time points (00:00:30, 00:02:30, 00:04:30) for each treatment at each date interval (03/14, 03/16...03/28). I used geom_jitter() in this example to show each individual point and show that the lines are not connecting to each data point.

Reproducible data below:

> zf.behav$date_time = dmy_hms(paste(zf.behav$Date, zf.behav$Time))

> head(zf.behav, n = 40L)
       Date    Time Treatment Perched
1  14-03-20 0:00:30      CTRL       4
2  14-03-20 0:00:30        WT       2
3  14-03-20 0:02:30      CTRL       1
4  14-03-20 0:02:30        WT       1
5  14-03-20 0:04:30      CTRL       3
6  16-03-20 0:00:30      CTRL       4
7  16-03-20 0:00:30        WT       5
8  16-03-20 0:02:30      CTRL       1
9  16-03-20 0:02:30        WT       4
10 16-03-20 0:04:30      CTRL       0
11 18-03-20 0:00:30      CTRL       3
12 18-03-20 0:00:30        WT       1
13 18-03-20 0:02:30      CTRL       1
14 18-03-20 0:02:30        WT       3
15 18-03-20 0:04:30      CTRL       4
16 20-03-20 0:00:30      CTRL       3
17 20-03-20 0:00:30        WT       4
18 20-03-20 0:02:30      CTRL       2
19 20-03-20 0:02:30        WT       5
20 20-03-20 0:04:30      CTRL       2
21 22-03-20 0:00:30      CTRL       3
22 22-03-20 0:00:30        WT       0
23 22-03-20 0:02:30      CTRL       1
24 22-03-20 0:02:30        WT       2
25 22-03-20 0:04:30      CTRL       3
26 24-03-20 0:00:30      CTRL       2
27 24-03-20 0:00:30        WT       1
28 24-03-20 0:02:30      CTRL       4
29 24-03-20 0:02:30        WT       5
30 24-03-20 0:04:30      CTRL       3
31 26-03-20 0:00:30      CTRL       1
32 26-03-20 0:00:30        WT       0
33 26-03-20 0:02:30      CTRL       2
34 26-03-20 0:02:30        WT       0
35 26-03-20 0:04:30      CTRL       3
36 28-03-20 0:00:30      CTRL       0
37 28-03-20 0:00:30        WT       1
38 28-03-20 0:02:30      CTRL       5
39 28-03-20 0:02:30        WT       0
40 28-03-20 0:04:30      CTRL       3

> ggplot(zf.behav, aes(date_time, Perched, color=Treatment)) +
+   geom_line() + geom_point() +
+   geom_jitter() +
+   scale_x_datetime(breaks = "2 days", date_labels = "%m/%d")

The code is not reproducible or even easy to copy, please read the guide I gave you before and try to make a proper reproducible example.

This is the best I could do at creating a reproducible example using datapasta. R was throwing a number of errors when attempting to use the tibble pkg and reprex.

Plot photo here: https://github.com/blhodinka/RHelp/issues/2#issue-676519863

zf.behav <- data.frame(
  stringsAsFactors = FALSE,
              Date = c("14-03-20","14-03-20",
                       "14-03-20","14-03-20","14-03-20","16-03-20","16-03-20",
                       "16-03-20","16-03-20","16-03-20","18-03-20","18-03-20",
                       "18-03-20","18-03-20","18-03-20","20-03-20","20-03-20",
                       "20-03-20","20-03-20","20-03-20","22-03-20",
                       "22-03-20","22-03-20","22-03-20","22-03-20","24-03-20",
                       "24-03-20","24-03-20","24-03-20","24-03-20","26-03-20",
                       "26-03-20","26-03-20","26-03-20","26-03-20","28-03-20",
                       "28-03-20","28-03-20","28-03-20","28-03-20"),
              Time = c("0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30",
                       "0:02:30","0:04:30","0:00:30","0:00:30","0:02:30","0:02:30",
                       "0:04:30","0:00:30","0:00:30","0:02:30","0:02:30",
                       "0:04:30"),
         Treatment = c("CTRL","WT","CTRL","WT",
                       "CTRL","CTRL","WT","CTRL","WT","CTRL","CTRL","WT",
                       "CTRL","WT","CTRL","CTRL","WT","CTRL","WT","CTRL",
                       "CTRL","WT","CTRL","WT","CTRL","CTRL","WT","CTRL",
                       "WT","CTRL","CTRL","WT","CTRL","WT","CTRL","CTRL",
                       "WT","CTRL","WT","CTRL"),
           Perched = c(4L,2L,1L,1L,3L,4L,5L,1L,
                       4L,0L,3L,1L,1L,3L,4L,3L,4L,2L,5L,2L,3L,0L,
                       1L,2L,3L,2L,1L,4L,5L,3L,1L,0L,2L,0L,3L,0L,
                       1L,5L,0L,3L),
         date_time = c("2020-03-14 00:00:30",
                       "2020-03-14 00:00:30","2020-03-14 00:02:30",
                       "2020-03-14 00:02:30","2020-03-14 00:04:30","2020-03-16 00:00:30",
                       "2020-03-16 00:00:30","2020-03-16 00:02:30",
                       "2020-03-16 00:02:30","2020-03-16 00:04:30","2020-03-18 00:00:30",
                       "2020-03-18 00:00:30","2020-03-18 00:02:30",
                       "2020-03-18 00:02:30","2020-03-18 00:04:30","2020-03-20 00:00:30",
                       "2020-03-20 00:00:30","2020-03-20 00:02:30",
                       "2020-03-20 00:02:30","2020-03-20 00:04:30","2020-03-22 00:00:30",
                       "2020-03-22 00:00:30","2020-03-22 00:02:30",
                       "2020-03-22 00:02:30","2020-03-22 00:04:30","2020-03-24 00:00:30",
                       "2020-03-24 00:00:30","2020-03-24 00:02:30",
                       "2020-03-24 00:02:30","2020-03-24 00:04:30","2020-03-26 00:00:30",
                       "2020-03-26 00:00:30","2020-03-26 00:02:30",
                       "2020-03-26 00:02:30","2020-03-26 00:04:30","2020-03-28 00:00:30",
                       "2020-03-28 00:00:30","2020-03-28 00:02:30",
                       "2020-03-28 00:02:30","2020-03-28 00:04:30")
)

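# date_time was pasted as character; convert it before using a datetime scale
zf.behav$date_time <- as.POSIXct(zf.behav$date_time, tz = "UTC")

ggplot(zf.behav, aes(date_time, Perched, color = Treatment)) +
  geom_line() + geom_point() +
  scale_x_datetime(breaks = "2 days", date_labels = "%m/%d")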

Thanks for the reprex.

The thing is, they do connect. But there are two reasons why it does not look great:
a) The lines overlap, so if you get red on top of blue, you don't see the blue line; and
b) The values for Treatment CTRL are not "properly" consecutive. So e.g. on the first day it goes 4 to 1 to 3 and then 2 days later to 4 (so 3, which sits between the other two values, is the point that gets connected to the next day's 4). Now, 4 to 1 to 3 is probably not really a straight vertical line, but since you are plotting intervals of 2 days, the 2-minute difference between those 3 observations is not really visible.

I would suggest finding a different way of visualizing this - maybe bar plots and separate plots for the two treatments.

You can see this if you plot your groups separately:

library(dplyr)  # provides %>% and filter()

ggplot(zf.behav %>% filter(Treatment == "CTRL"), aes(date_time, Perched)) +
  geom_line() + 
  geom_point() +
  scale_x_datetime(breaks = "2 days", date_labels = "%m/%d")

ggplot(zf.behav %>% filter(Treatment == "WT"), aes(date_time, Perched)) +
  geom_line() + 
  geom_point() +
  scale_x_datetime(breaks = "2 days", date_labels = "%m/%d")
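
If the main annoyance is the line jumping from the last point of one day to the first point of the next, one option is to draw a separate line per treatment within each day. This is only a sketch: `demo` is a small made-up data frame in the same shape as zf.behav (its values are illustrative, not your data), and the grouping via `interaction(Treatment, Date)` is my suggestion rather than something from your code.

```r
library(ggplot2)

# Hypothetical stand-in for zf.behav: 2 days x 3 time points x 2 treatments
demo <- data.frame(
  Date      = rep(c("14-03-20", "16-03-20"), each = 6),
  Time      = rep(c("0:00:30", "0:02:30", "0:04:30"), each = 2, times = 2),
  Treatment = rep(c("CTRL", "WT"), times = 6),
  Perched   = c(4, 2, 1, 1, 3, 2, 4, 5, 1, 4, 0, 3)
)
demo$date_time <- as.POSIXct(paste(demo$Date, demo$Time),
                             format = "%d-%m-%y %H:%M:%S", tz = "UTC")

# One line per Treatment within each Date, so no line spans the 2-day gap
p <- ggplot(demo, aes(date_time, Perched, color = Treatment,
                      group = interaction(Treatment, Date))) +
  geom_line() +
  geom_point() +
  scale_x_datetime(date_breaks = "2 days", date_labels = "%m/%d")
p
```

The points of one day will still sit almost on top of each other at this axis scale, so this only removes the misleading connections between days.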

Yeah--first off, thank you for your help! It's starting to make a bit more sense.

So the lines do connect. However, the "perched" data are not being properly distributed along the time interval portion of my date_time data column. When plotting the two treatment lines (CTRL, WT), it seems to be lumping every time interval on 03/14, for example, vertically on one line. A bar chart could work, but given my current problem, I think I would have the same issue in terms of representing the number of individuals "perched" at each time interval (00:00:30, 00:02:30, etc) on each day (03/14, 03/16, etc).

I had trouble explaining this above, so here is a figure, specifically of just the 03/14 data. This figure is correct and exactly what I want, except I want to cover all days for which data were collected and not just 03/14. For some brief context, these are two treatments for captive birds. I simply want to see how many individuals were perched in the cage at each time interval during the 1 hr recording time on each date that I recorded. Figure: https://github.com/blhodinka/RHelp/issues/3#issue-677237534

I totally understand.

In that case, my suggestion would be to add a column with "day". Since you're using lubridate you can create that with

zf.behav$day <- day(zf.behav$date_time)

Then when you create the plot add a

facet_wrap(.~ day)

layer.

Things don't look good right now because the time steps on your x-axis (days) are too large to properly show change at 2-minute intervals. And since the difference between the recording sessions is always 2 days, in essence you're not really interested in the 2 days, right? With the above code you only show the time frames that are of interest to you, with sort of a break in between (they'll be separate little plots).

You could also try something without a date x-axis, but either way, you have to find a way to scale things that is satisfactory for you.
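
Putting those two ideas together, here is a rough sketch of a faceted plot with minutes into the recording on the x-axis instead of the full date-time. The data frame `demo` and the `minutes` column are made up for illustration (assumptions, not part of your data), and `day()` returns the day of the month, which is only safe here because the whole study falls within March.

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for zf.behav: 2 days x 3 time points x 2 treatments
demo <- data.frame(
  Date      = rep(c("14-03-20", "16-03-20"), each = 6),
  Time      = rep(c("0:00:30", "0:02:30", "0:04:30"), each = 2, times = 2),
  Treatment = rep(c("CTRL", "WT"), times = 6),
  Perched   = c(4, 2, 1, 1, 3, 2, 4, 5, 1, 4, 0, 3)
)
demo$date_time <- dmy_hms(paste(demo$Date, demo$Time))

# Day of month for faceting, and minutes since midnight for the x-axis
demo$day <- day(demo$date_time)
demo$minutes <- as.numeric(difftime(demo$date_time,
                                    floor_date(demo$date_time, "day"),
                                    units = "mins"))

p <- ggplot(demo, aes(minutes, Perched, color = Treatment)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ day) +
  labs(x = "Minutes into recording")
p
```

If you would rather keep date_time on the x-axis, `facet_wrap(~ day, scales = "free_x")` lets each panel zoom to its own hour instead of sharing the full 14-day range.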

And PS: your jitter also did not show much, because 2-minute intervals are tiny next to 2-day axis breaks (1/1440 of a break). The default jitter of 40% of the data resolution in each direction (80% in total) is still VERY small at that scale.
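
A quick back-of-the-envelope check of the scale involved, assuming the 2-minute sampling interval and 2-day axis breaks from the original post, and assuming the data resolution that geom_jitter() works from is that 2-minute gap:

```r
interval_secs <- 2 * 60            # 2 minutes between observations
break_secs    <- 2 * 24 * 60 * 60  # 2 days between axis breaks

# Fraction of one axis break covered by one sampling interval
ratio <- interval_secs / break_secs   # 1/1440

# Default jitter width is 40% of the resolution, applied in both directions
jitter_secs <- 0.4 * interval_secs    # 48 seconds either way
```

So each point moves by at most about 48 seconds on an axis that spans two weeks, which is why the jittered points still look stacked.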

This seems like the way to go. Looking at the particular time intervals (2 min intervals) is more important to me, so parsing things out by facet wrapping each experiment day might be the best approach.

Thank you for these suggestions!
