Query on time-series plot

Hello All,
I am pretty new to R and am having trouble plotting time-series data. My data are not continuous (daily): several days are missing, and I have not included those days in the .csv file. Please see the photo of the data below.

image

I tried to drop the missing days from the plot using the bdscale library. It did remove them, but the dates on the x-axis are not displayed properly. Please see the graph below. I may have made some mistakes.

I would be grateful if you could help me resolve this. Also, I am sorry if my code is messy ;).

Thank you in advance,
Ashish

library(ggplot2)
library(bdscale)
library(scales)
data <- read.csv("Trial.csv")
data$Time <- as.POSIXct(strptime(data$Time, format = "%m/%d/%Y"))
tiff("Trial_timeseries.tiff", units = "in", width = 10, height = 5, res = 300)
mydata <- ggplot(data, aes(x = Time, y = etc, color = type, linetype = type)) +
  xlab("Date") +
  ylab("value") +
  geom_line(size = 1.4) +
  scale_x_bd(business.dates = data$Time, labels = date_format("%y-%m-%d"), max.major.breaks = 100) +
  theme_bw()
mydata
dev.off()

Hello,

Can you use dput() or str() on data right before you pass the data to ggplot and paste here? I'd like to see the structure of your data.

Thank you so much for your response. Please find the dput() output below, as requested:

structure(list(Time = c("12/21/2015", "12/31/2015", "1/1/2016",
"1/2/2016", "1/18/2016", "1/19/2016", "1/20/2016", "1/21/2016",
"1/27/2016", "1/29/2016", "2/1/2016", "2/3/2016", "2/5/2016",
"2/8/2016", "2/10/2016", "2/12/2016", "2/15/2016", "2/17/2016",
"2/19/2016", "2/22/2016", "2/24/2016", "2/26/2016", "2/29/2016",
"11/10/2016", "11/12/2016", "11/14/2016", "11/16/2016", "11/25/2016",
"11/28/2016", "11/30/2016", "12/6/2016", "12/7/2016", "12/9/2016",
"12/12/2016", "12/14/2016", "12/16/2016", "12/19/2016", "12/23/2016",
"12/26/2016", "12/28/2016", "12/30/2016", "1/2/2017", "12/21/2015",
"12/31/2015", "1/1/2016", "1/2/2016", "1/18/2016", "1/19/2016",
"1/20/2016", "1/21/2016", "1/27/2016", "1/29/2016", "2/1/2016",
"2/3/2016", "2/5/2016", "2/8/2016", "2/10/2016", "2/12/2016",
"2/15/2016", "2/17/2016", "2/19/2016", "2/22/2016", "2/24/2016",
"2/26/2016", "2/29/2016", "11/10/2016", "11/12/2016", "11/14/2016",
"11/16/2016", "11/25/2016", "11/28/2016", "11/30/2016", "12/6/2016",
"12/7/2016", "12/9/2016", "12/12/2016", "12/14/2016", "12/16/2016",
"12/19/2016", "12/23/2016", "12/26/2016", "12/28/2016", "12/30/2016",
"1/2/2017"), etc = c(8.85, 19.11, 16.4, 12.72, 12.82, 6.83, 5.42,
9.02, 9.5, 8.64, 7.02, 6.85, 17.5, 10.83, 17.33, 7.54, 9.05,
4.97, 5.06, 8.28, 14.13, 8.38, 9.28, 6.05, 5.2, 3.17, 3.22, 14.63,
0.81, 10.29, 5.17, 8.09, 2.77, 5.76, 9.29, 6.96, 1.27, 25.46,
5.29, 5.92, 3.2, 13.5, 20.09, 66.11, 55.21, 47.24, 48.71, 29.32,
12.74, 28.39, 27.03, 23.35, 18.58, 17.27, 57.33, 35.9, 47.47,
15.17, 21.22, 14.31, 12.15, 26.17, 35.19, 28.08, 31.27, 30.61,
15.74, 6.22, 9.8, 44.9, 1.02, 24.72, 18.23, 24.42, 9.68, 19.05,
24.53, 30.37, 2.28, 75.95, 20.86, 20.32, 10.89, 47.49), type = c("Low",
"Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
"Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
"Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
"Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
"Low", "Low", "Low", "Low", "Low", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High")), class = "data.frame", row.names = c(NA,
-84L))

It looks like your date field is a character; it should be a date. R stores dates as numbers, counting days from 1970-01-01. You should use as.Date() to convert your field to a date.
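A minimal sketch of that conversion, using three of the date strings from the dput() output above. The variable names `raw` and `d` are made up for illustration.

```r
# Three of the date strings from the posted data (month/day/year)
raw <- c("12/21/2015", "12/31/2015", "1/1/2016")

# Parse the strings into Date objects; the format must match the input
d <- as.Date(raw, format = "%m/%d/%Y")

class(raw)  # "character"
class(d)    # "Date"

# Internally a Date is the number of days since 1970-01-01
as.numeric(as.Date("1970-01-02"))  # 1
as.numeric(d)

# Because dates are numbers, the spacing between them is meaningful:
# here the gaps are 10 days and 1 day
diff(d)
```

Running str() on the data frame after this step should report the column as Date rather than chr.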

Thank you so much for your prompt response. Could you have a look at my code and let me know where I made a mistake? Actually, I have already converted the dates using "data$Time <-as.POSIXct(strptime(data$Time, format="%m/%d/%Y"))".

Does str(data) say that your date column is a date type?

You mention having several missing days, but there are jumps of more than a week and a very long gap in the middle, from the end of February to the start of November. That gap does not appear in your graph. Is that your intention? Because you set business.dates equal to all the dates in your data, moving one step along the x-axis just means moving to the next observation, whether that is the next day or eight months later.
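To see why the long gap disappears, here is a rough base-R sketch of the idea (it is not bdscale's actual code) using four dates from the posted data that straddle the gap. The names `obs`, `gaps` and `position` are illustrative.

```r
# Four observation dates from the posted data, on both sides of the gap
obs <- as.Date(c("2/26/2016", "2/29/2016", "11/10/2016", "11/12/2016"),
               format = "%m/%d/%Y")

# Calendar distance between consecutive observations, in days
gaps <- as.numeric(diff(obs))
gaps            # 3 255 2

# A business-date style axis places each date by its position in the
# list of known dates, so every step has width 1 regardless of the gap
position <- match(obs, sort(unique(obs)))
position        # 1 2 3 4
diff(position)  # 1 1 1
```

On a regular date axis the middle step would be 255 days wide; on the position-based axis it is drawn the same width as a 2-day step.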

Hello,

I would reconsider using bdscale. Do you really need this package? I am not sure it brings you any value. In the code below, your data are converted into a tsibble (a time series tibble), and the tsibble package is used to check for things that matter for time series, such as gaps in time. The last graph shows your data with NAs filled in for the missing weeks. As @EconProf said, you are missing a lot of data between observations.

R has some great packages for time series, tsibble being one, and the fable packages for time series modeling. The learning curve can be large sometimes, but it will be worth it in the end. Welcome to R!

library(tidyverse)
library(bdscale)
time_dta <- as.data.frame(
  list(Time = c("12/21/2015", "12/31/2015", "1/1/2016",
                "1/2/2016", "1/18/2016", "1/19/2016", "1/20/2016", "1/21/2016",
                "1/27/2016", "1/29/2016", "2/1/2016", "2/3/2016", "2/5/2016",
                "2/8/2016", "2/10/2016", "2/12/2016", "2/15/2016", "2/17/2016",
                "2/19/2016", "2/22/2016", "2/24/2016", "2/26/2016", "2/29/2016",
                "11/10/2016", "11/12/2016", "11/14/2016", "11/16/2016", "11/25/2016",
                "11/28/2016", "11/30/2016", "12/6/2016", "12/7/2016", "12/9/2016",
                "12/12/2016", "12/14/2016", "12/16/2016", "12/19/2016", "12/23/2016",
                "12/26/2016", "12/28/2016", "12/30/2016", "1/2/2017", "12/21/2015",
                "12/31/2015", "1/1/2016", "1/2/2016", "1/18/2016", "1/19/2016",
                "1/20/2016", "1/21/2016", "1/27/2016", "1/29/2016", "2/1/2016",
                "2/3/2016", "2/5/2016", "2/8/2016", "2/10/2016", "2/12/2016",
                "2/15/2016", "2/17/2016", "2/19/2016", "2/22/2016", "2/24/2016",
                "2/26/2016", "2/29/2016", "11/10/2016", "11/12/2016", "11/14/2016",
                "11/16/2016", "11/25/2016", "11/28/2016", "11/30/2016", "12/6/2016",
                "12/7/2016", "12/9/2016", "12/12/2016", "12/14/2016", "12/16/2016",
                "12/19/2016", "12/23/2016", "12/26/2016", "12/28/2016", "12/30/2016",
                "1/2/2017"), etc = c(8.85, 19.11, 16.4, 12.72, 12.82, 6.83, 5.42,
                                     9.02, 9.5, 8.64, 7.02, 6.85, 17.5, 10.83, 17.33, 7.54, 9.05,
                                     4.97, 5.06, 8.28, 14.13, 8.38, 9.28, 6.05, 5.2, 3.17, 3.22, 14.63,
                                     0.81, 10.29, 5.17, 8.09, 2.77, 5.76, 9.29, 6.96, 1.27, 25.46,
                                     5.29, 5.92, 3.2, 13.5, 20.09, 66.11, 55.21, 47.24, 48.71, 29.32,
                                     12.74, 28.39, 27.03, 23.35, 18.58, 17.27, 57.33, 35.9, 47.47,
                                     15.17, 21.22, 14.31, 12.15, 26.17, 35.19, 28.08, 31.27, 30.61,
                                     15.74, 6.22, 9.8, 44.9, 1.02, 24.72, 18.23, 24.42, 9.68, 19.05,
                                     24.53, 30.37, 2.28, 75.95, 20.86, 20.32, 10.89, 47.49), type = c("Low",
                                                                                                      "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
                                                                                                      "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
                                                                                                      "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
                                                                                                      "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
                                                                                                      "Low", "Low", "Low", "Low", "Low", "High", "High", "High", "High",
                                                                                                      "High", "High", "High", "High", "High", "High", "High", "High",
                                                                                                      "High", "High", "High", "High", "High", "High", "High", "High",
                                                                                                      "High", "High", "High", "High", "High", "High", "High", "High",
                                                                                                      "High", "High", "High", "High", "High", "High", "High", "High",
                                                                                                      "High", "High", "High", "High", "High", "High"))
)

str(time_dta)
as.character(trimws(time_dta$Time)) %>% str()
time_dta <- time_dta %>% 
  mutate(date_str = as.character(.data$Time),
         date = as.Date(date_str, format = '%m/%d/%Y'))

str(time_dta)

time_ts <- tsibble::tsibble(time_dta, key = type, index = date)
str(time_ts)
tsibble::key(time_ts)
tsibble::index(time_ts)

# Your data has duplicates because you have High and Low
tsibble::is_duplicated(time_ts)
tsibble::are_duplicated(time_ts)

# group_by_key() groups by the tsibble's key (type); no column argument is needed
time_ts <- time_ts %>% tsibble::group_by_key()

# Your data has duplicates because you have High and Low and NA's
tsibble::duplicates(time_ts)
tsibble::is_duplicated(time_ts)
tsibble::are_duplicated(time_ts)

tsibble::has_gaps(time_ts)

time_ts %>% ggplot(aes(x=date,y=etc, color=type)) +
  xlab("Date") + 
  ylab("value") +
  geom_line() +
  #scale_x_bd(business.dates=date, labels = date_format("%Y-%m-%d"), max.major.breaks=100) +
  theme_bw()

time_ts_complete <- tsibble::fill_gaps(time_ts)

tsibble::duplicates(time_ts_complete)
tsibble::is_duplicated(time_ts_complete)
tsibble::are_duplicated(time_ts_complete)

time_ts_complete %>% ggplot(aes(x=date,y=etc, color=type)) +
  xlab("Date") + 
  ylab("value") +
  geom_line(size = 2) +
  #scale_x_bd(business.dates=data$Time, labels = date_format("%Y-%m-%d"), max.major.breaks=100)+
  theme_bw()

Yes, I wanted to hide the dates for which data are not available. I was able to plot what I need using business.dates, but the labels on the x-axis are not displayed properly. Following @fredoxvii's suggestion, I converted the Time column to a date using as.Date(), but the figure did not change. I would welcome any suggestions :).

Thank you so much for the code. It is not necessary to use bdscale. I tried your code with the tsibble package, and it gives appropriate x-axis labels. However, the figure shows the missing values (please see the figure below), and I would like to exclude them.

For that, I tried to use business.dates to remove the missing dates. It removed the missing data, but the problem remains the same (the x-axis labels are not good).

Then I would recommend not using dates on the x axis. You need something else.

You can save the new date variable as a character in the format of yyyy-mm-dd. On the x axis, that will plot in the correct order of time without having a time variable.
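The reason the yyyy-mm-dd format works is that its alphabetical order matches chronological order, and a character x-axis is laid out in sorted order. A small sketch with four dates from the posted data (the variable names are illustrative):

```r
# Deliberately out of chronological order
raw <- c("11/10/2016", "2/29/2016", "12/21/2015", "1/2/2017")
d <- as.Date(raw, format = "%m/%d/%Y")

# ISO-style strings: sorting the text gives chronological order
iso <- format(d, "%Y-%m-%d")
sort(iso)
# "2015-12-21" "2016-02-29" "2016-11-10" "2017-01-02"

# The original month/day/year strings are sorted as text,
# which does not follow the calendar
sort(raw)
```

as.character() on a Date column gives the same yyyy-mm-dd strings as format(d, "%Y-%m-%d").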

This is what you are looking for, but you will have to play with the axis options to get a better-looking graph.

str(time_dta)
as.character(trimws(time_dta$Time)) %>% str()
time_dta <- time_dta %>% 
  mutate(date_str = as.character(.data$Time),
         date = as.Date(date_str, format = '%m/%d/%Y'),
         date_chr = as.character(date)) %>% 
  arrange(date_chr,type)

str(time_dta)

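time_dta %>% ggplot(aes(x=date_chr,y=etc, color = type, group=type)) +
  xlab("Date") + 
  ylab("value") +
  geom_line() +
  # date_chr is a character (discrete) axis, so date breaks such as 'weeks'
  # do not apply; label every 4th observed date instead
  scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 4)]) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))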

Although I would say that this is very misleading, and I would never recommend doing something like this in the real world.

Thank you so much @fredoxvii. You are right, it is not good to represent the data in this kind of graph. Earlier I thought about breaking the x-axis (removing the time span) only from March 2016 to October 2016, as this is the longest period for which data are not available. What do you think about this kind of representation? Do you think it would be a good option? If you have any suggestions, I am happy to hear them :).
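One possible way to show such a break, offered only as a sketch: split the observations into the two sampling periods and draw each period in its own panel with a free x-axis, so the empty months are dropped but the break stays visible. The cutoff date, the `period` column and its labels are assumptions made for this illustration, and the plot is only built if ggplot2 is installed.

```r
# A few "Low" rows from the posted data, on both sides of the long gap
dta <- data.frame(
  date = as.Date(c("2/24/2016", "2/26/2016", "2/29/2016",
                   "11/10/2016", "11/12/2016", "11/14/2016"),
                 format = "%m/%d/%Y"),
  etc  = c(14.13, 8.38, 9.28, 6.05, 5.2, 3.17),
  type = "Low"
)

# Label each observation by sampling period (hypothetical cutoff date
# chosen somewhere inside the March to October 2016 gap)
cutoff <- as.Date("2016-06-01")
labels <- c("Dec 2015 to Feb 2016", "Nov 2016 to Jan 2017")
dta$period <- factor(ifelse(dta$date < cutoff, labels[1], labels[2]),
                     levels = labels)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dta, aes(x = date, y = etc, color = type)) +
    geom_line() +
    # One panel per period; free_x lets each panel cover only its own dates
    facet_wrap(~ period, scales = "free_x") +
    scale_x_date(date_labels = "%y-%m-%d") +
    xlab("Date") +
    ylab("value") +
    theme_bw()
  print(p)
}
```

Compared with a character x-axis, this keeps a true date scale inside each panel, so uneven spacing within a period is still drawn honestly.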