Handling Data for Daylight-Saving and Non-Daylight-Saving for HFT

I faced similar problem with Handling data on the days when we switch to daylight savings time and back in R which is the datetime of the dataset.

dataset :

## # A tibble: 7,200 x 5
##    index                open  high   low close
##    <dttm>              <dbl> <dbl> <dbl> <dbl>
##  1 2015-01-04 22:00:58  121.  121.  120.  121.
##  2 2015-01-04 22:01:58  121.  121.  121.  121.
##  3 2015-01-04 22:02:54  121.  121.  121.  121.
##  4 2015-01-04 22:03:59  121.  121.  121.  121.
##  5 2015-01-04 22:04:59  121.  121.  120.  121.
##  6 2015-01-04 22:05:59  121.  121.  120.  121.
##  7 2015-01-04 22:06:59  121.  121.  120.  121.
##  8 2015-01-04 22:07:59  121.  121.  121.  121.
##  9 2015-01-04 22:08:59  121.  121.  121.  121.
## 10 2015-01-04 22:09:54  121.  121.  121.  121.
## # ... with 7,190 more rows

inconsistancy data point within a day.

## # A tibble: 6 x 2
##   `as.Date(index)`     n
##   <date>           <int>
## 1 2015-01-04         120
## 2 2015-01-05        1440
## 3 2015-01-06        1440
## 4 2015-01-07        1440
## 5 2015-01-08        1440
## 6 2015-01-09        1320

And then I tried to convert all timezone, however some dataset is wierd as we can see from list 2,3,4, the initial time started from 23:00:00 while the rest started from 00:00:00. Normally Daylight-Saving must be a seasonal period am I right? Its weird if the Daylight-Saving only few days... But someone shade me a light for cope with the issue?

> llply(fls[[7]][1:10], function(x) {
+     readRDS(x) %>% data.table
+  })
[[1]]
                    index     open     high      low    close
   1: 2015-01-05 00:00:58 120.5040 120.5900 120.4810 120.5550
   2: 2015-01-05 00:01:58 120.5580 120.6205 120.5250 120.5740
   3: 2015-01-05 00:02:54 120.5960 120.6330 120.5605 120.5900
   4: 2015-01-05 00:03:59 120.5920 120.6570 120.5370 120.6035
   5: 2015-01-05 00:04:59 120.6060 120.6540 120.4890 120.5870
  ---                                                        
7196: 2015-01-09 23:55:37 118.5190 118.5190 118.4915 118.4965
7197: 2015-01-09 23:56:59 118.4945 118.5050 118.4825 118.4965
7198: 2015-01-09 23:57:59 118.4985 118.5235 118.4910 118.5015
7199: 2015-01-09 23:58:51 118.4995 118.5530 118.4805 118.5390
7200: 2015-01-09 23:59:01 118.5480 118.5480 118.5480 118.5480

[[2]]
                    index     open     high      low    close
   1: 2015-03-08 23:00:33 120.8550 120.8650 120.8500 120.8600
   2: 2015-03-08 23:01:58 120.8600 120.8735 120.8495 120.8625
   3: 2015-03-08 23:02:43 120.8560 120.8560 120.8495 120.8495
   4: 2015-03-08 23:03:56 120.8515 120.8515 120.8445 120.8450
   5: 2015-03-08 23:04:27 120.8470 120.8490 120.8470 120.8490
  ---                                                        
7189: 2015-03-13 22:54:59 121.4520 121.4570 121.4300 121.4385
7190: 2015-03-13 22:55:59 121.4435 121.4470 121.4350 121.4455
7191: 2015-03-13 22:56:59 121.4450 121.4530 121.3995 121.4305
7192: 2015-03-13 22:57:56 121.4325 121.4585 121.4230 121.4370
7193: 2015-03-13 22:58:59 121.4310 121.4620 121.3690 121.4370

[[3]]
                    index     open     high     low   close
   1: 2015-03-15 23:00:57 121.4040 121.4250 121.388 121.402
   2: 2015-03-15 23:01:55 121.4030 121.4090 121.378 121.378
   3: 2015-03-15 23:02:58 121.3770 121.4000 121.374 121.390
   4: 2015-03-15 23:03:59 121.3945 121.3945 121.379 121.389
   5: 2015-03-15 23:04:59 121.3910 121.4265 121.388 121.407
  ---                                                      
7182: 2015-03-20 22:54:59 120.0895 120.1560 120.086 120.135
7183: 2015-03-20 22:55:58 120.1370 120.1495 120.065 120.086
7184: 2015-03-20 22:56:58 120.0835 120.0915 120.052 120.053
7185: 2015-03-20 22:57:59 120.0740 120.0860 120.048 120.060
7186: 2015-03-20 22:58:59 120.0630 120.0760 120.032 120.043

[[4]]
                    index     open     high      low    close
   1: 2015-03-22 23:00:50 119.8820 119.8920 119.8810 119.8920
   2: 2015-03-22 23:01:48 119.8905 119.8980 119.8825 119.8940
   3: 2015-03-22 23:02:43 119.8970 119.8995 119.8935 119.8940
   4: 2015-03-22 23:03:13 119.8945 119.8950 119.8945 119.8945
   5: 2015-03-22 23:04:59 119.8940 119.8970 119.8900 119.8955
  ---                                                        
7192: 2015-03-27 22:55:59 119.1480 119.1750 119.1370 119.1450
7193: 2015-03-27 22:56:58 119.1505 119.1545 119.1360 119.1410
7194: 2015-03-27 22:57:59 119.1405 119.1500 119.1345 119.1420
7195: 2015-03-27 22:58:59 119.1390 119.1585 119.1195 119.1310
7196: 2015-03-27 22:59:00 119.1400 119.1400 119.1400 119.1400

[[5]]
                    index     open     high      low    close
   1: 2015-03-30 00:00:55 119.2260 119.2315 119.2200 119.2315
   2: 2015-03-30 00:01:49 119.2295 119.2365 119.2165 119.2165
   3: 2015-03-30 00:02:53 119.2170 119.2210 119.2165 119.2210
   4: 2015-03-30 00:03:42 119.2220 119.2250 119.2210 119.2240
   5: 2015-03-30 00:04:52 119.2210 119.2240 119.2100 119.2175
  ---                                                        
7131: 2015-04-03 23:55:42 118.9840 118.9915 118.9575 118.9605
7132: 2015-04-03 23:56:53 118.9750 118.9965 118.9510 118.9700
7133: 2015-04-03 23:57:59 118.9565 118.9610 118.9565 118.9595
7134: 2015-04-03 23:58:00 118.9575 118.9725 118.9575 118.9725
7135: 2015-04-03 23:59:00 118.9840 118.9845 118.9840 118.9845

[[6]]
                    index     open     high      low    close
   1: 2015-04-06 00:01:53 118.8950 118.8950 118.8890 118.8910
   2: 2015-04-06 00:02:56 118.8915 118.9180 118.8810 118.9050
   3: 2015-04-06 00:03:34 118.9160 118.9180 118.8865 118.8900
   4: 2015-04-06 00:04:56 118.8905 118.8910 118.8895 118.8895
   5: 2015-04-06 00:05:53 118.8895 118.8995 118.8415 118.8715
  ---                                                        
7168: 2015-04-10 23:55:58 120.3175 120.3195 120.2795 120.2830
7169: 2015-04-10 23:56:59 120.2835 120.2835 120.2365 120.2420
7170: 2015-04-10 23:57:57 120.2410 120.2420 120.1565 120.1865
7171: 2015-04-10 23:58:59 120.1855 120.1950 120.1730 120.1880
7172: 2015-04-10 23:59:00 120.2105 120.2175 120.2105 120.2175

[[7]]
                    index     open     high      low    close
   1: 2015-04-13 00:00:59 120.3420 120.3430 120.3420 120.3420
   2: 2015-04-13 00:01:16 120.3425 120.3485 120.3340 120.3395
   3: 2015-04-13 00:02:53 120.3370 120.3425 120.3300 120.3360
   4: 2015-04-13 00:03:21 120.3330 120.3330 120.3195 120.3195
   5: 2015-04-13 00:04:45 120.3185 120.3185 120.3095 120.3110
  ---                                                        
7163: 2015-04-17 23:55:59 118.9310 118.9435 118.9260 118.9435
7164: 2015-04-17 23:56:55 118.9440 118.9440 118.9255 118.9275
7165: 2015-04-17 23:57:59 118.9280 118.9375 118.9085 118.9335
7166: 2015-04-17 23:58:31 118.9345 118.9485 118.8960 118.9300
7167: 2015-04-17 23:59:03 118.9375 118.9375 118.9375 118.9375

[[8]]
                    index     open     high      low    close
   1: 2015-04-20 00:01:37 119.0200 119.0200 119.0200 119.0200
   2: 2015-04-20 00:02:33 119.0285 119.0340 119.0000 119.0105
   3: 2015-04-20 00:03:34 119.0100 119.0105 118.9995 119.0050
   4: 2015-04-20 00:04:48 119.0030 119.0060 118.9975 119.0060
   5: 2015-04-20 00:05:44 119.0080 119.0240 119.0045 119.0090
  ---                                                        
7181: 2015-04-24 23:55:58 119.0070 119.0140 118.9675 119.0140
7182: 2015-04-24 23:56:58 119.0130 119.0160 118.9770 118.9810
7183: 2015-04-24 23:57:59 118.9765 118.9765 118.9090 118.9160
7184: 2015-04-24 23:58:57 118.9110 118.9730 118.9050 118.9400
7185: 2015-04-24 23:59:06 118.9460 118.9750 118.9460 118.9705

[[9]]
                    index     open     high      low    close
   1: 2015-04-27 00:00:53 118.8850 118.8885 118.8815 118.8850
   2: 2015-04-27 00:01:56 118.8905 118.8905 118.8895 118.8895
   3: 2015-04-27 00:02:28 118.8900 118.8925 118.8875 118.8900
   4: 2015-04-27 00:04:57 118.8920 118.9080 118.8910 118.8950
   5: 2015-04-27 00:05:59 118.8910 118.9360 118.8905 118.8945
  ---                                                        
7164: 2015-05-01 23:55:58 120.1280 120.1800 120.1275 120.1790
7165: 2015-05-01 23:56:47 120.1800 120.1935 120.1800 120.1905
7166: 2015-05-01 23:57:57 120.1910 120.2075 120.1910 120.2035
7167: 2015-05-01 23:58:59 120.2040 120.2400 120.2010 120.2290
7168: 2015-05-01 23:59:00 120.2165 120.2165 120.2110 120.2110

[[10]]
                    index     open     high      low    close
   1: 2015-05-04 00:00:58 120.2400 120.2470 120.2270 120.2345
   2: 2015-05-04 00:01:47 120.2340 120.2345 120.2340 120.2340
   3: 2015-05-04 00:02:00 120.2345 120.2345 120.2345 120.2345
   4: 2015-05-04 00:03:36 120.2375 120.2480 120.2360 120.2390
   5: 2015-05-04 00:04:44 120.2405 120.2410 120.2390 120.2390
  ---                                                        
7167: 2015-05-08 23:55:59 119.7460 119.7565 119.7440 119.7565
7168: 2015-05-08 23:56:55 119.7570 119.7590 119.7560 119.7585
7169: 2015-05-08 23:57:53 119.7570 119.7595 119.7545 119.7570
7170: 2015-05-08 23:58:59 119.7585 119.7880 119.7405 119.7800
7171: 2015-05-08 23:59:02 119.7820 119.8205 119.7820 119.8205

I wondered if I can try to validate the start data point every single file and set an duration interval to make sure the data-point for a certain period is consistant, however the daily closing price will be another issue because I need to model both intraday and interday dataset. High Frequency GARCH: The multiplicative component GARCH (mcsGARCH) model use two independent dataset which are:

  • daily interval price (getSymbols())
  • 1min interval price (download from somewhere)

For me, in order to avoid the inconsistancy of daily closed price I need to solve the Daylight-Saving issue and fixed the 1440 (1440 mins per day) data-point per day.

Besides that, I use dataset% %>% msts(seasonal.periods=c(1440, 10080)) to set two seasonal effects for daily and weekly as refer to Frequency parameter and its impact on auto.arima results.

start <- mdy_hm("3-11-2017 5:21", tz = "US/Eastern")
end <- mdy_hm("3-12-2017 5:21", tz = "US/Eastern")

Since we’re dealing with elapsed time between two dates, let’s start with Intervals. We can define an Interval using the %--% operator.

time.interval <- start %--% end
time.interval
## [1] 2017-03-11 05:21:00 EST--2017-03-12 05:21:00 EDT

Notice how Intervals print. They show the beginng date and end date. And also notice how the time zone changes from EST to EDT indicating that Daylight Savings has started. If we look at the structure of an Interval object we see it contains elapsed time in seconds, 82800, and the start date.

Source : Working with dates and time in R using the lubridate package

Referece

convertOHLC <- function(mbase, combine = FALSE, mean = FALSE, 
                        tz = 'Europe/Athens', .unit = 'minute') {
  
  require('BBmisc')
  pkgs <- c('magrittr', 'lubridate', 'tidyverse', 'tidyquant')
  lib(pkgs)
  rm(pkgs)
  
  .unit2 <- .unit %>% str_replace_all('s$', '')
  
  # to.period()
  # Valid period character strings include: "seconds", "minutes", 
  #   "hours", "days", "weeks", "months", "quarters", and "quarters". 
  # These are calculated internally via endpoints. See that function's 
  #   help page for further details.
  
  # https://github.com/business-science/tidyquant/issues/92#issuecomment-426126637
  mbaseA <- suppressAll(
    mbase %>% 
      dplyr::select(DateTime, Ask) %>% 
      mutate(DateTime = .POSIXct(mdy_hms(DateTime), tz = tz)) %>% 
      tk_xts(.) %>% to.period(period = .unit) %>% tk_tbl() %>% 
      mutate(index = ceiling_date(index, .unit2)) %>% #floor_date
      dplyr::rename_all(str_replace_all, '\\.', '') %>% 
      dplyr::rename_all(tolower))
  
  nch <- mbaseA$index[1] %>% substr(nchar(.)+2, nchar(.)+3)
  mbaseA %<>% ## https://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects
      mutate(nch = nch, index = if_else(nch == '23', index + hours(1), index)) %>% 
      dplyr::select(-nch)
  
  mbaseB <- suppressAll(
    mbase %>% 
      dplyr::select(DateTime, Bid) %>% 
      mutate(DateTime = .POSIXct(mdy_hms(DateTime), tz = tz)) %>% 
      tk_xts(.) %>% to.period(period = .unit) %>% tk_tbl() %>% 
      mutate(index = ceiling_date(index, .unit2)) %>% #floor_date
      dplyr::rename_all(str_replace_all, '\\.', '') %>% 
      dplyr::rename_all(tolower))
  
  nch <- mbaseB$index[1] %>% substr(nchar(.)+2, nchar(.)+3)
  mbaseB %<>% ## https://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects
    mutate(nch = nch, index = if_else(nch == '23', index + hours(1), index)) %>% 
    dplyr::select(-nch)
  
  if (combine == TRUE) {
    
    if (mean == TRUE) {
      mbaseA %<>% cbind(Type = 'Ask', .)
      mbaseB %<>% cbind(Type = 'Bid', .)
      tmp <- rbind(mbaseA, mbaseB) %>% tbl_df %>% 
        dplyr::select(index, Type, open, high, low, close) %>% 
        arrange(index)
      
      tmp %<>% ddply(.(index), summarise, 
                     open = mean(open, na.rm=TRUE), 
                     high = mean(high, na.rm=TRUE), 
                     low = mean(low, na.rm=TRUE), 
                     close = mean(close, na.rm=TRUE)) %>% tbl_df
    } else {
      
      tmp <- data_frame(
        open = (mbaseA$open + mbaseB$open)/2, 
        high = mbaseB$high, 
        low = mbaseA$low, 
        close = (mbaseA$open + mbaseB$open)/2) %>% tbl_df
    }
    
  } else {
    tmp <- list(Ask = mbaseA, Bid = mbaseB)
  }
  
  return(tmp)
  }

Due to .POSIXct() unable detect daylight saving date efficiently, here I simply use nch <- mbaseA$index[1] %>% substr(nchar(.)+2, nchar(.)+3) to detect the first element datetime of every single week, and then use if_else (normal ifelse unable detect datetime class) to validate and add another hour onto the datetime of whole week dataset.

1 Like