Completing the dataset for rolling calculations

Enuma · August 20, 2020, 8:43am

Hi. Consider a time-series dataset d with two columns value (observed value) and datetime (time of observation):

library(dplyr)

d = tibble(value = c(2, 5, 6, 3, 8), datetime = c("2020-08-20 10:10:10", "2020-08-20 10:10:12", "2020-08-20 10:10:16", "2020-08-20 10:10:17", "2020-08-20 10:10:22"))

I would like to perform rolling calculations on this dataset based on the datetime column. Let's say I want to create another column with a mean value with a rolling window of 5sec. What is the most efficient way to perform this?

So far I came up with a three-step solution - first extract 1st and last datetime value. Then create a datetime vector containing every second between 1st and last value. Then join it with the original dataset & perform rolling calcs. Is there any way how to perform all this in a single %>%? Or is there even an more elegant solution to this?

Thanks.

siddharthprabhu · August 20, 2020, 9:31am

Here's an approach using tidyr::complete() to fill the series and zoo::rollmeanr() to compute mean across the rolling window of 5 observations.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(zoo, warn.conflicts = FALSE)

d <- tibble(
  value = c(2, 5, 6, 3, 8),
  datetime = c(
    "2020-08-20 10:10:10", "2020-08-20 10:10:12", "2020-08-20 10:10:16",
    "2020-08-20 10:10:17", "2020-08-20 10:10:22"
  )
)

d %>%
  mutate(datetime = as.POSIXct(datetime)) %>%
  complete(datetime = seq.POSIXt(min(datetime), max(datetime), by = "sec")) %>%
  mutate(rolling_mean = rollmeanr(value, k = 5, fill = NA, na.rm = TRUE))
#> # A tibble: 13 x 3
#>    datetime            value rolling_mean
#>    <dttm>              <dbl>        <dbl>
#>  1 2020-08-20 10:10:10     2         NA  
#>  2 2020-08-20 10:10:11    NA         NA  
#>  3 2020-08-20 10:10:12     5         NA  
#>  4 2020-08-20 10:10:13    NA         NA  
#>  5 2020-08-20 10:10:14    NA          3.5
#>  6 2020-08-20 10:10:15    NA          5  
#>  7 2020-08-20 10:10:16     6          5.5
#>  8 2020-08-20 10:10:17     3          4.5
#>  9 2020-08-20 10:10:18    NA          4.5
#> 10 2020-08-20 10:10:19    NA          4.5
#> 11 2020-08-20 10:10:20    NA          4.5
#> 12 2020-08-20 10:10:21    NA          3  
#> 13 2020-08-20 10:10:22     8          8

^{Created on 2020-08-20 by the reprex package (v0.3.0)}

Enuma · August 20, 2020, 11:04am

Thank you. Works like a charm.

system · August 27, 2020, 11:04am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.