Dates missing when converting 5-minute data to 4-minute data

Hi,

I have a dataset (df1) at 5-minute intervals and I want to convert it to 4-minute intervals.
The data looks like this:

   date                   CO
   <dttm>              <dbl>
 1 2019-05-01 00:00:00   246
 2 2019-05-01 00:05:00   234
 3 2019-05-01 00:10:00   235
 4 2019-05-01 00:15:00   247
 5 2019-05-01 00:20:00   256
 6 2019-05-01 00:25:00   269
 7 2019-05-01 00:30:00   249
 8 2019-05-01 00:35:00   242
 9 2019-05-01 00:40:00   229
10 2019-05-01 00:45:00   234
11 2019-05-01 00:50:00   226
12 2019-05-01 00:55:00   222
13 2019-05-01 01:00:00   238
14 2019-05-01 01:05:00   232
15 2019-05-01 01:10:00   280

To convert/average it into 4-minute intervals, I am using this code:

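library(dplyr)      # arrange(), distinct(), mutate(), group_by(), summarise()
library(lubridate)  # floor_date()

df1 %>% 
  arrange(date) %>% 
  distinct(date, .keep_all = TRUE) %>%   # drop rows with a duplicated date
  mutate(hour = floor_date(date, '4 minute')) %>%   # start of the 4-minute bin (despite the name)
  group_by(hour) %>% 
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) 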

But after applying this code, some timestamps are missing from the result, e.g. 00:16:00 and 00:36:00:

# A tibble: 26,496 × 2
   hour                   CO
   <dttm>              <dbl>
 1 2019-05-01 00:00:00   246
 2 2019-05-01 00:04:00   234
 3 2019-05-01 00:08:00   235
 4 2019-05-01 00:12:00   247
 5 2019-05-01 00:20:00   256
 6 2019-05-01 00:24:00   269
 7 2019-05-01 00:28:00   249
 8 2019-05-01 00:32:00   242
 9 2019-05-01 00:40:00   229
10 2019-05-01 00:44:00   234

Please let me know how to fill in the missing dates.

The data can be found here: Dropbox - NCore_CO_5min.csv

Thanks

Hi Kunal,

What are you expecting to see in the 00:36:00 row?
It looks like the 00:35:00 row in your 5-minute data is correctly rounding down to 00:32:00, and then 00:40:00 stays at 00:40:00 since 40 is divisible by 4, so it wouldn't "round down" to 00:36:00. No 5-minute timestamp falls between 00:36:00 and 00:39:59, so at the moment your code doesn't know that you want a "00:36:00" interval, and it wouldn't know what value to put in the "CO" column if there were one.
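
To make that concrete, here is a minimal sketch that rebuilds the fifteen example timestamps (UTC is an assumption on my part, and `times`, `bins`, `all_bins` and `missing_bins` are just illustrative names) and lists the 4-minute bins that no 5-minute timestamp floors into:

```r
library(lubridate)

# Fifteen 5-minute timestamps, as in the example data (UTC assumed here)
times <- seq(ymd_hms("2019-05-01 00:00:00"), by = "5 min", length.out = 15)

# Floor each timestamp to the start of its 4-minute bin
bins <- floor_date(times, "4 minutes")

# Every 4-minute boundary across the same span
all_bins <- seq(min(bins), max(bins), by = "4 min")

# Boundaries that no 5-minute timestamp falls into
missing_bins <- all_bins[!all_bins %in% bins]
format(missing_bins, "%H:%M")
# [1] "00:16" "00:36" "00:56"
```

So one bin in every 20 minutes stays empty, which matches the gaps in your output.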
Do you want to take the average of the 00:35:00 row and the 00:40:00 row and put that in the 00:36:00 row, to kind of "smooth out" the interval?
Sorry for answering your question with more questions - this is my first attempt at an answer! But hopefully it gives you something useful to think about until someone more expert comes along.
Cheers and good luck!

Thanks for your response.

Put simply:
I have data at 5-minute intervals. I want to change it to 4-minute intervals.
It looks like my code did not do that.

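This may or may not be a good idea at all, so use it at your own risk. The approach: expand the data to one row per minute, carry each 5-minute value forward, then keep only the rows that fall on a 4-minute boundary.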

library(tidyverse)
library(lubridate)  # for minute(); only attached by library(tidyverse) from tidyverse 2.0.0 on


(start_df <- structure(list(date = structure(c(
  1556665200, 1556665500, 1556665800,
  1556666100, 1556666400, 1556666700, 1556667000, 1556667300, 1556667600,
  1556667900, 1556668200, 1556668500, 1556668800, 1556669100, 1556669400
), class = c("POSIXct", "POSIXt"), tzone = ""), val = c(
  246,
  234, 235, 247, 256, 269, 249, 242, 229, 234, 226, 222, 238, 232,
  280
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame")))


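# Expand to one row per minute, carry the last observation forward,
# and flag the timestamps that fall on a 4-minute boundary
(expanded_df <- start_df |>
    expand(date = full_seq(date, 60)) |>   # period of 60 seconds, i.e. every minute
    left_join(start_df, by = "date") |>
    fill(val) |>
    mutate(minutes  = minute(date),
           minute_4 = minutes %% 4 == 0))

# Keep only the rows on a 4-minute boundary
(just_4_df <- expanded_df |>
    filter(minute_4) |>
    select(date, val))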
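
If the "smoothing" idea raised earlier in the thread is closer to what is wanted, linear interpolation onto a 4-minute grid is another option. The following is only a sketch in base R using the first five example rows; whether interpolation is appropriate for these measurements is an assumption, and `grid`, `interp` and `result` are illustrative names, not anything from the original code.

```r
# First five rows of the example data (UTC assumed for reproducibility)
date <- seq(as.POSIXct("2019-05-01 00:00:00", tz = "UTC"),
            by = "5 min", length.out = 5)
val  <- c(246, 234, 235, 247, 256)

# Target grid: every 4 minutes across the observed range
grid <- seq(min(date), max(date), by = "4 min")

# approx() interpolates linearly between neighbouring observations;
# the times are converted to seconds since the epoch first
interp <- approx(x = as.numeric(date), y = val, xout = as.numeric(grid))

result <- data.frame(date = grid, val = interp$y)
result
```

Unlike carrying the last value forward, this gives each 4-minute timestamp a value weighted between its two neighbouring 5-minute readings, e.g. 236.4 at 00:04:00 (four fifths of the way from 246 to 234).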
