Rolling MAX which ignoring some NAs

kdr · October 16, 2020, 7:36pm

This is my first posting, I apologize if I don't do this correctly.

I am trying to get the MAX discharge value for that day and previous day (e.g. 2, 4, 8, 16, 32 days) and place them each into a new column. I was able to create these columns using the rollmax with the mutate function. For example here is a subset of my data and what I was able to reproduce:

library(tidyverse)
library(dplyr)
library(zoo)
library(DataCombine)

## subset of data for example
daily_flow <- data.frame(
                date = c('1995-02-25', '1995-02-26', '1995-02-27', '1995-02-28', '1995-03-01', '1995-03-02', '1995-03-03', 
                         '1995-03-04', '1995-03-05', '1995-03-06', '1995-03-07', '1995-03-08', '1995-03-09','1995-03-10', 
                         '1995-03-11', '1995-03-12', '1995-03-13', '1995-03-14'),
                discharge = c(NA, NA, NA, NA, 0.58, 0.596, 0.604, 0.617, 0.63, 0.642, 0.657, 0.67, 0.683, 0.696, 0.71, 0.714, 0.74, 
                              0.756)
)


## determining max value for that day and previous days (2, 4, and 8 day)
daily_flow_prevdays <- daily_flow %>% 
                          mutate("2_d" = rollmax(x = discharge, 2, align = "right", fill = NA))

daily_flow_prevdays <- daily_flow_prevdays %>% 
                          mutate("4_d" = rollmax(x = discharge, 4, align = "right", fill = NA))

daily_flow_prevdays <- daily_flow_prevdays %>% 
                          mutate("8_d" = rollmax(x = discharge, 8, align = "right", fill = NA))

print(daily_flow_prevdays)

As you can see I was able to produce what I was asking for; however, I still want the initial rows of 2, 4, and 8 day columns to be filled with the max value of the previous cells even if there are no values prior to that date.

For example:
|1995-03-01|0.58|0.58|0.58|0.58|
|1995-03-02|0.596|0.596|0.596|0.596|
|1995-03-03|0.604|0.604|0.604|0.604|
|1995-03-04|0.617|0.617|0.617|0.617|
|1995-03-05|0.63|0.63|0.63|0.63|
|1995-03-06|0.642|0.642|0.642|0.642|
|1995-03-07|0.657|0.657|0.657|0.657|
|1995-03-08|0.67|0.67|0.67|0.67|
|1995-03-09|0.683|0.683|0.683|0.683|
|1995-03-10|0.696|0.696|0.696|0.696|
|1995-03-11|0.71|0.71|0.71|0.71|
|1995-03-12|0.714|0.714|0.714|0.714|
|1995-03-13|0.74|0.74|0.74|0.74|
|1995-03-14|0.756|0.756|0.756|0.756|
**Sorry not sure how to based present this table

This example does appear to be transferring the discharge value to each 2, 4, and 8 day row but since discharge in increasing as the days continue, each day here becomes the new MAX.

I tried using the rollapply function to see if it would not include the previous NAs but I was not successful (also not sure if I was inputting it correctly)

## For example, trying to figure out rollapply on previous 2 days -- but did not work

rollapply(daily_flow, width = 2, FUN = function(discharge) mean(discharge, na.rm = TRUE), by = 1, partial = TRUE, fill = NA,
          align = "right")

Does anyone know how I can replace those NAs with the MAX discharge value even when there is no values (just NAs) prior to that date? My dataset has over 24000 rows, so finding a code where I do all of this in one line, that would be awesome.

I hope this makes sense. Thank you for your help in advance!

siddharthprabhu · October 17, 2020, 6:53am

Not sure if I've understood correctly but is this what you're trying to do?

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
library(zoo, warn.conflicts = FALSE)
#> Warning: package 'zoo' was built under R version 3.6.3

## subset of data for example
daily_flow <- data.frame(
  date = c(
    "1995-02-25", "1995-02-26", "1995-02-27", "1995-02-28", "1995-03-01", "1995-03-02",
    "1995-03-03", "1995-03-04", "1995-03-05", "1995-03-06", "1995-03-07", "1995-03-08",
    "1995-03-09", "1995-03-10", "1995-03-11", "1995-03-12", "1995-03-13", "1995-03-14"
  ),
  discharge = c(
    NA, NA, NA, NA, 0.58, 0.596, 0.604, 0.617, 0.63, 0.642, 0.657, 0.67, 0.683,
    0.696, 0.71, 0.714, 0.74, 0.756
  )
)

## determining max value for that day and previous days (2, 4, and 8 day)
daily_flow_prevdays <- daily_flow %>%
  mutate("2_d" = rollmax(x = discharge, 2, align = "right", fill = NA))

daily_flow_prevdays <- daily_flow_prevdays %>%
  mutate("4_d" = rollmax(x = discharge, 4, align = "right", fill = NA))

daily_flow_prevdays <- daily_flow_prevdays %>%
  mutate("8_d" = rollmax(x = discharge, 8, align = "right", fill = NA))

daily_flow_prevdays <- daily_flow_prevdays %>%
  mutate(
    `2_d` = if_else(is.na(`2_d`), discharge, `2_d`),
    `4_d` = if_else(is.na(`4_d`), discharge, `4_d`),
    `8_d` = if_else(is.na(`8_d`), discharge, `8_d`)
  )

print(daily_flow_prevdays)
#>          date discharge   2_d   4_d   8_d
#> 1  1995-02-25        NA    NA    NA    NA
#> 2  1995-02-26        NA    NA    NA    NA
#> 3  1995-02-27        NA    NA    NA    NA
#> 4  1995-02-28        NA    NA    NA    NA
#> 5  1995-03-01     0.580 0.580 0.580 0.580
#> 6  1995-03-02     0.596 0.596 0.596 0.596
#> 7  1995-03-03     0.604 0.604 0.604 0.604
#> 8  1995-03-04     0.617 0.617 0.617 0.617
#> 9  1995-03-05     0.630 0.630 0.630 0.630
#> 10 1995-03-06     0.642 0.642 0.642 0.642
#> 11 1995-03-07     0.657 0.657 0.657 0.657
#> 12 1995-03-08     0.670 0.670 0.670 0.670
#> 13 1995-03-09     0.683 0.683 0.683 0.683
#> 14 1995-03-10     0.696 0.696 0.696 0.696
#> 15 1995-03-11     0.710 0.710 0.710 0.710
#> 16 1995-03-12     0.714 0.714 0.714 0.714
#> 17 1995-03-13     0.740 0.740 0.740 0.740
#> 18 1995-03-14     0.756 0.756 0.756 0.756

^{Created on 2020-10-17 by the reprex package (v0.3.0)}

kdr · October 17, 2020, 1:48pm

Sorry, I must have not been clear enough. I will try to explain better.

The code that you send is not exactly what I'm trying to derive. I don't want to read the values in the discharge column, I want to still find the MAX discharge for each dates though the moving window even though there isn't rows/values for the window to read prior to that date.

For example, if I have a daily discharge value with these dates (below), some days have lower discharge rates and the previous day(s).

daily_flow2 <- data.frame(
  date = c('1996-03-15','1996-03-16','1996-03-17','1996-03-18','1996-03-19','1996-03-20','1996-03-21','1996-03-22','1996-03-23','1996-03-24','1996-03-25','1996-03-26','1996-03-27','1996-03-28','1996-03-29','1996-03-30','1996-03-31'),
  discharge = c(0.518,0.526,0.519,0.515,0.509,0.499,0.491,0.482,0.475,0.468,0.461,0.455,0.448,0.446,0.442,0.438,0.434)
)

daily_flow_prevdays2 <- daily_flow2 %>% 
                          mutate("2_d" = rollmax(x = discharge, 2, align = "right", fill = NA))

daily_flow_prevdays2 <- daily_flow_prevdays2 %>% 
                          mutate("4_d" = rollmax(x = discharge, 4, align = "right", fill = NA))

daily_flow_prevdays2 <- daily_flow_prevdays2 %>% 
                          mutate("8_d" = rollmax(x = discharge, 8, align = "right", fill = NA))

print(daily_flow_prevdays2)

Using the code you provided, it will just fill the NAs with that day discharge and not the MAX discharge over the previous days, looking like so below...

daily_flow_prevdays_fill <- daily_flow_prevdays2 %>%
  mutate(
    `2_d` = if_else(is.na(`2_d`), discharge, `2_d`),
    `4_d` = if_else(is.na(`4_d`), discharge, `4_d`),
    `8_d` = if_else(is.na(`8_d`), discharge, `8_d`)
  )

print(daily_flow_prevdays_fill)

Just so it is visually easier to see what I would like my output to look like, I created a data frame for it. These values were determined using Excel. Notice how, for example, date 1996-03-17 has a discharge of 0.519 but in the 4_d and 8_d column, the values is 0.526. This is because the previous date had a higher discharge rate than that day.

daily_flow_wanted_output <- data.frame(
  date = c('1996-03-15','1996-03-16','1996-03-17','1996-03-18','1996-03-19','1996-03-20','1996-03-21','1996-03-22','1996-03-23','1996-03-24','1996-03-25','1996-03-26','1996-03-27','1996-03-28','1996-03-29','1996-03-30','1996-03-31'),
  discharge = c(0.518,0.526,0.519,0.515,0.509,0.499,0.491,0.482,0.475,0.468,0.461,0.455,0.448,0.446,0.442,0.438,0.434),
  "2_d" = c(0.518,0.526,0.526,0.519,0.515,0.509,0.499,0.491,0.482,0.475,0.468,0.461,0.455,0.448,0.446,0.442,0.438),
  "4_d" = c(0.518,0.526,0.526,0.526,0.526,0.519,0.515,0.509,0.499,0.491,0.482,0.475,0.468,0.461,0.455,0.448,0.446),
  "8_d" = c(0.518,0.526,0.526,0.526,0.526,0.526,0.526,0.526,0.526,0.519,0.515,0.509,0.499,0.491,0.482,0.475,0.468)
)

print(daily_flow_wanted_output)

Is something like this possible to achieve, so I can still determine the MAX values even if there isn't enough days previous for the moving window or to somehow ignore the previous days NAs?

I hope this makes more sense.

siddharthprabhu · October 17, 2020, 5:47pm

Thank you for the additional explanation. I had misunderstood your objective.

I believe this is what you are looking for. I have applied it to your second sample data frame for easier verification.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
library(zoo, warn.conflicts = FALSE)
#> Warning: package 'zoo' was built under R version 3.6.3

daily_flow2 <- data.frame(
  date = c("1996-03-15", "1996-03-16", "1996-03-17", "1996-03-18", "1996-03-19", "1996-03-20", "1996-03-21", "1996-03-22", "1996-03-23", "1996-03-24", "1996-03-25", "1996-03-26", "1996-03-27", "1996-03-28", "1996-03-29", "1996-03-30", "1996-03-31"),
  discharge = c(0.518, 0.526, 0.519, 0.515, 0.509, 0.499, 0.491, 0.482, 0.475, 0.468, 0.461, 0.455, 0.448, 0.446, 0.442, 0.438, 0.434)
)

daily_flow2_output <- daily_flow2 %>%
  mutate(`2_d` = rollapplyr(discharge, 2L, max, fill = NA, partial = TRUE)) %>%
  mutate(`4_d` = rollapplyr(discharge, 4L, max, fill = NA, partial = TRUE)) %>%
  mutate(`8_d` = rollapplyr(discharge, 8L, max, fill = NA, partial = TRUE))

print(daily_flow2_output)
#>          date discharge   2_d   4_d   8_d
#> 1  1996-03-15     0.518 0.518 0.518 0.518
#> 2  1996-03-16     0.526 0.526 0.526 0.526
#> 3  1996-03-17     0.519 0.526 0.526 0.526
#> 4  1996-03-18     0.515 0.519 0.526 0.526
#> 5  1996-03-19     0.509 0.515 0.526 0.526
#> 6  1996-03-20     0.499 0.509 0.519 0.526
#> 7  1996-03-21     0.491 0.499 0.515 0.526
#> 8  1996-03-22     0.482 0.491 0.509 0.526
#> 9  1996-03-23     0.475 0.482 0.499 0.526
#> 10 1996-03-24     0.468 0.475 0.491 0.519
#> 11 1996-03-25     0.461 0.468 0.482 0.515
#> 12 1996-03-26     0.455 0.461 0.475 0.509
#> 13 1996-03-27     0.448 0.455 0.468 0.499
#> 14 1996-03-28     0.446 0.448 0.461 0.491
#> 15 1996-03-29     0.442 0.446 0.455 0.482
#> 16 1996-03-30     0.438 0.442 0.448 0.475
#> 17 1996-03-31     0.434 0.438 0.446 0.468

^{Created on 2020-10-17 by the reprex package (v0.3.0)}

kdr · October 18, 2020, 2:18am

I think we are getting closer. This worked for the top (first year) of my dataset but it did not produce this outcome at the beginning of any of the following years; it still gave me the same outcome as when I used the 'rollmax' function. Since the discharge period for my dataset runs from March to end of October, I have NAs from November to end of February. I would prefer not to separate by data into individual years then remerge it together to accomplish this. I have about 7 decades worth of data I am trying to do this for.

So here is a dataset that has NA in the middle of it. Using the code you provided, the top dates get filled but the dates following a large chunk of NAs, do not get filled like at the top.

df <- data.frame(
  date = c('1996-03-01','1996-03-02','1996-03-03','1996-03-04','1996-03-05','1996-03-06','1996-03-07','1996-03-08','1996-03-09','1996-03-10','1996-03-11','1996-03-12','1996-03-13','1996-03-14','1996-03-15','1996-03-16','1996-03-17','1996-03-18','1996-03-19','1996-03-20','1996-03-21','1996-03-22','1996-03-23','1996-03-24','1996-03-25','1996-03-26','1996-03-27','1996-03-28','1996-03-29','1996-03-30','1996-03-31','1996-04-01','1996-04-02','1996-04-03','1996-04-04','1996-04-05','1996-04-06','1996-04-07','1996-04-08','1996-04-09','1996-04-10','1996-04-11','1996-04-12','1996-04-13','1996-04-14','1996-04-15','1996-04-16','1996-04-17','1996-04-18','1996-04-19','1996-04-20','1996-04-21','1996-04-22','1996-04-23','1996-04-24','1996-04-25','1996-04-26','1996-04-27','1996-04-28','1996-04-29','1996-04-30'),
  discharge = c(0.236,0.241,0.254,0.322,0.363,0.42,0.463,0.515,0.506,0.497,0.484,0.464,0.478,0.495,0.518,0.526,0.519,0.515,0.509,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4.13,8.41,8.62,8.67,8.69,8.8,8.87,9.56,10.1,10.4,10.5,10.7,10.8,10.9,11,12.7,18.1,27.3,34.2,63.3)
)


df_prev <- df %>%
  mutate(`2_d` = rollapplyr(discharge, 2L, max, fill = NA, partial = TRUE)) %>%
  mutate(`4_d` = rollapplyr(discharge, 4L, max, fill = NA, partial = TRUE)) %>%
  mutate(`8_d` = rollapplyr(discharge, 8L, max, fill = NA, partial = TRUE))

print(df_prev)

Now my goal is to have an outcome where at the beginning of each year is filled with a MAX values even though there are NAs prior to the first discharge value of the year, (producing similar results that you were able to provide me in your last reply). Below is an example of what I am trying to accomplish when there are NA values between each year; however, below I put NAs in the middle of two month just to give an example of what I mean (I am just showing the result of the "4_d" and "8_d" output.

df_wanted_output <- data.frame(
  date = c('1996-03-01','1996-03-02','1996-03-03','1996-03-04','1996-03-05','1996-03-06','1996-03-07','1996-03-08','1996-03-09','1996-03-10','1996-03-11','1996-03-12','1996-03-13','1996-03-14','1996-03-15','1996-03-16','1996-03-17','1996-03-18','1996-03-19','1996-03-20','1996-03-21','1996-03-22','1996-03-23','1996-03-24','1996-03-25','1996-03-26','1996-03-27','1996-03-28','1996-03-29','1996-03-30','1996-03-31','1996-04-01','1996-04-02','1996-04-03','1996-04-04','1996-04-05','1996-04-06','1996-04-07','1996-04-08','1996-04-09','1996-04-10','1996-04-11','1996-04-12','1996-04-13','1996-04-14','1996-04-15','1996-04-16','1996-04-17','1996-04-18','1996-04-19','1996-04-20','1996-04-21','1996-04-22','1996-04-23','1996-04-24','1996-04-25','1996-04-26','1996-04-27','1996-04-28','1996-04-29','1996-04-30'),
  discharge = c(0.236,0.241,0.254,0.322,0.363,0.42,0.463,0.515,0.506,0.497,0.484,0.464,0.478,0.495,0.518,0.526,0.519,0.515,0.509,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4.13,8.41,8.62,8.67,8.69,8.8,8.87,9.56,10.1,10.4,10.5,10.7,10.8,10.9,11,12.7,18.1,27.3,34.2,63.3),
  "4_d" = c(0.236,0.241,0.254,0.322,0.363,0.42,0.463,0.515,0.515,0.515,0.515,0.506,0.497,0.495,0.518,0.526,0.526,0.526,0.526,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4.13,8.41,8.62,8.67,8.69,8.8,8.87,9.56,10.1,10.4,10.5,10.7,10.8,10.9,11,12.7,18.1,27.3,34.2,63.3),
  "8_d" = c(0.236,0.241,0.254,0.322,0.363,0.42,0.463,0.515,0.515,0.515,0.515,0.515,0.515,0.515,0.518,0.526,0.526,0.526,0.526,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4.13,8.41,8.62,8.67,8.69,8.8,8.87,9.56,10.1,10.4,10.5,10.7,10.8,10.9,11,12.7,18.1,27.3,34.2,63.3)
)

print(df_wanted_output)

Is there anyway this can be accomplished?

Also, just so I can better understand what you did in your previous reply...

The 'warn.conflict = FALSE' when loading the package, does this allow for the NAs to be ignored?
What is the reasoning for the width argument to have an 'L' after the number?

siddharthprabhu · October 19, 2020, 5:18pm

I don't see a way to fill the values after the middle chunk of NAs in the same function call. A workaround would be to temporarily filter out those values, compute rolling max and then bring the removed values back using a join (as shown below). I'm not sure whether this is an acceptable solution for you.

library(dplyr, warn.conflicts = FALSE)
library(zoo, warn.conflicts = FALSE)

df <- data.frame(
  date = c("1996-03-01", "1996-03-02", "1996-03-03", "1996-03-04", "1996-03-05", "1996-03-06", "1996-03-07", "1996-03-08", "1996-03-09", "1996-03-10", "1996-03-11", "1996-03-12", "1996-03-13", "1996-03-14", "1996-03-15", "1996-03-16", "1996-03-17", "1996-03-18", "1996-03-19", "1996-03-20", "1996-03-21", "1996-03-22", "1996-03-23", "1996-03-24", "1996-03-25", "1996-03-26", "1996-03-27", "1996-03-28", "1996-03-29", "1996-03-30", "1996-03-31", "1996-04-01", "1996-04-02", "1996-04-03", "1996-04-04", "1996-04-05", "1996-04-06", "1996-04-07", "1996-04-08", "1996-04-09", "1996-04-10", "1996-04-11", "1996-04-12", "1996-04-13", "1996-04-14", "1996-04-15", "1996-04-16", "1996-04-17", "1996-04-18", "1996-04-19", "1996-04-20", "1996-04-21", "1996-04-22", "1996-04-23", "1996-04-24", "1996-04-25", "1996-04-26", "1996-04-27", "1996-04-28", "1996-04-29", "1996-04-30"),
  discharge = c(0.236, 0.241, 0.254, 0.322, 0.363, 0.42, 0.463, 0.515, 0.506, 0.497, 0.484, 0.464, 0.478, 0.495, 0.518, 0.526, 0.519, 0.515, 0.509, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.13, 8.41, 8.62, 8.67, 8.69, 8.8, 8.87, 9.56, 10.1, 10.4, 10.5, 10.7, 10.8, 10.9, 11, 12.7, 18.1, 27.3, 34.2, 63.3)
)

df_prev <- df %>%
  filter(!is.na(discharge)) %>% 
  mutate(`2_d` = rollapplyr(discharge, 2L, max, fill = NA, partial = TRUE)) %>%
  mutate(`4_d` = rollapplyr(discharge, 4L, max, fill = NA, partial = TRUE)) %>%
  mutate(`8_d` = rollapplyr(discharge, 8L, max, fill = NA, partial = TRUE)) %>% 
  right_join(df, by = c("date", "discharge")) %>% 
  arrange(date)

print(df_prev)
#>          date discharge    2_d    4_d    8_d
#> 1  1996-03-01     0.236  0.236  0.236  0.236
#> 2  1996-03-02     0.241  0.241  0.241  0.241
#> 3  1996-03-03     0.254  0.254  0.254  0.254
#> 4  1996-03-04     0.322  0.322  0.322  0.322
#> 5  1996-03-05     0.363  0.363  0.363  0.363
#> 6  1996-03-06     0.420  0.420  0.420  0.420
#> 7  1996-03-07     0.463  0.463  0.463  0.463
#> 8  1996-03-08     0.515  0.515  0.515  0.515
#> 9  1996-03-09     0.506  0.515  0.515  0.515
#> 10 1996-03-10     0.497  0.506  0.515  0.515
#> 11 1996-03-11     0.484  0.497  0.515  0.515
#> 12 1996-03-12     0.464  0.484  0.506  0.515
#> 13 1996-03-13     0.478  0.478  0.497  0.515
#> 14 1996-03-14     0.495  0.495  0.495  0.515
#> 15 1996-03-15     0.518  0.518  0.518  0.518
#> 16 1996-03-16     0.526  0.526  0.526  0.526
#> 17 1996-03-17     0.519  0.526  0.526  0.526
#> 18 1996-03-18     0.515  0.519  0.526  0.526
#> 19 1996-03-19     0.509  0.515  0.526  0.526
#> 20 1996-03-20        NA     NA     NA     NA
#> 21 1996-03-21        NA     NA     NA     NA
#> 22 1996-03-22        NA     NA     NA     NA
#> 23 1996-03-23        NA     NA     NA     NA
#> 24 1996-03-24        NA     NA     NA     NA
#> 25 1996-03-25        NA     NA     NA     NA
#> 26 1996-03-26        NA     NA     NA     NA
#> 27 1996-03-27        NA     NA     NA     NA
#> 28 1996-03-28        NA     NA     NA     NA
#> 29 1996-03-29        NA     NA     NA     NA
#> 30 1996-03-30        NA     NA     NA     NA
#> 31 1996-03-31        NA     NA     NA     NA
#> 32 1996-04-01        NA     NA     NA     NA
#> 33 1996-04-02        NA     NA     NA     NA
#> 34 1996-04-03        NA     NA     NA     NA
#> 35 1996-04-04        NA     NA     NA     NA
#> 36 1996-04-05        NA     NA     NA     NA
#> 37 1996-04-06        NA     NA     NA     NA
#> 38 1996-04-07        NA     NA     NA     NA
#> 39 1996-04-08        NA     NA     NA     NA
#> 40 1996-04-09        NA     NA     NA     NA
#> 41 1996-04-10        NA     NA     NA     NA
#> 42 1996-04-11     4.130  4.130  4.130  4.130
#> 43 1996-04-12     8.410  8.410  8.410  8.410
#> 44 1996-04-13     8.620  8.620  8.620  8.620
#> 45 1996-04-14     8.670  8.670  8.670  8.670
#> 46 1996-04-15     8.690  8.690  8.690  8.690
#> 47 1996-04-16     8.800  8.800  8.800  8.800
#> 48 1996-04-17     8.870  8.870  8.870  8.870
#> 49 1996-04-18     9.560  9.560  9.560  9.560
#> 50 1996-04-19    10.100 10.100 10.100 10.100
#> 51 1996-04-20    10.400 10.400 10.400 10.400
#> 52 1996-04-21    10.500 10.500 10.500 10.500
#> 53 1996-04-22    10.700 10.700 10.700 10.700
#> 54 1996-04-23    10.800 10.800 10.800 10.800
#> 55 1996-04-24    10.900 10.900 10.900 10.900
#> 56 1996-04-25    11.000 11.000 11.000 11.000
#> 57 1996-04-26    12.700 12.700 12.700 12.700
#> 58 1996-04-27    18.100 18.100 18.100 18.100
#> 59 1996-04-28    27.300 27.300 27.300 27.300
#> 60 1996-04-29    34.200 34.200 34.200 34.200
#> 61 1996-04-30    63.300 63.300 63.300 63.300

^{Created on 2020-10-19 by the reprex package (v0.3.0)}

The warn.conflicts = FALSE has nothing to do with the solution. I'm just preventing dplyr and zoo from printing the conflict warnings that get generated when the packages are loaded. This limits unnecessary output in the reprex.

The L suffix is used to denote integers in R. This is just good practice in my opinion since the width parameter of rollapply() is designed to take integers as inputs for the rolling window size.

The partial = TRUE argument to rollapply() is what's doing the magic here. It lets the function max() compute rolling values even when the window size is smaller than the specified width (such as at the top of your data set) when there aren't enough prior observations of discharge.

kdr · October 19, 2020, 7:56pm

No worries. I will probably just use excel for this and export it as a csv then import it into R. Thank you to trying and thank you for the explanation!

kdr · October 22, 2020, 5:49pm

For anyone, that is interested in a solution to find MAX values using a moving window while ignoring NA , here is a wrap around.

df <- data.frame(
  date = c('1996-03-01','1996-03-02','1996-03-03','1996-03-04','1996-03-05','1996-03-06','1996-03-07','1996-03-08','1996-03-09','1996-03-10','1996-03-11','1996-03-12','1996-03-13','1996-03-14','1996-03-15','1996-03-16','1996-03-17','1996-03-18','1996-03-19','1996-03-20','1996-03-21','1996-03-22','1996-03-23','1996-03-24','1996-03-25','1996-03-26','1996-03-27','1996-03-28','1996-03-29','1996-03-30','1996-03-31','1996-04-01','1996-04-02','1996-04-03','1996-04-04','1996-04-05','1996-04-06','1996-04-07','1996-04-08','1996-04-09','1996-04-10','1996-04-11','1996-04-12','1996-04-13','1996-04-14','1996-04-15','1996-04-16','1996-04-17','1996-04-18','1996-04-19','1996-04-20','1996-04-21','1996-04-22','1996-04-23','1996-04-24','1996-04-25','1996-04-26','1996-04-27','1996-04-28','1996-04-29','1996-04-30'),
  discharge = c(0.236,0.241,0.254,0.322,0.363,0.42,0.463,0.515,0.506,0.497,0.484,0.464,0.478,0.495,0.518,0.526,0.519,0.515,0.509,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4.13,8.41,8.62,8.61,8.56,8.68,8.87,9.56,10.1,10.4,10.5,10.7,10.8,10.9,11,12.7,18.1,27.3,34.2,63.3)
)



df_prev <- df %>%
  mutate(`2_d` = rollapplyr(discharge, 2L, max, na.rm = TRUE, fill = NA, partial = TRUE)) %>%
  mutate(`4_d` = rollapplyr(discharge, 4L, max, na.rm = TRUE, fill = NA, partial = TRUE)) %>%
  mutate(`8_d` = rollapplyr(discharge, 8L, max, na.rm = TRUE, fill = NA, partial = TRUE))

# na.rm = TRUE removes NA and replaces them with -inf

# replacing -inf back to NA
df_prev[sapply(df_prev, is.infinite)] <- NA

system · October 29, 2020, 5:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.