How to dynamically create a new `group` column according to the date?

I am trying to summarise data on a rolling basis depending on what I input.

For example, I could have the following input grouping_value <- 7 which I would then hope to have the earliest Date in my data frame group by. So if my earliest Date is 2021-01-01 then I would hope that a new column would be mutated like Group_2021-01-01_to_Group_2021-01-08 and the next group would be Group_2021-01-09_to_Group_2021-01-16 and so on. It would group according to the grouping_value.

Raw data:

ID Date Sales
1 2021-01-01 22
1 2022-01-03 32
1 2021-01-11 42
1 2022-01-16 32
1 2022-01-13 27
2 2022-01-12 42

grouping_value <- 7

This grouping_value would take the earliest date of 2021-01-01 and add 7 to get the group until it has cycled through the whole dataframe. I could also change the group_value <- 30 to group according to 30 days theoretically .

earliest_date_in_data <-
  data%>%
  summarise(min_Date = min(Date, na.rm = TRUE))

Group Added:

ID Date Sales Group
1 2021-01-01 22 Group_2021-01-01_to_Group_2021-01-08
1 2021-01-03 32 Group_2021-01-01_to_Group_2021-01-08
1 2021-01-11 42 Group_2021-01-09_to_Group_2021-01-16
1 2021-01-16 32 Group_2021-01-09_to_Group_2021-01-16
1 2021-01-13 27 Group_2021-01-09_to_Group_2021-01-16
2 2021-01-12 48 Group_2021-01-09_to_Group_2021-01-16

I could then group_by this Group.
The only thing that has occurred to me is to do a case_when() to group by dates but that would lack the dynamic element. I'd appreciate any help or ideas. Thank you

Here is a solution with two differences from your request.

  1. The Group labels are simply numbers, not the text you show.
  2. When the grouping_value is 7, the groups have 7 days. In your example, a value of 7 leads to groups of 8 days, e.g. Jan. 1 - Jan. 8. Is that really what you want?

I did not drop the DateDelta column because I think it is easier to see how the groups are formed if that is present. It can easily be dropped.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
StartDay <- ymd("2022-01-01")
EndDay <- ymd("2022-02-10")
DF <- data.frame(Date = seq.Date(from = StartDay,
                                to = EndDay, by = 5))
DF
#>         Date
#> 1 2022-01-01
#> 2 2022-01-06
#> 3 2022-01-11
#> 4 2022-01-16
#> 5 2022-01-21
#> 6 2022-01-26
#> 7 2022-01-31
#> 8 2022-02-05
#> 9 2022-02-10

GroupFunc <- function(DAT, GrpVal) {
  DAT <- DAT |> mutate(DateDelta = as.numeric(Date - min(Date)),
                       Group = DateDelta %/% GrpVal)
} 
NewDF <- GroupFunc(DF, 7)
NewDF
#>         Date DateDelta Group
#> 1 2022-01-01         0     0
#> 2 2022-01-06         5     0
#> 3 2022-01-11        10     1
#> 4 2022-01-16        15     2
#> 5 2022-01-21        20     2
#> 6 2022-01-26        25     3
#> 7 2022-01-31        30     4
#> 8 2022-02-05        35     5
#> 9 2022-02-10        40     5

Created on 2022-03-08 by the reprex package (v2.0.1)

Thank you so much FJCC! Yes, you're right, I wanted it to be 7 and not 8. I think |> is a base R pipe, but could I ask what %/% is?

The %/% operator is for integer division. It returns just the integer part of the division calculation. So 4 %/% 5 = 0 because 4/5 = 0.8 and only the zero is kept. It is very useful for grouping things.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.