How to build a time-series data frame from start and end dates?

Hi! I want to build a time-series data frame. I have several observations and start and end dates. I want to create additional observations disaggregated by week. For instance:

I have:
id1 01/01/2000 31/12/2000

I want to get:
id1 2000-01
id1 2000-02
id1 2000-03
...
id1 2000-52

Thanks a lot for the help!

Hi @hgarbino

# Try with this;
start_date <- as.Date("2000-01-01")
end_date <- as.Date("2000-12-31")


date_seq <- seq(from = start_date, to = end_date, by = "week")

df <- data.frame(
  id = "id1",
  date = date_seq,
  stringsAsFactors = FALSE )

df$year_week <- format(df$date, "%Y-%U")

# id       date year_week
# 1 id1 2000-01-01   2000-00
# 2 id1 2000-01-08   2000-01
# 3 id1 2000-01-15   2000-02
# 4 id1 2000-01-22   2000-03
# 5 id1 2000-01-29   2000-04
# 6 id1 2000-02-05   2000-05

1 Like

That looks like a great start, thanks a lot! Do you know how I would be able to apply that to all my observations? I have more than 10,000 observations, all of them with different start and end dates.

Thanks!!

For make a good response is better see the reproducible example of data:

Im could suggest you check the lubridate library for get the year_week format from date data.

# But if you data frame is df, this code help you. If all 10.000 observation have this format 2000-01-01 this code # run well
df$year_week <- format(df$date, "%Y-%U")
1 Like

Muito obrigado pela ajuda, @M_AcostaCH!

My problem is that different observations have different values for seq.

A minimal reproducible example would be this:

grid <- c(1, 2, 3, 4)
name <- c('A', 'A', 'B', 'B')
start_date <- c('26/12/2022','26/12/2022','23/01/2023','23/01/2023')
end_date <- c('22/01/2023', '22/01/2023', '05/02/2023', '05/02/2023')
df <- data.frame(grid, name, start_date, end_date)
df

  grid name start_date   end_date
1    1    A 26/12/2022 22/01/2023
2    2    A 26/12/2022 22/01/2023
3    3    B 23/01/2023 05/02/2023
4    4    B 23/01/2023 05/02/2023

I want to transform this into:

  grid name year_week
1    1    A 2022-52
2    1    A 2023-01
3    1    A 2023-02
4    1    A 2023-03
5    2    A 2022-52
6    2    A 2023-01
7    2    A 2023-02
8    2    A 2023-03
9    3    B 2023-04
10   3    B 2023-05
11   4    B 2023-04
12   4    B 2023-05

Thank you!

grid <- c(1, 2, 3, 4)
name <- c('A', 'A', 'B', 'B')
start_date <- c('26/12/2022','26/12/2022','23/01/2023','23/01/2023')
end_date <- c('22/01/2023', '22/01/2023', '05/02/2023', '05/02/2023')
df <- data.frame(grid, name, start_date, end_date)
df
#>   grid name start_date   end_date
#> 1    1    A 26/12/2022 22/01/2023
#> 2    2    A 26/12/2022 22/01/2023
#> 3    3    B 23/01/2023 05/02/2023
#> 4    4    B 23/01/2023 05/02/2023

df$start_date <- as.Date(df$start_date, format = "%d/%m/%Y")
df$end_date <- as.Date(df$end_date, format = "%d/%m/%Y")


df_long <- pivot_longer(df, cols = c(start_date, end_date), names_to = "date_type", values_to = "date")

df_long$date2 <- format(df_long$date, "%Y-%U")

df_long

# grid  name   date_type      date    date2  
#   <dbl>      <chr> <chr>  <date>     <chr>  
# 1     1 A     start_date 2022-12-26 2022-52
# 2     1 A     end_date   2023-01-22 2023-04
# 3     2 A     start_date 2022-12-26 2022-52
# 4     2 A     end_date   2023-01-22 2023-04
# 5     3 B     start_date 2023-01-23 2023-04
# 6     3 B     end_date   2023-02-05 2023-06
# 7     4 B     start_date 2023-01-23 2023-04


# Check your  transform because some week are different.
1 Like

Thanks, @M_AcostaCH, but this doesn't solve my problem.

I want to have one observation per grid per name per week. I think I should combine your first suggestion and expand the observations based on the sequence derived from the start and end dates. Perhaps looping it would be the solution? I am not sure... I have looked into many similar questions and still can't find the proper solution.

I think I may understand what you want. Try this.
Starting with your data renamed to dat1.

library(data.table)
dat1 <- structure(list(grid = c(1, 2, 3, 4), name = c("A", "A", "B", 
        "B"), start_date = c("25/10/2006", "25/10/2006", "13/05/2003", 
        "13/05/2003"), end_date = c("31/12/2014", "31/12/2014", "04/04/2004", 
         "04/04/2004")), class = "data.frame", row.names = c(NA, -4L))

(DT <-  as.data.table(dat1))
DT[ , ts := paste0(name, grid)]
DT[ ,  sdate := dmy(start_date)][, edate := dmy(end_date)]
DT1 <- DT[, .(ts, sdate, edate)]

##======================================================

aa <- DT1[ , .(seq(sdate[1], edate[1], by = "week"))]
pl <-  rep(DT$ts[1], nrow(aa))
Tot <- cbind(aa, pl)

for( i in 2 : nrow(DT1)) {
  bb <- DT1[ , .(seq(sdate[i], edate[i], by = "week"))]
  pl <-  rep(DT$ts[i], nrow(aa))
  bb  <- cbind(aa, pl)
  Tot <- rbind(Tot, bb)
}
names(Tot) <- c("dates", "name&grid.No")
Tot

2 Likes

Thanks a lot, @jrkrideau! It works now!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.