How to build a time-series data frame from start and end dates?

hgarbino · February 20, 2023, 1:51pm

Hi! I want to build a time-series data frame. I have several observations and start and end dates. I want to create additional observations disaggregated by week. For instance:

I have:
id1 01/01/2000 31/12/2000

I want to get:
id1 2000-01
id1 2000-02
id1 2000-03
...
id1 2000-52

Thanks a lot for the help!

M_AcostaCH · February 20, 2023, 2:53pm

Hi @hgarbino

# Try with this;
start_date <- as.Date("2000-01-01")
end_date <- as.Date("2000-12-31")


date_seq <- seq(from = start_date, to = end_date, by = "week")

df <- data.frame(
  id = "id1",
  date = date_seq,
  stringsAsFactors = FALSE )

df$year_week <- format(df$date, "%Y-%U")

# id       date year_week
# 1 id1 2000-01-01   2000-00
# 2 id1 2000-01-08   2000-01
# 3 id1 2000-01-15   2000-02
# 4 id1 2000-01-22   2000-03
# 5 id1 2000-01-29   2000-04
# 6 id1 2000-02-05   2000-05

hgarbino · February 20, 2023, 3:28pm

That looks like a great start, thanks a lot! Do you know how I would be able to apply that to all my observations? I have more than 10,000 observations, all of them with different start and end dates.

Thanks!!

M_AcostaCH · February 20, 2023, 3:41pm

For make a good response is better see the reproducible example of data:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Im could suggest you check the lubridate library for get the year_week format from date data.

# But if you data frame is df, this code help you. If all 10.000 observation have this format 2000-01-01 this code # run well
df$year_week <- format(df$date, "%Y-%U")

hgarbino · February 20, 2023, 4:47pm

Muito obrigado pela ajuda, @M_AcostaCH!

My problem is that different observations have different values for seq.

A minimal reproducible example would be this:

grid <- c(1, 2, 3, 4)
name <- c('A', 'A', 'B', 'B')
start_date <- c('26/12/2022','26/12/2022','23/01/2023','23/01/2023')
end_date <- c('22/01/2023', '22/01/2023', '05/02/2023', '05/02/2023')
df <- data.frame(grid, name, start_date, end_date)
df

  grid name start_date   end_date
1    1    A 26/12/2022 22/01/2023
2    2    A 26/12/2022 22/01/2023
3    3    B 23/01/2023 05/02/2023
4    4    B 23/01/2023 05/02/2023

I want to transform this into:

  grid name year_week
1    1    A 2022-52
2    1    A 2023-01
3    1    A 2023-02
4    1    A 2023-03
5    2    A 2022-52
6    2    A 2023-01
7    2    A 2023-02
8    2    A 2023-03
9    3    B 2023-04
10   3    B 2023-05
11   4    B 2023-04
12   4    B 2023-05

Thank you!

M_AcostaCH · February 20, 2023, 6:27pm

grid <- c(1, 2, 3, 4)
name <- c('A', 'A', 'B', 'B')
start_date <- c('26/12/2022','26/12/2022','23/01/2023','23/01/2023')
end_date <- c('22/01/2023', '22/01/2023', '05/02/2023', '05/02/2023')
df <- data.frame(grid, name, start_date, end_date)
df
#>   grid name start_date   end_date
#> 1    1    A 26/12/2022 22/01/2023
#> 2    2    A 26/12/2022 22/01/2023
#> 3    3    B 23/01/2023 05/02/2023
#> 4    4    B 23/01/2023 05/02/2023

df$start_date <- as.Date(df$start_date, format = "%d/%m/%Y")
df$end_date <- as.Date(df$end_date, format = "%d/%m/%Y")


df_long <- pivot_longer(df, cols = c(start_date, end_date), names_to = "date_type", values_to = "date")

df_long$date2 <- format(df_long$date, "%Y-%U")

df_long

# grid  name   date_type      date    date2  
#   <dbl>      <chr> <chr>  <date>     <chr>  
# 1     1 A     start_date 2022-12-26 2022-52
# 2     1 A     end_date   2023-01-22 2023-04
# 3     2 A     start_date 2022-12-26 2022-52
# 4     2 A     end_date   2023-01-22 2023-04
# 5     3 B     start_date 2023-01-23 2023-04
# 6     3 B     end_date   2023-02-05 2023-06
# 7     4 B     start_date 2023-01-23 2023-04


# Check your  transform because some week are different.

hgarbino · February 20, 2023, 8:00pm

Thanks, @M_AcostaCH, but this doesn't solve my problem.

I want to have one observation per grid per name per week. I think I should combine your first suggestion and expand the observations based on the sequence derived from the start and end dates. Perhaps looping it would be the solution? I am not sure... I have looked into many similar questions and still can't find the proper solution.

jrkrideau · February 21, 2023, 12:33am

I think I may understand what you want. Try this.
Starting with your data renamed to dat1.

library(data.table)
dat1 <- structure(list(grid = c(1, 2, 3, 4), name = c("A", "A", "B", 
        "B"), start_date = c("25/10/2006", "25/10/2006", "13/05/2003", 
        "13/05/2003"), end_date = c("31/12/2014", "31/12/2014", "04/04/2004", 
         "04/04/2004")), class = "data.frame", row.names = c(NA, -4L))

(DT <-  as.data.table(dat1))
DT[ , ts := paste0(name, grid)]
DT[ ,  sdate := dmy(start_date)][, edate := dmy(end_date)]
DT1 <- DT[, .(ts, sdate, edate)]

##======================================================

aa <- DT1[ , .(seq(sdate[1], edate[1], by = "week"))]
pl <-  rep(DT$ts[1], nrow(aa))
Tot <- cbind(aa, pl)

for( i in 2 : nrow(DT1)) {
  bb <- DT1[ , .(seq(sdate[i], edate[i], by = "week"))]
  pl <-  rep(DT$ts[i], nrow(aa))
  bb  <- cbind(aa, pl)
  Tot <- rbind(Tot, bb)
}
names(Tot) <- c("dates", "name&grid.No")
Tot

hgarbino · February 21, 2023, 9:58am

Thanks a lot, @jrkrideau! It works now!

system · February 28, 2023, 9:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.