How to generate loop in date sequence?

I am having a problem with capturing certain values in my database to apply a correlation to lists.

My dataset has the following format, here only the values from the first two locations.

month year precipitation temperature
Jan 2014 149,6 26,434432
Feb 2014 251 26,192469
Mar 2014 161 26,379851
Apr 2014 244,3 26,158357
May 2014 760,7 25,848118
Jun 2014 185,5 26,676528
Jul 2014 44,3 26,655914
Aug 2014 3,8 26,85959
Sep 2014 0,5 27,271498
Oct 2014 2,2 0
Nov 2014 1,8 27,512732
Dec 2014 29,2 27,666532
Jan 2015 30,2 27,422312
Feb 2015 90,7 27,034226
Mar 2015 425,9 26,151944
Apr 2015 366,5 26,217222
May 2015 399,7 26,529839
Jun 2015 106,7 26,482361
Jul 2015 75,9 26,681855
Aug 2015 0 26,952016
Sep 2015 0 27,293333
Oct 2015 0 27,618817
Nov 2015 0,3 27,906528
Dec 2015 11,3 28,453629
Jan 2016 204,5 27,386962
Feb 2016 140,8 27,195259
Mar 2016 362,1 26,78414
Apr 2016 327,4 27,35
May 2016 264,9 27,212903
Jun 2016 133,4 26,838599
Jul 2016 52,1 26,644489
Aug 2016 13,4 27,16413
Sep 2016 1,8 27,425139
Oct 2016 0 27,935887
Nov 2016 0 28,212361
Dec 2016 42,9 27,853091
Jan 2017 380 26,873656
Feb 2017 355,4 25,938839
Mar 2017 442,8 25,816263
Apr 2017 362,1 26,486806
May 2017 329,9 26,72379
Jun 2017 120,6 26,829647
Jul 2017 201,4 26,430181
Aug 2017 4,8 0
Sep 2017 0 0
Oct 2017 1,2 27,88125
Nov 2017 0,6 27,604722
Dec 2017 63,2 27,641264
Jan 2018 253 26,639247
Feb 2018 531 25,598512
Mar 2018 251,4 26,650269
Apr 2018 478,2 25,727778
May 2018 374,2 25,767742
Jun 2018 106,3 26,402639
Jul 2018 75,4 26,162903
Aug 2018 27,2 26,484677
Sep 2018 11,6 27,076389
Oct 2018 4,2 27,53293
Nov 2018 8,5 27,963889
Dec 2018 209,8 26,327107
Jan 2019 327,9 0
Feb 2019 521,6 0
Mar 2019 818,2 25,942721
Apr 2019 433,7 25,956945
May 2019 226,7 26,466263
Jun 2019 218,3 26,475417
Jul 2019 107,4 26,397043
Aug 2019 3,4 26,905645
Sep 2019 4,9 27,213056
Oct 2019 2,1 27,478419
Nov 2019 16 27,710833
Dec 2019 72,8 28,079839
Jan 2020 556,7 26,296838
Feb 2020 435,2 26,259339
Mar 2020 657,8 25,947849
Apr 2020 538,1 26,135972
May 2020 236 26,918432
Jun 2020 136,3 26,792917
Jul 2020 54,5 26,629301
Aug 2020 11,9 26,892473
Sep 2020 0 27,460694
Oct 2020 0 27,748118
Nov 2020 35 27,759722
Dec 2020 46,9 27,760081
Jan 2014 149,6 26,434432
Feb 2014 251 26,192469
Mar 2014 161 26,379851
Apr 2014 244,3 26,158357
May 2014 760,7 25,848118
Jun 2014 185,5 26,676528
Jul 2014 44,3 26,655914
Aug 2014 3,8 26,85959
Sep 2014 0,5 27,271498
Oct 2014 2,2 0
Nov 2014 1,8 27,512732
Dec 2014 29,2 27,666532
Jan 2015 30,2 27,422312
Feb 2015 90,7 27,034226
Mar 2015 425,9 26,151944
Apr 2015 366,5 26,217222
May 2015 399,7 26,529839
Jun 2015 106,7 26,482361
Jul 2015 75,9 26,681855
Aug 2015 0 26,952016
Sep 2015 0 27,293333
Oct 2015 0 27,618817
Nov 2015 0,3 27,906528
Dec 2015 11,3 28,453629
Jan 2016 204,5 27,386962
Feb 2016 140,8 27,195259
Mar 2016 362,1 26,78414
Apr 2016 327,4 27,35
May 2016 264,9 27,212903
Jun 2016 133,4 26,838599
Jul 2016 52,1 26,644489
Aug 2016 13,4 27,16413
Sep 2016 1,8 27,425139
Oct 2016 0 27,935887
Nov 2016 0 28,212361
Dec 2016 42,9 27,853091
Jan 2017 380 26,873656
Feb 2017 355,4 25,938839
Mar 2017 442,8 25,816263
Apr 2017 362,1 26,486806
May 2017 329,9 26,72379
Jun 2017 120,6 26,829647
Jul 2017 201,4 26,430181
Aug 2017 4,8 0
Sep 2017 0 0
Oct 2017 1,2 27,88125
Nov 2017 0,6 27,604722
Dec 2017 63,2 27,641264
Jan 2018 253 26,639247
Feb 2018 531 25,598512
Mar 2018 251,4 26,650269
Apr 2018 478,2 25,727778
May 2018 374,2 25,767742
Jun 2018 106,3 26,402639
Jul 2018 75,4 26,162903
Aug 2018 27,2 26,484677
Sep 2018 11,6 27,076389
Oct 2018 4,2 27,53293
Nov 2018 8,5 27,963889
Dec 2018 209,8 26,327107
Jan 2019 327,9 0
Feb 2019 521,6 0
Mar 2019 818,2 25,942721
Apr 2019 433,7 25,956945
May 2019 226,7 26,466263
Jun 2019 218,3 26,475417
Jul 2019 107,4 26,397043
Aug 2019 3,4 26,905645
Sep 2019 4,9 27,213056
Oct 2019 2,1 27,478419
Nov 2019 16 27,710833
Dec 2019 72,8 28,079839
Jan 2020 556,7 26,296838
Feb 2020 435,2 26,259339
Mar 2020 657,8 25,947849
Apr 2020 538,1 26,135972
May 2020 236 26,918432
Jun 2020 136,3 26,792917
Jul 2020 54,5 26,629301
Aug 2020 11,9 26,892473
Sep 2020 0 27,460694
Oct 2020 0 27,748118
Nov 2020 35 27,759722
Dec 2020 46,9 27,760081
![image 45x500](upload://hP2HFPbhMWbzf1VtVFNDhOYX5ZI.png)

The dataset consists of 123 different locations, where each location has climate data from January 2014 to December 2020, making a dataset with 10332 rows.
A while ago I posted this problem that I got part of solved here, but I have not yet found a way to realize a criteria to achieve the compilation in the desired way.

What the code is currently doing is to select the period of a location where it generates the lists corresponding to the values of the weather variables for the last six months of the location, for example.

But I would like it to repeat this process for all locations, without including values from the previous location. One "solution" for it to go through all neighborhoods and not just one was to extend the date well beyond December 2020 to complete the rows of the dataset, since the dates are only for marking from where to extract the values of the variables. The problem that occurs is that when generating the list for the second location, the code is conditioned to take the December 2020 value from the location before it in the dataset and put it in the list for the current location.

What I would like is for the list values to correspond only to the values of the current location and not to enter values from previous locations. I don't know if because I put the criteria of dates with very high values just to get around the problem of getting only one neighborhood and not the 123 is generating a problem, since I can't generate a loop or something similar. That is, after December 2020 the process restarts and the current list only takes values from that location to form the list with the values of the last six months.

Someone could give me a hand! Thank you

Hm...can you rephrase exactly what you want? The wording is a little confusing.

What does "previous location" mean? "Previous" implies that elements are ordered. Is there an order to the locations?

1 Like

Sorry about the English!

When I enter "previous location" I mean a neighborhood that has data from January 2014 to December 2020, and in the sequence of the dataset, at the end of the period December 2020, enters again a neighborhood with data from January 2014 to December 2020, and so on, until the 123 neighborhoods are complete. I indicate "previous location" or "current location" as indicative of which neighborhood (place) the algorithm is taking the data from in the dataset. I don't know if I expressed myself well!

Maybe this? I used some code from the previous solution from @cactusoxbird .

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union


make_lag <- function(df, n_lag) {
  # lag the numeric columns of a data frame
  df %>%
    select_if(is.numeric) %>%
    lag(n_lag) %>% 
    rename_all(str_c, "_lag", n_lag)
}

make_lags <- function(df, lags) {
  # bind a data frame with lagged numeric columns.  lags is an integer vector of lags, e.g. 1:5
  bind_cols(df, map2(list(df), lags, make_lag))
}

# make fake data
clim <- 
  expand_grid(
    neighborhood = factor(seq(1, 10)),
    clim_date = seq(from = ymd("2014-01-01"), to = ymd("2014-12-01"), by = "month"),
  ) %>%
  mutate(
    precip = rnorm(n = n(), mean = 500, sd = 50),
    temp = rnorm(n = n(), mean = 35, sd = 10),
    humidity = rnorm(n = n(), mean = 80, sd = 2)
  ) %>%
  # add lags within each neighborhood
  group_by(neighborhood) %>% group_split() %>%
  map_dfr(make_lags, 1:5)
  
print(clim)
#> # A tibble: 120 x 20
#>    neighborhood clim_date  precip  temp humidity precip_lag1 temp_lag1
#>    <fct>        <date>      <dbl> <dbl>    <dbl>       <dbl>     <dbl>
#>  1 1            2014-01-01   525.  42.1     82.6         NA       NA  
#>  2 1            2014-02-01   445.  36.2     82.7        525.      42.1
#>  3 1            2014-03-01   440.  40.5     80.7        445.      36.2
#>  4 1            2014-04-01   473.  43.3     77.9        440.      40.5
#>  5 1            2014-05-01   540.  29.6     79.6        473.      43.3
#>  6 1            2014-06-01   523.  12.1     78.4        540.      29.6
#>  7 1            2014-07-01   483.  30.7     80.9        523.      12.1
#>  8 1            2014-08-01   538.  30.1     80.7        483.      30.7
#>  9 1            2014-09-01   533.  21.6     81.7        538.      30.1
#> 10 1            2014-10-01   523.  28.8     80.6        533.      21.6
#> # ... with 110 more rows, and 13 more variables: humidity_lag1 <dbl>,
#> #   precip_lag2 <dbl>, temp_lag2 <dbl>, humidity_lag2 <dbl>, precip_lag3 <dbl>,
#> #   temp_lag3 <dbl>, humidity_lag3 <dbl>, precip_lag4 <dbl>, temp_lag4 <dbl>,
#> #   humidity_lag4 <dbl>, precip_lag5 <dbl>, temp_lag5 <dbl>,
#> #   humidity_lag5 <dbl>

Created on 2022-01-11 by the reprex package (v2.0.1)

1 Like

Thanks a lot and it's working for me!

1 Like