How to generate loop in date sequence?

nanda2021 · January 10, 2022, 6:59pm

I am having a problem with capturing certain values in my database to apply a correlation to lists.

My dataset has the following format, here only the values from the first two locations.

month	year	precipitation	temperature
Jan	2014	149,6	26,434432
Feb	2014	251	26,192469
Mar	2014	161	26,379851
Apr	2014	244,3	26,158357
May	2014	760,7	25,848118
Jun	2014	185,5	26,676528
Jul	2014	44,3	26,655914
Aug	2014	3,8	26,85959
Sep	2014	0,5	27,271498
Oct	2014	2,2	0
Nov	2014	1,8	27,512732
Dec	2014	29,2	27,666532
Jan	2015	30,2	27,422312
Feb	2015	90,7	27,034226
Mar	2015	425,9	26,151944
Apr	2015	366,5	26,217222
May	2015	399,7	26,529839
Jun	2015	106,7	26,482361
Jul	2015	75,9	26,681855
Aug	2015	0	26,952016
Sep	2015	0	27,293333
Oct	2015	0	27,618817
Nov	2015	0,3	27,906528
Dec	2015	11,3	28,453629
Jan	2016	204,5	27,386962
Feb	2016	140,8	27,195259
Mar	2016	362,1	26,78414
Apr	2016	327,4	27,35
May	2016	264,9	27,212903
Jun	2016	133,4	26,838599
Jul	2016	52,1	26,644489
Aug	2016	13,4	27,16413
Sep	2016	1,8	27,425139
Oct	2016	0	27,935887
Nov	2016	0	28,212361
Dec	2016	42,9	27,853091
Jan	2017	380	26,873656
Feb	2017	355,4	25,938839
Mar	2017	442,8	25,816263
Apr	2017	362,1	26,486806
May	2017	329,9	26,72379
Jun	2017	120,6	26,829647
Jul	2017	201,4	26,430181
Aug	2017	4,8	0
Sep	2017	0	0
Oct	2017	1,2	27,88125
Nov	2017	0,6	27,604722
Dec	2017	63,2	27,641264
Jan	2018	253	26,639247
Feb	2018	531	25,598512
Mar	2018	251,4	26,650269
Apr	2018	478,2	25,727778
May	2018	374,2	25,767742
Jun	2018	106,3	26,402639
Jul	2018	75,4	26,162903
Aug	2018	27,2	26,484677
Sep	2018	11,6	27,076389
Oct	2018	4,2	27,53293
Nov	2018	8,5	27,963889
Dec	2018	209,8	26,327107
Jan	2019	327,9	0
Feb	2019	521,6	0
Mar	2019	818,2	25,942721
Apr	2019	433,7	25,956945
May	2019	226,7	26,466263
Jun	2019	218,3	26,475417
Jul	2019	107,4	26,397043
Aug	2019	3,4	26,905645
Sep	2019	4,9	27,213056
Oct	2019	2,1	27,478419
Nov	2019	16	27,710833
Dec	2019	72,8	28,079839
Jan	2020	556,7	26,296838
Feb	2020	435,2	26,259339
Mar	2020	657,8	25,947849
Apr	2020	538,1	26,135972
May	2020	236	26,918432
Jun	2020	136,3	26,792917
Jul	2020	54,5	26,629301
Aug	2020	11,9	26,892473
Sep	2020	0	27,460694
Oct	2020	0	27,748118
Nov	2020	35	27,759722
Dec	2020	46,9	27,760081
Jan	2014	149,6	26,434432
Feb	2014	251	26,192469
Mar	2014	161	26,379851
Apr	2014	244,3	26,158357
May	2014	760,7	25,848118
Jun	2014	185,5	26,676528
Jul	2014	44,3	26,655914
Aug	2014	3,8	26,85959
Sep	2014	0,5	27,271498
Oct	2014	2,2	0
Nov	2014	1,8	27,512732
Dec	2014	29,2	27,666532
Jan	2015	30,2	27,422312
Feb	2015	90,7	27,034226
Mar	2015	425,9	26,151944
Apr	2015	366,5	26,217222
May	2015	399,7	26,529839
Jun	2015	106,7	26,482361
Jul	2015	75,9	26,681855
Aug	2015	0	26,952016
Sep	2015	0	27,293333
Oct	2015	0	27,618817
Nov	2015	0,3	27,906528
Dec	2015	11,3	28,453629
Jan	2016	204,5	27,386962
Feb	2016	140,8	27,195259
Mar	2016	362,1	26,78414
Apr	2016	327,4	27,35
May	2016	264,9	27,212903
Jun	2016	133,4	26,838599
Jul	2016	52,1	26,644489
Aug	2016	13,4	27,16413
Sep	2016	1,8	27,425139
Oct	2016	0	27,935887
Nov	2016	0	28,212361
Dec	2016	42,9	27,853091
Jan	2017	380	26,873656
Feb	2017	355,4	25,938839
Mar	2017	442,8	25,816263
Apr	2017	362,1	26,486806
May	2017	329,9	26,72379
Jun	2017	120,6	26,829647
Jul	2017	201,4	26,430181
Aug	2017	4,8	0
Sep	2017	0	0
Oct	2017	1,2	27,88125
Nov	2017	0,6	27,604722
Dec	2017	63,2	27,641264
Jan	2018	253	26,639247
Feb	2018	531	25,598512
Mar	2018	251,4	26,650269
Apr	2018	478,2	25,727778
May	2018	374,2	25,767742
Jun	2018	106,3	26,402639
Jul	2018	75,4	26,162903
Aug	2018	27,2	26,484677
Sep	2018	11,6	27,076389
Oct	2018	4,2	27,53293
Nov	2018	8,5	27,963889
Dec	2018	209,8	26,327107
Jan	2019	327,9	0
Feb	2019	521,6	0
Mar	2019	818,2	25,942721
Apr	2019	433,7	25,956945
May	2019	226,7	26,466263
Jun	2019	218,3	26,475417
Jul	2019	107,4	26,397043
Aug	2019	3,4	26,905645
Sep	2019	4,9	27,213056
Oct	2019	2,1	27,478419
Nov	2019	16	27,710833
Dec	2019	72,8	28,079839
Jan	2020	556,7	26,296838
Feb	2020	435,2	26,259339
Mar	2020	657,8	25,947849
Apr	2020	538,1	26,135972
May	2020	236	26,918432
Jun	2020	136,3	26,792917
Jul	2020	54,5	26,629301
Aug	2020	11,9	26,892473
Sep	2020	0	27,460694
Oct	2020	0	27,748118
Nov	2020	35	27,759722
Dec	2020	46,9	27,760081
![image	45x500](upload://hP2HFPbhMWbzf1VtVFNDhOYX5ZI.png)

The dataset consists of 123 different locations, where each location has climate data from January 2014 to December 2020, making a dataset with 10332 rows.
A while ago I posted this problem that I got part of solved here, but I have not yet found a way to realize a criteria to achieve the compilation in the desired way.

What the code is currently doing is to select the period of a location where it generates the lists corresponding to the values of the weather variables for the last six months of the location, for example.

But I would like it to repeat this process for all locations, without including values from the previous location. One "solution" for it to go through all neighborhoods and not just one was to extend the date well beyond December 2020 to complete the rows of the dataset, since the dates are only for marking from where to extract the values of the variables. The problem that occurs is that when generating the list for the second location, the code is conditioned to take the December 2020 value from the location before it in the dataset and put it in the list for the current location.

What I would like is for the list values to correspond only to the values of the current location and not to enter values from previous locations. I don't know if because I put the criteria of dates with very high values just to get around the problem of getting only one neighborhood and not the 123 is generating a problem, since I can't generate a loop or something similar. That is, after December 2020 the process restarts and the current list only takes values from that location to form the list with the values of the last six months.

Someone could give me a hand! Thank you

arthur.t · January 10, 2022, 7:43pm

Hm...can you rephrase exactly what you want? The wording is a little confusing.

What does "previous location" mean? "Previous" implies that elements are ordered. Is there an order to the locations?

nanda2021 · January 10, 2022, 7:50pm

Sorry about the English!

When I enter "previous location" I mean a neighborhood that has data from January 2014 to December 2020, and in the sequence of the dataset, at the end of the period December 2020, enters again a neighborhood with data from January 2014 to December 2020, and so on, until the 123 neighborhoods are complete. I indicate "previous location" or "current location" as indicative of which neighborhood (place) the algorithm is taking the data from in the dataset. I don't know if I expressed myself well!

arthur.t · January 11, 2022, 1:15pm

Maybe this? I used some code from the previous solution from @cactusoxbird .

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union


make_lag <- function(df, n_lag) {
  # lag the numeric columns of a data frame
  df %>%
    select_if(is.numeric) %>%
    lag(n_lag) %>% 
    rename_all(str_c, "_lag", n_lag)
}

make_lags <- function(df, lags) {
  # bind a data frame with lagged numeric columns.  lags is an integer vector of lags, e.g. 1:5
  bind_cols(df, map2(list(df), lags, make_lag))
}

# make fake data
clim <- 
  expand_grid(
    neighborhood = factor(seq(1, 10)),
    clim_date = seq(from = ymd("2014-01-01"), to = ymd("2014-12-01"), by = "month"),
  ) %>%
  mutate(
    precip = rnorm(n = n(), mean = 500, sd = 50),
    temp = rnorm(n = n(), mean = 35, sd = 10),
    humidity = rnorm(n = n(), mean = 80, sd = 2)
  ) %>%
  # add lags within each neighborhood
  group_by(neighborhood) %>% group_split() %>%
  map_dfr(make_lags, 1:5)
  
print(clim)
#> # A tibble: 120 x 20
#>    neighborhood clim_date  precip  temp humidity precip_lag1 temp_lag1
#>    <fct>        <date>      <dbl> <dbl>    <dbl>       <dbl>     <dbl>
#>  1 1            2014-01-01   525.  42.1     82.6         NA       NA  
#>  2 1            2014-02-01   445.  36.2     82.7        525.      42.1
#>  3 1            2014-03-01   440.  40.5     80.7        445.      36.2
#>  4 1            2014-04-01   473.  43.3     77.9        440.      40.5
#>  5 1            2014-05-01   540.  29.6     79.6        473.      43.3
#>  6 1            2014-06-01   523.  12.1     78.4        540.      29.6
#>  7 1            2014-07-01   483.  30.7     80.9        523.      12.1
#>  8 1            2014-08-01   538.  30.1     80.7        483.      30.7
#>  9 1            2014-09-01   533.  21.6     81.7        538.      30.1
#> 10 1            2014-10-01   523.  28.8     80.6        533.      21.6
#> # ... with 110 more rows, and 13 more variables: humidity_lag1 <dbl>,
#> #   precip_lag2 <dbl>, temp_lag2 <dbl>, humidity_lag2 <dbl>, precip_lag3 <dbl>,
#> #   temp_lag3 <dbl>, humidity_lag3 <dbl>, precip_lag4 <dbl>, temp_lag4 <dbl>,
#> #   humidity_lag4 <dbl>, precip_lag5 <dbl>, temp_lag5 <dbl>,
#> #   humidity_lag5 <dbl>

^{Created on 2022-01-11 by the reprex package (v2.0.1)}

nanda2021 · January 12, 2022, 1:08pm

Thanks a lot and it's working for me!

system · January 19, 2022, 1:09pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.