thanks /u/jrmuirhead ... this works pretty very elegantly. i've got the data in the manner that i've wanted. I'd never have been able to figure out the cumsum and head(name,1) method!
however, there's something different that I'm able to see if i were to run the same code twice. reprex is produced below. I ran the piped command two times on the same data set and there seems to be two different results that are produced. the first time is as per your instructions and is absolutely correct. However, with the data still grouped on value_present, would u be able to help why does it just get reset to 0 and 1 when it run for the second time? I'm intrigued why does a double grouping on the same column change the output? not that it affects my data, but i'd like to expand my understanding of the code. thanks once again!
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.3
#> Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
#> when loading 'dplyr'
#> Warning: package 'tibble' was built under R version 4.0.2
t <- tibble(date = seq(as.Date("2021-01-01"), as.Date("2021-01-15"),
by = "1 day"),
name = c(NA,NA,"a",NA,NA,NA,NA,"b",NA,NA,"a",NA,NA,NA,NA))
#t$expected <- c(NA,NA,"a","a","a","a",NA,"b","b","b","a","a","a","a",NA)
t
#> # A tibble: 15 x 2
#> date name
#> <date> <chr>
#> 1 2021-01-01 <NA>
#> 2 2021-01-02 <NA>
#> 3 2021-01-03 a
#> 4 2021-01-04 <NA>
#> 5 2021-01-05 <NA>
#> 6 2021-01-06 <NA>
#> 7 2021-01-07 <NA>
#> 8 2021-01-08 b
#> 9 2021-01-09 <NA>
#> 10 2021-01-10 <NA>
#> 11 2021-01-11 a
#> 12 2021-01-12 <NA>
#> 13 2021-01-13 <NA>
#> 14 2021-01-14 <NA>
#> 15 2021-01-15 <NA>
t <- t %>%
mutate(value_present = cumsum(!is.na(name))) %>%
group_by(value_present) %>%
mutate(new_name = case_when(date - min(date) <= 3 ~ head(name, 1)))
t
#> # A tibble: 15 x 4
#> # Groups: value_present [4]
#> date name value_present new_name
#> <date> <chr> <int> <chr>
#> 1 2021-01-01 <NA> 0 <NA>
#> 2 2021-01-02 <NA> 0 <NA>
#> 3 2021-01-03 a 1 a
#> 4 2021-01-04 <NA> 1 a
#> 5 2021-01-05 <NA> 1 a
#> 6 2021-01-06 <NA> 1 a
#> 7 2021-01-07 <NA> 1 <NA>
#> 8 2021-01-08 b 2 b
#> 9 2021-01-09 <NA> 2 b
#> 10 2021-01-10 <NA> 2 b
#> 11 2021-01-11 a 3 a
#> 12 2021-01-12 <NA> 3 a
#> 13 2021-01-13 <NA> 3 a
#> 14 2021-01-14 <NA> 3 a
#> 15 2021-01-15 <NA> 3 <NA>
str(t)
#> tibble [15 x 4] (S3: grouped_df/tbl_df/tbl/data.frame)
#> $ date : Date[1:15], format: "2021-01-01" "2021-01-02" ...
#> $ name : chr [1:15] NA NA "a" NA ...
#> $ value_present: int [1:15] 0 0 1 1 1 1 1 2 2 2 ...
#> $ new_name : chr [1:15] NA NA "a" "a" ...
#> - attr(*, "groups")= tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
#> ..$ value_present: int [1:4] 0 1 2 3
#> ..$ .rows : list<int> [1:4]
#> .. ..$ : int [1:2] 1 2
#> .. ..$ : int [1:5] 3 4 5 6 7
#> .. ..$ : int [1:3] 8 9 10
#> .. ..$ : int [1:5] 11 12 13 14 15
#> .. ..@ ptype: int(0)
#> ..- attr(*, ".drop")= logi TRUE
#running the same code for the second time
t <- t %>%
mutate(value_present = cumsum(!is.na(name))) %>%
group_by(value_present) %>%
mutate(new_name = case_when(date - min(date) <= 3 ~ head(name, 1)))
t
#> # A tibble: 15 x 4
#> # Groups: value_present [2]
#> date name value_present new_name
#> <date> <chr> <int> <chr>
#> 1 2021-01-01 <NA> 0 <NA>
#> 2 2021-01-02 <NA> 0 <NA>
#> 3 2021-01-03 a 1 a
#> 4 2021-01-04 <NA> 1 a
#> 5 2021-01-05 <NA> 1 a
#> 6 2021-01-06 <NA> 1 a
#> 7 2021-01-07 <NA> 1 <NA>
#> 8 2021-01-08 b 1 <NA>
#> 9 2021-01-09 <NA> 1 <NA>
#> 10 2021-01-10 <NA> 1 <NA>
#> 11 2021-01-11 a 1 <NA>
#> 12 2021-01-12 <NA> 1 <NA>
#> 13 2021-01-13 <NA> 1 <NA>
#> 14 2021-01-14 <NA> 1 <NA>
#> 15 2021-01-15 <NA> 1 <NA>
Created on 2021-07-24 by the reprex package (v0.3.0)