Calculating growth factor for data set.

Hi @Maninder,

could you try to give your sample data in form of a reprex? It will be much easier for people just to copy and paste self-contained and working code into their RStudio to get the problem and figure out a solution.

Solution idea

The key to your solution looks like a combination of arrange for sorting by date, group_by for grouping by state and dplyr::lag to get a value from previous rows. And yes, thanks to "tidyverse magic" the lag function will respect the grouping. I'm trying an example with dummy data:

library(tidyverse)

covid <- tibble(STATE = c("NSW", "NT", "QLD")) %>% 
  mutate(data = map(STATE, ~tibble(DATE = seq(lubridate::today(), by = "1 day", length.out = 4),
                                   NEW_CASES = runif(4, 0, 100)))) %>% 
  unnest(data)

# and this could be your solution
covid %>% 
  # order for that lag combines the right rows
  arrange(DATE) %>% 
  group_by(STATE) %>% 
  mutate(CASES_YESTERDAY = lag(NEW_CASES),
         CASES_BEFORE_YESTERDAY = lag(NEW_CASES, 2)) %>% 
  mutate(GROWTH_RATE_1D = NEW_CASES / CASES_YESTERDAY,
         GROWTH_RATE_2D = NEW_CASES / CASES_BEFORE_YESTERDAY) %>% 
  # re-arrange (not necessary) only to check if the grouping was respected
  arrange(STATE, DATE)
#> # A tibble: 12 x 7
#> # Groups:   STATE [3]
#>    STATE DATE       NEW_CASES CASES_YESTERDAY CASES_BEFORE_YE~ GROWTH_RATE_1D
#>    <chr> <date>         <dbl>           <dbl>            <dbl>          <dbl>
#>  1 NSW   2020-09-23     69.2            NA               NA           NA     
#>  2 NSW   2020-09-24      3.67           69.2             NA            0.0530
#>  3 NSW   2020-09-25     14.8             3.67            69.2          4.05  
#>  4 NSW   2020-09-26     14.7            14.8              3.67         0.988 
#>  5 NT    2020-09-23     13.5            NA               NA           NA     
#>  6 NT    2020-09-24     91.9            13.5             NA            6.82  
#>  7 NT    2020-09-25     62.4            91.9             13.5          0.679 
#>  8 NT    2020-09-26     32.2            62.4             91.9          0.516 
#>  9 QLD   2020-09-23     58.0            NA               NA           NA     
#> 10 QLD   2020-09-24     50.0            58.0             NA            0.862 
#> 11 QLD   2020-09-25     13.9            50.0             58.0          0.278 
#> 12 QLD   2020-09-26     71.3            13.9             50.0          5.13  
#> # ... with 1 more variable: GROWTH_RATE_2D <dbl>

Created on 2020-09-23 by the reprex package (v0.3.0)

1 Like