Calculating growth factor for data set.

smichal · September 23, 2020, 8:22pm

could you try to give your sample data in form of a reprex? It will be much easier for people just to copy and paste self-contained and working code into their RStudio to get the problem and figure out a solution.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Solution idea

The key to your solution looks like a combination of arrange for sorting by date, group_by for grouping by state and dplyr::lag to get a value from previous rows. And yes, thanks to "tidyverse magic" the lag function will respect the grouping. I'm trying an example with dummy data:

library(tidyverse)

covid <- tibble(STATE = c("NSW", "NT", "QLD")) %>% 
  mutate(data = map(STATE, ~tibble(DATE = seq(lubridate::today(), by = "1 day", length.out = 4),
                                   NEW_CASES = runif(4, 0, 100)))) %>% 
  unnest(data)

# and this could be your solution
covid %>% 
  # order for that lag combines the right rows
  arrange(DATE) %>% 
  group_by(STATE) %>% 
  mutate(CASES_YESTERDAY = lag(NEW_CASES),
         CASES_BEFORE_YESTERDAY = lag(NEW_CASES, 2)) %>% 
  mutate(GROWTH_RATE_1D = NEW_CASES / CASES_YESTERDAY,
         GROWTH_RATE_2D = NEW_CASES / CASES_BEFORE_YESTERDAY) %>% 
  # re-arrange (not necessary) only to check if the grouping was respected
  arrange(STATE, DATE)
#> # A tibble: 12 x 7
#> # Groups:   STATE [3]
#>    STATE DATE       NEW_CASES CASES_YESTERDAY CASES_BEFORE_YE~ GROWTH_RATE_1D
#>    <chr> <date>         <dbl>           <dbl>            <dbl>          <dbl>
#>  1 NSW   2020-09-23     69.2            NA               NA           NA     
#>  2 NSW   2020-09-24      3.67           69.2             NA            0.0530
#>  3 NSW   2020-09-25     14.8             3.67            69.2          4.05  
#>  4 NSW   2020-09-26     14.7            14.8              3.67         0.988 
#>  5 NT    2020-09-23     13.5            NA               NA           NA     
#>  6 NT    2020-09-24     91.9            13.5             NA            6.82  
#>  7 NT    2020-09-25     62.4            91.9             13.5          0.679 
#>  8 NT    2020-09-26     32.2            62.4             91.9          0.516 
#>  9 QLD   2020-09-23     58.0            NA               NA           NA     
#> 10 QLD   2020-09-24     50.0            58.0             NA            0.862 
#> 11 QLD   2020-09-25     13.9            50.0             58.0          0.278 
#> 12 QLD   2020-09-26     71.3            13.9             50.0          5.13  
#> # ... with 1 more variable: GROWTH_RATE_2D <dbl>

^{Created on 2020-09-23 by the reprex package (v0.3.0)}