Calculating growth factor for data set.

Hello all,

What I am trying to do here is compare the daily growth factor across all eight states in Australia from 17 March 2020 to 16 August 2020. (The growth factor is calculated by dividing the new cases on the current day by the new cases on the previous day.)
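To make the definition concrete, here is a minimal sketch in base R. The counts are made up for illustration and are not taken from the real data set.

```r
# Hypothetical daily new-case counts for a single state (made-up numbers)
new_cases <- c(10, 15, 30, 24)

# Growth factor: today's new cases divided by yesterday's new cases.
# The first day has no previous day, so its growth factor is NA.
previous_day <- c(NA, head(new_cases, -1))
growth_factor <- new_cases / previous_day

growth_factor  # NA, 1.5, 2.0, 0.8
```

A value above 1 means more new cases than the day before, and a value below 1 means fewer.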

This is the structure of the data frame:

str(covid)
'data.frame':	1640 obs. of  13 variables:
 $ date         : Date, format: "2020-01-25" "2020-01-25" ...
 $ state        : chr  "Australian Capital Territory" "New South Wales" "Northern Territory" "Queensland" ...
 $ state_abbrev : chr  "ACT" "NSW" "NT" "QLD" ...
 $ confirmed    : int  0 3 0 0 0 0 1 0 0 0 ...
 $ confirmed_cum: int  0 3 0 0 0 0 1 0 0 3 ...
 $ deaths       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ deaths_cum   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ tests        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ tests_cum    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ positives    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ positives_cum: int  0 0 0 0 0 0 0 0 0 0 ...
 $ recovered    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ recovered_cum: int  0 0 0 0 0 0 0 0 0 0 ...

I have tried some code but have not been able to get it right, and I posted a question about this before (see the previous discussion). Can anyone please help?

Hi @Maninder,

Could you provide your sample data in the form of a reprex? It is much easier for people to help when they can copy and paste self-contained, working code into RStudio, reproduce the problem and work out a solution.
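One common way to share sample data is dput(), which prints R code that recreates an object. The sketch below uses a small made-up data frame (covid_sample is a hypothetical name, and the counts are invented); on the real data you would run something like dput(head(covid, 20)) and paste the output into your post.

```r
# Hypothetical stand-in for a few rows of the real `covid` data frame
covid_sample <- data.frame(
  date = as.Date(c("2020-03-17", "2020-03-17", "2020-03-18", "2020-03-18")),
  state_abbrev = c("NSW", "VIC", "NSW", "VIC"),
  confirmed = c(39L, 23L, 57L, 27L)  # made-up counts
)

# dput() prints R code that rebuilds the object;
# others can assign that output to a name and get the same data.
dput(covid_sample)
```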

Solution idea

The key to your solution looks like a combination of arrange for sorting by date, group_by for grouping by state and dplyr::lag to get a value from previous rows. And yes, thanks to "tidyverse magic" the lag function will respect the grouping: inside mutate on a grouped data frame, lag is applied to each state separately, so it never pulls a value across states. Here is an example with dummy data:

library(tidyverse)

covid <- tibble(STATE = c("NSW", "NT", "QLD")) %>% 
  mutate(data = map(STATE, ~tibble(DATE = seq(lubridate::today(), by = "1 day", length.out = 4),
                                   NEW_CASES = runif(4, 0, 100)))) %>% 
  unnest(data)

# and this could be your solution
covid %>% 
  # sort by date so that lag() pairs each row with the previous day
  arrange(DATE) %>% 
  group_by(STATE) %>% 
  mutate(CASES_YESTERDAY = lag(NEW_CASES),
         CASES_BEFORE_YESTERDAY = lag(NEW_CASES, 2)) %>% 
  mutate(GROWTH_RATE_1D = NEW_CASES / CASES_YESTERDAY,
         GROWTH_RATE_2D = NEW_CASES / CASES_BEFORE_YESTERDAY) %>% 
  # re-arrange (not necessary) only to check if the grouping was respected
  arrange(STATE, DATE)
#> # A tibble: 12 x 7
#> # Groups:   STATE [3]
#>    STATE DATE       NEW_CASES CASES_YESTERDAY CASES_BEFORE_YE~ GROWTH_RATE_1D
#>    <chr> <date>         <dbl>           <dbl>            <dbl>          <dbl>
#>  1 NSW   2020-09-23     69.2            NA               NA           NA     
#>  2 NSW   2020-09-24      3.67           69.2             NA            0.0530
#>  3 NSW   2020-09-25     14.8             3.67            69.2          4.05  
#>  4 NSW   2020-09-26     14.7            14.8              3.67         0.988 
#>  5 NT    2020-09-23     13.5            NA               NA           NA     
#>  6 NT    2020-09-24     91.9            13.5             NA            6.82  
#>  7 NT    2020-09-25     62.4            91.9             13.5          0.679 
#>  8 NT    2020-09-26     32.2            62.4             91.9          0.516 
#>  9 QLD   2020-09-23     58.0            NA               NA           NA     
#> 10 QLD   2020-09-24     50.0            58.0             NA            0.862 
#> 11 QLD   2020-09-25     13.9            50.0             58.0          0.278 
#> 12 QLD   2020-09-26     71.3            13.9             50.0          5.13  
#> # ... with 1 more variable: GROWTH_RATE_2D <dbl>

Created on 2020-09-23 by the reprex package (v0.3.0)
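Adapting this to the column names from the str() output in the question might look like the sketch below. It assumes that confirmed holds the daily new cases (with confirmed_cum as the running total) and that there is one row per state per day. The small covid tibble here is a made-up stand-in, not the real data. Treating days that follow a zero-case day as NA is also a choice made for this sketch, since dividing by zero would otherwise give Inf or NaN.

```r
library(dplyr)

# Hypothetical stand-in for the real `covid` data frame (made-up counts),
# using the column names shown in the question's str() output.
covid <- tibble(
  date = rep(seq(as.Date("2020-03-16"), by = "1 day", length.out = 4), times = 2),
  state_abbrev = rep(c("NSW", "VIC"), each = 4),
  confirmed = c(20L, 40L, 0L, 30L, 10L, 10L, 25L, 50L)
)

growth <- covid %>%
  arrange(state_abbrev, date) %>%
  group_by(state_abbrev) %>%
  mutate(
    confirmed_prev = lag(confirmed),
    # Previous day had zero cases (or does not exist): report NA
    growth_factor = if_else(confirmed_prev > 0,
                            confirmed / confirmed_prev,
                            NA_real_)
  ) %>%
  ungroup() %>%
  # Filter after computing the lag so 17 March can still use 16 March
  filter(date >= as.Date("2020-03-17"), date <= as.Date("2020-08-16"))

growth
```

Filtering to the date range only after the lag step keeps a growth factor for the first day of the range. Filtering first would turn that day into NA for every state.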
