How to get number of new items when given daily running totals

I've got a dataset of US coronavirus cases by county. It includes a daily running total but not the number of new cases per day. What is the tidyverse way to get the daily new cases?

Here is what the data looks like:

| day_num | ttl_cases |
|39 |  7 |
|40 | 11 |
|41 | 13 |
|42 | 14 |

new_cases = today's ttl_cases - yesterday's ttl_cases

Thanks in advance for any help.

dplyr::lag() is what you need.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

df <- tribble(~ day_num, ~ ttl_cases,
              39, 7,
              40, 11,
              41, 13,
              42, 14)

df %>% mutate(new_cases = ttl_cases - lag(ttl_cases))
#> # A tibble: 4 x 3
#>   day_num ttl_cases new_cases
#>     <dbl>     <dbl>     <dbl>
#> 1      39         7        NA
#> 2      40        11         4
#> 3      41        13         2
#> 4      42        14         1

Created on 2020-04-03 by the reprex package (v0.3.0)

Booyaa!! Got a solution in less than 15 minutes. Works perfectly. Thanks.

1 Like

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? Using a reprex, complete with representative data will attract quicker and more answers.

In this case, it's so simple to recreate the data that asking for one feels like quibbling. Just keep in mind the advantages of making a question as easy as possible to answer. Anyway, I already had representative data.

suppressPackageStartupMessages(library(dplyr))
dat <- structure(list(
  confirmed =
    c(2, 5, 18, 28, 43, 61, 95, 139, 245, 388, 593, 978, 1501, 2336, 2922, 3513, 4747, 5823, 6566, 7161, 8042, 9000, 10075, 11364, 12729, 13938, 14991, 16169, 17361, 18407, 19644, 20610, 21638, 23049, 24811, 27017, 29406, 32332, 35408, 38309),
  deaths =
    c(2, 2, 4, 5, 8, 12, 16, 19, 26, 34, 43, 54, 66, 77, 92, 107, 124, 145, 194, 237, 291, 354, 429, 514, 611, 724, 853, 988, 1135, 1284, 1433, 1556, 1685, 1812, 1934, 2077, 2234, 2378, 2517, 2640),
  recovered =
    c(0, 0, 0, 0, 0, 0, 0, 49, 49, 73, 123, 175, 291, 291, 552, 739, 913, 1669, 2134, 2394, 2731, 2959, 2959, 2959, 2959, 4590, 4590, 5389, 5389, 5710, 6745, 7635, 7931, 7931, 8913, 9625, 10457, 11133, 11679, 12391)),
  class =
    c("spec_tbl_df", "tbl_df", "tbl", "data.frame"),
  row.names =
    c(NA, -40L), spec = structure(list(cols = list(confirmed = structure(list(),
  class =
    c("collector_double", "collector")), deaths = structure(list(),
  class =
    c("collector_double", "collector")), recovered = structure(list(),
  class =
    c("collector_double", "collector"))), default = structure(list(),
  class =
    c("collector_guess", "collector")), skip = 1),
  class = "col_spec"))
  
dat %>% mutate(new_cases = confirmed - lag(confirmed,1))
#> # A tibble: 40 x 4
#>    confirmed deaths recovered new_cases
#>        <dbl>  <dbl>     <dbl>     <dbl>
#>  1         2      2         0        NA
#>  2         5      2         0         3
#>  3        18      4         0        13
#>  4        28      5         0        10
#>  5        43      8         0        15
#>  6        61     12         0        18
#>  7        95     16         0        34
#>  8       139     19        49        44
#>  9       245     26        49       106
#> 10       388     34        73       143
#> # … with 30 more rows

Created on 2020-04-03 by the reprex package (v0.3.0)

Note that the first in the series will always be NA. If that needs to be rounded off, something like

dat %>% mutate(newcases = ifelse(is.na(newcases),confirmed,newcases))
2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.