Convert from cumulated to daily observations

Hi everybody, I'm a total newbie in RStudio so please bear with me in case my question sounds stupid.
I have a simple dataframe consisting of daily observations (time series), across three different stations.
So each observation includes:

  • the day,
  • station name and
  • observed value
    (three columns only).

Unfortunately the values are cumulative: each one includes all the previous ones.
I would like to mutate() the dataframe to add the daily observations for each station (I need to run several tests on them): what is the simplest approach?
Many thanks for your help!

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Hi @andresrcs, thanks for the suggestion.

Answering your question, I am using tidyverse, and I am working on a dataframe as follows:

# A tibble: 272 x 3
   station days       cases
   <chr>   <date>     <dbl>
 1 AA      2020-01-22     3
 2 BB      2020-01-22     2
 3 CC      2020-01-22     6
 4 AA      2020-01-23     5
 5 BB      2020-01-23     5
 6 CC      2020-01-23     8
 7 AA      2020-01-24     9
 8 BB      2020-01-24    10
 9 CC      2020-01-24    11
10 AA      2020-01-25    13
11 BB      2020-01-25    15
12 CC      2020-01-25    12


And I would like to obtain the following:
# A tibble: 272 x 4
   station days       cases dailies
   <chr>   <date>     <dbl>   <dbl>
 1 AA      2020-01-22     3       3
 2 BB      2020-01-22     2       2
 3 CC      2020-01-22     6       6
 4 AA      2020-01-23     5       2
 5 BB      2020-01-23     5       3
 6 CC      2020-01-23     8       2
 7 AA      2020-01-24     9       4
 8 BB      2020-01-24    10       5
 9 CC      2020-01-24    11       3
10 AA      2020-01-25    13       4
11 BB      2020-01-25    15       5
12 CC      2020-01-25    12       1

My current strategy would be to subset the dataframe by station and compute the differences in a loop, but I am not sure how to merge the new variable back into the initial df.
I am open to using any package that gives the result in a more elegant way. Thanks in advance for any suggestions.
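
For reference, the loop strategy described above could look roughly like the sketch below. It uses base R only; the data frame is retyped from the first 12 rows of the table, and the name idx is just illustrative. Writing the result straight into the matching rows of the original df avoids a separate merge step.

```r
# First 12 rows of the example table, retyped by hand
df <- data.frame(
  station = rep(c("AA", "BB", "CC"), times = 4),
  days    = rep(as.Date("2020-01-22") + 0:3, each = 3),
  cases   = c(3, 2, 6, 5, 5, 8, 9, 10, 11, 13, 15, 12)
)

df$dailies <- NA_real_
for (s in unique(df$station)) {
  idx <- which(df$station == s)       # rows belonging to this station
  idx <- idx[order(df$days[idx])]     # put them in chronological order
  x   <- df$cases[idx]
  # first day keeps its value, later days are differences to the day before
  df$dailies[idx] <- c(x[1], diff(x))
}
df
```

This assumes one row per station per day; a missing day would make the difference span more than one day.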

Here is one method.

library(tidyr)   # for replace_na()
library(dplyr, warn.conflicts = FALSE)

DF <- read.csv("~/R/Play/Dummy.csv")
DF2 <- DF %>%
  group_by(Station) %>%
  arrange(Days) %>%
  # previous cumulative value within each station (NA on the first day)
  mutate(LagValue = lag(Cases)) %>%
  # treat the missing first-day value as 0
  replace_na(replace = list(LagValue = 0)) %>%
  mutate(Dailies = Cases - LagValue)
DF2
#> # A tibble: 12 x 5
#> # Groups:   Station [3]
#>    Station Days       Cases LagValue Dailies
#>    <chr>   <chr>      <int>    <dbl>   <dbl>
#>  1 AA      2020-01-22     3        0       3
#>  2 BB      2020-01-22     2        0       2
#>  3 CC      2020-01-22     6        0       6
#>  4 AA      2020-01-23     5        3       2
#>  5 BB      2020-01-23     5        2       3
#>  6 CC      2020-01-23     8        6       2
#>  7 AA      2020-01-24     9        5       4
#>  8 BB      2020-01-24    10        5       5
#>  9 CC      2020-01-24    11        8       3
#> 10 AA      2020-01-25    13        9       4
#> 11 BB      2020-01-25    15       10       5
#> 12 CC      2020-01-25    12       11       1

Created on 2020-10-21 by the reprex package (v0.3.0)
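
A slightly shorter variant of the same idea, as a sketch: lag() takes a default argument, so the NA on each station's first day can be filled with 0 directly and the helper column is not needed. The data frame here is retyped from the output above instead of being read from Dummy.csv, and the name DF3 is only illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Same 12 rows as in the output above, retyped by hand
DF <- data.frame(
  Station = rep(c("AA", "BB", "CC"), times = 4),
  Days    = rep(c("2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25"), each = 3),
  Cases   = c(3, 2, 6, 5, 5, 8, 9, 10, 11, 13, 15, 12)
)

DF3 <- DF %>%
  arrange(Days) %>%                 # chronological order (ISO dates sort correctly as text)
  group_by(Station) %>%
  # lag() works within each station; default = 0 covers the first day
  mutate(Dailies = Cases - lag(Cases, default = 0)) %>%
  ungroup()                         # drop the grouping for later steps
DF3
```

The ungroup() at the end is optional; it just returns a plain tibble so that later summaries are not computed per station by accident.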

Thanks @FJCC, extremely cool and simple.
Can I just ask what the reason is for the argument

warn.conflicts = FALSE

that you added to the library() call?
Again thanks for the great help!

I use the warn.conflicts argument when I am making examples because it suppresses the messages about dplyr functions that mask functions from base packages. Those messages are normal and harmless, but they clutter the output. Since I did not show the output in this case, I should have removed it.
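
To illustrate, here is a small sketch of two ways to attach dplyr quietly. The exact wording of the messages may differ between versions, so it is only paraphrased in the comment.

```r
# A plain library(dplyr) prints start-up messages saying that objects such as
# filter and lag are masked from package:stats.

# Option 1: only silence the conflict messages
library(dplyr, warn.conflicts = FALSE)

# Option 2: silence all start-up messages of the package
suppressPackageStartupMessages(library(dplyr))
```

Either way dplyr is attached exactly as usual; only the printed messages change.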
