Convert from cumulative to daily observations

Lucamanu · October 21, 2020, 1:45pm

Hi everybody, I'm a total newbie in RStudio so please bear with me in case my question sounds stupid.
I have a simple dataframe consisting of daily observations (time series), across three different stations.
So each observation includes:

the day,
station name and
observed value
(three columns only).

Unfortunately, the values are cumulative: each one includes all the previous days.
I would like to mutate() the dataframe to add the daily values for each station (I need to run several tests on them). What is the simplest approach?
Many thanks for your help!

andresrcs · October 21, 2020, 1:55pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

Lucamanu · October 21, 2020, 5:41pm

Hi @andresrcs, thanks for the suggestion.

Answering your question, I am using tidyverse, and I am working on a dataframe as follows:

# A tibble: 272 x 3
   station days       cases
   <chr>   <date>     <dbl>
 1 AA      2020-01-22     3
 2 BB      2020-01-22     2
 3 CC      2020-01-22     6
 4 AA      2020-01-23     5
 5 BB      2020-01-23     5
 6 CC      2020-01-23     8
 7 AA      2020-01-24     9
 8 BB      2020-01-24    10
 9 CC      2020-01-24    11
10 AA      2020-01-25    13
11 BB      2020-01-25    15
12 CC      2020-01-25    12


And I would like to obtain the following:
# A tibble: 272 x 4
   station days       cases dailies
   <chr>   <date>     <dbl>   <dbl>
 1 AA      2020-01-22     3       3
 2 BB      2020-01-22     2       2
 3 CC      2020-01-22     6       6
 4 AA      2020-01-23     5       2
 5 BB      2020-01-23     5       3
 6 CC      2020-01-23     8       2
 7 AA      2020-01-24     9       4
 8 BB      2020-01-24    10       5
 9 CC      2020-01-24    11       3
10 AA      2020-01-25    13       4
11 BB      2020-01-25    15       5
12 CC      2020-01-25    12       1

My current plan is to subset the dataframe by station and compute the differences in a loop, but I am not sure how to merge the new variable back into the original df.
I am open to using any package that gets the result in a more elegant way. Thanks in advance for any suggestions.
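For reference, the subset-and-subtract idea described above can be sketched in base R without an explicit loop or a merge step. This is only a minimal sketch: it assumes the twelve rows shown above stand in for the full data, and it uses ave() with diff(), which is one possible approach among several. The data frame name df is made up for the illustration.

```r
# First 12 rows of the data shown above, rebuilt by hand (assumed sample).
df <- data.frame(
  station = rep(c("AA", "BB", "CC"), times = 4),
  days    = rep(as.Date("2020-01-22") + 0:3, each = 3),
  cases   = c(3, 2, 6, 5, 5, 8, 9, 10, 11, 13, 15, 12)
)

# Sort by date so that differences run in chronological order.
df <- df[order(df$days, df$station), ]

# ave() applies the function within each station and returns the results
# in the original row order, so nothing has to be merged back.
# The first day keeps its own value; later days get the day-to-day difference.
df$dailies <- ave(df$cases, df$station, FUN = function(x) c(x[1], diff(x)))

df
```

Because ave() keeps the row order, the new column lines up with the existing rows directly.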

FJCC · October 21, 2020, 6:35pm

Here is one method.

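library(tidyr)  # for replace_na()
library(dplyr, warn.conflicts = FALSE)

# Local CSV copy of the sample data, with columns Station, Days and Cases
DF <- read.csv("~/R/Play/Dummy.csv")
DF2 <- DF %>%
  group_by(Station) %>%
  arrange(Days) %>%                              # chronological order
  mutate(LagValue = lag(Cases)) %>%              # previous value within each station
  replace_na(replace = list(LagValue = 0)) %>%   # the first day has no previous value
  mutate(Dailies = Cases - LagValue)             # daily value = cumulative - previous
DF2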
#> # A tibble: 12 x 5
#> # Groups:   Station [3]
#>    Station Days       Cases LagValue Dailies
#>    <chr>   <chr>      <int>    <dbl>   <dbl>
#>  1 AA      2020-01-22     3        0       3
#>  2 BB      2020-01-22     2        0       2
#>  3 CC      2020-01-22     6        0       6
#>  4 AA      2020-01-23     5        3       2
#>  5 BB      2020-01-23     5        2       3
#>  6 CC      2020-01-23     8        6       2
#>  7 AA      2020-01-24     9        5       4
#>  8 BB      2020-01-24    10        5       5
#>  9 CC      2020-01-24    11        8       3
#> 10 AA      2020-01-25    13        9       4
#> 11 BB      2020-01-25    15       10       5
#> 12 CC      2020-01-25    12       11       1

Created on 2020-10-21 by the reprex package (v0.3.0)
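The CSV file read above is local to the poster's machine, so here is a self-contained variant of the same lag-and-subtract idea. It is a sketch under a few assumptions: the data is rebuilt by hand from the twelve rows in the question, the lowercase column names from the question are used, and lag(default = 0) stands in for the separate replace_na() step. The names df and df2 are made up for the illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# First 12 rows of the data from the question, rebuilt by hand (assumed sample).
df <- tibble(
  station = rep(c("AA", "BB", "CC"), times = 4),
  days    = rep(as.Date("2020-01-22") + 0:3, each = 3),
  cases   = c(3, 2, 6, 5, 5, 8, 9, 10, 11, 13, 15, 12)
)

df2 <- df %>%
  arrange(station, days) %>%                            # chronological order per station
  group_by(station) %>%
  mutate(dailies = cases - lag(cases, default = 0)) %>% # first day: cases - 0
  ungroup() %>%
  arrange(days, station)                                # back to the original layout

df2
```

Using default = 0 avoids the intermediate LagValue column; keeping that column, as in the answer above, can still be handy for checking the result by eye.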

Lucamanu · October 22, 2020, 7:25am

Thanks @FJCC, extremely cool and simple.
May I ask why you added the argument

warn.conflicts = FALSE

to the library() call?
Again thanks for the great help!

FJCC · October 22, 2020, 11:54am

I use the warn.conflicts argument when I am making examples because it suppresses the messages about dplyr functions that mask functions from base packages. Those messages are normal and harmless, but they clutter the output. Since I did not show that output in this case, I should have removed the argument.
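A small sketch of what this means in practice, assuming a fresh R session with only the default packages attached: the argument silences the start-up note, but the masking itself still takes place, which the base function conflicts() can confirm.

```r
library(dplyr, warn.conflicts = FALSE)

# Names that exist in more than one place on the search path.
masked <- conflicts()
c("filter", "lag") %in% masked

# A bare lag() now refers to the dplyr version ...
environmentName(environment(lag))

# ... while the masked original stays reachable with an explicit prefix.
environmentName(environment(stats::lag))
```

This is why the lag() call in the answer above picks up the dplyr function rather than stats::lag().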
