Hi everybody, I'm a total newbie in RStudio, so please bear with me if my question sounds stupid.
I have a simple dataframe consisting of daily observations (time series), across three different stations.
So each observation includes:
the day,
station name and
observed value
(three columns only).
Unfortunately the values are cumulative: each one includes all the previous ones.
I would like to mutate() the dataframe to add the daily (non-cumulative) value for each station, since I need to run several tests on those. What is the simplest approach?
Many thanks for your help!
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:
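As a rough illustration of what sharing sample data can look like, here is a minimal sketch using base R's dput(), which prints an object as code that others can paste back into their session. The data frame below is a hypothetical stand-in, not your actual data.

```r
# Hypothetical three-row stand-in for the real data
df <- data.frame(
  station = c("AA", "BB", "CC"),
  days    = as.Date("2020-01-22"),
  cases   = c(3, 2, 6)
)

# Prints a structure(...) call that recreates df when pasted into R
dput(df)
```

For larger data, dput(head(df, 20)) keeps the pasted sample short.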
Answering your question, I am using tidyverse, and I am working on a dataframe as follows:
# A tibble: 272 x 3
   station days       cases
   <chr>   <date>     <dbl>
 1 AA      2020-01-22     3
 2 BB      2020-01-22     2
 3 CC      2020-01-22     6
 4 AA      2020-01-23     5
 5 BB      2020-01-23     5
 6 CC      2020-01-23     8
 7 AA      2020-01-24     9
 8 BB      2020-01-24    10
 9 CC      2020-01-24    11
10 AA      2020-01-25    13
11 BB      2020-01-25    15
12 CC      2020-01-25    12
And I would like to obtain the following:
# A tibble: 272 x 4
   station days       cases dailies
   <chr>   <date>     <dbl>   <dbl>
 1 AA      2020-01-22     3       3
 2 BB      2020-01-22     2       2
 3 CC      2020-01-22     6       6
 4 AA      2020-01-23     5       2
 5 BB      2020-01-23     5       3
 6 CC      2020-01-23     8       2
 7 AA      2020-01-24     9       4
 8 BB      2020-01-24    10       5
 9 CC      2020-01-24    11       3
10 AA      2020-01-25    13       4
11 BB      2020-01-25    15       5
12 CC      2020-01-25    12       1
My current strategy would be to subset the dataframe by station and compute the differences with a loop, but I am not sure how to merge the new variable back into the initial df.
I am open to using any package that gets the result in a more elegant way. Thanks in advance for any suggestions.
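One possible approach, sketched here with dplyr, avoids the subset-and-loop step entirely: group by station and subtract the previous cumulative value with lag(). The data below is retyped from the 12 rows shown above, and the names df and result are only illustrative. The sketch assumes one row per station per day and that the value before each station's first day is 0.

```r
library(dplyr, warn.conflicts = FALSE)

# The 12 sample rows from the question, retyped by hand
df <- tibble::tribble(
  ~station, ~days,        ~cases,
  "AA",     "2020-01-22",  3,
  "BB",     "2020-01-22",  2,
  "CC",     "2020-01-22",  6,
  "AA",     "2020-01-23",  5,
  "BB",     "2020-01-23",  5,
  "CC",     "2020-01-23",  8,
  "AA",     "2020-01-24",  9,
  "BB",     "2020-01-24", 10,
  "CC",     "2020-01-24", 11,
  "AA",     "2020-01-25", 13,
  "BB",     "2020-01-25", 15,
  "CC",     "2020-01-25", 12
) %>%
  mutate(days = as.Date(days))

result <- df %>%
  group_by(station) %>%
  # Daily value = today's cumulative total minus the previous day's total
  # for the same station; default = 0 keeps the first day unchanged.
  # order_by = days makes the lag follow dates even if rows are unsorted.
  mutate(dailies = cases - lag(cases, default = 0, order_by = days)) %>%
  ungroup()

print(result)
```

Because mutate() works within each group and keeps the original row order, there is nothing to merge back afterwards. If some days are missing for a station, the difference would span more than one day, so that case would need separate handling.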
I use the warn.conflicts argument when I am making examples because it suppresses the warnings about dplyr functions that mask functions from base packages. Those warnings are normal and harmless, but they clutter the output. Since I did not show the output in this case, I should have removed it.
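For reference, a minimal sketch of that argument in use. warn.conflicts is a standard argument of base R's library(); the comment describes the usual behaviour when attaching dplyr in a fresh session.

```r
# Without warn.conflicts = FALSE, attaching dplyr typically reports that
# objects such as filter and lag are masked from package:stats.
library(dplyr, warn.conflicts = FALSE)
```

The package is attached exactly as before; only the masking report is silenced.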