dplyr: mutate(... = cumsum(...)) doesn't work

GusMono · July 4, 2020, 10:58pm

I have a question: dplyr mutate( =cumsum()) isn't working as I expected.
I have this data.frame:

I want a cumulative total by "carga_provincia_nombre" and "fecha_apertura", so I run this:

aux2<- as.data.frame(aux %>% arrange(carga_provincia_nombre,fecha_apertura) %>% group_by(carga_provincia_nombre,fecha_apertura) %>% mutate(cum_cases=cumsum(Total_adults)))

I get the output below. As you can see, it doesn't produce a single cumulative value per "fecha_apertura"; instead, the same "fecha_apertura" is repeated many times for each "carga_provincia_nombre":

head(aux2 %>% arrange(carga_provincia_nombre) %>% filter(carga_provincia_nombre!="Buenos Aires",carga_provincia_nombre!="CABA",fecha_apertura>="2020-05-01",fecha_apertura<="2020-05-31"),400)
fecha_apertura carga_provincia_nombre Total_adults cum_cases
1 2020-05-01 Chaco 1 1
2 2020-05-01 Chaco 1 2
3 2020-05-01 Chaco 1 3
4 2020-05-02 Chaco 1 1
5 2020-05-02 Chaco 1 2
6 2020-05-02 Chaco 1 3
7 2020-05-02 Chaco 1 4
8 2020-05-02 Chaco 1 5
9 2020-05-02 Chaco 1 6
10 2020-05-02 Chaco 1 7
11 2020-05-02 Chaco 1 8
12 2020-05-02 Chaco 1 9
13 2020-05-03 Chaco 1 1
14 2020-05-03 Chaco 1 2
15 2020-05-03 Chaco 1 3
16 2020-05-03 Chaco 1 4
17 2020-05-03 Chaco 1 5
18 2020-05-03 Chaco 1 6
19 2020-05-03 Chaco 1 7
20 2020-05-03 Chaco 1 8
21 2020-05-03 Chaco 1 9
22 2020-05-03 Chaco 1 10
23 2020-05-03 Chaco 1 11
24 2020-05-04 Chaco 1 1

I don't know why it doesn't accumulate and show just one line per "carga_provincia_nombre" and "fecha_apertura" with an accumulated field "cum_cases".

I am not using plyr, just dplyr.
I hope someone can help me with this.

nirgrahamuk · July 4, 2020, 11:13pm

I think you are confusing cumulative sums with sums.
A sum adds up a series of numbers and gives a single result; a cumulative sum gives a running total at every position:
the sum of 1, 2, 3 is 6
the cumsum of 1, 2, 3 is 1, 3, 6

From the R documentation for cumsum(): "Returns a vector whose elements are the cumulative sums, products, minima or maxima of the elements of the argument."
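To make the difference concrete inside a grouped pipeline, here is a minimal sketch. The data frame `toy` and its columns `day` and `cases` are made up for illustration and are not from the original post; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: three rows on one day, one row on the next
toy <- data.frame(
  day   = as.Date(c("2020-05-01", "2020-05-01", "2020-05-01", "2020-05-02")),
  cases = c(1, 1, 1, 1)
)

# mutate() keeps every row; cumsum() restarts inside each group
running <- toy %>%
  group_by(day) %>%
  mutate(cum_cases = cumsum(cases)) %>%
  ungroup()

# summarise() collapses each group to a single row
totals <- toy %>%
  group_by(day) %>%
  summarise(total_cases = sum(cases))

running$cum_cases   # 1 2 3 1
totals$total_cases  # 3 1
```

Because the grouping includes the date, the running total in `running` starts over on every new day, which matches the pattern in the output posted above.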

dromano · July 4, 2020, 11:15pm

Hi @GusMono: Could you give, say, a small table with maybe five rows and two provinces, show us what the table looks like before you do anything to it, and then show what your ideal result table would look like? That would help folks better understand what you are trying to do.

GusMono · July 5, 2020, 2:47pm

Hi David, this aux data.frame is a subset that extracts adult deaths by day. The source has many more attributes, so I extracted only these, but sometimes I get more than one row per day.

Source #1
fecha carga_provincia_nombre Total_deaths
3 2020-03-25 Chaco 1
4 2020-03-25 Chaco 1
26 2020-05-15 Chaco 1
27 2020-05-15 Chaco 1
28 2020-05-15 Chaco 1
136 2020-06-17 Río Negro 1
137 2020-06-17 Río Negro 1
138 2020-06-17 Río Negro 1
139 2020-06-18 Río Negro 1
140 2020-06-19 Río Negro 1
141 2020-06-20 Río Negro 1
142 2020-06-21 Río Negro 1
143 2020-06-21 Río Negro 1

Taking the example above, I want a data.frame with the following results.

result #1
fecha carga_provincia_nombre Total_deaths
2020-03-25 Chaco 2
2020-05-15 Chaco 3
2020-06-17 Río Negro 3
2020-06-18 Río Negro 1
2020-06-19 Río Negro 1
2020-06-20 Río Negro 1
2020-06-21 Río Negro 2
As you can see, this is a total grouped by "fecha" and "carga_provincia_nombre".

Using group_by, I got the following instead:

fecha carga_provincia_nombre Total_deaths
3 2020-03-25 Chaco 1
4 2020-03-25 Chaco 2
26 2020-05-15 Chaco 1
27 2020-05-15 Chaco 2
28 2020-05-15 Chaco 3
136 2020-06-17 Río Negro 1
137 2020-06-17 Río Negro 2
138 2020-06-17 Río Negro 3
139 2020-06-18 Río Negro 1
140 2020-06-19 Río Negro 1
141 2020-06-20 Río Negro 1
142 2020-06-21 Río Negro 1
143 2020-06-21 Río Negro 2

This is only a sample, with 2 provinces and a few days.
I am new to this community and I don't know if there is a better way to share this data.frame with you (if there is, please let me know).

aux %>% filter(carga_provincia_nombre=="Chaco" | carga_provincia_nombre=="Río Negro")
         fecha carga_provincia_nombre Total_deaths
1   2020-03-13                  Chaco            1
2   2020-03-24                  Chaco            1
3   2020-03-25                  Chaco            1
4   2020-03-25                  Chaco            1
5   2020-03-31                  Chaco            1
6   2020-04-01                  Chaco            1
7   2020-04-02                  Chaco            1
8   2020-04-08                  Chaco            1
9   2020-04-13                  Chaco            1
10  2020-04-14                  Chaco            1
11  2020-04-19                  Chaco            1
12  2020-04-24                  Chaco            1
13  2020-04-27                  Chaco            1
14  2020-04-28                  Chaco            1
15  2020-05-01                  Chaco            1
16  2020-05-01                  Chaco            1
17  2020-05-01                  Chaco            1
18  2020-05-02                  Chaco            1
19  2020-05-02                  Chaco            1
20  2020-05-05                  Chaco            1
21  2020-05-10                  Chaco            1
22  2020-05-11                  Chaco            1
23  2020-05-11                  Chaco            1
24  2020-05-13                  Chaco            1
25  2020-05-14                  Chaco            1
26  2020-05-15                  Chaco            1
27  2020-05-15                  Chaco            1
28  2020-05-15                  Chaco            1
29  2020-05-16                  Chaco            1
30  2020-05-16                  Chaco            1
31  2020-05-17                  Chaco            1
32  2020-05-18                  Chaco            1
33  2020-05-18                  Chaco            1
34  2020-05-18                  Chaco            1
35  2020-05-18                  Chaco            1
36  2020-05-19                  Chaco            1
37  2020-05-20                  Chaco            1
38  2020-05-20                  Chaco            1
39  2020-05-22                  Chaco            1
40  2020-05-22                  Chaco            1
41  2020-05-22                  Chaco            1
42  2020-05-23                  Chaco            1
43  2020-05-24                  Chaco            1
44  2020-05-25                  Chaco            1
45  2020-05-25                  Chaco            1
46  2020-05-26                  Chaco            1
47  2020-05-27                  Chaco            1
48  2020-05-28                  Chaco            1
49  2020-05-28                  Chaco            1
50  2020-05-28                  Chaco            1
51  2020-05-28                  Chaco            1
52  2020-05-29                  Chaco            1
53  2020-05-29                  Chaco            1
54  2020-05-31                  Chaco            1
55  2020-05-31                  Chaco            1
56  2020-06-01                  Chaco            1
57  2020-06-01                  Chaco            1
58  2020-06-01                  Chaco            1
59  2020-06-03                  Chaco            1
60  2020-06-04                  Chaco            1
61  2020-06-04                  Chaco            1
62  2020-06-05                  Chaco            1
63  2020-06-06                  Chaco            1
64  2020-06-07                  Chaco            1
65  2020-06-08                  Chaco            1
66  2020-06-10                  Chaco            1
67  2020-06-10                  Chaco            1
68  2020-06-11                  Chaco            1
69  2020-06-12                  Chaco            1
70  2020-06-12                  Chaco            1
71  2020-06-12                  Chaco            1
72  2020-06-12                  Chaco            1
73  2020-06-14                  Chaco            1
74  2020-06-14                  Chaco            1
75  2020-06-14                  Chaco            1
76  2020-06-14                  Chaco            1
77  2020-06-14                  Chaco            1
78  2020-06-17                  Chaco            1
79  2020-06-17                  Chaco            1
80  2020-06-18                  Chaco            1
81  2020-06-18                  Chaco            1
82  2020-06-19                  Chaco            1
83  2020-06-19                  Chaco            1
84  2020-06-19                  Chaco            1
85  2020-06-19                  Chaco            1
86  2020-06-19                  Chaco            1
87  2020-06-19                  Chaco            1
88  2020-06-19                  Chaco            1
89  2020-06-19                  Chaco            1
90  2020-06-21                  Chaco            1
91  2020-06-22                  Chaco            1
92  2020-06-22                  Chaco            1
93  2020-06-23                  Chaco            1
94  2020-06-23                  Chaco            1
95  2020-06-25                  Chaco            1
96  2020-06-25                  Chaco            1
97  2020-06-26                  Chaco            1
98  2020-06-27                  Chaco            1
99  2020-06-27                  Chaco            1
100 2020-06-28                  Chaco            1
101 2020-06-29                  Chaco            1
102 2020-06-29                  Chaco            1
103 2020-07-02                  Chaco            1
104 2020-07-03                  Chaco            1
105 2020-07-03                  Chaco            1
106 2020-07-03                  Chaco            1
107 2020-07-04                  Chaco            1
108 2020-04-09              Río Negro            1
109 2020-04-13              Río Negro            1
110 2020-04-19              Río Negro            1
111 2020-04-20              Río Negro            1
112 2020-04-20              Río Negro            1
113 2020-04-25              Río Negro            1
114 2020-04-29              Río Negro            1
115 2020-05-03              Río Negro            1
116 2020-05-06              Río Negro            1
117 2020-05-10              Río Negro            1
118 2020-05-13              Río Negro            1
119 2020-05-16              Río Negro            1
120 2020-05-19              Río Negro            1
121 2020-05-21              Río Negro            1
122 2020-05-22              Río Negro            1
123 2020-05-22              Río Negro            1
124 2020-05-25              Río Negro            1
125 2020-06-03              Río Negro            1
126 2020-06-03              Río Negro            1
127 2020-06-06              Río Negro            1
128 2020-06-07              Río Negro            1
129 2020-06-08              Río Negro            1
130 2020-06-08              Río Negro            1
131 2020-06-10              Río Negro            1
132 2020-06-11              Río Negro            1
133 2020-06-12              Río Negro            1
134 2020-06-13              Río Negro            1
135 2020-06-15              Río Negro            1
136 2020-06-17              Río Negro            1
137 2020-06-17              Río Negro            1
138 2020-06-17              Río Negro            1
139 2020-06-18              Río Negro            1
140 2020-06-19              Río Negro            1
141 2020-06-20              Río Negro            1
142 2020-06-21              Río Negro            1
143 2020-06-21              Río Negro            1
144 2020-06-22              Río Negro            1
145 2020-06-22              Río Negro            1
146 2020-06-23              Río Negro            1
147 2020-06-24              Río Negro            1

dromano · July 5, 2020, 3:30pm

Thanks, @GusMono, and I'm happy to give you suggestions on how to post questions here. From your ideal output, @nirgrahamuk's observation points to the issue -- the function cumsum() returns a vector of the same length as the input:

a <- 1:4
a
#> [1] 1 2 3 4
cumsum(a)
#> [1]  1  3  6 10

whereas what you want is a single number (the last one produced by cumsum()):

a
#> [1] 1 2 3 4
sum(a)
#> [1] 10

To share the 'Source #1' table, if its name is, say, source1, then you would run the command

dput(source1)

and then copy and paste the output (from the console) into your post, but place it between a pair of triple backticks (```), like this:

```
# <-- place output of dput(source1) here
```

That allows others to copy and recreate your data on their own machines so they can help more easily. In your case, this is what you should get:

structure(list(fecha = structure(c(18346, 18346, 18397, 18397, 
18397, 18430, 18430, 18430, 18431, 18432, 18433, 18434, 18434
), class = "Date"), carga_provincia_nombre = c("Chaco", "Chaco", 
"Chaco", "Chaco", "Chaco", "Río Negro", "Río Negro", "Río Negro", 
"Río Negro", "Río Negro", "Río Negro", "Río Negro", "Río Negro"
), Total_deaths = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), fecha = structure(list(format = ""), class = c("collector_date", 
    "collector")), carga_provincia_nombre = structure(list(), class = c("collector_character", 
    "collector")), Total_deaths = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

If you copy and paste this into your own machine and execute, you'll see it recreates source1.
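As a rough sketch of that round trip in base R, the snippet below captures the text that dput() prints and evaluates it again. The object `small` is a hypothetical three-row stand-in, not the real data, and capturing the output with capture.output() is only there to simulate the copy and paste step.

```r
# Hypothetical small data frame standing in for the real one
small <- data.frame(
  fecha = as.Date(c("2020-03-25", "2020-03-25", "2020-05-15")),
  carga_provincia_nombre = c("Chaco", "Chaco", "Chaco"),
  Total_deaths = c(1, 1, 1)
)

# dput() prints R code that rebuilds the object; capture that code as text
code_text <- paste(capture.output(dput(small)), collapse = "\n")

# Evaluating the text recreates the object, just as pasting it at the console would
rebuilt <- eval(parse(text = code_text))

# Compare the original with the rebuilt copy
all.equal(small, rebuilt)
```

For large data frames, something like dput(head(small, 20)) keeps the pasted output short while still giving others real rows to work with.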

In full, you'd want your post to look like this, with all the commands you need included:

library(tidyverse)
source1 <- 
structure(list(fecha = structure(c(18346, 18346, 18397, 18397, 
18397, 18430, 18430, 18430, 18431, 18432, 18433, 18434, 18434
), class = "Date"), carga_provincia_nombre = c("Chaco", "Chaco", 
"Chaco", "Chaco", "Chaco", "Río Negro", "Río Negro", "Río Negro", 
"Río Negro", "Río Negro", "Río Negro", "Río Negro", "Río Negro"
), Total_deaths = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), fecha = structure(list(format = ""), class = c("collector_date", 
    "collector")), carga_provincia_nombre = structure(list(), class = c("collector_character", 
    "collector")), Total_deaths = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))
source1
#> # A tibble: 13 x 3
#>    fecha      carga_provincia_nombre Total_deaths
#>    <date>     <chr>                         <dbl>
#>  1 2020-03-25 Chaco                             1
#>  2 2020-03-25 Chaco                             1
#>  3 2020-05-15 Chaco                             1
#>  4 2020-05-15 Chaco                             1
#>  5 2020-05-15 Chaco                             1
#>  6 2020-06-17 Río Negro                         1
#>  7 2020-06-17 Río Negro                         1
#>  8 2020-06-17 Río Negro                         1
#>  9 2020-06-18 Río Negro                         1
#> 10 2020-06-19 Río Negro                         1
#> 11 2020-06-20 Río Negro                         1
#> 12 2020-06-21 Río Negro                         1
#> 13 2020-06-21 Río Negro                         1
source1 %>% 
  group_by(fecha, carga_provincia_nombre) %>% 
  summarise(cum_cases = sum(Total_deaths))
#> # A tibble: 7 x 3
#> # Groups:   fecha [7]
#>   fecha      carga_provincia_nombre cum_cases
#>   <date>     <chr>                      <dbl>
#> 1 2020-03-25 Chaco                          2
#> 2 2020-05-15 Chaco                          3
#> 3 2020-06-17 Río Negro                      3
#> 4 2020-06-18 Río Negro                      1
#> 5 2020-06-19 Río Negro                      1
#> 6 2020-06-20 Río Negro                      1
#> 7 2020-06-21 Río Negro                      2

Created on 2020-07-05 by the reprex package (v0.3.0)
I hope this helps!
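In case a running total across days within each province is also of interest (the thread title mentions cumsum, so this is only a guess at a possible follow-up), the two steps can be chained: first collapse to one row per province and day with sum(), then accumulate within each province with cumsum(). This is a sketch under that assumption; the data frame `deaths` and the column names `daily_deaths` and `cum_deaths` are hypothetical.

```r
library(dplyr)

# Hypothetical sample in the same shape as source1
deaths <- data.frame(
  fecha = as.Date(c("2020-06-17", "2020-06-17", "2020-06-18",
                    "2020-03-25", "2020-03-25", "2020-05-15")),
  carga_provincia_nombre = c("Río Negro", "Río Negro", "Río Negro",
                             "Chaco", "Chaco", "Chaco"),
  Total_deaths = c(1, 1, 1, 1, 1, 1)
)

# Step 1: one row per province and day
daily <- deaths %>%
  group_by(carga_provincia_nombre, fecha) %>%
  summarise(daily_deaths = sum(Total_deaths)) %>%
  ungroup()

# Step 2: running total over days, restarting for each province
running <- daily %>%
  arrange(carga_provincia_nombre, fecha) %>%
  group_by(carga_provincia_nombre) %>%
  mutate(cum_deaths = cumsum(daily_deaths)) %>%
  ungroup()

running$cum_deaths  # 2 3 2 3
```

The date is left out of the second group_by() on purpose: grouping by date as well would make every group a single row, and the running total would never grow.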

GusMono · July 11, 2020, 11:03pm

Thanks @dromano it worked perfectly well.

Thanks very much.
Gustavo

system · August 1, 2020, 11:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.