growth rate question with dplyr

hardwax · March 25, 2020, 2:22pm

Hi, I have a data table with countries, dates and cases. I'd like to create a growth rate column using mutate and lag, but one problem I'm running into is that the first growth rate row for a country uses the last cases number of the previous country in the series. Also, the 0s divided by 0s should be blank.
How can I fix this?

%>%
mutate(growth = (cases / lag(cases) -1))

Thanks

andresrcs · March 25, 2020, 2:34pm

Use group_by(country)

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.
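
To illustrate the `group_by(country)` suggestion, here is a minimal sketch on made-up data (the `cases_tbl` table and its values are hypothetical, not the poster's data). Grouping before `lag()` should make each country's first row `NA` instead of borrowing the last value of the previous country:

```r
library(dplyr)

# Hypothetical toy data: two countries, three days each
cases_tbl <- tibble(
  country = rep(c("A", "B"), each = 3),
  date    = rep(as.Date("2020-03-01") + 0:2, times = 2),
  cases   = c(10, 20, 30, 5, 10, 40)
)

growth_tbl <- cases_tbl %>%
  group_by(country) %>%                     # lag() now restarts for each country
  arrange(date, .by_group = TRUE) %>%       # make sure rows are in date order within a country
  mutate(growth = cases / lag(cases) - 1) %>%
  ungroup()

growth_tbl
```

Without the `group_by()` step, the first row of country B would be compared against the last row of country A.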

hardwax · March 25, 2020, 2:36pm

I have tried grouping by country before, it didn't work

andresrcs · March 25, 2020, 2:37pm

It should, so to be able to help you we are going to need a reprex. Please read the guide I gave you and try to make one.

hardwax · March 25, 2020, 2:46pm

OK, I will try to do one, sorry.
I think grouping by country worked. How do I remove Inf and NaN?
Inf is what I get when cases go from 0 to 1. NaN is when they go from 0 to 0.

thanks
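
For reference, a small base R sketch of where those two values come from (the `prev` and `curr` vectors are made up for illustration):

```r
# Hypothetical previous-day and current-day case counts
prev <- c(0, 0, 1, 2)
curr <- c(0, 1, 2, 3)

growth <- curr / prev - 1
growth

# 0 / 0 gives NaN, and a positive number divided by 0 gives Inf
is.nan(growth)
is.infinite(growth)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so it flags all of them at once
is.finite(growth)
```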

nirgrahamuk · March 25, 2020, 2:56pm

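library(tidyverse)

# Toy data with a missing value and an infinite value in each column
df <- tibble(
  a = c(1, NA, Inf, 4),
  b = c(9, NA, Inf, 8)
)
df

# Replace NA with 0 (is.na() is also TRUE for NaN), then replace Inf/-Inf with -9
df <- mutate_all(df, ~ ifelse(is.na(.), 0, .)) %>%
  mutate_all(~ ifelse(is.infinite(.), -9, .))
df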
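
If the goal is a blank (`NA`) rather than a placeholder number, one possible variation is to keep only finite values. This is a sketch under assumptions: the `growth_tbl` table and its `growth` column are made up for illustration.

```r
library(dplyr)

# Hypothetical growth column containing NA, NaN and Inf
growth_tbl <- tibble(
  cases  = c(0, 0, 1, 2),
  growth = c(NA, NaN, Inf, 1)
)

cleaned <- growth_tbl %>%
  # keep finite values, turn NA / NaN / Inf / -Inf into NA
  mutate(growth = if_else(is.finite(growth), growth, NA_real_))

cleaned
```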

hardwax · March 25, 2020, 3:38pm

cleaning the data did not work. Example below

1	Country	Code	Date	Cases	Growth
2	Afghanistan	AFG	2020-01-23	0	NA
3	Afghanistan	AFG	2020-01-24	0	NaN
4	Afghanistan	AFG	2020-01-25	0	NaN
5	Afghanistan	AFG	2020-01-26	0	NaN
6	Afghanistan	AFG	2020-01-27	0	NaN
7	Afghanistan	AFG	2020-01-28	0	NaN
8	Afghanistan	AFG	2020-01-29	0	NaN
9	Afghanistan	AFG	2020-01-30	0	NaN
10	Afghanistan	AFG	2020-01-31	0	NaN
11	Afghanistan	AFG	2020-02-01	0	NaN
12	Afghanistan	AFG	2020-02-02	0	NaN
13	Afghanistan	AFG	2020-02-03	1	Inf

dromano · March 25, 2020, 3:44pm

Hi @hardwax, a good way to start putting a reprex together, so folks can help, is to at least post your data, like this:

```
<--- paste output of dput(head(your_table, 20)) here, including ```
```

When you run dput(head(your_table, 20)) you should see the output in the console, usually in the lower left pane -- that's what you'll want to paste here.
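
As a rough illustration of why `dput()` output is handy, here is a sketch with a made-up two-row data frame (`small_tbl` is hypothetical). The printed text is itself R code, so someone else can paste it to rebuild the same object:

```r
# Hypothetical small table standing in for your_table
small_tbl <- data.frame(
  country = c("Afghanistan", "Afghanistan"),
  cases   = c(0, 1),
  stringsAsFactors = FALSE
)

# Prints a structure(...) call to the console; this is the text to paste into a post
dput(head(small_tbl, 20))

# Round trip: capture that text and evaluate it to rebuild the table
dput_text <- paste(capture.output(dput(small_tbl)), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))
identical(rebuilt, small_tbl)
```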

hardwax · March 25, 2020, 4:07pm

ok great !

structure(list(country = c("Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan"), iso3c = c("AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"), date = structure(c(18283, 
18284, 18285, 18286, 18287, 18288, 18289, 18290, 18291, 18292, 
18293, 18294, 18295, 18296, 18297, 18298, 18299, 18300, 18301, 
18302), class = "Date"), confirmed = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deaths = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deathgrowth = c(NA, 
NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    country = "Afghanistan", .rows = list(1:20)), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

hardwax · March 25, 2020, 4:09pm

head output .... no Inf in this part but they exist in column deathgrowth

# A tibble: 20 x 6
# Groups:   country [1]
   country     iso3c date       confirmed deaths deathgrowth
   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
 1 Afghanistan AFG   2020-01-22         0      0          NA
 2 Afghanistan AFG   2020-01-23         0      0          NA
 3 Afghanistan AFG   2020-01-24         0      0         NaN
 4 Afghanistan AFG   2020-01-25         0      0         NaN
 5 Afghanistan AFG   2020-01-26         0      0         NaN
 6 Afghanistan AFG   2020-01-27         0      0         NaN
 7 Afghanistan AFG   2020-01-28         0      0         NaN
 8 Afghanistan AFG   2020-01-29         0      0         NaN
 9 Afghanistan AFG   2020-01-30         0      0         NaN
10 Afghanistan AFG   2020-01-31         0      0         NaN
11 Afghanistan AFG   2020-02-01         0      0         NaN
12 Afghanistan AFG   2020-02-02         0      0         NaN
13 Afghanistan AFG   2020-02-03         0      0         NaN
14 Afghanistan AFG   2020-02-04         0      0         NaN
15 Afghanistan AFG   2020-02-05         0      0         NaN
16 Afghanistan AFG   2020-02-06         0      0         NaN
17 Afghanistan AFG   2020-02-07         0      0         NaN
18 Afghanistan AFG   2020-02-08         0      0         NaN
19 Afghanistan AFG   2020-02-09         0      0         NaN
20 Afghanistan AFG   2020-02-10         0      0         NaN

dromano · March 25, 2020, 4:19pm

Very helpful -- thanks, @hardwax: This helps folks understand your situation better. For example, it tells us you have non-numeric columns, so you can use a tweak of @nirgrahamuk's suggestion:

library(tidyverse)
your_table <-
  structure(list(
    country = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan"),
    iso3c = c("AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
              "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"),
    date = structure(c(18283, 18284, 18285, 18286, 18287, 18288, 18289, 18290,
                       18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298,
                       18299, 18300, 18301, 18302), class = "Date"),
    confirmed = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    deaths = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    deathgrowth = c(NA, NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
                    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)),
    class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -20L),
    groups = structure(list(country = "Afghanistan", .rows = list(1:20)),
                       row.names = c(NA, -1L),
                       class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
### end of 'structure()' command
your_table %>% 
  # change NA's and NaN's to 0 for numeric columns
  mutate_if(is.numeric, ~ if_else(is.na(.), 0, .)) %>% 
  head()
#> `mutate_if()` ignored the following grouping variables:
#> Column `country`
#> # A tibble: 6 x 6
#> # Groups:   country [1]
#>   country     iso3c date       confirmed deaths deathgrowth
#>   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
#> 1 Afghanistan AFG   2020-01-22         0      0           0
#> 2 Afghanistan AFG   2020-01-23         0      0           0
#> 3 Afghanistan AFG   2020-01-24         0      0           0
#> 4 Afghanistan AFG   2020-01-25         0      0           0
#> 5 Afghanistan AFG   2020-01-26         0      0           0
#> 6 Afghanistan AFG   2020-01-27         0      0           0

Created on 2020-03-25 by the reprex package (v0.3.0)

Also, notice that my code loads the package I'm using, creates the table, and then uses it -- this makes it easy for folks to copy and paste when they're helping you, and is one step closer to the reprex @andresrcs asked for.
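
As a side note on the `mutate_if()` call above: in dplyr 1.0.0 and later (an assumption about your installed version, since that release came out after this thread), the same idea can be sketched with `across()`. The `toy` table here is made up for illustration:

```r
library(dplyr)

# Hypothetical grouped table with NA and NaN in a numeric column
toy <- tibble(
  country     = c("A", "A", "B"),
  deaths      = c(0, 0, 1),
  deathgrowth = c(NA, NaN, 0.5)
) %>%
  group_by(country)

filled <- toy %>%
  # across() skips grouping columns; where(is.numeric) picks the numeric ones
  mutate(across(where(is.numeric), ~ if_else(is.na(.x), 0, .x))) %>%
  ungroup()

filled
```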

hardwax · March 25, 2020, 4:36pm

Perfect!! Thanks for the advice, will try to improve next time.
