growth rate question with dplyr

Hi, I have a data table with countries, dates and cases. I'd like to create a growth rate column using mutate and lag, but I have two problems: the first growth rate row for each country uses the last cases number of the previous country in the series, and the rows where 0 is divided by 0 should be blank.
How can I fix this?

%>%
mutate(growth = (cases / lag(cases) -1))

Thanks

Use group_by(country)
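
A minimal sketch of what that could look like, assuming columns named country, date and cases as in the question (the toy data and the names cases_tbl and growth_tbl are made up for illustration). Within each group, lag() starts over, so the first row of every country gets NA instead of borrowing the previous country's last value:

```r
library(dplyr)

# Toy data (made up): two countries, three days each
cases_tbl <- tibble(
  country = rep(c("A", "B"), each = 3),
  date    = rep(as.Date("2020-01-22") + 0:2, times = 2),
  cases   = c(10, 20, 30, 1, 2, 4)
)

growth_tbl <- cases_tbl %>%
  group_by(country) %>%                      # lag() now restarts for each country
  arrange(date, .by_group = TRUE) %>%        # make sure rows are in date order within a country
  mutate(growth = cases / lag(cases) - 1) %>%
  ungroup()

growth_tbl
```

The ungroup() at the end is optional; it just avoids carrying the grouping into later steps.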

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

I tried grouping by country before, but it didn't work :frowning:

It should. To be able to help you, we are going to need a reprex, so please read the guide I gave you and try to make one.

OK, I will try to make one, sorry.
I think grouping by country worked. How do I remove Inf and NaN?
Inf appears when cases go from 0 to 1, and NaN when they go from 0 to 0.

thanks

library(tidyverse)
# Toy data containing NA and Inf
df <- tibble(
  a = c(1, NA, Inf, 4),
  b = c(9, NA, Inf, 8)
)
df
# Replace NA with 0, then Inf with -9, in every column
df <- mutate_all(df, ~ ifelse(is.na(.), 0, .)) %>%
  mutate_all(~ ifelse(is.infinite(.), -9, .))
df
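
If the goal is a blank rather than 0 or -9, one possible variant is to turn every non-finite value into NA in a single step. This is only a sketch on a made-up column (growth_demo and growth_clean are hypothetical names); it relies on is.finite() being FALSE for NA, NaN, Inf and -Inf alike:

```r
library(dplyr)

# Made-up growth column with every kind of problem value
growth_demo <- tibble(growth = c(NA, NaN, Inf, -Inf, 0.5))

growth_clean <- growth_demo %>%
  # keep finite numbers, set everything else to NA
  mutate(growth = if_else(is.finite(growth), growth, NA_real_))

growth_clean
```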

Cleaning the data did not work. Example below:

1 Country Code Date 0 Cases Growth
2 Afghanistan AFG 2020-01-23 0 0 NA
3 Afghanistan AFG 2020-01-24 0 0 NaN
4 Afghanistan AFG 2020-01-25 0 0 NaN
5 Afghanistan AFG 2020-01-26 0 0 NaN
6 Afghanistan AFG 2020-01-27 0 0 NaN
7 Afghanistan AFG 2020-01-28 0 0 NaN
8 Afghanistan AFG 2020-01-29 0 0 NaN
9 Afghanistan AFG 2020-01-30 0 0 NaN
10 Afghanistan AFG 2020-01-31 0 0 NaN
11 Afghanistan AFG 2020-02-01 0 0 NaN
12 Afghanistan AFG 2020-02-02 0 0 NaN
13 Afghanistan AFG 2020-02-03 0 1 Inf

Hi @hardwax, a good way to start putting a reprex together, so folks can help, is to at least post your data, like this:

```
<--- paste output of dput(head(your_table, 20)) here, including ```
```

When you run dput(head(your_table, 20)) you should see the output in the console, usually in the lower left pane -- that's what you'll want to paste here.

OK, great!

structure(list(country = c("Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan"), iso3c = c("AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"), date = structure(c(18283, 
18284, 18285, 18286, 18287, 18288, 18289, 18290, 18291, 18292, 
18293, 18294, 18295, 18296, 18297, 18298, 18299, 18300, 18301, 
18302), class = "Date"), confirmed = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deaths = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deathgrowth = c(NA, 
NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    country = "Afghanistan", .rows = list(1:20)), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

Output of head() follows. There is no Inf in this part, but Inf values do exist in the deathgrowth column.

# A tibble: 20 x 6
# Groups:   country [1]
   country     iso3c date       confirmed deaths deathgrowth
   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
 1 Afghanistan AFG   2020-01-22         0      0          NA
 2 Afghanistan AFG   2020-01-23         0      0          NA
 3 Afghanistan AFG   2020-01-24         0      0         NaN
 4 Afghanistan AFG   2020-01-25         0      0         NaN
 5 Afghanistan AFG   2020-01-26         0      0         NaN
 6 Afghanistan AFG   2020-01-27         0      0         NaN
 7 Afghanistan AFG   2020-01-28         0      0         NaN
 8 Afghanistan AFG   2020-01-29         0      0         NaN
 9 Afghanistan AFG   2020-01-30         0      0         NaN
10 Afghanistan AFG   2020-01-31         0      0         NaN
11 Afghanistan AFG   2020-02-01         0      0         NaN
12 Afghanistan AFG   2020-02-02         0      0         NaN
13 Afghanistan AFG   2020-02-03         0      0         NaN
14 Afghanistan AFG   2020-02-04         0      0         NaN
15 Afghanistan AFG   2020-02-05         0      0         NaN
16 Afghanistan AFG   2020-02-06         0      0         NaN
17 Afghanistan AFG   2020-02-07         0      0         NaN
18 Afghanistan AFG   2020-02-08         0      0         NaN
19 Afghanistan AFG   2020-02-09         0      0         NaN
20 Afghanistan AFG   2020-02-10         0      0         NaN

Very helpful -- thanks, @hardwax: This helps folks understand your situation better. For example, it tells us you have non-numeric columns, so you can use a tweak of @nirgrahamuk's suggestion:

library(tidyverse)
your_table <-
  structure(list(country = c("Afghanistan", "Afghanistan", "Afghanistan",
  "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
  "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
  "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
  "Afghanistan", "Afghanistan"), iso3c = c("AFG", "AFG", "AFG",
  "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
  "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"), date = structure(c(18283,
  18284, 18285, 18286, 18287, 18288, 18289, 18290, 18291, 18292,
  18293, 18294, 18295, 18296, 18297, 18298, 18299, 18300, 18301,
  18302), class = "Date"), confirmed = c(0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deaths = c(0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deathgrowth = c(NA,
  NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
  NaN, NaN, NaN, NaN, NaN, NaN)), class = c("grouped_df", "tbl_df",
  "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    country = "Afghanistan", .rows = list(1:20)), row.names = c(NA,
  -1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
### end of 'structure()' command
your_table %>% 
  # change NA's and NaN's to 0 for numeric columns
  mutate_if(is.numeric, ~ if_else(is.na(.), 0, .)) %>% 
  head()
#> `mutate_if()` ignored the following grouping variables:
#> Column `country`
#> # A tibble: 6 x 6
#> # Groups:   country [1]
#>   country     iso3c date       confirmed deaths deathgrowth
#>   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
#> 1 Afghanistan AFG   2020-01-22         0      0           0
#> 2 Afghanistan AFG   2020-01-23         0      0           0
#> 3 Afghanistan AFG   2020-01-24         0      0           0
#> 4 Afghanistan AFG   2020-01-25         0      0           0
#> 5 Afghanistan AFG   2020-01-26         0      0           0
#> 6 Afghanistan AFG   2020-01-27         0      0           0

Created on 2020-03-25 by the reprex package (v0.3.0)
Also, notice my code includes the package I'm using as well as creates the table and uses it -- this makes it easy for folks to copy and paste when they're helping you, and is one step closer to the reprex @andresrcs asked for.
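
If blanks are preferred over zeros, another option is to avoid creating NaN and Inf in the first place by guarding the division. The following is only a sketch on made-up data (deaths_tbl, deaths_growth and the helper column previous are hypothetical names, not part of the original table): wherever the previous value is 0 or missing, the growth is set to NA.

```r
library(dplyr)

# Made-up data: two countries, four days each
deaths_tbl <- tibble(
  country = rep(c("A", "B"), each = 4),
  deaths  = c(0, 0, 1, 2, 5, 10, 0, 3)
)

deaths_growth <- deaths_tbl %>%
  group_by(country) %>%
  mutate(
    previous    = lag(deaths),   # NA on the first row of each country
    # NA when the previous value is 0 (or missing), otherwise the growth rate
    deathgrowth = if_else(previous == 0, NA_real_, deaths / previous - 1)
  ) %>%
  ungroup() %>%                  # drop the grouping before any further column-wise steps
  select(-previous)

deaths_growth
```

Calling ungroup() before a step like mutate_if() should also make the "ignored the following grouping variables" message go away, since that message is triggered by the table still being grouped by country.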

Perfect!! Thanks for the advice, I will try to improve next time.
