growth rate question with dplyr

Hi, I have a data table with countries, dates and cases. I'd like to create a growth rate column using mutate and lag, but one problem is that the first growth rate row for each country uses the last cases number of the previous country in the series. Also, the 0s divided by 0s should be blank.
How can I fix this?

mutate(growth = (cases / lag(cases) -1))


Use group_by(country)
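
To illustrate what grouping changes, here is a minimal sketch on a small made-up table. The column names country, date and cases and all the values are assumptions for illustration, not the poster's data. With group_by(country) in place, lag() restarts at the first row of each country, so that row gets NA instead of borrowing the previous country's last value.

```r
library(dplyr)

# Hypothetical toy data: two countries, three days each
toy <- tibble(
  country = rep(c("A", "B"), each = 3),
  date    = rep(as.Date("2020-03-01") + 0:2, times = 2),
  cases   = c(10, 20, 30, 100, 150, 300)
)

growth_tbl <- toy %>%
  group_by(country) %>%                    # lag() now works within each country
  arrange(date, .by_group = TRUE) %>%      # make sure rows are in date order
  mutate(growth = cases / lag(cases) - 1) %>%
  ungroup()

growth_tbl
```

Without the group_by() line, the first row of country B would be computed as 100 / 30 - 1, which is the problem described in the question.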

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

I have tried grouping by country before, it didn't work :frowning:

It should, so to be able to help you we are going to need a reprex. Please read the guide I gave you and try to make one.

ok I will try to do one sorry.
I think grouping by country worked. How do I remove Inf and NaN?
Inf is when cases go from 0 to 1. NaN is when they go from 0 to 0.
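
A small sketch of where those values come from, using a made-up vector of counts for a single country (the numbers are invented). One possible way to get the blanks asked for in the question is to keep only finite values and turn everything else into NA.

```r
library(dplyr)

cases <- c(0, 0, 1, 2)              # hypothetical counts for one country
growth <- cases / lag(cases) - 1
growth
# NA  (no previous row), NaN (0 / 0), Inf (1 / 0), 1 (2 / 1 - 1)

# Keep finite values only; NA, NaN and Inf all become NA ("blank")
growth_clean <- if_else(is.finite(growth), growth, NA_real_)
growth_clean
```

is.finite() is FALSE for NA, NaN, Inf and -Inf, so a single test covers all the cases.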


library(tidyverse)

df <- tibble(
  a = c(1, NA, Inf, 4),
  b = c(9, NA, Inf, 8)
)
# replace NA (and NaN) with 0, then replace Inf/-Inf with -9
df <- mutate_all(df, ~ ifelse(is.na(.), 0, .)) %>%
  mutate_all(~ ifelse(is.infinite(.), -9, .))

Cleaning the data did not work. Example below:

1 Country Code Date 0 Cases Growth
2 Afghanistan AFG 2020-01-23 0 0 NA
3 Afghanistan AFG 2020-01-24 0 0 NaN
4 Afghanistan AFG 2020-01-25 0 0 NaN
5 Afghanistan AFG 2020-01-26 0 0 NaN
6 Afghanistan AFG 2020-01-27 0 0 NaN
7 Afghanistan AFG 2020-01-28 0 0 NaN
8 Afghanistan AFG 2020-01-29 0 0 NaN
9 Afghanistan AFG 2020-01-30 0 0 NaN
10 Afghanistan AFG 2020-01-31 0 0 NaN
11 Afghanistan AFG 2020-02-01 0 0 NaN
12 Afghanistan AFG 2020-02-02 0 0 NaN
13 Afghanistan AFG 2020-02-03 0 1 Inf

Hi @hardwax, a good way to start putting a reprex together so folks can help, is to at least post your data, like this:

<--- paste output of dput(head(your_table, 20)) here, including ```

When you run dput(head(your_table, 20)) you should see the output in the console, usually in the lower left pane -- that's what you'll want to paste here.
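
As a rough sketch of why dput() output is useful to helpers: it prints R code that rebuilds the object. The data frame below is made up, and capturing the output with capture.output() is only a stand-in for copying it from the console by hand.

```r
# Hypothetical small table standing in for your_table
toy <- data.frame(
  country = c("A", "B"),
  cases   = c(1, 2),
  stringsAsFactors = FALSE
)

# dput() prints a structure(...) call; here it is captured as text
txt <- capture.output(dput(head(toy, 20)))

# Anyone who pastes that text into R gets the same table back
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))
all.equal(toy, rebuilt)
```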

ok great !

structure(list(country = c("Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan"), iso3c = c("AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"), date = structure(c(18283, 
18284, 18285, 18286, 18287, 18288, 18289, 18290, 18291, 18292, 
18293, 18294, 18295, 18296, 18297, 18298, 18299, 18300, 18301, 
18302), class = "Date"), confirmed = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deaths = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deathgrowth = c(NA, 
NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    country = "Afghanistan", .rows = list(1:20)), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

head output: no Inf in this part, but they exist in column deathgrowth

# A tibble: 20 x 6
# Groups:   country [1]
   country     iso3c date       confirmed deaths deathgrowth
   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
 1 Afghanistan AFG   2020-01-22         0      0          NA
 2 Afghanistan AFG   2020-01-23         0      0          NA
 3 Afghanistan AFG   2020-01-24         0      0         NaN
 4 Afghanistan AFG   2020-01-25         0      0         NaN
 5 Afghanistan AFG   2020-01-26         0      0         NaN
 6 Afghanistan AFG   2020-01-27         0      0         NaN
 7 Afghanistan AFG   2020-01-28         0      0         NaN
 8 Afghanistan AFG   2020-01-29         0      0         NaN
 9 Afghanistan AFG   2020-01-30         0      0         NaN
10 Afghanistan AFG   2020-01-31         0      0         NaN
11 Afghanistan AFG   2020-02-01         0      0         NaN
12 Afghanistan AFG   2020-02-02         0      0         NaN
13 Afghanistan AFG   2020-02-03         0      0         NaN
14 Afghanistan AFG   2020-02-04         0      0         NaN
15 Afghanistan AFG   2020-02-05         0      0         NaN
16 Afghanistan AFG   2020-02-06         0      0         NaN
17 Afghanistan AFG   2020-02-07         0      0         NaN
18 Afghanistan AFG   2020-02-08         0      0         NaN
19 Afghanistan AFG   2020-02-09         0      0         NaN
20 Afghanistan AFG   2020-02-10         0      0         NaN

Very helpful -- thanks, @hardwax: This helps folks understand your situation better. For example, it tells us you have non-numeric columns, so you can use a tweak of @nirgrahamuk's suggestion:

library(tidyverse)

your_table <- 
structure(list(country = c("Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan"), iso3c = c("AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"), date = structure(c(18283, 
18284, 18285, 18286, 18287, 18288, 18289, 18290, 18291, 18292, 
18293, 18294, 18295, 18296, 18297, 18298, 18299, 18300, 18301, 
18302), class = "Date"), confirmed = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deaths = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), deathgrowth = c(NA, 
NA, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    country = "Afghanistan", .rows = list(1:20)), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
### end of 'structure()' command
your_table %>% 
  # change NA's and NaN's to 0 for numeric columns
  mutate_if(is.numeric, ~ if_else(is.na(.), 0, .)) %>% 
  head()
#> `mutate_if()` ignored the following grouping variables:
#> Column `country`
#> # A tibble: 6 x 6
#> # Groups:   country [1]
#>   country     iso3c date       confirmed deaths deathgrowth
#>   <chr>       <chr> <date>         <dbl>  <dbl>       <dbl>
#> 1 Afghanistan AFG   2020-01-22         0      0           0
#> 2 Afghanistan AFG   2020-01-23         0      0           0
#> 3 Afghanistan AFG   2020-01-24         0      0           0
#> 4 Afghanistan AFG   2020-01-25         0      0           0
#> 5 Afghanistan AFG   2020-01-26         0      0           0
#> 6 Afghanistan AFG   2020-01-27         0      0           0

Created on 2020-03-25 by the reprex package (v0.3.0)
Also, notice my code includes the package I'm using as well as creates the table and uses it -- this makes it easy for folks to copy and paste when they're helping you, and is one step closer to the reprex @andresrcs asked for.
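
The is.na() test in the reply above catches NA and NaN but leaves Inf alone. A possible variant that also covers Inf is sketched below. It assumes dplyr 1.0.0 or later for across() and where(), which is newer than this thread, and it uses a made-up table rather than the poster's data.

```r
library(dplyr)

# Hypothetical data: one country whose deaths go 0, 0, 1, 2
toy <- tibble(
  country = rep("A", 4),
  deaths  = c(0, 0, 1, 2)
)

cleaned <- toy %>%
  group_by(country) %>%
  mutate(deathgrowth = deaths / lag(deaths) - 1) %>%   # NA, NaN, Inf, 1
  ungroup() %>%
  # replace anything that is not a finite number with 0 in numeric columns
  mutate(across(where(is.numeric), ~ if_else(is.finite(.x), .x, 0)))

cleaned
```

Swapping the 0 for NA_real_ would give blanks instead of zeros, which is closer to what the original question asked for.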

Perfect!! Thanks for the advice, I will try to improve next time.

