Percent Changes in R

I made this data frame in R:

library(dplyr)

year = c("2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020")

var = rnorm(11,100, 11)

df = data.frame(year, var)

I would like to create a new variable that looks in the percent change (in "var") between each pair of consecutive rows. I tried the following code

df %>%
  arrange(year) %>%
  group_by(year) %>%
  mutate(pct_change = (lag(var) - var)/var * 100)

This keeps giving me an NA result:

# A tibble: 11 x 3
# Groups:   year [11]
   year    var pct_change
   <chr> <dbl>      <dbl>
 1 2010  111.          NA
 2 2011   99.6         NA
 3 2012  102.          NA
 4 2013   87.2         NA
 5 2014   99.3         NA
 6 2015  103.          NA
 7 2016  110.          NA
 8 2017  109.          NA
 9 2018   79.8         NA
10 2019  113.          NA
11 2020  123.          NA

Could someone please show me how to resolve this problem?

E.g. (99.6 - 11/ 111)*100 , (102 - 99.6/99.6)*100, etc.

Thank you!

Hey,

you can either use data.table::shift() or, easier, just collapse::fgrowth() (and I think there is a tidyverse equivalent as well, but those two just came two my mind).

library(collapse)

df |> fmutate(growth = fgrowth(var))
#>    year       var     growth
#> 1  2010 100.12044         NA
#> 2  2011  88.30013 -11.806085
#> 3  2012  99.18907  12.331730
#> 4  2013  87.75803 -11.524495
#> 5  2014  93.92577   7.028123
#> 6  2015  91.69077  -2.379541
#> 7  2016  98.62826   7.566185
#> 8  2017  95.55181  -3.119235
#> 9  2018 100.72837   5.417541
#> 10 2019  99.00040  -1.715478
#> 11 2020 101.40930   2.433226

library(data.table)
df |> fmutate(growth = (var - shift(var))/shift(var) * 100)
#>    year       var     growth
#> 1  2010 100.12044         NA
#> 2  2011  88.30013 -11.806085
#> 3  2012  99.18907  12.331730
#> 4  2013  87.75803 -11.524495
#> 5  2014  93.92577   7.028123
#> 6  2015  91.69077  -2.379541
#> 7  2016  98.62826   7.566185
#> 8  2017  95.55181  -3.119235
#> 9  2018 100.72837   5.417541
#> 10 2019  99.00040  -1.715478
#> 11 2020 101.40930   2.433226

library(dplyr)

df |> mutate(growth = (var - dplyr::lag(var))/dplyr::lag(var) * 100)
#>    year       var     growth
#> 1  2010 100.12044         NA
#> 2  2011  88.30013 -11.806085
#> 3  2012  99.18907  12.331730
#> 4  2013  87.75803 -11.524495
#> 5  2014  93.92577   7.028123
#> 6  2015  91.69077  -2.379541
#> 7  2016  98.62826   7.566185
#> 8  2017  95.55181  -3.119235
#> 9  2018 100.72837   5.417541
#> 10 2019  99.00040  -1.715478
#> 11 2020 101.40930   2.433226

Created on 2022-08-24 by the reprex package (v2.0.1)

Kind regards

Edit:

Looking at your code, if you only have one entry per year, your main problem is the grouping (since if you group the df, dplyr will try to look inbetween years to compute the growth rate, e.g. the percentage change between val1 and val2 in one year. Added the dplyr solution above.

If you group by year but there is only one value for each year then there are no lagged values in the groups. Just drop the group_by(). The first value is NA because there is no lagged value for the first year. Given that the years are already in chronological order, there is also no need to arrange them.

I also changed the percentage calculation to (new - old/old ) from (old - new/new).

library(dplyr)

year = c("2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020")
var = rnorm(11,100, 11)
df = data.frame(year, var)

df %>%
  arrange(year) %>%
  mutate(pct_change = (var - lag(var))/lag(var) * 100)
#>    year       var pct_change
#> 1  2010  98.71342         NA
#> 2  2011 113.27372  14.750068
#> 3  2012 115.20882   1.708344
#> 4  2013  99.20715 -13.889272
#> 5  2014 100.50832   1.311566
#> 6  2015  95.05203  -5.428698
#> 7  2016 103.34003   8.719436
#> 8  2017 119.17665  15.324772
#> 9  2018  99.58940 -16.435476
#> 10 2019 104.67725   5.108821
#> 11 2020 108.33521   3.494514

Created on 2022-08-23 with reprex v2.0.2

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.