I have a dataset that looks like this:
surface_data <- structure(list(year = 1995:2015, surface_area = c(0.0791523008380127,
0.097225347899475, 0.0822670001840209, 0.0336522849847411, 0.117791995169983,
0.0461216963919067, 0.130880043279724, 0.141475758216248, 0.0529759044537353,
0.105950554455872, 0.109061354341431, 0.124643054853821, 0.125276420983276,
0.105325262088928, 0.147077049232788, 0.125895508190918, 0.0735345832772827,
0.102212300140076, 0.100343940362244, 0.0760356945687866, 0.350289531413268
)), class = "data.frame", row.names = c(NA, -21L))
head(surface_data)
#> year surface_area
#> 1 1995 0.07915230
#> 2 1996 0.09722535
#> 3 1997 0.08226700
#> 4 1998 0.03365228
#> 5 1999 0.11779200
#> 6 2000 0.04612170
I'd like to add a column using mutate()
that shows the change between rows based on a specified lag (not necessarily the previous row). The idea being:
\displaystyle \frac{(current value) - (lagged value)}{mean(current value, lagged value)}*100
What I've tried unsuccessfully is this:
library(tidyverse)
surface_data %>%
mutate(surface_change = ((surface_area -
lag(x = surface_area, n = 1, order_by = year)) /
mean(c(surface_area,
lag(x = surface_area, n = 1, order_by = year)),
na.rm = T)) * 100) %>%
head()
#> year surface_area surface_change
#> 1 1995 0.07915230 NA
#> 2 1996 0.09722535 17.21608
#> 3 1997 0.08226700 -14.24907
#> 4 1998 0.03365228 -46.30956
#> 5 1999 0.11779200 80.15008
#> 6 2000 0.04612170 -68.27193
The problem is that the denominator of the above equation is constant:
surface_data %>%
mutate(numerator = surface_area -
lag(x = surface_area, n = 1, order_by = year),
denominator = mean(c(surface_area,
lag(x = surface_area, n = 1, order_by = year)),
na.rm = T),
surface_change = (numerator / denominator) * 100)%>%
head()
#> year surface_area numerator denominator surface_change
#> 1 1995 0.07915230 NA 0.1049777 NA
#> 2 1996 0.09722535 0.01807305 0.1049777 17.21608
#> 3 1997 0.08226700 -0.01495835 0.1049777 -14.24907
#> 4 1998 0.03365228 -0.04861472 0.1049777 -46.30956
#> 5 1999 0.11779200 0.08413971 0.1049777 80.15008
#> 6 2000 0.04612170 -0.07167030 0.1049777 -68.27193
Created on 2019-03-19 by the reprex package (v0.2.1)
My questions: Why doesn't mean()
behave as intended here? What is the best/simplest way to get the output below? I would like to maintain the ability to specify how many rows back to look for the lagged value. Thanks!
surface_data %>%
mutate(surface_change = ((surface_area -
lag(x = surface_area, n = 1, order_by = year)) /
((surface_area +
lag(x = surface_area, n = 1, order_by = year)) / 2)) * 100) %>%
head()
#> year surface_area surface_change
#> 1 1995 0.07915230 NA
#> 2 1996 0.09722535 20.49358
#> 3 1997 0.08226700 -16.66739
#> 4 1998 0.03365228 -83.87684
#> 5 1999 0.11779200 111.11639
#> 6 2000 0.04612170 -87.44883