So, I've got a dataset which has a value for each year, as shown below.
I want to calculate the growth rate for each year using the dplyr package, and then calculate the average growth rate at the end. Any idea how to go about it?
I have read that we need to use a for loop for this, but I am not sure how to go about it. Any help will be appreciated!
You can use the lag() / lead() functions in dplyr, which take the entry in the previous or next row of the dataset, so no for loop is needed!
Let's assume you stored your data in a dataframe called growth, with columns year and route. You can then do the following:
library(dplyr)

growth_rate <- growth %>%
  # first sort by year, so that lag() refers to the previous year
  arrange(year) %>%
  mutate(Diff_year = year - lag(year),       # difference in time (just in case there are gaps)
         Diff_growth = route - lag(route),   # difference in route between years
         # growth rate in percent per year, relative to the previous year's value
         Rate_percent = (Diff_growth / Diff_year) / lag(route) * 100)
This gives the following (I used some random data, as you didn't supply the data in a form we could use):
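As a self-contained sketch of the lag() approach, assume a hypothetical data frame growth with columns year and route. The values below are made up purely for illustration (note the deliberate gap between 2017 and 2019), and the rate is taken relative to the previous year's value:

```r
library(dplyr)

# Hypothetical data: these values are invented for illustration only
growth <- data.frame(
  year  = c(2015, 2016, 2017, 2019),
  route = c(100, 110, 121, 146.41)
)

growth_rate <- growth %>%
  arrange(year) %>%                        # make sure rows are in time order
  mutate(
    Diff_year    = year - lag(year),       # years elapsed since the previous row
    Diff_growth  = route - lag(route),     # absolute change since the previous row
    # percent change per year, relative to the previous value
    Rate_percent = (Diff_growth / Diff_year) / lag(route) * 100
  )

print(growth_rate)

# Arithmetic mean of the yearly rates; the first row is NA because it has no predecessor
avg_rate <- mean(growth_rate$Rate_percent, na.rm = TRUE)
print(avg_rate)
```

The first row has no previous year, so its rate is NA and has to be dropped with na.rm = TRUE when averaging. Dividing by Diff_year spreads the change over a gap linearly, which is only an approximation if the growth actually compounds.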
While the lag / lead approach will give you a good result, you can also consider a slightly more mathy approach.
Assuming your growth is exponential, you can fit the model y = a * (1 + r) ^ x, where r is the growth rate per unit of x, by nonlinear least squares using stats::nls().
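A minimal sketch of such a fit, assuming hypothetical data in which x counts years since the first observation and y is the value. The numbers are made up (roughly 10% growth with a little noise added, since nls() is not meant for zero-residual data), and the starting values are taken from a log-linear fit, which is one common choice rather than the only one:

```r
# Hypothetical data: about 10% growth per year with small deterministic noise
noise <- c(1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.02, 0.98, 1.01)
dat <- data.frame(x = 0:9)
dat$y <- 100 * 1.1^dat$x * noise

# Starting values from a linear fit on the log scale:
# log(y) = log(a) + x * log(1 + r)
init <- coef(lm(log(y) ~ x, data = dat))

fit <- nls(
  y ~ a * (1 + r)^x,
  data  = dat,
  start = list(a = exp(init[[1]]), r = exp(init[[2]]) - 1)
)

print(coef(fit))   # a is the estimated starting level, r the estimated growth rate
```

Using years since the start for x (rather than the calendar year itself) keeps a on the scale of the data, which tends to make the fit easier to start.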
Which approach is more appropriate depends on your application. When calculating the average, bear in mind that you are averaging rates, so the geometric mean might be more appropriate than the arithmetic mean.
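To illustrate the difference between the two means, here is a small sketch on made-up values, assuming equally spaced years. The geometric mean is taken over the growth factors (this year's value divided by last year's), not over the percentages directly:

```r
# Hypothetical yearly values, equally spaced in time (invented for illustration)
route <- c(100, 120, 108, 135)

# Growth factor for each year: current value / previous value
factors <- route[-1] / route[-length(route)]

# Geometric mean of the factors, expressed as an average rate
geo_mean_rate <- exp(mean(log(factors))) - 1

# Arithmetic mean of the yearly rates, for comparison
arith_mean_rate <- mean(factors) - 1

print(c(geometric = geo_mean_rate, arithmetic = arith_mean_rate))
```

The geometric mean rate is the constant yearly rate that would take the first value to the last one over the same number of years, so it equals (last / first)^(1 / number of steps) - 1. The arithmetic mean is never smaller than the geometric one and overstates compound growth when the yearly rates vary.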
Any idea how I can avoid the complex numbers in the output below?
This is the code that I have written to get the necessary output. It seems to work fine for all age groups in the level_2 column other than this one. (There were more age groups, but since RStudio prohibits the sharing of csv files, I have to show you this screenshot.)