Individual return calculation for two or more years


#1

hello dear R Community
i'm learning R and cannot find the solution for the following problem.

see the following Dataframe:

year <- c(2008:2017)
SP500Prices <- c(903, 1115, 1257, 1258, 1426, 1848, 2058, 2043, 2238, 2673)
dfAnalyse <- data.frame(year,SP500Prices)

as you see i have the data from the S&P500 Index and i want to calculate the yearly returnes as followed

dfAnalyse$yearReturn <- c(NA,diff(log(SP500Prices)))
dfAnalyse$yearReturn <- round(dfAnalyse$yearReturn, 2)

that works very well, but now i want a new column with the two year return (and also for the three year return), do you have any idea how to generate these two new columns?

the result for the 2 year return should be the difference from the year 2008 to 2010, then from the year 2009 until 2011 and so on...

i guess for a pro like you it's pretty easy to get this, but i'm a beginner and hope someone can help me because i cannot find this solution in the forum or on google.. thank you very much


#2

To add returns for each time period individually, you can use the lag argument to diff. For example, the code below adds 2-year returns.

library(tidyverse)
library(zoo)

# I've added a couple more years
year = c(2008:2019)
SP500Prices = c(903, 1115, 1257, 1258, 1426, 1848, 2058, 2043, 2238, 2673, 2770, 2880)
dfAnalyse <- data.frame(year,SP500Prices)

# Using base diff function
dfAnalyse$return2 = c(NA, NA, diff(log(dfAnalyse$SP500Prices), lag=2))

# diff pads with NA automatically when run on a zoo object (the zoo package is for working 
#  with time series and running the zoo() function turns SP500Prices into a zoo object), 
#  so convert to zoo before running diff
dfAnalyse$return2 = diff(log(zoo(dfAnalyse$SP500Prices)), lag=2, na.pad=TRUE)

The R help can sometimes be cryptic for new R users, but it's good to get into the habit of checking the help to see if a function has options that do what you want. For example, typing ?diff brings up the help for diff. Scrolling down shows that there's a lag argument that has a default value of 1. Setting it to 2 gives the difference between the current value and the value two rows above.

It's a pain to add many columns individually. Instead, you can add all of the return time periods at once by iterating over each of the return periods using map from the purrr package (which is part of the tidyverse suite of packages). bind_cols adds all of these new return columns to the original data frame. (I think the code could probably be streamlined further with nesting, but I'm not sure how to make that work yet.)

# Use map function to do all lags at once
dfAnalyse = dfAnalyse %>% 
  bind_cols(
    map(1:11, function(i) {

     # Generate the name of the new column as a string
     # Use str_pad() from the stringr package (also part of tidyverse) to pad single-digit years with a left zero
      col_name = paste0("return", str_pad(i, width=2, side="left", pad="0"))

     # Calculate return for the given time period
      dfAnalyse %>% 
        mutate(!!col_name := diff(zoo(log(SP500Prices)), lag=i, na.pad=TRUE)) %>% 
        select(matches("return"))
    })
  )
dfAnalyse

   year SP500Prices      return01   return02  return03  return04  return05  return06  return07  return08  return09  return10 return11
1  2008         903            NA         NA        NA        NA        NA        NA        NA        NA        NA        NA       NA
2  2009        1115  0.2108871305         NA        NA        NA        NA        NA        NA        NA        NA        NA       NA
3  2010        1257  0.1198735247 0.33076066        NA        NA        NA        NA        NA        NA        NA        NA       NA
4  2011        1258  0.0007952287 0.12066875 0.3315559        NA        NA        NA        NA        NA        NA        NA       NA
5  2012        1426  0.1253501637 0.12614539 0.2460189 0.4569060        NA        NA        NA        NA        NA        NA       NA
6  2013        1848  0.2592306512 0.38458081 0.3853760 0.5052496 0.7161367        NA        NA        NA        NA        NA       NA
7  2014        2058  0.1076306642 0.36686132 0.4922115 0.4930067 0.6128802 0.8237674        NA        NA        NA        NA       NA
8  2015        2043 -0.0073153216 0.10031534 0.3595460 0.4848962 0.4856914 0.6055649 0.8164520        NA        NA        NA       NA
9  2016        2238  0.0911632941 0.08384797 0.1914786 0.4507093 0.5760595 0.5768547 0.6967282 0.9076153        NA        NA       NA
10 2017        2673  0.1776188273 0.26878212 0.2614668 0.3690975 0.6283281 0.7536783 0.7544735 0.8743470 1.0852342        NA       NA
11 2018        2770  0.0356458830 0.21326471 0.3044280 0.2971127 0.4047433 0.6639740 0.7893242 0.7901194 0.9099929 1.1208800       NA
12 2019        2880  0.0389429739 0.07458886 0.2522077 0.3433710 0.3360557 0.4436863 0.7029170 0.8282671 0.8290624 0.9489359 1.159823

#3

thank you very much joel for your comprehensive and quick response, that helped me a lot and made my day!

i see the zoo library is very good in this situation. no more questions, with the library's and the time series you also answered possible questions in the future :wink: happy day