macro momentum analysis (need help with r code)

polymath548 · January 17, 2020, 2:39pm

Hi, I have 2 data frames which are as follows

df_macro <- data.frame(
  us_gdp = runif(12),
  uk_gdp = c(NA,runif(11)),
  jp_gdp = c(NA,NA,runif(10)),
  us_inf = runif(12),
  uk_inf = c(NA,NA,NA,runif(9)),
  jp_inf = c(NA,runif(11)),
  us_monetary_policy = runif(12),
  uk_monetary_policy = runif(12),
  jp_monetary_policy = runif(12),
  row.names = c("Jan-78","Feb-78","Mar-78", "Apr-78", "May-78", "June-78", "July-78", "Aug-78", "Sep-78",
                  "Oct-78","Nov-78", "Dec-78"))

df_asset <- data.frame(
  us_equity = runif(12),
  uk_equity = runif(12),
  jp_equity = runif(12),
  us_bond = runif(12),
  uk_bond = runif(12),
  jp_bond = runif(12),
  row.names = c("Jan-78","Feb-78","Mar-78", "Apr-78", "May-78", "June-78", "July-78", "Aug-78", "Sep-78",
                "Oct-78","Nov-78", "Dec-78")
  )
equity_data <- df_asset[,stringr::str_detect(names(df_asset), 'equity')]
gdp_data <- df_macro[,stringr::str_detect(names(df_macro), 'gdp')]

Now, I want to take monthly change of the variables in df_macro and then rank them accordingly. Once I do this, I need to find out the top and bottom country for each variable i.e. gdp, inflation and monetary policy.

After doing this, I need to extract values from the df_asset class. Here first I subset equity data and then for each month based on the rank of the country, I need to extract the top and bottom ranked countries' equity data. This I have to do for all macro themes and asset classes. i.e. For GDP, I have to get data for equity and bond separately, for inflation this has to be done separately etc.

I started writing a function which takes macro_theme and asset_class data and then it gives me the output which is a matrix containing the top and bottom asset class values for each month. This will not have any column names as the top and bottom ranked countries in each month are different. This is where I am stuck, I am providing you the code below

signal <- function(macro_theme = NULL, asset_class = NULL){
  monthly_return <- diff(log(asset_class),1)*100
  #calculate annual change in macro variable
  macro_theme_pct <- zoo(matrix(ncol = ncol(macro_theme), nrow=nrow(macro_theme)), index(macro_theme))
  #macro_theme_pct <- as.data.frame(macro_theme_pct)
  colnames(macro_theme_pct) <- colnames(macro_theme)
  for (i in 1:ncol(macro_theme_pct)){
    macro_theme_pct[,i] <- Delt(macro_theme[,i], k=1, type="arithmetic")*100
  }
  macro_theme_pct <- na.omit(macro_theme_pct)
  
  #ranking countries basd on macro indicators
  rank_macro <- zoo(data.frame(macro_theme_pct, t(apply(-macro_theme_pct, 1, rank, ties.method='min'))), index(macro_theme_pct))
  rank_macro <- rank_macro[,ncol(macro_theme_pct)+1:ncol(rank_macro)]
  scaled_rank <- (rank_macro-apply(rank_macro, 1, mean))/apply(rank_macro, 1, sd)
  
  column_names_long <- zoo(matrix(ncol = 3, nrow=nrow(rank_macro)), index(rank_macro))
  
  for(i in 1:nrow(rank_macro)){
    for(j in 1:ncol(rank_macro)){
      if(rank_macro[i,j] == 1){
        column_names[i,1] <- colnames(rank_macro)[j]
      }else if(rank_macro[i,j] == 2){
        column_names[i,2] <- colnames(rank_macro)[j]
      } 
    }
  }
 
  st_date <- index(rank_macro)[1]
  end_date <- index(rank_macro)[nrow(rank_macro)]
  equity_data_r <- window(equity_data, index. = index(equity_data), st_date, end_date)
  
  equity_final <- zoo(matrix(ncol = 2, nrow=nrow(equity_data_r)), index(equity_data_r))
  
  for(i in nrow(equity_data_r)){
    for(j in ncol(equity_data_r)){
      ss <- substr(colnames(equity_data_r), 1, 3)
      tt <- substr(column_names_long[i,j],1,3)
      a <- ss %in% tt
      equity_final[i,j] = equity_data_r[i, a]
    }
  }
}

Once I have the column names of the top and bottom period, How should I go about getting the values from equity data.

If this is too confusing (which I belive is the case) then Please let me know I will try and explain more clearly.

Any help would be greatly appreciated. If you are too busy to write the code, please drop me a hint on how to proceed.
Thanks.

technocrat · January 18, 2020, 5:52am

The hang-up is what I call the message board equivalent of the zen of lazy evaluation. A reproducible example, called a reprex sets the table for diagnosis and suggestions much better than a sort-of meta-description.

You're almost there. It should look like

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(stringr)) 
df_macro <- data.frame(
  us_gdp = runif(12),
  uk_gdp = c(NA,runif(11)),
  jp_gdp = c(NA,NA,runif(10)),
  us_inf = runif(12),
  uk_inf = c(NA,NA,NA,runif(9)),
  jp_inf = c(NA,runif(11)),
  us_monetary_policy = runif(12),
  uk_monetary_policy = runif(12),
  jp_monetary_policy = runif(12),
  row.names = c("Jan-78","Feb-78","Mar-78", "Apr-78", "May-78", "June-78", "July-78", "Aug-78", "Sep-78",
                  "Oct-78","Nov-78", "Dec-78"))

df_asset <- data.frame(
  us_equity = runif(12),
  uk_equity = runif(12),
  jp_equity = runif(12),
  us_bond = runif(12),
  uk_bond = runif(12),
  jp_bond = runif(12),
  row.names = c("Jan-78","Feb-78","Mar-78", "Apr-78", "May-78", "June-78", "July-78", "Aug-78", "Sep-78",
                "Oct-78","Nov-78", "Dec-78")
  )
equity_data <- df_asset[,stringr::str_detect(names(df_asset), 'equity')]
gdp_data <- df_macro[,stringr::str_detect(names(df_asset), 'gdp')]
gdp_data
#> data frame with 0 columns and 12 rows

^{Created on 2020-01-17 by the reprex package (v0.3.0)}

What this shows is that stringi::detect() was the wrong tool.

So, back to you. After reviewing help(stringi), what is a better function/argument combination to produce a populated tibble?

polymath548 · January 18, 2020, 6:24am

Hi, thank you for your response. I am new to programming, so I am trying to understand what you said. From what I understand, I was referring to the wrong data frame while creating gdp_data subset from the main df which I have corrected below.

gdp_data <- df_macro[,stringr::str_detect(names(df_asset), 'gdp')]
gdp_data <- df_macro[,stringr::str_detect(names(df_macro), 'gdp')]

Please let me know if this is wrong. Thanks.

technocrat · January 18, 2020, 6:33am

Everyone in this community, from Hadley Wickham on down is new to some aspect of programming or R. A Jedi mind trick is to think of R as school algebra writ large, f(x) = y. Most of using R consists of finding the right f and x and understanding the returned output.

To your question. Both df_macro and df_asset are in your namespace (i.e., available for processing). The issue is that stringi::detect returns 1/0, TRUE/FALSE, YES/NO. Is that what you want?

polymath548 · January 18, 2020, 6:43am

Everyone in this community, from Hadley on down is new to some aspect of programming or R . A Jedi mind trick is to think of R as school algebra writ large, f(x)=y . Most of using R consists of finding the right f and x and understanding the returned output.

Thank you, I will keep this in mind

To your question. Both df_macro and df_asset are in your namespace ( i.e. , available for processing). The issue is that stringi::detect returns 1/0, TRUE/FALSE, YES/NO. Is that what you want?

Yes, since the following code which uses stringr::detect actually gives the subset which is required

gdp_data <- df_macro[,stringr::str_detect(names(df_macro), 'gdp')]

I am actually facing the issue in the next block of code which was given above.

technocrat · January 18, 2020, 7:18am

Questions are actually the hardest part of data science, including R.

What I see (now) is

> gdp_data
            us_gdp    uk_gdp       jp_gdp
Jan-78  0.97328264        NA           NA
Feb-78  0.09641292 0.3180335           NA
Mar-78  0.85725676 0.9046612 1.164706e-01
Apr-78  0.03054817 0.9494336 2.877551e-01
May-78  0.30909399 0.4673599 5.476176e-01
June-78 0.19031507 0.2318318 5.746913e-01
July-78 0.03468993 0.8957502 2.809416e-01
Aug-78  0.76092711 0.8079632 6.372455e-01
Sep-78  0.31168507 0.9902383 9.393398e-01
Oct-78  0.12019427 0.2587291 9.477744e-05
Nov-78  0.50595548 0.2994246 1.499910e-01
Dec-78  0.69014462 0.3529472 3.860645e-01

What do you want it to be?

polymath548 · January 18, 2020, 7:41am

Now, I want to take monthly change of the variables in df_macro and then rank them accordingly. Once I do this, I need to find out the top and bottom country for each variable i.e. gdp, inflation and monetary policy.

After doing this, I need to extract values from the df_asset class. Here first I subset equity data and then for each month based on the rank of the country, I need to extract the top and bottom-ranked countries' equity data. This I have to do for all macro themes and asset classes. i.e. For GDP, I have to get data for equity and bond separately, for inflation, this has to be done separately etc.

I started writing a function which takes macro_theme and asset_class data and then it gives me the output which is a matrix containing the top and bottom asset class values for each month. This will not have any column names as the top and bottom-ranked countries in each month are different. This is where I am stuck, I am providing you with the code below

signal <- function(macro_theme = NULL, asset_class = NULL){
  monthly_return <- diff(log(asset_class),1)*100
  #calculate annual change in macro variable
  macro_theme_pct <- zoo(matrix(ncol = ncol(macro_theme), nrow=nrow(macro_theme)), index(macro_theme))
  #macro_theme_pct <- as.data.frame(macro_theme_pct)
  colnames(macro_theme_pct) <- colnames(macro_theme)
  for (i in 1:ncol(macro_theme_pct)){
    macro_theme_pct[,i] <- Delt(macro_theme[,i], k=1, type="arithmetic")*100
  }
  macro_theme_pct <- na.omit(macro_theme_pct)

  #ranking countries basd on macro indicators
  rank_macro <- zoo(data.frame(macro_theme_pct, t(apply(-macro_theme_pct, 1, rank, ties.method='min'))), index(macro_theme_pct))
  rank_macro <- rank_macro[,ncol(macro_theme_pct)+1:ncol(rank_macro)]
  scaled_rank <- (rank_macro-apply(rank_macro, 1, mean))/apply(rank_macro, 1, sd)

  column_names_long <- zoo(matrix(ncol = 3, nrow=nrow(rank_macro)), index(rank_macro))

  for(i in 1:nrow(rank_macro)){
    for(j in 1:ncol(rank_macro)){
      if(rank_macro[i,j] == 1){
        column_names[i,1] <- colnames(rank_macro)[j]
      }else if(rank_macro[i,j] == 2){
        column_names[i,2] <- colnames(rank_macro)[j]
      } 
    }
  }

  st_date <- index(rank_macro)[1]
  end_date <- index(rank_macro)[nrow(rank_macro)]
  equity_data_r <- window(equity_data, index. = index(equity_data), st_date, end_date)

  equity_final <- zoo(matrix(ncol = 2, nrow=nrow(equity_data_r)), index(equity_data_r))

  for(i in nrow(equity_data_r)){
    for(j in ncol(equity_data_r)){
      ss <- substr(colnames(equity_data_r), 1, 3)
      tt <- substr(column_names_long[i,j],1,3)
      a <- ss %in% tt
      equity_final[i,j] = equity_data_r[i, a]
    }
  }
}

As you can see that I have the column names of the top and bottom period, After this How should I go about getting the values from equity data.

If this is too confusing (which I believe is the case) then please let me know I will try and explain more clearly.

Also, if you feel there is a better and efficient approach to solve this problem then please suggest the same.

technocrat · January 19, 2020, 7:47pm

Sorry that I missed this yesterday.

The (second, the first had no gdp variable) gdp_data has NAs that will produce more NAs with month/over month.

Let's see if I can gin up an example using the gapminder package and data sets to get longer series without NAs. DM me if you don't hear anything after a decent interval.

technocrat · January 20, 2020, 7:08am

OK, so here's my ginned up example with proxy numbers transformed from gapminder with lifeExp serving as a financial index, five-year dates used as months, and actual annual five-year point GDP divided by 60. The logic to calculate top_gdp can be reversed for bot_gdp.

While this seems tedious, the virtue is that it's quite clear what each step does. Coming back to stuff coded this way is so easy to follow that even this fool who wrote it easily can pick up where he left off.

Let me know if you have questions.

library(tidyverse)
library(gapminder)
usukjp <- gapminder %>% mutate(gdp = pop * gdpPercap) %>% select(year, country, lifeExp, gdp)  %>% filter(country == "United States" | country == "United Kingdom" | country == "Japan") %>%  mutate(country = as.character(country))
# simulate months
month <- enframe(c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec")) %>% select(value)
colnames(month) <- "month"
by_country <- function(x,y,z) {
    x <- y %>% filter(country == z)
    x <- cbind(month,x) %>% select(-year) %>% mutate(gdp = gdp/60) %>% mutate(delta_m = gdp/lag(gdp)) %>% rename(index = lifeExp)
}
us <- by_country("us", usukjp, "United States")
uk <- by_country("uk", usukjp, "United Kingdom")
jp <- by_country("jp", usukjp, "Japan")
jpuk <- left_join(jp,uk, by = "month")
jpukus <- left_join(jpuk, us, by = "month")
new_names <- c("month", "drop1", "index.jp", "gdp.jp","delta_m.jp","drop2","index.uk","gdp.uk","delta_m.uk", "drop3",  
"index.us", "gdp.us", "delta_m.us")
colnames(jpukus) <- new_names
jpukus <- jpukus %>% select(-drop1,-drop2,-drop3)
jpukus <- jpukus %>% mutate(top_gdp = ifelse(delta_m.jp > delta_m.uk & delta_m.jp > delta_m.us, "Japan", NA)) %>%
mutate(top_gdp = ifelse(delta_m.uk > delta_m.jp & delta_m.uk > delta_m.us, "United Kingdom", top_gdp)) %>% 
mutate(top_gdp = ifelse(delta_m.us > delta_m.uk & delta_m.us > delta_m.jp, "United States", top_gdp))
jpukus
#>    month index.jp      gdp.jp delta_m.jp index.uk      gdp.uk delta_m.uk
#> 1    jan   63.030  4635581820         NA   69.180  8387776883         NA
#> 2    feb   65.500  6589018133   1.421400   70.420  9671564033   1.153055
#> 3    mar   68.730 10504197884   1.594198   70.760 11082228674   1.145857
#> 4    apr   71.430 16548433897   1.575411   71.360 12954615701   1.168954
#> 5    may   73.420 26401876449   1.595431   72.010 14856370553   1.146801
#> 6    jun   75.380 31524411793   1.194022   72.760 16318827662   1.098440
#> 7    jul   77.110 38269062298   1.213950   74.040 17120156678   1.049105
#> 8    aug   78.670 45531806558   1.189781   75.007 20574911640   1.201795
#> 9    sep   79.360 55585326667   1.220802   76.420 21897680150   1.064290
#> 10   oct   80.690 60493935975   1.088308   77.218 25556632934   1.167093
#> 11   nov   82.000 60577775437   1.001386   78.471 29435975082   1.151794
#> 12   dec   82.603 67252246618   1.110180   79.425 33632821832   1.142575
#>    index.us       gdp.us delta_m.us       top_gdp
#> 1    68.440  36737373719         NA          <NA>
#> 2    69.490  42557805177   1.158434         Japan
#> 3    70.210  50281771374   1.181494         Japan
#> 4    70.760  64681966719   1.286390         Japan
#> 5    71.340  76283328661   1.179360         Japan
#> 6    73.380  88362207128   1.158342         Japan
#> 7    74.650  96781923184   1.095286         Japan
#> 8    75.020 120933764349   1.249549 United States
#> 9    76.090 137027070293   1.133075         Japan
#> 10   76.810 162689218315   1.187278 United States
#> 11   77.310 187454644635   1.152225 United States
#> 12   78.242 215574308918   1.150008 United States

^{Created on 2020-01-19 by the reprex package (v0.3.0)}

polymath548 · January 21, 2020, 5:12am

Thank you I actually had an issue while running the for loop to extract the equity values based on GDP ranking. This helps, solved the problem.

technocrat · January 21, 2020, 9:55am

Great. Please mark a solution for the benefit of those to follow

system · January 28, 2020, 9:55am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.