You are right: dplyr sometimes does not work properly in Renjin, and many of the packages available for Renjin are out of date. As for Rcpp, I really admire the effort that has gone into it. Still, I would like to see fast native R code rather than having to resort to a compiled language such as C/C++ or Fortran. As for data.table, I used to use it a lot but gradually moved to the Hadleyverse for better readability and consistency, although I still use the fread function from time to time.
Here is the piece of code I am working with:
require(tidyverse)
require(lubridate)
#find the institutional holding horizon of last 3 years (12 quarters or 36 months) for each quarter starting at 2000-03-31
hold_horizon <- function(quarter, data){
  result <- data %>%
    filter(qtrdate <= quarter, qtrdate >= quarter %m-% months(36), sharesheld != 0) %>%
    group_by(cusip, ownercode) %>%
    summarise(hold_period = n()) %>%
    ungroup() %>%
    group_by(cusip) %>%
    summarise(long = sum(hold_period >= 8), short = sum(hold_period < 8), total = n()) %>%
    mutate(long_percent = long / total, short_percent = short / total) %>%
    select(cusip, long_percent, short_percent) %>%
    mutate(date = quarter)
  return(result)
}
#create quarter range
my_quarters <- rep(ymd("2000-03-31"), 73)
for (i in 1:73){
  my_quarters[i] <- ymd("2000-03-31") %m+% months(3*(i-1))
}
#apply function
horizon_data <- list()
for (i in 1:73){
  horizon_data[[i]] <- hold_horizon(my_quarters[i], data = institution_investor)
}
#combine data frames in the list
horizon_data_new <- do.call("rbind", horizon_data) %>%
  select(cusip, date, long_percent, short_percent)
The data frame institution_investor has about 100 million rows and 4 columns. This block of code takes more than 30 minutes to run. I suspect that replacing the for loops with apply functions would improve things, but I am not sure.
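For reference, this is roughly the apply-style version I have in mind, as a minimal sketch run on a small made-up data set. The toy_investor data frame and its values are invented for illustration and are not my real data. It is only meant to show that lapply gives the same rows as the loop. I have not measured whether it is any faster on the full data, and since every call still filters the whole table, I would not assume so.

```r
library(dplyr)
library(lubridate)

# Same function as above, repeated so that this sketch runs on its own.
hold_horizon <- function(quarter, data){
  data %>%
    filter(qtrdate <= quarter, qtrdate >= quarter %m-% months(36), sharesheld != 0) %>%
    group_by(cusip, ownercode) %>%
    summarise(hold_period = n()) %>%
    ungroup() %>%
    group_by(cusip) %>%
    summarise(long = sum(hold_period >= 8), short = sum(hold_period < 8), total = n()) %>%
    mutate(long_percent = long / total, short_percent = short / total) %>%
    select(cusip, long_percent, short_percent) %>%
    mutate(date = quarter)
}

# Hypothetical toy data: 2 securities, 3 owners, 16 quarter ends.
quarter_ends <- ymd("2000-03-31") %m+% months(3 * (0:15))
toy_investor <- data.frame(
  cusip = rep(c("A", "B"), each = 48),
  ownercode = rep(rep(c("o1", "o2", "o3"), each = 16), times = 2),
  qtrdate = rep(quarter_ends, times = 6),
  stringsAsFactors = FALSE
) %>%
  # Owner o3 sells out after 2000-12-31; everyone else holds throughout.
  mutate(sharesheld = if_else(ownercode == "o3" & qtrdate > ymd("2000-12-31"), 0, 100))

# Loop version, as in my original code.
horizon_loop <- vector("list", length(quarter_ends))
for (i in seq_along(quarter_ends)) {
  horizon_loop[[i]] <- hold_horizon(quarter_ends[i], data = toy_investor)
}
horizon_loop <- bind_rows(horizon_loop)

# Apply version: lapply keeps each element a Date, so the function is unchanged.
horizon_apply <- bind_rows(lapply(quarter_ends, hold_horizon, data = toy_investor))

# Check that both approaches produce the same result.
print(identical(as.data.frame(horizon_loop), as.data.frame(horizon_apply)))
```

I used lapply rather than a plain `for (q in my_quarters)` loop over the dates themselves because iterating directly over a Date vector in a for loop drops the Date class, while lapply keeps it.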