Hi,
I am having trouble forming a cumulative backward rolling window as part of my project. I am using around 50 search terms to investigate their effect on market returns. I regress each term on 'rmrf', the Fama-French market excess return factor, to see whether the term has a positive or negative relationship with the market between the start of 2004 and 2022. I only want to keep the terms with a negative relationship and then form an index, 'UKIS', which is the average of those terms' observations on day t. Because terms are searched at different rates and in different periods over these 18 years, I want to use rolling regressions: regress all my terms on 'rmrf' every 6 months (Jan-Jun, then Jul-Dec), see which ones had a negative relationship with the market within that period, and use only their observations in my UKIS index.
For reference, I am following a method from a published paper, which describes it as follows: "For each of these 118 terms, we compute winsorized, deseasonalized and standardized daily changes in log SVI as described in the paper. We then pick the terms for our FEARS index using a cumulative backward rolling window as follows. We start with the first six months (January to June) in 2004. For each search term, we regress the adjusted daily changes in log SVIs on the contemporaneous market excess returns and keep the t-value associated with the regression slope coefficient. We sort the t-values across terms and pick the 30 terms with the most negative t-values. So there is no look-ahead bias, we then use these “Top 30” terms as our FEARS index for the following 6-months (July 2004 – December 2004). We cumulate and continue in this fashion: the 30 most negative terms during the period January 2004 – December 2004 are used for the FEARS index for the period January 2005 – June 2005, the 30 most negative terms during the period January 2004 – June 2005 are used for the FEARS index for the period July 2005 – December 2005, and so on."
Their 'FEARS' index is the same as my 'UKIS' index.
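To make the procedure concrete, here is a minimal sketch of the logic as I understand it, written in Python purely to pin down the steps (I would still like to end up with R code). Everything named here is made up for illustration: the functions `slope_t_value` and `build_index`, the `top_n` parameter, and the toy series "fear", "boom" and "noise". The t-value uses plain OLS standard errors, which is my assumption since the quoted passage does not say which standard errors are used, and I additionally drop terms whose t-value is not negative.

```python
import math
import random
from datetime import date, timedelta


def slope_t_value(x, y):
    """t-value of the slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # plain OLS standard error (assumption)
    return beta / std_err


def build_index(dates, rmrf, terms, top_n, first_year, last_year):
    """Cumulative backward window: estimate on all data before each half-year,
    then average the selected terms over that half-year."""
    starts = [date(y, m, 1) for y in range(first_year, last_year + 1) for m in (1, 7)]
    starts.append(date(last_year + 1, 1, 1))  # end of the final half-year
    index, picks = {}, {}
    for k in range(1, len(starts) - 1):
        use_from, use_to = starts[k], starts[k + 1]
        # Estimation sample always begins at the first date and grows by six months.
        est = [i for i, d in enumerate(dates) if starts[0] <= d < use_from]
        x = [rmrf[i] for i in est]
        t_values = {name: slope_t_value(x, [series[i] for i in est])
                    for name, series in terms.items()}
        ranked = sorted(t_values, key=t_values.get)  # most negative first
        chosen = [name for name in ranked if t_values[name] < 0][:top_n]
        picks[use_from] = chosen
        if not chosen:
            continue
        # Index on day t = average of the chosen terms' observations on day t.
        for i, d in enumerate(dates):
            if use_from <= d < use_to:
                index[d] = sum(terms[name][i] for name in chosen) / len(chosen)
    return index, picks


# Toy data (hypothetical): daily observations from 2004-01-01 to 2005-06-30.
random.seed(0)
dates = [date(2004, 1, 1) + timedelta(days=i) for i in range(547)]
rmrf = [random.gauss(0, 1) for _ in dates]
terms = {
    "fear": [-2 * r + random.gauss(0, 0.5) for r in rmrf],  # moves against the market
    "boom": [2 * r + random.gauss(0, 0.5) for r in rmrf],   # moves with the market
    "noise": [random.gauss(0, 1) for _ in rmrf],            # unrelated to the market
}
index, picks = build_index(dates, rmrf, terms, top_n=1, first_year=2004, last_year=2005)
print(picks[date(2004, 7, 1)])
```

In my case `terms` would hold the roughly 50 adjusted search-term series and `top_n` the number of terms to keep. The first six months get no index values because they are only used for estimation.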
I have been using Stata so far without success. Is there a way to do this in R?