Attempting to produce a cumulative backward rolling window


I am having trouble forming a cumulative backward rolling window as part of my project. I am using around 50 search terms to investigate their effect on market returns. I regress each word on the Fama-French 'rmrf' factor to see whether the word has a positive or negative relationship with the market from the start of 2004 to 2022. I only want to use the words with a negative relationship, and then form an index, 'UKIS', which is the average of those words' observations on day t. Because words are searched at different rates and in different periods within these 18 years, I want to use rolling regressions: regress all my words on 'rmrf' every 6 months (Jan-Jun, then Jul-Dec), see which ones had a negative relationship with the market within that period, and use only their observations in my UKIS index.

In case you are struggling to follow, I am using a method from a published paper that explains it as follows: "For each of these 118 terms, we compute winsorized, deseasonalized and standardized daily changes in log SVI as described in the paper. We then pick the terms for our FEARS index using a cumulative backward rolling window as follows. We start with the first six months (January to June) in 2004. For each search term, we regress the adjusted daily changes in log SVIs on the contemporaneous market excess returns and keep the t‐value associated with the regression slope coefficient. We sort the t‐values across terms and pick the 30 terms with the most negative t‐values. So there is no look‐ahead bias, we then use these “Top 30” terms as our FEARS index for the following 6‐months (July 2004 – December 2004). We cumulate and continue in this fashion: the 30 most negative terms during the period January 2004 – December 2004 are used for the FEARS index for the period January 2005 – June 2005, the 30 most negative terms during the period January 2004 – June 2005 are used for the FEARS index for the period July 2005 – December 2005, and so on."

Their 'FEARS' index is the same as my 'UKIS' index.

I have been using Stata so far but with no success, so is there a way to do this in R?

Yes. In outline:

  1. Prepare a data frame or data.table for the input data

I'll refer to this as x

  2. Design a data frame layout for the results

I'll refer to this as y

  3. Create functions to derive y from x

f1() subset x by index into the six-month periods x1_1, ..., x1_n
f2() regress each term on rmrf within a window
f3() extract the t-value of the slope coefficient
f4() sort the t-values
f5() pick the 30 smallest (most negative) t-values as x2_1, ..., x2_n
f6() insert x2_1, ..., x2_n into y, which should omit the first period, since there is no backward period to calculate from

In principle, this is simply school algebra.


Do you have an idea what the code would look like? I'm quite new to R, so I have been struggling to implement this.


What do you have that you can derive the data frame in step 1 from? How do you turn it into x? What kind of variable is SVI? Can it be used as an argument to log()? Should that be log base 2, natural log or log base 10? That is just to start. "Daily": do you have a time series? "Changes": can you use lag()?

Then keep going until you have x. Find the daily changes. Find the appropriate functions to winsorize, deseasonalize and standardize the dailies.
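As a rough illustration of those preparation steps for a single term, here is a minimal base R sketch. Everything in it is an assumption made for illustration: the SVI series is simulated, the helper names `winsorize()`, `deseasonalize()` and `standardize()` are made up, the natural log is used, the winsorizing cut-offs are the 2.5% and 97.5% quantiles, and the seasonal adjustment is a regression on weekday and month dummies. Check the paper for the exact definitions it uses.

```r
set.seed(2)

# Simulated daily SVI for one term (assumption); strictly positive so log() is defined
dates <- seq(as.Date("2004-01-01"), as.Date("2004-12-31"), by = "day")
svi   <- round(runif(length(dates), min = 10, max = 100))

# Daily change in natural-log SVI; the first day has no previous value, hence NA
d_log <- c(NA, diff(log(svi)))

# Winsorize: clamp values outside the chosen quantiles (cut-offs are an assumption)
winsorize <- function(v, p = 0.025) {
  q <- unname(quantile(v, c(p, 1 - p), na.rm = TRUE))
  pmin(pmax(v, q[1]), q[2])
}

# Deseasonalize: keep residuals from a regression on weekday and month dummies
# (an assumption about what "deseasonalized" means here)
deseasonalize <- function(v, d) {
  fit <- lm(v ~ factor(weekdays(d)) + factor(months(d)), na.action = na.exclude)
  as.numeric(residuals(fit))  # na.exclude pads the result back to length(v)
}

# Standardize: scale by the standard deviation
standardize <- function(v) v / sd(v, na.rm = TRUE)

adj <- standardize(deseasonalize(winsorize(d_log), dates))
```

Running this for each term and binding the results by date would give the columns of x.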

Decide whether the index is already defined or something you need to make up. How is it constructed? Out of which pieces of x?
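If the index is taken to be the plain average of the chosen terms on each day, as the question describes, the construction might look like the sketch below. The data are simulated, and the `period` column and the `selected` list (period label mapped to the terms chosen for it) are hypothetical names, filled in by hand here purely for illustration.

```r
set.seed(3)

# Simulated adjusted series for three terms over ten days (assumption)
dates <- seq(as.Date("2004-07-01"), as.Date("2004-07-10"), by = "day")
x <- data.frame(date  = dates,
                term1 = rnorm(10),
                term2 = rnorm(10),
                term3 = rnorm(10))
x$period <- "2004H2"

# Hypothetical selection: terms chosen for use in each period
selected <- list("2004H2" = c("term1", "term3"))

# Index on day t = average of the selected terms' values on day t
x$UKIS <- NA_real_
for (p in names(selected)) {
  rows <- x$period == p
  x$UKIS[rows] <- rowMeans(x[rows, selected[[p]], drop = FALSE], na.rm = TRUE)
}
```

Days in a period with no selection stay NA, which is what you would expect for the first six months.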

And so on. It's all about breaking the problem down into its smallest pieces and putting them together one-by-one.
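To show how the pieces f1() to f6() from the outline might fit together, here is a minimal base R sketch of the cumulative backward window. It is not the paper's code: the data are simulated, all function and column names are hypothetical, and `top_n` is 3 rather than 30 only because the toy data has 8 terms.

```r
set.seed(1)

# Simulated x: one row per day, a market factor and 8 adjusted term series (assumption)
dates   <- seq(as.Date("2004-01-01"), as.Date("2005-12-31"), by = "day")
n_terms <- 8
top_n   <- 3  # the quoted paper uses 30
x <- data.frame(date = dates, rmrf = rnorm(length(dates)))
terms <- paste0("term", seq_len(n_terms))
for (tm in terms) x[[tm]] <- rnorm(length(dates))

# f1: label each day with its half-year, e.g. "2004H1"
half_year <- function(d) {
  paste0(format(d, "%Y"), "H", ifelse(as.integer(format(d, "%m")) <= 6, 1, 2))
}
x$period <- half_year(x$date)
periods  <- sort(unique(x$period))

# f2 + f3: regress one term on rmrf and extract the t-value of the slope
slope_t <- function(term, data) {
  fit <- lm(reformulate("rmrf", response = term), data = data)
  coef(summary(fit))["rmrf", "t value"]
}

# f4 + f5: sort the t-values and keep the top_n most negative
pick_terms <- function(data) {
  tv <- sapply(terms, slope_t, data = data)
  names(sort(tv))[seq_len(top_n)]
}

# f6: for each period after the first, select on ALL earlier periods (cumulative),
# so the terms used in a period never depend on that period's own data
selected <- list()
for (k in seq_along(periods)[-1]) {
  past <- x[x$period %in% periods[seq_len(k - 1)], ]
  selected[[periods[k]]] <- pick_terms(past)
}
```

Here `selected[["2004H2"]]` holds the terms chosen from January to June 2004, `selected[["2005H1"]]` those chosen from all of 2004, and so on. If you want every term with a negative relationship instead of a fixed number, one option is to keep `names(tv)[tv < 0]` inside `pick_terms()`.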
