Posted on **Monday, August 9, 2010, 11:30 PM GMT +1**

## Pairs Trading (ETFs)

First of all thanks for your patience, and from now on I’ll be posting again on a more frequent basis.

I’d also like to point those interested in quantitative research to a new blog I just came across: **Engineering Returns** by Frank Hassler.

____________________________________

Because I’m a big fan of statistical arbitrage (and trade it for a living), I thought it would be interesting to check whether, and to what extent, there are pairs of **ETF**s (Exchange Traded Funds) which (as always, based on historical data, statistical anomalies, regularities and irregularities, …) would provide a favorable and tradable edge while maintaining a market-neutral position.

I personally prefer ETFs to individual stocks because the latter are much more sensitive to unforeseeable events and outcomes such as earnings, fundamentals, management changes (CEO, CFO, …), rate disputes, strikes, takeovers and force majeure (casualties, disasters, …). And I speak from my own experience …

In conjunction with pairs trading, you’ll probably hear about two (quite different) concepts: **correlation** and **cointegration**.

**Correlation** measures the degree to which the (daily, weekly, monthly, …) returns of two price series (e.g. the S&P 500 and the Nasdaq 100) move in the same direction on most days, weeks or months over a period of time. Two correlated series may nevertheless drift farther and farther apart, because the magnitudes of their returns differ.

A pair (long one price series and short the other in the right proportion) is called **cointegrated** if its spread has a consistent mean and standard deviation: the two price series never wander off indefinitely in opposite directions, and the spread never drifts away from its mean without eventually returning to it (mean reversion).
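The difference can be illustrated with synthetic data. The following is a minimal sketch, not taken from the post's data: series B simply repeats the daily returns of series A plus a small constant drift (the helper names, the seed and the 0.1% drift are assumptions chosen for illustration). The returns are perfectly correlated, yet the price ratio keeps drifting and never reverts.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def prices_from_returns(returns, start=100.0):
    """Compound a list of simple returns into a price path."""
    prices = [start]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    return prices


rng = random.Random(0)                       # fixed seed, synthetic data only
ret_a = [rng.gauss(0.0, 0.01) for _ in range(1000)]
ret_b = [r + 0.001 for r in ret_a]           # same daily moves plus a small drift

price_a = prices_from_returns(ret_a)
price_b = prices_from_returns(ret_b)

corr = pearson(ret_a, ret_b)                 # correlation of the daily returns
ratio_start = price_b[0] / price_a[0]        # price ratio on the first day
ratio_end = price_b[-1] / price_a[-1]        # price ratio after 1000 sessions
```

Here `corr` is 1 up to rounding error, while `ratio_end` ends far above `ratio_start`: highly correlated returns alone say nothing about whether the ratio is mean-reverting.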

Unfortunately, the so-called **half-life** (the expected time for the spread to revert half of its deviation from the mean) of a cointegrated pair will usually be measured in weeks or months (see stats below), and I’m more of a high-frequency trader looking for opportunities on a day-by-day basis.
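The post does not say how the half-life was computed. One common estimator, shown here as a sketch under that assumption, regresses the daily change of the spread on its previous level and converts the slope into a half-life (the function names are hypothetical):

```python
import math


def ols_slope_intercept(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_life(spread):
    """Sessions needed for a deviation from the mean to shrink by half.

    Fits  spread[t] - spread[t-1] = a + lam * spread[t-1]  and treats the
    spread as an AR(1) process with coefficient phi = 1 + lam.
    Returns math.inf when the fit shows no mean reversion.
    """
    lagged = spread[:-1]
    changes = [cur - prev for prev, cur in zip(spread[:-1], spread[1:])]
    lam, _ = ols_slope_intercept(lagged, changes)
    phi = 1.0 + lam
    if not 0.0 < phi < 1.0:
        return math.inf
    return math.log(0.5) / math.log(phi)


# Noise-free example: the deviation shrinks by 10% per session.
decaying = [100.0 * 0.9 ** t for t in range(50)]
sessions = half_life(decaying)               # about 6.6 sessions
```

On real spreads the estimate is noisy, so it is better read as an order of magnitude (days, weeks or months) than as an exact holding period.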

The following are the ETFs I used for my investigation. They meet the necessary requirements, among them adequate liquidity (daily trading volume), transaction costs as low as possible (narrow bid/ask spreads) and adequate volatility (in order to justify the transaction costs incurred):

- **SPY**: S&P 500
- **QQQQ**: Nasdaq 100
- **IWM**: Russell 2000
- **SMH**: Semiconductor
- **RTH**: Retail

Other ETFs may be subject to a follow-up posting.

To test for cointegration, the primary method used is the augmented Dickey-Fuller test. For two price series to be considered cointegrated (with a probability of better than 90%), the test has to come up with a *t-statistic* exceeding the 90% critical value of **-3.038** in absolute terms (i.e. more negative than -3.038); otherwise the null hypothesis of no cointegration cannot be rejected, and the pair is not treated as cointegrated. The following table provides the respective *t-statistics* based on the augmented Dickey-Fuller test for the time frame between 01/01/2002 and 08/06/2010 (price series are adjusted for dividend and cash payments).

| t-statistic | SPY | QQQQ | IWM | SMH | RTH |
|---|---|---|---|---|---|
| SPY | – | -0.8210 | -2.7995 | -1.8635 | -2.8513 |
| QQQQ | – | – | -2.6115 | -1.0084 | -2.0935 |
| IWM | – | – | – | -1.9626 | -3.2206 |
| SMH | – | – | – | – | -3.3625 |
| RTH | – | – | – | – | – |

Interestingly, there are only two pairs, **IWM** vs. **RTH** (better than 90%) and **SMH** vs. **RTH** (better than 95%), which are cointegrated with a probability of better than 90%, while **SPY** (as a proxy for the S&P 500) and **QQQQ** (as a proxy for the Nasdaq 100) show the weakest evidence of cointegration. **IWM** vs. **RTH** shows a half-life of **194** sessions, and **SMH** vs. **RTH** a half-life of **92** sessions. Both pairs seem to be good candidates for a (longer-term) mean-reversion strategy.
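For readers who want to see the mechanics of such a test, here is a simplified two-step sketch in plain Python. It is not the tool used for the table above: it omits the lagged difference terms of the *augmented* test, so its t-statistics and the appropriate critical values may differ from those quoted here. The function names and the synthetic data are assumptions; the threshold -3.038 is the one from the post.

```python
import math
import random


def ols(xs, ys):
    """Fit y = intercept + slope * x; return (slope, intercept, residuals)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


def df_t_statistic(series):
    """t-statistic of gamma in  d(e[t]) = alpha + gamma * e[t-1] + noise."""
    lagged = series[:-1]
    changes = [cur - prev for prev, cur in zip(series[:-1], series[1:])]
    gamma, _, resid = ols(lagged, changes)
    n = len(lagged)
    mean_lag = sum(lagged) / n
    sxx = sum((x - mean_lag) ** 2 for x in lagged)
    sigma2 = sum(e * e for e in resid) / (n - 2)   # residual variance
    return gamma / math.sqrt(sigma2 / sxx)         # slope / standard error


def engle_granger_t(price_a, price_b):
    """Step 1: regress A on B. Step 2: unit-root test on the residual spread."""
    _, _, spread = ols(price_b, price_a)
    return df_t_statistic(spread)


# Synthetic pair: B is a random walk, A is tied to B plus stationary noise.
rng = random.Random(1)
walk = [50.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0.0, 0.5))
tied = [5.0 + 2.0 * b + rng.gauss(0.0, 1.0) for b in walk]

t_stat = engle_granger_t(tied, walk)
looks_cointegrated = t_stat < -3.038               # threshold quoted in the post
```

A strongly negative `t_stat` means the residual spread is pulled back toward its mean; values closer to zero, like most entries in the table, give no such evidence.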

A second interesting observation is that even in conjunction with **SPY** and **IWM**, the **RTH** (Retail HOLDRS) seems to be a favorable candidate for a potential mean-reversion strategy.

Fortunately, cointegration is not mandatory in order to find a profitable mean-reversion strategy, and on a day-by-day basis even non-cointegrated pairs (like **SPY** vs. **QQQQ**) may provide favorable short-term mean-reversion opportunities (better fitting my style of trading). So my next step was to check each pair’s performance based on the simplest mean-reversion strategy:

**Buy** (on close) the pair if the ratio of **ETF** A to **ETF** B closed lower (meaning ETF A under-performed ETF B in the respective session), and **sell short** (on close) the pair if the ratio closed higher (meaning ETF A out-performed ETF B in the respective session). Because **RTH**’s inception date was in 2001, the start date for the following stats is always Jan. 1, 2002.

“**Buy**” means buy ETF A and sell short ETF B (and vice versa for “sell short”), with the number of shares determined by the ratio of closing prices (e.g. if the ratio of ETF A’s to ETF B’s closing price is **3**, one would sell short 3 shares of ETF B for every share of ETF A bought). A margin account would be mandatory, especially since it is assumed that one invests 100% of the then-current net liquidation value on each side of the market (i.e. 100% on the long and 100% on the short side).
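The rule and the position sizing can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and returning 0 when the ratio closes unchanged is an assumption, since that case is not specified here.

```python
def daily_signals(close_a, close_b):
    """Signal at each close from the second session onward.

    +1 = buy the pair (long A, short B), -1 = sell the pair short
    (short A, long B), 0 = ratio unchanged (assumed: no new signal).
    """
    ratios = [a / b for a, b in zip(close_a, close_b)]
    signals = []
    for prev, cur in zip(ratios[:-1], ratios[1:]):
        if cur < prev:
            signals.append(1)       # A under-performed B: buy the pair
        elif cur > prev:
            signals.append(-1)      # A out-performed B: sell the pair short
        else:
            signals.append(0)
    return signals


def pair_shares(equity, close_a, close_b):
    """Shares per leg when 100% of equity is placed on each side."""
    return equity / close_a, equity / close_b


def next_day_pair_return(signal, ret_a, ret_b):
    """Gross return on equity over the next session, before any costs."""
    return signal * (ret_a - ret_b)


signals = daily_signals([100.0, 99.0, 101.0], [50.0, 50.0, 50.0])
shares_a, shares_b = pair_shares(30000.0, 90.0, 30.0)
gross = next_day_pair_return(1, 0.01, -0.005)
```

With closing prices of 90 and 30 the price ratio is 3, and `pair_shares` yields three shares of ETF B for every share of ETF A, matching the example above.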

(FAQs and a glossary concerning the stats can be found at the **FAQ/GLOSSARY** page)

Here is the link to the stats in a ‘readable’ size: Statistics 1

Interestingly, it is again **RTH** in conjunction with every other ETF which delivers the best results, always exceeding the 200% mark for compounded returns (gross profits before commissions, slippage and fees). Unfortunately, commissions, slippage and fees would regularly eat up a major part of the compounded return: one would always have a position in the market with an exposure of 200% (100% on the long and 100% on the short side), and reversing one’s position (switching from the long to the short side of the pair and vice versa) would quadruple the transaction costs in comparison to someone who simply closes a long or short position with a 100% exposure.
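The factor of four follows from simple turnover arithmetic. As a small sketch (the helper name and the dictionary layout are hypothetical), positions are expressed as signed fractions of equity and the traded notional is the sum of the absolute changes:

```python
def traded_notional(before, after):
    """Sum of absolute position changes, in fractions of equity."""
    tickers = set(before) | set(after)
    return sum(abs(after.get(t, 0.0) - before.get(t, 0.0)) for t in tickers)


long_pair = {"A": 1.0, "B": -1.0}     # 100% long A, 100% short B
short_pair = {"A": -1.0, "B": 1.0}    # the reversed pair

reverse_pair = traded_notional(long_pair, short_pair)   # 4.0 (400% of equity)
close_single = traded_notional({"A": 1.0}, {})          # 1.0 (100% of equity)
```

Reversing the pair means selling 200% of equity in one ETF and buying 200% in the other, hence four times the notional of closing a single 100% position.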

In a second step I used a slightly more sophisticated concept (Bollinger Bands %B with a 4-day EMA and 1 standard deviation):

**Buy** (on close) the pair in the event the Bollinger Bands %B closed below 0.35, and **sell short** (on close) in the event the Bollinger Bands %B closed above 0.65.

For a detailed explanation of the Bollinger Bands %B concept see Stockcharts.com. In other words: **Buy** the pair if the ratio closed in the lower part of the band around its 4-day exponential moving average (ETF A is short-term ‘oversold’ in comparison to ETF B), and **sell short** the pair if the ratio closed in the upper part of the band (ETF A is short-term ‘overbought’ in comparison to ETF B). With the bands set one standard deviation above and below the average, %B < 0.35 corresponds to a close more than 0.3 standard deviations below the average, and %B > 0.65 to a close more than 0.3 standard deviations above it. A classic mean-reversion concept.
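A minimal sketch of the %B computation on the ratio series follows. Several details are assumptions made for illustration, not taken from the post: the EMA is seeded with the first value and uses a smoothing factor of 2 / (span + 1), and the standard deviation is the population standard deviation of the last four ratios.

```python
import math


def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def percent_b(ratios, span=4, width=1.0):
    """%B = (ratio - lower band) / (upper band - lower band).

    Entries are None until `span` ratios are available, or when the
    window is flat and the bands collapse onto the average.
    """
    mid = ema(ratios, span)
    result = []
    for i in range(len(ratios)):
        if i + 1 < span:
            result.append(None)
            continue
        window = ratios[i - span + 1:i + 1]
        mean = sum(window) / span
        sd = math.sqrt(sum((x - mean) ** 2 for x in window) / span)
        if sd == 0.0:
            result.append(None)
            continue
        lower = mid[i] - width * sd
        upper = mid[i] + width * sd
        result.append((ratios[i] - lower) / (upper - lower))
    return result


def band_signal(pb, buy_below=0.35, sell_above=0.65):
    """+1 = buy the pair, -1 = sell the pair short, 0 = no signal."""
    if pb is None:
        return 0
    if pb < buy_below:
        return 1
    if pb > sell_above:
        return -1
    return 0


spike_up = percent_b([1.0, 1.0, 1.0, 1.0, 1.2])
spike_down = percent_b([1.0, 1.0, 1.0, 1.0, 0.8])
```

In this toy example the last ratio jumps away from a flat history, so `band_signal(spike_up[-1])` signals a short and `band_signal(spike_down[-1])` a buy. A %B of exactly 0.5 would mean the ratio closed on its moving average.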

Here is the link to the stats in a ‘readable’ size: Statistics 2

Results improve, in part significantly: compounded returns and *t-scores* (vs. chance and benchmark) increase, while transaction costs and maximum drawdowns decrease (Time in Market is now less than 100%, and positions are closed or reversed less frequently). The **SPY** vs. **RTH** and **IWM** vs. **RTH** pairs in particular show results promising enough to be worth further investigation.

More to come in a follow-up post (at time of writing it’s almost midnight in Germany) …

to be continued …

Successful trading,

Frank

**Disclaimer**: No position in the securities mentioned in this post at time of writing.

The information on this site is provided for statistical and informational purposes only. Nothing herein should be interpreted or regarded as personalized investment advice or to state or imply that past results are an indication of future performance. The author of this website is not a licensed financial advisor and will not accept liability for any loss or damage, including without limitation any loss of profit, which may arise directly or indirectly from use of or reliance on the content of this website. **Under no circumstances does this information constitute advice or a recommendation to buy, sell or hold any security.**

## Comments (19)

glad to see you’re back Frank

Welcome back! Good to see your posts.

Other than my being long on bond futures today, to see you back to posting is the best news of the day!

Welcome back!

Welcome back Frank, hope you are well rested. Jeff

Welcome back, looking forward to reading about new ideas.

welcome back

Thanks Frank for this awesome post. I will be following this for your future posts.

The “Statistics 2” link is broken: http://www.tradingtheodds.com/2010/08/wp-content/uploads/2010/08/PairsTrading2orig.png

evo34,

it’s working well on my side, so I don’t know what may’ve caused the problem on your side.

Best,

Frank

Dear Frank,

welcome back.

I have a question related to cointegration and augmented Dickey-Fuller test. I tried to test cointegration of IWM and RTH, but I couldn’t come up with the same answer.

I did the following:

1. found log returns of IWM and RTH

2. found a spread between IWM and RTH (iwm.delta – rth.delta). From the spread graph it is clear that the spread isn’t cointegrated.

3. ran the augmented Dickey-Fuller test in R on the spread (adf.test)

Could you provide a short description of how you got the cointegration values?

Thank you!

kafka,

concerning the test for cointegration, I used a free Matlab package available at http://www.spatial-econometrics.com/ (only two series of prices are needed as input data). I verified the results (correctness of data and toolset) by running a cointegration test between S&P 500 and SPY opening and closing prices (the latter shows a t-statistic greater than -45 in absolute terms, when -3.9 would be sufficient for a 99% probability that the S&P 500’s and SPY’s closing prices are cointegrated). A step-by-step approach can be found here: http://tradingwithmatlab.blogspot.com/2009/12/pairs-trading-cointegration-testing.html.

Please take into account that results may (slightly) differ if different (and/or longer/shorter) time frames are used, or if the closing prices do not account for dividend and cash payments. Simply adding dividend and cash payments to closing prices and computing daily returns afterwards would NOT be correct. On the ex-dividend day the daily return is calculated as [(close(today) + dividend/cash payment) / close(yesterday)] – 1, while the daily return on the day immediately following the ex-dividend day is calculated as [close(today) / close(yesterday)] – 1 (dividend and cash payments are no longer part of the computation), NOT [close(today) / (close(yesterday) + dividend/cash payment)] – 1.

Best,

Frank
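The return convention described in the reply above can be sketched as follows. The function name and the `payouts` list (one entry per session, 0.0 when nothing goes ex that day) are hypothetical:

```python
def total_returns(closes, payouts):
    """Daily returns that include dividend/cash payments.

    payouts[i] is the amount per share going ex on session i. It is added
    to that session's close only, never to the previous close used as the
    denominator on the following day.
    """
    return [
        (closes[i] + payouts[i]) / closes[i - 1] - 1.0
        for i in range(1, len(closes))
    ]


# A 1.00 payment goes ex on the second session.
returns = total_returns([100.0, 99.0, 100.0], [0.0, 1.0, 0.0])
```

In this example the ex-dividend day shows a return of 0% rather than -1%, and the following day uses the unadjusted close of 99 as its base.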

Wow this is a great resource.. I’m enjoying it.. good article

Hi Frank,

I just found this post. Very impressive. I will be looking forward to your next post as I want to add a pairs-trading strategy to my stable as well.

Thanks!

QD

Interesting article, thanks!

Other high co-integrated pairs that are good within a trading system are GLD-GDX and USO-OIL, I find having both high correlation and co-integration helps with results, no matter the time frame your trading signals are being generated on.

dear jared: please discuss or show the formulas you use to calculate correlation and cointegration. thanks very much, Ajay

Dear Frank: what is the link to the free Matlab package to calculate cointegration? Can the calculations be set up in spread sheet format?

Ajay,

the free Matlab package can be downloaded here: http://www.spatial-econometrics.com/

(.zip and manual, Chapter 5)

Best,

Frank

Thanks for sharing your strategy. However, there is one flaw of the analysis: the back testing is in-sample: you use dataset A to generate cointegration results, then back testing your trading strategy again using dataset A. Of course, your back testing results look good. Question is whether your strategy will make money in a forward looking manner.