The Ultimate Guide On Raw Edge Discovery
Earlier today the GoldGerb asked me how to put together a scatter plot for raw edge discovery as introduced to you by Scott during my Tenerife adventure. Since my gums and I are feeling a bit better than anticipated I thought I may as well condense some of the exchanges I’ve had with him into a dedicated post. It is my belief that raw edge discovery (or RED as it shall be known henceforth) is an integral but much neglected aspect of system development.
When done correctly RED can help you avoid months and perhaps even years of wasted time. It will also lead to a cleaner and more solid system whilst helping you develop a deeper understanding of what actually drives your system’s edge. Finally it will allow you to establish a baseline from which you can evaluate additional parameters or rules and avoid over-optimization. It doesn’t do your laundry or wash your car but if you’re a system developer then RED is your starting point when considering a new trading idea.
As Scott already pointed out in his original post, the visualization we will use is a scatter plot, which is easy to do in Excel. There are many tutorials out there and Google is your friend. But before you launch Excel or your favorite charting app you first have to go back to first principles and develop a hypothesis. Just like a scientist.
Scott and I looked at a heap of mean reversion systems and the hold time basically boils down to only a few bars. On average only 2 – 3 days. For example here’s Larry Connors’ hypothesis: For stocks in a bull market, trading above their 200 SMA, there is a mean reversion effect.
Raw Edge Discovery – RED
So how should we test this?
First we look at the change of price leading up to the entry condition. Doesn’t matter what exactly your entry conditions are, even if it’s something complicated like:
- 1) The stock is a member of the Russell 1000 (membership at the time of the trade, not today’s list; this eliminates survivorship bias, which is huge)
- 2) Minimum daily liquidity requirement
- 3) 70% of the stocks in the market trading > SMA(200)
- 4) The individual stock trading > its SMA(200)
- 5) A down move defined in different ways (see next list below)
- 6) A volatility filter (VIX or VIX equivalent below x)
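Conditions 3) and 4) above are simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the function names, the dictionary of closing prices, and the tiny made-up universe are all hypothetical, and a 3-bar SMA stands in for the 200-bar one so the numbers are easy to check by hand.

```python
def sma(closes, period):
    """Simple moving average of the last `period` closes, or None if too short."""
    if len(closes) < period:
        return None
    return sum(closes[-period:]) / period


def above_sma(closes, period=200):
    """Condition 4: the latest close is above the stock's own SMA."""
    average = sma(closes, period)
    return average is not None and closes[-1] > average


def breadth(universe, period=200):
    """Fraction of stocks (with enough history) trading above their SMA."""
    eligible = [c for c in universe.values() if len(c) >= period]
    if not eligible:
        return 0.0
    return sum(above_sma(c, period) for c in eligible) / len(eligible)


# Made-up closes, keyed by hypothetical ticker names.
universe = {
    "AAA": [10, 11, 12, 13],  # rising, last close above its 3-bar SMA
    "BBB": [20, 19, 18, 17],  # falling, last close below its 3-bar SMA
    "CCC": [5, 5, 5, 6],      # last close above its 3-bar SMA
    "DDD": [7, 8],            # not enough history, ignored
}

# Condition 3: at least 70% of the market above its SMA.
market_ok = breadth(universe, period=3) >= 0.70
```

In a real test you would feed in point-in-time index membership and use period=200.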
In reference to item 5) this is one of the most complicated mean reversion things we’ve seen, ever, but here are a few useful ways to define the down move:
- A close below the lower 1.0/20 Bollinger band
- 3 lower lows (i.e. a Net-Line Buy Level [NLBL] forms at the high of the first candle)
- 5 lower closes
- RSI(2) < 10
- ROC (3) low
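Three of the down move definitions above can be sketched as follows. Treat this as a rough illustration, not anyone's production code: the function names are mine, "1.0/20" is assumed to mean 1.0 standard deviations around a 20-bar average, and the example uses an 8-bar window on made-up prices just to keep it short. RSI(2) is left out because its result depends on which smoothing variant you pick.

```python
from statistics import fmean, pstdev


def consecutive_declines(values, n):
    """True if each of the last n values is lower than the one before it.

    Works for "n lower closes" (pass closes) or "n lower lows" (pass lows).
    """
    if len(values) < n + 1:
        return False
    recent = values[-(n + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def roc(closes, n=3):
    """Rate of change over the last n bars, in percent."""
    return (closes[-1] / closes[-1 - n] - 1.0) * 100.0


def below_lower_band(closes, period=20, width=1.0):
    """True if the last close is below mean - width * stdev of the window."""
    if len(closes) < period:
        return False
    window = closes[-period:]
    return closes[-1] < fmean(window) - width * pstdev(window)


# Made-up daily data.
closes = [100, 101, 102, 101, 100, 99, 98, 97]
lows = [99, 100, 101, 100, 99, 98, 97, 96]

five_lower_closes = consecutive_declines(closes, 5)
three_lower_lows = consecutive_declines(lows, 3)
roc3 = roc(closes, 3)
under_band = below_lower_band(closes, period=8, width=1.0)
```

What counts as a "low" ROC(3) is up to you; the sketch only computes the value.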
So you throw all these conditions together. It’s not curve fitting (yet) at this stage, but it *might be*, so don’t go crazy with the rules. Then you make your best guess of the timeframe for the entry condition. In mean reversion systems it is pretty trivial:
- X axis: delta 3 days before (in percent)
- Y axis: delta 3 days after (in percent).
By the way this is just a best guess. You could try a few best guesses, maybe 3 days, 5 days, etc. In reality in most cases 3 days will turn out to be your sweet spot, but prove me wrong. Of course if you’re building an hourly trading system then you’d be testing against +/- 3 or 5 hours.
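If you would rather build the X/Y pairs in code than by hand, here is a minimal sketch. It assumes you already have a list of closes and the bar indexes where your entry condition fired; the names scatter_points, lookback and lookahead are mine and the prices are made up.

```python
def pct_change(prices, start, end):
    """Percent change from prices[start] to prices[end]."""
    return (prices[end] / prices[start] - 1.0) * 100.0


def scatter_points(closes, signal_bars, lookback=3, lookahead=3):
    """Return (x, y) pairs for every signal bar with enough data on both sides.

    x = percent change over the `lookback` bars before the signal
    y = percent change over the `lookahead` bars after the signal
    """
    points = []
    for i in signal_bars:
        if i - lookback < 0 or i + lookahead >= len(closes):
            continue  # not enough history or not enough future bars
        x = pct_change(closes, i - lookback, i)
        y = pct_change(closes, i, i + lookahead)
        points.append((x, y))
    return points


# Made-up closes; the entry condition is assumed to fire on bars 3 and 6.
closes = [100, 98, 96, 94, 95, 97, 99, 100]
points = scatter_points(closes, signal_bars=[3, 6])
```

Bar 6 is dropped here because there are not 3 bars of data after it. The surviving pairs can be pasted straight into two Excel columns for the scatter, and changing lookback and lookahead to 5 gives you the 5 day version.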
Here’s an example of just that produced by Francis – an intrepid reader who took one for the team and volunteered to run the numbers on our Net-Line concept across a few daily charts. The scatters he produced show us a pretty weak positive edge. So what have we learned? At least on the daily panel, on their own and without additional context, single Net-Lines appear to be astonishingly useless as an entry system. Which incidentally is the very reason why I rarely if ever use Net-Lines without additional context such as SMAs, Bollingers, or even other Net-Lines.
Now a fundamental point I hope you fully comprehend moving forward is that RED only shows you what the market does after your entry condition has been triggered. It has absolutely nothing to do with your future trading system.
The R squared value you’ll get from your scatter is a measure of how strong this effect is, which is effectively how close the dots are to making a line (i.e. how bunched up the dots are). If you are looking for mean reversion for example you should be seeing a nice diagonal line.
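Excel will give you R squared when you add a trendline, but it is easy enough to compute yourself. The sketch below is a plain least squares fit over the (x, y) pairs; the function name and the sample points are made up for illustration. Note that R squared only tells you how tight the line is, while the sign of the slope tells you which way the relationship runs.

```python
def linear_fit(points):
    """Least squares line through (x, y) pairs.

    Returns (slope, intercept, r_squared).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up (percent before, percent after) pairs.
points = [(-6.0, 4.5), (-4.0, 1.0), (-3.0, 2.5), (-1.5, 0.5)]
slope, intercept, r_squared = linear_fit(points)
```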
So how do you know if you are fooling yourself? After all, if you test pretty much any half decent MR system on AAPL you’ll be looking pretty clever. Buying down closes on AAPL will look great on a 10 year backtest. Does that mean your raw edge is real?
- Firstly you test on Russell, and then SP500 and Wilshire 3000 constituents. Results should be a forest of good results, not just an outlier. Ditto for testing foreign markets. All the good mean reversions test well across countries, e.g. the Nikkei, the Hang Seng, the DAX, etc.
- Secondly you make sure you have statistically significant numbers of data points, in terms of standard error. But for any given set of data the scatter plot will be orders of magnitude (literally) more reliable than a backtest in proving or disproving your hypothesis. That’s how you make your best guess against curve fitting.
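As a rough check on the second point, you can compute the standard error of the average move after your entry condition. This is a sketch under my own assumptions: the function name and the sample returns are made up, and the mean divided by its standard error is just one common way to judge whether you have enough data points.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_and_standard_error(forward_returns):
    """Mean of the forward returns and the standard error of that mean."""
    n = len(forward_returns)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = fmean(forward_returns)
    standard_error = stdev(forward_returns) / sqrt(n)
    return mean, standard_error


# Made-up percent moves over the 3 days after each signal.
forward_returns = [1.2, -0.4, 0.9, 2.1, -1.0, 0.7]
mean, standard_error = mean_and_standard_error(forward_returns)

# How many standard errors the average move sits away from zero.
t_stat = mean / standard_error
```

The more data points you have, the smaller the standard error gets, which is why a handful of signals on one stock tells you very little.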
After you have proven your hypothesis, then and ONLY then do you start playing around with different exits and actual system stuff.
If your scatter shows you at minimum a weak positive correlation then congrats, you are now ready for backtesting. The gold standard is to set aside some data that you don’t use to build your system, optimize as little as possible, and then run your new rules over the data you set aside. Don’t just throw 20 rules at your system from the get-go; start small and build it up rule by rule. The fewer rules the better. A system’s quality and resilience come from simplicity, not from added complexity. The closer your optimized system backtest matches your ‘out of sample data’ backtest, the less you have fooled yourself.
So for example, let’s say we built a system on “Buy 7 days down in AAPL, sell a 7 day high close”. That would test amazing, but if we tested the same thing across random out of sample data it would most likely suck. A classic way is to take the Russell 1000 for example, and keep out every 50th stock alphabetically. Don’t use that data at all for your backtests, but when you finish up your system building you run your system against those held-out stocks. The closer the match the less the curve fit, by definition. That’s not to say market type won’t change, but it does prove you haven’t succumbed to data snooping biases.
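The alphabetical hold-out is a one-liner once you have a list of tickers. The sketch below uses placeholder ticker names rather than real index constituents, and split_holdout is simply a name I picked.

```python
def split_holdout(tickers, every=50):
    """Sort tickers alphabetically and set aside every `every`-th one.

    Returns (in_sample, holdout).
    """
    ordered = sorted(tickers)
    holdout = ordered[every - 1::every]
    held = set(holdout)
    in_sample = [t for t in ordered if t not in held]
    return in_sample, holdout


# Placeholder names standing in for the 1000 index members.
tickers = [f"STK{i:04d}" for i in range(1000)]
in_sample, holdout = split_holdout(tickers, every=50)
```

With 1,000 names, every=50 sets aside 20 of them; pass every=20 if you would rather hold out 50. Build and optimize on in_sample only, then run the finished system once over holdout.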
By the way all this is *really* easy to do with Quantopian, which we’ll cover in much detail in future articles of this educational series. Now before you recoil in horror at the thought of writing code keep in mind that even Convict Scott could figure it out, and he can barely program his way out of a paper bag.
So to make it easier for you guys, for daily MR systems there are ONLY three really viable entry methods.
- Entry a few minutes ahead of close – standard
- Limit order 0.5 ATR(14) below the last close, a.k.a. the Nick Radge. We are going for a more extreme, and therefore better, mean reversion.
- Entry following open (generally this one is not as good)
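For the limit order entry, the price level can be sketched like this. Two assumptions to flag loudly: the ATR here is a plain average of the last N true ranges (many charting packages use Wilder's smoothing instead, which gives slightly different numbers), and the function names and bars are made up. A 3-bar ATR is used in the example only to keep the data short.

```python
def true_range(high, low, prev_close):
    """Largest of: bar range, gap up from prior close, gap down from prior close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(bars, period=14):
    """Simple average of the last `period` true ranges.

    bars is a list of (high, low, close) tuples, oldest first.
    """
    if len(bars) < period + 1:
        raise ValueError("need period + 1 bars")
    recent = bars[-(period + 1):]
    ranges = [
        true_range(high, low, prev[2])
        for prev, (high, low, _close) in zip(recent, recent[1:])
    ]
    return sum(ranges) / period


def limit_entry_price(bars, period=14, atr_fraction=0.5):
    """Limit price placed atr_fraction * ATR below the last close."""
    last_close = bars[-1][2]
    return last_close - atr_fraction * atr(bars, period)


# Made-up (high, low, close) bars.
bars = [(10.0, 9.0, 9.5), (10.5, 9.5, 10.0), (11.0, 10.0, 10.5), (10.8, 9.8, 10.0)]
limit_price = limit_entry_price(bars, period=3, atr_fraction=0.5)
```

The order only counts as filled if the next bar actually trades down to limit_price, so expect fewer trades from this entry than from the other two.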
If your system is positive on all three entry methods then it is a lot less likely to be curve fitted. Again this should be a good number in a forest of good numbers. The idea is once you prove the hypothesis to a standard you are happy with, you play with entries and exits until you get something close enough. Then again you run your fledgling new system on:
- Out of sample data
- Other indexes in the same market
- Other stock markets in other countries.
If you still see good numbers (which is rare to be honest) then you can be fairly confident that you aren’t fooling yourself with randomness (hat tip to Nassim). Once again Scott and I both believe that the research environment in Quantopian is ideal for this stuff, which is why I am working toward posting a pertinent introduction a few weeks from now. Maybe it’s something we’ll do after May so that you guys don’t go away.
Credits go to Scott Phillips who contributed large parts of this post.