Exercises in Trading System Building – Part 1
This is Scott, I’m taking over from Mole while he gets some downtime in Tenerife this week.
I’d like to digress a little from the usual bullshit about markets (which of course I will cover), and work through the preliminary stages of working up a new system, from concept to backtesting, forward testing and optimisation. I’d also like to update you on some of the things I’ve learned since I last did a system building course for you guys in early 2014.
I’m going to have to eat a little humble pie along the way. Turns out a few of my deeply held market beliefs needed to be thrown away, and some things I argued strongly for were completely wrong. Both Mole and I have learned a lot since 2014, wasted months on some intractable problems, solved a few more, and discovered some fascinating ways to shoehorn objectively average systems into something much, much better. I’ve had exposure to real institutional traders who are orders of magnitude better than me, and had the opportunity to peek over their shoulders and steal their secrets like the rat cunning bottom feeding hustler I am.
I’ve also done some work investigating the systems of other pros, and I have some insights into both trend following futures systems and stock trading mean reversion systems that are rich veins of knowledge to explore. I’ve gone as far as I can without having programming skills of my own, so I’m forcing myself to start programming at age 42. I suck at that, and I most likely always will, but nothing worth doing is easy, so I persist. My plan is to use the superb software at Quantopian as a research environment.
These are all works in progress for my own professional development, however, and I prefer not to write authoritatively on them until I am an expert and not just a dabbler.
If you were not around here back in the day, it might help you to take a refresher course.
Without further ado, let’s dive right in. What I’d like to do is show you how the real pros work up a system. And truthfully it’s not a lot like my original approach, which had a tendency to build systems with too many moving parts. Too many moving parts means by definition FRAGILE. We are going to keep coming back to this concept of fragility, and for an interesting diversion on the concept I can recommend this book by Taleb (the black swan guy).
What the pros do is start with a hypothesis to test. They test that hypothesis to an acceptable degree of statistical significance BEFORE they start working up a system. Statistical significance is a pointy headed way of saying that small amounts of data don’t mean shit (which means that seasonality data in particular is suspect).
The old idea of working through market beliefs and building up a list of indicators and whatnot you believe in has some merit for purely discretionary systems. Where I personally fucked it up was by taking a full system and backtesting that, rather than the raw concept first. Also, it doesn’t really matter much what your beliefs are. If they aren’t an edge, you are quite simply wasting your time.
Of course we don’t want to objectively test our own long held ideas about technical analysis. What if we are wrong? What if we have to admit that we wasted years of work on things that don’t have any validity at all? Well welcome to the club, I’ve wasted years of my life on garbage too. Suck it up snowflake and get testing, or scuttle off back to your safe space.
Let’s look at one of the components of the Crazy Ivan system, which was the first one Mole and I built: the inside period. We go long on a break of the high or short on a break of the low, with a stop a tick below the low (for a long) or a tick above the high (for a short). Let’s use a simplified 1.5R target to make things easy.
Now, you can see in this instance we went long on break of the high, and it hit the 1.5R target for a nice win.
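For anyone who wants to see the arithmetic, here is a rough sketch of those entry, stop and target levels. It is only an illustration, not the actual Crazy Ivan code. The tick size of 0.25, the strict definition of an inside period, and the choice to enter one tick beyond the extreme are all my assumptions, and the names `Bar`, `Order`, `is_inside` and `bracket_orders` are made up for this example.

```python
from dataclasses import dataclass

TICK = 0.25        # assumed tick size (ES-style); change for your instrument
R_MULTIPLE = 1.5   # the simplified 1.5R target from the text


@dataclass
class Bar:
    high: float
    low: float


@dataclass
class Order:
    side: str
    entry: float
    stop: float
    target: float


def is_inside(prev: Bar, cur: Bar) -> bool:
    """Assumed definition: the whole range sits strictly inside the prior range."""
    return cur.high < prev.high and cur.low > prev.low


def bracket_orders(inside: Bar, tick: float = TICK, r_multiple: float = R_MULTIPLE):
    """Return the (long, short) orders around an inside period."""
    # Long: enter on a break of the high, stop a tick below the low.
    long_entry = inside.high + tick
    long_stop = inside.low - tick
    long_risk = long_entry - long_stop          # this distance is 1R
    # Short: enter on a break of the low, stop a tick above the high.
    short_entry = inside.low - tick
    short_stop = inside.high + tick
    short_risk = short_stop - short_entry       # this distance is 1R
    return (
        Order("long", long_entry, long_stop, long_entry + r_multiple * long_risk),
        Order("short", short_entry, short_stop, short_entry - r_multiple * short_risk),
    )


if __name__ == "__main__":
    prev, cur = Bar(high=101.0, low=97.0), Bar(high=100.0, low=98.0)
    if is_inside(prev, cur):
        for order in bracket_orders(cur):
            print(order)
```

Notice how many choices are already baked in here (tick offset, stop placement, target multiple) before a single indicator has been added.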
My old approach was to backtest this a LOT, but there is a big problem with that. I’m not just testing the hypothesis that a break of an inside period extreme leads to a move, I’m also testing the efficacy of my stop, my target, how quickly I move to breakeven, and any other indicators I might be adding. Each extra thing I’m adding adds an order of magnitude of fragility to the whole bag of cats.
So I can generate a 1000 trade backtest, and not even be sure if I’m looking at a coin flip edge with a good exit strategy, or a strong edge with a shitty exit, or anything in between. There are ways I can mitigate this, but bottom line, if I walked into the office at RenTech with this strategy they would laugh me out the door. Now think what happens when I mix 5 OTHER setups in with the already existing bag of cats. I have no idea what’s working and what’s not, whether the same exit and stop logic is optimal (or even decent) for the other setups, etc etc. It is a child’s bedtime story, good for making me feel better but completely and utterly useless in the real world.
My bad. I wasted a decade of my life on rubbish like this. I’m sorry if I led you astray, but hopefully I can lead you back to the path of righteousness. And for the record, Crazy Ivan is a small but measurable edge, just parts of it suck and the methodology to build it was wrong.
The upshot of it is that while I was stuck in this mode of building systems I could produce all the backtests I wanted, and they didn’t mean a fucking thing. This has led Mole and me to the verge of throwing away backtesting completely (which is what Van Tharp did), and that would be wrong too. Backtesting is a difficult beast with many inherent problems that need to be overcome or mitigated, but just because it is difficult doesn’t mean we shouldn’t do it.
So what we do is test the broad concept of our trading ideas, one at a time. We plot some data points and see if we can roughly draw a line of best fit. We are looking for a positive correlation for some things, and a negative correlation for mean reversion (obviously).
Some examples will illustrate. There is a very weak positive correlation between investor optimism and the one month change in the S&P. The technical term for how well the dots make a line is the “R squared” statistic (labelled on most of the charts). Here you can see the R squared number of .06 means a very weak line of best fit.
This is all covered well in Victor Niederhoffer’s seminal and very well priced book, which I recommend. How could we use this in a practical sense? Let’s say we wanted to test for mean reversion on SPY on a 2 day period. We plot the 2 day return on the X axis, against the return for the NEXT 2 DAYS on the Y axis. We have a reasonable negative correlation of -.17, which means that there is a legitimate statistical tendency for 2 up days in the SPY to be followed by 2 down days (measured as a total, not both days down). As an aside, this tendency is also present (albeit to a lesser degree) in the Australian, Russian and Brazilian equivalents, giving more confidence in the concept.
This is really important, and it’s important to understand WHY this is better than just building a mean reversion system and backtesting it. If we build the system, the backtest might cover 100 trades, but the price history behind those 100 trades probably contains thousands of events. By testing the events directly we get more data points, more statistical significance, and purer data. Here we are JUST testing the tendency of price to mean revert on a 2 day period, as opposed to testing a system with a stop, take profit, trailing stop logic, and all the other stuff that goes into system building.
If we don’t start with an idea that is real, we can literally throw the backtest away, because we have built our house out of straw. The good news about scatter plots is they can be done by hand, or in Google Sheets or Excel. The other good news is they are far more reliable than backtests, because you are testing just one idea, and not the fragile bag of cats that is the entire system.
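If you would rather crunch the numbers than eyeball the dots, the 2 day test can be sketched like this. It is a minimal example, assuming you already have a plain list of daily closes. The function names are made up, and the `step` parameter is my own addition: `step=1` uses every day (overlapping windows, more points but not independent ones), while `step=n` uses non-overlapping windows. For a straight line of best fit with one input, R squared is just the correlation squared.

```python
from math import sqrt


def scatter_pairs(closes, n=2, step=1):
    """Pair each n-day return (X) with the return over the NEXT n days (Y)."""
    pairs = []
    for i in range(n, len(closes) - n, step):
        x = closes[i] / closes[i - n] - 1.0   # return over the last n days
        y = closes[i + n] / closes[i] - 1.0   # return over the next n days
        pairs.append((x, y))
    return pairs


def pearson_r(pairs):
    """Pearson correlation of the (x, y) pairs; 0.0 if either side never varies."""
    count = len(pairs)
    mean_x = sum(x for x, _ in pairs) / count
    mean_y = sum(y for _, y in pairs) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy zigzag series that reverts every single day (not real market data).
    closes = [100.0, 102.0] * 5
    pairs = scatter_pairs(closes, n=1)
    r = pearson_r(pairs)
    print(f"points={len(pairs)} r={r:.2f} r_squared={r * r:.2f}")
```

A negative `r` points to mean reversion over that window and a positive `r` points to continuation. Swap in real closes and change `n` to test other holding periods.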
It also means that the old idea that it doesn’t matter what instrument or timeframe we trade (it all tastes like chicken, a chart is a chart is a chart) can be tested. And empirical testing reveals I was completely wrong about that, and BobbyLow, you were completely right! Mea culpa, my friend. In my travels I have found many markets which are empirically trending intraday and mean reverting interday (ES futures, for example). You just can’t take a system which works on one timeframe and throw it up on 5 min charts or 360m charts. Sorry, this is one of the things I really thought was true, but it’s not. Also, the most robust edges are present across markets, but the most robust edges are not the best edges. Not everything is present everywhere. Some markets gap and revert, some don’t. Currency markets have very few mean reversion opportunities, but those that exist tend to be very good (and as Mole has figured out, the mean reversion tends to be cyclical, meaning it works great for a while and then stops).
So what kinds of things should we be testing for? The low hanging fruit is things we all pretty much know are true and don’t have to think too much about. Mean reversion (but the above scatter plots could tell you if it exists on a one day, 2 day, or weekly timeframe), momentum (the tendency for the strongest currency or stock to continue to be strong over a given period), trend following performance following X day breakouts, trend continuation following a trend hitting moving average support, pairs trading, opening gap, earnings drift post announcements, market performance over a given time period, or on Fridays, or OPEX days, etc.
Now you are going to get a lot of dry holes in your research. It’s part of the game. For me, I test 5 different ideas to find one that even has potential, and I would throw away 90% of potential systems. For example, let’s test whether an upmove in the Russ-hole is predictive of an upmove tomorrow (and vice versa) or maybe predictive of a down move (mean reverting over the 1 day timeframe). This big random looking scatter plot means effectively there is no relationship between the data, no exploitable effects or correlations. In simple terms, it’s all bullshit.
So, for shits and giggles let’s test two different theories, which I’m hoping to work into two potential systems over the next few days.
System 1 hypothesis. Trend Following Systems buy breakouts, typically 50 day breakouts, and currently approximately $300 billion is traded on them. Since trend following systems have win rates of between 25%-35%, it stands to reason there will be a pullback effect after breaking out. We have all seen breakouts that continue and go forever, and breakouts that fail. The hypothesis we want to test is:
Is there money to be made shorting 50 day breakouts for a mean reversion scalp?
(this is not dissimilar to Linda Raschke’s Turtle Soup setup which Mole used to trade, back in the day)
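One way to frame the System 1 question as a raw event test, before any stops or targets get involved, is sketched below. Treat it as a starting point under loud assumptions: a breakout is defined here as a close above the highest close of the prior 50 days, the 5 day holding period is a number I pulled out of the air, and costs and slippage are ignored. The function names are hypothetical.

```python
def breakout_forward_returns(closes, lookback=50, hold=5):
    """Return over the next `hold` days after each close above the prior
    `lookback`-day highest close (assumed breakout definition)."""
    forward = []
    for i in range(lookback, len(closes) - hold):
        prior_high = max(closes[i - lookback:i])
        if closes[i] > prior_high:
            forward.append(closes[i + hold] / closes[i] - 1.0)
    return forward


def summarize(forward):
    """Event count, mean forward return, and share of events that fell."""
    count = len(forward)
    if count == 0:
        return {"events": 0, "mean": 0.0, "pct_down": 0.0}
    return {
        "events": count,
        "mean": sum(forward) / count,
        "pct_down": sum(1 for r in forward if r < 0) / count,
    }


if __name__ == "__main__":
    # Toy series: 50 flat days, one breakout day, then a fade (not real data).
    closes = [100.0] * 50 + [101.0, 100.5, 100.0, 99.5, 99.2, 99.0]
    print(summarize(breakout_forward_returns(closes)))
```

A short scalp only has a chance if the mean forward return after breakouts is negative, and meaningfully worse than the mean forward return over all days in the same data. That baseline comparison is left out here to keep the sketch short.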
System 2 Hypothesis: The strongest currency pair on any given day should continue to trend over the following days.
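The System 2 idea can be set up as the same kind of raw test. In this sketch "strongest" is assumed to mean the pair with the biggest one day percentage gain, the 3 day follow-through window is arbitrary, and the input is assumed to be a dict of aligned daily closes per pair. None of that comes from a finished system, and the names are made up for illustration.

```python
def strongest_continuation(closes_by_pair, hold=3):
    """For each day, find the pair with the largest one-day return and record
    (pair, that day's return, its return over the next `hold` days)."""
    names = list(closes_by_pair)
    length = min(len(series) for series in closes_by_pair.values())
    results = []
    for i in range(1, length - hold):
        day_ret = {
            name: closes_by_pair[name][i] / closes_by_pair[name][i - 1] - 1.0
            for name in names
        }
        leader = max(day_ret, key=day_ret.get)   # assumed meaning of "strongest"
        series = closes_by_pair[leader]
        forward = series[i + hold] / series[i] - 1.0
        results.append((leader, day_ret[leader], forward))
    return results


if __name__ == "__main__":
    # Toy data with placeholder names: one series drifts up 1% a day, one is flat.
    data = {
        "AAA": [100.0 * 1.01 ** k for k in range(8)],
        "BBB": [1.0] * 8,
    }
    for leader, today, forward in strongest_continuation(data):
        print(leader, round(today, 4), round(forward, 4))
```

The (today, forward) columns are exactly the X and Y of a scatter plot, so a positive correlation between them would support the hypothesis and a random cloud would kill it.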
Now, I have literally no idea whether these concepts are an edge worth pursuing, but I’m eager to find out. And if one of them is an edge, together we will build a system out of it, and hopefully you guys can work it into something tradeable in the comment section and we can backtest it into a crowdsourced evilspeculator system, ready for prime time.
As for the markets today. I have one tasty setup for you, and some commentary on the indexes. It is important to notice that a massive bar at the end of a long running low volatility trend indicates capitulation, and time to start looking at both short and longside again. Here we have 4 days of pullback, with each successive day being more overlapping and smaller range than the previous day. The obvious conclusion is that the bears are running out of steam, and the bulls are preparing for another assault on fresh highs. I have no idea whether this assault will be successful, and personally I think the easy phase of this move is done.
In retrospect I’m quite chuffed at this exit, even if it was only for a puny 3 contracts. Still, better to be lucky than good 😉
Questions about the scatter plots go in the comment section.