Quantutorial – Linear Regression

by The Mole, April 21, 2017

A few weeks ago, during my trip to Tenerife, Scott produced a post on testing mean reversion in the context of parsing for what I personally categorize as ‘Raw Edge Discovery’ (RED). Since I was on vacation I had very little time to contribute to the ensuing discussion, but I had been planning to circle back to Scott’s post for two reasons. First, I was positively surprised by the high level of interest in machine learning and basic system development. Second, although his post was rather comprehensive, I felt that it could benefit from a more in-depth explanation of the math behind scatter charts, which of course relates directly to linear regression.

Now before you run for cover let me assure you that I’ll keep it very light on the math and promise to provide you with plenty of visual aids to make the process less painful. I personally am not a math whiz by any definition. Plus, as your humble host of more than eight years, I know better than to bore you with dry theory that has little application to your daily reality as a trader. Instead what I would like to do here today is to instill a deeper understanding of linear regression and why it is essential to fully grasp the concept in order to actively pursue RED in the context of your system development. Let’s dive right in:

Linear Regression

[Figure: two scatter plots of random data, each with a best fit line; a poor fit on the left and a good fit on the right]

Here we have two scatter plots which I populated with random data, after which I added a best fit line. If I asked you which one is a better fit you would most likely point at the one on the right. But if I asked why you think it is a better fit then you probably would think for a few seconds and then tell me that it’s because the dots are closer to the line. And you would be exactly right.

Just looking at the plot on the left it is pretty safe to assume that there is either a very weak or no linear relationship between the x and y values of the data points. Each point is produced by two values: x and y. Thus we have an x axis and a y axis, which are an integral part of our formula:

$$ y = mx + b $$

This simple formula is the basic equation which defines our best fit line. For any point along our (horizontal) x axis we plug the x value into the formula, but we also need m and b in order to arrive at y. So how do we figure those out? Well, m is the slope of the line and b is the y intercept. But how do we get those?

$$ m = \frac{\bar{x} \cdot \bar{y} - \overline{xy}}{(\bar{x})^2 - \overline{x^2}} $$

It’s mostly leg work and it looks a lot more intimidating than it really is. We figure out m by:

  1. Multiplying the mean of all the xs by the mean of all the ys (the x and y with the line above them)
  2. minus the mean of all the x times y products (multiply each x by its y first, then take the mean).
  3. Then we take the mean of all our xs and square it
  4. minus the mean of all the squared xs (square each x first, then take the mean).
  5. We divide the result of 2 by the result of 4.
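
The five steps above can be sketched in a few lines of Python. This is only a minimal illustration assuming plain lists of numbers; the function name `best_fit_slope` and the sample data are made up for the example and are not taken from the original post.

```python
from statistics import mean

def best_fit_slope(xs, ys):
    """Slope m of the best fit line, following the five steps above."""
    mean_x = mean(xs)
    mean_y = mean(ys)
    # Mean of the x*y products (multiply each pair first, then average).
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    # Mean of the squared xs (square each x first, then average).
    mean_x_squared = mean([x * x for x in xs])

    numerator = mean_x * mean_y - mean_xy        # steps 1 and 2
    denominator = mean_x ** 2 - mean_x_squared   # steps 3 and 4
    return numerator / denominator               # step 5

# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(best_fit_slope(xs, ys))
```

Note that the denominator is zero when all the xs are identical, so a real implementation would want to guard against that case.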

Now we need to figure out the intercept b, which is much easier:

$$ b = \bar{y} - m\bar{x} $$

All we do is take m multiplied by the mean of all the xs and deduct it from the mean of all the ys. And we are DONE!
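
Putting slope and intercept together, a hedged sketch of the whole best fit calculation might look like the following. The function name `best_fit_line` and the sample values are assumptions chosen for illustration only.

```python
from statistics import mean

def best_fit_line(xs, ys):
    """Return (m, b) for the best fit line y = m*x + b."""
    mean_x = mean(xs)
    mean_y = mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_x_squared = mean([x * x for x in xs])

    # Slope: same five steps as described above.
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x_squared)
    # Intercept: mean of the ys minus m times the mean of the xs.
    b = mean_y - m * mean_x
    return m, b

# Made-up, slightly noisy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

m, b = best_fit_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")

# Use the line to produce a y value for any x.
print(f"y at x = 6: {m * 6 + b:.2f}")
```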

Squared Error

Alright we are ready to put things a bit more into context:

[Figure: the ‘better fit’ scatter plot with red lines marking the distance between individual points and the best fit line]

Here again is the ‘better fit’ scatter plot, but this time I have added a few red lines which attempt to visualize the concept of ‘squared error’ or e². Squared error is calculated by measuring the distance between your best fit line and each respective point on your scatter plot. But just measuring the distance is insufficient: sometimes the distance is positive and sometimes negative (i.e. the point sits below the best fit line), so the raw distances would cancel each other out, and we also want outliers to count more heavily than small misses. For that reason we simply square the distance, which elegantly solves both problems for us.
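
To make that concrete, here is a small Python sketch with made-up numbers. The helper name `squared_error` is just an assumption for the example. It shows that the raw distances can cancel out to zero while the squared distances do not.

```python
def squared_error(ys_actual, ys_line):
    """Sum of the squared distances between the line and each point."""
    return sum((y_line - y) ** 2 for y, y_line in zip(ys_actual, ys_line))

# Made-up example: three points and the line's value at the same xs.
ys_actual = [1.0, 3.0, 2.0]
ys_line = [2.0, 2.0, 2.0]

# One point is above the line, one below, one right on it.
raw_distances = [y_line - y for y, y_line in zip(ys_actual, ys_line)]

print(sum(raw_distances))                  # positive and negative cancel out
print(squared_error(ys_actual, ys_line))   # squaring keeps every miss positive
```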

You may ask why we square it. Why not raise it to the fourth power or use an even bigger exponent? Well, you actually could, but the accepted norm is a squared value and it seems to work pretty well for most applications. There may however be datasets that would benefit from a higher exponent and you are of course free to use one if you a) are mathematically inclined and b) are able to write your own linear regression algorithm. Which is what I actually wound up doing in the context of my own educational endeavors. I felt that it would give me a deeper understanding of the underlying concept, plus I did it in Python, which I am starting to enjoy very much.
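
To get a feel for what a different exponent does, here is a rough sketch with made-up distances. The function name `total_error` is hypothetical, and this only compares error totals; it is not a regression algorithm and not the one mentioned above.

```python
def total_error(residuals, exponent=2):
    """Sum of |distance| ** exponent over all points."""
    return sum(abs(r) ** exponent for r in residuals)

# Made-up distances: two small misses and one outlier.
residuals = [1.0, -1.0, 5.0]

for exponent in (2, 4):
    total = total_error(residuals, exponent)
    # How much of the total error comes from the single outlier.
    outlier_share = abs(residuals[-1]) ** exponent / total
    print(f"exponent {exponent}: total error {total}, outlier share {outlier_share:.1%}")
```

With these numbers the single outlier already accounts for most of the squared error, and its share grows further with the fourth power. In other words, a higher exponent makes the fit care even more about the worst misses.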

 

Coefficient Of Determination (r²)

$$ r^2 = 1 - \frac{SE\,\hat{y}}{SE\,\bar{y}} $$

Okay, let me explain those strange symbols first. SE stands for squared error. The y with the little roof over it (ŷ) stands for the best fit or regression line on the scatter plot above.

[Figure: the scatter plot with the blue best fit line and a black horizontal line marking the mean of the ys]

Finally the y with the little line above it (ȳ) stands for ‘the mean of the ys’, which refers to the black line on the plot above (it is the same mean of the ys we used in step 1 of our m equation). And all we’re going to do is to compare the accuracy of that black line with the accuracy of the blue best fit line. See, that wasn’t so hard!
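
As a minimal sketch of that comparison in Python, assuming we already have the y value of the best fit line for every point (the function names and sample data below are made up for illustration):

```python
from statistics import mean

def squared_error(ys_actual, ys_model):
    """Sum of squared distances between a model line and the points."""
    return sum((y_model - y) ** 2 for y, y_model in zip(ys_actual, ys_model))

def coefficient_of_determination(ys_actual, ys_line):
    """r squared: 1 minus SE of the best fit line over SE of the mean line."""
    # The 'mean of the ys' line has the same value at every point.
    ys_mean_line = [mean(ys_actual)] * len(ys_actual)
    se_line = squared_error(ys_actual, ys_line)
    se_mean = squared_error(ys_actual, ys_mean_line)
    return 1 - se_line / se_mean

ys_actual = [2.0, 4.0, 6.0, 8.0, 10.0]

# A line that hits every point has no squared error at all.
print(coefficient_of_determination(ys_actual, ys_actual))

# A line that is no better than the mean line.
print(coefficient_of_determination(ys_actual, [6.0] * 5))
```

If all the ys are identical the SE of the mean line is zero and the division fails, so real code would need to handle that case.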

Examples

Now that you understand the basic concept let’s look at a few example values of r². What we already know is that the range we’re going to be dealing with is 0 to 1. If r² = 0.8 then the result of the division must be 0.2, right? That would be the case if, for example, the SE of the best fit line were 2 and the SE of the mean of the ys were 10.

Given that example the squared error of the blue best fit line is significantly smaller than that of the (black) mean line. That’s a good thing of course, and it means that our data is pretty linear. If the r² is 0.3 on the other hand then the result of the division in the formula above must be 0.7, which in turn may be produced by 7 / 10. So now the SE of the best fit line is only marginally better than the SE of the mean line, and yes, all other things being equal, that is a bad thing. Of course how much accuracy you can expect from your model always depends on the type of data you are analyzing.
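
The two examples above are easy to check with a couple of lines of Python. The function name `r_squared` is just an assumption for this sketch, and the SE values are the example numbers from the text.

```python
def r_squared(se_best_fit, se_mean_line):
    """r squared from the two squared errors."""
    return 1 - se_best_fit / se_mean_line

# SE of the best fit line is 2, SE of the mean line is 10.
print(round(r_squared(2, 10), 2))   # 0.8

# SE of the best fit line is 7, SE of the mean line is 10.
print(round(r_squared(7, 10), 2))   # 0.3
```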

Done!

And there you go, NOW you understand the basic concept of linear regression and the mythological r² value you probably have been hearing about every once in a while. In a future post we’ll dive in a little deeper in order to understand how we can actually leverage our newly gained knowledge.


About The Author
The Mole
Mole created Evil Speculator amidst the chaos of the financial crisis in early August of 2008. His vision for Evil Speculator is a refuge of reason, hands-on trading knowledge, and inspiration for traders of all ages and stripes.
  • ridingwaves

    holy moly Sir Mole, you have some mad Math skills….I will bookmark and get back to it after my South America sojourn…Great write up!
    picked up 1/2 UCO

  • Tomcat

    If only there was a system to predict when and what the orange puppet will say next, it will be the most successful system in the next few years.

  • Mark Shinnick

    Yeah super clear understanding, feels great.

  • StockTalker

    Crude in trouble below 50

  • BobbyLow

    Hope so. :)

  • StockTalker

    Looking at these waves it looks like a possible wave wonker special right now

  • Mark Shinnick

    Long tza

  • Mary

    OPEC meetings this weekend …

  • BobbyLow

    Y=MX +B just brought back some not so good memories. Y=MX +B is some pretty basic stuff but working with co-efficients and quadratic equations without a decent understanding of algebra is what makes most of this stuff so difficult to understand. The reason why I mention this is because I was such a shitty (did only enough to get by) high school student that it made every collegiate math related course that I took later in life much more difficult than it should have been. :)

  • Darkthirty

    Seriously, the question is Did the jump make the horse fart…or did the fart make the horse jump? News means nothing, but is often used as a reason…….

  • BobbyLow

    Can’t look at too many waves because they make me seasick. But I am short crude though. :)

  • ridingwaves

    or it might be buy entry…
    just touched-tested the 2 candles with tails from 3/31-4/4

  • StockTalker

    That looks like a good place right now, been short all week waiting for some direction.

  • Mark Shinnick

    Raise stop.

  • BobbyLow

    Looks like a good play.

    I have no idea how this one is going to turn out for me but it looked weak enough to pull the trigger and fit my old as the hills approach of selling weakness and buying strength.

  • HD

    Dude! That’s some smart shit. My experience with linear regression, as far as charting is concerned, is that you can draw fibs and typically find the values easier. Of course TOS has a tool for regression lines etc but that is volume weighted I believe. For example, if you use the scatter plot in your first coordinate grid image you can see y values of apprx. 79 & -65. The sum, midpoint, or 50% fibonacci value, all equal, near 14 – exactly where the regression line is. IF I’m understanding where you are going with this some refer to the set up as mean reversion. Just my $0.02

  • Mark Shinnick

    Nice place to rally here, has inflection potential so also a nice place for some fear.

  • HD

    nice call. You guys r good. 5 point rally right on your comment.

  • Mark Shinnick

    Well, seriously, using volatility as an estimate of heavy bear interest and how they just stood there yesterday and got skewered …but have still not left the danger, they look like they need to be killed off.

  • HD

    Yikes, I think I’m in the skewered camp. They did let me out today but I’m still a seller of rallies. Using 2355 resistance.

  • Mark Shinnick

    Yeah…keep the faith and stick to plan; stay nimble.

  • StockTalker

    Signal is FU

  • BKXtoZERO

    carbon copy here………

  • Mark Shinnick

    Just more Machiavellia… maybe yesterday worked so well, it’s become time to drop a bit to load up on some more bears. Volatility at resistance now. On the other hand…sometimes the bears are right. Anyway, the warfare we are privy to never ceases to amaze :)

  • Tomcat

    BTFD, looks like another healthcare BS is “sticking”

  • Tomcat

    for those (middle/low income folks) that still think the Orange puppet has their backs, should read this:
    https://blogs.wsj.com/moneybeat/2017/04/21/grab-your-pitchfork-america-your-401k-may-need-defending-from-congress/

  • almez

    Is the zero stuck?

  • BTrader

    yes it is.

  • Mary

    Are you a female?

  • HD

    SPX working on another 5 point rally to sell?

  • Tomcat

    No, are you a male?

  • Mary

    No

  • Ronebadger

    Question: Why do people refer to points as “handles”…just trying to be cool? I keep hearing/reading this…it just sounds stupid.

  • Mark Shinnick

    It worked a lot better for shorthand in the pits at lower prices and when ranges traded within a single handle for longer.

  • Mark Shinnick

    You know, Trump in a relative sense to all politics that went before, pulled it all off all by his friggin self. Does puppet really ring true? Are you saying he seems to have no backbone to well-moneyed interests after being elected?

  • ZigZag

    Crap – just noticed.

  • Tomcat

    I am not sure what you are asking, all I was trying to do is grab some attention with clear facts, of how some of the initiatives (i.e. tax reform, health care etc.) of his administration are gonna screw a lot of his support base and beyond that the whole middle/low income class.

  • Mark Shinnick

    Things are already pretty screwed / rigged for so many who had every reason it would get worse in the first place. BTW, couldn’t follow the link without a sign-on/subscription.

  • http://evilspeculator.com Sir Mole III

    Hey guys – the Zero is back up. My apologies but it seems that the entire VM crashed and burned while I was out running an errand. It’s back up again.

  • phantomflash

    I have observed that for some futures contracts, brokers and pit traders may commonly refer to a “point” as being something less than a 1.00 increment. Therefore in some cases the word “point” is actually ambiguous depending on context. So “handle” became adopted to unambiguously denote an increment of 1.00.

  • Trouzzer_Snake

    Regardless of personal opinion of Trump, one has to admit that the guy has a very unique psyche where he literally wills himself to accomplish personal goals. I think it’s amazing what he overcame to be president.

  • http://evilspeculator.com Sir Mole III

    No, that’s NOT how you use it – I strongly recommend that you read Scott’s post to get an idea of how to plot the RAW EDGE of your trading ideas. Which is extremely important in the context of your whole EWT obsession. For example, start recording the instances of triggers that *could have been interpreted as signals for a 2nd or 3rd wave looming ahead* – then define target areas. Use the resulting vector in the scatter chart and you will see if there actually IS a raw edge. Clearcut – no hindsight rationalization and changing of wave counts.

  • HD

    r u from AZ? Nice avatar. Go Cats!

  • http://evilspeculator.com Sir Mole III

    I suck at it as well but will do my best to start conveying the basic concepts. I hope you learned something today!

  • http://evilspeculator.com Sir Mole III

    I guess we have a match!

  • http://evilspeculator.com Sir Mole III

    Way too much emotion for my taste – not productive IMO. Let’s focus on the stuff we DO have control over as opposed to resorting to defeatist perspectives which do not offer us an edge.

  • http://evilspeculator.com Sir Mole III

    See above – I have to ditch that fucking hosting company.

  • Mary

    No thank you. Too emotional for me. I like the alpha males.

  • Mark Shinnick

    Yeah…pretty critical inflection here.

  • HD

    “Obsession?” :-) alrighty then, you must have traded EWT in a vacuum. That’s not best practice. If you had counts and were changing them I kinda think you were using Prechter’s method - also not best practice. JMO nothing personal.

  • Ronebadger

    Yeah…then this is “basis points”

  • Tomcat

    Alpha male or pussy grabber? Because if you knew the definition, you would know the difference.

  • HD

    March OPEX closed 2378, they did test that level this month for a few hours and sellers have been in control since. April OPEX looking to close under 23(55) can lower the R levels again. 2344, 2334 still have supply but then 2300 seems doable.

  • BobbyLow

    I’ve been looking for some detail about meetings this weekend and can’t find any. The only one I could find is for May 25th?

  • Mark Shinnick

    Yes, the bear thing has been dwelling for months now. Someone probably knows right where to nail them.

  • BobbyLow

    Does anyone know anything about an Opec meeting this weekend?

  • Mark Shinnick

    I like the times when all the distinctions don’t matter :)

  • Tomcat

    I think that is the next formal/official meeting, but the risk with holding a short over the weekend is that you are one MOAB (thrown in a big/country producer) away from being wiped out.

  • BobbyLow

    You’re right. Too much risk holding short crude over the W E. I can always re-short on Monday if need be.

    I’m sitting on a decent profit so no sense screwing around here. I’m closing my position at today’s close currently at +.6R. Bird in hand and all that rot. :)

  • BobbyLow

    Thanks RW. I decided to take profit on my short regardless and not hold it over the weekend. I’ll take another look on Monday.

  • ridingwaves

    I’m L but could be stopped out pretty quick….49.15

  • http://greenlander1.blogspot.com/ Greenlander

    I got long XOP today and planning on re- adding more oil exploration names like MRO, COG, APA, CHK next week. Really like this spot to bounce.

  • http://evilspeculator.com Sir Mole III

    You’re missing my point. What I am referring to is the fact that you are using EWT without ever having proven its supposed predictive quality. It does not matter which count you follow – they are all subjective and based on a mental framework that remains unproven. I continue to encourage you to actually go and do the work yourself – for yourself. Plus I have been giving you all the tools to do just that. So what’s keeping you?

    You continue to talk about counts, theories, ideas, which all are intangible. They are mental constructs without any statistical or mathematical basis. It’s not something an institutional trader would ever waste his/her time on. So why ARE you using EWT?

    Inquiring minds want to know.

  • Scott Phillips

    The best description I ever read of Trump is a guy who was born on 3rd base convinced he just hit a triple.

    Born into extraordinary wealth. Gifted his father’s extensive business contacts when he went into business. Still fucked up and went broke or nearly went broke multiple times.

    His return on his own business affairs is less than if he had put his wealth in an index fund.

    He’s a poor man’s idea of what a rich man is, a coward’s idea of what a tough man is.

    The scammiest scammer ever to scam his way into anything.

  • Scott Phillips

    No dude. Just no. Also FYI 50% isn’t a fib number, although it IS the only “fib number” with any actual basis.

  • Scott Phillips

    Superb post Mole.

    One thing I will add, is that if you DON’T want to do this kind of scatter plot, you can still trade, but you are limited to trading phenomena you KNOW are real.

    This is a very easy litmus test for “is it an edge?”

    Without an edge, you are literally wasting your time. It’s why the wave theory guys talk and talk and talk and NEVER bank any coin.

  • HD

    I’m not sure if you’re being confrontational. Sometimes the internet makes things sound worse than meant to be. I’m not going to take it that way and hope you don’t either. BUT WTF! “No mathematical basis!?!” I’m not here to defend EW. I came in on Tuesday and said I had an EW set up for CL and was a seller >$53. You think that was a fluke? Posted a 10 handle short right after ES, sellers. You think that’s not “banking coin?” Scott. Are you guys even traders? I also sold GC in RT here from an EW 335 pattern. You want to ignore the SPX pattern? I can quantify the waves. You never gave me a chance to though. I was also very clear EW is not a primary for entering a trade. You think anybody really just trades an EW count? If that’s what you were doing then I overestimated the IQ in the room. You have to make it proprietary, yours, you have to quantify waves and you have to have an indicator to make that happen. Why the hell do you think I’m here? Looking at your zero. You guys are a trip.

  • http://evilspeculator.com Sir Mole III

    My goal now is to produce those plots right out of python by ingesting a dataset and preparsing the data to produce a dataframe which represents the entry and exit conditions. Run it through the LR algo and you’re done. Boom – RAW EDGE DISCOVERY parser and no bloody Excel sheets required.

  • http://www.captainboom.com/ captainboom

    It’s amazing how much you can pick up as an older student. I never had anything past basic algebra and geometry in high school. After a stint in the navy, I went to college and successfully achieved some pretty heavy calculus topics with excellent understanding. Focus and a more mature mind can work wonders.

  • Scott Phillips

    Excellent!

  • http://gerb-reloaded.blogspot.com/ Gold_Gerb

    Dollar drop 1%

    I may pee glitter, shit cupcakes, and fart rainbows, but i DO NOT hold over the weekend.

    what is this World coming to?

    http://media.10news.com/photo/2017/04/18/starbuckunicorn_1492551944656_58493011_ver1.0_640_480.jpg

  • Mary

    OPEC meeting was a misprint on the forex factory calendar. They fixed it. Sorry for the confusion.

  • phantomflash

    Well, “basis points” refer to yield, not price of a contract. What I’m talking about are just called “points.” Sorry I don’t remember what contract(s) were commonly done this way in the pit — this was way back when I had to call a futures broker to make trades, and she told me about this. As Mark said, it was shorthand when every tick was important, and sometimes they called them “points” instead of “ticks.”

  • phantomflash

    Thanks – I had to customize it a bit, but it came out pretty good I think. Actually from Texas, but spent 6 years at the good ol’ U of A. Many wonderful memories at the Great Desert University, like playing Frisbee when it was 100 degrees F – at midnight! Also love the beautiful desert itself. Unmatched sunsets. Still have a number of friends in Tucson. Bear Down!