The Winner of the 2008 United States Presidential Election: Statistics

The polls have closed, the votes have been counted, and a clear winner has emerged from this election: statistics.  Okay, Barack Obama also won, but everyone else is covering that story.  So, let’s cover the real victory here, a victory for mathematics!

Statistics is, broadly defined, the branch of mathematics that deals with taking an incomplete set of data and extrapolating how a complete set of data would look.  Statistics works, period.  The way data behaves is well studied, and as long as you follow the correct methodology, statistics can give you some powerful insights.  However, it is easy to misinterpret the results that statistics gives you, and that kind of misinterpretation is rampant.  As the line Mark Twain attributed to Benjamin Disraeli goes, “There are three kinds of lies: lies, damned lies, and statistics.”

Despite the unintuitive nature of statistics, it does work as long as it is used properly.  Whatever your instincts may tell you, a poll of a thousand random people is enough to get an idea of how the whole country will vote.  If statistics works, why has polling so often been wrong?  In 2004, early exit polls had Kerry ahead when election night rolled around, but he lost.  So, what happened?
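
To put a rough number on the "thousand random people" claim, here is a small sketch of the textbook margin-of-error formula.  It assumes a truly random sample and a 95% confidence level (z = 1.96); the function name and its defaults are my own, not anything a pollster publishes.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    proportion=0.5 is the worst case (widest margin); z=1.96 corresponds
    to the usual 95% confidence level. Both are assumptions of this sketch.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Print the margin, in percentage points, for a few sample sizes.
for n in (100, 1000, 10000):
    print(n, round(100 * margin_of_error(n), 1))
```

A thousand people gets you to about 3 percentage points.  Going to ten thousand only tightens that to about 1 point, which is why most polls stop around a thousand.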

The polls make many mistakes.  Well, they aren’t necessarily mistakes, per se, but they don’t demonstrate the right thing.

The first thing they do is ask the wrong question.  Most polls ask the question, “If the election were held today, who would you vote for?”  Well, that’s not a question anyone really cares about, because the election won’t be held “today”; it’ll be held on the first Tuesday after the first Monday of November.

The second thing they do is report a number that you shouldn’t care about.  Polls report the percentage of those polled who will vote for a candidate, but what you really want to know is the chance that a given candidate is ahead of the other candidate.  If a poll reports that Candidate A has 49% of the vote and Candidate B has 51% of the vote, you really want to know the chance that Candidate B is actually leading.  This issue comes down to a lack of understanding of a term that gets thrown around a lot: margin of error.

The margin of error is a radius around the reported value.  With a certain probability (called the confidence level, which is almost always 95%), the true value lies within that radius.  So, in our aforementioned example, if the margin of error is 3 percentage points, there is a 95% chance that the true value for Candidate A is somewhere between 46% and 52% and the true value for Candidate B is somewhere between 48% and 54%.  As you might have noticed, there’s a pretty big overlap there.  So, there’s a definite chance that the true values are Candidate A with 52% of the vote and Candidate B with 48% of the vote.  In other words, while our poll showed Candidate B in the lead, there is a real chance that Candidate A is the one actually ahead.  The value you really want is the chance that Candidate B is actually in the lead over Candidate A, which I have never seen reported in a poll.  This isn’t really a flaw, so much as a report of the wrong piece of information.
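
Here is one rough way to get at that missing number for the 49%/51% example.  This is a back-of-the-envelope sketch, not how any pollster or forecaster actually does it: it assumes a simple random sample, a normal approximation, and only two candidates whose shares add up to 100% (so an error in one share is mirrored in the other, which doubles the uncertainty in the lead).  The function name is made up for illustration.

```python
from statistics import NormalDist

def chance_of_leading(share_a, share_b, margin_of_error, z=1.96):
    """Rough chance that B is truly ahead of A, given one poll.

    Assumptions of this sketch: the margin of error is quoted at the
    confidence level matching z, errors are normally distributed, and
    the two shares sum to 100%.
    """
    standard_error = margin_of_error / z      # uncertainty in one share
    lead = share_b - share_a                  # B's lead in the poll
    lead_standard_error = 2 * standard_error  # the lead moves twice as fast
    return NormalDist().cdf(lead / lead_standard_error)

print(round(chance_of_leading(0.49, 0.51, 0.03), 2))
```

Under those assumptions, Candidate B comes out as the likely leader at roughly three chances in four, which tells you a lot more than "51% plus or minus 3".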

Third, statistics largely depends on the samples being random.  While it seems like simply having a computer pick a thousand random phone numbers is sufficient, it often isn’t.  For one, many people who are called by pollsters don’t want to take the survey.  For another, most polls still rely on landlines, which are falling out of use.  Many in the younger generation use their cell phones as their primary point of communication.  There are other issues around who gets polled, but the end result is you don’t always get an unbiased, truly random sample of people.

So, if all of these polls are fundamentally flawed in one or more ways, then it’s all useless, right?  No!  Here is where the beauty of statistics comes in.  The error that is introduced by these various flaws is well-behaved and well-studied by statistics.  Furthermore, as long as different polls err in different directions, these flaws tend to average out to nothing over enough polls.  So, if you get a lot of bad polls and add them together, you can get a good poll.  Crazy, I know, but it works.
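
If that sounds too good to be true, a toy simulation makes the point.  Everything in it is invented for illustration: the true support of 53%, the sample size of 1000, and the assumption that each poll leans by a couple of points in a random direction that is zero on average.  It is not FiveThirtyEight's method, and if every poll leaned the same way the averaging would not rescue you.

```python
import random

def simulate_poll(true_support, sample_size, bias_sd, rng):
    """One flawed poll: a random sample plus the poll's own lean."""
    lean = rng.gauss(0, bias_sd)  # assumed to be zero on average
    support_seen = min(max(true_support + lean, 0.0), 1.0)
    yes_votes = sum(rng.random() < support_seen for _ in range(sample_size))
    return yes_votes / sample_size

def simulate_polls(n_polls, true_support=0.53, sample_size=1000,
                   bias_sd=0.02, seed=2008):
    """Run many independent flawed polls with a fixed seed."""
    rng = random.Random(seed)
    return [simulate_poll(true_support, sample_size, bias_sd, rng)
            for _ in range(n_polls)]

TRUE_SUPPORT = 0.53
polls = simulate_polls(200, true_support=TRUE_SUPPORT)

# How far off is a typical single poll, versus the average of all of them?
typical_miss = sum(abs(p - TRUE_SUPPORT) for p in polls) / len(polls)
combined_miss = abs(sum(polls) / len(polls) - TRUE_SUPPORT)

print("typical single-poll miss:", round(100 * typical_miss, 2), "points")
print("miss of the poll average:", round(100 * combined_miss, 2), "points")
```

The individual polls wander a couple of points away from the truth, while the average of all of them lands much closer.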

Up until now, this sort of meta-analysis of statistical data hasn’t been accessible to the average voter.  Well, in the 2008 election cycle, that changed.  In an earlier post, I talked about a site called FiveThirtyEight.  This site is run by a statistician who knows these things as well as I do.  He took all the polls and used them to create a single good statistical indicator of the election results.

And now that the election is over, we can see how well he did.  What follows is the election map as it was called after the polls closed, followed by FiveThirtyEight’s prediction, based on his methodology.

Wow.  In large part, the map is spot on.  Indiana did go for Barack Obama, but FiveThirtyEight’s prediction only had it coloured light red, which meant it leaned McCain but was not definite.  This shows that statistics, when properly used, has incredible predictive power.  Nate Silver, the person who runs FiveThirtyEight, will certainly have a career in political analysis, and he has my personal thanks for using statistics properly.
