Statistics is, broadly defined, the branch of mathematics that deals with taking an incomplete set of data and inferring what the complete set would look like. Statistics works, period. The way data behaves is well studied, and as long as you follow the correct methodology, statistics can give you some powerful insights. However, it is easy to misinterpret the results that statistics gives you, which is a rampant problem. As the old saying goes, popularised by Mark Twain and attributed (probably apocryphally) to Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

Despite the unintuitive nature of statistics, it does work as long as it is used properly. Whatever your instincts may tell you, a poll of a thousand random people is enough to get an idea of how the whole country will vote. But if statistics works, why has polling so often been wrong? In 2004, Kerry was ahead in the polls when election night rolled around, but he lost. So, what happened?

The polls make many mistakes. Well, they aren’t necessarily mistakes, per se, so much as measurements of the wrong thing.

The first thing they do is ask the wrong question. Most polls ask, “If the election were held today, who would you vote for?” Well, that’s not a question anyone really cares about, because the election won’t be held “today”; it’ll be held on the first Tuesday after the first Monday of November.

The second thing they do is report a number that you shouldn’t care about. Polls report the percentage of those polled who will vote for a candidate, but what you really want to know is the chance that a given candidate is ahead of the other. If a poll reports that Candidate A has 49% of the vote and Candidate B has 51%, you really want to know the chance that Candidate B is actually leading. This issue comes down to a lack of understanding of a term that gets thrown around a lot: *margin of error*.

The *margin of error* is a radius around the reported value. With a certain probability (called the *confidence level*, which is almost always 95%), the true value lies within that radius. So, in our aforementioned example, if the margin of error is 3 percentage points, there is a 95% chance that the true value for Candidate A is somewhere between 46% and 52%, and the true value for Candidate B is somewhere between 48% and 54%. As you might have noticed, there’s a pretty big overlap there. So, there’s a definite chance that the true values are Candidate A with 52% of the vote and Candidate B with 48%. While our poll showed Candidate B in the lead, there is a real chance that Candidate A is actually ahead. The value you really want is the probability that Candidate B is genuinely leading Candidate A, which I have never seen reported alongside a poll. This isn’t really a flaw so much as a report of the wrong piece of information.
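That probability is actually easy to estimate from the numbers the poll already reports. Here is a rough sketch using a normal approximation, with the assumptions stated in the comments (in particular, that the two candidates’ shares sum to 100% and that the margin of error is quoted at 95% confidence):

```python
import math

# Probability that Candidate B truly leads, from the example poll above.
# Assumptions: shares sum to 100%, and the 3-point margin of error is a
# 95%-confidence radius on a single candidate's share.
p_b = 0.51                      # Candidate B's reported share
moe = 0.03                      # margin of error (95% confidence)
se = moe / 1.96                 # implied standard error of one share
se_diff = 2 * se                # shares sum to 1, so the lead moves 2-for-1
lead = p_b - (1 - p_b)          # B's reported lead: 2 points
z = lead / se_diff
prob_b_leads = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
print(round(prob_b_leads, 2))   # → 0.74
```

So a “51 to 49” poll with a 3-point margin of error translates to only about a three-in-four chance that Candidate B is really ahead, which is a far more honest headline number than “B leads by 2”.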

Third, statistics depends heavily on the samples being random. While it seems like simply having a computer pick a thousand random phone numbers would be sufficient, it often isn’t. For one, many people who are called by pollsters decline to take the survey. Second, most polls still rely on landlines, which are falling out of use; much of the younger generation uses cell phones as their primary point of communication. There are other issues around who gets polled, but the end result is that you don’t always get an unbiased, truly random sample of people.
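You can see how badly a non-random sample skews things with a toy simulation. All the numbers below are invented for illustration: suppose young voters favour Candidate A and are much harder to reach by landline.

```python
import random

random.seed(0)

# Made-up electorate: 40% "young" voters (70% support A) and 60% "older"
# voters (45% support A). Young voters are only 10% as reachable by landline.
population = (
    [("young", random.random() < 0.70) for _ in range(40_000)]
    + [("older", random.random() < 0.45) for _ in range(60_000)]
)
true_support = sum(votes_a for _, votes_a in population) / len(population)

reachable = [
    voter for voter in population
    if voter[0] == "older" or random.random() < 0.10
]
sample = random.sample(reachable, 1000)   # a "random" poll of 1,000 people
polled_support = sum(votes_a for _, votes_a in sample) / len(sample)

print(f"true: {true_support:.3f}, landline poll: {polled_support:.3f}")
```

The poll of a thousand people is perfectly random *among the reachable*, yet it understates Candidate A’s true support by several points, an error no margin-of-error calculation will warn you about.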

So, if all of these polls are fundamentally flawed in one or more ways, then it’s all useless, right? No! Here is where the beauty of statistics comes in. The error that is introduced by these various flaws is well-behaved and well-studied by statistics. Furthermore, over enough polls, these flaws tend to average out to nothing. So, if you get a lot of bad polls and add them together, you can get a good poll. Crazy, I know, but it works.
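A small simulation shows the averaging effect. The 2-point spread of pollster “house” errors below is an assumption chosen for illustration, not a measured figure:

```python
import random
import statistics

random.seed(42)

# Toy model: each pollster has its own systematic "house" error on top of
# ordinary sampling noise. Individually the polls are off; averaged
# together, the errors largely cancel out.
TRUE_SUPPORT = 0.52

def run_poll(n=1000):
    house_bias = random.gauss(0, 0.02)   # pollster's systematic error
    p = min(max(TRUE_SUPPORT + house_bias, 0.0), 1.0)
    return sum(random.random() < p for _ in range(n)) / n

polls = [run_poll() for _ in range(30)]
typical_single_error = statistics.mean(abs(p - TRUE_SUPPORT) for p in polls)
error_of_average = abs(statistics.mean(polls) - TRUE_SUPPORT)
print(f"typical single poll error: {typical_single_error:.3f}")
print(f"error of the average:      {error_of_average:.3f}")
```

The average of thirty mediocre polls lands much closer to the truth than a typical individual poll does, which is exactly the “bad polls add up to a good poll” effect.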

Up until now, this sort of meta-analysis of statistical data hasn’t been accessible to the average voter. Well, in the 2008 election cycle, that changed. In an earlier post, I talked about a site called FiveThirtyEight. This site is run by a statistician who knows these things as well as I do. He took all the polls and used them to create a single good statistical indicator of the election results.

And now that the election is over, we can see how well he did. What follows is the election map as it was called after the polls closed, followed by FiveThirtyEight’s prediction based on his methodology.

Wow. In large part, the map is spot on. Indiana did go for Barack Obama, but the prediction only had it coloured light red, which meant it leaned McCain but was not definite. This shows that statistics, when properly used, has incredible predictive power. Nate Silver, the person who runs FiveThirtyEight, will certainly have a career in political analysis, and he has my personal thanks for using statistics properly.


When delivering a pick-up line, the desire is to get a positive result, i.e. getting a phone number, “hooking up”, or what have you. Current pick-up line theory rests on the practice of delivering a clever line followed by the request, under the premise that showing yourself to be a humorous individual will give you a better than average chance of a positive result.

For example, a classic pick-up line is “I lost my phone number. Can I have yours?” The problem with such pick-up lines, however, is that they still offer the recipient the chance of a negative response. Furthermore, overuse has made them cliché, so they are now more likely to make the deliverer look pathetic than funny.

The solution to the stale theory in the field of pick-up line creation is to apply mathematics to build a better pick-up line. Specifically, we can use logic to construct a pick-up line that always evaluates to a positive result, regardless of the answer.

“If I were to ask you for your phone number, would your answer be the same as the answer to this question?”

If the recipient answers “no”, that implies she would say “yes” to a request for her phone number. Similarly, if the recipient answers “yes”, that also implies she would say “yes” to a request for her phone number. Thus, we have created a logically perfect pick-up line: no matter how the recipient responds, she is giving you a positive result.
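If you don’t trust the word-level reasoning, the claim can be checked by brute force. A truthful answer to the question must equal the truth of “my answer to the number request is the same as my answer to this question”, so we can enumerate every case:

```python
# Enumerate every combination of (answer to the question, willingness to
# give the number) and keep only the self-consistent ones: a truthful
# `answer` must satisfy answer == (gives_number == answer).
def consistent(answer, gives_number):
    return answer == (gives_number == answer)

outcomes = [
    (answer, gives_number)
    for answer in (True, False)
    for gives_number in (True, False)
    if consistent(answer, gives_number)
]
print(outcomes)  # → [(True, True), (False, True)]
```

Both self-consistent cases have `gives_number == True`; there is no consistent way to answer the question while withholding the number.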

Of course, we are assuming a two-valued Boolean logic system with the values “yes” and “no”. Reality tends to favour a three-valued logic system when it comes to pick-up lines: “yes”, “no”, and a slap to the face.

Okay, maybe a bit more than seven words, but those are the seven that need to be said.

One of my favourite statistics quotes is also one of his funniest lines.

Just think of how stupid the average person is, and then realize half of them are even stupider!

George Carlin

George Carlin was my favourite comedian. His comedy was a unique and insightful mixture of crass language and deep social commentary. He had a way of pointing out exactly what was wrong with the world and making it funny at the same time.

Many modern comedians make it their focus to satirise current events, but George Carlin perfected it long before any of them were standing in front of their first audience, delivering their jokes.

George Carlin will be sorely missed by yours truly and I’m sure a multitude of people around the world. Today is a sad day because the world saw the departure of one of its funniest people.

George, your soul has been flung to the highest roof and may it always be there.

However, I’m not here to talk about baseball; I’m here to talk about politics. In early 2008, Nate Silver started a site called FiveThirtyEight, a reference to the total number of electoral votes in a modern Presidential election. The site proclaims to do, in their own words, “Electoral Projections Done Right.”

After looking at the site quite a bit, I’m going to agree with their tagline. At worst, they are doing electoral projections infinitely better than they have been done before, to my knowledge. What makes FiveThirtyEight so much better than other sites that do electoral projections? Maths.

The entire site has a strong mathematical bent to it. It isn’t simply an aggregate of polls, but rather a detailed analysis of said polls. Each poll is weighted according to its reliability, which they base on the pollster’s past accuracy, the sample size, and the recency of the poll. The polls are then aggregated according to their reliability, and a regression estimate is added to produce an average. In layman’s terms, they’re using some seriously sound maths to analyse the polls. You can read more about their methodology in their FAQ.
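To make the idea concrete, here is a minimal sketch of a reliability-weighted poll average. The polls, the square-root-of-sample-size weighting, and the seven-day half-life are all invented for illustration; FiveThirtyEight’s actual weighting scheme is considerably more involved:

```python
# Illustrative only -- not FiveThirtyEight's real formula.
# Each hypothetical poll: (reported support, sample size, age in days).
polls = [(0.51, 1200, 2), (0.48, 600, 10), (0.53, 900, 5)]

def weight(sample_size, age_days, half_life_days=7):
    # Assumed scheme: precision grows with sqrt(n); trust decays with age.
    return sample_size ** 0.5 * 0.5 ** (age_days / half_life_days)

total = sum(weight(n, age) for _, n, age in polls)
weighted_avg = sum(p * weight(n, age) for p, n, age in polls) / total
print(round(weighted_avg, 3))
```

The big, fresh polls dominate the average while the small, stale poll contributes only a little, which is the basic intuition behind weighting by reliability.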

In addition to weighted averages of polls, they do election forecasting based on their data. They have trackers that calculate trends in the polling. And they blog about the election too. In short, FiveThirtyEight is one of the most comprehensive and well thought out election blogs on the net today, and it is mathematically delicious (mathelicious?).

Now, on to the stupid!

Prenatal and Postnatal Exposure to Cell Phone Use and Behavioral Problems in Children. [via The Independent]

Mothers of 13,159 children completed the follow-up questionnaire reporting their use of cell phones during pregnancy as well as current cell phone use by the child. Greater odds ratios for behavioral problems were observed for children who had possible prenatal or postnatal exposure to cell phone use.

…

Exposure to cell phones prenatally (and, to a lesser degree, postnatally) was associated with behavioral difficulties such as emotional and hyperactivity problems around the age of school entry.

These associations may be noncausal and may be due to unmeasured confounding. If real, they would be of public health concern given the widespread use of this technology.

Repeat after me: correlation does not imply causation. The researchers of this particular study ran a survey and found a pretty high positive correlation between parents using their cell phones during pregnancy and their kids developing behavioural problems later in life. Until you do a detailed study of the specific interaction between cell phones and pregnancy, a correlation is not sufficient to draw any conclusions about causation. This research found a potential relationship between cell phone use during pregnancy and behavioural problems in kids: that’s great! Unfortunately, their conclusions heavily imply that cell phone radiation causes these behavioural problems, which is a spurious conclusion to say the least.

Now, I am glad that they, at least, did acknowledge that the relationship may be non-causal. However, they should have said that the relationship “may be causal” and assumed a non-causal relationship instead of the other way around. There’s simply not enough information in this study to draw any conclusions about a causal relationship.

They did do some work to remove confounding variables, such as drinking or smoking while pregnant, which is good. However, they failed to consider the multitude of other possibilities that could explain this relationship. For example (and I’m not saying these are any truer than their conclusion, merely alternate possibilities): perhaps heavy use of cell phones while pregnant is strongly correlated with an inattentive parenting style, and inattentive parenting causes behavioural problems. Perhaps the behavioural problems are genetic (so both the mother and child may suffer from them), and these issues cause a tendency to use cell phones a lot more; maybe gossiping or neophilia (love of new things) is symptomatic of these behavioural problems. Perhaps something else is causing behavioural problems to crop up in children more often now than in the past, and the correlation with cell phone use is simply a coincidence because both are new. Or perhaps, as the researchers seem fond of believing, cell phone radiation alters the brains of children and causes these behavioural problems. I could go on, but I’ll spare you.

What do all of these hypotheses have in common? They’re all equally valid and all equally unsubstantiated. The correlation that these researchers found demonstrates one thing and one thing only: there is a correlation. Drawing any conclusion from that correlation requires additional study into the menagerie of possibilities that might explain it.
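It is worth seeing just how easily a confounder manufactures a correlation out of nothing. In the toy model below, a single hypothetical maternal trait drives both heavy cell phone use and behavioural problems; the phone itself has zero causal effect, yet the survey-style comparison finds a dramatic difference:

```python
import random

random.seed(1)

# Sketch: one unmeasured confounder ("trait", entirely hypothetical)
# causes both heavy cell phone use and behaviour problems. Cell phone use
# never influences behaviour problems anywhere in this model.
rows = []
for _ in range(10_000):
    trait = random.random()                       # unobserved confounder
    heavy_cell_use = random.random() < trait      # caused by the trait
    behaviour_problem = random.random() < trait   # also caused by the trait
    rows.append((heavy_cell_use, behaviour_problem))

users = [problem for use, problem in rows if use]
non_users = [problem for use, problem in rows if not use]
rate_users = sum(users) / len(users)
rate_non_users = sum(non_users) / len(non_users)
print(f"problem rate, heavy users: {rate_users:.2f}")
print(f"problem rate, non-users:   {rate_non_users:.2f}")
```

Children of heavy users show roughly double the rate of behavioural problems, a correlation every bit as strong as the one in the study, produced by a model in which cell phones do nothing at all. That is exactly why the correlation alone cannot pick out which hypothesis is true.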
