Tuesday, December 18, 2007

Understanding "Margin of Error"

by Ali-Asad

"Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital."
Aaron Levenstein


The media constantly feeds us a barrage of numbers from opinion polls on vital topics such as the war in Iraq, presidential approval, and who will win in Iowa and New Hampshire. But little emphasis is put on an accompanying number: the margin of error. First, I must point out that I am just as guilty of this as anyone else. In my 'Edwards in Iowa' post below I suggested that former Senator John Edwards may cause an upset in Iowa because supporters of the lower-polling candidates prefer him as their second choice. The numbers:

Edwards - 29%
Obama - 24%
Clinton - 15%

So, what was the margin of error? The Rasmussen report cites the poll's margin as being "+/- 3%". Now what does that really mean?

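Basically, if we were to repeat this poll an infinite number of times (math/stat people love thinking they can do things forever), about 95% of our poll results would land within 3 percentage points of the true level of support in the population. Put another way, the range from 3 points below to 3 points above this specific poll's result is a 95% confidence interval for that true value.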

But how did we get all this? Consider that when a pollster asks a question there is a probability (p) that a given person will respond yes and (1-p) that they will respond no. Each individual response is a Bernoulli random variable (also called a Bernoulli trial). Pollsters seek to estimate the true (p) of the population from the proportion of yes answers in the sample they obtain. When we expand this to a poll of (n) people, the number of yes answers is a Binomial random variable. And if (n) is large, this random variable has an approximately normal distribution (i.e. the distribution follows the infamous bell curve). This fact is important (thank the statisticians for this) because it allows us to construct confidence intervals: ranges around the poll result that capture the true (p) with a chosen level of confidence. Most polls use a confidence level of 95%, for which the margin of error is 1.96 × √(p(1-p)/n). That quantity is largest when p = 0.5, where it equals 0.98/√(n), so the margin of error is usually approximated as 1/√(n).
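To make the 1/√(n) rule concrete, here is a small Python sketch comparing it with the fuller normal-approximation formula, 1.96 × √(p(1-p)/n). It assumes a simple random sample. The function names and the sample size of 1,000 are my own illustrative choices, not figures from the Rasmussen poll.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Normal-approximation margin of error for a sample proportion p."""
    return z * math.sqrt(p * (1 - p) / n)


def rule_of_thumb(n):
    """The 1/sqrt(n) shortcut, roughly the worst case (p = 0.5)."""
    return 1 / math.sqrt(n)


n = 1000  # hypothetical sample size, chosen only for illustration

print("p = 0.50:", round(margin_of_error(0.50, n), 4))
print("p = 0.29:", round(margin_of_error(0.29, n), 4))
print("1/sqrt(n):", round(rule_of_thumb(n), 4))

# Working backwards: the sample size the shortcut implies for +/- 3%.
print("n for +/- 3%:", round(1 / 0.03 ** 2))
```

The shortcut is slightly conservative: it is never smaller than the fuller formula, and the gap grows as p moves away from 50%.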

Recapping the 2 main points:

1. If we kept repeating our poll, 95% (or whatever confidence level we choose) of our poll results would be within the margin of error of the true population value.
2. The above rests on the assumption that the poll result (the share of the sample giving a particular answer) has an approximately normal (bell curve) distribution, which holds when the sample is large.
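We can't actually repeat a poll forever, but a computer can repeat a pretend one a few thousand times. The sketch below checks point 1 by simulation, measuring each simulated poll against the true proportion it was drawn from. The true support of 50%, the sample size, and the number of repetitions are all assumptions picked for illustration.

```python
import math
import random


def simulate_coverage(true_p, n, polls, seed=0):
    """Fraction of simulated polls landing within 1/sqrt(n) of true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = 1 / math.sqrt(n)
    hits = 0
    for _ in range(polls):
        # Each respondent says yes with probability true_p (a Bernoulli trial).
        yes = sum(rng.random() < true_p for _ in range(n))
        if abs(yes / n - true_p) <= margin:
            hits += 1
    return hits / polls


# Hypothetical setup: true support of 50%, 1,000 respondents, 2,000 polls.
coverage = simulate_coverage(true_p=0.5, n=1000, polls=2000)
print(f"{coverage:.1%} of simulated polls fell within the margin of error")
```

The printed share should come out close to 95%. Moving true_p away from 0.5 pushes it a little higher, because the 1/√(n) margin is a worst-case figure.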

So take opinion poll results with a grain of salt, especially if they are about the race among the Democrats in the Iowa caucuses. Recall the unique caucus methodology of the Democrats in Iowa, described in the post below. A true opinion poll that reflects these nuances would have to do several things. First, it would poll 'likely' caucus-goers on whom they support (just as in normal polls). Next, it would have to take into account caucus-goers shifting allegiances through persuasion. This could be modeled by taking factors such as caucusing experience and volunteer training and constructing a distribution of what % of each candidate's support will be lost to which other candidate. After all that, we would cut all candidates below the 15% threshold and determine whom their supporters would now support, using the shifting-allegiances distribution.
That could make for a very long research paper, or an interesting blog post...
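Short of the research paper, the last step (cutting candidates under 15% and reassigning their supporters) can be sketched in a few lines of Python. Everything here is invented for illustration: the candidates are labeled A through E, and both the first-choice shares and the second-choice table are made-up numbers, not poll data. The sketch also makes a simplifying assumption that there is only one round of reallocation, with supporters who pick another non-viable candidate dropping out.

```python
THRESHOLD = 0.15  # viability threshold from the caucus rules


def reallocate(first_choice, second_choice, threshold=THRESHOLD):
    """Move supporters of non-viable candidates to their second choices.

    first_choice:  {candidate: share of caucus-goers}
    second_choice: {candidate: {other: share of that candidate's
                    supporters who would switch to `other`}}
    Single pass only; support sent to another non-viable candidate is lost.
    """
    viable = {c for c, share in first_choice.items() if share >= threshold}
    result = {c: first_choice[c] for c in viable}
    for loser, share in first_choice.items():
        if loser in viable:
            continue
        for target, fraction in second_choice.get(loser, {}).items():
            if target in viable:
                result[target] += share * fraction
    return result


# Hypothetical numbers, purely for illustration.
first = {"A": 0.30, "B": 0.28, "C": 0.27, "D": 0.10, "E": 0.05}
second = {
    "D": {"A": 0.5, "B": 0.3, "C": 0.2},
    "E": {"A": 0.4, "C": 0.4, "D": 0.2},
}

final = reallocate(first, second)
for name in sorted(final):
    print(name, round(final[name], 3))
```

In this made-up example a two-point lead for A over B becomes a six-point lead after reallocation, which is the kind of shift a first-choice-only poll cannot show.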

And that's jus' the tip.

Comment below.


References

Dems tight poll in Iowa, USA Today

Margin of Error & Confidence Interval, Concerned Journalists

Rudimentary Statistics, Professor's Pool of Polls


