“Margin of Error” in polls

An introduction to some elementary statistical concepts, specifically aimed at explaining how to compare percentages within and between polls.  I cover some of the stuff you barely understood when you were struggling through Methods courses, and which you have long forgotten.  Fortunately, there’s no homework, and no grading.

UPDATE:  Margin of error when looking at changes from one poll to another.

Grebner :: “Margin of Error” in polls
I’m following up a discussion in another thread:

http://www.michiganliberal.com/showComment.do?commentId=28228

Everybody already knows what “margin of error” means for a single percentage in a given poll:  it tells you the range in which the “true” value the poll is attempting to estimate will lie at least 95% of the time.

You can easily calculate such a “95% confidence interval” for a simple percentage reported in a poll – it’s almost exactly equal to the reciprocal of the square root of the sample size.  Taking a few shortcuts, if a poll of 400 Democratic primary voters shows Hillary Clinton with 40%, take the square root of 400, which is 20.  Take the reciprocal, which is 0.05 or 5%.  (The precise number is more like 4.8% – the rough methods shown here are slightly conservative.)
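If you’d rather let a computer do the arithmetic, here’s a quick Python sketch of both the rule of thumb and the standard exact formula (the 1.96 multiplier for a two-tailed 95% interval is textbook, not something from this post):

```python
import math

def rough_moe(n):
    """Rule-of-thumb 95% margin of error: the reciprocal of sqrt(sample size)."""
    return 1 / math.sqrt(n)

def exact_moe(p, n):
    """Standard 95% MOE for a proportion p estimated from n interviews."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(rough_moe(400), 3))        # 0.05  -- the 5% shortcut
print(round(exact_moe(0.40, 400), 3))  # 0.048 -- the "more like 4.8%"
```

The shortcut amounts to replacing 1.96 with 2 and assuming the candidate is at exactly 50%, so it always runs a shade high, which is the conservatism noted in the parenthetical.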

But knowing Hillary has 40% isn’t as important as knowing whether she’s really ahead of Barack Obama (who polled 30%) or whether the difference may be statistical noise.  As the previous discussion asked, if Hillary’s percentage could be 5% too high, couldn’t Barack’s be 5% too low?  This seems especially likely, since her “extra” votes must have been taken from somebody else’s total.

To understand the comparison of percentages between choices posed in a single question, we have to change our thinking from “deviation” to “variance”, which is defined as the average squared deviation.  This is necessary because statistical mathematics deals almost exclusively with squared errors, not actual errors, for reasons which lie buried deep in theory.

The “variance” of the difference between two simple (“binomial”) variables like the candidates’ percentages is equal to the sum of their variances, minus twice their covariance.  “Covariance” is closely related to “correlation”, which you may dimly remember is a measure of the degree to which two variables tend to move together (in which case they are positively correlated) or move in opposite directions (in which case they are negatively correlated).

When you look at the answer to a poll question, it’s obvious that each candidate’s support is negatively correlated with each of the others.  If one candidate goes up, somebody else is likely to go down.  If a statistical fluke gives too many votes to Hillary, it’s likely to have taken them from Barack.

Unfortunately, I don’t know an authoritative method for estimating the covariance in this circumstance, but perhaps my critics will provide one.  Until we hear from them, I assert the correct formula (using H for Hillary’s support, and B for Barack’s) is to multiply the MOE for a single candidate by:

Sqrt(2 + B/(1-H) + H/(1-B))

which in this example would be

Sqrt(2 + 30/60 + 40/70) = 1.75

Don’t worry if you didn’t follow that calculation – I didn’t follow it either, and I bet I left out at least one factor somewhere.  The only claim I make is that it produces reasonable results for every pair of numbers I try.
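Here’s the asserted multiplier in code, alongside the textbook multinomial result (a standard fact, not from this post: within a single question, Cov(p_H, p_B) = −H·B/n):

```python
import math

def pair_multiplier(h, b):
    """The multiplier asserted above for the MOE of the gap H - B
    (h and b as proportions, not percentages)."""
    return math.sqrt(2 + b / (1 - h) + h / (1 - b))

def exact_gap_moe(h, b, n):
    """Textbook version: Var(p_H - p_B) = [H(1-H) + B(1-B) + 2HB] / n,
    using the multinomial covariance Cov(p_H, p_B) = -H*B/n."""
    return 1.96 * math.sqrt((h * (1 - h) + b * (1 - b) + 2 * h * b) / n)

print(round(pair_multiplier(0.40, 0.30), 2))         # 1.75, as in the text
print(round(0.05 * pair_multiplier(0.40, 0.30), 3))  # 0.088 -- the 8.75% figure
print(round(exact_gap_moe(0.40, 0.30, 400), 3))      # 0.081 -- slightly tighter
```

So the asserted formula lands a little above the exact multinomial answer, i.e. it errs on the conservative side, at least for this pair of numbers.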

Back to the original question:  what’s the MOE for comparing Hillary to Barack when a poll with 400 respondents shows her winning 40% to 30%?

Since the MOE for Hillary (or Barack) considered alone is +/- 5%, the MOE for the difference between them is about +/- 8.75%.

In general, the MOE for comparing two candidates is close to twice the MOE stated by the pollster for a single candidate, unless both of the candidates are well below 50%.  For complete non-entities (if you were concerned, for example, whether Dodd will finish ahead of Kucinich) the MOE is only about 1.4 times the individual figure – roughly the square root of two.
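You can verify that range by plugging the extremes into the multiplier formula given earlier:

```python
import math

def pair_multiplier(h, b):
    """The multiplier from the text: Sqrt(2 + B/(1-H) + H/(1-B))."""
    return math.sqrt(2 + b / (1 - h) + h / (1 - b))

print(round(pair_multiplier(0.50, 0.50), 2))  # 2.0  -- two front-runners at 50%
print(round(pair_multiplier(0.02, 0.01), 2))  # 1.42 -- two non-entities, near sqrt(2)
```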

So, if we want to know whether a candidate’s lead over another in a poll is real,  the margin of error is between 1.5 and 2.0 times as large as the typically stated number.

What about comparing two different polls, to see if there has been a change?

As before, the answer hinges on the variance of the polls’ estimates.  In the simplest case, where the two polls have equal sample sizes – and therefore equal stated MOEs – the variance of the difference from one poll to the next is simply the sum of the variances of the two polls, which is to say: the variance is exactly doubled.  Since the MOE is proportional to the square root of the variance, it is simply multiplied by about 1.41.
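In code, the equal-sample case (using the rough 5% MOE for two 400-interview polls):

```python
import math

moe_single_poll = 0.05           # stated MOE for each of two 400-person polls
var_single = moe_single_poll ** 2
var_change = 2 * var_single      # variances of independent polls add
moe_change = math.sqrt(var_change)
print(round(moe_change, 4))      # 0.0707 -- about 1.41 times the single-poll MOE
```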

To expand our previous example, if in the first poll Hillary is ahead of Barack 40% to 30%, and in a second poll, they’re tied at 35% to 35%, should we believe the breathless pundits, when they tell us things have dramatically changed?  Well, no.

As we saw before, assuming both polls have samples of 400, the MOE for the size of the lead in either poll is almost 9%, so the MOE for the change in the difference would be slightly larger than 12%.

But let’s be realistic.  The standard MOE is calculated as a “two-tailed 95% confidence interval”, meaning that if nothing really changes, random fluctuations in the sample should only reach the MOE in a given direction once out of forty trials.  A change in the lead of ten points from one poll to another, with an MOE of 12%, should occur less than one time in ten.   If I were working for Barack, and I heard we had caught up to a tie, after trailing by 10 points, I’d be happy, because it would probably mean we were doing better – but I wouldn’t be sure it was a real change.
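Here’s a back-of-envelope check on that probability, translating the 10-point swing into a z-score under the normal approximation (assuming the 12% MOE above):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

moe_change = 0.12       # MOE for the change in the lead, from the text
se = moe_change / 1.96  # back out the standard error from the 95% MOE
z = 0.10 / se           # a 10-point swing in the lead
p_two_tailed = 2 * (1 - normal_cdf(z))
print(round(p_two_tailed, 2))  # about 0.10 -- one chance in ten by luck alone
```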

Where the sample sizes are different, you wouldn’t go wrong by multiplying the larger MOE (from the poll with the smaller sample) by 1.4.
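The exact rule falls straight out of adding variances: combine the two stated MOEs root-sum-square.  The “multiply the larger by 1.4” shortcut errs on the safe side:

```python
import math

def moe_of_change(moe1, moe2):
    """MOE for the change between two independent polls:
    variances add, so the MOEs combine root-sum-square."""
    return math.sqrt(moe1 ** 2 + moe2 ** 2)

# e.g. a 400-person poll (~5%) followed by a 1000-person poll (~3.2%)
print(round(moe_of_change(0.05, 0.032), 3))  # 0.059 -- below 0.05 * 1.4 = 0.07
```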

The published “Margin of Error” is generally a slight overestimate.  The standard methods used for calculating MOE are very conservative, which I guess serves as sloppy compensation for their inappropriate use in comparing differences among candidates or changes from one poll to another.

Still, it seems worthwhile stating the causes of the overestimation.  First, and generally least important, is the fact that to some extent every poll is also sort of a “census”.  I sometimes have a client who wants to conduct a poll in a small area – say the City of Perry.  Let’s say there are only 500 likely voters, and the polling firm ends up interviewing 200 of them.  Because they were chosen to be representative of all likely voters, we use their opinions to estimate the remaining 60% – but we aren’t “estimating” the opinions of the 40% we talked to – we actually know what they think.  In this case, we say the poll had a “sampling fraction of 40%”, meaning the variance of our estimate is only 60% as large as it would have been if the universe from which we drew our sample were infinitely large.  In this example, the improved accuracy from conducting the partial census allows us to reduce our MOE by 23%.  (The square root of .60 is about .77.)
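A sketch of the Perry arithmetic, using the rough 1/sqrt(n) MOE and the standard finite-population correction:

```python
import math

def fpc_moe(moe, n, universe):
    """Shrink a stated MOE by the finite-population factor sqrt(1 - n/N)
    (n/N is the post's 'sampling fraction')."""
    return moe * math.sqrt(1 - n / universe)

moe = 1 / math.sqrt(200)  # rough MOE for 200 interviews, about 7.1%
ratio = fpc_moe(moe, 200, 500) / moe
print(round(ratio, 2))    # 0.77 -- the MOE shrinks by about 23%
```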

Non-statisticians tend to overestimate the effect of sampling fraction on MOE.  For a typical poll, if the universe being sampled is over 2000, the increase in accuracy is too small to notice.

A second source of improved accuracy is the use of stratified samples.  For example, let’s imagine a poll conducted in a City of Flint mayoral election between a black candidate and a white candidate, each of whom received overwhelming support from voters of their own race.  If a 400-interview poll were conducted using a simple random sample, the black percentage interviewed might range anywhere from 52% to 62%, which would of course be likely to be reflected in the level of support for each of the candidates.  But because Flint is so sharply segregated, if we assign a proportionate number of interviews to each precinct, in effect we can force the entire poll to accurately reflect the black percentage of the likely vote:  57%.  (This is for illustration only – I haven’t looked at the actual percentages in Flint in the last year or two.)  By reducing the variability of the racial distribution, we also reduce the variability of the candidate preference estimate.  Other situations which benefit from stratification include contests where quotas can be assigned to geographic regions, gender, political party, or age, if those divisions are strongly correlated with candidate preference.  Because the benefit is difficult to explain or calculate, it’s generally simply ignored when calculating the MOE.
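To put rough numbers on the Flint illustration (only the 57% comes from the text; the 90%/10% within-race support splits are my invention for the sketch):

```python
import math

w = {"black": 0.57, "white": 0.43}  # strata shares of likely voters (57% from text)
p = {"black": 0.90, "white": 0.10}  # assumed support for candidate A in each stratum
n = 400

p_all = sum(w[k] * p[k] for k in w)   # overall support for A
var_srs = p_all * (1 - p_all) / n     # simple random sample of n
# proportional stratified sample: only within-stratum variance remains
var_strat = sum(w[k] * p[k] * (1 - p[k]) for k in w) / n

print(round(math.sqrt(var_strat / var_srs), 2))  # 0.6 -- MOE cut by ~40% here
```

The more lopsided the within-stratum splits, the bigger the gain, which is why the benefit is real but hard to state as a single general number.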

Finally, the standard method of calculating MOE assumes each candidate will receive precisely 50% of the vote, which is the value for which a simple (“binomial”) variable has the maximum variance.  At values either higher or lower than 50%, the variance (and MOE) decline.  This effect is especially strong for values close to 0% or 100%.  For a sample with 400 interviews, if our MOE is +/- 5% for a candidate at 50%, the MOE is 4.9% for a candidate with either 40% or 60% support, 4.6% at 30% or 70%, 4.0% at 20% or 80%, 3% at 10% or 90%, and approximately 1% at 1% or 99%.  This makes sense if you think about it:  If exactly zero of the people interviewed say they’re supporting Mike Gravel for president, it’s not likely that another poll would have found 20 such people (5%).
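The exact binomial formula generates the whole pattern (the figures come out a shade below the ones in the paragraph, which scale the rough 5%):

```python
import math

def moe_at(p, n=400):
    """Exact 95% MOE for a candidate polling at proportion p with n interviews."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for p in (0.50, 0.40, 0.30, 0.20, 0.10, 0.01):
    print(p, round(moe_at(p), 3))  # shrinks from 0.049 at 50% toward 0.01 at 1%
```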