More fun with recount statistical analysis

Thu Mar 27, 2014 at 11:14:51 PM EDT

I’ve previously explored the statistics of recounts, trying to get a handle on how much we can expect a close election result to change if the election-night tabulation errors are carefully corrected. Based on data the Michigan Election Reform Alliance (MERA) collected in two elections in Allegan County, the answer seems to be about 5 votes per thousand.

Now, I’ve obtained similar data, via the State Bureau of Elections, collected during a 2012 circuit judge recount in Muskegon County. The overall message is very similar, but a few nuances appear.

As I explained in the original post, our analysis should focus on the square of the observed discrepancy in each precinct, NOT the net change. If positive and negative errors are roughly balanced, they will tend to cancel, particularly as we consider larger numbers of ballots, and the net change will then give us a falsely reassuring estimate of the number of miscounted ballots.

Of course, if you’re one of the activists gathered at City Hall watching the ballots being recounted, hoping to pick up enough votes to overturn a narrow election-night loss, the only thing you care about is the net change; you don’t care about statistical theory. But I’m not concerned here with any particular election; I’m trying to develop some rules of thumb to assess when a recount is likely to make a difference, as opposed to when it’s likely to be a waste of time and money. For my purposes, the SQUARE of the error found in each precinct, summed over all precincts, is the best measure, and it represents an estimate of the total number of errors.
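To make the argument concrete, here is a small simulation. It is a sketch that assumes every miscount moves a precinct’s tally by +1 or -1 with equal probability, independently of the others (both assumptions are questioned later in this post). The function names, the 200 precincts and the 5 errors per precinct are all hypothetical. Under that model the expected square of a precinct’s net change equals the number of errors in it, so the sum of squares tracks the true total while the summed net change mostly cancels.

```python
import random


def estimate_total_errors(net_changes):
    """Estimate total miscounts from per-precinct net changes.

    Assumes each error is +1 or -1 with equal probability, independently,
    so that E[d**2] equals the number of errors behind a net change d.
    """
    return sum(d * d for d in net_changes)


def simulate_precinct(n_errors, rng):
    """Net change in one precinct: the sum of n_errors random +/-1 errors."""
    return sum(rng.choice((1, -1)) for _ in range(n_errors))


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed so the run is repeatable
    # Hypothetical: 200 precincts with 5 miscounted ballots each (1000 total).
    true_errors = [5] * 200
    nets = [simulate_precinct(k, rng) for k in true_errors]

    print("true total errors:  ", sum(true_errors))
    print("summed net change:  ", sum(nets))
    print("sum of squares:     ", estimate_total_errors(nets))
```

In a typical run the summed net change is a few dozen votes either way, while the sum of squares lands near 1000.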

The MERA data pointed to an error rate of 5.0 per thousand ballots, with no major difference between absentee (AV) and walk-in votes.

The new data, from the Muskegon recount, yields an estimate of 3.9 mistakes per thousand, with a higher rate among AV ballots (5.2 per thousand) than among walk-ins (3.5 per thousand). Another major difference: the MERA data suggested the positive and negative errors were roughly balanced, while the Muskegon recount showed about twice as many negative errors (votes which weren’t counted but should have been) as positive errors (non-votes counted when they shouldn’t have been).

From the Muskegon recount:

                                Kostrzewa   Smedley
Absentee voters                      7299      5158
  Votes gained during recount         +20        +9
  Votes lost during recount            -2        -4
Walk-in voters                      20989     23351
  Votes gained during recount         +30       +29
  Votes lost during recount           -10        -7

Keep in mind that these tallies DO NOT INCLUDE errors which cancelled out in a given precinct, since there was no way to detect such offsetting errors, which probably amounted to half the total.
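As a rough cross-check, the table can be turned into an error rate with a few lines of arithmetic. This is a sketch, not the method behind the figures quoted earlier: it works from the aggregate gains and losses rather than precinct-level discrepancies, and it assumes, following the caveat above, that undetected offsetting errors roughly doubled the detected total (the `CANCELLATION_FACTOR` and function names are hypothetical). The overall result comes out at 3.9 per thousand; the absentee and walk-in splits come out near 5.6 and 3.4, somewhat different from the 5.2 and 3.5 quoted above, presumably because those figures rest on precinct-level data that this aggregate table does not preserve.

```python
# Muskegon recount tallies: (Kostrzewa votes, Smedley votes,
# votes gained during recount, votes lost during recount), with gains
# and losses summed over both candidates.
TALLIES = {
    "absentee": (7299, 5158, 20 + 9, 2 + 4),
    "walk-in": (20989, 23351, 30 + 29, 10 + 7),
}

# Assumption: offsetting errors within a precinct were invisible and about
# as numerous as the detected ones, so detected changes are doubled.
CANCELLATION_FACTOR = 2.0


def errors_per_thousand(votes, changes, factor=CANCELLATION_FACTOR):
    """Estimated miscounts per 1000 votes, scaled up for undetected offsets."""
    return 1000.0 * changes * factor / votes


def summarize(tallies):
    """Return {group: rate} for each ballot type plus an 'overall' entry."""
    rates = {}
    total_votes = total_changes = 0
    for group, (kostrzewa, smedley, gained, lost) in tallies.items():
        votes, changes = kostrzewa + smedley, gained + lost
        rates[group] = errors_per_thousand(votes, changes)
        total_votes += votes
        total_changes += changes
    rates["overall"] = errors_per_thousand(total_votes, total_changes)
    return rates


if __name__ == "__main__":
    for group, rate in summarize(TALLIES).items():
        print(f"{group:>8}: {rate:.1f} per thousand")
```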

There are so many caveats that I almost feel that publishing anything is a mistake, since the underlying data is so shaky. I’m confident that I’ve learned something real, but I’m also aware I have a lot more to learn. Here are a few concerns:

We don’t know what sort of statistical model is appropriate, and we don’t have enough data to test one specification against another. All we can do is look for a model which “sort of” fits the data, and be ready to discard that model for a better one as soon as we see evidence of mis-fitting.
We can’t be sure the errors are randomly distributed. It may be that certain precincts had a large number of errors because of the way the printed portions of the ballot fell onto creases in that particular precinct or township or whatever.
We can’t be sure the errors are independent. They seem to be, but we don’t have enough data for a rigorous test.
There are three different methods in use for printing and counting ballots in Michigan. One version was used in Allegan, while a different one was used in Muskegon. It may be that the different methods of printing and counting ballots have completely different statistical properties, and shouldn’t be averaged together or even compared. We don’t have enough data to say.
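One standard way to probe the first two concerns, not something done in this post, is a Pearson chi-square test of homogeneity: if every precinct shared a single error rate, each precinct’s detected errors should be close to its ballot count times the pooled rate. The sketch below assumes per-precinct counts of detected errors and ballots; the function name and the example figures are hypothetical, not Muskegon data. With counts this small the chi-square approximation is rough, and the test says nothing about offsetting errors a recount cannot see.

```python
def homogeneity_statistic(errors, ballots):
    """Pearson chi-square statistic for 'all precincts share one error rate'.

    errors[i]  -- detected errors in precinct i
    ballots[i] -- ballots counted in precinct i
    Returns (statistic, degrees_of_freedom). Under a homogeneous Poisson
    model the statistic is roughly chi-square with len(errors) - 1 degrees
    of freedom; a value far above that suggests errors cluster in some
    precincts rather than falling at random.
    """
    if len(errors) != len(ballots) or len(errors) < 2:
        raise ValueError("need matching lists for at least two precincts")
    if sum(errors) == 0 or min(ballots) <= 0:
        raise ValueError("need some errors and positive ballot counts")
    rate = sum(errors) / sum(ballots)  # pooled errors per ballot
    statistic = 0.0
    for observed, n in zip(errors, ballots):
        expected = rate * n
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(errors) - 1


if __name__ == "__main__":
    ballots = [1000] * 6
    scattered = [2, 3, 1, 4, 2, 3]   # hypothetical: errors spread evenly
    clustered = [0, 0, 0, 14, 1, 0]  # hypothetical: errors piled in one precinct
    for label, errs in (("scattered", scattered), ("clustered", clustered)):
        stat, df = homogeneity_statistic(errs, ballots)
        print(f"{label}: chi-square = {stat:.1f} on {df} degrees of freedom")
```

This prints 2.2 for the scattered set and 63.8 for the clustered one, each on 5 degrees of freedom. Since the 5% critical value for 5 degrees of freedom is about 11.07, the first pattern looks consistent with a single rate and the second does not.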

Eventually, if I collect enough data, I may be able to say something clearer about these problems. But for the present, I’d say the error rate seems to be roughly 4 to 5 ballots per thousand, that the errors seem to be randomly distributed among candidates and precincts, and that they may predominantly involve overlooking votes rather than counting too many.

Finally, it remains good advice that if the difference between two candidates is less than twice the square root of their combined votes divided by 200 (that is, less than 2 × √((A + B)/200) for candidates with A and B votes), there’s a real chance a recount will change the result.
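The rule of thumb can be written out as a small function. It is a sketch that reads the rule as 2 × √((A + B)/200): (A + B)/200 is the expected number of miscounted ballots at 5 per thousand, and its square root is the typical size of the net change when errors fall at random in either direction. The function names and the vote totals in the example are hypothetical.

```python
import math


def recount_threshold(votes_a, votes_b, error_rate=1 / 200):
    """Twice the square root of the expected number of miscounted ballots.

    error_rate defaults to 1/200 (5 per thousand); the estimates in this
    post range from roughly 4 to 5 per thousand.
    """
    expected_errors = (votes_a + votes_b) * error_rate
    return 2 * math.sqrt(expected_errors)


def recount_could_change_result(votes_a, votes_b, error_rate=1 / 200):
    """True when the margin falls below the threshold, i.e. when a recount
    has a real chance of reversing the result under this rule of thumb."""
    margin = abs(votes_a - votes_b)
    return margin < recount_threshold(votes_a, votes_b, error_rate)


if __name__ == "__main__":
    for a, b in ((10000, 9990), (10000, 9950)):  # hypothetical totals
        t = recount_threshold(a, b)
        print(f"{a} vs {b}: margin {abs(a - b)}, threshold {t:.1f}, "
              f"recount could matter: {recount_could_change_result(a, b)}")
```

With 19,990 votes between the two candidates, the threshold is about 20 votes, so a 10-vote margin is within reach of a recount and a 50-vote margin probably is not.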
