UPDATE: New analysis of New York Times "Get Out The Vote" explanation for missing votes, here.

Comparison of Precinct Return Data between Duval County and Lee County, Florida


In the wake of the Florida election controversy, Duval County has often been mentioned as a county with an unusually high number of invalidated ballots for the presidential ticket.

This document compares the precinct return data of Duval County with the data from Lee County for the Nov. 7, 2000 general election. These two counties were chosen because they are comparably sized, have a similar balance of Republicans and Democrats, use the same voting mechanism, and (most importantly) both counties make their individual precinct data available on the Internet.

The raw precinct return data for Duval County, Florida, for the general election of November 7, 2000, is available from the county web page for election results. The raw precinct return data for Lee County is available from the Lee County Supervisor of Elections page. The raw data is published in a format that is not spreadsheet friendly.

I reformatted the data into standard CSV format files (which you can download here: Duval County, and Lee County), and loaded them into a standard spreadsheet program. From this point it is a straightforward matter to analyze the raw data.

County Comparison

This table compares several key figures for Duval County and Lee County:

                              Duval County                      Lee County
Voting Mechanism              Punch Card, Centrally Tabulated   Punch Card, Centrally Tabulated
Precincts                     268                               198
Ballots Cast                  291,110                           184,361
Bush Votes                    151,832 (52.16%)                  106,123 (57.56%)
Gore Votes                    107,664 (36.98%)                  73,530 (39.88%)
Invalid Presidential Ballots  26,871 (9.23%)                    4,593 (2.49%)
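The percentages in the table can be rechecked directly from the counts. The following is a minimal sketch; the dictionary layout and the function name `share` are illustrative choices, and the figures are copied from the table above.

```python
# Counts copied from the county comparison table.
COUNTIES = {
    "Duval": {"ballots": 291_110, "bush": 151_832, "gore": 107_664, "invalid": 26_871},
    "Lee": {"ballots": 184_361, "bush": 106_123, "gore": 73_530, "invalid": 4_593},
}


def share(count, ballots):
    """Return count as a percentage of ballots cast, rounded to two places."""
    return round(100.0 * count / ballots, 2)


for name, c in COUNTIES.items():
    shares = {k: share(c[k], c["ballots"]) for k in ("bush", "gore", "invalid")}
    print(name, shares)
```

Each percentage is taken relative to ballots cast, not relative to valid presidential votes, which is why the Bush, Gore, and invalid shares do not sum to 100% (the remainder went to other candidates).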

Votes Distribution by Precinct Composition

The following charts break down the precincts according to the percentage of votes cast for George Bush. The purpose is to illustrate the precinct composition of the county (Republican vs. Democratic) and the relationship between precinct composition and missing votes.

These first two charts illustrate the party composition of the precincts that generate the most votes. Duval County is divided between a large number of predominantly Republican precincts and a smaller number of extremely Democratic precincts. Lee County, on the other hand, is much more homogeneous, with a large, Republican-leaning centrist base and almost no votes at the extremes.

These next two charts illustrate the relationship between precinct composition and missing votes. In Duval County, there is a readily apparent correlation between Democratic votes and missing votes. In Lee County, however, the relationship is not so clear. The precincts in the 0-40% Republican range and the 80-100% range do not contain enough votes to give reliable statistics. The precincts in the 40-80% Republican range show no clear trend for missing votes.
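The breakdown behind charts of this kind can be sketched as follows. This is an illustration only: the field names (`ballots`, `bush`, `invalid`), the 10-point bin width, and the choice to measure the Bush share against ballots cast are all assumptions, and the sample precincts are made up.

```python
from collections import defaultdict


def missing_rate_by_bush_share(precincts, bin_width=10):
    """Group precincts into Bush-share bins.

    Returns {bin_start: (ballots_cast, invalid_rate_percent)}.
    """
    ballots = defaultdict(int)
    invalid = defaultdict(int)
    for p in precincts:
        bush_pct = 100.0 * p["bush"] / p["ballots"]
        # Clamp a 100% precinct into the top bin (90-100 for the default width).
        start = min(int(bush_pct // bin_width) * bin_width, 100 - bin_width)
        ballots[start] += p["ballots"]
        invalid[start] += p["invalid"]
    return {b: (ballots[b], 100.0 * invalid[b] / ballots[b]) for b in sorted(ballots)}


# Made-up precincts, for illustration only (not real returns).
sample = [
    {"ballots": 1000, "bush": 80, "invalid": 200},
    {"ballots": 1000, "bush": 650, "invalid": 60},
    {"ballots": 500, "bush": 500, "invalid": 10},
]
print(missing_rate_by_bush_share(sample))
```

Returning the ballots cast alongside each rate makes it easy to spot bins, such as the extremes in Lee County, that hold too few votes to support a reliable rate.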

Regression Analysis

Linear regression is a technique for fitting a straight line that describes how one variable changes with another; the closeness of the fit shows how strong the relationship between the two variables is. In this case we will look at the relationship between missing votes and Bush votes, as well as the relationship between missing votes and Gore votes.

It is well known by now that voting is not a perfect process. There will always be a certain small percentage of ballots for any given office that will not reflect a valid choice of candidate. This can be the result of errors by the voter during the voting process, errors introduced during the counting process, or it can simply be due to the fact that the voter deliberately abstained from making a choice.

The standard measure for the strength of the linear relationship between two variables is the correlation coefficient, a number that ranges between -1 and 1. A correlation coefficient of 1 means a perfect positive relationship, a correlation coefficient of -1 means a perfect inverse relationship, and a correlation coefficient of 0 means no linear relationship at all.
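For readers who want to reproduce the coefficient outside a spreadsheet, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_r` is an illustrative choice, and the inputs are assumed to be two equal-length lists with one value per precinct.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3], [2, 4, 6]))          # perfectly aligned data
print(pearson_r([1, 2, 3], [6, 4, 2]))          # perfectly inverse data
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # U-shaped data, no linear trend
```

The third call shows why a coefficient of 0 only rules out a linear relationship: the points follow a clear U shape, yet the coefficient is zero.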

If there is an error in the voting or counting process that affects Democrats and Republicans equally, then the correlation coefficient between invalid ballots and Democratic ballots and the correlation coefficient between invalid ballots and Republican ballots will be approximately the same.

The following two charts show a "scatter plot" of all the precincts in Duval County and in Lee County. The precincts are placed in order from left to right according to the number of invalid presidential ballots found in that precinct. Each precinct is then given a red diamond to indicate the number of Bush votes in that precinct, and a grey triangle to indicate the number of Gore votes in that precinct. This lets you visualize the relationship of missing votes, Bush votes, and Gore votes across the entire county.

The first thing to notice about these two charts is the much higher number of missing votes in Duval County compared to Lee County. The entire Lee County chart could fit in the left-most 25% of the Duval County chart.

The second thing to notice is the enormous difference between Bush votes and Gore votes in the Duval County chart. In Duval County the correlation coefficient between missing votes and Gore votes is 0.79, whereas the correlation coefficient for Bush votes is an amazing -0.15. That means that there is almost no relationship between Bush votes and missing votes, and what little there is runs in the opposite direction.

The strong correlation between Gore votes and missing votes is particularly interesting, because Duval County is so diverse. Approximately the same number of Gore votes came from precincts with less than 10% Bush votes as came from precincts with 60-70% Bush votes. Presumably, these Democrats are demographically distinct, yet they are correlated with missing votes at approximately the same ratio (the blue line on the Duval chart represents an ideal 4-to-1 ratio). In fact, whether the precinct is large or small, or Republican or Democratic, there are approximately 4 Gore votes for every missing vote throughout Duval County.

Lee County, on the other hand, shows a slightly smaller Gore correlation coefficient (0.78), and a much greater Bush correlation coefficient (0.65). The distribution shows that for each missing vote, there are slightly more Bush votes than Gore votes. This is exactly what would be expected in a county that voted 58% Bush and 40% Gore.

This table shows the linear regression data for the two counties:

Duval County              Bush      Gore
Correlation Coefficient   -0.15     0.79
Slope                     -0.02     0.22
Intercept                 111.24    12.7

Lee County                Bush      Gore
Correlation Coefficient   0.65      0.78
Slope                     0.03      0.05
Intercept                 8.19      2.92
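A slope and intercept like those in the table can be obtained with an ordinary least-squares fit. The sketch below assumes that missing votes are the dependent variable (y) and a candidate's votes are the independent variable (x); the source does not say which way the regression was run, although the magnitudes of the slopes and intercepts are consistent with this reading. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_missing(candidate_votes, slope, intercept):
    """Missing votes predicted by a fitted line for a given candidate vote count."""
    return slope * candidate_votes + intercept


# Using the Duval County Gore line from the table (slope 0.22, intercept 12.7):
# predicted missing votes in a hypothetical precinct with 400 Gore votes.
print(predicted_missing(400, 0.22, 12.7))
```

Under this reading, a slope of 0.22 means roughly one additional missing vote for every four to five Gore votes in a Duval County precinct, while the Bush slope of -0.02 means the number of Bush votes tells you almost nothing about the number of missing votes.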

So, what happened in Duval County?

Well, we know that about 20,000 votes for president disappeared from Duval County with no particularly good explanation (the remaining 6,000 or so missing votes in Duval County could easily be explained by the punch-card voting system used in both Duval and Lee counties).
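The source does not show how the split between explained and unexplained missing votes was derived. One plausible way to arrive at figures of this order, offered here only as an assumption, is to treat Lee County's invalid-ballot rate as the baseline for the shared punch-card system and apply it to Duval County's ballots:

```python
# Counts from the county comparison table.
duval_ballots = 291_110
duval_invalid = 26_871
lee_ballots = 184_361
lee_invalid = 4_593

# Assumption: Lee County's invalid rate is the "normal" rate for this voting system.
lee_rate = lee_invalid / lee_ballots

# Invalid ballots Duval County would have had at Lee County's rate.
expected_duval_invalid = duval_ballots * lee_rate

# Invalid ballots beyond that baseline.
excess_duval_invalid = duval_invalid - expected_duval_invalid

print(f"Lee County invalid rate:   {100 * lee_rate:.2f}%")
print(f"Expected Duval invalid:    {expected_duval_invalid:,.0f}")
print(f"Excess Duval invalid:      {excess_duval_invalid:,.0f}")
```

This baseline comes out somewhat above 6,000 and the excess somewhat below 20,000, so the figures in the text should be read as round approximations.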

We also know that the disappearance of these 20,000 votes is strongly related to Democratic votes, and has almost no relationship to Republican votes. We know that this relationship is relatively constant regardless of which part of Duval County the Democratic voters are registered in.

We also know that Duval County (like Lee County) tabulates the punched ballots centrally, rather than tabulating the ballots at the precincts.

One explanation that fits all the evidence is that some aspect of the tabulating process in Duval County is biased against Democratic votes. Another explanation is that roughly 20% of all Democratic voters throughout Duval County, and only in Duval County, suffered from an unexplained voting impediment which is not evident in other Florida counties, such as Lee County.
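The 20% figure follows from the 4-to-1 ratio if every missing vote is assumed to have been an intended Gore vote, which is an upper-bound assumption rather than an established fact. A minimal sketch of the arithmetic (the function name is illustrative):

```python
def lost_share(counted_votes, missing_votes):
    """Fraction of intended votes that went missing, assuming every
    missing vote was intended for the same candidate."""
    return missing_votes / (counted_votes + missing_votes)


# Idealized ratio from the Duval chart: 4 counted Gore votes per missing vote.
print(f"{lost_share(4, 1):.1%}")

# County-wide totals from the comparison table: Gore votes and invalid ballots.
print(f"{lost_share(107_664, 26_871):.1%}")
```

Both calculations land at about one intended vote in five, which is where a figure of roughly 20% comes from under this assumption.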

There may be other explanations. Please feel free to suggest them, or make any other comments or suggestions by using the link below.

Send Comments