Answers to Two Questions Predict the Electoral College Vote Outcome
There have been twenty-two elections since 1932, and answering two questions goes a long way towards predicting the Electoral Vote outcome of these elections:
- Does the incumbent party candidate receive strong support at the party’s nominating convention?
- Is the unemployment rate at the time of the election not rising too fast?
The following analysis will show that answering these two questions provides a better than 85% probability of predicting the Electoral Vote outcome.
Background
Allan J. Lichtman is a Distinguished Professor in the Department of History at American University in Washington, DC. His book “Predicting the Next President” provides significant insight into the factors that determine presidential elections. Lichtman identifies thirteen factors, which he calls Keys (Table 1), that when “turned” determine who will win the next election. “Turned” means that the criterion defining a Key is TRUE for the incumbent party candidate; otherwise, the Key is FALSE. The Keys range from quantitative and objective to judgmental and subjective.
Lichtman writes, “The thirteen keys enable forecasters to track electoral prospects better and to incorporate historical experience more fully than does any subsystem (fewer keys) or any expanded system (more keys).” However, he also recognizes in his book that just two Keys are needed to achieve high Popular Vote prediction accuracy: The Nomination Contest Key and the Short-Term Economy Key. If both Keys are TRUE, then the incumbent party candidate usually wins, but if either Key is FALSE, then the challenging party candidate usually wins.
Per the Merriam-Webster dictionary, Occam’s razor is “a scientific and philosophical rule that … the simplest of competing theories be preferred to the more complex …” In data science, this means using as few assumptions and variables as possible, which tends to yield models that remain accurate when presented with new data.
This analysis will focus on the Electoral Vote outcome and on just these two Keys.
Quantifying the Nomination Contest and Short-Term Economy Decision Thresholds
One needs to decide when to turn each Key TRUE or FALSE. The analysis will use the historic data from the twenty-two elections since 1932 to set decision thresholds for each Key.
Figure 1 plots the twenty-two elections since 1932 in terms of two factors, the year-over-year change in unemployment rate at the time of the election versus the percentage of 1st ballot votes received by the incumbent presidential candidate at that party’s nominating convention. The 1st ballot voting data comes from multiple sources available on the web, and the unemployment rate data comes from the Bureau of Labor Statistics. The figure also shows the elections won by the incumbent (orange dots) and the elections won by the challenger (green diamonds).
Lichtman defined the Nomination Contest Key as follows: “there is no serious contest for the incumbent-party nomination”, which means, “an uncontested nomination is one in which the nominee wins at least two-thirds of the total delegate vote on the first ballot of the nominating convention.” If the candidate exceeds this threshold, the Key would be turned TRUE thus favoring the incumbent. As you can see in the figure, the orange dots signifying an incumbent win are clustered to the right of two-thirds on the horizontal axis. Lichtman’s judgement appears to be valid, and the analysis below will confirm that.
Lichtman defined the Short-Term Economy Key as follows: “If the overwhelming public perception is one of the economy in recession, then the key should be turned against the party in power, even if the economic statistics might suggest a more ambiguous situation.” How do you implement this definition in an objective, repeatable way?
Let’s start with the official designation of recessions. From the United States Bureau of Economic Analysis website, “The designation of a recession is the province of a committee of experts at the National Bureau of Economic Research (NBER), a private non-profit research organization that focuses on understanding the U.S. economy.” NBER’s chronology of expansions and contractions (recessions) is listed in Table 2 along with Lichtman’s designation of Short-Term Economy Key turns. Recall that TRUE here means that the economy is in an expansion thus favoring the incumbent party.
With the exception of three elections, Expansion aligns with TRUE and Contraction aligns with FALSE. The 1948 election occurred at the peak of the expansion. NBER has recessions ending in July 1980 and March 1991, which are 4 and 20 months, respectively, before the corresponding elections. However, NBER did not announce the end of these recessions until July 8, 1981 and December 22, 1992, respectively, so both official pronouncements came after the elections.
Why would the Key be turned FALSE for these two elections? Per NBER’s website, the peak in unemployment rate tends to lag the official end of a recession, and unemployment would be one way the electorate perceives the state of the economy. The Bureau of Labor Statistics has monthly unemployment rate data going back to 1948 (https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm) and yearly data going back to 1929 (https://www.bls.gov/opub/mlr/1948/article/pdf/labor-force-employment-and-unemployment-1929-39-estimating-methods.pdf).
One way of numerically implementing “public perception…of the economy” is to look at the year-over-year change in the unemployment rate at the time of the election. Unemployment going down is a good sign, whereas unemployment going up is a bad sign, but where do you place the decision threshold?
Based on the twenty-two elections since 1932, Figure 2 shows where to place both the Nomination Contest and Short-Term Economy decision thresholds ahead of the 2020 election. The placement shown maximizes predictive accuracy, with just one incorrect prediction. Qualitatively, if the incumbent party strongly supports its candidate on the first ballot and the unemployment rate has not risen too much over the prior year, then the incumbent party candidate wins the Electoral College vote; otherwise, the candidate loses. Quantitatively, if the 1st ballot vote is less than or equal to 67.5% or the year-over-year change in unemployment rate is greater than or equal to 4.3%, then the challenger wins the electoral vote nine out of nine times. If the 1st ballot vote is greater than or equal to 75% and the year-over-year change in unemployment rate at the time of the election is less than or equal to 2.6%, then the incumbent wins the electoral vote twelve out of thirteen times. If either value falls between its two thresholds, then the situation is ambiguous.
The incorrect call is the 2000 election, which came down to which candidate won Florida. After various recount efforts driven by the state’s law and judicial rulings, the US Supreme Court, in a 5-to-4 decision, stopped the recounts, which effectively awarded the Presidency to the challenging Republican Party and George W. Bush. The incumbent party’s candidate, Democratic Vice President Al Gore, won the popular vote by roughly 500,000 votes, but Bush won the Electoral College by five votes, 271 to 266.
At first blush, one might conclude that the prediction accuracy is twenty-one out of twenty-two tries, but this would be misleading. This is the data we have going into the 2020 election, but to be as realistic as possible, we cannot use the data for any particular election to predict that election.
With just twenty-two data points, a way to handle this is as follows:
- Hypothesize that what determined the election’s results in 1932 also determined the 2016 election’s results, and all of the elections in-between. This hypothesis directly follows from Lichtman’s research, where his “study of history shows that a pragmatic American electorate chooses a president according to the performance of the party holding the White House … If the nation fares well during the term of the incumbent party, that party wins another four years in office; otherwise, the challenging party prevails.” Following this hypothesis allows us to predict the election outcome in any one year using the Keys and outcomes from the other twenty-one years. In predictive analytics, this is called leave-one-out cross validation, and it is a technique used when there isn’t a lot of data.
- Apply leave-one-out cross validation to the 1st ballot and year-over-year change in unemployment rate data
- Find the threshold values that maximize win prediction accuracy
- Apply those thresholds to the election year in question
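The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the figures: it searches for a single cut point per question rather than the ambiguity bands described in this article, the function names are hypothetical, and the records are invented placeholder values, not real election data.

```python
from itertools import product

def midpoints(values):
    """Candidate cut points: midpoints between adjacent observed values."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def predicts_incumbent(ballot, change, ballot_cut, change_cut):
    """Two-question rule: strong first-ballot support AND unemployment not rising too fast."""
    return ballot > ballot_cut and change < change_cut

def fit_thresholds(train):
    """Grid-search the pair of cut points that maximizes accuracy on the training records."""
    ballots = sorted({b for b, _, _ in train})
    changes = sorted({u for _, u, _ in train})
    best = None
    for b_cut, u_cut in product(midpoints(ballots), midpoints(changes)):
        correct = sum(predicts_incumbent(b, u, b_cut, u_cut) == won for b, u, won in train)
        if best is None or correct > best[0]:  # keep the first best pair found
            best = (correct, b_cut, u_cut)
    return best[1], best[2]

def leave_one_out(records):
    """Hold out each record in turn, fit on the rest, and score the held-out prediction."""
    hits = 0
    for i, (b, u, won) in enumerate(records):
        train = records[:i] + records[i + 1:]
        b_cut, u_cut = fit_thresholds(train)
        hits += predicts_incumbent(b, u, b_cut, u_cut) == won
    return hits, len(records)

# Invented placeholder records (NOT real election data):
# (1st ballot %, year-over-year unemployment change, incumbent party won?)
toy_records = [
    (95.0, -1.0, True), (90.0, 0.5, True), (85.0, -0.5, True), (99.0, 1.0, True),
    (60.0, 0.0, False), (55.0, 2.0, False), (92.0, 6.0, False), (88.0, 5.0, False),
]

hits, total = leave_one_out(toy_records)
print(f"{hits} of {total} held-out records predicted correctly")
```

Replacing the placeholder records with the actual 1st ballot and unemployment data would be the first step toward reproducing the analysis; handling the ambiguity bands would need an additional rule that this sketch leaves out.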
When you do this, eighteen of the twenty-two elections result in the decision thresholds just described, and seventeen of the eighteen elections are correctly called. However, the 1948, 1968, 1972 and 1992 elections are ambiguous. Figure 3 depicts the 1948 election as an example of how the ambiguity manifests itself. Not having the 1948 orange dot widens both the 1st ballot vote and year-over-year unemployment rate threshold ambiguity areas. The 1948 value plots within the uncertainty bands. A similar situation occurs with the other three elections.
With only eighteen unambiguous trials, it is possible that the system has a win probability lower or higher than 94% (17 correct outcomes out of 18 elections). The 95% binomial confidence band for 17 successes out of 18 trials is 72.7% to 99.9%.
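The article does not name the method behind the confidence band. The sketch below assumes the exact (Clopper-Pearson) binomial interval, which appears to be consistent with the quoted figures. It uses only the Python standard library, and the helper names are hypothetical.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: binom_tail_ge(k, n, p) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X >= k + 1) equals 1 - alpha / 2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_tail_ge(k + 1, n, p) - (1 - alpha / 2))
    return lower, upper

low, high = clopper_pearson(17, 18)
print(f"{low:.1%} to {high:.1%}")
```

Under this assumption, 17 successes in 18 trials give bounds that round to 72.7% and 99.9%.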
Summary
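The analysis has objectively quantified how to answer the two questions that predict the presidential election outcome. In leave-one-out testing, the resulting rule correctly called 17 of 18 unambiguous elections (94%), with a 95% confidence band of 72.7% to 99.9%.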
- Does the incumbent party candidate receive strong support at the party’s nominating convention? The answer is YES if the 1st ballot vote is 75% or greater. The answer is NO if the vote is 67.5% or less. Values in-between are ambiguous.
- Is the unemployment rate at the time of the election not rising too fast? The answer is YES if the year-over-year change in unemployment rate at the time of the election is 2.6% or less. The answer is NO if the change is 4.3% or greater. Values in-between are ambiguous.
The prediction algorithm is simple. If the answer to both questions is YES, then the incumbent party candidate is likely to win the Electoral College vote. A NO answer to either question means the incumbent candidate is likely to lose. The data does not support a prediction if the answer to either question is ambiguous.
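As a minimal sketch, the rule can be written as a small Python function using the thresholds stated above. The function names are hypothetical. One point is an assumption: the text does not say what happens when one answer is NO and the other is ambiguous, so this sketch takes the conservative reading and makes no prediction in that case.

```python
def nomination_answer(first_ballot_pct):
    """Question 1: strong support at the nominating convention?"""
    if first_ballot_pct >= 75.0:
        return "YES"
    if first_ballot_pct <= 67.5:
        return "NO"
    return "AMBIGUOUS"

def economy_answer(unemployment_change):
    """Question 2: unemployment not rising too fast (year-over-year change)?"""
    if unemployment_change <= 2.6:
        return "YES"
    if unemployment_change >= 4.3:
        return "NO"
    return "AMBIGUOUS"

def predict(first_ballot_pct, unemployment_change):
    """Return 'incumbent', 'challenger', or 'no prediction'."""
    answers = (nomination_answer(first_ballot_pct),
               economy_answer(unemployment_change))
    # Assumption: any ambiguous answer blocks a prediction, even if the other is NO.
    if "AMBIGUOUS" in answers:
        return "no prediction"
    if "NO" in answers:
        return "challenger"
    return "incumbent"

# Hypothetical inputs, chosen only to exercise each branch:
print(predict(90.0, 1.0))  # both answers YES
print(predict(90.0, 5.0))  # economy answer is NO
print(predict(70.0, 1.0))  # nomination answer is ambiguous
```

The unemployment change must be expressed in the same units as the thresholds above.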
The algorithm and decision thresholds can now be applied to the 2020 election.