Monday, October 17, 2016

Rational Voting Models – The Problem Space

What is the cost of not voting?  What are the costs of voting for Gary Johnson or Jill Stein?  What are the benefits of voting for Gary Johnson or Jill Stein?  At this point you’ve heard conflicting versions of the One Simple Trick to understanding why voting is/isn’t a waste of time.  Maybe the person advancing one of these even chopped up the electoral map into “voting matters here” and “voting doesn’t matter here” regions – pretty impressive precision.  

Whether stated or implied, each rationale for voting/not-voting/voting-third-party/not-voting-third-party is supported by a voting model.  The arguments all have at their core some model of how an individual can convert a vote into a beneficial outcome (or, alternatively, why she will fail if she tries). 

What I keep seeing are models that address or “solve” a small corner of the problem space but fail to appreciate the vastness of the space.  Let me use an analogy of a simpler (well, more well understood at least) type of model to explain what I mean.  Imagine you’re on a team tasked with deciding which metal to use to construct an airplane.  Imagine a colleague walks in urging the team to use the strongest metal it can find, because durability and safety are ranked primary among customer and company concerns.  This person has a metal they want to propose to us, they can prove it’s the strongest available, and they can prove that the cabin will be able to take more abuse without puncturing/tearing, etc. using some tests they did in a lab.  That may all be true, yet this person’s model of how an airplane works is obviously and fatally incomplete.

It’s not that they don’t have a model at all, and it’s not that their model isn’t internally rational at answering the narrow problem raised – the issue is that the model doesn’t appreciate that, among other things, the plane has to fly.  Modeling the construction of an aircraft requires addressing problems of aerodynamics, cost, safety, durability, comfort, and more.  You can’t model only durability of the aircraft, address that problem, and move on.

Getting an intuitive sense of the size and scope of the problem space in voting is hard, but below I’ll try to outline different aspects of it, so that when someone says “it’s simple, just …” you can respond “here’s why you’re advancing a position which essentially fails to account for the fact that the plane has to fly.” 

None of my answers to the problems below is itself a working model for the entire problem space.  That’s kind of the point.  Plus, humility in a complex domain is a recurring theme here, and I may be wrong and am not even aiming to be complete in the discussions below.  This is, after all, an attempt to outline the problem space.

Problem 1:
Unless you can distinguish among the parts, building something with parts to spare does not mean any one specific part was useless, and by extension it certainly does not mean all the parts were useless

Say it takes 2000 bricks to build a wall between you and your neighbor.  And say you decide to crowdfund with a goal of 2000 bricks on a website that provides no visibility for anyone as to how many bricks have been donated (imagine they all have to donate on a single Tuesday in November and the website doesn’t tally same-day).  If 2500 of your friends and loved ones each donate 1 brick, for a total of 2500 bricks, there is waste, but how much?  What was the return on investment, in good-things-accomplished, for each donating person?

If you look at it from the vantage point of 1 individual, and consider the margin of 500 extra bricks, there is a temptation to say this individual’s contribution accomplished nothing or worse, created waste.  Had they done nothing, the wall would still be built and with less waste.  But unless you can distinguish among the contributors, you may receive this question 2500 times, and you will have to make this argument 2500 times, and your total return will not sum to the gain achieved (1 wall built, with some waste that isn’t that big a deal).  Your local value, repeated over all local vantage points, not summing to the total, is a gigantic red-flag that your model is broken. 

Because you can’t distinguish between the “core” backers and those on the margin, you have to allocate to each participant both the gains and the losses.  You can’t tell who is a loser on the margin and who is a winner far removed from the margin. 
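To make the allocation concrete, here’s a minimal sketch in Python.  The wall’s value is an assumed number chosen only so the project is clearly worth doing; the point is just that per-donor allocations must sum to the total, and the naive marginal view doesn’t.

```python
# Brick-wall sketch: 2500 indistinguishable donors, a wall needing 2000
# bricks. wall_value is an assumption for illustration.
bricks_needed = 2000
donors = 2500                  # each donates exactly 1 brick
wall_value = 10000             # assumed value of the finished wall, in bricks

total_net_gain = wall_value - donors       # one wall, minus all bricks spent
per_donor_share = total_net_gain / donors  # gains AND waste allocated evenly

# The naive marginal view tells every donor the same story:
# "the wall gets built without you, so your brick accomplished nothing."
naive_marginal = -1                        # spent a brick, credited with nothing
naive_total = naive_marginal * donors      # -2500: nowhere near the real total

# The red-flag test: local values, summed over all vantage points,
# must equal the total gain.  The even allocation passes; the naive
# marginal view fails badly.
assert per_donor_share * donors == total_net_gain
print(per_donor_share, naive_total)        # 3.0 -2500
```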

Voting Model I’ve Encountered that Fails to Address this Problem

My favorite blogger - Scott Alexander of SlateStarCodex - posted some expected value math on his blog that I take strong issue with (despite enjoying the rest of the post).  Scott borrows from 538 that the odds the election is decided by 1 vote are roughly 1 in 60 million generally and closer to 1 in a billion in California.  Scott assumes the value of a presidential election win in the right direction is worth roughly $300 Billion.  From this, Scott assigns an expected value to voting of $5,000 generally or $300 in California. 

First, this is double counting.  If even 100 million people do this math locally and conclude their vote has $5k value, the total value expected in that system will be $500 Billion, more than what is at stake in the election under the assumptions.  At the very least, Scott needs to divide by the number of voters in the winning coalition, since any given voter is as likely to be the 1 person in the Margin group as one of the millions in the Core group.  You don’t get to assume you cast the winning vote or else everyone else will make the same assumption, and your model’s values won’t sum.

Second, what about the other 59,999,999 elections where the margin of victory is more than 1 vote?  The country still stands to win or lose $300 Billion based on whether the best candidate wins, and individuals deciding to turn out and vote, and who they vote for, in the aggregate of course determines the outcome.  You have to allocate the $300 Billion wins even when the margin of victory is not 1.  So Scott is undercounting across scenarios and double counting across individuals in the 1-vote-margin scenario.

This is a popular model for expected value of voting, the one that says the return on investment or expected value is equal to the probability that the election comes down to 1 vote, multiplied by the massive benefit that casting the winning vote would have if it occurred.  There is an obvious issue here: most elections are not decided by 1 vote, yet the winning side still got their candidate elected and (presumably) got some benefit out of that win.  The margin isn’t the only thing that exists, and unless you actually want a model that is useless in 99.99999…% of elections, you can’t refuse to allocate the gains because the scenario is “outside the model.”  If you want to accurately model ROI/EV, you need to allocate the returns/values obtained regardless of the margin of victory. 
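The arithmetic of the critique can be sketched in a few lines of Python.  The coalition size of 65 million is my assumed round number, not a figure from Scott’s post:

```python
# Naive model: EV = P(election decided by 1 vote) * value of the win.
p_decisive = 1 / 60_000_000        # 538's rough odds of a 1-vote margin
prize = 300e9                      # assumed value of the right candidate winning

naive_ev = p_decisive * prize      # $5,000 per vote

# If ~100 million voters each claim that EV, the claims outrun the prize:
voters = 100_000_000
claimed_total = naive_ev * voters  # $500B claimed, but only $300B at stake

# One repair: allocate the prize across the winning coalition in every
# scenario, not just the 1-vote-margin one, so per-voter values sum.
coalition = 65_000_000             # assumed size of the winning side
allocated_ev = prize / coalition   # ~$4,615 per coalition voter

assert abs(naive_ev - 5000) < 1e-6
assert claimed_total > prize            # the double-counting red flag
assert abs(allocated_ev * coalition - prize) < 1e-3   # allocations sum
```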

To drive it home further: I’m sure you’ve encountered someone who simultaneously believes 1) that the collective voting result of the State of California is an incredibly valuable asset controlled by the Democratic Party, and 2) each individual vote in California is meaningless and shows a return on investment of roughly zero because of the size of the typical margin of victory in CA.  This argument puts every voter on the margin, glues them to that vantage point, and then concludes that they are in the Waste group not the Gains group and allocates to them 0 gains.  This is hiding the ball, but some of the smartest people I know do it without realizing what they are doing.

Problem 2:
Everything is Iterated - the winner never takes all in an iterated context

Suppose the IRS was having its last year of tax collection ever, and they knew it.  Would they ever spend $100k of resources coming after someone for $50k in unpaid taxes?  Well, maybe they would, but hopefully we agree it wouldn’t be a fiscally sound decision.  But in the world where it isn’t the last year of tax collection ever, it might make a lot of fiscal sense to pursue enforcement actions that had immediate negative return on investment.  This is common sense in an iterated context.  Reputations matter, deterrence matters, perceived fairness (participants’ and witnesses’) matters. 

In voting, reputations matter (‘we can’t endorse policy X, we will lose Y voters who have a reputation for caring’), deterrence matters (anyone think presidential candidates haven’t been deterred from taking an honest position on a variety of issues such as how religious they are?), perceived fairness (participants’ and witnesses’) matters (turnout in the next cycle can be impacted this cycle).

So how do the models you encounter hide this ball?

Voting Model I’ve Encountered that Fails to Address this Problem:  “Nobody cares about 1 voter’s reputation.”

Ah, so we’re back to Problem 1, where individual contributions are rounded to zero.  Voters are 1 part in a large collective, but just as their vote doesn’t round to zero, neither does their reputation.  Rounding to zero is refusing to believe the plane must fly.  Candidates are strategic about what positions to take on the campaign trail, what to fight for once elected, etc.  If their constituents had zero reputation, the candidates wouldn’t even know what to do to win them over if they wanted to! 

“In our election system, there will always be two candidates, not a multitude, because the stronger coalition wins, and if one coalition breaks apart it only serves to cement the other’s lead.”  Maybe so, but there is some sleight of hand here as well, to the extent the speaker means the same two parties will be in power, regardless of whether you vote third-party or fall in line and vote major party.  First, you have the Whig Party being replaced by the Republican Party in our nation’s history.  Second, each election showcases different versions of the GOP and Democratic platforms.  A GOP that gets crushed in 2016 will not likely show up with the same strategy in 2020.  Each vote they didn’t get is an expressed preference in some other direction.  The Parties (at individual and organized collective levels) look at those many directions and then make strategic choices to capture those votes.

We already have a multi-party system, as soon as you pick away at the delusion that the 2000 Democratic Party, for example, is the same exact party as the 2016 Democratic Party.  Plus, if the coalition gets weak enough, a new one forms (Whigs replaced by Republicans).  The names aren’t what we care about, and iteration can alter what the names stand for. 

None of this iteration pressure on parties and candidates can be rounded to zero at the individual level, if you want your numbers to sum properly.

Problem 3:
“Irrational Winners” – The Great Red-flag

Smaller elections are easier to visualize, so let’s say in a school election the Class President gets to allocate funding to various afterschool clubs.  The “Math, Physics, & Rationality Club” never votes in the class election – why would they, it’s a waste of time.  The “Bible Study Club” votes religiously, so to speak, year after year.  Who do you think gets the most funding? 

Now, if the Math, Physics, & Rationality Club got together and made a pact to all vote, they could get that funding they want.  But if they are disorganized enough, or large and spread-out enough, and can’t coordinate on such a pact (a collective action problem, hmmm), then they each have to decide to “waste their time” with just a tiny amount of hope that others also act contrary to their “rational” principles before anything changes.  “If we can’t coordinate, nothing will change,” they each say to themselves without coordinating.

But wait a second, nobody said the Bible Study Club coordinated, we just said they voted.  Maybe they’re also disorganized, large, and spread-out.  But they just don’t know or don’t care that it is “negative expected value” to vote.  They have this bizarre sense of civic duty and they just fill out a voting slip and place it in the box, year after year. 

If asked to model the individuals in these two clubs, there is a temptation to arrive at the label “Rational Losers” for the Math, Physics, & Rationality Club members and the label “Irrational Winners” for the Bible Study Club members. 

This post isn’t about the definition of “rationality” so much as it is about recognizing that internal consistency doesn’t mean a model is “working.”  Here, we have a model that impacts what it is modeling.  The Math, Physics, & Rationality Club decides not to vote based on its model, but doesn’t stop to appreciate the impact of that recursive element.  Let’s get to some examples of what I mean…

Voting Model I’ve Encountered that Fails to Address this Problem: “Given the initial conditions of people who think like me believing voting is irrational, and the people who think nothing like me feeling duty-bound to vote, it remains irrational to vote.  You can’t magically propagate your beliefs to the rest of the people who think like you, and you can’t magically change the initial conditions.” 

This is the challenge of collective action in a nutshell.  I’m not saying I have a solution for the short term (unless you count this blog post as my best attempt at a first step).  But it’s instructive to notice that we have overcome other collective action problems with negative initial conditions, through shifting the definition of rational behavior – through changing the model.

Do you feel duty-bound to recycle even if the expected value is opaque and the returns are tiny at the individual level?  Do you think your great-grandparents felt that way? 

Do you feel duty-bound to vaccinate your kids even if you don’t live in an area with other unvaccinated kids (meaning your kids are extremely, extremely unlikely to be exposed to the diseases you’re vaccinating against)? 

These are collective action problems in which there has developed an “irrational” duty at the individual level that results in collective rationality.  And whether you toggle your label for these activities from “irrational” to “rational” is less important than the fact that you vaccinate your kids.  You’ve internalized that winning is more important than clinging to yesterday’s model when that model impacts the behavior you’re modeling.

Again, forget the deep dive on the definition of rationality – who cares about one word – I’m here to win over the people saying “Not voting is simple” or “Voting third-party in California is free, it’s that simple.”  It’s anything but simple if you care about actually, eventually, getting it right, not just thinking you got it right because you’re staring at an incomplete model.


  1. Comment I received via Twitter, posting here so I can reply here:
    Let's start by supposing a hypothetical voter Ken has 100% certainty about the outcomes and everyone else's vote. He knows that U(A wins) = 1 and U(B wins) = 0. If more people are voting for A than B, or if more than 1 more are voting for B than A, then it's clear that it doesn't matter who Ken votes for or if he votes at all. In this case the marginal utility of him voting for A over B is 0. But if there is a tie, or if B is leading A by exactly 1 vote, then the marginal utility of voting for A over B is 0.5 (assume ties are broken by coin flips - exact procedure is not really relevant to this discussion). This, I think, is uncontroversial. Ken looks at what everyone else is going to do and figures out if his vote will have an impact. If it will, he has a positive marginal utility of voting for A over B. Otherwise, the difference is 0.

    But what about another hypothetical voter, Leslie? If we put Leslie in the exact same situation as Ken, we get the exact same results. And again for a 3rd hypothetical voter, Jordan, and a 4th, etc. When any given voter believes there will be a tie, the utility of voting for A over B is 0.5. Then if we sum this utility over the population, suddenly the total utility is N/2 where N = number of voters. But total utility is supposed to be 1! This isn't double counting, it's N-counting!

    That's your complaint, I think. Do correct me if I'm wrong. But there's a subtle mistake there. That summation is taking the marginal value of a single person's vote from N *different* counterfactuals, and adding them up. You don't add up marginal values across counterfactuals. You add them up from the same counterfactual.

    There's nothing special about the 100% certainty assumption here either. In the standard voting model that Scott talks about, each voter assumes a probability distribution over the votes of the other voters. So while the assumptions are different, Ken's problem and Leslie's problem still are considering different counterfactuals. It's worth going into it in a bit more detail. When Ken computes the expected utility of voting for A vs. B in the uncertainty case, he does so by dropping down to each counterfactual, computing the utility of each counterfactual, computing the probability of that counterfactual occurring, then takes the average utility over counterfactuals weighted by probabilities. I.e.

    E[U(vote for A)] = sum_{possible arrangements of votes} U(vote for A | arrangement) * P(arrangement)

    Each arrangement of votes is a counterfactual with each other voter's vote specified, e.g. one arrangement Ken considers might be: Leslie votes for A, Jordan votes for B, etc. The point being here that *Each voter considers a different set of counterfactuals than each other voter.* Now, we simplify this by assuming that each voter faces the same probability distribution over possible aggregate arrangements, i.e. that each voter has the same chance of casting the deciding vote (assuming they're in the same state), even if one is vastly more likely to vote for A than another. This is an approximation, but as long as N is very large the impact of one voter on the aggregate probability distribution is tiny.

    The upshot to all of that is, again, that *each voter averages over a different set of counterfactuals in order to compute the marginal value of their vote*. So summing the marginal values is not and should not be meaningful. Marginal values don't sum to total value because they're computed under different counterfactual assumptions. Adding uncertainty to the mix doesn't change anything. If you want to get from marginal values to total values, you have to compute the marginal value of each vote *under the same set of counterfactual assumptions*, then add those up.

    1. These aren't different counterfactuals, they are the same scenario. Ken and Leslie both have 100% certainty the election will come down to 0-1 vote depending on whether they vote (this is not practical, but say they live in a simulation; the Creator has told each of them that every other person WILL vote, and the Creator knows this will make each of them vote, so he ends up correct in the future - these are the parameters that allow for 100% certainty... ok moving on...). They each must decide what to do in that single election, under those conditions. One scenario, one world. If they don't divide their expected value by N, then a million Kens and Leslies will each add their "I cast the winning vote!" expected prize to the sum predicted and they will be off by a factor of N.

    2. No, they are different counterfactuals. You have to look at the vote arrangements. When Ken is choosing to vote in the certainty scenario, he knows Leslie's vote. But when Leslie is choosing to vote, her vote isn't fixed because that's precisely the variable she is choosing!

    3. Well, looks like this is devolving into a determinism debate. My philosophy allows for Leslie's consciousness to be "deciding" and also for her decision to be ~100% predictable by Ken.

    4. My position is 100% independent of determinism. I'm just describing the math. (Though I agree with you about determinism here).

    5. Put another way, suppose Ken and Leslie are the only voters. Ken fixes Leslie's vote and asks "what's the counterfactual difference between voting for A and B?" while Leslie fixes Ken's vote and asks "what's the counterfactual difference between voting for A and B?" So suppose that in a deterministic world they both vote for A. Ken is comparing U(A,A) and U(B,A), while Leslie is comparing U(A,A) and U(A,B) - two different sets of counterfactual comparisons.

    6. Okay but that's superficial, you're examining the steps before they solve for their utility. Now have them each solve for the expected utility! That *output* has to sum to total utility even if their prior steps are relative not objective. Even if Ken knows things Leslie can't know, if they are both to be VNM rational, their expected utilities can't exceed the total utility.

      Say they are the only voters, and it takes 2 votes to win and the prize is $300B to american primary schools. If Ken looks down and says "looks like I create $300B of wealth every time I vote here (since Leslie's vote is Yes and my vote is thus the deciding vote)" and Leslie looks down and draws the same conclusion, then on November 8th they both generated half the utility they expected since they can't actually tell who cast the "key vote" or however you want to describe them dividing the win by N in order to allocate the gains. Notice how easy it was to accurately model the others' behavior. Of course the other will vote yes. But when they go to calculate expected utility, they better divide by the N of the collective they are acting as.
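      A tiny Python sketch of that two-voter arithmetic (same assumed numbers as above):

```python
# Two voters, both votes needed to win a $300B prize, each certain the
# other will vote yes.
prize = 300e9

# Each voter's "marginal" claim: "the other votes yes, so I'm decisive."
ken_claim = prize
leslie_claim = prize
assert ken_claim + leslie_claim == 2 * prize   # $600B claimed in total...

actual = prize                                 # ...but only $300B materializes

# Dividing by the N of the coalition they act as makes the claims sum:
per_voter = prize / 2
assert per_voter * 2 == actual
```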

    7. "Even if Ken knows things Leslie can't know..." I take that back. If I know the coin has landed heads and you don't know, my ev on a bet of heads is 1 and yours on tails is .5 and that doesn't sum. I'm starting to see your point, I have to think harder now about how it applies to the voting scenarios.

    8. (Running away for now so I can actually be productive; I'll return later)

    9. Next step: do you agree that if everyone has the same shared information, the expected utilities of rational agents must sum to total utility? (No rush, I'm also multitasking).

      If so, isn't working from the same distribution that produces 1/60M chance of 1-vote-tie a shared-info setting?

    10. pastebinning again because I broke the character limit for comments:

      "Next step: do you agree that if everyone has the same shared information, the expected utilities of rational agents must sum to total utility? (No rush, I'm also multitasking)."

      Yes, marginal utilities must sum to the total utility, but you have to be careful about *which* marginal utilities you sum. See the link for details.

      "If so, isn't working from the same distribution that produces 1/60M chance of 1-vote-tie a shared-info setting?"

      Sort of. Strictly speaking, when Ken considers his vote, he assumes a probability distribution over Leslie's vote, and when Leslie considers her vote, she assumes a probability distribution over Ken's vote. They do not have the same information sets - each voter has a probability distribution over a different subset of voters, to wit, each other voter. Now, their information *is* assumed symmetric, that is, what Ken believes about Leslie, Leslie believes about Ken (conditional on what state they're in, etc). Or at least, this is the strictly speaking correct way to model it. But when N>300 million, one person's vote doesn't change the aggregate probability distribution a noticeable amount, so we round it off to "everyone has a 1/60M chance of casting the deciding vote (conditional on their state)."

    11. First, thank you for the nice writeup in the pastebin. That was helpful. I still feel like "In short, you decide as if you were the last voter, but you assign credit as if you were the average voter." is an "I expect to be surprised in exactly this direction" situation that rational agents shouldn't be subject to.

      If K really believes his expectation is 1, he should pay 0.9 to get there. So should L. Then they wake up the next morning down 0.8 utilons because assigning credit obeys the total but their subjective decisions did not. They used math we thought was sound, paid an amount of utility consistent with what their subjective estimate of future utility was, and lost.

    12. K should be willing to pay 0.9 utilons *conditional on L not paying anything*, and similarly for L conditional on K. But if K knows that L is going to pay 0.9 utilons, then K should only be willing to pay up to 0.1 utilons to get there. And if K and L can't communicate, coordinating the right utilon-payments isn't easy. You've described a coordination game, not a paradox.

      I get that "decide as if you're the last voter" sounds counter-intuitive. I'll think about it and see if I can better explain why it's reasonable. But fundamentally, I'm just trying to describe the math. If I'm wrong, there should be a problem in the math I've been using in order to actually determine K & L's decisions.

  2. >Because you can’t distinguish between the “core” backers and those on the margin, you have to allocate to each participant both the gains and the losses.

    Just because the first option (finding the core contributors) is impossible doesn't mean the second option is automatically coherent. They can both be nonsense. What exactly does "allocate" mean? That they "get" the Total_Value/Total_Participants? It has no meaning whatsoever except as sort of an infographic way of patting yourself on the back.

    SA's point, and the point of most voting skeptics, is that as far as any individual brick-donator can tell, they should have stayed home. The only necessary assumption is that their actions don't impact too many others' actions, which is quite a safe assumption in private elections.

    The brick thought experiment also seems to lead to some strange recommendations. Do you encourage 2500, 3000, 5000, 100000 people to get out and participate? Because that is a lot of wasted bricks. At some point, rational people would look at forecasts, do some fast math, and stay home. Hopefully you agree this is correct. But this is exactly what many people do in national elections.

    >This is the challenge of collective action in a nutshell. I’m not saying I have a solution for the short term (unless you count this blog post as my best attempt at a first step). But it’s instructive to notice that we have overcome other collective action problems with negative initial conditions, through shifting the definition of rational behavior – through changing the model.

    >Do you feel duty-bound to recycle even if the expected value is opaque and the returns are tiny at the individual level? Do you think your great-grandparents felt that way?

    >Do you feel duty-bound to vaccinate your kids even if you don’t live in an area with other unvaccinated kids (meaning your kids are extremely, extremely unlikely to be exposed to the diseases you’re vaccinating against)?

    No one calculates the EV of recycling or vaccinating. It's just something you do to belong to the middle class. It has nothing to do with "changing the model" of rationality. If you don't vax everyone talks behind your back about how stupid and what a bad parent you are. So you should probably vax even if it does nothing.

    And if your only point is that "irrational people would do better in this one instance", the obvious counterpoint is that they'll do TERRIBLY in all sorts of other instances. The world is a better place with an extra Scott Alexander not voting than with moron #17825912859 who thinks voting is his civic duty.

  3. i really enjoyed your blog but i had no idea what you said!!!
