September 7, 2019

A Blockchain-based Voting System is a Monumentally Bad Idea


BLUF

Simply put, a blockchain is a decentralized database of transactions.  The objective was to create a system of money that would be controlled by many people (or no one), rather than giving a single institution absolute control over the currency.  This sounds a lot like democracy, and it's understandable why it would be appealing to those who are dissatisfied with the voting system we have today.  Unfortunately, there are other aspects of blockchain, particularly when it comes to anonymity, that would create more problems than they solve if applied to an election system.

One of the major features of blockchain is its resistance to tampering.  This relies on the use of a secure hash algorithm that encodes the transaction history of the chain.  Through this mechanism, if anyone alters the chain in any way, the change is easily detected with a simple hash check.  Also important is its decentralized nature.  There is no central server to synchronize with.  As transactions are executed, copies of the database propagate throughout the network and self-correct using hash checks.  Because everyone in the network has access to the transaction list, it's important to hide the identities of the individual users.  So blockchain provides a decentralized and tamper-proof system for logging anonymous transactions.  So what's the problem?

Proponents of blockchain-based voting focus on its tamper-proof aspect.  The idea is that if outside actors try to change votes, they can't, because blockchain explicitly prevents that.  This ignores the transactions added through voter impersonation.  In the system we have today, this is a rare occurrence, mainly because most of us still need to show up in person to cast our vote.  It would be obvious if I kept returning to the same polling spot over and over.  It would also be very time-consuming.  This type of fraud is far simpler to commit in an electronic system.  And the anonymous nature of blockchain makes it far more difficult to catch.

There are several other problems with blockchain-based voting that I'll try to describe here.

Application Design Requirements

Congratulations! Andrew Yang has won the presidency and you've been contracted out to design his voting app that's going to revolutionize the way we choose our political leaders.  As good software engineers, we need to define the system requirements before we begin.  The rules are simple:
  • Only votes by eligible voters are counted
  • One person gets just one vote
  • Votes must remain anonymous
We'll see later why the last two are contradictory.

Scenarios

Scenario 1: The Honor System

No real system would ever ignore user authentication, but this will illustrate a couple of problems that will make it easier to understand the issues with the more sophisticated designs described later.

This hypothetical system consists of only the blockchain of votes.  Just as in Bitcoin, the integrity of votes (transactions) is secured using a hash algorithm that encodes the history of the chain.

Hash codes are numbers that represent sequences of bytes in a data stream.  They're frequently used in database and authentication applications because they're random enough that it's impossible (or at least prohibitively difficult) to reconstruct the information used to generate the hash, but specific enough that different information will (almost) always produce a different hash code, even in cases where the inputs are only slightly different. [1]
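To make that last property concrete, here is a minimal sketch using SHA-256 from Python's standard library.  The two input strings are made up for illustration; they differ only by a trailing period, yet the digests have little in common.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two illustrative inputs that differ by a single trailing period.
first = sha256_hex("Leeloo votes for Sanders")
second = sha256_hex("Leeloo votes for Sanders.")

# Count the hex positions at which the two digests disagree.
differing = sum(1 for x, y in zip(first, second) if x != y)

print(first)
print(second)
print(f"{differing} of {len(first)} hex characters differ")
```

Hashing the same input again always returns the same digest, which is what makes a later integrity check possible.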


So let's bring the test system online and see what happens.

Korben Dallas casts the first vote and it's for Elizabeth Warren (sure, Yang already won, but this is a demo system and users are encouraged to misbehave -- so we get a proper stress test).  Keeping in line with the one-person-one-vote paradigm, the system automatically logs him out afterward.  He decides his girlfriend Leeloo also likes Warren and logs back in as her and casts another vote on her behalf.  In fact, he decides that everyone in his neighborhood would be better off if Warren were president and just logs in over and over again, pretending to be each of them in turn.

We already knew there had to be some authentication system in place to prevent this kind of thing.  But notice there was nothing inherent to the blockchain to prevent this.  If the integrity of the system were judged solely by whether it passed a hash check, all of this would have appeared perfectly legitimate to an outside observer.
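The sketch below is a toy hash chain, not Bitcoin's actual block format; the block layout and the names block_hash, append_vote, and chain_is_valid are assumptions made for illustration.  It shows both halves of the point: the hash check catches an edited vote, but it has nothing to say about who cast the votes in the first place.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, vote: str) -> str:
    """Hash a block's vote together with the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "vote": vote}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_vote(chain: list, vote: str) -> None:
    """Add a vote to the end of the chain, linking it to the last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "vote": vote,
                  "hash": block_hash(prev_hash, vote)})

def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check each block links to its predecessor."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["prev"], block["vote"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
# Korben votes as himself, then as Leeloo, then as three neighbors.
for _ in range(5):
    append_vote(chain, "Warren")

# The chain contains four impersonated votes and still verifies.
passes_with_impersonation = chain_is_valid(chain)

# Someone edits a recorded vote after the fact; the check now fails.
chain[1]["vote"] = "Sanders"
passes_after_tampering = chain_is_valid(chain)

print(passes_with_impersonation, passes_after_tampering)
```

The check only answers "has this record changed since it was written?"  Whether the record should have been written at all is a question for some other part of the system.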

Let's say Leeloo logs on later as herself but is rejected because her vote has already been cast.  This would be news to her.  Who does she complain to?  The board of elections?  You, the app developer?  Should the system have the capability to modify votes that might have been cast erroneously?  If so, then how do we do that without undermining the core benefit of using a blockchain in the first place?  At this point it's absolutely clear we need a system to authenticate users (except we knew that from the start).

Scenario 2: Server-side Authentication

So now we've introduced a server to store a database of registered users (voters).  Let's say it's the most state-of-the-art password + token + sample of your DNA + best authentication ever.  Korben Dallas may only log in as himself and there's no hack in the universe that would allow him to log in as anyone else.  To keep things simple, I'll refer to the information Korben used as proof that he is who he says he is as his "credentials" -- this is Korben's digital self.

In order to satisfy the system's anonymity requirement, we can't put Korben's name, let alone all of his credentials, into the chain when he casts his vote.  In fact, which user cast the vote is irrelevant -- all that matters is that he casts just one.  So let's not include any of that information.  We'll leave it to the authentication server to keep a record of whether or not the user has already cast a vote.

Leeloo is pissed about Korben casting her vote without asking her permission first back in Scenario 1.  In this scenario she landed the lucky job of administering the authentication server.  See, she thinks Sanders is the best and would never have voted for Warren.  As revenge (and because she was encouraged to misbehave), she flags Korben's entry in the database as having already cast his vote before he has a chance to do it.  In fact, she happens to know a whole neighborhood of mostly Warren supporters.  She thinks that's bad news and flags the whole group as having already cast their votes too.  After all, they never trusted the blockchain system, so they're unlikely to complain.  Who'd listen if they did?  Not her, and she's running the server.  (Yes, this was a thinly veiled reference to the 2018 GA election, in case you were wondering.)
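Here is a rough sketch of that setup, assuming the simplest possible registry: a dictionary from voter name to a has-voted flag.  The AuthServer class and its cast method are hypothetical names, and authentication itself is taken as perfect, as in the scenario.

```python
class AuthServer:
    """Toy registry: maps each voter's name to a has-voted flag."""

    def __init__(self, voters):
        self.has_voted = {name: False for name in voters}

    def cast(self, name, candidate, chain):
        """Record a vote if the (already authenticated) voter is eligible."""
        if name not in self.has_voted:
            return "rejected: not registered"
        if self.has_voted[name]:
            return "rejected: already voted"
        self.has_voted[name] = True
        chain.append(candidate)  # only the vote is stored, never the name
        return "accepted"

chain = []
server = AuthServer(["Korben", "Leeloo"])

# The administrator marks Korben as having voted before he ever logs in.
server.has_voted["Korben"] = True

korben_result = server.cast("Korben", "Warren", chain)
leeloo_result = server.cast("Leeloo", "Sanders", chain)

print(korben_result)
print(leeloo_result)
print(chain)
```

Nothing in the chain records that Korben was turned away, so the suppression is invisible to anyone auditing the votes alone.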

Okay, so handing authentication entirely to one individual was obviously a bad idea.  But there is a way to keep a centralized database of registered voters, while still maintaining the anonymity requirement using hash codes.  Instead of the database containing the specific users, why not encode each voter's credentials with a secure hash, kind of like how websites do it?

Scenario 3: Client-side Authentication

Again, we have the greatest authentication system in the universe.  Except now, the authentication is performed on each voter's device, and only a secure hash is transmitted to the remote server to complete the process.  The database has much less information now: A hash for each registered voter and a flag that indicates whether or not they already cast their vote.  As before, no information about the voter is stored in the blockchain, only their vote.

Unhappy with Leeloo's performance in the last scenario, we decided she shouldn't be running the voter registry server.

The system comes online and votes start coming in.  Warren, Sanders, Biden, Biden, Biden, Biden, Harris, Mayor Pete, Biden, Biden, Biden, Biden... Maybe Zorg wasn't the best alternative to administer the server after all.  Zorg admits (because he was encouraged to misbehave) that since the database was anonymized with hash codes, he was free to add as many fake registrations as he wanted.  There was no way to trace the hash codes back to real people.  He was then able to use those fake registrations to cast votes for his favorite candidate, Biden.
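A minimal sketch of why this works, assuming the registry is a dictionary from credential hash to a has-voted flag.  The credential strings and the helper names credential_hash and cast are invented for illustration.

```python
import hashlib
import secrets

def credential_hash(credentials: str) -> str:
    """Hash a voter's credentials; only this digest reaches the server."""
    return hashlib.sha256(credentials.encode("utf-8")).hexdigest()

# Registry as the scenario describes it: credential hash -> has-voted flag.
registry = {
    credential_hash("korben-dallas|password|token|dna"): False,
    credential_hash("leeloo|password|token|dna"): False,
}

# The administrator invents credentials that belong to nobody
# and registers their hashes alongside the real ones.
fake_credentials = [secrets.token_hex(16) for _ in range(10)]
for cred in fake_credentials:
    registry[credential_hash(cred)] = False

def cast(credentials, candidate, chain):
    """Accept a vote if the credential hash is registered and unused."""
    key = credential_hash(credentials)
    if key not in registry or registry[key]:
        return False
    registry[key] = True
    chain.append(candidate)
    return True

chain = []
cast("korben-dallas|password|token|dna", "Warren", chain)
for cred in fake_credentials:
    cast(cred, "Biden", chain)

print(chain)
# Every key is a 64-character hex string; real and fake look the same.
print(sorted(len(key) for key in registry))
```

An auditor looking at the registry sees twelve digests of identical shape, with nothing to distinguish the two real voters from the ten invented ones.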

No election ever has 100% turnout.  Part of the reason may be that there's no automatic system to remove registered voters from the rolls when they move or die.  But with 50-60% turnout in a district with, say, 50,000 registered voters, one could easily add 1,000 votes on behalf of fake entries and change the participation metric by only 2 percentage points, well under the radar of anyone who might be watching.
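The arithmetic can be checked quickly.  This sketch assumes the roll stays fixed at 50,000 entries (the fake votes ride on entries already counted in the roll) and looks at three honest turnout levels within the quoted range.

```python
registered = 50_000   # size of the voter roll, held fixed (assumption)
fake_votes = 1_000    # ballots cast on behalf of fake entries

rows = []
for honest_turnout in (0.50, 0.55, 0.60):
    honest_votes = round(registered * honest_turnout)
    reported_turnout = (honest_votes + fake_votes) / registered
    # How far the turnout figure moves, and how much of the count is fake.
    turnout_shift = reported_turnout - honest_turnout
    fake_share = fake_votes / (honest_votes + fake_votes)
    rows.append((honest_turnout, reported_turnout, turnout_shift, fake_share))
    print(f"honest {honest_turnout:.0%} -> reported {reported_turnout:.0%}, "
          f"fake share of ballots {fake_share:.1%}")
```

The turnout figure moves by 1,000 / 50,000, or two percentage points, at every level.  The share of counted ballots that are fake is larger than that, since it is measured against ballots cast rather than against the whole roll.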

This problem could be solved by creating a firewall between the system administrator and the blockchain.  This would not only guarantee that the admin couldn't use it to create scores of fake voters, but would also eliminate any chance of changing a vote once it was cast -- even in cases where genuine mistakes were made.

Scenario 4: Best-case Scenario

Okay, so now that we've learned our lessons (and I've been writing for a few hours), this is what I think a blockchain-based voting system would look like in its best form:

I mentioned before that there is an inherent conflict between the anonymity requirement and the one-person-one-vote requirement.  That is because the system needs some information about who a voter is so that it can prevent that person from voting more than once.  Or, more specifically, from having their vote counted more than once.  Perhaps a more flexible system would be better; one that allows users to cast votes as many times as they like, with only their latest vote counting.  This would solve the problem of genuine mistakes.  If the user hash is added to the chain with a time stamp, then a scan of the chain would verify the chain's integrity, and earlier votes cast by the same individual can be ignored.  The chain could be open to the public so anyone can verify the number of votes and the chain's integrity for themselves without revealing who specifically voted for whom.  Personally, I see this as the height of democracy in the modern age.
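A sketch of the counting rule, assuming each chain entry carries a voter hash, a time stamp, and a candidate.  The short hash strings and time stamps are placeholders, and tally_latest is a hypothetical helper name.

```python
from collections import Counter

# Each chain entry: (voter_hash, timestamp, candidate).  Values are made up.
chain = [
    ("a1f3", 100, "Warren"),
    ("b7c9", 105, "Sanders"),
    ("a1f3", 230, "Sanders"),  # same voter hash, later time stamp
    ("d4e2", 240, "Biden"),
]

def tally_latest(entries):
    """Count only the most recent vote recorded for each voter hash."""
    latest = {}
    for voter_hash, timestamp, candidate in entries:
        seen = latest.get(voter_hash)
        if seen is None or timestamp > seen[0]:
            latest[voter_hash] = (timestamp, candidate)
    return Counter(candidate for _, candidate in latest.values())

result = tally_latest(chain)
print(result)
```

Anyone holding a copy of the chain can run the same scan and arrive at the same totals, and nothing ever has to be edited or deleted to correct a mistaken vote.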

Here's the glaring but:

No matter what, there needs to be a master database of registered voters with corresponding hash codes, because the state boards of elections need some way to verify that the entries in the chain were put there by eligible voters.  In this system, the votes are never truly anonymous, because anyone with access to that database can easily connect voters to specific votes.  If the database were anonymized as in Scenario 3, then the administrators would no longer have a way to verify whether or not an entry came from an eligible voter (they at least need to know age, residency status, address).
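The link between voter and vote takes only a lookup to recover.  The sketch below assumes the master database maps each voter hash to an identity and that the public chain stores (voter hash, time stamp, candidate) entries; all values and the deanonymize helper are invented for illustration.

```python
# Master database kept by the election board: voter hash -> identity.
master_registry = {
    "a1f3": "Korben Dallas",
    "b7c9": "Leeloo",
}

# Public chain entries: (voter_hash, timestamp, candidate).
public_chain = [
    ("a1f3", 100, "Warren"),
    ("b7c9", 105, "Sanders"),
]

def deanonymize(chain, registry):
    """Join the public chain against the registry on the voter hash."""
    return {
        registry[voter_hash]: candidate
        for voter_hash, _, candidate in chain
        if voter_hash in registry
    }

exposed = deanonymize(public_chain, master_registry)
print(exposed)
```

The chain is public by design, so the secrecy of every ballot rests entirely on who can read the registry.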

Voter suppression by government officials is already prevalent through major registration purges and conveniently lost votes.  While a decentralized system like blockchain would be a great way to counteract this, there still needs to be some mechanism to ensure votes are only counted from eligible voters without revealing who those individuals are voting for.

Lesser Problems with this Idea

There exists a demographic of people who are averse to new technology, particularly kinds they have difficulty understanding that also impact how they might be governed.  There are also those with limited access to mobile devices and the internet.  They would either fiercely resist this type of solution or simply avoid using it, and thus be under-represented in future government.

In the scenarios, I assumed authentication was ideal, but I'm of the opinion that this is impossible to achieve in practice.  Authentication is the process of proving to a computer that you are who you claim to be.  To do this, information about who you are, whether it's your name, SSN, DOB, thumbprint, or the pattern of your iris, is digitized and sent to the authentication server.  I referred to these earlier as your credentials.  For the sake of network bandwidth and database storage size, this information is usually boiled down to less than a kilobyte.  The problems with this are 1) people change (names, scars on thumbs, eyes); 2) interface systems are imperfect (the camera on your new iPhone may have a different resolution than the one on your older model); and 3) digital tokens can always be stolen.  Although rare, voter fraud does happen (cf. ref 3).  The anonymity that blockchain is designed for only makes this easier to accomplish.

Finally,

No state department of elections is so altruistic as to give up its control over how votes are counted.  If any were to adopt a blockchain-based system, there would very likely be flaws built in specifically so that it could be manipulated to serve their own interests (and even if that weren't the case, opponents of the system would quickly jump on any flaw they could find and claim that it was).  It will never be as transparent or as publicly accessible as the ideal.  I believe that, ultimately, any blockchain-based system would end up with exactly the same flaws and likelihood of abuse as the system it was designed to replace, if not more.

References

  1. https://passwordsgenerator.net/sha256-hash-generator/
  2. Simon DeDeo, "The Bitcoin Paradox", http://nautil.us/issue/55/trust/the-bitcoin-paradox, 14 Dec 2017.
  3. https://www.heritage.org/sites/default/files/voterfraud_download/VoterFraudCases_5.pdf