tldr: I made a dice-based passphrase wordlist where each word has a unique 3-letter prefix, so you only need to type the first 3 letters of each word in your passphrase! You can also get it as a print-and-staple zine, along with a python script to generate passphrases.

Background

In 2016, the EFF released a new set of wordlists for generating high-entropy random passphrases using dice, attempting to improve on some of the flaws of the classic Diceware wordlist. I was particularly interested in their list of words with unique 3-letter prefixes; one of my biggest obstacles to using Diceware more in practice is that typing 6 whole words into a password prompt is a pain, and there are too many opportunities for random typos even if you remember the actual words correctly. The EFF speculate about a future password prompt with an autocompletion feature based on their list, filling in the whole word after you type in the first three letters; I’m perfectly happy here and now just memorizing a phrase and typing the first three letters of each word as my password. This is a super cool project, and I’m grateful to the EFF for putting it together. However, looking at the list and reading through their process for generating it, there are a number of things I’m unhappy about:
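To make the usage pattern concrete, here’s a minimal sketch of generating a passphrase and its typed abbreviation. The tiny wordlist here is hypothetical; a real diceware list has 7776 entries (6^5), one for each roll of five six-sided dice, and `secrets.choice` stands in for actually rolling dice:

```python
import secrets

# Hypothetical excerpt of a diceware-style wordlist with unique 3-letter
# prefixes; a real list has 7776 entries, one per five-dice roll.
WORDLIST = ["abacus", "backpack", "cactus", "dolphin", "emerald", "firefly"]

def passphrase(wordlist, n_words=6):
    """Pick n_words uniformly at random, as dice rolls would."""
    words = [secrets.choice(wordlist) for _ in range(n_words)]
    mnemonic = " ".join(words)
    # Because every prefix is unique, this abbreviation is all you type.
    typed = "".join(w[:3] for w in words)
    return mnemonic, typed

mnemonic, typed = passphrase(WORDLIST)
print(mnemonic)  # e.g. "cactus emerald dolphin firefly abacus backpack"
print(typed)     # e.g. "cacemedolfirababac"
```

You memorize the full mnemonic, but the password you actually type is only 18 characters.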

Confusability

The EFF’s word list with unique 3-letter prefixes contains several pairs of words that, in my opinion, are too close to synonyms to be usable as distinct elements of a passphrase mnemonic. For example, it includes both “backpack” and “knapsack”, and both “idiocy” and “imbecile”. I’d like a word list where words are chosen to be easily conceptually distinguishable.

What gets included…

You may already have noticed another problem: “idiocy” and “imbecile”, in addition to being confusable, are both ableist. The creators of this list attempted to remove offensive words through a combination of manual review and published word filter lists, but this is complicated territory, and not every reviewer or list curator will have the same standards for what constitutes offensive language. Personally, I’m also unhappy with words like “policeman” and “jailhouse”, among others, featuring in the EFF’s list. The need to abolish the police and the carceral system is more visibly urgent than ever, and I don’t want to enshrine these concepts in passphrases I commit to memory and type out every day.

… and what gets left out

In addition to leaving in some yikesy stuff, the curated filter lists used by the EFF also unfairly exclude some words. One of the offensive word lists they cite, by Luis von Ahn (cw: lots of slurs), includes the words “lesbian” and “gay” as words to filter out (“heterosexual” is also included, so I guess it’s at least kinda fair?), as well as many other innocuous words. There are contexts where it makes some amount of sense to filter strictly like this, but for making a static word list that will already be subjected to manual review (rather than, say, a Twitter chatbot that learns from random users who interact with it), I’d say this filter does more harm than good. A word list where cops are represented, but LGBTQ+ people aren’t, isn’t a word list I want to use for my passwords.

Data ethics

A lot of other data went into the EFF’s word lists to reduce the intensive labor of manually picking out suitable words, mostly from Ghent University’s Center for Reading Research. This includes data on how commonly-known various words are, as well as data on the concreteness of various words; the EFF chose to target more concrete words for easier memorization. Unfortunately, Ghent University’s word concreteness data was collected using Amazon’s Mechanical Turk platform, which exploits expendable laborers working for rates far below minimum wage, enriching aspiring trillionaire Jeff Bezos by commodifying the parts of human intelligence that can’t yet be cheaply offloaded to computers. Vast amounts of this kind of disposable human labor are sadly ubiquitous in the background of countless research projects in computer language processing and image processing, and I don’t think it’s possible to build an ethical project on data gathered this way.

My attempt to do better

With these issues in mind, I put together a wordlist I like better, taking inspiration from the EFF’s list while trying to improve on its flaws. Below, I’ll talk about my process for generating this list, and share resources for anyone interested in building on my work here.

Conceptual distance

The first issue I highlighted with the EFF’s list was words that are too conceptually similar. To avoid this problem, I used ConceptNet Numberbatch, a dataset of word embeddings for use in machine learning projects. As best as I can tell, ConceptNet primarily uses data from Wikipedia and voluntary surveys, rather than exploitative sources like MTurk, so I feel more comfortable using this dataset. Each word is associated with a 300-dimensional vector, and I use the distances between these vectors as a measure of how conceptually distinct various words are. As I added new words to my wordlist, I was able to see what other words might be too similar, and remove words that were too close to existing words. This approach had its pluses and minuses, as small conceptual distances didn’t always correlate perfectly with confusability. For instance, the closest pair of words in my finished list according to Numberbatch is “piano” and “violin”, with a distance of 0.64. Sure, they’re both musical instruments, but I’m not too worried about someone losing track of which is which. For the most part, I tried to maintain a spacing of around 0.9 or higher between word vectors, with occasional exceptions for cases like this.
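For illustration, this kind of distance check can be sketched as follows. I’m assuming cosine distance (1 minus cosine similarity) as the measure here, and using tiny made-up 4-dimensional vectors in place of the real 300-dimensional Numberbatch embeddings:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 means identical direction, 2 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy 4-D stand-ins for real 300-dimensional Numberbatch vectors.
embeddings = {
    "piano":  [0.9, 0.3, 0.1, 0.0],
    "violin": [0.8, 0.4, 0.2, 0.1],
    "cactus": [0.0, 0.1, 0.9, 0.4],
}

close = cosine_distance(embeddings["piano"], embeddings["violin"])
far = cosine_distance(embeddings["piano"], embeddings["cactus"])
print(close)  # small: conceptually close
print(far)    # larger: conceptually distant
```

A small distance flags a candidate pair for a closer look; the final call on whether two words are genuinely confusable still has to be made by a human.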

Manual review

While I used the ConceptNet Numberbatch embeddings as a guide, and used them to produce suggestions for words I might want to add that were sufficiently distant from the words I already had, all of the actual decision-making about what words to include or exclude was done manually by me. Yes, this took forever, but I couldn’t really find other data sources that would be useful to simplify my search. For a while I tried using Ghent University’s word prevalence data as a source of words that were commonly known, but their corpus just wasn’t big enough and left out a lot of usable words.
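Roughly, the suggestion step works like this. This is a simplified sketch with hypothetical names and toy 2-D embeddings, not my actual (much jankier) code:

```python
def euclidean(u, v):
    """Simple stand-in distance; any vector distance works the same way."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def min_distance(word, accepted, embeddings, dist):
    """Smallest distance from `word` to any already-accepted word."""
    return min(dist(embeddings[word], embeddings[w]) for w in accepted)

def suggest(candidates, accepted, embeddings, dist, threshold=0.9):
    """Candidates far enough from every accepted word; final say is manual."""
    return [w for w in candidates
            if min_distance(w, accepted, embeddings, dist) >= threshold]

# Toy 2-D stand-ins for the real 300-dimensional embedding vectors.
embeddings = {
    "backpack": (0.0, 1.0),
    "knapsack": (0.1, 0.9),   # very close to "backpack"
    "dolphin":  (2.0, 0.0),
}
print(suggest(["knapsack", "dolphin"], ["backpack"], embeddings, euclidean))
# → ['dolphin']: "knapsack" is filtered out as too close to "backpack"
```

The threshold only prunes the candidate pool; every surviving word still went through manual review before making it into the list.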

I’ve done my best to keep out words that are offensive, as well as words that are tied to societal systems I want to tear down. The result is a word list that’s very much colored by my personal perspective and cultural reference frame, as a young white nerdy leftist trans woman. I’ve done my best to make something that’s hopefully useful to people other than myself, but I recognize that I’ve probably made a lot of mistakes, and I’d love to hear feedback from people with different thoughts and lived experiences.

Non-goals

The EFF did a few things in creating their word list that I wasn’t interested in reproducing or wasn’t able to reproduce. For instance, their word list with distinct 3-letter prefixes maintains a minimum edit distance of 3 between any two words in the list. This would potentially be useful for typo correction… if I were typing out entire words. Since my intended usage of this list is to memorize a passphrase as a mnemonic but only actually type the first 3 letters of each word, typo resistance really doesn’t seem useful to me. For the same reason, I also didn’t care as much about avoiding words that have confusing or ambiguous spellings, as long as common misspellings or alternate spellings didn’t affect the first 3 letters.
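The property that does matter for this scheme is that the 3-letter prefixes stay unique, since each typed prefix has to map back to exactly one word. A quick sanity check (shown here on a hypothetical list) looks like:

```python
from collections import Counter

def prefix_collisions(wordlist, n=3):
    """Return any words whose first-n-letter prefixes collide."""
    counts = Counter(w[:n] for w in wordlist)
    return [w for w in wordlist if counts[w[:n]] > 1]

# Hypothetical example: "cactus" and "cacao" share the prefix "cac",
# so at most one of them can go in the list.
print(prefix_collisions(["abacus", "cactus", "cacao", "dolphin"]))
# → ['cactus', 'cacao']
```

An empty result means every 3-letter abbreviation unambiguously recovers its word, which is the whole point of the list.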

Code

If you’re interested in building on this work, you can find my extremely janky code at https://git.xeno.science/xenofem/diceware

In conclusion

I’ll end this article the same way the EFF ended theirs: Hopefully I’ve made something useful, but there’s plenty of room for more research and experimentation in this area, and I hope people keep exploring!