Love Story to World Domination – a Linguist’s Take on Wordle

The free online word game Wordle has taken the world by storm. (Image credits: Wordle website.)

The free online word-guessing game Wordle has taken over the internet, a craze catalysed by New Zealand players. This app that started as a romantic gift to support a partner has now become one of our most competitive daily activities. In this discussion, Linguistics Editor Stephanie Jat looks at the linguistic elements of word guessing, and explores the benefits of corpus linguistics in helping find an advantageous first guess. (All images of the game are screenshots from players of the web page.)

Wordle 234 3/6*

🟨🟨⬛⬛🟩
🟩⬛🟩🟨🟩
🟩🟩🟩🟩🟩

If you had told me this time last year that posting this colour-coded grid on Twitter would make sense, I probably would have scoffed, rolled my eyes, and carried on with my everyday life.

Which now consists of a daily game of Wordle, the by-product of a love story.

For those of you who have yet to be swept up in this internet trend, Wordle is a free online English word game by Josh Wardle. He originally developed it for his partner, who wanted something mindless and calming to do while she was going through a rough patch (hence the American spelling). The prototype was designed back in 2013 and included all 13,000 five-letter English words from an online dictionary Wardle had found. His friends told him that many of the words were so obscure and rare that they would never have guessed them. So when Wardle set out to build something for his partner, he decided to use a filtered word list (corpus linguistics is useful!). This was a collaborative effort: his partner sorted the 13,000-word list into three categories, “I know this word,” “I don’t know this word,” and “I maybe know this word.” The end result was a 2,315-word set of allowed answers, defined mainly by what his partner knew and didn’t know. This is the romance that grew into the game that now dominates the English-speaking world.

So, how do you play? You get six guesses for a different five-letter word every day. You begin with nothing but some empty boxes and an interactive keyboard which marks your progress.

Boxes with letters not in the answer word remain black; those with correct letters but in the wrong positions go yellow; and correct letters in the correct position have green boxes. The keyboard below keeps track of which letters you’ve used and your discovery of their accuracy. You can play on Hard Mode, which means all of the information discovered in cumulative guesses must be used in the next guess.
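The colour rules above can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the function name `score` and the G/Y/B marks are made up for this example, and the handling of repeated letters (each letter in the answer can light up at most one tile) is an assumption, since the rules as described here don't cover that case.

```python
from collections import Counter

def score(guess, answer):
    """Return one mark per letter: G = green, Y = yellow, B = black."""
    guess, answer = guess.upper(), answer.upper()
    marks = ["B"] * len(guess)
    # Answer letters that were not matched exactly stay available for yellows.
    spare = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: right letter in the right position.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "G"
    # Second pass: right letter, wrong position (assumed duplicate handling).
    for i, g in enumerate(guess):
        if marks[i] == "B" and spare[g] > 0:
            marks[i] = "Y"
            spare[g] -= 1
    return "".join(marks)

print(score("RAISE", "SUGAR"))  # YYBYB
print(score("ADIEU", "SUGAR"))  # YBBBY
```

Each string reads left to right like a row of the grid, so `YYBYB` means yellow, yellow, black, yellow, black.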

Approximately 300,000 people worldwide play Wordle daily, and everyone gets the same word each day. A day runs for the 24 hours of the calendar date: once midnight passes, the page renews itself.

After you win (or run out of guesses), Wordle tells you how many games you’ve played, what percentage you’ve won (i.e. guessed correctly within 6 tries), both your current and maximum streak (i.e. how many days in a row you’ve won), and what the distribution of your number of tries is. Assuming 95% of words can be guessed within 6 tries, easy mode users need an average of 4 attempts, while hard mode users need 5.

Now, let’s talk about strategy. Most players have a “starting word” or a few of them, which they believe are the best option for eliminating possible words. ADIEU has been used frequently as a starting point, since it includes four vowels, as many as almost any 5-letter English word can manage. Clever, right?

Not really. You might think it’s the best plan to figure out what vowels you have first, but this linguist is telling you that the consonants will determine whether you win or lose. 

Linguistics (and the mathematical side of it, as seen in the use of corpora such as a Hemingway novel) is actually useful for a game like Wordle. I often focus less on “what vowels” and more on “what consonant clusters”: the vowels can all occur together in many words, but the combinations of consonants in English are limited (e.g. no -tp-, -tl-, -wb-) and there are so many more consonants that you need to eliminate. Once you have some of the consonants, you can probably guess straight away which vowel(s) need to go in.

Think about it. Say you enter the word RAISE as your first try, and you get R, A and S, but none of them in the right position. You have some frame of reference to work from: SR is not an onset in English (sriracha is a loanword), so you know that can’t be the combination. S cannot be in second position unless you have a psy- word, which A and R can’t fit into. You also know that R and A can’t be in the first and second positions respectively, and S can’t be in fourth. That leaves the following patterns (X is an unknown letter): SXRAX; ASXRX (Latin has astra, but English rarely has a word that looks like this: the third position is unlikely to be a vowel since English likes doubling S intervocalically, and STR and SHR are the only SXR consonant clusters permitted, but both are unlikely here); SXXAR; SXARX; SXAXR; and XRAXS (XS is a very unlikely five-letter word ending, since plurals aren’t really in the answer set).

Then you can look at what words are possible: STRAP, SOLAR, SUGAR, SONAR, SCARF. (I’m going for common words, which is the whole idea of the game and the reason for its accessibility.) There are now only really 4 more guesses you could make (SOLAR and SONAR both have O so if one fails, the other will too!). The answer is SUGAR. Win.
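The narrowing-down above can be checked mechanically. The sketch below is an illustration under assumptions: `score` and `consistent` are hypothetical helper names, the duplicate-letter handling is assumed, and the candidate list is just the five words above plus two decoys (URBAN, LAUGH). It does only the letter bookkeeping; the phonotactic reasoning is what keeps the human shortlist small in the first place.

```python
from collections import Counter

def score(guess, answer):
    """Return one mark per letter: G = green, Y = yellow, B = black."""
    marks = ["B"] * len(guess)
    # Answer letters not matched exactly remain available for yellows.
    spare = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "G"
    for i, g in enumerate(guess):
        if marks[i] == "B" and spare[g] > 0:
            marks[i] = "Y"
            spare[g] -= 1
    return "".join(marks)

def consistent(candidates, guess, feedback):
    """Keep only the words that would have produced this feedback for this guess."""
    return [w for w in candidates if score(guess, w) == feedback]

words = ["STRAP", "SOLAR", "SUGAR", "SONAR", "SCARF", "URBAN", "LAUGH"]

# RAISE gave yellow R, yellow A, black I, yellow S, black E.
after_raise = consistent(words, "RAISE", "YYBYB")
print(after_raise)  # ['STRAP', 'SOLAR', 'SUGAR', 'SONAR', 'SCARF']

# Guessing SOLAR next (with SUGAR as the hidden answer) rules out SONAR as well.
after_solar = consistent(after_raise, "SOLAR", score("SOLAR", "SUGAR"))
print(after_solar)  # ['SUGAR']
```

The second step shows the point about SOLAR and SONAR: the black O on SOLAR removes SONAR without spending a guess on it.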

Say you start with ADIEU instead. Your first guess will yield (I’m using all these 5-letter words just for you!) letters A and U, both in the wrong position. That doesn’t get you very far, does it? Now you have XAUXX, XXAUX, XXUAX, XUAXX, XAXUX, XUXAX, UXXXA, UXXAX, UXAXX… I’m not a mathematician, but I can tell that there are many more than six possibilities here. So we continue with vowel elimination and go for OVULA, and nothing: no O, and U and A are still in the wrong positions. So now you can guess: SUGAR, CAJUN, UNSAY, URBAN, HUMAN, QUART, QUACK, SUMAC… just to name some of the most popular ones. This is way more than the 4 remaining guesses, and your chances of losing are significantly higher.
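For the curious, the placement patterns after ADIEU can be counted rather than listed by hand. This is a rough sketch under assumptions: `placements` is a made-up helper, positions are counted from 0, and each known letter is assumed to appear exactly once in the answer.

```python
from itertools import permutations

def placements(banned, length=5):
    """List every way to place the known letters, with X for unknown slots.

    banned maps each known letter to the 0-based positions already ruled out.
    """
    letters = list(banned)
    patterns = []
    for spots in permutations(range(length), len(letters)):
        if any(pos in banned[ch] for ch, pos in zip(letters, spots)):
            continue  # a yellow tile already ruled this position out
        row = ["X"] * length
        for ch, pos in zip(letters, spots):
            row[pos] = ch
        patterns.append("".join(row))
    return sorted(patterns)

# After ADIEU: A is not first, U is not last.
after_adieu = placements({"A": {0}, "U": {4}})
# After OVULA as well: A is also not last, U is also not third.
after_ovula = placements({"A": {0, 4}, "U": {2, 4}})

print(len(after_adieu), after_adieu)
print(len(after_ovula), after_ovula)
```

Under these assumptions the first call gives 13 patterns and the second 7, and every one of them still has three unknown slots to fill.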

In fact, a mathematician has done a data analysis of Wordle answers, using a frequency corpus built from Ernest Hemingway’s first novel, The Sun Also Rises. To compare Scrabble letter values with real-life letter use, she ranked the 26 letters by their frequency in Hemingway alongside their Scrabble scores (usually, the rarer a letter is, the more it scores). She then calculated a “coverage score” for her guesses, which is the likelihood of the words including some element of the correct answer. She found that gut-instinct first words (here ALTER, BISON, DUCHY) got her 86.81% coverage, while guesses based on Hemingway’s corpus gave her a lower score of 85.92% (STONE, CHAIR, WOULD). If she took only the 15 highest-ranking letters and made words with them, she ended up with a coverage of 88.51% using the words ADULT, GRIME, SHOWN. What does this mean? First of all, the supposedly rare Scrabble letter H (worth 4 points) is actually the third most common consonant. And vowels are not the be-all and end-all: T is actually more frequent than A.
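To make the idea of a coverage score concrete, here is a minimal sketch. The exact formula used in that analysis isn't given here, so this assumes one plausible reading: coverage is the combined share of all letters in the corpus that belong to the distinct letters in your guesses. The function names are invented for this example, and the sample sentence is a stand-in, not Hemingway's text, so the numbers it prints will not match the percentages quoted above.

```python
from collections import Counter

def letter_shares(text):
    """Relative frequency of each letter A-Z in a text."""
    counts = Counter(ch for ch in text.upper() if ch.isascii() and ch.isalpha())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def coverage(words, shares):
    """Assumed definition: total share of the distinct letters the guesses use."""
    letters = set("".join(words).upper())
    return sum(shares.get(ch, 0.0) for ch in letters)

# Stand-in text only; a real run would load the full novel instead.
sample = "The free online word game has taken the world by storm"
shares = letter_shares(sample)

trios = (
    ["ALTER", "BISON", "DUCHY"],
    ["STONE", "CHAIR", "WOULD"],
    ["ADULT", "GRIME", "SHOWN"],
)
for trio in trios:
    print(trio, round(coverage(trio, shares), 4))
```

Counting distinct letters only is a deliberate choice in this sketch: a repeated letter across your three guesses tells you nothing new, which is why ADULT, GRIME and SHOWN, with 15 different letters between them, make an efficient trio.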

Gross characters, relative percentages, and Scrabble tile values for The Sun Also Rises. (Image credits: Caroline Delbert.)

TL;DR: Everyone has their own strategy for nailing Wordle, and I’m not going to demand you play a game designed for de-stressing in a specific way. Corpus linguistics, however, does suggest that using a combination of vowels and consonants in that first guess could be much more effective than focusing only on those pesky vowels. This might help give you that leg up against your neighbour, your family, your friends… I mean, everyone’s got a competitive streak, right?
