wordle and information theory

Jul 03 2025

i'm not great at wordle, which can probably be evidenced by this game

but other than that, i think that wordle is very interesting from an information theory point of view (just like 3b1b x3), so this will be my attempt at an explanation of how we, whether we realize it or not, use information theory while playing a game of wordle.

the game space

imagine every possible word that is valid in wordle (about 14.9k words) - everything from aahed (the simple past and past participle of aah) to zymic (pertaining to, or produced by, fermentation). now imagine they all somehow exist in one solid block of words. to make a guess in wordle would be to chop off the section of this block containing all the words that don't match the colours you got back for your guess.

to give a simpler example: imagine you only had 4 words to guess from - phone, cobra, sling, and night. if you guess radar and every letter comes back gray, you can only eliminate cobra as an option (i.e. chop it off the block), since it's the only one of the four with an r or an a in it.

but if the first two letters come back yellow, you can eliminate phone, sling, and night! very efficient. once you chop all 3 off the metaphorical block, only cobra is left, so you can be sure that's the answer.

and so a game of wordle is just repeatedly removing sections from the block of possible answers until you're left with just 1 word.
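the chopping can be sketched in a few lines of python. this is a minimal sketch, not a full solver: `feedback` and `chop` are hypothetical names made up for the illustration, and the colours are plain strings.

```python
from collections import Counter

def feedback(guess, answer):
    """Colours wordle would show for `guess` if `answer` were the solution."""
    colours = ["gray"] * len(guess)
    # answer letters that were not matched exactly can still produce yellows
    spare = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "green"
    for i, g in enumerate(guess):
        if colours[i] != "green" and spare[g] > 0:
            colours[i] = "yellow"
            spare[g] -= 1
    return colours

def chop(candidates, guess, colours):
    """Keep only the words that would have produced `colours` for `guess`."""
    return [w for w in candidates if feedback(guess, w) == colours]

block = ["phone", "cobra", "sling", "night"]

# all gray: only cobra gets chopped off
print(chop(block, "radar", ["gray"] * 5))
# first two yellow: everything except cobra gets chopped off
print(chop(block, "radar", ["yellow", "yellow", "gray", "gray", "gray"]))
```

a whole game is then just calling `chop` again on whatever list the previous call returned.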

shannons (bits)

some of you may have heard of bits as they are in computer science (1s and 0s) - they are somewhat related to our bits here, but only somewhat. even though most people just call them bits, i'll call them shannons to remove ambiguity and to honour Claude Shannon (the "father" of information theory) i guess x3

a shannon is, really crudely said, a measure of how much of the possible answers you chopped off, and it's base 2 logarithmic. said with more understandable words, the number of shannons is basically how many consecutive times you chopped the space in half. if you remove ("chop off") half of the possible answers, that's 1 shannon (Sh for short). if you halve the possible answers twice in a row (i.e. shrink your possible answer space to a quarter of its size), that's 2 Sh.

back to the game with only 4 possible words: guessing radar and getting the first two yellow means that you just gained 2 Sh of information (4 possible answers -> 1 possible answer), but if you got all gray, you've only gained about log2(4/3) ≈ 0.42 Sh of information (4 possible answers -> 3 possible answers). how can you chop something in half 0.42 times? this has to do with logarithms and fractional exponentiation and the like, which you can read about here if you're interested.
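the arithmetic is short enough to check by hand or in python. a small sketch, where `shannons` is a made-up helper name and 14,900 is just the rounded "about 14.9k" word count from earlier:

```python
import math

def shannons(before, after):
    """Information gained by shrinking `before` possible answers down to `after`."""
    return math.log2(before / after)

print(shannons(4, 1))             # first two yellow: 2.0
print(round(shannons(4, 3), 3))   # all gray: 0.415

# "halving 0.415 times" just means keeping 0.5 ** 0.415 of the block, i.e. 3/4
print(round(0.5 ** shannons(4, 3), 3))   # 0.75

# rough uncertainty at the start of a real game, assuming 14,900 valid words
print(round(math.log2(14900), 2))        # 13.86
```

so a fractional number of shannons isn't mysterious: it's the exponent you'd put on 1/2 to get the fraction of the block that survives.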

now, back to the real world. let's measure how good my guesses are in shannons with the full word list

each row of text is the information gained in shannons, how many possible words there are (again in shannons) and examples of possible words.

as you can see, salon and posts are similarly good guesses, while boost barely eliminates anything. morse narrows it down more, then horse only manages to eliminate itself as an option, which finally leads me to guessing worse.

i was also wondering how much information each individual letter gives us, so i wrote a bit more code

the brighter the square, the more information it has given

as you can see, most letters gave absolutely no information, but also each guess is worth more than the sum of its parts. for example, the letter values of the second guess, posts, sum to barely 1 Sh, while the guess's actual value is 4 Sh! this is due to stuff like the Ss not giving information individually (since we already know that there's an s in the word), but together in the guess they drastically narrow down the possible answers.
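to see why the parts don't have to add up to the whole, here's a toy sketch under a few loudly stated assumptions: `feedback` and `letter_information` are hypothetical names, the four-word list is made up for the example, and scoring a letter as "keep every word that would show the same colour in that one square" is just one reasonable definition, not necessarily the one behind the picture above.

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Colours wordle would show for `guess` if `answer` were the solution."""
    colours = ["gray"] * len(guess)
    spare = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "green"
    for i, g in enumerate(guess):
        if colours[i] != "green" and spare[g] > 0:
            colours[i] = "yellow"
            spare[g] -= 1
    return colours

def letter_information(candidates, guess, answer):
    """Shannons gained from each square on its own, and from the whole guess.

    Assumes `answer` is one of `candidates`, so no filter ever comes back empty.
    """
    actual = feedback(guess, answer)
    before = len(candidates)
    per_letter = []
    for i in range(len(guess)):
        # only look at the colour of square i, ignore the other four
        kept = [w for w in candidates if feedback(guess, w)[i] == actual[i]]
        per_letter.append(math.log2(before / len(kept)))
    whole = [w for w in candidates if feedback(guess, w) == actual]
    return per_letter, math.log2(before / len(whole))

# made-up candidate list, chosen so each gray square rules out a different word
toy = ["mound", "cling", "bring", "thank"]
per_letter, total = letter_information(toy, "crane", "mound")
print([round(x, 3) for x in per_letter])  # [0.415, 0.415, 0.415, 0.0, 0.0]
print(total)                              # 2.0
```

each of the first three squares only rules out one word by itself, so the letters sum to about 1.25 Sh, but taken together they rule out three different words and the guess is worth the full 2 Sh.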

notes

you may notice that the word list that i'm using has some words that will never actually be the answer to an official wordle game. which is true! these are all the valid words (those you can play), not the possible answers. i've decided to use them instead of the answers for 2 reasons

the visualizations are prettier and arguably easier to understand (knowing the answers lets you narrow down pretty much all words in 3 guesses)
players can play any valid word without it being in the list of answers, which they don't even know

the code

the code wasn't too difficult to write, although i fear there still might be errors in my wordle logic

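from collections import defaultdict

def wordle(guess, answer):
    pattern = []
    # how many copies of each letter have already been coloured
    used = defaultdict(int)

    # first pass: exact matches are green and claim a copy of their letter
    for i in range(5):
        if guess[i] == answer[i]:
            pattern.append("green")
            used[answer[i]] += 1
        else:
            pattern.append("gray")

    # second pass: a non-green letter turns yellow only while the answer
    # still has copies of it that haven't been claimed
    for i in range(5):
        if (pattern[i] != "green" and
                used[guess[i]] < answer.count(guess[i])):
            pattern[i] = "yellow"
            used[guess[i]] += 1

    return pattern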
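one way to get some confidence in the logic is to run a few hand-worked cases, especially ones with repeated letters. the sketch below is an assumption-heavy helper, not part of the original code: `feedback` is a second, independently written scorer, and `CASES`, `SHORT` and `check` are names made up here.

```python
from collections import Counter

def feedback(guess, answer):
    """A second scorer, written differently, to compare against."""
    colours = ["gray"] * len(guess)
    spare = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "green"
    for i, g in enumerate(guess):
        if colours[i] != "green" and spare[g] > 0:
            colours[i] = "yellow"
            spare[g] -= 1
    return colours

# (guess, answer, colours worked out by hand: g = green, y = yellow, - = gray)
CASES = [
    ("radar", "cobra", "yy---"),  # repeated a and r, only one of each in the answer
    ("horse", "worse", "-gggg"),
    ("boost", "worse", "-g-g-"),  # the green o uses up the only o
    ("sassy", "worse", "---g-"),  # a later green beats an earlier would-be yellow
    ("speed", "abide", "--y-y"),  # only the first e gets the yellow
]

SHORT = {"green": "g", "yellow": "y", "gray": "-"}

def check(score):
    """Return the cases where `score(guess, answer)` disagrees with the hand-worked colours."""
    failures = []
    for guess, answer, expected in CASES:
        got = "".join(SHORT[c] for c in score(guess, answer))
        if got != expected:
            failures.append((guess, answer, expected, got))
    return failures

print(check(feedback))  # an empty list means every case matched
```

since `wordle` above returns the same "green" / "yellow" / "gray" strings, passing it to `check` would run the same cases against it.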

everything is done very bruteforce-ishly, but it runs fast enough so i don't care that much. all the visualizations were made in matplotlib, aligning them was hell lol. this is hella awkward without an outro