(this post requires you have some knowledge of information theory that was shared in a previous post)
okay! let's take a look at two games from two different friends of mine, who we will completely randomly name felix and pam.
both games took 4 guesses, but do you see how much luckier of a guess moldy was for felix than it was for pam? let's look at the information gained at each guess.

you can see how in pam's game, the final guess yielded only 1 Sh, while in felix's it gave a whopping 5.49 Sh - felix had a much wider space of possible guesses. but wouldn't you also say that worth is a better guess than foldy? if we didn't know the answer, you'd suppose that letters like t and h are much more likely to appear than d and y, but the information gain readings say that pam's guess was far better than felix's. pam got lucky. is there any way to measure that?
yeah what'd you think lol. anyways, if you recall how our system works, you might notice a flaw - we only run the checks on the resulting pattern. the player can't actually see it before they guess though! we have to figure out a way to get the possible future information gain. the way this is done is actually quite simple - you just run the guess against every possible answer word and average the information you'd get back. that average is the entropy (at least in information theory). a neat thing you can do after that is find out how much actual information you've gained, and see how that differs from what you expected. that's basically how lucky you got.
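to make that a bit more concrete, here's a rough python sketch of the idea (my own illustration for this post, not the actual code behind the charts - names like `wordle_pattern` and `guess_entropy` are made up):

```python
from math import log2
from collections import Counter

def wordle_pattern(guess: str, answer: str) -> tuple:
    """standard wordle feedback: 2 = green, 1 = yellow, 0 = gray."""
    pattern = [0] * len(guess)
    leftover = Counter()
    # greens first, while counting the answer letters that weren't matched
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2
        else:
            leftover[a] += 1
    # then yellows, consuming the leftover letters
    for i, g in enumerate(guess):
        if pattern[i] == 0 and leftover[g] > 0:
            pattern[i] = 1
            leftover[g] -= 1
    return tuple(pattern)

def information(guess: str, pattern: tuple, possible: list[str]) -> float:
    """information (in Sh) actually gained by seeing `pattern` after `guess`."""
    matching = sum(1 for word in possible if wordle_pattern(guess, word) == pattern)
    return log2(len(possible) / matching)

def guess_entropy(guess: str, possible: list[str]) -> float:
    """expected information (in Sh): run the guess against every possible answer and average."""
    counts = Counter(wordle_pattern(guess, word) for word in possible)
    total = len(possible)
    return sum(count / total * log2(total / count) for count in counts.values())

def luck(guess: str, answer: str, possible: list[str]) -> float:
    """how much more (or less) information we got than we expected."""
    actual = information(guess, wordle_pattern(guess, answer), possible)
    return actual - guess_entropy(guess, possible)

# example: guessing "radar" when the remaining possible answers are these four
possible = ["phone", "cobra", "sling", "night"]
print(guess_entropy("radar", possible))  # ≈ 0.81 Sh expected
print(luck("radar", "cobra", possible))  # ≈ +1.19 Sh if the answer happened to be cobra (lucky!)
```

(the entropy function is the same averaging trick, just grouped by pattern so we don't recompute the same information over and over.)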
to recap, in this scenario, the entropy of your guess is how much information gain you can expect, the information is how much you actually got, and the difference is how "lucky" you got (better or worse than expected). cue the charts
each chart has a distribution of the probabilities of possible resulting patterns, the actual information gained, the expected entropy, and the difference.
you can see how this time, even though foldy got more information, worth would've given more information on average, but it got unlucky. you could say that felix's guess was more skilled than pam's, and that pam actually was consistently luckier than felix. on that guess in particular, felix got 1.78 Sh less than expected, while pam got 1.57 Sh more!
the weird shape above the actual information, or the "distribution of the probabilities of possible resulting patterns" as i called it 2 paragraphs ago, is, well, pretty much that. to get the entropy of a guess, you just have to run it against all possible answers (since you don't actually know the answer in this scenario). back to our example where we only had to guess between phone, cobra, sling, and night, and we put radar as a guess: to calculate its entropy, we just have to average the information gain for each of the possible words. so, for phone, sling, and night, the pattern will be all grays. this means we would've gained ~0.42 Sh if the answer was any of those words, but if the word was cobra (and only it), we'd get yellows on the first r and the first a, giving us a whopping 2 Sh of information. now we just average it! (3*0.42 + 2)/4 = ~0.81 Sh of entropy.
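(if you want to double-check that average, it fits in a couple of lines - this little sketch just hardcodes the two pattern probabilities we worked out above:)

```python
from math import log2

# 3 of the 4 answers give the all-gray pattern (probability 3/4),
# only cobra gives its own pattern (probability 1/4)
pattern_probabilities = [3 / 4, 1 / 4]
entropy = sum(p * log2(1 / p) for p in pattern_probabilities)
print(round(entropy, 2))  # 0.81
```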
that shape is a histogram of every possible resulting pattern from every possible answer word, and the highlighted part is what we actually got. but because rarer patterns carry more information (if we got a rare pattern, we'd get more out of it), we display the patterns from lowest (most probable) to highest (least probable) information. to build up some more intuition, here's also my game run through this technique
let's start at the bottom. there are 2 bars, the taller one being highlighted. in this case the possible words left are corse, dorse, gorse, worse, and zorse. guessing worse basically has 2 outcomes (just like the previous example scenario) - that the answer is worse, or that the answer isn't worse. the former gives us more information (it's a rarer outcome), so it's the taller bar, and it's highlighted because we guessed it. going up, the smaller bar is highlighted because we got the higher probability (lower information) outcome - that the word is not horse. going up, there's an even more complicated distribution, and it seems like we got one of the two lowest-information patterns. the other one looked like this
you'd think that having one less green square highlighted would give less information, but if you remember, we're looking at how many words were eliminated (or rather, how many remain), and in both cases 6 possible words remain. then there's another bar, for a pattern that leaves only 2 possible words, and the flat bit at the end is all the patterns that would give maximum information - leave only one word. for instance, this pattern would only leave mouse as a possible word. actually, another curious thing is that, by all of our logic, a pattern that's all greens is in the same category, since it also leaves only one word (morse). going up again, now the distribution is smaller, yielding less entropy. you can think of it like this: since the distribution is smaller, every pattern has a higher probability, so on average each one gives less information. and the top two distributions are pretty self-explanatory.
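if you want to poke at these shapes yourself, here's roughly how one of them could be built - just my own sketch of the bottom chart's five-word scenario, using the same kind of feedback helper as in the earlier sketch:

```python
from math import log2
from collections import Counter

def wordle_pattern(guess: str, answer: str) -> tuple:
    """same feedback helper as before: 2 = green, 1 = yellow, 0 = gray."""
    pattern = [0] * len(guess)
    leftover = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2
        else:
            leftover[a] += 1
    for i, g in enumerate(guess):
        if pattern[i] == 0 and leftover[g] > 0:
            pattern[i] = 1
            leftover[g] -= 1
    return tuple(pattern)

# the bottom chart's scenario: five words left, and we guess "worse"
possible = ["corse", "dorse", "gorse", "worse", "zorse"]
guess = "worse"

counts = Counter(wordle_pattern(guess, answer) for answer in possible)
total = len(possible)

# one bar per pattern, sorted from lowest (most probable) to highest
# (least probable) information, just like in the charts
bars = sorted(
    ((pattern, log2(total / count)) for pattern, count in counts.items()),
    key=lambda bar: bar[1],
)
for pattern, info in bars:
    print(pattern, f"{info:.2f} Sh")
# (0, 2, 2, 2, 2) -> 0.32 Sh  (the answer isn't worse: 4 out of 5 words)
# (2, 2, 2, 2, 2) -> 2.32 Sh  (the answer is worse: the rarer, taller bar)
```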
a small thing to note is that because our word list is much bigger than the list of possible answers, our luck is going to be skewed a bit downwards, since the size of the word list gives more variation (you can sort of think about this through the lens of the law of large numbers).
muehehe i installed a maths extension to my blog engine so you will now get to see some formulae :3
to start off, we define some message $m$ (remember, in information theory we talk about messages/events, in our case the wordle patterns) and its probability $p(m)$. to calculate the information content $I(m)$ in bits of this message, you just have to take the log of the inverse of the probability

$$I(m) = \log_2 \frac{1}{p(m)}$$
actually, that base could be anything. so far i've been using bits because they're the most common, but also they're a bit more intuitive if you know binary. given a number with 8 bits, let's say, having 256 values, the probability of any message (number) is $\frac{1}{256}$, meaning that the information content of any message would be $\log_2 256 = 8$ bits (shannons)!
the entropy $H(M)$ of a message space $M$ (that would be all patterns in our result) is the expected value (weighted average) of the individual information content $I(m)$, and we define it like this

$$H(M) = \frac{1}{N} \sum_{m \in M} N \, p(m) \cdot \log_2 \frac{1}{p(m)}$$
where $N$ is the size of $M$. $N \, p(m)$ is just a way to say "the amount of times this message has occurred", and $\frac{1}{N}$ is just a way to divide the whole thing by $N$. but you might've already noticed, those $N$s cancel out!
behold, the formula for entropy, as defined by claude shannon himself! (see here, page 13)

$$H(M) = \sum_{m \in M} p(m) \log_2 \frac{1}{p(m)}$$

readers are free to re-check all of this for errors, i'm proud of my $\LaTeX$, but still not that confident in my maths skills x3
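and if you'd rather read that formula as code, here's the same thing over a list of observed messages (again just a sketch for this post, with made-up message labels):

```python
from math import log2
from collections import Counter

def entropy(messages: list) -> float:
    """shannon entropy in Sh: the sum over distinct messages of p(m) * log2(1 / p(m))."""
    counts = Counter(messages)
    n = len(messages)
    return sum(count / n * log2(n / count) for count in counts.values())

# the radar example from earlier: 3 answers give the all-gray pattern, 1 gives its own
print(entropy(["all gray", "all gray", "all gray", "cobra's pattern"]))  # ≈ 0.81
```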
anyways, that's been all from me, toodles!