Algorithm for Wordsnakes Problem
Original Problem Description
My algorithm runs in 2 stages:
Deterministic Tempered Greed
The obvious greedy strategy in this game is to start linking words which
have the highest overlap of characters (eg. abcd & cdef can
be linked to form abCDef), or to go for full inclusion (eg. placing
bc after abcd results in aBCd).
But, as is often the case, greed isn't optimal. Consider the words
baab, ab & aab. Lets call them X, Y
& Z. Y & Z are each substrings of X, and can
hence be 'fully included' in X as long as they occur after X
in our solution. If we choose the order XYZ as our solution, the
resulting snake is just X and we would get 4 + 9 = 13. Not bad.
But...if we change the order in our solution to be YZX, the snake
starts off as Y=ab, becomes YZ=abaab with no
accumulated points, but then X is fully included in the snake,
giving us a score of 16!!
Discovering these 'better' options requires exploring the combination
space as deeply as possible. With 100 words and 2 minutes of time, I
got as far as trying every possible ordering of 4 words (which is an O(n^4)
search space) to find the best 'quad's. Only combos that have a score
above a mimimum threshold are considered. There can be conflicts. If
PQRS and QRUV are our top 2 quads, QR can't be used
twice, so we can only pick one of these (the one with the higher score of
course). Hopefully we can find a few conflict-free quads, treat each of
them as a single word by adding these new 'multi-words' to the word-list
after removing their constituents. Then look for good trios. Then
look for good pairs. Line up the multi-words (quads, trios & pairs)
selected so far, and with the remaining words that haven't been linked to any other yet,
find the best place in this sequence of multi-words to insert them.
The conflicts can sometimes be resolved safely. eg. just because RS is
used in PQRS doesn't mean the quad RSTU has to be thrown
out for conflict reasons. We might be able to merge the two quads to
form PQRSTU. But that might not be possible...eg. if R were
fully inside PQ and not at the end, that would make the suffixes
of PQRS the same as the suffixes of QS rather than those
of RS, killing the match with the prefixes of TU.
And so the greed is tempered, making not the immediately obvious greedy
choice, but delving a little into combinatorial explosion before getting
Non-Deterministic Simulated Annealing
Starting with the solution from the previous stage, simulated annealing
is run until time is up.
Neighbors to the current sequence (of 100 words) are found by
randomly splitting it into four parts ABCD (of which A &
D can be empty) and swapping B & C to form ACBD.
Temperature is rigged to decrease such that it will hit zero right when
time is up, decreasing the likelihood of taking downwards steps as time
runs out. The best sequence so far (which until we get an improvement is
our local maximum) is kept track of, and after some number of steps (eg. 100)
with no gain in score, (and possibly much loss) the current sequence
is reset to the best so far. This gives the annealing process a chance to
head in a different direction from the local maximum than the last time
When time runs out, the highest-scoring sequence found (which corresponds to
our local maximum) is our solution.
On the two word-lists from our competition and the one from the previous
year's competition, the greedy solution was improved by 5-10%. Not that
much, but the greedy solution was already pretty good to start off with