## Algorithm for Wordsnakes Problem

Original Problem Description

My algorithm runs in 2 stages:

__Deterministic Tempered Greed__

The obvious greedy strategy in this game is to start linking words which
have the highest overlap of characters (eg. *abcd* & *cdef* can
be linked to form *abCDef*), or to go for full inclusion (eg. placing
*bc* after *abcd* results in *aBCd*).

But, as is often the case, greed isn't optimal. Consider the words
*baab*, *ab* & *aab*. Lets call them *X*, *Y*
& *Z*. *Y* & *Z* are each substrings of *X*, and can
hence be 'fully included' in *X* as long as they occur after *X*
in our solution. If we choose the order *XYZ* as our solution, the
resulting snake is just *X* and we would get 4 + 9 = 13. Not bad.
But...if we change the order in our solution to be *YZX*, the snake
starts off as *Y*=*ab*, becomes *YZ*=*abaab* with no
accumulated points, but then *X* is fully included in the snake,
giving us a score of 16!!

Discovering these 'better' options requires exploring the combination
space as deeply as possible. With 100 words and 2 minutes of time, I
got as far as trying every possible ordering of 4 words (which is an O(n^4)
search space) to find the best 'quad's. Only combos that have a score
above a mimimum threshold are considered. There can be conflicts. If
*PQRS* and *QRUV* are our top 2 quads, *QR* can't be used
twice, so we can only pick one of these (the one with the higher score of
course). Hopefully we can find a few conflict-free quads, treat each of
them as a single word by adding these new 'multi-words' to the word-list
after removing their constituents. Then look for good trios. Then
look for good pairs. Line up the multi-words (quads, trios & pairs)
selected so far, and with the remaining words that haven't been linked to any other yet,
find the best place in this sequence of multi-words to insert them.

The conflicts can sometimes be resolved safely. eg. just because *RS* is
used in *PQRS* doesn't mean the quad *RSTU* has to be thrown
out for conflict reasons. We might be able to merge the two quads to
form *PQRSTU*. But that might not be possible...eg. if *R* were
fully inside *PQ* and not at the end, that would make the suffixes
of *PQRS* the same as the suffixes of *QS* rather than those
of *RS*, killing the match with the prefixes of *TU*.

And so the greed is tempered, making not the immediately obvious greedy
choice, but delving a little into combinatorial explosion before getting
greedy.

__Non-Deterministic Simulated Annealing__

Starting with the solution from the previous stage, simulated annealing
is run until time is up.

*Neighbors* to the current sequence (of 100 words) are found by
randomly splitting it into four parts *ABCD* (of which *A* &
*D* can be empty) and swapping *B* & *C* to form *ACBD*.
Temperature is rigged to decrease such that it will hit zero right when
time is up, decreasing the likelihood of taking downwards steps as time
runs out. The best sequence so far (which until we get an improvement is
our local maximum) is kept track of, and after some number of steps (eg. 100)
with no gain in score, (and possibly much loss) the current sequence
is reset to the best so far. This gives the annealing process a chance to
head in a different direction from the local maximum than the last time
around.

When time runs out, the highest-scoring sequence found (which corresponds to
our local maximum) is our solution.

On the two word-lists from our competition and the one from the previous
year's competition, the greedy solution was improved by 5-10%. Not that
much, but the greedy solution was already pretty good to start off with
anyway!