Algorithm for Wordsnakes Problem

Original Problem Description

My algorithm runs in 2 stages:

Deterministic Tempered Greed

The obvious greedy strategy in this game is to start linking words which have the highest overlap of characters (eg. abcd & cdef can be linked to form abCDef), or to go for full inclusion (eg. placing bc after abcd results in aBCd).

But, as is often the case, greed isn't optimal. Consider the words baab, ab & aab. Lets call them X, Y & Z. Y & Z are each substrings of X, and can hence be 'fully included' in X as long as they occur after X in our solution. If we choose the order XYZ as our solution, the resulting snake is just X and we would get 4 + 9 = 13. Not bad. But...if we change the order in our solution to be YZX, the snake starts off as Y=ab, becomes YZ=abaab with no accumulated points, but then X is fully included in the snake, giving us a score of 16!!

Discovering these 'better' options requires exploring the combination space as deeply as possible. With 100 words and 2 minutes of time, I got as far as trying every possible ordering of 4 words (which is an O(n^4) search space) to find the best 'quad's. Only combos that have a score above a mimimum threshold are considered. There can be conflicts. If PQRS and QRUV are our top 2 quads, QR can't be used twice, so we can only pick one of these (the one with the higher score of course). Hopefully we can find a few conflict-free quads, treat each of them as a single word by adding these new 'multi-words' to the word-list after removing their constituents. Then look for good trios. Then look for good pairs. Line up the multi-words (quads, trios & pairs) selected so far, and with the remaining words that haven't been linked to any other yet, find the best place in this sequence of multi-words to insert them.

The conflicts can sometimes be resolved safely. eg. just because RS is used in PQRS doesn't mean the quad RSTU has to be thrown out for conflict reasons. We might be able to merge the two quads to form PQRSTU. But that might not be if R were fully inside PQ and not at the end, that would make the suffixes of PQRS the same as the suffixes of QS rather than those of RS, killing the match with the prefixes of TU.

And so the greed is tempered, making not the immediately obvious greedy choice, but delving a little into combinatorial explosion before getting greedy.

Non-Deterministic Simulated Annealing

Starting with the solution from the previous stage, simulated annealing is run until time is up.

Neighbors to the current sequence (of 100 words) are found by randomly splitting it into four parts ABCD (of which A & D can be empty) and swapping B & C to form ACBD. Temperature is rigged to decrease such that it will hit zero right when time is up, decreasing the likelihood of taking downwards steps as time runs out. The best sequence so far (which until we get an improvement is our local maximum) is kept track of, and after some number of steps (eg. 100) with no gain in score, (and possibly much loss) the current sequence is reset to the best so far. This gives the annealing process a chance to head in a different direction from the local maximum than the last time around.

When time runs out, the highest-scoring sequence found (which corresponds to our local maximum) is our solution.

On the two word-lists from our competition and the one from the previous year's competition, the greedy solution was improved by 5-10%. Not that much, but the greedy solution was already pretty good to start off with anyway!