Microarray Verifier Algorithm

Here is the code: Ver.java

Basic idea: Combine the strands in the microarray file to form a string that is shorter than the dna string. Then, try to add bases to form another string that satisfies the length, the number of each base and microarray matches. Since finding the shortest string that contains a set of shorter strings (or a superstring) is an np problem, this algorithm is also np. However, sometimes, the shortest string might not provide the correct solution. So, my strategy is just to go through a couple thousand permutations of the strands in the microarray file with some preferences, namely shortest first.

The program is divided into three parts.

The first part: reconstructMinSuper
Try to find a superstring recursively starting with the strand that matches the front of the dna sequence.

The second part: findInteriorMinSuper
Try to construct a superstring by combining the ones with the most overlap first. This part might generate a lot of identical strings, but this is easier then to save all the strings generated so far and compare them to the newly generated strings. There used to be a test to stop the program here if the shortest superstring seems to equal the dna sequence, but since there are more than one shortest superstrings sometimes, the test was removed.

The third part: reconstructB
This is a pure brute force approach that tries to generate all permutations from the strands. Originally, I used this to test the capability of the other two parts. If the first two parts cannot find another sequence that satisfies the conditions, this part would most likely fail also. But since I alrealy wrote this part, I include this in the verifying process anyway.

All the three parts have n! (n=number of strands) run time if you allow them to run freely. Thus, a maximum recursion count is used to stop each part if that number of recursions has been reached. It would also stop as soon as the program find a suitable sequence that is not equal to the original one. So, even with the recursion count set to 20000 on a sequence of length 100, the program should terminate in about 30 seconds.

BUGS: If you gives it a single strand that is shorter than the dna sequence, sometimes, you would automatically win. I don't have a clue why. But it is so obvious, that you can determine that the microarray would fail just by looking at it.

Usage: java Ver (recursion count) (dna file) (microarray file)

Karven Lam