Microarray Verifier Algorithm
Here is the code: Ver.java
Basic idea: Combine the strands in the microarray file to form a string that is shorter than the
dna string. Then, try to add bases to form another string that satisfies the length, the number
of each base and microarray matches. Since finding the shortest string that contains a set of shorter
strings (or a superstring) is an np problem, this algorithm is also np. However, sometimes, the
shortest string might not provide the correct solution. So, my strategy is just to go through a couple
thousand permutations of the strands in the microarray file with some preferences, namely shortest first.
The program is divided into three parts.
The first part: reconstructMinSuper
Try to find a superstring recursively starting with the strand that matches the front of the dna sequence.
The second part: findInteriorMinSuper
Try to construct a superstring by combining the ones with the most overlap first. This part might generate
a lot of identical strings, but this is easier then to save all the strings generated so far and compare
them to the newly generated strings. There used to be a test to stop the program here if the shortest
superstring seems to equal the dna sequence, but since there are more than one shortest superstrings
sometimes, the test was removed.
The third part: reconstructB
This is a pure brute force approach that tries to generate all permutations from the strands. Originally,
I used this to test the capability of the other two parts. If the first two parts cannot find another
sequence that satisfies the conditions, this part would most likely fail also. But since I alrealy wrote
this part, I include this in the verifying process anyway.
All the three parts have n! (n=number of strands) run time if you allow them to run freely. Thus, a
maximum recursion count is used to stop each part if that number of recursions has been reached. It would
also stop as soon as the program find a suitable sequence that is not equal to the original one. So, even
with the recursion count set to 20000 on a sequence of length 100, the program should terminate in
about 30 seconds.
BUGS: If you gives it a single strand that is shorter than the
dna sequence, sometimes, you would automatically win. I don't have a clue why. But it is so obvious,
that you can determine that the microarray would fail just by looking at it.
Usage: java Ver (recursion count) (dna file) (microarray file)