Abstract:
We consider the case where a pool of DNA molecules clones both,
flipped and not-flipped, have been cut by restriction enzymes.
Ideally, each clone is cut in the same positions, although in practice
due to errors, this does not always happen. The computational
problem is to determine where the cuts have occurred.
This is a key problem in determining the structure of the original
DNA molecule.
A single molecule is represented by a string of 1's and 0's, with cuts
represented by $1's$. A set of molecules clones (with errors) is
observed, but the orientation/parity of each molecule is unknown.
Clear is that the location of the observed cuts of one molecule are
dependent on the parity: flipping the molecule would result in the
cuts location, as observed, being ``flipped'' .
We propose a Bayesian approach to generate a posterior distribution on
the cuts and parity, given the data. We first present an approximate
algorithm where we attempt to divide the problem into subproblems, but
it is not guaranteed to solve the problem. Then, we propose another
approximate method based on a statistical framework and a mean field
annealing algorithm. It computes the maximum posterior marginal (MPM
estimator) and maximum aposteriori estimate (MAP estimator).
We also provide evidence that the exact solution of the problem is
intractable.