# Programming Assignment 2

Assigned: Feb. 8
Due: Mar. 8

### Overview

The following problem is known as EXACT SET COVER: You are given a universal set U of elements and a collection W of subsets of U. The object is to find a subcollection Y of W such that every element of U is in exactly one set in Y.

For instance, suppose that U is the set {a,b,c,d,e,f,g,h,i} and that W is the following collection:

```W1 = {a,d,f,h}
W2 = {b,e,g}
W3 = {e,g}
W4=  {d,e,i}
W5 = {b,d,f,h}
W6 = {a,c,i}
W7 = {c,d,e}
```
Then one solution would be Y={W3,W5,W6}

This assignment has two parts. In Part I, you are to implement the Davis-Putnam algorithm and use it to solve EXACT-SET-COVER. Part II is experimentation.

### PART I: IMPLEMENTATION

In part 1, you will write three programs.

• 1. An implementation of the Davis-Putnam procedure, which takes as input a set of clauses and outputs either a satisfying valuation, or a statement that the clauses cannot be satisfied.
• 2. A front end, which takes as input an instance of the EXACT SET COVER problem and outputs a set of clauses that can be input to (1).
• 3. A back end, which takes as input the output of (1) and translates it into a solution to the original problem.

### Compiling EXACT SET COVER to propositional logic

The EXACT SET COVER problem can be expressed propositionally as follows: Each atom is a set in X. The atom corresponding to set Wi is TRUE if Wi is in Y and false otherwise. There are two kinds of clauses: Let E be an element of U, and let W1, W2 ... Wk be the sets in W that contain E. Then
• 1. Assert W1 V W2 V ... V Wk. At least one of these must be in Y in order to cover E.
• 2. For each pair Wi, Wj both containing E, assert ~Wi V ~Wj. It is not possible that Wi and Wj are both in Y, since they both contain E.
Thus, in the above example, since the element "d" appears in sets W1, W4, W5, and W7, the corresponding clauses would be:
```W1 V W4 V W5 V W7.
~W1 V ~W4.
~W1 V ~W5.
~W1 V ~W7.
~W4 V ~W5.
~W4 V ~W7.
~W5 V ~W7.
```
and similarly for all the other elements in U.

#### Input / Output.

All three programs take their input from a text file produce their output to a text file. (If you want, you may use standard input and output.)

#### Davis-Putnam

The input to the Davis-Putnam procedure has the following form: An atom is denoted by a natural number: 1,2,3 ... The literal P is the same number as atom P; the literal ~P is the negative. A clause is a line of text containing the integers of the corresponding literals. After all the clauses have been given, the next line is the single value 0; anything further in the file is ignored in the execution of the procedure and reproduced at the end of the output file. (This is the mechanism we will use to allow the front end to communicate to the back end.)

The output from the Davis-Putnam procedure has the following form: First, a list of pairs of atom (a natural number) and truth value (either T or F). Second, a line containing the single value 0. Third, the back matter from the input file, reproduced.

Example: Given the input

```1 2 3
-2 3
-3
0
This is a simple example with 3 clauses and 3 atoms.
```
Davis-Putnam will generate the output
```1 T
2 F
3 F
0
This is a simple example with 3 clauses and 3 atoms.
```
This corresponds to the clauses
```P V Q V R.
~Q V R.
~R.
```

If the clauses have no solution, then Davis-Putnam outputs a single line containing a 0, followed by the back-matter in the input file.

Note: Your implementation of Davis-Putnam must work on _any_ set of clauses, not just those that are generated by the EXACT SET COVER program.

An empty line represents the null clause. Davis-Putnam should immediately reject any set of clauses containing the null clause.

You may assume that there are no more than 1000 atoms and no more than 10,000 clauses.

#### Front end

The front end takes as input an instance of the EXACT SET COVER problem and generates as output a set of clauses to be satisfied.

The format of the input contains the following elements:

• A list of the elements of U, in a single line. An element is a character string of up to 2 characters. Elements are separated by white space.
• A list of the sets in W. There is one set per line. Each line contains the line number followed by the elements separated by white space.
Thus, the above problem would be encoded in the following input:
```a b c d e f g h i
1 a d f h
2 b e g
3 e g
4 d e i
5 b d f h
6 a c i
7 c d e
```
You may assume that the line numbers are consecutive and that the input is correctly formatted. You do not have to do any error checking on the input. (The point of including the line number is to make it more readable for humans.) The output consists of
• 1. A set of clauses suitable for inputting to Davis-Putnam as described above. Note that these constraints are already in clausal form (CNF) and therefore you do not have to implement a program to translate arbitrary Boolean formulas to CNF.
• 2. A key for translating the numbers used for propositional atoms in the clauses into sets of elements. This is to allow the front end to communicate to the back end. The format of this is up to you. My suggestion would be a sequence of lines of the form "number, set of elements."

#### Back-end

The back end takes as input the output that Davis-Putnam generates when run on the output of the front end. It generates as output the list of the sets in Y. For instance, one possible output for the above problem would be
```3 e g
5 b d f h
6 a c i
```
If the input indicates that the clauses have no solution, the back end should output the message "NO SOLUTION".

### Another example for Davis-Putnam

The following is the input-output pair just for the Davis-Putnam module --- not the front and back ends --- corresponding to the example in the class notes.

### PART II: EXPERIMENTATION

First, write a program RandomEXS(N,K,S) that takes as input three numbers N, K, and S, and generates an instance of EXACT SET COVER where the set U has N elements; the collection W has K elements; and each set in W has S distinct elements chosen at random from U.

Second, try to map out the phase-transition behavior as a function of K where N=20 and S=4. That is: when K < 5 there are no solutions, so these are not worth bothering with. When K = 5 and not much larger than 5, few random instances will have solutions, and presumably your program will run quickly. If K is sufficiently large (I don't know how large is "sufficiently") then almost all instances will have solutions, and again presumably the program will generally run quickly. At some value of K in between, about half the instances will have solution; at that value and values close to it, the problem will probably run much more slowly.

So begin by trying small values of K and large values of K and then squeeze toward the middle to hone in on the transition value of K as close as you can. At each value of K that you test, you should run at least 10 instances. If this becomes impossibly time consuming for some value of K, then you can report that as a transitional value, and stop search at that point. For each value of K that you test, you should report: (a) the percentage of instances that have a solution; (b) the average and median time required for Davis-Putnam to execute.

### Deliverable

You should email to the TA (a) the source code; (b) instructions for running it; (c) the report on your experimentation. You should also email your report to me, but do not mail me the code or the running instructions. You do not have to email the code for RandomEXS or for running the experiment.