PILOT1 EDR scores -- Dec. 15, 2004

ref     test    Value   UnwF    Value-weighted F        (UnwF = unweighted F)

Results for 25 documents

ldc1    ldc2    83.5    85.7    91.4
ldc2    ldc1    86.3    85.5    92.9*

ldc1    nyu     80.4    83.3    89.8
ldc2    nyu     80.5    83.2    89.9
ldc1    ibm     65.3    75.4    81.9
ldc2    ibm     67.5    76.1    83.2
ldc1    bbn     78.3    81.3    88.7
ldc2    bbn     80.1    81.5    89.7

Results for 20 documents

ldc1    sra     74.9    80.4    86.9
ldc2    sra     76.6    81.9    87.9

Results for non-weblog documents (21 documents)

ldc1    ldc2    85.0    87.2    92.2
        (       87.0    92.9    93.2    for mentions)

* Because the value formula treats generics asymmetrically, value-based scores are not symmetric: they change when ref and test are swapped.

Record (?) low value:
  LDC1-LDC2 SUBSTANCES value = -63.8%
Why?  Most substances are generic, and so don't contribute to value; as a result, 100% value for SUBSTANCES is a small number (about 3, I believe). A single, crucial conjoined NP in VOA1224 was tagged differently by the two annotators: one tagged it as a single entity, the other as four separate entities.  The net result is a value of -1.50 out of a possible 0.50, or -300%, from that one sentence.
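
The -300% figure above is just the achieved value divided by the maximum possible value. A minimal sketch of that arithmetic follows; the function name percent_value is made up for illustration and is not part of the scorer, and the sketch does not reproduce the value formula itself.

```python
def percent_value(achieved: float, maximum: float) -> float:
    """Express an achieved value as a percentage of the maximum possible value.

    Hypothetical helper for illustration only; the actual scorer computes
    'achieved' and 'maximum' from its own value formula, not shown here.
    """
    if maximum <= 0:
        raise ValueError("maximum value must be positive")
    return 100.0 * achieved / maximum


# Numbers quoted in the message for the one VOA1224 sentence:
# a value of -1.50 against a possible 0.50.
sentence_pct = percent_value(-1.50, 0.50)
print(f"{sentence_pct:.0f}%")  # -300%
```

Because the maximum for SUBSTANCES is so small, a single sentence with a large negative contribution can pull the overall percentage well below zero.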

There were a number of 'typos' in the subtype field.  I fixed those by hand; I'm not sure how they could have arisen.

Please let me know what further scores you want.

Ralph