PILOT1 EDR scores -- Dec. 15, 2004
ref    test    Value   UnwF    Value-weighted F

Results for 25 documents
ldc1   ldc2    83.5    85.7    91.4
ldc2   ldc1    86.3    85.5    92.9*
ldc1   nyu     80.4    83.3    89.8
ldc2   nyu     80.5    83.2    89.9
ldc1   ibm     65.3    75.4    81.9
ldc2   ibm     67.5    76.1    83.2
ldc1   bbn     78.3    81.3    88.7
ldc2   bbn     80.1    81.5    89.7

Results for 20 documents
ldc1   sra     74.9    80.4    86.9
ldc2   sra     76.6    81.9    87.9

Results for non-weblog documents (21 doc)
ldc1   ldc2    85.0    87.2    92.2
              (87.0    92.9    93.2  for mentions)

* Because the treatment of generics in the value formula is
asymmetric, value-based scores are not symmetric.
Record (?) low value:
LDC1-LDC2 SUBSTANCES value = -63.8%
Why? Most substances are generic, and so don't contribute to value;
as a result, 100% value is a small number (about 3, I believe).
A single, crucial conjoined NP in VOA1224 was tagged differently
by the two annotators ... one annotator tagged it as a single entity,
the other as four separate entities. Net result is a value of
-1.50 out of 0.50, or -300% from that sentence.
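
A quick check of that arithmetic, as a minimal Python sketch. The helper names and the use of 3.0 as the 100% value are illustrative assumptions (the 3 is only the rough figure quoted above); this is not the scorer's actual code.

```python
def value_percent(achieved, max_possible):
    """Express an achieved value as a percentage of the maximum attainable value."""
    return 100.0 * achieved / max_possible


def percent_to_value(percent, max_possible):
    """Convert a percentage score back to an absolute value."""
    return percent / 100.0 * max_possible


# The one conjoined-NP sentence: -1.50 achieved out of a possible 0.50.
sentence_pct = value_percent(-1.50, 0.50)
print(f"{sentence_pct:.0f}%")  # -300%

# Assumption: 100% value for SUBSTANCES is roughly 3, so -63.8% is
# an absolute value of roughly -1.9 over the whole document set.
implied_total = percent_to_value(-63.8, 3.0)
print(f"{implied_total:.2f}")
```

With a normalizer this small, that one sentence is enough to pull the whole SUBSTANCES score well below zero.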
There were a number of 'typos' in the subtype field. I fixed those
by hand ... not sure how they could have arisen.
Please let me know what further scores you want.
Ralph