ACE semantic structures

The ACE (Automatic Content Extraction) program of the early and mid-2000s defined a set of semantic structures, including entities, relations, and events. The entities were introduced first, then the relations, and finally the events; the events implemented by Jet are from the 2005 ACE evaluation. The set of entity types and relation types changed somewhat from year to year; Jet ACE can handle several of the variants.

ACE also defined an external representation of these semantic structures, APF (ACE Program Format). For each source document, the ACE system generates an APF document; the original source document is not modified. The APF document contains character offsets with respect to the source document to indicate the provenance of the information extracted. [Note: in counting character offsets in the source document, XML tags are not counted and end-of-lines are counted as a single character, regardless of whether they are actually represented as 1 or 2 characters.]
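The offset-counting rule in the note above can be sketched as follows. This is a minimal illustration, not Jet's own code: the class AceOffsets and the method aceOffset are hypothetical names, and the sketch assumes that every '<' in the source opens an XML tag and that the only two-character end-of-line is CR LF. How character entities such as &amp; are counted is not stated here, so the sketch counts them character by character.

```java
// Hypothetical helper (not part of Jet): converts a raw index into a source
// file to an ACE-style character offset.
final class AceOffsets {
    private AceOffsets() {}

    /**
     * Returns the number of ACE-counted characters in raw[0, rawIndex).
     * Characters inside XML tags are skipped, and a CR LF pair is counted
     * as a single end-of-line character.
     * Assumption: every '<' opens a tag that is closed by the next '>'.
     */
    static int aceOffset(String raw, int rawIndex) {
        int count = 0;
        boolean inTag = false;
        int limit = Math.min(rawIndex, raw.length());
        for (int i = 0; i < limit; i++) {
            char c = raw.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;   // tag closed; nothing inside it was counted
                }
            } else if (c == '<') {
                inTag = true;        // tag opened; skip until the matching '>'
            } else if (c == '\r' && i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                // CR of a CR LF pair: skip it, the LF that follows is counted
            } else {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions, a document saved with Unix line endings and the same document saved with Windows line endings yield identical offsets for the same token, which is the point of the rule.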

The internal representation of an APF file is an AceDocument object, which contains zero or more AceEntity, AceValue, AceTimex, AceRelation, and AceEvent objects. (See the Jet javadoc API documentation for details.)

The ACE representations are presented briefly at the NIST ACE web site. Details on all the entity, relation, and event types are available through the ACE web site of the Linguistic Data Consortium.

Entities

Entities are the most basic building blocks of the semantic representation. There are 7 types of entities: persons, organizations, GPEs (geo-political entities: locations with a government), [other] locations, facilities, vehicles, and weapons. Each entity (class AceEntity) has one or more mentions (class AceEntityMention) within the document. Each mention is either a name, a nominal, or a pronominal mention.

Relations

An ACE relation (class AceRelation) specifies a relationship between two ACE entities, such as an employment relation (employee -- employer), a location relation, or a citizenship relation. The specific set of types and subtypes varied somewhat from year to year. Each relation may be expressed one or more times in a document; each expression is an instance of a relation mention (class AceRelationMention). A relation connects two entities; a relation mention connects two entity mentions. In ACE, the two entity mentions connected by a relation mention must appear in the same sentence, which simplifies both the annotation task and the training procedure for relation extractors. Later versions of ACE included temporal information on relations, but Jet does not support this.
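The distinction between a relation and its mentions, together with the same-sentence constraint, can be sketched as a small data model. The types below (MentionSketch, RelationMentionSketch, RelationSketch) are hypothetical stand-ins written for this illustration; they are not Jet's AceRelation or AceRelationMention classes, whose actual fields and constructors are described in the Jet javadoc. The sentenceIndex field and the type label "EMPLOYMENT" are likewise assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one mention of an entity; NOT Jet's AceEntityMention.
record MentionSketch(String entityId, String text, int sentenceIndex) {}

// Hypothetical stand-in for one expression of a relation in the text.
record RelationMentionSketch(MentionSketch arg1, MentionSketch arg2) {
    RelationMentionSketch {
        // The two entity mentions of a relation mention must share a sentence.
        if (arg1.sentenceIndex() != arg2.sentenceIndex()) {
            throw new IllegalArgumentException(
                "both arguments of a relation mention must be in the same sentence");
        }
    }
}

// Hypothetical stand-in for a relation: it links two entities and collects
// every relation mention that expresses it in the document.
final class RelationSketch {
    private final String type;
    private final String arg1EntityId;
    private final String arg2EntityId;
    private final List<RelationMentionSketch> mentions = new ArrayList<>();

    RelationSketch(String type, String arg1EntityId, String arg2EntityId) {
        this.type = type;
        this.arg1EntityId = arg1EntityId;
        this.arg2EntityId = arg2EntityId;
    }

    // Adds one expression of this relation; its arguments must be mentions
    // of the two entities that the relation connects.
    void addMention(RelationMentionSketch mention) {
        if (!mention.arg1().entityId().equals(arg1EntityId)
                || !mention.arg2().entityId().equals(arg2EntityId)) {
            throw new IllegalArgumentException(
                "mention arguments must be mentions of the relation's two entities");
        }
        mentions.add(mention);
    }

    String type() { return type; }

    List<RelationMentionSketch> mentions() { return List.copyOf(mentions); }
}
```

In this sketch, one employment relation between a person and an organization could hold two relation mentions, for example "Smith" with "Acme" in one sentence and "he" with "the company" in a later one, while a pairing of mentions from different sentences is rejected.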

Events

Whereas an ACE relation expresses a purely binary relationship, an ACE event (class AceEvent) allows for a trigger and an arbitrary number of arguments, each with a role. The trigger is generally a single word, the word that most directly describes the event (typically a verb or nominalization). In analogy with entities and relations, there may be one or more mentions of the same event in a document; each is an event mention (class AceEventMention). All the arguments of a single event mention must appear within the same sentence. These arguments may be entity mentions, time expression mentions, or value mentions.

A time expression is an expression representing a date or time, following the TIMEX2 standard. Following the pattern for entities, relations, and events, we have AceTimex and AceTimexMention classes, although each AceTimex has only a single AceTimexMention. A value is an event argument other than an entity or time expression; currently crimes, prison sentences, contact information, job titles, and numeric expressions such as percentages and monetary amounts are included. Again, we have the two classes AceValue and AceValueMention, although no reference resolution is done on ACE values, so each AceValue has a single AceValueMention.
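The shape of an event mention (one trigger plus any number of role-labelled arguments, each an entity mention, a time expression mention, or a value mention) can be sketched as follows. The types ArgumentKindSketch, EventArgumentSketch, and EventMentionSketch are hypothetical and written only for this illustration; they do not reproduce Jet's AceEventMention API. The role names used in the usage note are illustrative and are not taken from the ACE role inventory.

```java
import java.util.List;

// The three kinds of event argument described above (hypothetical enum).
enum ArgumentKindSketch { ENTITY_MENTION, TIMEX_MENTION, VALUE_MENTION }

// Hypothetical stand-in for one role-labelled argument of an event mention.
record EventArgumentSketch(String role, ArgumentKindSketch kind,
                           String text, int sentenceIndex) {}

// Hypothetical stand-in for an event mention: a trigger word plus arguments.
record EventMentionSketch(String trigger, List<EventArgumentSketch> arguments) {
    EventMentionSketch {
        arguments = List.copyOf(arguments);   // defensive, unmodifiable copy
        // All arguments of one event mention must lie in the same sentence.
        long distinctSentences = arguments.stream()
                .map(EventArgumentSketch::sentenceIndex)
                .distinct()
                .count();
        if (distinctSentences > 1) {
            throw new IllegalArgumentException(
                "all arguments of an event mention must be in the same sentence");
        }
    }

    // Returns the arguments that fill the given role (possibly none).
    List<EventArgumentSketch> withRole(String role) {
        return arguments.stream()
                .filter(a -> a.role().equals(role))
                .toList();
    }
}
```

For a sentence such as "Smith was sentenced to five years on Tuesday", the trigger would be "sentenced", with "Smith" as an entity mention argument, "five years" as a value mention argument (a prison sentence), and "Tuesday" as a time expression argument. Unlike a relation mention, the number of arguments is not fixed, so an event mention with no arguments is also accepted by this sketch.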