APFtoXML
The APFtoXML utility extracts information from an
ACE APF file and produces a file with the selected information marked
by in-line XML tags such as <ENAMEX TYPE=type> for names.  It is
invoked by
xjet AceJet.APFtoXML year apf-directory output-directory filelist apf-extension output-extension [gazetteer pre-dictionary] flag flag ...
where
  - year 
 
  - is one of 2003, 2004, or 2005, reflecting the different APF formats used
 
  - apf-directory
 
  - is the directory which contains both
    the text and apf files
 
  - output-directory
 
  - is the directory which will contain the files with in-line XML tags
 
  - filelist
 
  - is a file containing a list of the documents to be processed, one per line.
    Text and apf files are relative to apf-directory; output files are relative to
    output-directory. If a line in this file is F, the text file is read from F.sgm,
    the apf file is read from F.apf-extension, and the output file is written to
    F.output-extension.
 
  - apf-extension
 
  - is the file extension for apf files (added to document name)
 
  - output-extension
 
  - is the file extension for output files (added to document name)
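
The naming rule for a filelist entry F can be sketched in Python as follows. This is an illustration only, not the tool's own code: the function names are hypothetical, and it assumes that the extensions are given without a leading dot and that directories and document names are joined as ordinary path components.

```python
from pathlib import Path


def read_filelist(lines):
    """Return the document names from a filelist, one per non-empty line."""
    return [line.strip() for line in lines if line.strip()]


def resolve_paths(doc, apf_dir, out_dir, apf_ext, out_ext):
    """Return (text, apf, output) paths for one filelist entry.

    Assumption: extensions carry no leading dot and are appended as
    "<doc>.<extension>", following the F.apf-extension notation above.
    """
    text_path = Path(apf_dir) / f"{doc}.sgm"        # source text, always .sgm
    apf_path = Path(apf_dir) / f"{doc}.{apf_ext}"   # annotation file
    out_path = Path(out_dir) / f"{doc}.{out_ext}"   # file with in-line tags
    return text_path, apf_path, out_path
```

For example, with the hypothetical entry nw/doc1, apf-directory ace, output-directory out, and extensions apf.xml and tagged, the sketch yields ace/nw/doc1.sgm, ace/nw/doc1.apf.xml, and out/nw/doc1.tagged.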
For 2004, pre-nominals were tagged PRE whether they were names or not,
so additional information is required to identify names. This is
provided by two additional files,
  - gazetteer
 
  - a Jet gazetteer, listing country and state names
 
  - pre-dictionary
 
  - a list of words, indicating for each whether or not it is a name
    
   
  - flag
 
  - one or more of sentences, timex, mentions, types, and names, each indicating
    a type of information to be included in the output files
    sentences: output <sentence> tags
    timex: output <timex2> tags
    mentions: output <mention entity=n> tags indicating co-reference relations
    types: include ACE type and subtype features with mention tags
    names: include ENAMEX tags
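
The argument order described above can be checked before launching the tool. The following Python sketch is a hypothetical helper, not part of the tool. It only assembles and validates the argument list that follows the launcher, starting from the class name AceJet.APFtoXML. It assumes that gazetteer and pre-dictionary are supplied as a pair; whether they are accepted for years other than 2004 is not stated here, so the sketch does not enforce that.

```python
VALID_YEARS = {"2003", "2004", "2005"}
VALID_FLAGS = {"sentences", "timex", "mentions", "types", "names"}


def build_args(year, apf_dir, out_dir, filelist, apf_ext, out_ext,
               flags, gazetteer=None, pre_dictionary=None):
    """Return the argument list in the order given in the usage line."""
    year = str(year)
    if year not in VALID_YEARS:
        raise ValueError(f"year must be one of {sorted(VALID_YEARS)}")
    if not flags:
        raise ValueError("at least one flag is required")
    unknown = [f for f in flags if f not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    # Assumption: the two optional files are given together or not at all.
    if (gazetteer is None) != (pre_dictionary is None):
        raise ValueError("give both gazetteer and pre-dictionary, or neither")

    args = ["AceJet.APFtoXML", year, apf_dir, out_dir,
            filelist, apf_ext, out_ext]
    if gazetteer is not None:
        args += [gazetteer, pre_dictionary]
    args += list(flags)
    return args
```

The resulting list can be appended to whatever launcher is in use; the file and directory names passed in are the caller's own.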