APFtoXML

The APFtoXML utility extracts information from an ACE APF file and produces a file in which the selected information is marked by in-line XML tags, such as <ENAMEX TYPE=type> for names. It is invoked by

xjet AceJet.APFtoXML year apf-directory output-directory filelist apf-extension output-extension [gazetteer pre-dictionary] flag flag ...

where
year
is one of 2003, 2004, or 2005, indicating which year's APF format the input files use
apf-directory
is the directory that contains both the text and APF files
output-directory
is the directory to which the files with in-line XML tags are written
filelist
is a file listing the documents to be processed, one per line. Text and APF file names are relative to apf-directory; output file names are relative to output-directory. If a line in this file is F, the text file is read from F.sgm, the APF file is read from F.apf-extension, and the output is written to F.output-extension.
apf-extension
is the file extension of the APF files (appended to the document name)
output-extension
is the file extension of the output files (appended to the document name)
For 2004, pre-nominals were tagged PRE whether or not they were names, so additional information is required to identify names. It is provided by two additional files:
gazetteer
is a Jet gazetteer listing country and state names
pre-dictionary
is a list of words, indicating for each word whether or not it is a name

flag
one or more of sentences, timex, mentions, types, and names, each selecting a type of information to be included in the output files:
sentences:  output <sentence> tags
timex:  output <timex2> tags
mentions:  output <mention entity=n> tags indicating co-reference relations
types:  include ACE type and subtype features with mention tags
names:  include ENAMEX tags
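The way the positional arguments fit together can be illustrated with a short Python sketch. This is not Jet's implementation: the names ApfToXmlArgs, parse_args, and document_paths are hypothetical, and the sketch assumes that the gazetteer and pre-dictionary arguments appear exactly when year is 2004 and that at least one flag must be given. It also shows how a filelist entry F is mapped to its text, APF, and output files.

```python
# Hypothetical sketch of APFtoXML's argument layout; NOT Jet's actual code.
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional, Tuple

YEARS = {"2003", "2004", "2005"}
FLAGS = {"sentences", "timex", "mentions", "types", "names"}


@dataclass
class ApfToXmlArgs:
    year: int
    apf_dir: str
    out_dir: str
    filelist: str
    apf_ext: str
    out_ext: str
    gazetteer: Optional[str] = None       # assumed: supplied only for 2004
    pre_dictionary: Optional[str] = None  # assumed: supplied only for 2004
    flags: List[str] = field(default_factory=list)


def parse_args(argv: List[str]) -> ApfToXmlArgs:
    """Split the arguments that follow 'AceJet.APFtoXML' into named fields."""
    if len(argv) < 6:
        raise ValueError("expected year, two directories, filelist and two extensions")
    year, apf_dir, out_dir, filelist, apf_ext, out_ext = argv[:6]
    if year not in YEARS:
        raise ValueError(f"unsupported APF year: {year}")
    rest = argv[6:]
    gazetteer = pre_dictionary = None
    # Assumption: the two extra files are given exactly when year is 2004.
    if year == "2004":
        if len(rest) < 2:
            raise ValueError("2004 requires a gazetteer and a pre-dictionary")
        gazetteer, pre_dictionary, rest = rest[0], rest[1], rest[2:]
    if not rest:
        raise ValueError("at least one flag is required")
    unknown = [f for f in rest if f not in FLAGS]
    if unknown:
        raise ValueError(f"unknown flag(s): {', '.join(unknown)}")
    return ApfToXmlArgs(int(year), apf_dir, out_dir, filelist, apf_ext, out_ext,
                        gazetteer, pre_dictionary, rest)


def document_paths(args: ApfToXmlArgs, doc: str) -> Tuple[Path, Path, Path]:
    """Map one filelist line F to (F.sgm, F.apf-extension, F.output-extension)."""
    text = Path(args.apf_dir) / f"{doc}.sgm"
    apf = Path(args.apf_dir) / f"{doc}.{args.apf_ext}"
    out = Path(args.out_dir) / f"{doc}.{args.out_ext}"
    return text, apf, out
```

For example, with the arguments 2005 apf out files.txt apf.xml xml names, the filelist entry doc1 maps to apf/doc1.sgm, apf/doc1.apf.xml, and out/doc1.xml. Validating the year before consuming the optional files keeps the 2004 special case in one place.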