ProcessDocuments
The ProcessDocuments utility provides a general
facility for processing a set of documents and then writing out these
documents with some of the resulting annotations encoded in XML.
It is invoked as
xjet Jet.ProcessDocuments propsFile docList inputDir inputSuffix outputDir outputSuffix
where
- propsFile
- is the Jet properties file specifying how each document
is to be processed
- docList
- is a file containing a list of document names (without directory or file extension), one per line
- inputDir
- is the directory containing the files to be processed
- inputSuffix
- is the file extension to be added to each document name to obtain the name of the input file
- outputDir
- is the directory into which the output files are written
- outputSuffix
- is the file extension to be added to each document name to obtain the name of the output file
For example, if doclist contains
a
b
c
then
xjet Jet.ProcessDocuments props doclist in txt out nam
will
read in/a.txt and write out/a.nam
read in/b.txt and write out/b.nam
read in/c.txt and write out/c.nam
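The naming rule illustrated by this example can be sketched in Python. This is a minimal illustration, not Jet's own code: the helper `document_paths` is hypothetical, and it assumes that surrounding whitespace on each docList line is ignored and that blank lines are skipped, neither of which is stated above. Each document name is joined to the input directory with the input suffix (separated by a dot, as in in/a.txt) and to the output directory with the output suffix.

```python
from pathlib import Path


def document_paths(doc_names, input_dir, input_suffix, output_dir, output_suffix):
    """Hypothetical helper (not part of Jet): return (input_path, output_path)
    pairs following the rule inputDir/name.inputSuffix -> outputDir/name.outputSuffix."""
    pairs = []
    for line in doc_names:
        name = line.strip()  # assumption: surrounding whitespace is ignored
        if not name:         # assumption: blank lines are skipped
            continue
        pairs.append((
            Path(input_dir) / f"{name}.{input_suffix}",
            Path(output_dir) / f"{name}.{output_suffix}",
        ))
    return pairs


# Reproduce the example: doclist contains a, b, c; arguments are in txt out nam.
for src, dst in document_paths(["a", "b", "c"], "in", "txt", "out", "nam"):
    print(f"read {src.as_posix()} and write {dst.as_posix()}")
```

Using `as_posix()` keeps the printed paths in the same slash-separated form as the example regardless of platform.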
The set of annotation types that are written out as XML tags in the
output file is determined by the property WriteSGML.type
in the props file. The value of this property may be
a single annotation type, a comma-separated list of
annotation types, or "all".
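To make the three forms of the property value concrete, here is a minimal Python sketch of how such a value could be interpreted. The functions `types_to_write` and `should_write` are hypothetical and not part of Jet; the sketch also assumes that whitespace around the commas is ignored and that "all" must be written exactly as shown. The annotation type names used (ENAMEX, TIMEX2, constit) are illustrative only. In the props file, a corresponding entry might read `WriteSGML.type = ENAMEX,TIMEX2`.

```python
def types_to_write(prop_value):
    """Hypothetical: interpret a WriteSGML.type value.
    Returns None for "all" (every annotation type is written),
    otherwise the set of listed annotation types."""
    value = prop_value.strip()
    if value == "all":  # assumption: exact, case-sensitive match
        return None
    # A single type is just a one-element list; assumption: spaces around commas are ignored.
    return {t.strip() for t in value.split(",") if t.strip()}


def should_write(annotation_type, prop_value):
    """Hypothetical: decide whether annotations of this type become XML tags."""
    selected = types_to_write(prop_value)
    return selected is None or annotation_type in selected


print(should_write("ENAMEX", "ENAMEX"))          # single type
print(should_write("constit", "ENAMEX,TIMEX2"))  # comma-separated list
print(should_write("constit", "all"))            # every type
```

Returning None for "all" keeps the "write everything" case distinct from an explicit list, so no set of known types has to be enumerated in advance.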