From davidk@eskimo.com  Wed Oct 20 21:45:11 2004
Received: from mail.ccom.net (mail.CCOM.NET [64.246.178.210])
	by mx.cs.nyu.edu (8.12.11/8.12.11) with SMTP id i9L1j9Sm024528
	for <shasha@cs.nyu.edu>; Wed, 20 Oct 2004 21:45:09 -0400 (EDT)
Received: (qmail 11566 invoked from network); 20 Oct 2004 18:45:09 -0700
Received: from 216-145-15-170.ipg.ccom.net (HELO eskimo.com) (216.145.15.170)
  by mail.ccom.net with SMTP; 21 Oct 2004 01:45:09 -0000
Message-ID: <417714A2.3080008@eskimo.com>
Date: Wed, 20 Oct 2004 18:45:06 -0700
From: David Kerlick <davidk@eskimo.com>
Reply-To: davidk@eskimo.com
Organization: Visualization Sciences Consulting
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20031007
X-Accept-Language: en-us, en, de-de, uk
MIME-Version: 1.0
To: sburoff@optonline.net, "'Dennis Shasha'" <shasha@cs.nyu.edu>
CC: joralemonshelly@onebox.com
Subject: Re: PDF to text
References: <0I5I00CJO02I3Y@mta5.srv.hcvlny.cv.net>
In-Reply-To: <0I5I00CJO02I3Y@mta5.srv.hcvlny.cv.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Status: R
Content-Length: 1002

Hello Steve,

Can anyone try the PDF conversion on the King County, WA file?


http://www.metrokc.gov/elections/2004nov/pollingplaces4GENPPC.pdf


OK guys, I found a free package that does PDF to text. It's called PDFBox.
I've attached a script to run it. Here is what you need to do.

    1. Download PDFBox-0.6.7a.zip from
       http://www.pdfbox.org/ or
       http://aleron.dl.sourceforge.net/sourceforge/pdfbox/PDFBox-0.6.7a.zip
       and install it.
    2. Set the environment variable PDFBoxHome to the full path of the
       PDFBox home directory (the one containing the lib and external
       directories, among other things).
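
For step 2, here is a minimal sketch for a Unix-like shell. The install path is only a guess at where the zip might have been unpacked, and check_pdfbox_home is an illustrative helper, not part of PDFBox. It just confirms that the lib and external directories mentioned above are present.

```shell
# Assumed install location; change it to wherever the zip was unpacked.
PDFBoxHome="${PDFBoxHome:-$HOME/PDFBox-0.6.7a}"
export PDFBoxHome

# Hypothetical helper: verify that a directory looks like the PDFBox home,
# i.e. that it contains both the lib and external subdirectories.
check_pdfbox_home() {
    for sub in lib external; do
        if [ ! -d "$1/$sub" ]; then
            echo "missing directory: $1/$sub" >&2
            return 1
        fi
    done
    echo "PDFBoxHome looks OK: $1"
}
```

Running check_pdfbox_home "$PDFBoxHome" before the first conversion should catch a wrong path early. On Windows the variable would be set with "set PDFBoxHome=..." at the command prompt instead of export.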

You can run it in two different ways. To get the output to the console, do:

     bsh pdfToText.bsh -console <PDF file>

To write the output to a text file, do:

     bsh pdfToText.bsh <PDF file> <text file>
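
If there are several county PDFs to convert, the second form can be wrapped in a loop. This is only a sketch: txt_name and convert_all are made-up helper names, and it assumes that bsh is on the PATH and that pdfToText.bsh sits in the current directory, as in the commands above.

```shell
# Derive the text-file name for a PDF: pollplaces.pdf -> pollplaces.txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every .pdf file in a directory (default: the current directory),
# writing each result next to its source file.
convert_all() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip if no PDF files matched
        bsh pdfToText.bsh "$pdf" "$(txt_name "$pdf")"
    done
}
```

For example, convert_all downloads would run the script once per PDF in a directory named downloads.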

I tested it with the file at 
http://www.co.fort-bend.tx.us/county_services/elections/pollplaces.pdf
and it seems to work OK. Let me know how it goes. Thanks.

Steve

