[nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX
Daniel Convissor
danielc at analysisandsolutions.com
Thu Mar 4 21:12:19 EST 2004
On Thu, Mar 04, 2004 at 08:42:55PM -0500, DeWitt, Michael wrote:
> I am using xpdf to get text out of pdfs.
... IF you've got X windows going.
Let's see what Panix has...
d> apropos pdf | grep text
latex, elatex, lambda, pdflatex (1) - structured text formatting and typesetting
pdftotext (1) - Portable Document Format (PDF) to text converter (version 2.02)
That second one looks like it'll fit the bill.
d> man pdftotext
... snip ...
Pdftotext reads the PDF file, PDF-file, and writes a text
file, text-file. If text-file is not specified, pdftotext
converts file.pdf to file.txt. If text-file is '-', the
text is sent to stdout.
... snip ...
d> pdftotext afile.pdf - | grep stringicareabout
Works like a charm.
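If you need to check a pile of PDFs rather than just one, the same pipeline can be wrapped in a loop. What follows is only a rough sketch, assuming a POSIX shell and that pdftotext is on your PATH. The function name search_pdfs is made up for illustration and is not part of xpdf.

```shell
# search_pdfs PATTERN FILE...
# Hypothetical helper: print the name of each PDF whose extracted
# text contains PATTERN (a basic regular expression, as grep sees it).
search_pdfs() {
    pattern=$1
    shift
    for f in "$@"; do
        # Per the man page, "-" as the text-file sends the text to stdout.
        # grep -q prints nothing and only sets the exit status.
        if pdftotext "$f" - 2>/dev/null | grep -q -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like "search_pdfs stringicareabout *.pdf" would then list only the files that match. Files that pdftotext can't read are silently skipped, since its errors are sent to /dev/null.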
Enjoy,
--Dan
--
T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
data intensive web and database programming
http://www.AnalysisAndSolutions.com/
4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409