NYPHP.org

[nycphp-talk] Stripping formatting from a word document

Anirudh Zala arzala at gmail.com
Wed Jul 18 02:28:07 EDT 2007


On Wednesday 18 Jul 2007 01:07:49 csnyder wrote:
> On 7/17/07, Jon Baer <jonbaer at jonbaer.com> wrote:
> > I think he was asking about a .doc file directly?  I'm surprised that
> > manipulation of Word docs always comes up on the list, yet the resources
> > are pretty limited.
> >
> > One project I found a while ago was antiword, for which the sources are
> > available:
> >
> >
> > http://www.winfield.demon.nl/
>
> There was a reference earlier to catdoc. The url for that project is
> http://www.wagner.pp.ru/~vitus/software/catdoc/
>
> The changelog shows slightly more recent activity than AntiWord, but I
> suppose it all breaks with Office 2007 (or whatever year they're up to
> in Redmond).

catdoc is a very nice command-line utility for extracting text from Word 
documents and PowerPoint slides. It runs on most *nix systems, and it can also 
extract text (in CSV form) from XLS files.
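
For anyone who wants to script this, here is a rough sketch of how the 
extraction could be wrapped. It is written in Python only for illustration and 
is not how we actually run it. It assumes the catdoc package is installed with 
its companion programs catppt (PowerPoint) and xls2csv (Excel) on the PATH; the 
names TOOLS, build_command and extract_text are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed mapping from file extension to the program in the catdoc package
# that handles it; adjust if your installation differs.
TOOLS = {".doc": "catdoc", ".ppt": "catppt", ".xls": "xls2csv"}


def build_command(path):
    """Return the argument list for extracting text from `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in TOOLS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return [TOOLS[suffix], str(path)]


def extract_text(path):
    """Run the matching tool and return its standard output as bytes."""
    cmd = build_command(path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    # Passing a list (no shell) avoids quoting problems with odd filenames.
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout
```

Calling extract_text("report.doc") would then return whatever catdoc prints for 
that file. The output is left as raw bytes because the character set depends on 
how catdoc is configured on your system.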

We have been using it very happily for a long time.

Thanks

Anirudh Zala


