
[nycphp-talk] Php in the twilight zone

Jack Scott lists at jack-scott.com
Fri Apr 21 10:24:04 EDT 2006


On Fri, 2006-04-21 at 16:21 +0300, Iulian Manea wrote:

> The script is used for spidering a site, which is quite big .. so the 20
> minutes isn't that much. But each time the script finds a new link it
> flushes it to the browser, so the connection shouldn't timeout or anything
> ...

This doesn't fix your immediate problem, but if you are on *nix you
could run wget, lynx, or webBot to spider the site and then parse the
results afterwards.

I have had to do this in the past: I used wget to recursively spider a
site and save the HTML files locally. Once that is done, I grep the files
and pipe the matches to sed and/or awk (gawk or nawk) to fine-tune the
output.
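
A rough sketch of that kind of pipeline, assuming GNU wget and a grep
that supports -o (GNU and BSD grep both do). The function names, the
depth limit, and the one-second wait are placeholders chosen for
illustration, not the exact commands from the original job:

```shell
#!/bin/sh
# Hypothetical helper names; adjust the wget options to suit the site.

# Step 1: recursively fetch the site into a local directory.
#   $1 = start URL, $2 = directory to save the files in
mirror_site() {
    wget --recursive --level=5 --no-parent --wait=1 \
         --directory-prefix="$2" "$1"
}

# Step 2: pull the href values out of the saved files,
# one per line, with duplicates removed.
#   $1 = directory holding the mirrored files
extract_links() {
    grep -rhoi 'href="[^"]*"' "$1" |
        sed -e 's/^[Hh][Rr][Ee][Ff]="//' -e 's/"$//' |
        sort -u
}
```

Usage would look something like this (the URL and directory are
placeholders):

    mirror_site http://www.example.com/ ./mirror
    extract_links ./mirror > links.txt

Because this runs from the command line rather than through the web
server, there is no browser connection to keep alive. Note that the
pattern only catches double-quoted href attributes; single-quoted or
unquoted ones would need a further sed or awk pass.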

There are plenty of similar Windows utilities out there as well if that
is your platform.

Hope this helps,

Jack





