Factor/To do/Spider
- make filters compile (approach not yet decided)
- random sleep between requests
- redirects
- HTTPS
- cookies
- timeouts (connect, page, data and overall); stop the spider when the overall timeout is reached
Not immediately needed
- parallel version
- retry framework
- retry failed connections
- option to turn off DNS caching
- proxies
- option to check whether pages exist without downloading them
- custom user agent string
- custom HTTP headers
- spidering of the results of another spider
- save to database
- save to directories/files
- follow relative links only
- support FTP spidering
- download rate limit (bytes per second)
- download quota
This revision created on Thu, 2 Oct 2008 01:19:27 by erg