It’s just data

Spider Threads

Joe Gregorio: This branch includes httplib2 to handle the fetching. I have added a new config option ‘spider_threads’ that you can set to the number of threads you want to use when spidering. The default is 0. When spider_threads is set to zero, httplib2 is not used and feedparser is used to fetch the feeds. Note that the threading only applies to HTTP(S) URIs; all other URI types are done in the main thread and handled by feedparser. All parsing is also handled only in the main thread.

I’ve merged this work into my branch.  While there is more work to be done (e.g., better reporting of status codes, IRI support) a rather dramatic speedup is possible with this option, even with a relatively low setting, like 5.

You can see this in action by viewing my log file.
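
To make the division of labor Joe describes a bit more concrete, here is a minimal sketch of the pattern in Python. It is not the code from either branch: the names spider, fetch and parse are hypothetical, and the fetch and parse callables stand in for whatever httplib2 and feedparser actually do. The only things it tries to show are that HTTP(S) URIs are fetched by a pool of worker threads, that every other URI type is fetched in the main thread, and that all parsing stays in the main thread.

```python
import queue
import threading
from urllib.parse import urlparse


def spider(uris, fetch, parse, spider_threads=0):
    """Illustrative only: `fetch` and `parse` are caller-supplied stand-ins."""
    if spider_threads <= 0:
        # No threading requested: fetch and parse sequentially.
        return {uri: parse(uri, fetch(uri)) for uri in uris}

    # Only HTTP(S) URIs are handed to the worker threads.
    threaded = [u for u in uris if urlparse(u).scheme in ("http", "https")]
    other = [u for u in uris if urlparse(u).scheme not in ("http", "https")]

    work = queue.Queue()
    done = queue.Queue()
    for uri in threaded:
        work.put(uri)

    def worker():
        # Each worker drains the work queue and reports raw results.
        while True:
            try:
                uri = work.get_nowait()
            except queue.Empty:
                return
            try:
                done.put((uri, fetch(uri), None))
            except Exception as exc:  # report the failure, keep the thread alive
                done.put((uri, None, exc))

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(spider_threads)]
    for t in threads:
        t.start()

    results = {}
    # Non-HTTP URIs are fetched and parsed in the main thread.
    for uri in other:
        results[uri] = parse(uri, fetch(uri))
    # Fetched bodies come back over the queue; parsing stays in the main thread.
    for _ in threaded:
        uri, body, exc = done.get()
        results[uri] = exc if exc is not None else parse(uri, body)

    for t in threads:
        t.join()
    return results
```

Handing raw responses back over a queue keeps the parser single-threaded, so only the network waits overlap. That is presumably why even a low thread count helps: most of the time in a spidering run goes to waiting on remote servers rather than to parsing.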


Detecting Not Modified Reliably

Yesterday, I more fully integrated Joe’s threading work into Venus.  From an end user’s perspective, one benefit of this is that the first time you specify spider_threads, you will see immediate benefit as the Last-Mo... [more]

Trackback from Sam Ruby
