Venus Rising
Today I’m making available Planet Venus, which can only be described as a radical refactoring of the Planet code base.
The reasons for a radical refactoring are several. My primary reason is that I find that I don’t enjoy working on codebases that don’t have an automated regression test suite. Furthermore, codebases with such a test suite tend, in my experience, to be more modular, so it generally is difficult to bolt on such a test suite afterwards. On this point, I would be glad to be proven wrong.
A second reason is that a number of people have identified memory consumption as a performance issue with the existing planet. The current design is to read all the content and meta data associated with every post for every feed that you are subscribed to into memory, update it, write it, and then make multiple passes through this data. While CPU utilization issues can be mitigated with tools like nice, memory issues are a bit harder to address.
A final reason is that there has been an as-of-yet unmet demand to provide for customization. Conceptually all of the use cases for GreaseMonkey apply equally to feeds, and in particular, the canonical one of wanting to use the Coral content network selectively applies here too. This is difficult for feeds, not only because of the various feed formats out there or due to invalid feeds, but also because some elements may contain plain text, escaped HTML, or embedded XHTML. Having all markup be pre-sanitized and converted to well-formed XHTML with all relative URI references pre-resolved makes the job of producing a plugin script much easier.
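To illustrate what pre-resolving relative URI references means, here is a minimal sketch in modern Python. It is not taken from the Venus code; the feed URL and the `resolve_references` helper are hypothetical, and only the standard library's `urljoin` is used.

```python
from urllib.parse import urljoin


def resolve_references(base_uri, references):
    """Resolve each (possibly relative) reference against the feed's base URI."""
    return [urljoin(base_uri, reference) for reference in references]


# Hypothetical feed location and references found in its entries.
feed_base = "http://example.org/blog/atom.xml"
resolved = resolve_references(
    feed_base,
    ["images/venus.png", "/about", "http://example.com/other.png"],
)

for uri in resolved:
    print(uri)
```

Once every reference is absolute, a plugin script can rewrite or filter URIs without needing to know where the feed was originally served from.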
This is a work in progress, and not really even ready for experimental use just yet. I’ve been working on it slowly over a period of time, and this week I happened to have extended periods without network access, and this was something I could play with offline.
If you have an existing planet and want to try this out, take your config.ini, change your cache_directory to point to an empty directory, and run the following commands:
python spider.py config.ini
python splice.py config.ini > examples/index.html
While I don’t yet have template support (patches welcome), I do have a sample xslt file that will produce something recognizable.
Cool stuff...
Looks like there might be a glitch in the tarify.cgi config though as the generated tarball and zips are empty.
Posted by Ryan Cox at
Sam Ruby: Venus Rising
wearehugh : Sam Ruby: Venus Rising - planet++ Tags : feedparser feeds planet...Excerpt from HotLinks - Level 1 at
I noticed your normalization routine is very small. Is that really sufficient to clean feeds? If so, that’s surprisingly... um, small.
Posted by Ian Bicking at
I noticed your normalization routine is very small. Is that really sufficient to clean feeds? If so, that’s surprisingly... um, small.
UFP and BeautifulSoup do the heavy lifting, I’m just handling what’s left over.
Posted by Sam Ruby at
Hi Sam,
2 bugs...
#1: add this feed in config.ini ([link]) and there is an infinite recursion as follows:
  File "/usr/lib/python2.4/site-packages/_xmlplus/dom/ext/__init__.py", line 231, in GetAllNs
    parent_nss = GetAllNs(node.parentNode)
  File "/usr/lib/python2.4/site-packages/_xmlplus/dom/ext/__init__.py", line 231, in GetAllNs
    parent_nss = GetAllNs(node.parentNode)
  File "/usr/lib/python2.4/site-packages/_xmlplus/dom/ext/__init__.py", line 221, in GetAllNs
    for attr in node.attributes.values():
  File "/usr/lib/python2.4/site-packages/_xmlplus/dom/minidom.py", line 827, in _get_attributes
    return NamedNodeMap(self._attrs, self._attrsNS, self)
RuntimeError: maximum recursion depth exceeded
#2: On Windows, the files are not getting created because the id(?) has invalid characters for a WinXP platform.
DEBUG:planet.runner:Socket timeout set to 20 seconds
INFO:planet.runner:Updating feed [link]
Traceback (most recent call last):
  File "spider.py", line 12, in ?
    spider.spiderPlanet(sys.argv[1])
  File "planet\spider.py", line 86, in spiderPlanet
    spiderFeed(feed)
  File "planet\spider.py", line 68, in spiderFeed
    file = open(out,'w')
IOError: [Errno 2] No such file or directory: 'c:\\junk\\planet\\cache\\tag:blogger.com,1999:blog-20638905.post-115500430463204032'
thanks,
dims
I noticed your normalization routine is very small. Is that really sufficient to clean feeds?
Sam has recently committed several patches to UFP and BeautifulSoup to allow them to take whatever random crap they find in feeds and turn them into well-formed XHTML. It’s quite impressive, actually, and this project is kind of the culmination of that effort.
Posted by Mark at
add this feed in config.ini ([link]) and there is an infinite recursion
My guess is that the Python runtime library is jumping to a conclusion. Follow that link and scroll down. Look closely at the bottom of the entry entitled A Trip to 19th Century - America. I can only imagine what that looks like after first the UFP and then BeautifulSoup processed it.
On Windows, the files are not getting created because the id(?) has invalid characters for a WinXP platform.
OK, I’ve verified that colon characters are a problem on win32. Now the question is whether the mapping should be platform specific, or whether the ability to migrate caches to another architecture is an important feature.
Posted by Sam Ruby at
“Mars will follow Earth, and will be in Ruby.”
Oh man! There is no planet sun or star could hold you, if you but knew what you are.
Posted by Bill de hOra at
Furthermore, codebases with such a test suite tend, in my experience, to be more modular, so it generally is difficult to bolt on such a test suite afterwards. On this point, I would be glad to be proven wrong.
I agree completely. There is a great book that makes the work a little easier though: Michael C. Feathers’ Working Effectively with Legacy Code. He even defines legacy code as “code without tests”. :)
Posted by Michal Wallace at
Sam,
For the “colon characters are a problem on win32” issue, may I suggest a simple urlencode()/urldecode(), which I believe will work across architectures?
thanks,
dims
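The suggestion above can be sketched as follows. This is only an illustration in modern Python (where `urllib.parse.quote` and `unquote` replace the Python 2 `urllib.quote` and `urllib.unquote`), not the mapping Venus actually uses; the helper names `id_to_filename` and `filename_to_id` are hypothetical.

```python
from urllib.parse import quote, unquote


def id_to_filename(entry_id):
    """Percent-encode everything except letters, digits and '_.-~'."""
    # safe="" means that ':' and '/' are encoded as well.
    return quote(entry_id, safe="")


def filename_to_id(filename):
    """Reverse the mapping, so the cache stays portable across platforms."""
    return unquote(filename)


# The entry id from the Windows traceback earlier in this thread.
entry_id = "tag:blogger.com,1999:blog-20638905.post-115500430463204032"
name = id_to_filename(entry_id)
print(name)
```

Because the encoding is reversible and produces the same name on every platform, a cache written on one architecture could be copied to another. The trade-off is that the names stay long and are not especially readable.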
Taking a look at what the original purpose of the filename function was, and at my existing cache, I decided to go for shorter, more cruft free names; in the process I made the names Win32 file system compatible.
Those that ignored my advice and deployed this code and chose to update would be well advised to flush their cache.
I also made a change to treat pretty-printer errors (including the near-infinite recursion) as non-fatal.
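As a rough illustration of treating pretty-printer errors as non-fatal, the sketch below falls back to unformatted output when formatting fails. It uses the standard library's `xml.dom.minidom` as a stand-in for the PyXML pretty-printer seen in the traceback, and the `serialize` helper is hypothetical rather than the actual Venus change.

```python
import logging
from xml.dom import minidom

log = logging.getLogger("planet.sketch")  # hypothetical logger name


def serialize(doc):
    """Return pretty-printed XML, or the plain serialization if that fails."""
    try:
        return doc.toprettyxml(indent="  ")
    except RuntimeError as error:
        # Runaway recursion surfaces as RuntimeError (RecursionError in
        # Python 3 is a subclass), so it is logged instead of aborting.
        log.warning("pretty-printing failed, writing unformatted XML: %s", error)
        return doc.toxml()


doc = minidom.parseString("<entry><title>Venus Rising</title></entry>")
output = serialize(doc)
print(output)
```

The point of the design is that a cosmetic step should never prevent an entry from being written to the cache.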
Posted by Sam Ruby at
Sam,
The recursion is gone now. Will try win32 later and let you know if there’s a problem.
thanks,
dims
[from ade] Sam Ruby: Venus Rising
“Planet Venus, which can only be described as a radical refactoring of the Planet code base” Now all we need are regular releases as opposed to the snapshot-of-the-month club and I’ll be a happy bunny...Excerpt from del.icio.us/network/nephariuz at
Btw, anyone interested in this sort of thing might also want to check out Plagger.
Posted by Aristotle Pagaltzis at
links for 2006-08-17
Bare Naked App » Blog Archive » Displaying percentages (tags: ajax css webdesign ui) autotut: Using GNU auto{conf,make,header} (tags: development autoconf automake howto) PycURL Home Page (tags: python curl http lib) Brad Choate: ack (tags: cli...Excerpt from Breyten's Dev Blog at
links for 2006-08-17
From the blogroll… Why use anything else? Venus Rising Google Talk Adds Voice Mail, File Sharing...Excerpt from The Robinson House at
For the ghosts of the Lazy Web
I was thinking of an idea to leverage FOAF, but it’s probably nothing new. I want to leverage foaf:OnlineAccount more and have a service that using FOAF generates an OPML file to all of the user’s content distributed on the Web for...Excerpt from Elias Torres at
I’m the lazy web
My last post was on making use of foaf:OnlineAccount information found in FOAF files to create a complete OPML or better yet feed (or personal planet) of all the information your friends are dumping all over the web. As already stated, I...Excerpt from Elias Torres at
links for 2006-08-17
the dreaming tree » Blog Archive » bodies (tags: gfmorris_comment) Through a Glass, Darkly » Back to school, back to school, to prove to dad that I’m not a fool. The sound I just heard was Kari vowing to never speak to me again. Or something....Excerpt from Geof F. Morris's Indiana Jones School of Management at
Bloglines working to make things better
Bloglines is still having difficulties with subscriptions and posts: yesterday evening it happened again, and I lost unread entries from random subscribed weblogs. At least, this time their problem report seems to be a little bit more concerned than...Excerpt from The Long Dark Tea-time of the Blog at
Mars will follow Earth, and will be in Ruby.
Do you mean this Mars?
Posted by Giulio Piancastelli at
Do you mean this Mars?
Looks like a name-squatter.
- Bugs ( 0 open /0 total )
- Support Requests ( 0 open /0 total )
- Patches ( 0 open /0 total )
- Feature Requests ( 0 open /0 total )
- Forums Public Forums ( 2 messages in 2 forums )
- To DoSurveys Surveys ( 0 surveys )
- SCM Repository (CVS: 0 commits, 0 adds) (SVN: 0 updates, 0 adds)
I get the following :
$ sudo python /var/www/venus/splice.py /var/www/planet/planets/quebecois.eu/config2.ini > /var/www/unix.tv/index.html
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/index.html.tmpl
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/atom.xml.tmpl
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/rss20.xml.tmpl
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/rss10.xml.tmpl
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/opml.xml.tmpl
ERROR:planet.runner:Unable to locate template /var/www/planet/planets/quebecois.eu/foafroll.xml.tmpl
user@server:/var/www/planet/planets/quebecois.eu$ ll
total 72K
4,0K -rw-r--r-- 1 user user 2,2K 2006-07-27 01:53 atom.xml.tmpl
4,0K -rw-r--r-- 1 user user 2,9K 2006-09-10 13:59 atom.xml.tmplc
8,0K -rw-r--r-- 1 user user 5,7K 2006-10-01 17:23 config2.ini
8,0K -rw-r--r-- 1 user user 5,7K 2006-09-30 01:51 config.ini
4,0K -rw-r--r-- 1 user user 921 2006-07-27 01:53 foafroll.xml.tmpl
4,0K -rw-r--r-- 1 user user 1,2K 2006-09-10 13:59 foafroll.xml.tmplc
8,0K -rw-r--r-- 1 user user 5,0K 2006-09-29 03:05 index.html.tmpl
8,0K -rw-r--r-- 1 user user 5,4K 2006-09-29 03:05 index.html.tmplc
4,0K -rw-r--r-- 1 user user 626 2006-07-27 01:53 opml.xml.tmpl
4,0K -rw-r--r-- 1 user user 971 2006-09-10 13:59 opml.xml.tmplc
4,0K -rw-r--r-- 1 user user 1,2K 2006-07-27 01:53 rss10.xml.tmpl
4,0K -rw-r--r-- 1 user user 1,5K 2006-09-10 13:59 rss10.xml.tmplc
4,0K -rw-r--r-- 1 user user 838 2006-07-27 01:53 rss20.xml.tmpl
4,0K -rw-r--r-- 1 user user 1,3K 2006-09-10 13:59 rss20.xml.tmplc
Gabriel: two things.
First, can you try running “python runtests.py” to verify that all is well?
Then can you set the following in your config.ini and try again?
log_level = INFO
You might need to set template_directories in your config.ini file.
P.S. There now is a planet.py main program.
Thanks Sam!
Another thing ... I would like to activate the coral_cdn_filter.py filter but I simply don’t know what to do?
I simply want to be able to view the blogger images on my planet.
Posted by Gabriel at
What you need to do is to add:
filters = coral_cdn_filter.py
to either the [planet] section, or to each of the feeds on which you want this filter to be run.
Note: filters are run before the data is written to the cache, so you will either need to delete the cache or wait until new entries appear.
Posted by Sam Ruby at
OK thanks, I now see the changes to the index.html, but the images still don’t show up for the blogger feeds ... Maybe I’m missing something?
When I test a URL, for example [link], I still get the forbidden error 403.
My test planet running venus is at [link]
Posted by Gabriel at
Planet Venus
Today I changed the software that powers “Българска свободна планета” and “Планета GNOME”. Actually, “changed” is putting it strongly, because the Venus project is... [more]Trackback from Arcane Lore at
Ясен Праматаров: Planet Venus
Today I changed the software that powers “Българска свободна планета” and “Планета GNOME”. Actually, “changed” is putting it strongly, because the Venus project is built on the well-known Planet. Sam Ruby is evidently a very energetic enthusiast: after a long...Excerpt from Българска свободна планета at
Bloglines vs. Google Reader
I’m getting somewhat disappointed with Bloglines because of the confusion it has been causing lately in identifying new entries in a blog, especially when I ask the program to keep entries as new. I’ve been experimenting with Google...Excerpt from Superfície Reflexiva at
Things that inspire envy of computer languages I don’t use
Ruby, Python, Perl, PHP, JSP, etc. For web development, the languages themselves have certain strengths, but they also, eventually, acquire software projects that make them famous. Ruby, of course, has Rails, and through that, all the software of 37...Excerpt from Closer To The Ideal at
Why I keep using my own pulse
I’ve been a fan of personal feed aggregation services for a long time. I’ve been trying: Suprglu iStalkr Feedfriend and Plaxo Pulse I’ve even built my own pulse once and twice . Now Plaxo announces something new: The Plaxo Pulse Widget allows you to...Excerpt from Lars Trieloff's Collaboration Weblog at
[from wearehugh] Sam Ruby: Venus Rising
planet++...Excerpt from del.icio.us/network/blech at