It’s just data

Meme Tracker in IronPython

Dare Obasanjo: My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby’s meme tracker (source code) from CPython to IronPython. Sam’s meme tracker shows the most popular links from the past week from the blogs in his RSS subscriptions.

More recent code can be found here.  It fetches titles from HTML, handles ETags, and matches both www. and non-www. versions of a URI.  It handles people who point to the same thing multiple times, and it allows you to group people who tend to all “vote” in bulk.  Note: I consider the alternate link to be a vote too, which gives a small bump to people who post original content versus links.
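To make the www. versus non-www. matching concrete, here is a minimal sketch in plain Python 3. It is not the tracker's actual code: the function name `normalize_uri` is made up for illustration, and dropping the fragment and defaulting an empty path to "/" are assumptions of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_uri(uri):
    """Return a canonical form so www. and non-www. links compare equal.

    Hypothetical helper, not taken from the meme tracker source.
    """
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep an explicit port, if the link carries one.
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    # Assumption: an empty path and "/" name the same resource,
    # and fragments do not matter when counting votes.
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

Links would be passed through this function before being counted, so that both spellings of a host end up under the same key.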

I’d also recommend that you invest some time into converting from a simple regular expression to a real HTML parser.  You’ll need it anyway for titles.
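As a rough idea of what the parser-based version could look like, here is a sketch built on `html.parser` from the Python 3 standard library. It is an illustration, not the tracker's code: the class and function names are invented, and whether this module is usable from a given IronPython install is not something this sketch assumes. The standard library parser is fairly tolerant, but it is not a full tag-soup parser in the way html5lib aims to be.

```python
from html.parser import HTMLParser

class LinkAndTitleParser(HTMLParser):
    """Collect href values from <a> tags and the text of <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract(html):
    """Return (title, links) for one HTML document given as a string."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```

One pass over the page yields both the links and the title, which is the reason a parser pays off once titles are needed as well.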


I avoided using an HTML parser because there isn’t one that ships with the base IronPython + .NET Framework installs. In fact, I haven’t really found an HTML parser for .NET that can handle all the craziness that is the tag soup you can find on the WWW.

Thanks for the links to the most recent versions of the code. I had assumed that one vote per site took care of people who point to things multiple times?  Alternate link as vote is a great idea. Nice.

When do you use the bulk voting feature?

Posted by Dare Obasanjo at

It is not an uncommon occurrence for somebody to point to the same thing across multiple posts.

I’ve seen people who I’m individually subscribed to also post on a group blog, and had an instant meme.  Listing both as part of the same memegroup means that the union of the two (or more) blogs gets a single vote.  This isn’t anywhere near as common as other issues.
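A small sketch of how such grouping can be counted, in plain Python 3. The function name, the argument names, and the data shapes are all assumptions made for illustration, not the tracker's real structures.

```python
from collections import defaultdict

def count_votes(links_by_blog, memegroups=None):
    """Count one vote per voter for each link.

    links_by_blog: hypothetical mapping of blog id -> iterable of links.
    memegroups:    hypothetical mapping of blog id -> group name; blogs
                   that share a group name vote as a single voter.
    """
    memegroups = memegroups or {}
    voters = defaultdict(set)
    for blog, links in links_by_blog.items():
        # A blog outside any group simply votes as itself.
        voter = memegroups.get(blog, blog)
        for link in links:
            # A set absorbs repeated links from the same voter.
            voters[link].add(voter)
    return {link: len(who) for link, who in voters.items()}
```

Because voters are kept in a set per link, the same structure also covers somebody pointing to the same thing across multiple posts.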

Other countermeasures have built up over time.  For example, “ensure that somebody new points to this entry.  This guards against groups of related links which several posts point to all.”  This is fairly common.

Posted by Sam Ruby at

My current mememe headache is Technorati tags.  In the venus site that I’m working on right now, the mememe plugin consistently shows me that one certain site likes to link to a certain Technorati tag link.  Are there any countermeasures against this sort of thing showing up in the memes list?

I had previously added in a line to the plugin code to prevent any technorati.com links from showing up in the list, but this was far too manual for my taste.  And besides, I like to be able to merge in the latest updates to the codebase from time to time without having to worry about little hacks like that.

Posted by Scott Johnson at

Dare, if you’re looking at an HTML parser, you can either use html5lib with IronPython (if compatible, I don’t know IronPython at all) or contribute to my Twintsam project ;-)

Posted by Thomas Broyer at

Scott: one certain site consistently linking to something shouldn’t cause any problems, except on planets with very few feeds, where a meme function tends to report noise.  Even with as many subscriptions as Planet Intertwingly has, on “slow news weeks” like now you will see posts with as few as two links start to creep up into the list.  The current top of the list only has three links!

But beyond that, all I can think of is to create a blacklist, probably based on regular expressions.  The actual expressions could live in config.ini.
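A sketch of what that might look like, using `configparser` and `re` from the Python 3 standard library. Nothing here exists in the plugin today: the `[memes]` section, the `blacklist` option, and both function names are invented for illustration, and the sample pattern is only a guess at what a Technorati tag link looks like.

```python
import configparser
import re

# Hypothetical config.ini fragment: one regular expression per line.
SAMPLE_CONFIG = r"""
[memes]
blacklist =
    ^https?://(www\.)?technorati\.com/tags?/
"""

def load_blacklist(config_text, section="memes", option="blacklist"):
    """Compile each non-empty line of the option into a regex."""
    # interpolation=None keeps '%' in patterns from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw = parser.get(section, option, fallback="")
    return [re.compile(line.strip())
            for line in raw.splitlines() if line.strip()]

def is_blacklisted(link, patterns):
    """True if any blacklist pattern matches somewhere in the link."""
    return any(p.search(link) for p in patterns)
```

Links that match would be skipped before any votes are counted, and since the patterns sit in the configuration rather than in the plugin, merging upstream updates would not touch them.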

Posted by Sam Ruby at
