TrackBack and Pingback are push technologies. Mark's
Automatic linkbacks and my (unnamed)
excerpting functions are pull technologies triggered by
referers. The purpose of each is roughly the same: to
bring my news to me instead of making me go foraging for it.
I actually experimented with Mark's code for a bit, but the
biggest problem I had was that it looked like it would require
continual investment to weed out the ever-growing number of
I was also concerned about the feedback loop that could occur given
the amount of back traffic I get whenever I mention anything on
By relying on links, I'm also weeding out people who don't have
RSS feeds or
can't follow instructions. Unfortunately, I'm also
juicy bits (from a data mining perspective) from their
weblogs. For my purposes, feeds like
are best: they contain all the rich content yet provide a
prepackaged, simple and clean excerpt for me to deal with. Or
Hammersley's who chose to take the initiative and send the
excerpts to me.
One thing that has pleased me: I have noticed that several
people have added link information that wasn't previously
Portals and such are not as large a problem as I imagined, although I do manually maintain a list of them. Some code could probably be added to check for multiple inbound links on the page, since most portals will list the most recent 5 items (or whatever) from your feed. If a page links to all of the most recent 5 items, chances are it's a portal.
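The portal check suggested above might look roughly like this in Python. This is only a sketch: the names `LinkCollector` and `looks_like_portal`, the `threshold` parameter, and the idea of passing in a list of recent permalinks are all made up for illustration and are not taken from the actual excerpting script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def looks_like_portal(page_html, recent_entry_urls, threshold=None):
    """Guess whether a referring page is a portal.

    recent_entry_urls: permalinks of the most recent entries (assumed input).
    threshold: how many of them must be linked; defaults to all of them.
    """
    if not recent_entry_urls:
        return False
    if threshold is None:
        threshold = len(recent_entry_urls)
    collector = LinkCollector()
    collector.feed(page_html)
    hits = sum(1 for url in recent_entry_urls if url in collector.hrefs)
    return hits >= threshold
```

A page that links to only one entry is treated as an ordinary referrer; lowering `threshold` would make the guess more aggressive, at the cost of misjudging people who simply link a lot.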
The feedback loop from other people doing the same thing is a problem, though. I noticed this when I linked to something Dave Johnson wrote. He now implements a similar system of "further reading" linkbacks, and my linkback script picked up his linkback to my linkback. Or something. Anyway, it was just a bunch of machines talking to each other, which is fine, but I generally don't want to expose that in my UI.
I always said that my "further reading" was like a prisoner's dilemma: great for me as long as nobody else did it. Once everybody's doing it, the feedback loops start, the signal-to-noise ratio plummets, and it becomes worthless.
oh, bah. the juicy bits were in the <title> and précis line in my feed. however, see http://ken.coar.org/blog/index?entry=71 -- if you want the entire content, add 'words=0' to the GET arguments. (use some other number to get an appropriately-sized excerpt.) this allows you to select just how much content you want in my RSS response to your query.
Does the feedback loop make linkback totally impractical? I mean... even if you implement linkback using RSS feeds gleaned from link tags, somebody could start referencing linkbacks inside RSS feeds (just as some bloggers put reader comments inside RSS feeds) and you'd have another feedback loop.
What if you only scan their HTML file one initial time for the linkback at a certain URL, and from then on simply increment the visitor count in your database or file or whatever, when you get linkbacks from there? Surely then the flow would go:
They link to you. People follow link to your site. Your system scans their site and finds their link, produces an excerpt on your page. They detect visitors going to their site from yours, scan yours, add a linkback. People can follow either their linkback or their original link to your site, and it doesn't matter which, all your script does is increment the visitor count by one and use the same excerpt.
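The scan-once flow described above could be sketched like this. Everything here is hypothetical: `LinkbackStore`, `record_hit`, and the `fetch_excerpt` callable (which stands in for whatever actually scans the referring page) are invented names, and an in-memory dict stands in for the "database or file or whatever".

```python
class LinkbackStore:
    """Scan a referring page once, then only count later hits from it."""

    def __init__(self, fetch_excerpt):
        # fetch_excerpt(url) -> excerpt string, or None when the page has no
        # link to us. Hypothetical callable supplied by the caller.
        self._fetch_excerpt = fetch_excerpt
        self._entries = {}  # referrer URL -> {"excerpt": str, "hits": int}

    def record_hit(self, referrer_url):
        entry = self._entries.get(referrer_url)
        if entry is None:
            # First visit from this URL: scan it exactly once.
            excerpt = self._fetch_excerpt(referrer_url)
            if excerpt is None:
                return None  # no link found, so nothing is stored
            entry = {"excerpt": excerpt, "hits": 0}
            self._entries[referrer_url] = entry
        # Every later visit reuses the stored excerpt and bumps the count.
        entry["hits"] += 1
        return entry
```

Because the page is never rescanned after the first successful hit, a linkback that appears there later cannot change the excerpt, which is the point of the suggestion. Pages where no link was found are not remembered in this sketch, so they would be scanned again on the next hit.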
Dave - I could put comments in my "regular" feed, but I don't. Furthermore, I don't know anyone who does. This makes it easy to break the loop.
To show the extent of the problem, here's a true story - a fair number of my referers are from 0xdecafbad. It seems that people frequently use my blogroll, and he is at the top. He has a "recent referrers" list, and since I'm often on it, I get hits. If I scan his html, these are valid links to specific blog entries...
Lach, you are correct, I can optimize this a bit further. By the way, I am not tracking hits by post by referrer.
For what it is worth, when not debugging my script, I validate links only once an hour. If there are multiple hits within the hour, I still only check once. If multiple distinct pages reference the same rss feed, I again will only check that once per hour.
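A once-an-hour check like the one described above can be kept quite small. The following is a sketch under assumptions, not the actual script: `HourlyThrottle`, `should_check`, and the injectable `clock` are names invented here, and the key is whatever identifies the thing being validated (for instance the RSS feed URL, so that several pages sharing one feed share one check).

```python
import time


class HourlyThrottle:
    """Allow at most one check per key within a fixed interval."""

    def __init__(self, interval_seconds=3600, clock=time.time):
        self._interval = interval_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_checked = {}  # key -> timestamp of the last allowed check

    def should_check(self, key):
        now = self._clock()
        last = self._last_checked.get(key)
        if last is not None and now - last < self._interval:
            return False  # already checked within the interval
        self._last_checked[key] = now
        return True
```

Keying on the feed URL rather than the referring page is what collapses multiple distinct pages into a single hourly fetch.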
feh. type too fast, see what you get. changes, sam: see http://ken.coar.org/blog/index?entry=72 . you want the entire content for mining? use "?words=all&sanitise=false" on the RSS URL. (which can be for the current selection of entries [10 by default, but customisable with "count=n"], a specific entry, all entries for a particular month or day, or all entries within a particular timeframe).
hope this helps..
heh.. one thing i notice: people who rely on automatic interblog communication are sometimes treating those who haven't come up to that level as second-class citizens, or at least less important than those on the bleeding edge. for instance, despite the glory of mark pilgrim's who-has-linked-to-me referral scanback concept, somehow it hasn't managed to locate any of my referrals to him. i wonder why? somehow i doubt that any trackback references to his articles get ignored. or maybe i'm just paranoid. yeah, that's it.
RSS controls and the [in]glory of browser tailoring Updated: Thursday, 16 January 2003 07:01 EST Mark Pilgrim has decided to deal with client differences in CSS handling by having browser-specific stylesheets. That was one of the things I...
if 'ken.coar.org/blog/index.rss' in rss: rss += '?words=all&sanitise=false'
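Appending a raw `?...` string works as long as the feed URL has no query string of its own. A slightly more defensive variant, sketched here with the standard library, merges the extra arguments into whatever is already there. The helper name `with_params` is made up, and the `http://` scheme on the feed URL in the usage note is an assumption; the `words` and `sanitise` arguments are the ones Ken describes above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **params):
    """Return url with params merged into its existing query string."""
    parts = urlsplit(url)
    # Keep any arguments already present, then add or override ours.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_params('http://ken.coar.org/blog/index.rss', words='all', sanitise='false')` asks for the full, unsanitised content, while a URL that already carries `entry=71` keeps that argument and gains the new ones.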
I've added <em> to the list of tags I support. Originally, this comment field was intended to be text only, but those dang users...
My rss2 feed has an indication of the number of comments. There also is a separate feed for comments in various flavors of rss. In fact, you can get an rss feed for any single blog entry by simply replacing ".html" with ".rss" or ".rss2" or ".txt" or ".esf" or...
Finally, there seems to be something I am still debugging in my script... it complains about a unicode error, but unfortunately (as near as I can tell) Python is reporting it on the wrong line. Expect continued bursts of activity on your rss feed as I attempt to isolate and squash...
Sam Ruby and Mark Pilgrim, who both have weblogs with automatic linkback implementations, both linked to my Introducing Automatic Linkbacks in Roller post the other day. This created a linkback feedback loop. Luckily, I anticipated that some...
Matt has put together a nice Roller 0.9.7 TODO list for himself. Cool stuff. The "remember me" feature sounds especially useful. Apart from finishing-up the linkback feature, the main thing I would like to do is to fix comments. I would like to...
I still owe you a write-up of the Roller linkback implementation. I'll get around to that when I get around to finishing it. I'm way too busy with other things to work on linkbacks right now, but I have been giving it some thought. The linkback...
Cool. I already use Technorati to help me find links to excerpt. What interests me is Jabber alerts on comments of any kind. First to me (of course!), and then perhaps to those who register interest in such thing. And/or IRC, which...
Trackback from Sam Ruby
Thoughts on what has worked, and what has not worked so well, with my automated linkback excerpter function thingamabob TrackBack and Pingback are push technologies. Mark's Automatic linkbacks and my (unnamed) excerpting functions are pull technol...
I just saw the WordPress Trackback Validator plugin fly by my aggregator and immediately installed it. I knew Dan online back in middle school, so with this endorsement, I installed it instantly: The Computer Security Lab at Rice just released the...