I was about to deploy Mark's latest toy on my site, but now I'm not so sure it would be a good idea. Over the past few days I have had 82 hits from Mark's site, with 36 being to specific blog entries. The problem is that none of the excerpts would be Mark's words. (They actually wouldn't be mine either, try
A random thought that might both solve this problem and provide even more reliable excerpts: if the page obtained from the targetURL has an RSS autodiscovery link, then perhaps the site's primary RSS feed should be harvested instead.
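A minimal sketch of the autodiscovery check, in Python since linkbackparser.py is Python. It assumes the page HTML has already been fetched from the targetURL, and it treats the first `<link rel="alternate" type="application/rss+xml">` on the page as the site's primary feed, which is an assumption. Fetching that feed and finding the matching entry in it is left out. The names `AutodiscoveryFinder` and `find_rss_feed` are hypothetical, not part of linkbackparser.py.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AutodiscoveryFinder(HTMLParser):
    """Collect RSS autodiscovery links (hypothetical helper, not linkbackparser.py)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes give None.
        if tag != "link":
            return
        attrs = {name: (value or "") for name, value in attrs}
        rels = attrs.get("rel", "").lower().split()
        is_rss = attrs.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and attrs.get("href"):
            # Resolve relative hrefs such as "/index.rss" against the page URL.
            self.feeds.append(urljoin(self.base_url, attrs["href"]))


def find_rss_feed(html, base_url):
    """Return the first advertised RSS feed URL, or None if the page has none."""
    finder = AutodiscoveryFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds[0] if finder.feeds else None
```

For example, if `page` contains `<link rel="alternate" type="application/rss+xml" href="/index.rss">`, then `find_rss_feed(page, "http://example.org/blog/entry")` returns `"http://example.org/index.rss"`. Self-closing `<link ... />` tags work too, because `HTMLParser` routes them through `handle_starttag`.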
The bug you noticed (not dividing on LI) has been fixed and will be in the next version. The more general point about feedback loops is valid.
There's already a DIV around the further reading list with a class="linkbacks". I could modify the script to filter out everything within that DIV, but that's only a solution to feedback loops between linkbackparser.py sites. It doesn't solve the more general problem that the only reason I'm linking to you is because some automated script noticed that you're linking to me. (No offense. :)
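Filtering the existing `class="linkbacks"` DIV could look roughly like the following sketch. It collects a page's text while skipping everything inside that DIV, nested DIVs included. The class and function names are hypothetical and not taken from linkbackparser.py. As the comment says, this only breaks loops between sites that both mark their lists this way.

```python
from html.parser import HTMLParser


class LinkbacksStripper(HTMLParser):
    """Collect page text, skipping everything inside <div class="linkbacks">."""

    def __init__(self, skip_class="linkbacks"):
        super().__init__()
        self.skip_class = skip_class
        self.skip_depth = 0  # > 0 while inside the skipped DIV; counts nested DIVs
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.skip_depth:
            self.skip_depth += 1  # a DIV nested inside the skipped one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_class in classes:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def text_without_linkbacks(html):
    """Return the page text, whitespace-collapsed, minus the linkbacks DIV."""
    stripper = LinkbacksStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(" ".join(stripper.chunks).split())
```

Counting nested DIVs matters because the further reading list may itself contain DIVs. Without the counter, the first inner `</div>` would end the skip too early.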
Others have noticed this. Back when Ben Hammersley was syndicating my RSS feed in a box, my script would notice the clickthroughs and put them in the list; people visiting my site would click through to his site, only to find the automated link back to my site. Not a satisfying experience. I try to filter out dedicated portal pages, but it's a manual process and ultimately a losing battle.
An almost-solution to feedback: strip tags from the excerpt, and then reject the referrer if the text you got back came from your own entry. You would end up rejecting a style of quoting I used to use, [blockquote]the quote[a href="the source"][/blockquote], but that isn't really something you want to be using as an excerpt anyway.
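Here is a rough sketch of that check. It assumes you already have both the referrer's excerpt and your own entry's HTML. Tag stripping uses a simple regex plus entity decoding, which approximates a real HTML parse rather than performing one. The `min_length` cutoff keeps a two-word excerpt from counting as a copy. It is a hypothetical threshold and does not come from the comment above.

```python
import html
import re


def plain_text(fragment):
    """Roughly strip tags, decode entities, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]*>", " ", fragment)  # approximate: not a full HTML parse
    text = html.unescape(text)
    return " ".join(text.split()).lower()


def is_feedback(excerpt_html, own_entry_html, min_length=20):
    """True if the referrer's excerpt is just text lifted from our own entry.

    min_length is a hypothetical cutoff: shorter excerpts are never rejected.
    """
    excerpt = plain_text(excerpt_html)
    if len(excerpt) < min_length:
        return False
    return excerpt in plain_text(own_entry_html)
```

As described, a referrer that quotes you inside a blockquote is rejected too, because its excerpt reduces to your own words.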
Just making a random note that while Mark indicates that the problem mentioned above will be fixed in the next version, he slyly doesn't mention when that next version might arrive.
This is interesting to me because I've prototyped an integration of Mark's linkbackparser, and may be able to find a few spare moments this weekend...
The one additional piece of information I'd like this script to return is the title. The value of the nearest preceding title or header tag would do just fine.
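The following sketch shows one way to get the nearest preceding title or header. It assumes you know both the referring page's HTML and the exact `href` of the link back. It walks the page with `html.parser`, remembers the text of the last closed `title` or `h1` to `h6` element, and returns that text when the link appears. Hrefs are compared as exact strings with no relative-URL resolution. `PrecedingTitleFinder` and `preceding_title` are hypothetical names.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}


class PrecedingTitleFinder(HTMLParser):
    """Remember the last title/h1-h6 text seen before a link to target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.current_tag = None   # heading tag we are currently inside, if any
        self.current_text = []
        self.last_heading = None  # text of the most recently closed heading
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.current_tag = tag
            self.current_text = []
        elif tag == "a" and self.result is None:
            # Exact string match on href (hypothetical simplification).
            if dict(attrs).get("href") == self.target_url:
                self.result = self.last_heading

    def handle_endtag(self, tag):
        if tag == self.current_tag:
            self.last_heading = " ".join("".join(self.current_text).split())
            self.current_tag = None

    def handle_data(self, data):
        if self.current_tag:
            self.current_text.append(data)


def preceding_title(html, target_url):
    """Return the nearest preceding title/header text, or None if there is none."""
    finder = PrecedingTitleFinder(target_url)
    finder.feed(html)
    finder.close()
    return finder.result
```

Because `<title>` counts as a header here, a page with no `h1` to `h6` before the link falls back to its document title.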
At that point, I would have an item title, link, and description... sound familiar?
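Those three pieces map directly onto the `title`, `link` and `description` children of an RSS `item`. Below is a minimal sketch that uses the standard library's ElementTree, which escapes the text. The `rss_item` helper is hypothetical and does not describe how the prototype mentioned above actually works.

```python
import xml.etree.ElementTree as ET


def rss_item(title, link, description):
    """Build an RSS <item> string from a linkback's title, link and excerpt."""
    item = ET.Element("item")
    for name, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(item, name).text = value  # ElementTree escapes &, <, >
    return ET.tostring(item, encoding="unicode")
```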