I write this with my simple blosxom hacker hat
on. I have not been involved in the creation of either
trackback or pingback,
but I have taken a look at both specifications and have implemented
some basic functionality for each. Here's what I have found:
Automating *back (either one) requires harvesting links from
Automating *back requires retrieving the target pages
Trackback records metadata in only one type of
format. Parsers must look for the two different formats in
which pingback metadata information is recorded.
Trackback allows one to record metadata for multiple blog
entries in a single web page. Pingback only allows you to
record one. The compensating advantage that this provides to
pingback is that it allows you to have pingbacks for non-textual
Trackback pings have title and excerpt
information. Pingback's lack of this information,
coupled with the support for non-textual information, reduces most
pingbacks to simple (e.g., [1 2 3]) displays. Compare the
differences for yourself with the two links above.
Trackbacks are done with HTTP GET or POST. Pingbacks are
done with XML-RPC. I was able to integrate the former
directly into blosxom. The latter will likely require a
separate package to be installed.
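To make the "HTTP POST" half of that concrete, here is a minimal sketch of building a trackback ping using only the Python standard library. The function name and the example URLs are hypothetical; the form fields (`url`, `title`, `excerpt`, `blog_name`) are the ones the trackback spec describes, with only `url` required. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, entry_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a trackback ping as a form-encoded HTTP POST.

    `build_trackback_ping` is a hypothetical helper for illustration only.
    """
    fields = {
        "url": entry_url,        # permalink of the entry doing the linking
        "title": title,          # optional
        "excerpt": excerpt,      # optional
        "blog_name": blog_name,  # optional
    }
    # Leave out optional fields that were not supplied.
    body = urlencode({k: v for k, v in fields.items() if v}).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send the ping; the response is a small XML document whose `<error>` element is `0` on success.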
Overall, I'm not sure that I find the compromises required to
support binary data formats compelling enough to justify the
Trackback may only store information in one type of metadata, but it is harder to parse than pingback's. (I guarantee that I can find ways to write the trackback metadata which are valid according to the trackback spec but which will confuse every trackback implementation in existence.)
Pingback doesn't _need_ to record metadata for more than one post per page because the metadata is the same for every post: simply the server's URI. Only trackback requires different metadata for each post.
Pingback pings can have all the extra data you want, such as excerpts, titles, authors, modification times, place of origin, language, etc., simply by reusing the existing HTML metadata standards. Pingback doesn't need to invent yet another redundant way to store metadata as trackback does. (Most pingback implementations show at least the title of the pingback; some show excerpts.)
Trackback _also_ requires XML-RPC if you implement the complete spec.
None of the design decisions behind Pingback were made because of binary data formats, by the way. Only HTML documents were originally intended to be supported. That binary formats can be supported just fell out of the design.
With an XML-RPC interface for the new pyblosxom, implementing pingback (at least the server part) should be easy. I was dabbling with it today and in the end put it off for a while.
Implementing the full pingback spec requires pyblosxom to act as a web client as well as serving pages; unlike trackback, this complicates things somewhat. Sending pingbacks and trackbacks in an automated (and transparent) way is still an issue, though (in the blosxom context). Without the help of cron or the command line, pyblosxom would need to be 'aware' of its surroundings (I won't go that far).
I personally don't find SGML link tags any easier or harder to parse than the XML RDF description. Feel free to take a look at my code for parsing trackbacks and break it.
I see no way of determining what the valid targets are for a page pingback. I guess trackback could have separate servers for each blog entry, but I doubt that would be very useful.
I concede that pingback servers could actively go out and retrieve the source and parse it... which requires more work to implement. It also doesn't seem trivial to automatically extract excerpts given a random HTML page.
I can still do something quite useful with trackback given only HTTP GET or POST.
Having features "fall out" of an original design is often an indication of a good design.
The pingback spec gives you an exact algorithm to parse the <link> tags -- and you need no more than a regular expression search. You certainly don't need to parse SGML.
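As a rough sketch of that discovery step: the pingback spec checks for an `X-Pingback` HTTP header first and falls back to a regular expression search of the page body. The pattern below is the one from the spec as I recall it, so check it against the spec before relying on it; the function name is hypothetical, and `html.unescape` decodes more entities than the four the spec allows in the `href`.

```python
import html
import re

# Pattern as recalled from the Pingback 1.0 spec (an assumption; verify it).
PINGBACK_LINK = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def discover_pingback_server(headers, body):
    """Return the pingback server URI for a fetched page, or None.

    `headers` is a mapping of response header names to values and `body`
    is the page text. Hypothetical helper, for illustration only.
    """
    # The HTTP header takes precedence over the <link> element.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    match = PINGBACK_LINK.search(body)
    if match is None:
        return None
    # Entities such as &amp; in the attribute value must be decoded.
    return html.unescape(match.group(1))
```

No SGML or HTML parser is involved: a page that writes the tag in any other form is simply not pingback-enabled as far as a client is concerned.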
Breaking your RDF parsing code is easy: the user need only include the string "</rdf:RDF>" in either a comment or CDATA block inside the RDF block.
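To illustrate the failure mode, here is a deliberately naive extractor that grabs everything up to the first "</rdf:RDF>". This is a hypothetical stand-in, not the code under discussion, and the sample page and its URLs are made up.

```python
import re

# Naive approach: take everything from <rdf:RDF up to the FIRST closing tag.
NAIVE_RDF = re.compile(r"<rdf:RDF.*?</rdf:RDF>", re.DOTALL)

SAMPLE_PAGE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!-- a stray </rdf:RDF> inside a comment -->
<rdf:Description trackback:ping="http://example.com/tb/1" />
</rdf:RDF>"""


def naive_rdf_block(page):
    """Return the text the naive pattern takes to be the RDF block, or None."""
    match = NAIVE_RDF.search(page)
    return match.group(0) if match else None


block = naive_rdf_block(SAMPLE_PAGE)
# The match ends inside the comment, so the ping URL is never seen.
ping_url_found = "http://example.com/tb/1" in block
```

Here `ping_url_found` ends up `False`: the extracted block stops at the closing tag inside the comment, before the `rdf:Description` element that carries the ping URL.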
You don't _need_ to determine what the valid targets are for a page pingback -- because every permalink is automatically a valid target.
Trackback already _has_ separate servers for each blog entry (you have to give a different server URI for each permalink you want to trackback, since the entry's ID is part of the server URI!).
Pingback, on the other hand, only needs a single URI: the site's pingback server. Then you can send it any two URIs, and if the target is a permalink on the site then the server does the rest of the work. So with pingback there is no need to ever look at what URIs are pingbackable.
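In code, that single call looks roughly like the sketch below. `pingback.ping(sourceURI, targetURI)` is the method the pingback spec defines; the helper name and the example URIs are made up, and the payload is only built here, not sent.

```python
import xmlrpc.client


def build_pingback_request(source_uri, target_uri):
    """Return the XML-RPC request body for a pingback.ping call.

    Hypothetical helper: it only serialises the call, it does not send it.
    """
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


payload = build_pingback_request(
    "http://my.example/2003/05/some-post",        # the page containing the link
    "http://their.example/archives/entry-42",     # the permalink being linked to
)
```

To actually send it, `xmlrpc.client.ServerProxy(server_uri).pingback.ping(source_uri, target_uri)` does the serialising and the HTTP POST in one step, where `server_uri` is the site's one pingback server.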
Most pingback servers _do_ actively go out and retrieve the source and parse it. Trackback servers should as well, since it is the only way to ensure you are not getting spammed with untrue links. Yes, it requires work to implement, but in practice if you are going to write a system that doesn't blindly believe anything it is told, you have to do that with trackback too.
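A minimal sketch of that check, assuming the server has already fetched the source page: collect the `href` of every `<a>` element and see whether the target is among them. The names are hypothetical, and the comparison is an exact string match, so relative links and trivially different spellings of the same URI are not handled.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def source_links_to(source_html, target_uri):
    """Return True if the fetched source page contains a link to target_uri."""
    collector = _LinkCollector()
    collector.feed(source_html)
    return target_uri in collector.hrefs
```

The same check works for a trackback server: fetch the `url` field of the ping and refuse the ping if the page does not link back.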
Getting excerpts is not that easy. I don't personally bother because I don't see the point. But people have done it (I think simon.incutio.com has example source -- ask him for it).
Having spent a while arguing about the metadata thing (and losing; see http://www.aquarionics.com/misc/blogite/ for the mailing list archives), I'd have to agree with you on that. As for title information, my implementation grabs the <title> tag if I can find one and the URL if I can't.
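The title fallback described there might look something like this sketch. It is not the commenter's actual code: the function name is made up, and a regular expression is used for brevity rather than a real HTML parser.

```python
import html
import re

TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def display_title(source_html, source_uri):
    """Return the page's <title> text, or the URI if no usable title exists."""
    match = TITLE.search(source_html)
    if match:
        # Decode entities and collapse runs of whitespace and newlines.
        title = " ".join(html.unescape(match.group(1)).split())
        if title:
            return title
    return source_uri
```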
Since I've upgraded b2, my blog now has trackback and pingbacks. Basically, TB and PB allow entries between different blogs to be linked. Trackback is a manual process - you have to enter the referenced entry while composing your post. Then your ...
In response to your statement that "Trackback is a manual process": for the benefit of any Googlers who are as confused as I was, this statement is true for your blog but not for Trackback in general. Trackback has a well-defined method for auto-discovery (documented in the spec). However, not all blog platforms implement it.
WordPress notably supports Trackbacks to WP entries but only supports Trackbacks from WP entries if the user goes to the trouble of finding a Trackback URL and manually entering it in a form. As a result, lots of documents out on the web say things like “Pingback is automatic but Trackback is manual”, a true statement only for WordPress and certain other platforms.
No biggie, just trying to help another misguided reader like myself. :-)
It’s not a big secret: I’m writing blogging software. This, in a sense, sucks because it’s already been written many, many times; while I coded I often (like, every 20 minutes) tried to think of how this could be abstracted into a plugin...
Regarding breaking the RDF, why does it matter? In that case, all that happens is the server fails because the author of the page you’re linking to screwed it up. You could just as easily say “what if the author screws up the pingback <link> tag formatting”. It just doesn’t matter if the pingback fails because the RDF is badly formed.
Just because trackback’s spec doesn’t require the server to check the tracking client before accepting its ping, doesn’t mean any sensible programmer won’t implement this checking, just as they’d be stupid to not validate and sanitise the data that comes from it.
As far as I can tell, it’s just another stupid format war that’s created twice as much work for me as a blog software programmer.