It’s just data

Comments Please

Tim Bray:

I hope soon to begin implementing a comment system for ongoing. This space is my notebook where I’ll work out the design. Since, as of this writing, the system exists only in theory, if you have a suggestion you’ll have to send me an email. I’ll publish the helpful ones.

I have to send you an email?  I think not.  ;-)

Below are a few suggestions.  Use or ignore as you see fit.


First, I think ongoing would be better with comments.


My vote: make sure that every comment has a URI.  And provide a comment feed that spans all entries.  And an alternate HTML version of that feed.
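Here is a minimal Python sketch of what "every comment has a URI" plus one cross-entry feed could look like, using only the standard library. The `Comment` class, the `#c<id>` fragment scheme, and the feed title are assumptions made for illustration. The output is Atom-shaped. Sorting uses plain string comparison of timestamps, so it assumes they all share one UTC format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass
class Comment:
    entry_uri: str   # permalink of the entry being commented on
    comment_id: str  # hypothetical: e.g. the timestamp-plus-random filename
    author: str
    updated: str     # RFC 3339 timestamp in UTC, e.g. "2005-05-12T10:15:00Z"
    html: str        # already-sanitized comment body

    @property
    def uri(self) -> str:
        # Every comment gets its own URI: the entry permalink plus a fragment.
        return f"{self.entry_uri}#c{self.comment_id}"


def _q(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def comment_feed(feed_uri: str, comments: list[Comment]) -> str:
    """Build one Atom feed covering comments on all entries, newest first."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(_q("feed"))
    ET.SubElement(feed, _q("id")).text = feed_uri
    ET.SubElement(feed, _q("title")).text = "Comments"
    # String comparison is safe only because all timestamps share one format.
    newest = max((c.updated for c in comments), default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, _q("updated")).text = newest
    for c in sorted(comments, key=lambda c: c.updated, reverse=True):
        entry = ET.SubElement(feed, _q("entry"))
        ET.SubElement(entry, _q("id")).text = c.uri
        ET.SubElement(entry, _q("link"), href=c.uri)
        ET.SubElement(entry, _q("title")).text = f"Comment by {c.author}"
        ET.SubElement(entry, _q("updated")).text = c.updated
        author = ET.SubElement(entry, _q("author"))
        ET.SubElement(author, _q("name")).text = c.author
        # type="html": the serializer escapes the markup, as Atom expects.
        ET.SubElement(entry, _q("content"), type="html").text = c.html
    return ET.tostring(feed, encoding="unicode")
```

An alternate HTML version could be produced from the same sorted list, so both views share a single ordering.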

Second, I’d like to take this occasion to learn at least one new technology.


General Outline

For every ongoing fragment, there will be a little directory tree, containing one subdirectory each for incoming, rejected, and accepted comments. The comments will live in these directories, one file each, with long names consisting of a timestamp and a random number. These directories will generally not be publicly Web-accessible.

My comments are also one per file.  Also with a timestamp.  I haven’t found the need for a random number, but that sounds like a good idea.
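The following is a minimal Python sketch of the one-file-per-comment layout Tim describes. The root directory, the fragment naming, and the function names `submit` and `moderate` are hypothetical. The random suffix keeps two comments that arrive in the same second from overwriting each other.

```python
import secrets
import time
from pathlib import Path

# Hypothetical layout: <root>/<fragment>/{incoming,rejected,accepted}/
STATES = ("incoming", "rejected", "accepted")


def comment_dirs(root: Path, fragment: str) -> dict:
    """Create (if needed) and return the three comment directories for a fragment."""
    base = root / fragment
    dirs = {state: base / state for state in STATES}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs


def comment_filename() -> str:
    """A UTC timestamp plus a random number, so same-second arrivals don't collide."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    return f"{stamp}-{secrets.randbelow(10**8):08d}"


def submit(root: Path, fragment: str, text: str) -> Path:
    """Store a new comment, one file per comment, in the incoming directory."""
    path = comment_dirs(root, fragment)["incoming"] / comment_filename()
    path.write_text(text, encoding="utf-8")
    return path


def moderate(root: Path, fragment: str, name: str, accept: bool) -> Path:
    """Move a comment out of incoming and into accepted or rejected."""
    dirs = comment_dirs(root, fragment)
    target = dirs["accepted" if accept else "rejected"] / name
    (dirs["incoming"] / name).rename(target)
    return target
```

With this layout, moderation is just moving a file from one directory to another. On POSIX systems, a rename within one filesystem is atomic, which suits a system with no database.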

All comments will always be moderated. There will be a simple Web interface offered to people to compose comments, and another for me to use to approve or reject them. I’m not inclined to building a safe HTML editing control from scratch, but will look around for components that will let people compose somewhat-rich comments without involving extra work or risk.

There are two points here.  The first is rich editing.  My approach is simple: escape everything that comes in, and then selectively un-escape a few things that are useful.  My recommendation: the first thing to look for is something that handles UTF-8 well, because that is harder to retrofit.
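The escape-everything-then-whitelist approach might look something like this in Python. The tag list is a hypothetical example. Only bare tags with no attributes are restored, so anything carrying `onclick` or `href` stays escaped.

```python
import html
import re

# Hypothetical whitelist of tags worth restoring after escaping.
ALLOWED_TAGS = ("b", "i", "em", "strong", "code", "blockquote", "p")

# Matches an escaped, attribute-free tag such as "&lt;em&gt;" or "&lt;/em&gt;".
_ESCAPED_TAG = re.compile(
    r"&lt;(/?)(" + "|".join(ALLOWED_TAGS) + r")&gt;", re.IGNORECASE
)


def sanitize(comment: str) -> str:
    # Step 1: escape everything, so no markup survives by default.
    escaped = html.escape(comment, quote=True)
    # Step 2: selectively un-escape the whitelisted tags, normalized to lowercase.
    return _ESCAPED_TAG.sub(
        lambda m: f"<{m.group(1)}{m.group(2).lower()}>", escaped
    )
```

This sketch does not check that restored tags are balanced. An unclosed `<b>` could therefore leak bold text into the rest of the page, and a real version would need to handle that.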

As to moderation, I’m against it for two reasons.  The minor one is that it interrupts the flow of a conversation.  Discussions tend to die down within 48 hours, so even a 12-hour delay can stall one.

The major one is that I prefer to avoid obligations.  If I had to approve comments, that would quickly become an obligation, and I would likely dread it.  My weblog exists entirely at the whim of my enjoyment.  If it ever stops being fun, this blog is history.

Comments may be submitted and (while still in the incoming directory) deleted or updated using the Atom Publishing Protocol, although I’m not sure how useful that will be.

I supported a predecessor of this for a while.  The only person who used it with any frequency was Dare.  It might be a chicken-and-egg thing.

Every time a batch of comments is accepted, all the accepted comments will be built into a static Web-accessible page and a small XML file will be created containing the number of comments; this will be used to decorate the body of the fragment via XMLHttpRequest.

Mine is 100% static.
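The rebuild step could be sketched in Python as follows. This assumes that accepted comments are files holding already-sanitized HTML and that each filename starts with a timestamp, so sorting the names gives arrival order. The `<comments count="N"/>` element is a hypothetical format for the small XML file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def rebuild(accepted_dir: Path, page: Path, count_file: Path) -> int:
    """Regenerate the static comment page and the tiny count file."""
    # Filenames start with a timestamp, so sorting them gives arrival order.
    files = sorted(p for p in accepted_dir.iterdir() if p.is_file())
    # Assumes each file already holds sanitized HTML; it is copied through as-is.
    body = "\n".join(
        f'<div class="comment" id="c{p.name}">{p.read_text(encoding="utf-8")}</div>'
        for p in files
    )
    page.write_text(
        f'<html><head><meta charset="utf-8"></head><body>\n{body}\n</body></html>\n',
        encoding="utf-8",
    )
    # Hypothetical format: <comments count="N"/>, for the fragment page to fetch.
    count = ET.Element("comments", count=str(len(files)))
    ET.ElementTree(count).write(count_file, encoding="utf-8", xml_declaration=True)
    return len(files)
```

Both outputs are plain files, so the web server only ever serves static content. The per-comment `id` attribute also gives each comment an addressable fragment on that page.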

There will be no built-in authentication system, because the Internet has enough of these. I will, however, eagerly participate in any shared-identity system that seems standardized and doesn’t feel like a vendor land-grab. The use of such a system might make the Atom Protocol useful rather than incidental.

I’ve also flirted with these, and have yet to be convinced.

IMHO, the best identity system is to have people post on their own blogs, and for you to include links to these posts in your comments.  Which, of course, creates an entirely new vector for spammers...

The comments will be threaded; that is to say, you will be able to attach a comment either to an ongoing fragment or to another comment.

I’ve seen that done, but I’ve intentionally avoided that feature.  If you get hundreds of comments on a given entry, it makes sense.  Otherwise, YAGNI (you aren’t gonna need it).

Issue: Release?

I will never release the core publishing software for ongoing to the world, simply because I wrote it in a big hurry and with no other objective than supporting my idiosyncratic writing workflow. However, if this comment system proves to be useful and generally applicable, I’ll consider releasing it.

Mine’s not “released”, but it is available.  I’m not proud.

Issue: Spam

I suspect that I will not be subject to serious, automated, large-scale spam attacks, simply because the number of instances of the system will, initially, be one. However, because I might decide to release it, I will design in a plugin-based spam-fighter module.

My comment system is unique, but I got 1,246 posts to my weblog yesterday.  A few of them were real.  Others were previews.  The rest were not.

And, no, that’s not a typo.  One thousand, two hundred and forty-six posts.

You may have a one-off system.  But you also have a precious commodity: a relatively high Google rank.  And many spammers are real people in foreign places, posting by hand, so being one of a kind won’t protect you.

I have some ideas about spam-fighting approaches that I haven’t seen anyone else try.

Don’t underestimate the value of experience. 

First, a throttle is essential.  More than a few posts in a row from the same IP address, with the same content, or pointing to the same URL must be stopped, or the cleanup gets out of hand.
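Here is a minimal in-memory throttle in Python that keys on IP address, content, and URLs. The limit and window are hypothetical defaults. Content is keyed by a SHA-1 digest only to keep the keys short. Every attempt is recorded, including rejected ones, so a sustained flood stays blocked rather than earning a fresh allowance on each try.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Iterable, Optional


class Throttle:
    """Refuse a post when its IP, its content, or any URL it contains has been
    seen more than `limit` times within the last `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 3600.0):
        self.limit = limit      # hypothetical default: "more than a few"
        self.window = window    # hypothetical default: one hour
        self.seen = defaultdict(deque)  # key -> timestamps of recent attempts

    @staticmethod
    def _keys(ip: str, content: str, urls: Iterable[str]) -> list:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        return ["ip:" + ip, "content:" + digest] + ["url:" + u for u in urls]

    def allow(self, ip: str, content: str, urls: Iterable[str] = (),
              now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        ok = True
        for key in self._keys(ip, content, urls):
            q = self.seen[key]
            # Forget attempts that have aged out of the window.
            while q and now - q[0] > self.window:
                q.popleft()
            # Record every attempt, rejected or not, so a flood stays blocked.
            q.append(now)
            if len(q) > self.limit:
                ok = False
        return ok
```

Because the state lives in memory, this sketch assumes a long-running process. Under plain CGI, each request starts fresh, so the counts would need to live in a file or something similar.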

A forced preview stops a surprisingly large number of the spam attempts.
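One way to force a preview, sketched in Python under stated assumptions: the preview page embeds an HMAC of the exact previewed text, and the final submission is accepted only if that token matches. `SECRET`, `preview_token`, and `accept_post` are hypothetical names. A bot that posts straight to the submit URL never holds a valid token.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical per-site secret; a real one would be stored, not regenerated
# per process (under CGI, every request would otherwise get a new secret).
SECRET = secrets.token_bytes(32)


def preview_token(content: str) -> str:
    """Issued on the preview page: an HMAC over the exact previewed text."""
    return hmac.new(SECRET, content.encode("utf-8"), hashlib.sha256).hexdigest()


def accept_post(content: str, token: Optional[str]) -> bool:
    """Accept a submission only if it was previewed exactly as submitted."""
    if not token:
        return False  # posted straight to the submit URL, skipping preview
    expected = preview_token(content)
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("ascii"))
```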

My favorite spam avoidance technique on my blog is a big red warning sign that is only shown to strangers.  And my favorite aspect of it?  It’s a lie.  The sign is like the fake security signs you see on some people’s lawns.  I don’t have a moderation system to back it up.

Yes, I do search for a few spam words.  But I try to retire them quickly.  More effective is looking for lame attempts at sneaky HTML.  For example: position: absolute, target="_blank", [url], [link], or even HREF (in all caps).
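The markers listed above can be checked with a handful of regular expressions. The exact patterns are this sketch's own guesses at how to match them. The `HREF` check is deliberately case-sensitive, because lowercase `href` is what legitimate markup uses.

```python
import re

# Telltale markers from the text; the regex forms are this sketch's guesses.
SUSPICIOUS = [
    re.compile(r"position\s*:\s*absolute", re.IGNORECASE),
    re.compile(r"""target\s*=\s*["']?_blank""", re.IGNORECASE),
    re.compile(r"\[url\b", re.IGNORECASE),   # BBCode-style [url] or [url=...]
    re.compile(r"\[link\b", re.IGNORECASE),  # BBCode-style [link]
    re.compile(r"\bHREF\b"),                 # case-sensitive: all caps only
]


def suspicious_markers(comment: str) -> list:
    """Return the patterns that matched, which is handy for logging."""
    return [p.pattern for p in SUSPICIOUS if p.search(comment)]


def looks_like_spam(comment: str) -> bool:
    return bool(suspicious_markers(comment))
```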

These and all my other countermeasures are hidden in plain sight.

Issue: Platform

I am subject to the seductive emanations from the Rails Rabble, and one of the reasons I thought I’d do this was to learn it. Having said that, try as I might, I don’t see why I’d need to use a database, and Rails’ sweet spot is generally considered to be database-backed apps. On the other hand, I have an intuition that in a few years, Ruby the language is going to loom larger than Rails the framework; so I definitely want an excuse to get up close and personal with the language. So, I dunno; does Rails offer enough other good things to make it an attractive choice even if you don’t need a database? There’s all that model/view/controller goodness, but I’ve always found the MVC approach a little more attractive in theory than in practice. Hmm... the jury is out. Is there another framework whose sweet spot is a little less database-centric? Or is Rails the way to go?

ActiveRecord is entirely optional in Rails.

My biggest concern with Rails is its performance under plain CGI, which is what my hosting provider currently offers.  FastCGI or better is a requirement.

Ruby and Rails are hot, and I’m definitely on that bandwagon, but I wouldn’t count Python out.  It is the language of BitTorrent and BZR, and it seems to be the preferred language for all things Ubuntu.

PHP and Perl might be less “hot”, but they are definitely workhorses.