Well Formed Entry Issues

External Comments:


Channels are outside of the current discussion (which explicitly focuses on entries), but if you want to talk about them, see EntryChannels. To focus on the related issue of internal/external models, see SiteAndSyndication.

language tag

[MaciejCeglowski] I would suggest adding a language attribute (allowing for multiple languages/entry) to the minimal set.

[R. Soderberg] I second this, with the provision that at least one language/content pair exist, and that no unpaired language/content elements exist.

[JamesSnell] I see language as being an attribute of the content, not the WellFormedEntry itself. This would allow a single WellFormedEntry to contain multiple ContentModules, each targeting a different language.

[TimBray] This discussion may be at the wrong level. What we want to establish is that some part of the content will consist of human-readable text which is of necessity in a human language, and that it's either possible or compulsory to assert which language. But watch out: I may want to include little fragments of text in other languages, as small as a single word, so language tagging needs to operate at a micro, not macro level. So it may be a mistake to expose this in the data model at all.

[MaciejCeglowski] Language information is crucial to make the post searchable. You can always create a pathological case where an entry consists of single words in a dozen languages, but in practice you'll see contiguous blocks of same-language text in a multi-lingual blog. And there are already XHTML constructs to handle tagging at the micro level. All we need is a list of languages on the WellFormedEntry, or ContentModule.

[MarkNottingham] It seems like language information would need to be attached to both the content modules and the title, description, etc. nodes.

[JamesSnell] Perhaps a Language ExtensionModule would be helpful to address this. It could be used to extend the WellFormedEntry, ContentModule and various ExtensionModule data models in a consistent way.
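
One way to picture the proposals above is a minimal sketch in which language hangs off each ContentModule and the entry-level language list is derived from it. The class shapes, field names (`body`, `lang`, `contents`) and method names are assumptions made for illustration; nothing in this discussion fixes them. The check follows the rule suggested earlier: at least one language/content pair, and no unpaired language or content.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentModule:
    # Hypothetical shape: body text plus a language tag such as "en" or "pl".
    body: str
    lang: Optional[str] = None


@dataclass
class WellFormedEntry:
    # Hypothetical shape: an entry is just a list of content modules here.
    contents: List[ContentModule] = field(default_factory=list)

    def languages(self) -> List[str]:
        """Entry-level language list, derived from the content modules."""
        seen: List[str] = []
        for module in self.contents:
            if module.lang and module.lang not in seen:
                seen.append(module.lang)
        return seen

    def language_errors(self) -> List[str]:
        """Report violations of the language/content pairing rule."""
        errors: List[str] = []
        if not self.contents:
            errors.append("no language/content pair")
        for index, module in enumerate(self.contents):
            if not module.lang:
                errors.append(f"content {index} has no language")
            elif not module.body:
                errors.append(f"content {index} has a language but no content")
        return errors


entry = WellFormedEntry([ContentModule("Hello", "en"), ContentModule("Cześć", "pl")])
print(entry.languages())
print(entry.language_errors())
```

Fragments in another language inside a single body would still rely on markup-level tagging (for example XHTML language attributes), which this sketch does not model.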

Content and/or Description. To require or not to require?

(Originally from language tag discussion)

[MichaelBernstein, RefactorOk] A WellFormedEntry might actually contain no text (not even a title, unless titles are required), making a language attribute unnecessary. Granted, this might not be ideal (accessibility-wise), but it would still be well-formed.

multiple content modules

Moved to MultipleContentDiscussion, 19June

how many authors

Moved to NumberOfAuthorsDiscussion, 17June


See also PermaLinks and Authors

[TimothyAppnel] What is the unique location pointing to? What is embedded as WellFormedContent? Seems a description (excerpt and/or title or other) is needed.

[JoeGregorio] Yes, let's clarify that the permalink points to the web location of the entry, and not a pointer to the thing the entry is talking about. That is, if in your entry you are commenting on a story in the NYTimes, the permalink should point to your blog entry on the web, and not the NYTimes article.

[ShelleyPowers] There's been discussion offline about the definitions behind these items. For instance, what is a 'permalink'? What is an 'author'? I think it's important that we define these, and effort has started on this (here, here, and here). For instance, if one considers 'author' to be a person, this has a major impact on implementation later if an author can also be multiple people, or an organization or group.

(comments regarding wiki page vs entry moved to WikiPageAsEntry)

Date and Time

[PaulMorriss] Currently a lot of blogs don't include a timezone with the time a post was made. I assume it is local time, though for server-based blogging software a lazy programmer could just use the local time of the server rather than that of the user. What would be most useful is both local time and UTC time. Why? Because local time tells you something more about the post: someone posting at 3am local time isn't asleep when they normally might be. Universal time tells you just how current a post is: it might have been posted a minute ago.

[BillKearney] A timestamp should indicate its offset from UTC, as per the ISO8601 spec. If it doesn't indicate a timezone then the only reasonable conclusion is that it's a UTC timestamp. Otherwise what's local? Better to encourage proper use of timestamps instead of playing guessing games. A timestamp is not a substitute for geographic data, but lots of people make this incorrect assumption.
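
The "no offset means UTC" rule can be sketched roughly as follows. This is only an illustration: the function name `parse_timestamp` is hypothetical, and it relies on Python's standard `datetime.fromisoformat`, which accepts the extended ISO 8601 form with or without a numeric offset.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp; treat a missing offset as UTC."""
    stamp = datetime.fromisoformat(text)
    if stamp.tzinfo is None:
        # No offset given: assume UTC rather than guessing at "local".
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


with_offset = parse_timestamp("2003-06-23T07:38:00-04:00")
without_offset = parse_timestamp("2003-06-23T11:38:00")
print(with_offset == without_offset)
```

Both inputs name the same instant once the default is applied, so the comparison is true.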

[KenCoar, RefactorOk] Date parsing is one of the banes of existence when it comes to multi-vendor distributed c/s models and interoperability. I'd like to see a new specification try to get it right; I'm very strongly in favour of a lexically-sortable UTC-based date/time format such as ISO 8601; being able to sort entries chronologically as strings, without requiring parsing into a binary format, is a big win. Compare "2003-06-23T07:38:00-04:00" and "2003-06-23T19:38:00+02:00" versus "Tue, 23 Jun 2003 11:38:00 GMT" and "Tue, 23 Jun 2003 15:38:00 GMT". If you're a human, the latter is probably easier to sort, though with a little work. But if you're a computer..? The former wins hands down.
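
A small sketch of the string-sorting point, under one stated assumption: plain string comparison only matches chronological order when every timestamp is written with the same offset, so the example first normalizes to UTC. The helper name `to_utc_string` and the second sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone


def to_utc_string(text: str) -> str:
    """Rewrite an offset-bearing ISO 8601 timestamp as a UTC string.

    Assumes the input carries an explicit offset.
    """
    stamp = datetime.fromisoformat(text).astimezone(timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


stamps = ["2003-06-23T07:38:00-04:00", "2003-06-23T09:00:00+02:00"]

# Sorting the raw strings keeps this order, although the second
# timestamp is the earlier instant (07:00 UTC versus 11:38 UTC).
raw_sorted = sorted(stamps)

# After normalization, string order and chronological order agree.
utc_sorted = sorted(to_utc_string(s) for s in stamps)
print(utc_sorted)
```

With mixed offsets the raw strings still need parsing before they can be ordered; the no-parsing benefit applies once producers emit a single offset such as UTC.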

[LeonardoHerrera, RefactorOk] I'm aware that there has been a lot of controversy regarding date formats. I, personally, prefer the latter (easy to produce), but ISO8601 is easier to parse; also, it's more consistent with the principle of using DublinCore elements. We should be clear that the timezone must be specified, otherwise UTC is assumed.

[MartinAtkins] I'd be wary of making 'clients' handle timezones. Instead, can we maybe have a UTC-based 'post time' which defines the ordering, and then another optional 'event time' which acts as the 'display time' in any timezone (which must be specified)? This makes it easy to get the posts in the right order relative to each other. One issue with this is that LiveJournal, which continues to lack proper support for timezones, can only do ordering based on the time an entry is posted, rather than the time the user says an entry relates to, so LJ can't generate a 'display time' featuring a timezone.
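
The post time / event time split could look something like the sketch below. The class name `EntryTimes` and its field names are hypothetical; the only rules taken from the comment above are that the post time is UTC and defines ordering, and that the optional event time must state its timezone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class EntryTimes:
    post_time: datetime                     # UTC; defines ordering
    event_time: Optional[datetime] = None   # optional display time

    def __post_init__(self) -> None:
        # A naive datetime has utcoffset() == None, so it is rejected too.
        if self.post_time.utcoffset() != timedelta(0):
            raise ValueError("post_time must be in UTC")
        if self.event_time is not None and self.event_time.utcoffset() is None:
            raise ValueError("event_time must specify a timezone")

    def display_time(self) -> datetime:
        """Fall back to the post time when no event time is given."""
        return self.event_time if self.event_time is not None else self.post_time


first = EntryTimes(
    post_time=datetime(2003, 6, 23, 11, 38, tzinfo=timezone.utc),
    event_time=datetime(2003, 6, 22, 21, 0, tzinfo=timezone(timedelta(hours=-4))),
)
second = EntryTimes(post_time=datetime(2003, 6, 23, 12, 0, tzinfo=timezone.utc))

ordered = sorted([second, first], key=lambda entry: entry.post_time)
print([entry.display_time().isoformat() for entry in ordered])
```

A producer that only knows when an entry was posted would simply leave `event_time` unset.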

See also TimestampVsCreationDateTime

A caution about early design (of modules, particularly)

Some modules have a long history of use (conceptually speaking); other modules are great ideas. A key benefit of identifying many modules is to get an idea of how a data model will use and describe those modules. Detailed design can be done later for modules that do not have common or prototype usage now.

Structured Components

[PhilWolff] Taking a look at [WWW]Qlogger, packages of structured data are becoming components of the post. Sub-schemas describe activities (golfing, commuting) and reviews (movies, marijuana). You can see how this creates more comparable data (show me all the movie reviews by warbloggers rated 4 out of 5 stars). It also opens a door for blogs to interop with enterprise applications. This is the path to the ComponentBlog.
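
To make the "comparable data" point concrete, here is a rough sketch of a post carrying a structured review component and a query over it. The `Review` and `Post` shapes, their field names, and the sample data are all invented for illustration and are not taken from Qlogger or from any schema proposed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Review:
    # Hypothetical sub-schema for a review component.
    kind: str        # e.g. "movie"
    subject: str
    rating: int
    out_of: int = 5


@dataclass
class Post:
    author: str
    components: List[Review] = field(default_factory=list)


def reviews_with_rating(posts: List[Post], kind: str, rating: int) -> List[Tuple[str, str]]:
    """Return (author, subject) for every matching review component."""
    return [
        (post.author, review.subject)
        for post in posts
        for review in post.components
        if review.kind == kind and review.rating == rating
    ]


posts = [
    Post("alice", [Review("movie", "Example Film A", 4)]),
    Post("bob", [Review("movie", "Example Film B", 2), Review("book", "Example Book", 4)]),
]
print(reviews_with_rating(posts, "movie", 4))
```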

CategoryArchitecture, CategoryModel