Well Formed Entry Issues
External Comments:
channels
Channels are outside of the current discussion (which explicitly focuses on entries), but if you want to talk about them, see EntryChannels. To focus on the related issue of internal/external models, see SiteAndSyndication.
language tag
[MaciejCeglowski] I would suggest adding a language attribute (allowing for multiple languages per entry) to the minimal set.
[R. Soderberg] I second this, with the provision that at least one language/content pair exists, and that no unpaired language or content elements exist.
[JamesSnell] I see language as being an attribute of the content, not the WellFormedEntry itself. This would allow a single WellFormedEntry to contain multiple ContentModules, each targeting a different language.
[TimBray] This discussion may be at the wrong level. What we want to establish is that some part of the content will consist of human-readable text which is of necessity in a human language, and that it's either possible or compulsory to assert which language. But watch out: I may want to include little fragments of text in other languages, as small as a single word, so language tagging needs to operate at a micro, not macro level. So it may be a mistake to expose this in the data model at all.
[MaciejCeglowski] Language information is crucial to make the post searchable. You can always create a pathological case where an entry consists of single words in a dozen languages, but in practice you'll see contiguous blocks of same-language text in a multilingual blog. And there are already XHTML constructs to handle tagging at the micro level. All we need is a list of languages on the WellFormedEntry or the ContentModule.
[MarkNottingham] It seems like language information would need to be attached to both the content modules and the title, description, etc. nodes.
[JamesSnell] Perhaps a Language ExtensionModule would be helpful to address this. It could be used to extend the WellFormedEntry, ContentModule and various ExtensionModule data models in a consistent way.
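A minimal sketch of R. Soderberg's pairing rule, assuming JamesSnell's placement of language on each ContentModule. The class and function names (ContentModule, WellFormedEntry, satisfies_language_pairing) and the field layout are hypothetical illustrations, not an agreed data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentModule:
    # Hypothetical model: language is an attribute of the content (per JamesSnell).
    body: Optional[str]
    language: Optional[str]  # a language tag such as "en" or "fr-CA"


@dataclass
class WellFormedEntry:
    contents: List[ContentModule] = field(default_factory=list)


def satisfies_language_pairing(entry: WellFormedEntry) -> bool:
    """At least one language/content pair, and no unpaired language or content."""
    for module in entry.contents:
        has_body = module.body is not None
        has_lang = module.language is not None
        if has_body != has_lang:
            # content without a language (or a language without content) is unpaired
            return False
    # at least one complete pair must exist (this also rejects an empty entry)
    return any(m.body is not None and m.language is not None for m in entry.contents)
```

A Language ExtensionModule, as JamesSnell suggests, would move the language field out of the core class, but the pairing check itself would look much the same.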
Content and/or Description. To require or not to require?
(Originally from language tag discussion)
[MichaelBernstein, RefactorOk] A WellFormedEntry might actually contain no text (not even a title, unless titles are required), making a language attribute unnecessary. Granted, this might not be ideal (accessibility-wise), but it would still be well-formed.
-
[TimothyAppnel] I question the usefulness of an entry with no text (description). What is the purpose of an entry if I don't know what it is without following the permalink?
-
[MarkCidade] Some entries consist solely of a (language-neutral) image.
-
[TimothyAppnel] I understand that, but as a consumer is that useful or "neighborly"? I say it's not. I have a feed with an image. Do I want to download it? Is it a screenshot of an error in IE or a picture of someone's cat? I feel strongly that some type of description or summary needs to be part of the minimal requirements.
-
[MarkCidade] Even if posting an image sans metadata detracts from the usefulness of the post, making other things mandatory will detract from the usability of just posting anything you want at any time. Additional info can be added later, once it's better thought out. I think that's better than forcing someone to enter gibberish just to satisfy some minimal constraints. Misleading or nonsensical data (e.g., asdfghjghsk) is worse than no data.
-
[TimothyAppnel] In light of this view, content should not be part of the bare minimum requirements. I shouldn't have to enter gibberish content if all I want to post is a permalink.
-
[MarkCidade] That may sound nonsensical, but that should be allowed if you want to establish a URL for possible future content. I think that at least 1 ContentModule should be specified (even if it's blank) as opposed to allowing for NULL content.
-
[TimothyAppnel] It is inconsistent to require one ContentModule, even a blank one, after arguing that requiring some type of description is bad because it forces a user to enter gibberish.
-
[MarkCidade] I think the whole point of a blog entry resides in its content. What language that content has (or if its author chooses to leave it blank or post a captionless image) is beside the point. A blog entry is content, not a PermaLink, TimeStamp or anything else.
-
[MichaelBernstein, RefactorOk] My point regarding language still stands. Even if content is required, it might not contain any text, making a language attribute for the item unnecessary. Only if description is required (presumably description should only be text), does a required language attribute make sense.
-
[TimothyAppnel] I agree that the whole point of a blog entry is its content, which is why the comment that a ContentModule could be blank was puzzling. The context in which an entry is used seems important here and is currently vague in the discussion. (Are we talking about authoring an entry, or an entry in a syndication feed?) Conceptually, in the latter case, content needs some type of description (excerpt/title/whatever) in order to be useful.
-
[MarkNottingham] Well, are we talking about JUST Weblog entries here (which is just a social convention, and I suspect will shift like sand), or something more capable?
-
[JamesSnell] Suggested solution: A WellFormedEntry MUST have at least one ContentModule. That ContentModule MAY be Null, meaning it contains zero actual content. It would be the semantic equivalent to <data xsi:nil="true" />. Language would be redefined as an ExtensionModule that MAY optionally be used to extend a WellFormedEntry or ContentModule.
-
[TimothyAppnel] I think content needs to be better defined in the requirements. Looking at this more pragmatically, like in a syndication feed, dealing with content will be a royal pain if anything and everything could be included -- and potentially with no text or description to make an entry "scannable." I'm against a null ContentModule. I don't see the point of content being a requirement then.
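JamesSnell's suggested solution can be sketched as follows, assuming Python dataclasses for the model. The names and the extensions dict are hypothetical; a None data field stands in for a Null ContentModule and is serialized with xsi:nil, as in his example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # emit the conventional "xsi" prefix


@dataclass
class ContentModule:
    # data=None models a Null ContentModule (zero actual content).
    data: Optional[str] = None
    # Hypothetical slot for optional ExtensionModules, e.g. {"language": "en"}.
    extensions: Dict[str, str] = field(default_factory=dict)


@dataclass
class WellFormedEntry:
    contents: List[ContentModule]
    extensions: Dict[str, str] = field(default_factory=dict)


def is_well_formed(entry: WellFormedEntry) -> bool:
    # MUST have at least one ContentModule; each one MAY be Null.
    # Extensions such as language are optional and are not checked here.
    return len(entry.contents) >= 1


def content_to_xml(module: ContentModule) -> str:
    elem = ET.Element("data")
    if module.data is None:
        # Semantic equivalent of <data xsi:nil="true" />
        elem.set(f"{{{XSI}}}nil", "true")
    else:
        elem.text = module.data
    return ET.tostring(elem, encoding="unicode")
```

Under this sketch, TimothyAppnel's objection is visible directly: an entry whose only ContentModule is Null passes is_well_formed while carrying no scannable text.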
multiple content modules
Moved to MultipleContentDiscussion, 19 June
how many authors
Moved to NumberOfAuthorsDiscussion, 17 June
semantics
See also PermaLinks and Authors
[TimothyAppnel] What is the unique location pointing to? What is embedded as WellFormedContent? Seems a description (excerpt and/or title or other) is needed.
[JoeGregorio] Yes, let's clarify that the permalink points to the web location of the entry, and not a pointer to the thing the entry is talking about. That is, if in your entry you are commenting on a story in the NYTimes, the permalink should point to your blog entry on the web, and not the NYTimes article.
[ShelleyPowers] There's been discussion offline about the definitions behind these items: for instance, what is a 'permalink', and what is an 'author'? I think it's important that we define these, and an effort has started on this (here, here, and here). In particular, if one considers an 'author' to be a person, that will have a major impact on implementation later if an author can also be multiple people, an organization, or a group.
(comments regarding wiki page vs entry moved to WikiPageAsEntry)
Date and Time
[PaulMorriss] Currently a lot of blogs don't include a timezone in the time a post was made. I assume it is local time, though with server-based blogging software a lazy programmer could just use the server's local time rather than the user's. Most useful would be both local time and UTC time. Why? Because local time tells you something more about the post: someone posting at 3am local time isn't asleep when they normally might be. Universal time tells you just how current a post is: it might have been posted a minute ago.
[BillKearney] A timestamp should indicate its offset from UTC, as per the ISO 8601 spec. If it doesn't indicate a timezone, then the only reasonable conclusion is that it's a UTC timestamp. Otherwise, what's local? Better to encourage proper use of timestamps instead of playing guessing games. A timestamp is not a substitute for geographic data, but lots of people make this incorrect assumption.
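PaulMorriss's and BillKearney's points fit together: an ISO 8601 timestamp with an offset carries both the author's local time and the UTC instant, and BillKearney's rule covers the case with no offset. A minimal sketch using Python's standard datetime module; parse_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a missing offset as UTC."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit +00:00 offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    stamp = datetime.fromisoformat(text)
    if stamp.tzinfo is None:
        # No timezone given: assume UTC rather than guessing at "local".
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp
```

For example, parse_timestamp("2003-06-23T03:05:00-04:00").hour is 3 (the author's local hour), while the same value's .astimezone(timezone.utc).hour is 7, so one field answers both of PaulMorriss's questions.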
[KenCoar, RefactorOk] Date parsing is one of the banes of existence when it comes to multi-vendor distributed client/server models and interoperability. I'd like to see a new specification try to get it right; I'm very strongly in favour of a lexically sortable, UTC-based date/time format such as ISO 8601. Being able to sort entries chronologically as strings, without parsing them into a binary format, is a big win. Compare "2003-06-23T11:38:00Z" and "2003-06-23T15:38:00Z" versus "Tue, 23 Jun 2003 11:38:00 GMT" and "Tue, 23 Jun 2003 15:38:00 GMT". If you're a human, the latter is probably easier to sort, though it takes a little work. But if you're a computer? The former wins hands down.
[LeonardoHerrera, RefactorOk] I'm aware that there has been a lot of controversy regarding date formats. I personally prefer the latter (it's easy to produce), but ISO 8601 is easier to parse; it's also more consistent with the principle of using DublinCore elements (http://www.w3.org/TR/NOTE-datetime). We should be clear that a timezone should be specified, and that UTC is assumed when it is not.
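One caveat on lexical sorting: it matches chronological order only when every timestamp uses the same offset, in practice UTC with a trailing "Z". Strings with mixed offsets compare by their local digits. A sketch, assuming Python's datetime; to_utc_string is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_utc_string(text: str) -> str:
    """Normalise an offset-bearing ISO 8601 timestamp to a fixed-width UTC form."""
    stamp = datetime.fromisoformat(text).astimezone(timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


stamps = ["2003-06-23T07:38:00-04:00", "2003-06-23T09:00:00+02:00"]
# 07:38-04:00 is 11:38 UTC and 09:00+02:00 is 07:00 UTC, so the second is earlier.
naive_order = sorted(stamps)  # wrong: string comparison ignores the offsets
utc_order = sorted(to_utc_string(s) for s in stamps)  # chronological
```

Here naive_order keeps the 07:38-04:00 entry first, while utc_order correctly puts 07:00Z before 11:38Z. Normalising to UTC once, at write time, is what makes the plain string sort safe.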
[MartinAtkins] I'd be wary of making 'clients' handle timezones. Instead, can we maybe have a UTC-based 'post time' which defines the ordering, and then another optional 'event time' which acts as the 'display time' in any timezone (which must be specified). This makes it easy to get the posts in the right order relative to each other. One issue with this is that LiveJournal, which continues to lack proper support for timezones, can only do ordering based on the time an entry is posted, rather than the time the user says an entry relates to, so LJ can't generate a 'display time' featuring a timezone.
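MartinAtkins's split between a UTC post time and an optional, timezone-bearing event time might look like this. The Entry class, its field names, and the fallback to post time for display are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Entry:
    title: str
    post_time: datetime                    # must be UTC; defines ordering
    event_time: Optional[datetime] = None  # optional display time; must carry a timezone

    def __post_init__(self) -> None:
        # A naive datetime has utcoffset() == None, so it is rejected too.
        if self.post_time.utcoffset() != timedelta(0):
            raise ValueError("post_time must be in UTC")
        if self.event_time is not None and self.event_time.tzinfo is None:
            raise ValueError("event_time must specify a timezone")

    def display_time(self) -> datetime:
        # Show the event time when given; otherwise fall back to the post time.
        return self.event_time if self.event_time is not None else self.post_time


def in_posting_order(entries: List[Entry]) -> List[Entry]:
    # Clients order by the UTC post time only and never compare event times.
    return sorted(entries, key=lambda e: e.post_time)
```

A system like LiveJournal, as described above, could still emit valid entries by filling in post_time and leaving event_time empty.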
See also TimestampVsCreationDateTime
A caution about early design (of modules, particularly)
Some modules have a long history of use (conceptually speaking); other modules are just great ideas. A key benefit of identifying many modules is getting an idea of how a data model will use and describe those modules. Detailed design can be done later for modules that do not have common or prototype usage now.
Structured Components
[PhilWolff] Looking at Qlogger, we can see packages of structured data becoming components of the post. Sub-schemas describe activities (golfing, commuting) and reviews (movies, marijuana). You can see how this creates more comparable data (show me all the movie reviews by warbloggers rated 4 out of 5 stars). It also opens a door for blogs to interoperate with enterprise applications. This is the path to the ComponentBlog.
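To illustrate the kind of query PhilWolff describes, here is a sketch over hypothetical structured components. The Component and Post classes, the "review" kind, and the subject_type/rating fields are invented for this example and are not Qlogger's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    kind: str                 # hypothetical sub-schema name, e.g. "review" or "activity"
    fields: Dict[str, object]


@dataclass
class Post:
    author: str
    author_tags: List[str]    # hypothetical: how a blogger is classified, e.g. "warblogger"
    components: List[Component] = field(default_factory=list)


def movie_reviews(posts: List[Post], tag: str, rating: int) -> List[Post]:
    """Posts by authors carrying `tag` that include a movie review with `rating` stars."""
    return [
        post
        for post in posts
        if tag in post.author_tags
        and any(
            c.kind == "review"
            and c.fields.get("subject_type") == "movie"
            and c.fields.get("rating") == rating
            for c in post.components
        )
    ]
```

The query only works because every review shares the same field names; free-text posts offer nothing comparable to filter on.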