[Referring to the refactoring that produced the first draft of the version now in content.]

[AsbjornUlsberg, RefactorOk] This actually looks neat. I have to digest the examples for a while, but at first glance it looks quite complete and thought through. The use of "src" versus "href" in the <content> element may have some benefits as well. Great work.

Content modules poll and discussion

Place your name under the option you like best: [OpenPoll]

'Null' means present but empty, see NullValues.

In a syndicated feed (the principal use of RSS), what goes into a required, non-empty/null "content" for sites that don't distribute content? BiblioGraphy seems to already cover title and summary. RSS's vagueness about <description> goes away: either it's echo:summary (an abstract or excerpt) or it's echo:content. content:encoded and xhtml:body are echo:content. In each of these cases, echo:content appears to be optional/empty/nullable in the case of a syndicated feed that does not syndicate content.

See ContentDiscussion, ContentProblems, Fatal Flaw, Making encoding explicit, Meta Content Format, MimeContent, EscapedHtmlDiscussion, Escaped HTML discussion, EchoExample, ComponentBlog, NullValues, MultipleContentDiscussion, and SiteAndSyndication.

See AlternativeRepresentation: locations of alternative representations of the entry.

[TimBray RefactorOk] I see no good reason for having multiple <content> children of <entry>. We need a compelling use-case or something very useful that you can prove you can't do before we impose this additional level of complexity on software authors.

Vote for the splitting of content into two here - One "content encoded" module that can be null and one "content by reference" module that can be null:

[TimothyAppnel] In the early days of this wiki the point was made that an entry is nothing without content. Therefore I am puzzled by the notion of a required content element that can be null. It seems to defeat the purpose and is a contradiction in terms, does it not? I am also still confused by what content really is, and belief may be at the root of endorsement being "off." I believe a description and title are both helpful and necessary and should be metadata. It's been classed as an extension module that is (is not?) content. Fair enough. Why would I make an entry that is author, permalink and data with nothing else? How helpful is that? Can someone clarify what I see as a contradiction in terms and an unclear definition? A use case for a null content module, perhaps?

[MartinAtkins] It'd be nice to be able to round-trip from LiveJournal to necho and back again with no loss of data, which means we need HTML subjects (or 'titles' as necho calls them). This implies the need for a type on the subject, although perhaps we can define a reasonable default for the sake of avoiding output bloat? (The reasoning is that all LiveJournal-based sites which do syndication should be able to syndicate their entries between each other, thus creating the illusion that the users are present across sites.)

Optional Content

[KenMacLeod] The use-case for optional (and possibly a required but allowed to be empty) content is that an entry may consist entirely of its metadata and no "body".

Extensions go in Content

[KenMacLeod] The alternate case is that much of what we're calling "metadata" should be declared as "content" and placed in the content element, much like SOAP envelopes do with document message bodies. If so, that will change the model of content and require, possibly, some other way of indicating content type, including determining the intent of the "content body" solely by its namespace within a content type of "application/xml" or the use of [WWW]URI media types.

Possibly Stupid Question

[JonathanSmith] Do any of the above choices allow a web site to distinguish between high bandwidth and low bandwidth, and, if so, which is the best choice?

External Content Length

Should content have an advisory length attribute for referenced content? [OpenPoll]

What is the need for 'length'? Can't that be determined in almost all cases by querying the resource (usually without even retrieving it)? Of the type, language, encoding, and length of a referenced resource, isn't 'length' the attribute most likely to become incorrect over time, making it just a guess? By comparison, a good argument can be made against allowing type, language, encoding, and then length too, when using src.

[TimBray] We are not here to invent cool new stuff that might be useful.

[HaroldGilchrist, RefactorOk] If this data is in the feed, a remote process could use the file size to determine whether the file is to be downloaded. Also, most news readers today use only the information in the feed for viewing by the user (of course this could change with the addition of referenced content). The user would use the information to help decide whether to download the file. I guess we could probe the file with another call (I don't know how reliable this is), but does having the size of the file in the feed cost us that much, since we probably already have it? This attribute could be optional.

[SteveKirks, RefactorOk] Agree with Harold above. Handheld devices, especially cell phones, could make determinations on what content to download based on user prefs. Use of the file size "length" above would permit this.

[LeonardoHerrera, RefactorOk] I support this stuff. I don't see many applications using it, but it can be useful in the handheld examples mentioned above. Not a big deal, it's pretty ignorable. My only observations are a) make it optional, and b) clearly state that this attribute is an approximation of the actual file size, not a definitive value to rely on. This way, handheld apps still can use this, and we avoid any security/reliability risks. (Here's a somewhat related thought: what about CRC?)
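A minimal Python sketch of how a handheld client might treat such an optional, approximate length, assuming the hint arrives as an integer number of bytes or is absent. The names `should_fetch` and `read_capped` are hypothetical and not part of any proposal on this page. Because the hint is only advisory, the sketch still caps the bytes actually read.

```python
import io
from typing import BinaryIO, Optional


def should_fetch(advisory_length: Optional[int], max_bytes: int) -> bool:
    """Decide whether to download referenced content, using the feed's hint."""
    if advisory_length is None or advisory_length < 0:
        # No usable hint in the feed: fall through to the capped read below.
        return True
    return advisory_length <= max_bytes


def read_capped(stream: BinaryIO, max_bytes: int) -> bytes:
    """Read at most max_bytes, since the advisory length may be wrong."""
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("resource is larger than the configured limit")
    return data


if __name__ == "__main__":
    print(should_fetch(5_000_000, max_bytes=1_000_000))
    print(read_capped(io.BytesIO(b"small body"), max_bytes=1_000_000))
```

The hint only saves a request when it says "too big"; it is never trusted as a guarantee of the real size.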

[JamesAylett, RefactorOk] This is metadata about the representation of the linked content. Given that this representation may change completely independently of the referencing document (the Atom feed in this case), putting it in the referencing document is dangerous. It's like putting an advisory type attribute on a link to a URI you don't control; what is the user agent supposed to do when it completely mismatches? HTTP has HEAD to allow a user agent to get the metadata if needed, and other protocols' lack of similar support is a problem for those protocols to solve, not us. (Which is pretty much what TimBray said above, and what the opening paragraph of this section says. But maybe it's clearer, I don't know.)
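For reference, the HEAD approach mentioned here can be sketched in Python with the standard library. The helper names (`build_head_request`, `parse_content_length`, `remote_length`) are made up for illustration, and the sketch assumes the server answers HEAD and sends a Content-Length header, which not every server does.

```python
import urllib.request
from typing import Mapping, Optional


def build_head_request(url: str) -> urllib.request.Request:
    """Build a request that asks for headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        length = int(value.strip())
    except ValueError:
        return None
    return length if length >= 0 else None


def remote_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Query the resource itself for its current length (network call)."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return parse_content_length(resp.headers)
```

This costs one extra round trip per referenced resource, which is the trade-off the comments above are weighing against a possibly stale attribute in the feed.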

[MikeWarot, RefactorOK] While it's nice to have length, I don't think length alone is sufficient. I believe you need to have all of these for describing external content:

Other nice to have information:

MIME and URI media types

Would the support of Mapping between URIs and Internet Media Types make it easier to define content types in support of hierarchical relationships, internal "plain text markup" schemes, or other extensible content types?

Multiple top-level content elements

[DonPark DeleteOk RefactorOk] Isn't order of appearance enough of a hint? BTW, +1 on Ken's proposal to add 'encoding' attribute to 'content' element. For maximum flexibility, we could introduce multi-stage transform like XML-DSig but that is an overkill for ((Echo)). This is weird, I wrote this in response to questions about how to figure out author preferred content type among multiple content types in the feed. On another page, my entries got deleted outright. Zeesh.

[HaroldGilchrist, RefactorOk] Do we need an optional "primary" content attribute for content? With the great possibility of having more than one referenced content type per entry, the primary content attribute could designate the content that is the central content to the entry message. Example: One thumbnail image, one larger image file of the same subject. If we designated the thumbnail as the primary content, the viewer could display the thumbnail and include links for the other larger image and any other referenced content.

[KenMacLeod, RefactorOk] My preference is for one content item only, which may contain a content type of multipart/alternative. I'm not aware of any precedence rules or preference parameters. Towards the end of the definition above, it states that multiple content elements within an entry are to be treated as multipart/alternative.

[HaroldGilchrist, RefactorOk] "My preference is for one content item only". I see that in the open poll. What is your argument here against multiple content?

If multipart/alternative and multipart/mixed are to be a requirement of ECHO 1.0, then the ECHO feed software vendors will all support their use with their best effort. But if support is left optional in ECHO 1.0 and treated just like other optional content MIME types (which in a pure media type sense it isn't), best-effort support is questionable. If the final decision is to leave it optional, I would favor allowing repeatable content to address my stated concerns.

[SteveKirks, RefactorOK] Ken and Harold, with regard to multiple content items, I give handheld devices like cell phones as the example. The reader on the phone could intelligently determine which image to download based on the feed's content

[HaroldGilchrist, RefactorOk] I guess we could use "order of appearance" for precedence rules.

[KenMacLeod] Checking the specs re. multipart/alternative, [WWW]RFC2046 says:

[HaroldGilchrist, RefactorOk] This would seem to suggest (even though Freed and Borenstein were probably thinking that variation meant different text types, not multimedia) that if I have audio content and text content (the same entry in different media) in content of type multipart/alternative, with the audio appearing last, the recipient system should understand that the entry prefers to be offered as audio.

If we use "multipart/mixed" for this example, the rule on "faithfulness to the original content" goes away and the order of appearance is still important but could have a different meaning defined elsewhere and not by the spec.
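The "order of appearance" reading of multipart/alternative discussed above can be sketched as follows. This is only an illustration in Python: the `(media type, body or URI)` tuple and the function name `choose_alternative` are assumptions, and the rule implemented is simply "the last alternative the recipient can handle wins".

```python
from typing import Optional, Sequence, Set, Tuple

# (media type, inline body or URI); a hypothetical shape for illustration only
Alternative = Tuple[str, str]


def choose_alternative(
    parts: Sequence[Alternative], supported: Set[str]
) -> Optional[Alternative]:
    """Pick the last alternative whose media type the recipient supports."""
    for part in reversed(parts):
        if part[0].lower() in supported:
            return part
    return None


if __name__ == "__main__":
    entry_parts = [
        ("text/plain", "Transcript of the entry"),
        ("audio/mpeg", "http://example.org/entry.mp3"),
    ]
    # A reader that can play audio gets the audio; a text-only reader gets text.
    print(choose_alternative(entry_parts, {"text/plain", "audio/mpeg"}))
    print(choose_alternative(entry_parts, {"text/plain"}))
```

Under multipart/mixed the same loop would not apply, since order would no longer express preference.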

issue: dangers of html

HTML is often viewed as a form of content, but in reality it mixes content aspects with presentation aspects and perhaps even a bit of running code. This can pose a problem unless the recipient is very careful to filter out the undesirable bits. Such filtering poses a number of pragmatic implementation issues given the loose syntax rules for HTML and inconsistent implementation. Ensuring that such content is well formed (with characters properly escaped, tags perfectly nested and closed) eases these implementation issues.

Still, HTML is by far the most popular format for entries with most being written in it and nearly all being displayed in it. And much of the HTML that people write is not well formed. For instance, it is very common to find a naked & in URLs. It's the CGI standard, but to be correct HTML it is supposed to be escaped.
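To make the naked-ampersand case concrete, here is a small Python sketch that escapes only those `&` characters that do not already start an entity or character reference. It is a heuristic for illustration, not a full HTML cleaner, and the function name `escape_bare_ampersands` is made up.

```python
import re

# An "&" that is NOT followed by a name, decimal, or hex reference ending in ";"
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup: str) -> str:
    """Escape naked ampersands while leaving existing references untouched."""
    return _BARE_AMP.sub("&amp;", markup)


if __name__ == "__main__":
    link = '<a href="/cgi-bin/search?q=echo&page=2">next</a>'
    print(escape_bare_ampersands(link))
```

A query parameter that happens to look like a reference (for example a parameter name followed by a semicolon) would slip through, which is one of the implementation headaches the paragraph above alludes to.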

Another danger of HTML, in any form, is entities. Things like &nbsp; need to be declared, otherwise you end up with non-well-formed XML. So the choices seem to be: supply a DTD, restrict the HTML used so that it doesn't contain any entities besides the base ones given in XML, or stuff it in a CDATA section.

[SamRuby, RefactorOk] I do just fine with the &#dddd; syntax. See [WWW]clean.

[JoeGregorio, RefactorOk] Raises a good question: is HTML without the entities really HTML? I did a little digging and was surprised to learn how many entities are defined for HTML.

[BillHumphries, RefactorOk] This becomes a headache quickly. It'd seem that the format would want to avoid Namespaces and [EntityDefinitions] so that it could be parsed by non-validating, non-namespace aware parsers -- of which, everyone's bound to have one lying around. However, XHTML seems to be the right format for the text media type, as I'd think any subset would be restrictive.

[MikeDavies, RefactorOk] Is restrictive a problem? (Keeping in mind the aim of a minimum specification - using the full HTML syntax could be optional, but mandate at least to provide a simplified set of elements and entities if the html media-type for content is used).

[JonathanSmith, RefactorOk DeleteOk] More discussion off wiki about entities. DonPark writes:

[AsbjornUlsberg] Can't we reach a consensus on this where numeric character entities (&#nnnn;) are preferred (default), and named character entities are legal, but not recommended? A DTD should be provided in the latter case, but won't be needed in the first. I think not having to use a DTD gives people a reason to use numeric entities over named ones.

[JamesAylett RefactorOk] We need to be very clear on the implications (please forgive and correct the following if my terminology isn't quite right). In order to allow named entity references, you must declare them, either in an internal DTD subset or by an external DTD reference. At least the latter requires a validating parser, and the former is quite a burden on the content producer. If you validate, throw away any hope of using extension elements from different namespaces, which in my mind negates a lot of the potential of having this one format to rule them all in the first place. So I agree with Asbjorn: define a DTD in the spec so people can use named entity references if necessary (eg it may make converting legacy feeds easier, where funky extensions are less important anyway), but the norm should really be to use numeric entity references (or a different character encoding), and certainly all consumers need to be able to process this. (NewsMonster falls foul of MovableType and anything else that builds XML by templates and so can end up using undeclared named entities in its RDF feeds; having numeric entity references as the norm should help the ViewSourceClan behave themselves in future.)

[AsbjornUlsberg] Good summarization, James. +1 Should we create a NumericVsNamedEntities page to poll this and reach consensus?
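As an illustration of the numeric-over-named preference, a producer could rewrite named HTML entities to numeric references before emitting XML. The following Python sketch uses the standard library's HTML 4 entity table (`html.entities.name2codepoint`); the function name `to_numeric_references` is hypothetical, and names not in that table are deliberately left untouched rather than guessed at.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(text: str) -> str:
    """Replace named HTML entities with numeric character references."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML's own entities and unknown names
        return "&#%d;" % name2codepoint[name]

    return _NAMED.sub(replace, text)


if __name__ == "__main__":
    print(to_numeric_references("caf&eacute;&nbsp;&amp;&nbsp;bar"))
```

Output produced this way needs no DTD, so it can be read by non-validating parsers as discussed above.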

corollary: true content vs. template

The "true content" of an entry is usually the part that gets flushed through a template to appear at the end of a PermaLink. Internally, this is represented in tool-specific syntax. Externally, it is represented in an exchange format (often HTML).

It may be desirable for an external model to be able to link to an externalized representation, ie. without having to either embed it or find it inside of the template at the end of a permalink.

[JoeGregorio, RefactorOk] This format will be used not just for syndication but for publishing also, so it is important to allow full content.

issue: hierarchical relationship between content items

[ShelleyPowers] A weblog entry can also be a parent to other entities, each of which can also contain links, audio, video, etc. which can also be parents to other entities, and so on. See Related. [MarcCanter] This might be where some link up with the ThreadsML effort can happen.

[DannyAyers] But it's not necessarily hierarchical, e.g. a single post can summarise several threads - many parents, one child. Using a tree model would prevent a range of dialogue approaches, e.g. thesis, antithesis -> synthesis. Needs to be a digraph, IMHO, and mainly for this reason (it would complicate Necho syntax) I reckon such relationships shouldn't be in the core.
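To show why a tree is not enough here, a directed graph of entry relationships can be sketched in a few lines of Python. The class `EntryGraph` and the entry identifiers are invented for illustration and say nothing about how (or whether) Necho would serialise such relationships.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class EntryGraph:
    """Directed 'responds to' relationships; an entry may have many parents."""

    def __init__(self) -> None:
        self._parents: DefaultDict[str, Set[str]] = defaultdict(set)
        self._children: DefaultDict[str, Set[str]] = defaultdict(set)

    def relate(self, parent: str, child: str) -> None:
        """Record that `child` follows on from `parent`."""
        self._parents[child].add(parent)
        self._children[parent].add(child)

    def parents_of(self, entry: str) -> List[str]:
        return sorted(self._parents.get(entry, set()))

    def children_of(self, entry: str) -> List[str]:
        return sorted(self._children.get(entry, set()))


if __name__ == "__main__":
    graph = EntryGraph()
    graph.relate("thesis", "synthesis")
    graph.relate("antithesis", "synthesis")
    print(graph.parents_of("synthesis"))
```

A tree would force "synthesis" under exactly one of its two parents, which is the limitation described above.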

See content, ConceptualModel, and ContentAndPermalink.


CategoryMetadata, CategoryModel