There are a lot of experimental NEcho 0.1 feeds out there of various
quality. I've seen some that aren't well-formed XML (no, I'm
not providing any names). I've seen others that are
excellent. I'm going to take a look at two here that are high
profile, and implemented across a large number of blogs, and
therefore likely to be emulated by others. As others come
online, I'll try to do likewise.
Typepad's feeds validate, and are tracking to the standard as it
evolves.
The feed is clean and simple. Title, subtitle, summary, and
content are well formed and inline. You can clearly see that
the original post (the one at the bottom) was modified from its
original form.
If you have a feed parser or format driver that you are
experimenting with, you should definitely try it against this
feed.
Blogger's feeds are clearly marked as
temporary prototypes. They should be considered
experimental. And for that purpose, they don't
disappoint. They contain a number of elements and attributes
that are currently marked as blogger extensions. A number of
these (e.g., generator) should be made common.
More interesting is the one issue that causes the feed to fail to
validate. Looking closer at the feed, you can see that
summaries are not text/plain, but text/html. In most cases,
the summary is inline, but in one case it is escaped. By looking at
these side by side, you can see the differences.
Apparently, there are requirements for HTML in titles. At
a minimum, people argue for the ability to use bold and
italics. Others significantly overdo it, IMHO.
This leads me to think that the right answer is to define all of
the content related items (title, subtitle, summary, content) the
same way: with a default of text/plain and with the single level of
escaping required by XML. Those that wish to use other types
or an additional level of escaping simply are required to note this
with an attribute.
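To make that concrete, here is a minimal sketch of what a consumer might do under this proposal. It is not taken from any spec draft: the attribute name "type", the un-namespaced element names, and the sample entry are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DEFAULT_TYPE = "text/plain"  # assumed default, per the proposal above


def declared_type(element):
    """Return the element's declared type, falling back to the default."""
    return element.get("type", DEFAULT_TYPE)


# Hypothetical entry: no namespace, and the attribute name "type" is assumed.
entry = ET.fromstring(
    "<entry>"
    "<title>Ben &amp; Jerry's</title>"
    '<summary type="text/html">&lt;b&gt;Ice cream&lt;/b&gt; news</summary>'
    "</entry>"
)

for child in entry:
    # The XML parser has already undone the single level of XML escaping,
    # so child.text is the literal content of the declared type.
    print(child.tag, declared_type(child), child.text)
```

The title carries no attribute, so it is treated as plain text; only the summary, which deviates from the default, has to say so.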
I say this knowing that the discussion that Tim Bray
captured so well nearly three weeks ago is still
ongoing. Apparently there are multiple use cases for how
feeds are produced. And multiple use cases for how feeds are
consumed. And these give inconsistent guidance on what is
ideal.
What I will say is that given my experience with the
validator, I find
that people don't read specs carefully, if at all. More
often, they emulate what they see. They follow
examples. And when they see mostly escaped content, they
emulate poorly. If you want to see what I mean, ask somebody
to create a title of "Ben & Jerry's". Then tell them that
you want "Ben" in italics and "Jerry's" in bold.
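For the curious, here is a rough sketch of the two cases, using Python's standard library. The helper names are made up for this example, and the "type" and "mode" attributes are assumptions borrowed from the ongoing discussion rather than anything settled.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def plain_title(text):
    """Plain text title: one level of escaping, the one XML requires."""
    return "<title>" + escape(text) + "</title>"


def escaped_html_title(html):
    """HTML title carried as escaped text: the HTML is escaped again for XML.

    The attribute names and values here are hypothetical.
    """
    return '<title type="text/html" mode="escaped">' + escape(html) + "</title>"


plain = plain_title("Ben & Jerry's")
# The HTML itself must already say &amp; for the ampersand.
styled = escaped_html_title("<i>Ben</i> &amp; <b>Jerry's</b>")

print(plain)
print(styled)

# Parsing undoes exactly one level of escaping in each case.
print(ET.fromstring(plain).text)
print(ET.fromstring(styled).text)
```

Note that the ampersand ends up as `&amp;amp;` in the second case. That doubled escaping is the step people tend to get wrong when they work from examples alone.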
Having a validator and working with people one on one to fix
their feeds certainly helps, but frankly is an uphill battle.
Particularly when people note that their feed "works OK in
aggregator X" - not the definition of interop that I for one
particularly aspire to.
So, while I'm sensitive to the notion that consumers would have
a few less lines of code to write if there were only one way, I
feel that we should face reality. Pick a default that matches
what most people are likely to do by hand, and define an explicit
marker for what a number of programs will generate.
IMHO.
You are not by any chance talking about gasp funky and non-funky feeds? ;-)
Sam,
I'm confused as to why the attribute that describes the kind of content uses MIME types. If I had never seen an Atom feed before and saw one with MIME types I'd assume that I could place more than just text/plain, text/xml or text/html in there.
I wouldn't be surprised if folks assumed that they could place any of the MIME types at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types as feed content.
Is this intentional? Is this a way to make the Atom format fairly robust and extensible? E.g. support for FotoBlogs, AudioBlogs, VideoBlogs, etc. naturally falls out of this.
Sam,
Besides images I also was thinking of MIME types like application/msword, text/rtf or audio/mpeg. Anyway, you haven't really answered the question. Is using MIME types intentional and something that is likely to be a part of Atom? If so what will be the guidelines about including non-textual content or non-Web friendly textual formats?
Re. guidelines for use, I'd say they are the same as for the Web in general. Non-Web-friendly formats are fine for intranets where all readers have the same software on their desktops, but generally unworkable for public sites.
Sam,
Cool. What I was asking is whether MIME types like application/msword, text/rtf or audio/mpeg will be allowed in Atom feeds and if they are expected to be supported with the Atom API.
If so can you show me what a sample feed with the above MIME types as the content of entries would look like as well as a sample post using Joe Gregorio's current API draft?
Quote: "Cool. What I was asking is whether MIME types like application/msword, text/rtf or audio/mpeg will be allowed in Atom feeds and if they are expected to be supported with the Atom API." Dare Obasanjo Looks like an interesting conversation on...
Feedback on feeds. A look at two high profile and widely deployed necho/atom feeds: Blogger and TypePad, and some thoughts on the implications of escaping and mime types. ... [Sam Ruby]...
I think I share a little bit of Dare's confusion, but I'm not sure where the actual discussion is going on anymore.
In the EscapedHtmlDiscussion the proposal was to use an "incredibly short" list of "modes" to signal content type. That seems to be very different than allowing the use of MIME types, which do not, by any stretch of the imagination, represent a short list. (Actually, looking back at the page, that was someone's addendum to TBray's proposal, but I think it was a good one.)
Also, and maybe this is a dumb question, when HTML is being used inline, like in the above Typepad examples, what namespace are all those div's and p's in?
I'd argue that you should be able to pass along anything. There should be sufficient wrapper of the thing so:
the Atom Consumer can discern if this kind of thing is understood by the Consumer,
the Consumer is pointed to human readable documentation about the thing's type, the better to guide users what to do with the package (Do I want to open this? Where can I get a viewer?).
the Consumer can look up the XML semantics, if present; the better for the Consumer to learn a new well formed structure.
a Consumer, if so inclined and with user permission, can retrieve the default rendering html template and style sheet for this kind of thing, and anything else that might help with editing, presentation, or storage.
a Consumer can look up something that might tell how to authenticate that this thing really matches the claimed mime/type.
a Consumer can inspect the version, build, release or mod date of this structure/thing's definition. The better to check for updates (do I need a new version of the newsML renderer?)
A MIME type alone is insufficient to meet these needs, don't you think? Will someone please define a little wrapper for things?
kellan, "mode" was switched from "encoding" to differentiate it from XML's "encoding" declaration (a character set). It should be read as "mode of encoding". Its short list is "none" (element content is XML character data possibly with parsed XML markup, some are proposing this be called "xml"), "escaped" (element content escapes markup, either using entities or CDATA marked sections), or "base64" (used for binary content). Given those available encodings, the content can then be any MIME content type.
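As a rough illustration of how a consumer might dispatch on those three modes, here is a small Python sketch. It is only a sketch under assumptions: the function name is made up, treating a missing "mode" as inline XML is a guess rather than anything decided, and namespaces are ignored.

```python
import base64
import xml.etree.ElementTree as ET


def decode_content(element):
    """Return the payload of a content element according to its mode."""
    # Assumption: a missing mode attribute means inline XML.
    mode = element.get("mode", "xml")
    if mode in ("none", "xml"):
        # Inline markup: leading text plus the serialized child elements.
        children = "".join(ET.tostring(c, encoding="unicode") for c in element)
        return (element.text or "") + children
    if mode == "escaped":
        # The parser has already removed the XML escaping (or CDATA wrapper).
        return element.text or ""
    if mode == "base64":
        # Binary payload; returned as bytes.
        return base64.b64decode(element.text or "")
    raise ValueError("unknown mode: " + mode)


inline = ET.fromstring('<content mode="xml"><p>Hi</p></content>')
escaped = ET.fromstring('<content mode="escaped">&lt;p&gt;Hi&lt;/p&gt;</content>')
binary = ET.fromstring('<content mode="base64">aGVsbG8=</content>')

for sample in (inline, escaped, binary):
    print(sample.get("mode"), decode_content(sample))
```

The mode only says how to get the bytes or markup back out; what to do with the result is still up to the "type" attribute.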
Phil, is type="application/xml" sufficient to tell the consumer to look at the namespace of the content to determine how to process it? If not, one solution is to allow 'type' to either be a MIME type or a URI media type (that proposal has been edited out of content but can be resurfaced). Another solution is to wait until post-1.0 and introduce a <component-content> extension that defines its processing rules.
For mandatory parts of the spec (I am sorry, I have not read through it; the Wiki is confusing me), I think that allowing arbitrary MIME types is a bad idea.
The problem is that it means bringing a whole world of new things into the spec, and requires that a compliant implementation of the aggregator can cope with that information.
Dare points it out too well by going to the extreme: encoding something as application/msword is only going to make things harder for people building apps with it.
This is what is so complex about building a complete mail system: dealing with these implicitly referenced standards, especially when they are mandatory.
We have seen what happens when things are just liberally left underspecified because "another spec takes care of it", as with XML-RPC's 'Ascii'. Well, different people interpreted that in different ways, and that opened the gates to ambiguity.
The same applies here: if you accept mime-types, then every application has to fully understand these.
I would personally feel better if a given subset were explicitly required, and others were permitted but not required.