It’s just data

Blog Browser Format

Phil Ringnalda. Even though I can hear Sam muttering digital magpie in my ear...  Phil, you say this like it is a bad thing. :-)   I believe that you and I have common tendencies when it comes to exploration, but when it comes to choices, I find that I have a tendency to pick the dull and boring ones.

As to the topic of blog browsers, I do have a number of thoughts.  One set of thoughts is that the data being captured is not merely hierarchical, it is actually hierarchical faceted metadata.  But mostly my thoughts turn to the dull and boring topic of the file format itself.

For starters, my site is generated dynamically.  This means that you can see any blog entry, day, month, or year in any of several formats.  Here's August in rss2, June 11th in txt, and entries containing "Ringnalda" in esf.  I could also slice by categories if I were to use that particular feature.  You get the idea.  So, for starters, I'd like some name other than simply ".xml" for the files.xml format... then I could enable it everywhere.

Now as to the file format itself, it appears tailored to blogging applications that statically render their content.  What are the created and modified dates for each of the dynamically renderable slices I identified above?  Should I calculate number of bytes in each in anticipation that it might need to be generated?

It is also not clear how one extends this format.  If you look at my archives page, you can see that I have readily available a count of the number of entries.  Might this be useful?

Unfortunately, I can see how this discussion will play out.  Somebody will say that "files.xml is not a brilliant format. It is a compromise. It is for blog browsers. That's all it is for, for the 18,000th time."  Then, three months later, that same somebody will say that it is the perfect format for some other application that none of us have thought of yet.  And nobody will be clear as to what applications are out there using this format, let alone know what the impact of any change will be.

We've played this game before.  Why not learn from the past?

All I am saying is: give this format a name.  And a namespace.  And specify from the beginning how (or even if) it can be extended.


Okay, I'll bite. What's wrong with files.xml? Enumerate the problems, and let's address them. It's still quite early, too early to punt on perfection.

Posted by Dave Winer at

I thought I said it.

1) Give the format a name. I'd like to use that name as the extension for my dynamically generated content.

2) Put the elements into a namespace. It really isn't that hard to do now, and would be much harder to retrofit later.

3) Describe how (or if) the format can be extended. Perhaps this could even be a simple cut and paste from the RSS documentation.

Posted by Sam Ruby at

I'm confused,
Why should anyone create or use a special app for browsing RSS feeds instead of using a regular browser over RSS or whatever converted to HTML via XSLT?
The concept of blog browsers seems to completely miss the point of using XML in the first place.
The main issue I see with using XSLT is that it can't convert one XML document to multiple HTML documents (i.e. frames) which may or may not be an issue depending on how much functionality you want out of the blog browser UI.

Posted by Dare Obasanjo at

Sam, got it.

Don't you have any comments on the format itself?

Do you like or dislike the flat structure?

What about the attributes? Too many? Too few?

Posted by Dave Winer at

Dare, one of the reasons I'm interested in blog browsers outside the web browser is that there is no monopoly there, and no bigco owning the space to hold back progress.

Posted by Dave Winer at

Doesn't much matter to me whether you do it XSLT or C#, as long as you build me something that will actually preserve and use the metadata in RSS: if you can make a blog browser in XSLT that will search and filter by date, author, and category, pulling posts from multiple files, and then let a metaWeblog API client pull out posts keeping track of the feed title, the item title, and the item link, I'm all for it.

Posted by Phil Ringnalda at

Dare: just to let you know, I parse your comments as "why doesn't everybody use my favorite programming language and the web".

As to the second half, let me draw an analogy... I like to use NNTP and Mozilla for access to high volume mailing lists. For RSS, some people like NetNewsWire.

Posted by Sam Ruby at

Dave, comments on the format itself:

1) The file and files elements should be in a namespace other than "". (I may have said this one before. ;-))

2) Flat is cool with me, but if you want my preference, I would prefer if the attributes other than path were made child elements.

3) None of the elements other than path are particularly easy to dynamically generate from within blosxom, so I would prefer if they were explicitly made optional.

4) Specify the rules for people to define new child elements. (Again, I may have said this one before ;-))
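A minimal sketch of what points 1) and 3) could mean for a consumer, assuming the `files` and `file` elements live in a namespace and only `path` is mandatory. The namespace URI, the sample paths, and the date format are placeholders invented for illustration; the thread assigns none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; nothing in the discussion assigns one.
NS = "http://example.com/ns/files"

# Invented sample: the second entry carries only the path attribute.
SAMPLE = f"""<files xmlns="{NS}">
  <file path="backups/posts/2002/08.xml" modified="2002-08-31T12:00:00Z" size="1024"/>
  <file path="backups/posts/2002/09.xml"/>
</files>"""

def read_files(xml_text):
    """Return one dict per file element; only 'path' is treated as required."""
    root = ET.fromstring(xml_text)
    entries = []
    for el in root.findall(f"{{{NS}}}file"):
        path = el.get("path")
        if path is None:
            raise ValueError("file element without a path attribute")
        entries.append({
            "path": path,
            # Optional attributes come back as None when absent.
            "created": el.get("created"),
            "modified": el.get("modified"),
            "size": el.get("size"),
        })
    return entries

entries = read_files(SAMPLE)
for entry in entries:
    print(entry)
```

A generator such as blosxom could then emit only `path` and still produce a list this reader accepts.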

Posted by Sam Ruby at

Dave,
I don't see how creating a web interface to accessing weblog content puts you in the thrall of BigCos any more than the fact that http://www.scripting.com is available on the World Wide Web via HTTP does this already (if it does this at all).

Posted by Dare Obasanjo at

Yes, www.scripting.com is very limited by the whims of the monopolist browser vendor. That was what the whole thing about Smart Tags was about. They wanted to edit my content so that random words point to pages on their site. Now there's no doubt in my mind that that would be horribly illegal in the US, but I'd rather not have to go to court to protect free speech, so I want to, whenever possible, route around them. I see it as free speech insurance.

Posted by Dave Winer at

Sam,
This is the same thing I'd tell any of our customers who asks me:

"We have an XML storage format for some important information and would like to provide a rich interface for navigating and viewing this data. Do we

1.) Use standardized technologies like HTML, which can be viewed in browsers as diverse as Internet Explorer, Mozilla, Konqueror, Lynx, Opera, OmniWeb, etc., as the presentation format and then write a tool to transform our format to this rich HTML view using another standardized technology (XSLT) which is available on multiple platforms and programming languages?

or

2.) Should we build a custom tool in C++/Java/C#/etc that mimics a newsreader/web browser to navigate and view our custom XML structure?"

I can't understand why anyone would pick (2) over (1).

Sam, couple of questions.

* Do you think that there would be more people able to view RSS as web pages than there would be those who would have a custom RSS browser?

* Do you think it is more likely that there would be a proliferation of different stylesheets to view RSS content or different custom RSS browsers?

Posted by Dare Obasanjo at

Dare, do you care to tell me then why the world needs Office 11? :-P

The format that Dave is advocating is describable via XSD. If he chooses to describe it that way, great. If not and I am so inclined, at some point I may do so (unlikely, but possible).

What I would like to know, in the meantime, is whether this is an open content model or a closed format.

Meanwhile, I very much appreciate the fact that you defined an RSS feed for your K5 diary.

Posted by Sam Ruby at

I'll concur with Sam's comments, especially the one about making "path" the only required attribute of "file". Requiring anything else locks out systems that, for whatever reason, can't generate "created", "modified", or "size" attributes.

As to whether those other attributes should be child elements instead, I'm not sure it matters, but I think it's important to allow for other child elements via namespaces so that non-Radio weblogs can store their own specific info.

Posted by Mark Gardner at

I think "modified" is a minimal requirement. How else could a restorer tell if something needs to be restored from a backup? Recall that's the purpose of this format in the first place.

Posted by Dave Winer at

Why does the world need Office 11?

The world doesn't...unless you consider Microsoft the world (which they seem to some days).

But Microsoft does need Office 11, which in turn needs WinXP or Win2K SP3...which means you have to accept a EULA that allows M$ to muck with your machine (think DRM) any time they feel it necessary.

Seems to be a theme here... ;-)

Posted by Andrzej Taramina at

I have the feeling I'm blissfully wandering into a minefield, but: does the need for a modified attribute for backup mean that the format needs to require it, or only that the application needs to?

I'd be perfectly happy with a spec that said something like "a file element is required to have a path attribute, has the following optional elements which may be required by a particular application, and any other attributes must be namespaced," but that reminds me of a lot of things that seem reasonable to me until I'm told that I'm not just stupid, but evil.
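The rule proposed above (path required, a fixed set of optional attributes, everything else namespaced) can be sketched as a small checker. This is an illustration under assumptions: the optional set is taken from the "created", "modified", and "size" attributes named earlier in the thread, and the `ext` prefix, its URI, and the sample attributes are invented.

```python
import xml.etree.ElementTree as ET

# Optional core attributes, as named earlier in the thread.
CORE_OPTIONAL = {"created", "modified", "size"}

def check_file_element(el):
    """Return a list of problems for one file element (empty means acceptable)."""
    problems = []
    if "path" not in el.attrib:
        problems.append("missing required attribute: path")
    for name in el.attrib:
        if name == "path" or name in CORE_OPTIONAL:
            continue
        # ElementTree spells namespaced attribute names as "{uri}local".
        if not name.startswith("{"):
            problems.append(f"unknown attribute must be namespaced: {name}")
    return problems

# Invented sample; the "ext" namespace is hypothetical.
SAMPLE = """<files xmlns:ext="http://example.com/ns/ext">
  <file path="a.xml" ext:note="pref"/>
  <file path="b.xml" colour="blue"/>
  <file modified="2002-12-01"/>
</files>"""

results = [check_file_element(el) for el in ET.fromstring(SAMPLE).iter("file")]
for problems in results:
    print(problems)
```

An application that needs `modified` for backups could layer its own stricter check on top without the format itself requiring the attribute.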

Posted by Phil Ringnalda at

Phil, at the risk of being labeled both stupid and evil myself, what you say makes sense. Now I know how to approach the writing part of this little project, which, Murphy-willing, I will undertake tomorrow morning. All the standard disclaimers apply.

Posted by Dave Winer at

When I saw files.xml the first time, the first thing I thought was: why not use the directory.opml format??

Posted by Sjoerd Visscher at

Sjoerd, good question.

That was my first thought too.

But then I asked why use directory.opml?

Do I really need to browse that file in an outliner? Maybe so.

What are the advantages of using directory.opml?

It's nice, it's not flat, it's hierarchic. I already have code to read and write, but then I also already have code for files.xml.

If I didn't have code, files.xml would be easier because it's flat not recursive.

So it's slightly more accessible.

We should really ask Brent and Phillip.

Posted by Dave Winer at

The files are stored hierarchically. So it's actually easier if the index is hierarchic too. A change in directory is made explicit by an element.

Posted by Sjoerd Visscher at

Someone tell me I've missed the point. I've written one of these blog browser thingies and came across a problem due to the entirely flat nature of files.xml - not all <file> elements are the same (or equal): some point to rss descriptions of web logs for a month, some point to stories, some to preferences. I can only determine what's being pointed to by looking at the name given in path - and not everyone is using the same convention. Wouldn't it be better to have at least one level of hierarchy (<rssfiles> <storyfiles> <preferences>)? Using <rssfiles> would allow any convention the author liked; as a browser, all I want to know is where to get the file. An associated issue is year/month: this only comes from a path naming convention - it's OK, but it should be explicit in whatever docs there might be.

Posted by Pete Cole at

Dave wrote: "Recall that's the purpose of this format in the first place," but then agreed with Phil that just because apps may require a certain element or attribute doesn't mean the format should require it.

I'm glad, 'coz it would've been embarrassing to have to point out to Dave that the original purpose of RSS was simple syndication, and now he's using it for weblog storage and backup. :-)

Posted by Mark Gardner at

If we don't overload path with the file type (it's simple and compact, but a bit limiting), is there any convenient way to make it short, sweet, and extensible? Best I can come up with is type="rss|index" in the core, and radioWeblog:type="pref|whatever|other" for anything else app-specific.

Posted by Phil Ringnalda at

Using a "type" attribute has nice precedent in HTML, but usually implies a MIME type. So your RSS files should be something like type="application/rss+xml", and the index/archive file should be something like type="application/rsslist+xml". (How'd that be for a name for this file format? "RSS List"?)

This has the advantage of having a pre-existing set of types for a lot of files (images, text, OPML, etc.), and parsers can choose to download listed files depending on if they think they can handle the MIME type.
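As a rough sketch of that last point, a client could keep only the entries whose `type` it recognizes. The sample entries and the set of handled types are made up, and "application/rss+xml" is used here only because it is the value suggested above, not because any registry is being cited.

```python
import xml.etree.ElementTree as ET

# MIME types this imaginary client knows how to process.
HANDLED = {"application/rss+xml", "text/plain"}

# Invented sample list; the last entry has no type attribute at all.
SAMPLE = """<files>
  <file path="backups/posts/2002/08.xml" type="application/rss+xml"/>
  <file path="images/logo.gif" type="image/gif"/>
  <file path="notes/readme.txt" type="text/plain"/>
  <file path="misc/unknown.bin"/>
</files>"""

def paths_to_fetch(xml_text, handled=HANDLED):
    """Return the paths of file elements whose type is in the handled set."""
    root = ET.fromstring(xml_text)
    return [el.get("path") for el in root.iter("file") if el.get("type") in handled]

wanted = paths_to_fetch(SAMPLE)
print(wanted)
```

Entries with no `type` are skipped in this sketch; a real client might prefer to fall back on the file extension instead.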

We'd need a MIME type for all those .fttb files in DHRB's RSS list, though.

Posted by Mark Gardner at

Dave: We should really ask Brent and Phillip.

Personally, it doesn't worry me. I already have code (from a previous project) to read directory.opml files, so supporting that would be no big deal. As long as all tools that generate this sort of backup work the same way, it doesn't matter to me.

Pete has a point; currently the type of each file is defined by its path. In BlogGazer, I run through files.xml and pull out the year and month from everything matching the regex backups/posts/(\d+)/(\d+)\.xml. That means anyone not using exactly that format won't show up properly.
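The path-matching approach just described might look like the sketch below. It is not BlogGazer's code; the function name is hypothetical, and whole-string matching via `fullmatch` is an assumption about how strict the check should be.

```python
import re

# The regex quoted above: year and month are captured from the path.
POST_PATH = re.compile(r"backups/posts/(\d+)/(\d+)\.xml")

def year_month(path):
    """Return (year, month) for a monthly backup path, or None if it does not match."""
    m = POST_PATH.fullmatch(path)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(year_month("backups/posts/2002/08.xml"))
print(year_month("somewhere/else/2002/08.xml"))
```

The second call shows the weakness under discussion: a perfectly good monthly file stored under another directory is simply not recognized.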

Either we need to:

1. Make the paths standard, so it's invalid to produce monthly backup files that aren't in the backups/posts/(year)/(month).xml format

or:

2. Add in types and extra notes, so it's obvious where the posts are

or:

3. Completely change the format to be more descriptive

for example:

&lt;backups&gt;
&lt;posts&gt;
&lt;year number=&quot;2000&quot;&gt;
&lt;month number=&quot;1&quot; path=&quot;backups/posts/1.xml&quot;/&gt;
&lt;/year&gt;
&lt;/posts&gt;
&lt;/backups&gt;

I'd be happy with any of the above options, although I'd have to say that #1 is the simplest, which probably makes it the best choice for the moment. Ideas?

Posted by Phillip Pearson at

Woah, what happened to my XML?

Here it is with angle-brackets changed to ordinary ones:

(backups)
(posts)
(year number="2000")
(month number="1" path="backups/posts/1.xml"/)
(/year)
(/posts)
(/backups)
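For comparison with the regex approach, here is a sketch of how a reader might walk the descriptive format above once the angle brackets are restored. The element and attribute names come from the example; the function name is hypothetical, and nothing beyond that example is assumed about the format.

```python
import xml.etree.ElementTree as ET

# The example above with real angle brackets.
SAMPLE = """<backups>
  <posts>
    <year number="2000">
      <month number="1" path="backups/posts/1.xml"/>
    </year>
  </posts>
</backups>"""

def monthly_files(xml_text):
    """Return (year, month, path) tuples read from the nested year/month elements."""
    root = ET.fromstring(xml_text)
    found = []
    for year in root.findall("posts/year"):
        for month in year.findall("month"):
            found.append((int(year.get("number")),
                          int(month.get("number")),
                          month.get("path")))
    return found

months = monthly_files(SAMPLE)
print(months)
```

Because year and month are stated explicitly, the value of `path` can follow any naming convention without confusing the reader.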

Posted by Phillip Pearson at

Phillip's option

1. Make the paths standard, so it's invalid to produce monthly backup files that aren't in the backups/posts/(year)/(month).xml format

is OK, though there is an archive out there that is using "archives" as the directory. The use of an extra level or two in the xml would ensure that if anyone decides upon some other place to store files it wouldn't break any blog browser or Blog application that was attempting to import data from another system.

Posted by Pete Cole at

It's currently impossible for Blogger users to generate files using the "backups/posts/(year)/(month).xml" file hierarchy unless they write a program to do it after the fact. So forgetting about blog browsers for the moment, that's raising a barrier between moving backups into and out of Blogger.

What's wrong with using Dublin Core "title" child elements of the "file" element, and if you must use a regex to extract outline data, parsing that? (Here's an example.)

Posted by Mark Gardner at

PS: I know I implied earlier that it's important to cater to other applications of this format, even if those applications are outside the format's original purpose. The point is to enable those applications without *breaking* the original purpose. And I think it's breakage to prevent Blogger users from backing up their content in a form other weblogging systems can use.

Posted by Mark Gardner at

I agree with Dare Obasanjo "I'm confused, Why should anyone create or use a special app for browsing RSS feeds instead of using a regular browser over RSS or whatever converted to HTML via XSLT."

I think the concept should be a more dynamic use like a blog / instant messenger merge.

Posted by Blog Browser at

Interesting comment.

Given that most of the second-generation blog comment spams I've seen involved quoting a previous comment with a comment that sort of vaguely seemed in context without actually having any meaning, then linking to something else with search keywords for a name, if I were a spam comment filter I would assign your comment a score of around 85%. On investigating, the fact that the site you link to offers absolutely no information, but links to a number of "partners" in totally unrelated fields would kick me up to 99%, and the fact that Google puts this entry above yours for the words "blog browser" would put me at 100%.

Posted by Phil Ringnalda at

209.11.42.194

Stephen Galluccio -/- Built2.com -/- dun&bradstreet id: 945406064
128 Marine Avenue, Suite 2A
New York City, NY 11209
US

Domain Name: BLOGBROWSER.COM

Administrative Contact-
Stephen Galluccio: stephen@built2.com
Built 2
128 Marine Avenue 2A
New York City, N.Y. 11209
US
Phone- 718 836 3390
Fax-
Technical Contact-
Stephen Galluccio: built2@acedsl.com
Built 2 labs
9229 Shore Road 1C
New York City, NY 11209
US
Phone-
Fax-

Record update date: 2002-11-20 08:18:33
Record create date: 2002-11-20
Record expires on: 2003-11-20
Database last updated on: 2002-12-14 23:23:45 EST

Posted by Sam Ruby at

Phil,
"vaguely seemed in context without actually having any meaning"

I stand by my comment. It seems that you want to create useless products that could be replaced by a simple HTML page.

I work for a trading firm where speed of information is valued, and I think a blog / instant messenger merge would be useful. It would also introduce blogCircles.


It could be for business:
Project teams / biz units that publish blogs that would all feed into the blog browser so other units can provide instant comments on development. In the case of traders, they often work in sub groups to trick the markets or risk mgmt. and currently use 2 programs (news feeds + IM). Anytime tech developers can cut down the "swivel chair effect", it is always of interest to the group.

It could be for consumers:
Kids would set up blogCircles, groups of bloggers or blog topics to watch. The key is that they can respond to the originating blog or break off into individual conversations at any time but all have access to the original topic.


I am working in project/interface builder on a beta that will be up by the end of the week.

Cheers and Merry Christmas.

Posted by Blog Browser at

For what it's worth (getting back to the discussion of formats), I've modified my files list to put the document titles in an attribute instead of a sub-element. Does this help the current batch of proto-apps?

I get the feeling there's not a lot of heat surrounding this. And it's hard to figure out how to continue the discussion with all interested parties when all the talk is scattered across several weblogs' comment threads. Would anyone be up for a mailing list so we can zero in on a format?

Posted by Mark Gardner at

I've set up a blogbrowser Yahoo group -- care to coalesce the discussion there?

Posted by Mark Gardner at

Comment spammers win one

The comment spammers have at least partly won. If they can change your behavior and your attitudes, they've won, or......

Excerpt from phil ringnalda dot com: Comment spammers win one: Comments at

Sam Ruby

Some info on Mr Blog Browser....

Excerpt from phil ringnalda dot com: Comment spammers win one: Comments at

