Phil Ringnalda: "Even though I can hear Sam muttering 'digital
magpie' in my ear..." Phil, you say this like it is a bad
thing.
I believe that you and I have common tendencies when it
comes to exploration, but when it comes to choices, I find that I
have a tendency to pick the dull and boring ones.
As to the topic of blog browsers, I do have a number of
thoughts. One set of thoughts is that the data being
captured is not merely hierarchical, it is actually
hierarchical faceted metadata. But mostly my thoughts turn
to the dull and boring topic of the file format itself.
For starters, my site is generated dynamically. This means
that you can see any blog entry, day, month, or year in any of
several formats. Here's August
in rss2. June
11th in txt. Entries
containing "Ringnalda" in esf. I could also slice by
categories if I were to use that particular feature. You get
the idea. So, for starters, I'd like some name other than
simply ".xml" for the files.xml format... then I could enable it
everywhere.
Now as to the file format itself, it appears tailored to
blogging applications that statically render their content.
What are the created and modified dates for each of the dynamically
renderable slices I identified above? Should I calculate the
number of bytes in each, in anticipation that it might need to be
generated?
It is also not clear how one extends this format. If you
look at my archives page,
you can see that I have readily available a count of the number of
entries. Might this be useful?
Unfortunately, I can see how this discussion will play out.
Somebody will say
that "files.xml is not a brilliant
format. It is a compromise. It is for blog browsers. That's all it
is for, for the 18,000th time." Then three months later somebody
will say that it is the
perfect format for some other application that none of us have
thought of yet. And nobody will be clear as to what
applications are out there using this format, let alone know what
the impact will be of any change.
We've played this game before. Why not learn from the
past?
All I am saying is: give this format a name. And a
namespace. And specify from the beginning how (or even if) it
can be extended.
Okay, I'll bite. What's wrong with files.xml? Enumerate the problems, and let's address them. It's still quite early, too early to punt on perfection.
I'm confused. Why should anyone create or use a special app for browsing RSS feeds instead of using a regular browser over RSS or whatever converted to HTML via XSLT? The concept of blog browsers seems to completely miss the point of using XML in the first place. The main issue I see with using XSLT is that it can't convert one XML document to multiple HTML documents (e.g. frames), which may or may not be an issue depending on how much functionality you want out of the blog browser UI.
Dare, one of the reasons I'm interested in blog browsers outside the web browser is that there is no monopoly there, and no bigco owning the space to hold back progress.
Doesn't much matter to me whether you do it XSLT or C#, as long as you build me something that will actually preserve and use the metadata in RSS: if you can make a blog browser in XSLT that will search and filter by date, author, and category, pulling posts from multiple files, and then let a metaWeblog API client pull out posts keeping track of the feed title, the item title, and the item link, I'm all for it.
Dare: just to let you know, I parse your comments as "why doesn't everybody use my favorite programming language and the web".
As to the second half, let me draw an analogy... I like to use NNTP and Mozilla for access to high volume mailing lists. For RSS, some people like NetNewsWire.
1) The file and files elements should be in a namespace other than "". (I may have said this one before. ;-))
2) Flat is cool with me, but if you want my preference, I would prefer if the attributes other than path were made child elements.
3) None of the elements other than path are particularly easy to dynamically generate from within blosxom, so I would prefer if they were explicitly made optional.
4) Specify the rules for people to define new child elements. (Again, I may have said this one before ;-))
Dave, I don't see how creating a web interface for accessing weblog content puts you in the thrall of BigCos any more than having http://www.scripting.com available on the World Wide Web via HTTP already does (if it does at all).
Yes, www.scripting.com is very limited by the whims of the monopolist browser vendor. That was what the whole thing about Smart Tags was about. They wanted to edit my content so that random words point to pages on their site. Now there's no doubt in my mind that that would be horribly illegal in the US, but I'd rather not have to go to court to protect free speech, so I want to, whenever possible, route around them. I see it as free speech insurance.
Sam, this is the same thing I'd tell any of our customers who asked me:
"We have an XML storage format for some important information and would like to provide a rich interface for navigating and viewing this data. Do we
1.) Use standardized technologies like HTML, which can be viewed in browsers as diverse as Internet Explorer, Mozilla, Konqueror, Lynx, Opera, OmniWeb, etc., as the presentation format, and then write a tool to transform our format to this rich HTML view using another standardized technology (XSLT), which is available on multiple platforms and programming languages?
or
2.) Should we build a custom tool in C++/Java/C#/etc that mimics a newsreader/web browser to navigate and view our custom XML structure?"
I can't understand why anyone would pick (2) over (1).
Sam, couple of questions.
* Do you think that there would be more people able to view RSS as web pages than there would be those who would have a custom RSS browser?
* Do you think it is more likely that there would be a proliferation of different stylesheets to view RSS content or different custom RSS browsers?
Dare, do you care to tell me then why the world needs Office 11? :-P
The format that Dave is advocating is describable via XSD. If he chooses to describe it that way, great. If not and I am so inclined, at some point I may do so (unlikely, but possible).
What I would like to know, in the meantime, is whether this is an open content model or a closed format.
Meanwhile, I very much appreciate the fact that you defined an RSS feed for your K5 diary.
I'll concur with Sam's comments, especially the one about making "path" the only required attribute of "file". Requiring anything else locks out systems that, for whatever reason, can't generate "created", "modified", or "size" attributes.
As to whether those other attributes should be child elements instead, I'm not sure it matters, but I think it's important to allow for other child elements via namespaces so that non-Radio weblogs can store their own specific info.
I think "modified" is a minimal requirement. How else could a restorer tell if something needs to be restored from a backup? Recall that's the purpose of this format in the first place.
The world doesn't...unless you consider Microsoft the world (which they seem to some days).
But Microsoft does need Office 11, which in turn needs WinXP or Win2K SP3...which means you have to accept a EULA that allows M$ to muck with your machine (think DRM) any time they feel it necessary.
I have the feeling I'm blissfully wandering into a minefield, but: does the need for a modified attribute for backup mean that the format needs to require it, or only that the application needs to?
I'd be perfectly happy with a spec that said something like "a file element is required to have a path attribute, has the following optional elements which may be required by a particular application, and any other attributes must be namespaced," but that reminds me of a lot of things that seem reasonable to me until I'm told that I'm not just stupid, but evil.
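To make the format-versus-application distinction concrete, here is a minimal Python sketch. It assumes only what the thread mentions: a `files` root containing `file` elements with `path` and `modified` attributes. The sample data, the date format, and both function names are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names come from the thread;
# the values (including the date format) are invented for illustration.
SAMPLE = """<files>
  <file path="backups/posts/2002/08.xml" modified="2002-09-01"/>
  <file path="backups/posts/2002/09.xml"/>
  <file modified="2002-09-15"/>
</files>"""

def format_violations(xml_text):
    """Format-level rule: every file element must have a path attribute.
    Returns the positions of the file elements that break the rule."""
    files = ET.fromstring(xml_text).iter("file")
    return [i for i, f in enumerate(files) if not f.get("path")]

def usable_for_restore(xml_text):
    """Application-level rule for a hypothetical restore tool:
    it can only use entries that also carry a modified attribute."""
    files = ET.fromstring(xml_text).iter("file")
    return [f.get("path") for f in files
            if f.get("path") and f.get("modified")]
```

Under these assumptions the second entry is valid as far as the format is concerned, yet a restore tool would simply skip it; only the third entry, which has no path, breaks the format itself.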
Phil, at the risk of being labeled both stupid and evil myself, what you say makes sense. Now I know how to approach the writing part of this little project, which, Murphy-willing, I will undertake tomorrow morning. All the standard disclaimers apply.
Someone tell me I've missed the point. I've written one of these blog browser thingies and came across a problem due to the entirely flat nature of files.xml: not all <file> elements are the same (or equal). Some point to RSS descriptions of weblogs for a month, some point to stories, some to preferences. I can only determine what's being pointed to by looking at the name given in path, and not everyone is using the same convention. Wouldn't it be better to have at least one level of hierarchy (<rssfiles> <storyfiles> <preferences>)? Using <rssfiles> would allow any convention the author liked; as a browser, all I want to know is where to get the file. An associated issue is year/month, which only comes from a path naming convention. It's OK, but it should be explicit in whatever docs there might be.
Dave wrote: "Recall that's the purpose of this format in the first place," but then agreed with Phil that just because apps may require a certain element or attribute doesn't mean the format should require it.
I'm glad, 'coz it would've been embarrassing to have to point out to Dave that the original purpose of RSS was simple syndication, and now he's using it for weblog storage and backup. :-)
If we don't overload path with the file type (it's simple and compact, but a bit limiting), is there any convenient way to make it short, sweet, and extensible? Best I can come up with is type="rss|index" in the core, and radioWeblog:type="pref|whatever|other" for anything else app-specific.
Using a "type" attribute has nice precedent in HTML, but usually implies a MIME type. So your RSS files should be something like type="application/rss+xml", and the index/archive file should be something like type="application/rsslist+xml". (How'd that be for a name for this file format? "RSS List"?)
This has the advantage of having a pre-existing set of types for a lot of files (images, text, OPML, etc.), and parsers can choose to download listed files depending on if they think they can handle the MIME type.
We'd need a MIME type for all those .fttb files in DHRB's RSS list, though.
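As a rough Python sketch of how a parser might use such a type attribute to decide what to download: the `type` attribute on `file` is only the proposal made in this comment, not part of files.xml as it stands, and the sample data, the `HANDLED` set, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the type attribute is a proposal from this thread,
# and the paths are invented for illustration.
SAMPLE = """<files>
  <file path="backups/posts/2002/08.xml" type="application/rss+xml"/>
  <file path="images/header.gif" type="image/gif"/>
  <file path="backups/prefs.fttb"/>
</files>"""

# MIME types this imaginary blog browser knows how to handle.
HANDLED = {"application/rss+xml"}

def paths_to_fetch(xml_text, handled=HANDLED):
    """Return the paths whose declared type is one the client can handle.
    Entries with no type attribute are skipped rather than guessed at."""
    files = ET.fromstring(xml_text).iter("file")
    return [f.get("path") for f in files if f.get("type") in handled]
```

The untyped .fttb entry falls through, which is exactly the gap noted above: without an agreed type for those files, a client has nothing to match on.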
Personally, it doesn't worry me. I already have code (from a previous project) to read directory.opml files, so supporting that would be no big deal. As long as all tools that generate this sort of backup work the same way, it doesn't matter to me.
Pete has a point; currently the type of each file is defined by its path. In BlogGazer, I run through files.xml and pull out the year and month from everything matching the regex backups/posts/(\d+)/(\d+)\.xml. That means anyone not using exactly that format won't show up properly.
Either we need to:
1. Make the paths standard, so it's invalid to produce monthly backup files that aren't in the backups/posts/(year)/(month).xml format
or:
2. Add in types and extra notes, so it's obvious where the posts are
or:
3. Completely change the format to be more descriptive
I'd be happy with any of the above options, although I'd have to say that #1 is the simplest, which probably makes it the best choice for the moment. Ideas?
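For anyone who wants to see the problem rather than take my word for it, here is a minimal Python sketch of the path-matching approach described above. It is not BlogGazer's actual code; the sample data and the function name are hypothetical, and only the regex and the `files`/`file`/`path` names come from this thread.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample files.xml; only the path attribute is used here.
SAMPLE = """<files>
  <file path="backups/posts/2002/08.xml"/>
  <file path="backups/posts/2002/09.xml"/>
  <file path="archives/2002/10.xml"/>
  <file path="backups/prefs.xml"/>
</files>"""

# The convention quoted above: backups/posts/(year)/(month).xml
MONTHLY = re.compile(r"backups/posts/(\d+)/(\d+)\.xml")

def monthly_archives(xml_text):
    """Return (year, month, path) for each file matching the convention."""
    found = []
    for f in ET.fromstring(xml_text).iter("file"):
        path = f.get("path", "")
        m = MONTHLY.fullmatch(path)
        if m:
            found.append((int(m.group(1)), int(m.group(2)), path))
    return found
```

With this sample, the monthly file stored under "archives" is silently dropped, as is the preferences file. The first omission is the one that matters: a perfectly good monthly archive becomes invisible just because its directory has a different name.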
"1. Make the paths standard, so it's invalid to produce monthly backup files that aren't in the backups/posts/(year)/(month).xml format"
is OK, though there is an archive out there that is using "archives" as the directory. Adding an extra level or two in the XML would ensure that if anyone decides on some other place to store files, it wouldn't break any blog browser or blog application that was attempting to import data from another system.
It's currently impossible for Blogger users to generate files using the "backups/posts/(year)/(month).xml" file hierarchy unless they write a program to do it after the fact. So forgetting about blog browsers for the moment, that's raising a barrier to moving backups into and out of Blogger.
What's wrong with using Dublin Core "title" child elements of the "file" element, and if you must use a regex to extract outline data, parsing that? (Here's an example.)
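A minimal Python sketch of what reading such titles could look like, assuming each `file` element may carry an optional Dublin Core `title` child. The namespace URI is the standard Dublin Core elements namespace; the sample data, the title text, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace, bound to the usual dc prefix.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical sample: paths and title text are invented for illustration.
SAMPLE = """<files xmlns:dc="http://purl.org/dc/elements/1.1/">
  <file path="2002/08.xml"><dc:title>August 2002</dc:title></file>
  <file path="prefs.xml"/>
</files>"""

def titles_by_path(xml_text):
    """Map each file's path to its dc:title text, or None if it has none."""
    result = {}
    for f in ET.fromstring(xml_text).iter("file"):
        result[f.get("path")] = f.findtext("dc:title", default=None,
                                           namespaces=DC)
    return result
```

Because the title lives in its own namespace, a client that doesn't know about it can ignore it, and the path is free to follow whatever layout the weblog tool produces.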
PS: I know I implied earlier that it's important to cater to other applications of this format, even if those applications are outside the format's original purpose. The point is to enable those applications without *breaking* the original purpose. And I think it's breakage to prevent Blogger users from backing up their content in a form other weblogging systems can use.
I agree with Dare Obasanjo "I'm confused, Why should anyone create or use a special app for browsing RSS feeds instead of using a regular browser over RSS or whatever converted to HTML via XSLT."
I think the concept should be a more dynamic use like a blog / instant messenger merge.
Given that most of the second-generation blog comment spams I've seen involved quoting a previous comment with a comment that sort of vaguely seemed in context without actually having any meaning, then linking to something else with search keywords for a name, if I were a spam comment filter I would assign your comment a score of around 85%. On investigating, the fact that the site you link to offers absolutely no information, but links to a number of "partners" in totally unrelated fields would kick me up to 99%, and the fact that Google puts this entry above yours for the words "blog browser" would put me at 100%.
Stephen Galluccio -/- Built2.com -/- dun&bradstreet id: 945406064 128 Marine Avenue, Suite 2A New York City, NY 11209 US
Domain Name: BLOGBROWSER.COM
Administrative Contact- Stephen Galluccio: stephen@built2.com Built 2 128 Marine Avenue 2A New York City, N.Y. 11209 US Phone- 718 836 3390 Fax- Technical Contact- Stephen Galluccio: built2@acedsl.com Built 2 labs 9229 Shore Road 1C New York City, NY 11209 US Phone- Fax-
Record update date: 2002-11-20 08:18:33 Record create date: 2002-11-20 Record expires on: 2003-11-20 Database last updated on: 2002-12-14 23:23:45 EST
Phil, "vaguely seemed in context without actually having any meaning"
I stand by my comment. It seems that you want to create useless products that could be replaced by a simple HTML page.
I work for a trading firm where speed of information is valued, and I think a blog / instant messenger merge would be useful. It would also introduce blogCircles.
It could be for business: Project teams / biz units that publish blogs that would all feed into the blog browser so other units can provide instant comments on development. In the case of traders, they often work in sub groups to trick the markets or risk mgmt. and currently use 2 programs (news feeds + IM). Anytime tech developers can cut down the "swivel chair effect", it is always of interest to the group.
It could be for consumers: Kids would set up blogCircles, groups of bloggers or blog topics to watch. The key is that they can respond to the originating blog or break off into individual conversations at any time but all have access to the original topic.
I am working in project/interface builder on a beta that will be up by the end of the week.
For what it's worth (getting back to the discussion of formats), I've modified my files list to put the document titles in an attribute instead of a sub-element. Does this help the current batch of proto-apps?
I get the feeling there's not a lot of heat surrounding this. And it's hard to figure out how to continue the discussion with all interested parties when all the talk is scattered across several weblogs' comment threads. Would anyone be up for a mailing list so we can zero in on a format?