Categorization


Categories are an important component and are available in most blogging systems. How should we support them in Atom?

See also: SyntaxExtensionMechanism, ExtraInterop

Ridiculously Easy Group Forming

Convergence on this is important, as it affects both what a WellFormedEntry contains, and the AtomApi.

[ZhangYining RefactorOk] Categorization is also a (the?) way for a reader to subscribe to only those of a weblogger's feeds that he is interested in reading.

Category for an entry: Optional. Multiple categories for an entry: Optional.

NormanWalsh writes, "My entries are also divided into broad categories and have subjects. The distinction between category and subject is a bit vague, but it's roughly where does the essay fit into the general framework of the universe of things described by my blog as a whole (category) and what interesting things, people, places, events, etc., are mentioned by particular log entries (subjects)."

In that context, one might consider a "category" like a table-of-contents and "subjects" like an index.

Blosxom has a hierarchy of categories, mirroring the filesystem/URL namespace. The title of a post is quite distinct from its category.

The upcoming MovableTypePro allegedly also has hierarchical categories. B2Evolution has hierarchical categories too.

MovableType allows an entry to be associated with multiple categories. I believe B2Evolution and RadioUserland can do this too.

[RuiCarmo RefactorOK] Here's a thought: wiki entries can be categorized by references/backlinks to other entries. Why not sets of interrelated entries, instead of fixed categories? PhpWiki has a SubPages concept, but I found it lacking, so I implemented SeeAlso. I've never actually needed categories since.

[AdriaanTijsseling RefactorOK] Categories must be included. A SeeAlso is certainly useful, but being able to categorize data is intrinsic to human nature. Plus, categorization is already widespread in most blogs. Best to have hierarchical categorization with the possibility of multiple categories.

[ChristianCrumlish, RefactorOk] Where does the idea of categories fit into this model? Clearly categories are optional, but even if you have them, they can be conceptually handled in different ways. For example, in the Radio model, the default "Home" category can be unapplied, and the interface encourages multiple-category application. In the MT model, there is no equivalent of "not on the home page" and the interface encourages single-category application. I don't know if these distinctions have ramifications for the data model, but since I'm interop/interchange-advocate boy, I'm wondering.

[MattMower RefactorOk] My RadioUserland weblog uses categories as a way of routing content to different weblogs (e.g. my public blog, a test blog and an intranet blog). As implemented by UserLand, I find categories unsuitable for use as an organising tool on a single weblog. This is because they have to be created in advance, have no relationships to each other or to the weblog, and duplicate information.

Instead we have developed a client (k-collector) that allows each post to be associated with multiple topics from a shared topic set. These topics are then presented in the RSS (via the ENT extension) for use in filtering & routing posts. At the moment our topics are hierarchical, based upon a type (e.g. Person, Place, Thing), but that may change to allow multiple levels of hierarchy. The way we present topics in ENT is designed to encourage their being backed by an XTM topic map which further defines the topic and its relations.

[DavidEngel, RefactorOk] There seems to be a good deal of support for internal categorization. I'm interested in knowing how external categorization / linkage would work (the hinted-at DMOZ, LOC).

[Skware, RefactorOk] I think it's worth thinking of internal categories as attributes or keywords that are associated with the entry, rather than the entry being associated with a category. In UML speak, I'm kind of saying that an Entry has an attribute, rather than that a category contains entries. This leaves the problem of indexing and collecting categories as an external part of the spec.

[DiegoDoval RefactorOk] Movable Type, for example, uses categoryIDs. The categoryIDs are then attributes of the entries. Setting category info as attributes of an entry reflects common usage today. By using a category ID we'd be adding a level of indirection, enough to let applications handle the relationship as they require.
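
As a rough sketch of the indirection described above: the entry carries only IDs, and the weblog publishes the labels for those IDs separately. The 'categoryid' attribute, the <categories> list, and the IDs and labels themselves are all hypothetical, made up for illustration and not proposed syntax.

  <entry xmlns="uri/of/Atom">
       :
    <!-- hypothetical: the entry refers to its categories by ID only -->
    <category categoryid="12"/>
    <category categoryid="27"/>
  </entry>

  <!-- hypothetical: a separate list mapping each ID to a label -->
  <categories xmlns="uri/of/Atom">
    <category categoryid="12">Movies</category>
    <category categoryid="27">Recommendations</category>
  </categories>

With something like this, renaming a category would change only the separate list, not every entry that refers to it.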

[JeffreyWinter, RefactorOk] I have always thought that the XBEL format represented an interesting means of providing direct categorization, and an ability to directly manage categories via an HTTP/XML API. I've written up some thoughts on the subject here. Something similar could be considered here, although this type of hierarchical representation may strike some as too complicated. I find it pretty valuable myself :).

[ArveBersvendsen, RefactorOk] When we're looking at categorization, we should also look into TopicMapping.

[NicholasAvenell, RefactorOk] I'm tempted to look at Categories as another relationship, as TrackBack and SeeAlso would be. This is partly selfish, because I (and my blogging system) have the ability to relate to a category as something separate from the title. For example, the entry "Introducing the ESF Specification" would relate to the category "ESF" by the phrase "Original announcement". This means that a simple "This belongs to this category" methodology is too simple.

Dublin Core

DublinCore has a "category" definition, which they call "subject":

DublinCore also defines several qualified subjects, such as Dewey Decimal Classification and Library of Congress Subject Headings. We can define our own qualifications or, more likely, leave it open where one provides a URI of a provider of categories and then keywords or identifiers drawn from that provider.

The following examples show terms in the Atom namespace that would be derived from elements in DublinCore, i.e. '[Atom]subject' IS-A 'dc:subject'. 'provider' is not a DublinCore attribute, but one that we would define to support our needs.

An unqualified subject, like just a keyword, might look like:

  <entry xmlns="uri/of/Atom">
       :
       :
    <subject>reverberation</subject>
    <subject>resounding</subject>
  </entry>

An entry using keywords and DMOZ:

  <entry xmlns="uri/of/Atom">
       :
       :
    <subject>reverberation</subject>
    <subject provider="http://dmoz.org/">Arts:Movies:Titles:L:Looking for an Atom</subject>
  </entry>

An entry describing a location:

  <entry xmlns="uri/of/Atom">
    <title>In the City of New York</title>
       :
    <subject provider="http://geourl.org/">40.7650070, -73.9861298</subject>
  </entry>

For this example, see also GeoLocation.

[DannyAyers] I do like the above use of DC but I think we need to support as wide a range of categorisation mechanisms as possible (DC, ENT, TMs, RDF etc.). One solution would be to include a <metadata> element that can contain any valid XML which would refer to its parent element (see also ExtraInterop).
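
A minimal sketch of what such a container might look like, assuming the <metadata> element sits directly inside the entry it describes and holds an unmodified Dublin Core element. Only the element name <metadata> comes from the comment above; its placement and the choice of dc:subject as the payload are assumptions for illustration.

  <entry xmlns="uri/of/Atom"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
       :
    <!-- assumed placement: the contents describe the enclosing entry -->
    <metadata>
      <dc:subject>reverberation</dc:subject>
    </metadata>
  </entry>

Any other vocabulary (ENT, topic maps, RDF) would presumably go in the same container under its own namespace.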

[AsbjornUlsberg] Calling an element "metadata" is kind of silly, as everything in Atom except the text inside <content> is metadata. An <addinfo> (additional information) element, or something along those lines, would be better, imho.

I don't think the name of the element would really be an issue. "metadata" just seemed the obvious choice, and it's already in use in SVG. - Danny

[AdamRice] Categorization is an interesting-but-hairy problem. There are many different schemes, all of which Echo should, ideally, accommodate. Let's see:

This is just a first stab at defining the different categorization schemes; I'm sure others can think of more. FWIW, I consider keywords and categories related but not identical (so does Movable Type, for that matter). Perhaps once we nail down how things are categorized we can nail down a syntax for representing categories. This also suggests publishing the author's personal categorization scheme as a reference point. It might be a GoodThing for Echo to provide a structure for doing so, but not to require it any time anyone wants to use categories.

Working from this, I'll suggest


[BrianMcCallister] XML is hierarchical! I would suggest representing categorical hierarchies via nested elements rather than delimited tokens. I would further specify that categories can be freely manipulated by anyone in the stream of getting you the feed. In other words, an aggregator is free to re-categorize, remove subjects, change subjects, etc. Categorization is a suggestion.
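
For comparison with the delimited form used in the DMOZ example earlier on this page ("Arts:Movies:Titles:..."), a nested representation might look roughly like the following. The 'name' attribute and the idea of nesting <subject> inside <subject> are hypothetical, chosen only to illustrate the suggestion above.

  <entry xmlns="uri/of/Atom">
       :
    <!-- hypothetical: each level of the hierarchy is its own element -->
    <subject provider="http://dmoz.org/" name="Arts">
      <subject name="Movies">
        <subject name="Titles"/>
      </subject>
    </subject>
  </entry>

An aggregator that wants to re-categorize could then drop or replace a single nested element without having to parse a delimiter convention.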


[JakobVoss] Category is a must, but do not make it too complicated or leave it undefined. There are only two cases:

* freely created by the author (Keywords)
* fixed folders to choose from (Categories)

A keyword is just a string and only the author really knows what is meant by it. A category must be defined somewhere so you have to provide a URI/URL. Categories that are not related to any controlled vocabulary or formal classification scheme are only stupid keywords.

  <entry xmlns="uri/of/Atom">
    <title>You have to see this movie!</title>
       :
    <keyword>Star Trek XVI</keyword>
    <category provider="http://dmoz.org/">Arts:Movies</category>
    <category provider="http://myblog.org/myowncategory/">recommendations</category>
  </entry>

And do not try to model hierarchical categories on this level! A category is a category no matter how it is related to other categories (hierarchical, oppositional, related...). You do not want to model all these relations.


CategoryExtension, CategoryMetadata, CategoryModel, CategoryRss