- Authors versus persons and roles
- Avoiding duplication of author data
- Uniquely identifying an author across feeds
Authors versus persons and roles
[AsbjornUlsberg] This follows the same line I've been arguing all along: elements and attributes should be as generic as possible. I therefore think the <author> element, and of course the <contributor> element too, should go, replaced by one new element: <person>. Each person related to the <entry> has a specific role, e.g. <person role="author" /> or <person role="painter" />. The sub-elements of <person> are of course still up for debate, but I don't think it's appropriate to lock ourselves into the "author" perspective.
[KenMacLeod] I generally agree with the use of roles or qualifiers to specialize attributes. In this context, though, DublinCore provides definitions for 'creator', 'contributor', and 'publisher' and I believe that should be taken into consideration.
[DavidJanes, RefactorOk] As per my comments on [SyntaxConsiderations], consider renaming "Author" to "Poster" or "Creator". Make this element required and as narrow as possible. Then all other contributor types can fit under the generic <person role="..."> type.
[GaryF] If we are to have role attributes instead of elements, then the acceptable role values should be specified (and extensible through namespaces). This way aggregators know exactly what they're looking at: an author is always "author", and never "writer".
[DannyAyers] The biggest problem with person + role is that the interpretation of any enclosed elements/attributes or even what would be allowed would vary according to the value of the role. At best it makes it harder to interpret, at worst it means that the syntax can't be validated. I'd strongly suggest taking advantage of the work done by DublinCore on this. It's simple to do but sophisticated in what it can describe, and standard.
[BillDehora] What Danny and Ken said. Use the Dublin Core elements. If you happen to use different names for them, then just say what they are wrt DC.
[AsbjornUlsberg] I'm not sure about this, but as I understand Dublin Core, an "element" isn't necessarily an XML element. It's a naming concept: a container for data with a predefined meaning. I think this comes out clearly in Encoding Dublin Core Metadata in HTML. All DC "elements" are prefixed with "DC." and serialized into <meta> HTML elements to carry the information in HTML documents. This is because you can't just introduce <creator> tags in HTML without breaking the DTD, and even if they were made valid through a custom DTD, no UA would understand them.
I think this demonstrates the principle: DC elements don't necessarily have to be XML elements. The point is that we adopt the concepts and names from DC, and take it from there. If we feel like putting "creator" into an attribute, that's OK, but the values of the attribute ("rel" in this case) have to be an enumeration of the DC elements. If one wants to extend this, it can be done through namespaces. Give me a smack if I've misunderstood DC completely, but do it gently, OK?
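A minimal sketch of how an aggregator might check role values under this idea. It assumes a hypothetical <person> element with a plain role attribute restricted to the three Dublin Core terms named above, and a namespaced role attribute for extensions. The http://example.org/roles namespace and the sample entry are made up for illustration and are not part of any proposal.

```python
import xml.etree.ElementTree as ET

# Assumption: only the three Dublin Core terms mentioned in this discussion.
DC_ROLES = {"creator", "contributor", "publisher"}

# Hypothetical namespace for site-specific roles.
EXT_NS = "http://example.org/roles"

ENTRY = """
<entry xmlns:ext="http://example.org/roles">
  <person role="creator"><name>Alice</name></person>
  <person ext:role="painter"><name>Bob</name></person>
  <person role="writer"><name>Carol</name></person>
</entry>
"""

def classify(person):
    """Return (kind, role) for one <person> element."""
    # A namespaced role attribute marks an extension role.
    ext_role = person.get("{%s}role" % EXT_NS)
    if ext_role is not None:
        return ("extension", ext_role)
    # A plain role must be one of the enumerated DC terms.
    role = person.get("role")
    if role in DC_ROLES:
        return ("core", role)
    return ("invalid", role)

results = [classify(p) for p in ET.fromstring(ENTRY).findall("person")]
for kind, role in results:
    print(kind, role)
```

With a fixed enumeration, "writer" is rejected rather than silently treated as a synonym for an author.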
[JeremyGray] As I've mentioned elsewhere regarding things like timestamps, there is little need, desire, or justification (other than to be contrary) for re-inventing the wheel. Let's respect prior art. In fact, let's go beyond respect and leverage prior art everywhere possible. Directly. Without grabbing just a name, a concept, and/or an explanation, because these usages tend to backfire due to the assumptions they inadvertently encourage. In short, use Dublin Core where possible, and by use I do not mean reinterpret or co-opt.
[AsbjornUlsberg] Jeremy, are you saying that we should use <creator>, <contributor>, <publisher>, etc., because Dublin Core uses the term "elements", or am I misunderstanding? The use of the word "element" in DC has nothing to do with XML elements.
Avoiding duplication of author data
[SimonWillison] I am unhappy with the need to provide full author information for every entry: the duplication of author data adds unnecessary additional information, making feeds larger than they need to be and increasing the effort needed to parse a feed (it also makes hand-rolled feeds harder to maintain). I propose using reference attributes instead. I have posted an example of such a feed at EchoFeedWithAuthorRefs.
[MartinAtkins : RefactorOk] I agree completely. Giving each author an ID in the form of a URI, just as entries have, means that aggregators will be able to tell when the same person appears across multiple feeds. See the section on uniquely identifying authors across feeds, below. However, I do think that the actual displayable author information for every author referenced by ID in a document must appear in the document somewhere. We can't just assume the aggregator already knows which name goes with a given URI.
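A minimal sketch of resolving author references inside a single feed document. The id and ref attribute names and the sample feed are assumptions made for illustration. They are not taken from EchoFeedWithAuthorRefs.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the author is declared once at feed level and
# referenced by URI from each entry.
FEED = """
<feed>
  <author id="http://example.org/people/simon"><name>Simon</name></author>
  <entry><title>First</title><author ref="http://example.org/people/simon"/></entry>
  <entry><title>Second</title><author ref="http://example.org/people/simon"/></entry>
  <entry><title>Third</title><author ref="http://example.org/people/unknown"/></entry>
</feed>
"""

def resolve_authors(feed):
    """Return (entry title, author name) pairs; name is None if unresolved."""
    # Index the feed-level author declarations by their URI.
    authors = {a.get("id"): a.findtext("name") for a in feed.findall("author")}
    resolved = []
    for entry in feed.findall("entry"):
        ref = entry.find("author").get("ref")
        resolved.append((entry.findtext("title"), authors.get(ref)))
    return resolved

resolved = resolve_authors(ET.fromstring(FEED))
for title, name in resolved:
    print(title, name)
```

The third entry resolves to no name, which is the case described above: a URI whose displayable information is missing from the document leaves the aggregator with nothing to show.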
[AsbjornUlsberg] Looks like an ok solution to me. Also, I think everything <feed> has in common with <entry> should be inherited by <entry>, and overridden in the <entry> if needed. These common properties are e.g. dates, authors, language etc.
[ZhangYining] -1. As discussed before, an <entry> represents a resource that has its own URI. Dates, language, etc. do not fit this criterion. We should stick to this unless there is a compelling reason not to. However, for author info in particular, I am okay with it being a kind of entry.
[AsbjornUlsberg] I'm not sure what you are talking about here. Do you oppose the inheritance of xml:lang, xml:base, dates, authors etc. from <feed> onto <entry>? If you do, I can't say I understand your reasons. Would you care to explain a bit more thoroughly?
[JeremyGray] While I am not against inheritance, it is worth noting that inheritance starts to fall down when one moves past instance documents and into persistence mechanisms (whether at server-side or client-side), at which point the various reference mechanisms suggested here and on other pages really start to resonate with me.
[AsbjornUlsberg] I don't follow. I might look like a moron just shouting "I don't understand" everywhere on this page, and it might be that it's Monday, but I really don't understand what you're talking about.
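For reference, a minimal sketch of the inheritance proposed earlier in this section, in which an <entry> takes its author and xml:lang from <feed> unless it declares its own. The sample feed and the effective() helper are hypothetical, and only two of the inheritable properties are shown.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

FEED = """
<feed xml:lang="en">
  <author><name>Feed Author</name></author>
  <entry><title>Inherits both</title></entry>
  <entry xml:lang="no">
    <title>Overrides both</title>
    <author><name>Entry Author</name></author>
  </entry>
</feed>
"""

def effective(feed, entry):
    """Resolve an entry's language and author, falling back to the feed."""
    author = entry.find("author")
    if author is None:  # explicit None test: childless elements are falsy
        author = feed.find("author")
    return {
        "title": entry.findtext("title"),
        "lang": entry.get(XML_LANG, feed.get(XML_LANG)),
        "author": author.findtext("name") if author is not None else None,
    }

feed = ET.fromstring(FEED)
resolved_entries = [effective(feed, e) for e in feed.findall("entry")]
for item in resolved_entries:
    print(item)
```

Once the resolved values are stored per entry, the link back to the feed defaults is gone, which may be the persistence concern raised above.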
Uniquely identifying an author across feeds
[MartinAtkins : RefactorOk] Having a globally-unique ID for each person in the form of a URI (as used with entries) will allow for cross-referencing within an aggregator and allow you to say, for example "Specially flag all entries from anywhere by this person".
This not only applies to entries by that person in their own weblog, but also to syndicated comments from other feeds. I'm not really sure if this requires a new field to be added to everyone's comment forms where people can enter their unique ID URI or whether we should just have people enter it as their URL. The latter has the problem that people are likely to want to reference different URLs in different contexts, so the globalness of the identifier will be diluted.
Systems like LiveJournal, where comments come from authenticated users with accounts, can thankfully generate this kind of thing very easily and transparently, using a URL like this: http://www.livejournal.com/userinfo.bml?user=username. Of course, users must be allowed to override that URL if they want in the case that they already have a unique ID URI from elsewhere which they wish to carry on using when making LiveJournal entries/comments.
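A minimal sketch of the cross-referencing described above, assuming the aggregator has already parsed entries from several feeds into plain records that carry an author URI. The record layout, the feed names and the second URI are hypothetical.

```python
from collections import defaultdict

LJ_URI = "http://www.livejournal.com/userinfo.bml?user=username"

# Hypothetical records parsed from a weblog feed and a comments feed.
ENTRIES = [
    {"feed": "weblog", "title": "A post", "author_uri": LJ_URI},
    {"feed": "comments", "title": "A comment", "author_uri": LJ_URI},
    {"feed": "comments", "title": "Another comment",
     "author_uri": "http://example.org/people/someone-else"},
]

def group_by_author(entries):
    """Map each author URI to the titles of all entries by that author."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["author_uri"]].append(entry["title"])
    return dict(grouped)

def flag_entries(entries, author_uri):
    """Return the entries, from any feed, written by the given author."""
    return [e for e in entries if e["author_uri"] == author_uri]

grouped = group_by_author(ENTRIES)
flagged = [e["title"] for e in flag_entries(ENTRIES, LJ_URI)]
print(flagged)
```

The match is an exact string comparison, so it only works if the same person supplies the same URI everywhere, which is the dilution risk noted above.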
[AsbjornUlsberg] I like the idea, and maybe the author URI can be obtained through introspection somehow? If we change the introspection procedure currently supported by the API to use just one XML file at a predefined, standardized location, people can point to their website, the introspection file can be fetched because its URL is well known, and the author's URI can be read from the file.
This might demand several levels of introspection files, in a hierarchical system like the .config files in ASP.NET. This allows huge sites like Radio to serve a "config.atom" file for the whole website, while individual authors serve their own "config.atom" files that override the Radio file or add information to it (like the author URI).
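A minimal sketch of the override behaviour, assuming each "config.atom" file has already been read into a flat set of key/value settings. The key names and values are illustrative only and are not part of any proposal.

```python
# Hypothetical settings read from a site-wide and a per-author config file.
SITE_CONFIG = {"generator": "Radio", "language": "en", "author_uri": None}
AUTHOR_CONFIG = {"author_uri": "http://example.org/people/asbjorn"}

def merge_configs(*levels):
    """Merge settings from least to most specific; later levels win."""
    merged = {}
    for level in levels:
        merged.update(level)
    return merged

effective_config = merge_configs(SITE_CONFIG, AUTHOR_CONFIG)
print(effective_config)
```

The author file only needs to state what differs from the site file, and everything else falls through from the level above.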
The problem is with people who don't have a website, or maybe don't even have an email address: non-anonymous users who provide only their name in a comment form and nothing else. These authors should also be identified, but maybe that's too far-fetched? Maybe we can define several levels of authorization, where each level has a "uniqueness" guarantee? The levels could be:
- Introspection-Author-URI: 100% unique
- Named email address (e.g. email@example.com): 100% unique
- General email address (e.g. firstname.lastname@example.org) + Name: 90% unique
- General email address (e.g. email@example.com): 10% unique
- Name: 1% unique
- Anonymous: not unique
The percentages are just fictional, but they illustrate what I'm thinking. Comments?
There is talk of allowing zero or more <link> elements in the <author> element, with the purpose of each link described via the 'rel' attribute. Thus, you could have something like:
<link href="uri-to-some-foaf" rel="FOAF" type="application/rdf+FOAF" title="my FOAF file" />
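A minimal sketch of how a consumer might read such links, assuming they appear as direct children of <author>. The second link and its "homepage" rel value are hypothetical and only show that several links can coexist.

```python
import xml.etree.ElementTree as ET

AUTHOR = """
<author>
  <name>Example Author</name>
  <link href="uri-to-some-foaf" rel="FOAF" type="application/rdf+FOAF" title="my FOAF file" />
  <link href="http://example.org/" rel="homepage" />
</author>
"""

def links_by_rel(author):
    """Group the href values of an author's <link> children by rel."""
    grouped = {}
    for link in author.findall("link"):
        grouped.setdefault(link.get("rel"), []).append(link.get("href"))
    return grouped

author_links = links_by_rel(ET.fromstring(AUTHOR))
print(author_links)
```

Grouping by rel keeps the "zero or more" property: an author with no links yields an empty mapping, and repeated rel values simply accumulate.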