ElementsVsAttributes

Often when creating an XML schema based on a data model, you have to choose between putting data in element content or in attributes. It's hard to decide which one to use and there are no obviously right answers that apply everywhere.

Example: using an element:

<book><title>Perelandra</title>...</book>

Example: using an attribute:

<book title="Perelandra">...</book>

See CoverPages, Using Elements and Attributes.

One solution is for the application to accept either element content or an attribute, interchangeably. RelaxNg supports this kind of schema.
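On the consuming side, accepting both forms could look roughly like the sketch below. It uses Python's standard xml.etree.ElementTree; the helper name get_field is hypothetical and not part of any library or of the Echo proposal.

```python
import xml.etree.ElementTree as ET


def get_field(element, name):
    """Return the value of `name`, looking first at attributes, then at child elements.

    Hypothetical helper: it illustrates an application that treats
    <book title="..."/> and <book><title>...</title></book> as equivalent.
    """
    if name in element.attrib:
        return element.attrib[name]
    child = element.find(name)
    if child is not None:
        return child.text
    return None


as_element = ET.fromstring("<book><title>Perelandra</title></book>")
as_attribute = ET.fromstring('<book title="Perelandra"/>')

print(get_field(as_element, "title"))
print(get_field(as_attribute, "title"))
```

Both calls return the same string, so the rest of the application does not need to know which form the document used. The cost is that every consumer has to implement the same fallback.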

Elements:

can have child elements, be nested, and have mixed content
can be repeated
can contain CDATA sections (<![CDATA[ ... ]]>)
when used exclusively, appear more regular and consistent to the reader
provide an extensibility point for future attributes and child elements
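A small sketch of the first two points, using Python's standard xml.etree.ElementTree. The element names (author, summary, em) and their values are placeholders chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder document: two repeated <author> elements and a <summary>
# with mixed content (text interleaved with a child element).
doc = ET.fromstring(
    "<book>"
    "<author>First Author</author>"
    "<author>Second Author</author>"
    "<summary>A <em>very</em> short summary.</summary>"
    "</book>"
)

# Repetition: findall returns every matching child, in document order.
authors = [a.text for a in doc.findall("author")]

# Mixed content: the text before <em> is in .text, the text after it is in
# the child's .tail; itertext() walks all of it in order.
summary = doc.find("summary")
summary_text = "".join(summary.itertext())

print(authors)
print(summary_text)
```

Neither structure can be expressed in an attribute value without inventing a private syntax inside the string.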

Attributes:

are always just character data
are easier to access in some libraries
are available in sequential, callback-driven parsers as soon as the element they apply to is opened, in a single event
can appear only once per element
give a more condensed appearance to the reader
are closed to extension, which is usually undesirable but may be desirable for certain tightly controlled constructs
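The parser-related points can be observed with Python's standard xml.sax module. This is a minimal sketch; the class EventLog and the functions log_events and is_well_formed are hypothetical names introduced here.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Record SAX callbacks in the order the parser delivers them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # All attributes of the element arrive together with its start event.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Element content arrives later, possibly in several chunks.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_text):
    handler = EventLog()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def is_well_formed(xml_text):
    """Return False if the parser rejects the document."""
    try:
        log_events(xml_text)
        return True
    except xml.sax.SAXParseException:
        return False


print(log_events('<book title="Perelandra"/>'))
print(log_events("<book><title>Perelandra</title></book>"))
print(is_well_formed('<book title="a" title="b"/>'))
```

With the attribute form, the title is known in the very first event. With the element form, the handler has to track state across a start event, one or more text events, and an end event. The last call shows that repeating an attribute on one element makes the document ill-formed, whereas repeated child elements are fine.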

In some schemas, all "information" is stored in element content and attributes are used only to provide properties of the specific content, such as an ID, a type, or a language.

XML Namespaces and xml:lang are declared using attributes, so one can't exclusively use elements when using those.
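As an illustration, here is how both show up when a document is read with Python's standard xml.etree.ElementTree. The namespace URI http://example.org/hypothetical-echo is made up for this sketch; the xml: prefix URI is the one fixed by the XML Namespaces specification.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the reserved "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The default namespace below is a made-up placeholder.
doc = ET.fromstring(
    '<feed xmlns="http://example.org/hypothetical-echo" xml:lang="en">'
    "<title>Example</title>"
    "</feed>"
)

# xml:lang is an ordinary attribute, exposed under its qualified name.
lang = doc.get("{%s}lang" % XML_NS)

# The xmlns declaration is consumed by the parser and folded into the tag name.
tag = doc.tag

print(lang)
print(tag)
```

Even a schema that otherwise puts everything in elements ends up with these two attributes on its documents.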

Comments

[AsbjornUlsberg, RefactorOk] Also, an XML ID has to be an attribute, and its value has to be a valid XML name, which means it cannot start with a digit, e.g. <feed id="f982938effeEgr23443Ffeffdf">. I think mixing elements and attributes will just end up confusing users, and will make it more difficult to build a data model out of an Echo feed. The spec should decide what are elements and what are attributes.

[MikeDaconta, RefactorOk] I do not believe that attributes and elements should be interchangeable (see http://www.daconta.net/elems-and-atts.html). Apart from cases that need mixed content, I recommend looking at this in relation to tight-versus-loose coupling. Things that are characteristics of an element (and thus cannot conceptually stand on their own) should be attributes of the element.

[JeremyGray, RefactorOk] There are a number of good strategies for selecting between elements and attributes, but one additional thing worth mentioning that hasn't been explicitly stated here is that elements provide for a future point of extension whereas attributes cannot. I think this is important enough that I've added it to the list above. Future-proofing is good, especially when it comes at little or no price (elements). Conversely, clear verboten areas which specifically deny extensibility can also have uses (attributes).

[DeveloperDude] I don't mind the use of attributes. I mind the overuse of attributes. A good rule of thumb I use: if you can conceive of any implementation of a value being more than just a small text string, then implement it as an element. That is, err on the side of elements over attributes.

JeremyGray

Discussion from EchoExample

[JamesSnell] I made "id" an attribute to reflect GeorgBauer's comments below about possibly using attributes for technical, non-content related pieces of information (such as content type, language, etc).

[AsbjornUlsberg] I think attributes have their uses, specifically for data that shouldn't be visible to any end user. I think many elements in the existing RSS standard should be attributes, not to mention more generically named, so that an attribute's value doesn't always have to be an alias for "true/false". E.g., instead of "base64Encoding='true'", it should of course be "encoding='base64'".
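To make the naming point concrete, here is a rough sketch of a consumer dispatching on a generic encoding attribute. The element name content, the default value "none", and the function decode_content are assumptions made for this example, not anything the Echo drafts define.

```python
import base64
import xml.etree.ElementTree as ET


def decode_content(element):
    """Decode an element's text according to its (hypothetical) encoding attribute."""
    encoding = element.get("encoding", "none")  # "none" is an assumed default
    text = element.text or ""
    if encoding == "base64":
        return base64.b64decode(text).decode("utf-8")
    if encoding == "none":
        return text
    # A generic attribute leaves room for further values later on.
    raise ValueError("unsupported encoding: %s" % encoding)


encoded = ET.fromstring('<content encoding="base64">SGVsbG8=</content>')
plain = ET.fromstring("<content>Hello</content>")

print(decode_content(encoded))
print(decode_content(plain))
```

A boolean attribute such as base64Encoding='true' would need a new attribute for every additional encoding, while a single encoding attribute only needs a new value.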

MatthewThomas

link

homepage

weblog

E.g.

<entry>...<link>http://foo/127</link>...</entry>

<entry link="http://foo/127">...</entry>

[GeorgBauer] An idea: maybe restrict attributes to "technical, non-content" stuff? For example, the type is something that should be noted with an attribute, as it describes the content of an element and is not the actual content itself. Following this, I think the "id" element should be an id attribute on the entry element. That would make clear the distinction Sam gave in his comment: the id attribute denotes some technical identification, while the permalink element denotes the actual link for the entry. This could give a way to select date formats, too: just have a format attribute on every timestamp element that gives "iso" or "rfc822" or some other denotation of format.
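The format-attribute idea could be consumed along these lines. This is only a sketch: the element name issued and the function parse_timestamp are hypothetical, and datetime.fromisoformat accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def parse_timestamp(element):
    """Parse a timestamp element according to its format attribute."""
    fmt = element.get("format")
    text = (element.text or "").strip()
    if fmt == "iso":
        return datetime.fromisoformat(text)
    if fmt == "rfc822":
        return parsedate_to_datetime(text)
    raise ValueError("unknown timestamp format: %r" % fmt)


# "issued" is a placeholder element name for this sketch.
iso = ET.fromstring('<issued format="iso">2003-06-28T12:00:00+00:00</issued>')
rfc = ET.fromstring('<issued format="rfc822">Sat, 28 Jun 2003 12:00:00 +0000</issued>')

print(parse_timestamp(iso))
print(parse_timestamp(rfc))
```

The attribute describes how to read the content, and the content itself stays in the element, which is the split proposed above. The price is that every consumer has to support every permitted format.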

ZhangYining

[DannyAyers, RefactorOk] My personal preference would be to use elements for the stuff and attributes for modifiers.

[BillHumphries] No. For example, I've got five years' worth of blog entries, a couple thousand, and I know (but have been too lazy to fix) that many are not valid. So you need the type in order to tell whatever's marshalling the entries what to do. And the CMS has to do the right thing when generating an Echo element.

[JamesSnell] Are we coming to a design style decision now that we're ready to vote on?

Use Elements for all data corresponding semantically to the entry (e.g. entry identifier, links, etc)
Use Attributes to convey data technically relevant to the Echo document itself (e.g. language tags, MIME types, etc)

[AaronSw] I'm not sure it's a good idea to vote on design principles at this stage. It might be better to take these things on a case-by-case basis.

[DiegoDoval] +1.
[JeremyGray] +1.
[KenMacLeod] +1 on premature. Note that RelaxNg supports a design where data can be in either elements or attributes, allowing the benefits of either.
[ZhangYining] -1+1 It's better to have a rule of thumb or guidelines, and then go case by case for specific issues. It helps not to start with too many variations.

[KenMacLeod] I don't think a discussion about element- or attribute-value styles should be on a per-element basis. See Syntax and ElementsVsAttributes.

[AsbjornUlsberg] Uhm. What's the difference between "case by case" and "element by element"? Given AaronSw's statement, "I'm not sure it's a good idea to vote on design principles at this stage. It might be better to take these things on a case-by-case basis.", and that you, DiegoDoval and JeremyGray all agree with it, I don't really understand what your position is.

I think it would be best to define a set of rules to follow regarding this, but until we do, we should imho discuss each case independently, as we are doing now with LinkElementDiscussion.

[KenMacLeod] OK. Re. "premature", what I meant was that style rules can be determined after we have a more solid model to work from, so different cases of style should be reflected in a complete EchoExample. I'm also OK with working on those cases of style using a small subset of an example, even one element, as long as it's eventually rolled up into a consistent design principle.

[AsbjornUlsberg] I fully agree. Each case should of course adhere to the higher-level rules, whenever they are set. Is it still too early to start defining these rules, you think?

[KenMacLeod] I'd say "now" is a fine time to start.

[AsbjornUlsberg] Ok. Then "let's", shall we?