PaceSimpleLanguageTagging


Abstract

Allowing xml:lang to appear anywhere in an Atom document complicates an implementation of Atom’s underlying model by requiring every string in an Atom feed to store its own associated language tag.

This proposal argues that language tagging of every element in an Atom document is an edge case, and that language tagging should be restricted to entries and heads, in line with existing syndication standards.

Status

Open

Author: DavidPowell

Rationale

xml:lang can appear anywhere and specifies a language for all children of its containing element. This seems simple enough in the context of an XML document, but in the context of any other model, it imposes a significant implementation burden.
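
As an illustration of the inheritance rule described above, the following minimal sketch resolves the effective language of each element by walking the tree and carrying the nearest enclosing xml:lang value downward. The sample document and the function name effective_languages are hypothetical and are not taken from the Atom draft.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs, applying xml:lang inheritance."""
    # An element's own xml:lang overrides whatever it inherited.
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


# Hypothetical, simplified entry (no Atom namespace, for brevity).
doc = ET.fromstring(
    '<entry xml:lang="en">'
    "<title>Hello</title>"
    '<summary xml:lang="fr">Bonjour</summary>'
    "</entry>"
)
result = dict(effective_languages(doc))
```

Any non-XML model that wants to preserve this information has to store the resolved language next to every string, which is the burden the rest of this section describes.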

Many Atom implementations will use an RDBMS or an object-oriented model, rather than an XML DOM, to represent and store Atom content internally.

The current use of xml:lang means that anything that looks like a string in Atom must be represented by a (string, language) class in a typical OO implementation, or by an extra column per field in a typical RDBMS implementation.
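
A rough sketch of the difference in an OO model, assuming a simplified entry with only three text fields. All class and field names here are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedText:
    """A string that must carry its own language tag."""
    value: str
    lang: Optional[str] = None


@dataclass
class EntryWithXmlLang:
    """Model needed when xml:lang may appear on any element."""
    title: TaggedText
    summary: TaggedText
    rights: TaggedText


@dataclass
class EntryWithAtomLanguage:
    """Model needed under this proposal: plain strings, one language."""
    title: str
    summary: str
    rights: str
    language: Optional[str] = None


per_field = EntryWithXmlLang(
    title=TaggedText("Hello", "en"),
    summary=TaggedText("Bonjour", "fr"),
    rights=TaggedText("Public domain", "en"),
)
per_entry = EntryWithAtomLanguage(
    title="Hello", summary="Greeting", rights="Public domain", language="en"
)
```

In a relational schema the same contrast would show up as one extra language column per text column versus a single language column per entry.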

It also means that the interpretation of "Simple Extension constructs", as described by PaceExtensionConstruct, would need to define values in terms of (string, language) types. This makes a mapping onto metadata models that don’t support language tagging, such as WebDAV properties, very difficult.

Allowing xml:lang in a syndication format is an invention with significant implementation costs. It should be removed and replaced with a simple atom:language element for head and entry, which defines the primary language of the "feed" and "entry" entities and does not imply that all child elements inherit this metadata.

Proposal

Remove this paragraph in Section 2:

  Any element in an Atom Document MAY have an xml:lang attribute,
  whose content indicates the default natural language of the
  element's content.  Requirements regarding the content and
  interpretation of xml:lang are specified in XML 1.0
  [W3C.REC-xml-20040204] Section 2.12.

Add Section 4.2.14:

4.2.14  "atom:language" Element

  The "atom:language" element’s content conveys the primary language
  of the feed.  atom:head elements MAY contain an atom:language
  element, but MUST NOT contain more than one.

  The content of this element, when present, MUST be a language code
  conforming to RFC 3066.

Add Section 5.15:

5.15  "atom:language" Element

  The "atom:language" element’s content conveys the primary language
  of the entry.  atom:entry elements MAY contain an atom:language
  element, but MUST NOT contain more than one.

  The content of this element, when present, MUST be a language code
  conforming to RFC 3066.
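
A consumer could check the content of atom:language against the RFC 3066 tag syntax (a primary subtag of 1 to 8 letters, followed by zero or more hyphen-separated subtags of 1 to 8 letters or digits). The following is a minimal sketch of such a check; the function name is hypothetical, and it validates syntax only, without consulting the ISO 639 or IANA registries.

```python
import re

# RFC 3066: Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """Return True if tag matches the RFC 3066 syntax (no registry lookup)."""
    return _RFC3066_TAG.fullmatch(tag) is not None
```

For example, is_rfc3066_syntax("en-US") holds, while is_rfc3066_syntax("en_US") does not, because the underscore is not a valid separator.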

Impacts

If an author needs to tag parts of an entry with several different languages, this can be achieved by using HTML or XHTML mark-up in Text constructs, with "lang" and/or "xml:lang" attributes on the inner elements as defined by HTML.

It is already a requirement to use HTML to represent advanced text constructs, such as "ruby".

Notes

This proposal is based on draft-ietf-atompub-format-04.txt.


CategoryProposals