Abstract
Allowing xml:lang to appear anywhere in an Atom document complicates an implementation of Atom’s underlying model by requiring every string in an Atom feed to store its own associated language tag.
This proposal suggests that language tagging of every element in an Atom document is an edge case, and that language tagging should be restricted to entries and heads, in line with the existing syndication standards.
Status
Open
Author: DavidPowell
Rationale
xml:lang can appear anywhere and specifies a language for all children of its containing element. This seems simple enough in the context of an XML document, but in the context of any other model, it imposes a significant implementation burden.
Many Atom implementations will use RDBMSs or object-oriented models, rather than XML DOMs, to model and store Atom content internally.
The current use of xml:lang means that anything that looks like a string in Atom must be represented by a (string, language) class in a typical OO implementation, or by an extra column per field in a typical RDBMS implementation.
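The difference between the two models can be sketched roughly as follows. This is only an illustration: the class and field names (LangString, EntryWithInheritedLang, EntryWithLanguageElement) are hypothetical and do not come from any Atom implementation or draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LangString:
    """A (string, language) pair, needed if xml:lang may appear on any element."""
    value: str
    lang: Optional[str] = None  # language tag in scope for this string, if any


@dataclass
class EntryWithInheritedLang:
    # With xml:lang allowed anywhere, every text-like field carries its own tag.
    title: LangString
    summary: LangString
    author_name: LangString


@dataclass
class EntryWithLanguageElement:
    # With a single entry-level language, fields can stay plain strings.
    title: str
    summary: str
    author_name: str
    language: Optional[str] = None  # primary language of the entry only


tagged = EntryWithInheritedLang(
    title=LangString("Bonjour", "fr"),
    summary=LangString("A greeting", "en"),
    author_name=LangString("David"),
)
plain = EntryWithLanguageElement(
    title="Bonjour", summary="Une salutation", author_name="David", language="fr"
)
```

In a relational mapping, the first shape would correspond to an extra language column per text field, while the second would need only one language column per entry.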
It also means that the interpretation of "Simple Extension constructs", as described by PaceExtensionConstruct, would need to define values in terms of (string, language) types. This makes a mapping onto metadata models that don’t support language tagging, such as WebDAV properties, very difficult.
Allowing xml:lang anywhere in a syndication format is an invention with significant implementation costs. It should be removed and replaced with a simple atom:language element for head and entry, which defines the primary language of the "feed" and "entry" entities and does not imply that all child elements inherit this metadata.
RSS1.0 supports language tagging of feeds and entries using the "dc:language" element 1. It also supports language tagging of the rarely used "textinput" element, and of images. An equivalent of the "textinput" element is unlikely to appear in Atom’s core. If "image" is supported by Atom core, then it should specify a language using an hreflang attribute, as atom:link does.
RSS2.0 supports language tagging of feeds using "language" 2.
Proposal
Remove this paragraph in Section 2:
Any element in an Atom Document MAY have an xml:lang attribute, whose content indicates the default natural language of the element's content. Requirements regarding the content and interpretation of xml:lang are specified in XML 1.0 [W3C.REC-xml-20040204] Section 2.12.
Add Section 4.2.14:
4.2.14 "atom:language" Element
The "atom:language" element’s content conveys the primary language of the feed. atom:head elements MAY contain an atom:language element, but MUST NOT contain more than one. The content of this element, when present, MUST be a language code conforming to RFC 3066.
Add Section 5.15:
5.15 "atom:language" Element
The "atom:language" element’s content conveys the primary language of the entry. atom:entry elements MAY contain an atom:language element, but MUST NOT contain more than one. The content of this element, when present, MUST be a language code conforming to RFC 3066.
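As a rough sketch of how a consumer might read the proposed element, the following parses a small feed and applies the "at most one" rule to head and entry separately. The namespace URI is a placeholder chosen for this example, not the real Atom namespace, and primary_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real Atom namespace.
ATOM_NS = "http://example.org/atom-placeholder"

FEED = f"""<feed xmlns="{ATOM_NS}">
  <head>
    <title>Example feed</title>
    <language>en</language>
  </head>
  <entry>
    <title>Un exemple</title>
    <language>fr</language>
  </entry>
  <entry>
    <title>No language given</title>
  </entry>
</feed>"""


def primary_language(parent):
    """Return the atom:language value of a head or entry, or None if absent."""
    found = parent.findall(f"{{{ATOM_NS}}}language")
    if len(found) > 1:
        raise ValueError("more than one atom:language element")
    return (found[0].text or "").strip() if found else None


root = ET.fromstring(FEED)
head_lang = primary_language(root.find(f"{{{ATOM_NS}}}head"))
entry_langs = [primary_language(e) for e in root.findall(f"{{{ATOM_NS}}}entry")]
print(head_lang, entry_langs)
```

The second entry yields no language at all rather than falling back to the head's value, which reflects the proposal's point that the element does not imply inheritance.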
Impacts
If an author needs to tag parts of an entry with several different language tags, this can be achieved by using HTML or XHTML mark-up in Text constructs, with the "lang" and/or "xml:lang" attributes as defined by HTML.
It is already a requirement to use HTML to represent advanced text constructs, such as "ruby" 3.
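The mixed-language case can be illustrated with a small XHTML fragment of the kind that might appear inside a Text construct. This is a sketch only: the fragment and the variable names are made up for the example, and the code simply lists the spans that carry their own xml:lang.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML content of an entry whose primary language is English.
CONTENT = f"""<div xmlns="{XHTML_NS}">
  <p>The phrase <span xml:lang="fr" lang="fr">bon voyage</span> is French.</p>
</div>"""

div = ET.fromstring(CONTENT)

# Collect (language, text) for every element that declares xml:lang itself.
tagged_spans = [(el.get(XML_LANG), el.text) for el in div.iter() if el.get(XML_LANG)]
print(tagged_spans)
```

Here the language information lives entirely inside the mark-up, so the surrounding Atom model can still treat the content as a single value.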
Notes
This proposal is based on draft-ietf-atompub-format-04.txt.