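#!/usr/bin/perl
# Slide server: renders one slide per request from the wiki-style deck
# stored after __DATA__. Tag attributes (classes, styles) did not survive
# in this copy, so bare tags are used wherever markup had to be restored.
use CGI qw/:standard :netscape/;
use POSIX qw/strftime/;

print "Content-type: text/html\n\n";

# Page comes from ?page=N or from /N.html-style path info; default is 1.
$page = param('page') || substr(path_info(),1) || 1;
($q,$ext) = $ENV{SCRIPT_NAME} eq $ENV{PATH_INFO} ? ('?page=','') : ('','.html');

# Slides are separated by lines starting with "==="; each is "Title ===\nbody".
@data = split(/\n===\s*/s, join("",<DATA>));
($title, $body) = ($data[$page] =~ /(.*?) ===\s+(.*)/s);

# Page head; only the title text survived, so a bare <title> is assumed.
print <<EOH;
<title>XMLConf: $title</title>
EOH

# "contents" pages: build a table of contents, 12 entries per page.
if ($page =~ /contents-?(\d*)/) {
  $page = $1 || 1;
  shift @data;
  $prev='';
  foreach $page (@data) {
    ($title, $body) = ($page =~ /(.*?) ===\s+(.*)/s);
    $i++;
    next if $prev eq $title;   # list build-up slides only once
    $prev=$title;
    # Link target is assumed; only "$title" survived.
    push @contents,"* <% <a href=\"$q$i$ext\">$title</a> %>\n";
  }
  undef @data;
  for ($i=0; $i<=$#contents; $i++) {
    $data[int($i/12)+1] .= $contents[$i];
  }
  $title="Contents";
  $body="#pragma compact\n" . $data[$page];
  $q .= "contents-";
}

# Navigation links (the original tags are lost; <link> elements are assumed).
print "<link rel=\"prev\" href=\"$q$prev$ext\">\n" unless ($prev=$page-1)<1;
print "<link rel=\"next\" href=\"$q$next$ext\">\n" unless ($next=$page+1)>$#data;

$compact=1 if $body =~ s/#pragma compact\s*//;

# Text inside <% ... %> passes through untouched; everything else
# (the even-numbered chunks) is escaped and run through the wiki rules.
@chunks = split(/<%(.*?)%>/ms, $body);
for ($i=0; $i<@chunks; $i+=2) {
  $chunks[$i] =~ s/&/&amp;/msg;
  $chunks[$i] =~ s/</&lt;/msg;
  # One more substitution stood here; only its tail "\n/msg" survives.
  $chunks[$i] =~ s/  / &nbsp;/g;                  # assumed form
  $chunks[$i] =~ s/^~~ (.*)/<p>\1<\/p>/mg;
  $chunks[$i] =~ s/^~ (.*)/<div>\1<\/div>/mg;
  $chunks[$i] =~ s/^! (.*)/<p>\1<\/p>/mg;
  $chunks[$i] =~ s/\*(\w.*?\w)\*/<b>\1<\/b>/mg;
  $chunks[$i] =~ s/\[(\S*\.png)\]/<img src="\1">/g;
  $chunks[$i] =~ s/\[(\S*\.html?)\]/<a href="\1">\1<\/a>/g;
  $chunks[$i] =~ s/\[(.*?) (\S*\.html?)\]/<a href="\2">\1<\/a>/g;
  $chunks[$i] =~ s/\[(http:\S*) (.*?)\]/<a href="\1">\2<\/a>/g;
  $chunks[$i] =~ s/\[:([^\n]*?):\]/<tt>\1<\/tt>/msg;
  $chunks[$i] =~ s/\[:(.*?):\]/<pre>\1<\/pre>/msg;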
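  $chunks[$i] =~ s/\["(.*?)"\]/<blockquote>\1<\/blockquote>/msg;
}
$body=join('',@chunks);

# Heading; the tags around $title are assumed.
print "<h1>$title</h1>\n\n";
print "\n";   # an opening wrapper tag was lost here

# Turn leading "*" / "#" runs into nested <ul> / <ol> lists, line by line.
# The list and line-break tags are restored from the surviving </li> and
# from the shape of the cleanup patterns below.
$ul = 0; $ol = 0; $out = '';
foreach $line (split("\n",$body)) {
  ($stars,$line) = ($line =~ /(?:(\*+)\s*)?(.*)/);
  $out .= "<ul>\n" while $ul++<length($stars);
  $out .= "</ul>\n" while --$ul>length($stars);
  ($hashes,$line) = ($line =~ /(#*)(.*)/);
  $out .= "<ol>" while $ol++<length($hashes);
  $out .= "</ol>\n" while --$ol>length($hashes);
  $out .= "<li>" if $stars or $hashes;
  $line .= "<br>\n<br>" if !defined($compact);
  $out .= "$line";
  $out .= "</li>\n" if $stars or $hashes;
  $stars = $hashes = undef;
}
$out .= "</ul>\n" while $ul-->0;
$out .= "</ol>\n" while $ol-->0;

# Drop line breaks that ended up directly before list markup or at the end.
$out =~ s/(<br>\s*)+<ul>/\n<ul>/g;
$out =~ s/(<br>\s*)+<\/li>/<\/li>/g;
$out =~ s/(<br>\s*)+<ol>/\n<ol>/g;
$out =~ s/(<br>\s*)+$//sg;

# Closing markup after $out did not survive; the string ends on the next line.
print "$out\n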
\n\n\n\n";
__DATA__
=== ===
=== XML Conference 2003 ===
~ Atom in Depth
=== Preface ===
[http://xml.coverpages.org/ni2003-10-22-a.html Robin Cover]:
["The key insights are these: design Atom such that content is not treated as a second class citizen; insist upon a uniform mechanism for expressing the core concepts independent of the usage; keep the format open and simple."]
=== Agenda ===
* *Background*
* Core model
* Syndication
* "API"
* Web Accessible Archive
=== Attribution ===
Atom is based heavily on the concepts and experiences with RSS. Special thanks go out to:
* [http://www.netscape.com/ NetScape]
* [http://www.userland.com/ UserLand]
* [http://groups.yahoo.com/group/rss-dev/ RSS-Dev working group]
=== Attribution ===
Atom is based heavily on the concepts and experiences with RSS. Special thanks go out to:
* [http://www.netscape.com/ NetScape]
* [http://www.userland.com/ UserLand]
* [http://groups.yahoo.com/group/rss-dev/ RSS-Dev working group]
* and especially the community!
=== History ===
On October 22nd, Mark Pilgrim and I announced the availability of a [http://diveintomark.org/archives/2002/10/22/rss_validator RSS validator] built from the ground up to support all versions of RSS.
=== History ===
On October 22nd, Mark Pilgrim and I announced the availability of a [http://diveintomark.org/archives/2002/10/22/rss_validator RSS validator] built from the ground up to support all versions of RSS.
Both in the development effort itself, and in examining both feeds that failed, and feeds that one aggregator developer or another *wished* would fail, we became frustrated by a number of ambiguities in these specs.
=== History ===
On October 22nd, Mark Pilgrim and I announced the availability of a [http://diveintomark.org/archives/2002/10/22/rss_validator RSS validator] built from the ground up to support all versions of RSS.
Both in the development effort itself, and in examining both feeds that failed, and feeds that one aggregator developer or another *wished* would fail, we became frustrated by a number of ambiguities in these specs.
Contrary to what some would have you believe, pretty much *all* of these ambiguities apply to *all* RSS versions.
=== Agenda ===
* Background
* *Core model*
* Syndication
* "API"
* Web Accessible Archive
=== Atom Data Model ===
Data model (part I)
* title
* link
* id
* summary
* content
* ...
=== RSS Title ===
#pragma compact
Definitions:
* RSS 2.0: "The title of the item."
* RSS 1.0: "The item's title."
* RSS 0.91: "When used in an item, this is the name of the item's link."
Example:
* [: Atom in Depth :]
=== Title escaping ===
~ Can title contain markup?
=== Title markup: bad? ===
#pragma compact
["
Hey I didn't know that [http://www.variety.com/RSS.asp Variety has an RSS feed]. It's been out since July. Yow. *Unfortunately it's not valid. It's got HTML markup in an item title*
"] - <% http://archive.scripting.com/2002/11/01#When:7:35:30AM %>
=== Title markup: OK? ===
#pragma compact
["
I'm giving feedback to a RSS feed provider who is including markup in the titles of items. *The [http://backend.userland.com/rss#hrelementsOfLtitemgt spec] is silent on whether this is allowed, so it must be allowed*.
"] - <% http://archive.scripting.com/2003/06/13#When:8:12:30AM %>
=== Title markup ===
* Few titles need markup
* [http://www.variety.com/RSS.asp Variety] occasionally has a line break
* Others use bold and italics
* Seen, but rare: hypertext links and del tags
=== Title example ===
~ I <3 Nsync :>
LiveJournal feeds often contain titles like:
=== Escaping example ===
This doesn't work:
[: I <3 Nsync :> :]
One level of escaping:
[: I &lt;3 Nsync :&gt; :]
Two levels of escaping:
[: I &amp;lt;3 Nsync :&amp;gt; :]
=== Tool handling of titles ===
* Some will strip markup entirely
* Some will display verbatim
* Some will interpret as markup
=== Tool handling of titles ===
* Some will strip markup entirely
* Some will display verbatim
* Some will interpret as markup
These strategies may look odd on occasion, but seldom result in fatal errors in the context of display
=== Tool handling of titles ===
* Some will strip markup entirely
* Some will display verbatim
* Some will interpret as markup
These strategies may look odd on occasion, but seldom result in fatal errors in the context of display
* that's OK for syndication, but as an API or archive?
=== Atom title ===
* Default is text/plain, i.e., *no markup*
* Markup is permitted, *iff* explicitly indicated
** mode="escaped"
** type="text/html"
=== RSS Link ===
#pragma compact
Definitions:
* RSS 2.0: "The URL of the item."
* RSS 1.0: "The item's URL."
* RSS 0.91: "This is a url that a user is expected to click on"
Example:
* [: http://intertwingly.net/slides/2003/xmlconf/ :]
=== Link ===
Notes:
* Original usage was to indicate what the item referred to.
* Popular usage today tends to be to indicate where the item itself resides (generally referred to as the permalink).
=== Link ===
["
imho, link should be used only to link to the article being described by the post, it should only be used in the TLD context. I believe that was a very solid application and shouldn't be muddied.
"] - <% http://archive.scripting.com/2003/06/25#theLizardBrainOfRss %>
=== Atom link ===
Based on [http://www.w3.org/TR/1998/REC-html40-19980424/struct/links.html#h-12.3 HTML 4.0]:
* href - destination of link
* rel - type of relation
* type - MIME type
* title - for human consumption
=== Rel example values ===
* alternate
* service.post, service.feed, service.edit
* start, next, prev
* related
* [http://www.ietf.org/rfc/rfc2731.txt extensible]
=== Guid / rdf:about ===
Definition:
* RSS 2.0: "guid stands for globally unique identifier. It's a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new."
* RSS 1.0: "{item_uri} must be unique with respect to any other rdf:about attributes in the RSS document and is a URI which identifies the item."
=== Guid / rdf:about ===
Example:
* [: http://intertwingly.net/slides/2003/seybold/ :]
* [: :]
Aggregators like NetNewsWire support [http://groups.yahoo.com/group/NetNewsWire/message/889 both equally]
=== Atom id ===
Unique identifier
* must be a URI
** urn:, tag:, ...
* need not be a URL
** http:, ftp:, ...
=== Description ===
Definitions:
* RSS 2.0: "The item synopsis"
* RSS 1.0: "A brief description/abstract of the item."
* RSS 0.91: "a plain text description of an item"
Example:
* [: RSS is an XML-based technology... :]
=== Escaping again ===
["
The RSS2 spec, marvel of informality that it is, notes in passing that (entity-encoded HTML is allowed) with no words about what this might mean or how such HTML might be interpreted. This underspecification (inherited from many previous versions of RSS) leads to really stupid behavior even in good software
"] - <% http://www.tbray.org/ongoing/When/200x/2003/04/22/RSS-Problems %>
=== Other problems ===
* Markup often contains relative URLs. Attempts to [http://blogs.law.harvard.edu/tech/discuss/msgReader$171?mode=topic retroactively fix this] for RSS have been inconclusive.
Atom will adopt [http://www.w3.org/TR/xmlbase/ xml:base].
=== Other problems ===
* Markup often contains relative URLs. Attempts to [http://blogs.law.harvard.edu/tech/discuss/msgReader$171?mode=topic retroactively fix this] for RSS have been inconclusive. Atom will adopt [http://www.w3.org/TR/xmlbase/ xml:base].
* Despite being clearly documented as a synopsis or abstract, quite a number of blogging tools put the full content of the blog entry in this element.
=== Other problems ===
* Markup often contains relative URLs. Attempts to [http://blogs.law.harvard.edu/tech/discuss/msgReader$171?mode=topic retroactively fix this] for RSS have been inconclusive. Atom will adopt [http://www.w3.org/TR/xmlbase/ xml:base].
* Despite being clearly documented as a synopsis or abstract, quite a number of blogging tools put the full content of the blog entry in this element.
** Solution: split description into two tags:
*** summary
*** content
=== Atom summary ===
*Optional*
* Default is text/plain, i.e., *no markup*
* Markup is permitted, *iff* explicitly indicated, examples
** mode="escaped"
** type="text/html"
=== Atom content ===
*Optional*
* Default is text/plain, i.e., *no markup*
* Markup is permitted, *iff* explicitly indicated, examples
** mode="escaped"
** type="text/html"
=== Atom Data Model ===
Data model (part II)
* ...
* author
* contributor
* issued
* created
* modified
=== RSS author/dc:creator ===
Definition:
* RSS 2.0: author
** Email address of the author of the item.
* Dublin Core: creator
** The person or organization primarily responsible for creating the intellectual content of the resource.
=== Author/Contributor ===
Nested elements for:
* name
* url
* email
=== Dates ===
* Issued date (for display)
* Last modification (for differences)
* Initial creation
=== Extensibility ===
* Adding child elements w/namespaces
** new "columns"
* Linking to data
** new "tables"
=== Example: events ===
Scenario: somebody you know mentions that she is going to be at a certain place at a certain time, like a conference.
=== Example: events ===
Scenario: somebody you know mentions that she is going to be at a certain place at a certain time, like a conference.
* Wouldn't it be nice if you could directly add this information to your calendar?
=== Example: events ===
Scenario: somebody you know mentions that she is going to be at a certain place at a certain time, like a conference.
* Wouldn't it be nice if you could directly add this information to your calendar?
* There already are [http://www.imc.org/pdi/ existing, well known] file formats for such data exchange (albeit non-XML).
=== Example: events ===
Scenario: somebody you know mentions that she is going to be at a certain place at a certain time, like a conference.
* Wouldn't it be nice if you could directly add this information to your calendar?
* There already are [http://www.imc.org/pdi/ existing, well known] file formats for such data exchange (albeit non-XML).
* Solution: [: :]
=== Other ===
* xml:base
* xml:lang
* Dublin Core
=== Agenda ===
* Background
* Core model
* *Syndication*
* "API"
* Web Accessible Archive
=== Syndication ===
* Original use of RSS assumed client rendered every item on every retrieval
* For syndication, it is important to have a consistent scheme that one can rely on to:
** Identify if an entry has been seen before
** Identify if an entry has been changed
=== Syndication elements ===
*Required* elements:
[: http://.../ :]
[: 2003-12-08T17:00:00Z :]
[" *The number of optional features in XML is to be kept to the absolute minimum, ideally zero*.
As a result of this, any XML document has a high probability of being handled successfully by any XML processor."] - <% Design Principles for XML %>
=== Bandwidth ===
If you are serving data for a large number of customers, bandwidth is a concern. Conditional GET helps, but does not eliminate the concern about data size.
[: :]
[: :]
[: 2003-12-08T17:00:00Z :]
=== Agenda ===
* Background
* Core model
* Syndication
* "*API*"
* Web Accessible Archive
=== "API" ===
Instead of focusing on Applications, the focus is on Data
* Post
* Get
* Put
* Delete
Clients may optionally use [http://www.intertwingly.net/wiki/pie/DifferentlyAbledClients SOAP].
=== API: What? ===
~ WYSIWYG
Want to post an entry? POST an Atom entry. What could be simpler?
=== API: Where? ===
Resources are identified via links
[: :]
[: :]
... placed directly in the feed.
=== Agenda ===
* Background
* Core model
* Syndication
* "API"
* *Web Accessible Archive*
=== Archive ===
* Atom entries contain both metadata and data
* Connected together via links
* Organized by date or category or whatever makes sense
=== Archive ===
link rel="..."
* prev
* next
* index
* ...
=== Resources ===
* Documentation: [http://www.mnot.net/drafts/draft-nottingham-atom-format-01.html format] [http://bitworking.org/projects/atom/draft-gregorio-09.html API]
* Validator: [http://feedvalidator.org/ feedvalidator]
* Coming soon: Voluntary compliance test suite
=== Summary ===
* content is a rich source of metadata
* clients should be able to rely on required elements
** improved user experience
* escaping and markup OK, *if* annotated
** improved user experience
* data access vs "API"
* links, links, links