RestEchoSearchApi


Introduction

Rationale

The current API proposals include a simple GET-based search entry point. More sophisticated interfaces are likely to appear; this document discusses some issues relating to them, and considers how far the various interfaces can be harmonised.

Observations

Work required

There are two levels of detail here, the first being more conceptual (questions like "should we do this?" and "where does this fit?"), and the second being more detailed.

Conceptual work

Detailed work

Search vocabulary

The search vocabulary gives a mapping from elements and attributes of the data of a feed to textual nouns that can be used in searches.

Some search structures may not need a vocabulary separate from what the standard <feed> XML format defines (the document-centric structure presented below does not, for instance). To stay consistent with those, the other search structures, including the simple GET interface, should share a single vocabulary derived from, and as close as possible to, the XML format names. Structures that do use the vocabulary may choose to restrict the nouns they make accessible.

As well as nouns derived from the data, meta-nouns may prove useful. An example from the draft API would be for the position of an entry in the entire feed, ordered by (decreasing) modification time. This might be called 'modification-position', or something equivalent but more terse. (In the current draft API this is pretty much the only noun, and so isn't given explicitly; since here we are considering richer searches, even with the GET interface, we prefer to make anything that might be searched against into an explicit noun or meta-noun.)
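
To make the meta-noun concrete, here is a minimal sketch of how 'modification-position' could be computed. The dictionary layout for entries and the choice to start counting at 1 are assumptions made for illustration; neither comes from the draft API.

```python
from datetime import datetime, timezone


def modification_positions(entries):
    """Map each entry id to its position in the feed, ordered by
    decreasing modification time (position 1 is the newest entry)."""
    ordered = sorted(entries, key=lambda e: e["modified"], reverse=True)
    return {e["id"]: pos for pos, e in enumerate(ordered, start=1)}


# Hypothetical entries; only 'id' and 'modified' matter here.
entries = [
    {"id": "a", "modified": datetime(2003, 8, 5, 18, 30, 2, tzinfo=timezone.utc)},
    {"id": "b", "modified": datetime(2003, 8, 7, 12, 17, 31, tzinfo=timezone.utc)},
]
positions = modification_positions(entries)
```

A search against this meta-noun then reduces to a comparison on the computed position, just like a search against any noun taken from the data.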

Proposed vocabulary mapping

A simple mapping from the XML data to textual nouns would be to take advantage of the element/attribute hierarchy, from the <feed> element down to the element or attribute in question. For instance, given the XML structure as follows:

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <!-- required elements -->
  <title>dive into mark</title>
  <link>http://diveintomark.org/</link>
  <modified>2003-08-05T12:29:29Z</modified>
  <!-- optional elements -->
  <tagline>A lot of effort went into making this effortless</tagline><!-- 1 -->

  <id>tag:diveintomark.org,2003:3</id>
  <generator name="Movable Type">http://www.movabletype.org/?v=2.64</generator><!-- 2 name attribute -->
  <copyright>Copyright (c) 2003, Mark Pilgrim</copyright>

  <entry>
    <!-- required elements -->
    <title>Atom 0.2 snapshot</title>

    <link>http://diveintomark.org/2003/08/05/atom02</link>
    <id>tag:diveintomark.org,2003:3.2397</id>
    <issued>2003-08-05T08:29:29-04:00</issued>
    <modified>2003-08-05T18:30:02Z</modified><!-- 3 -->
    <!-- optional elements -->
    <created>2003-08-05T12:29:29Z</created>

    <summary>The Atom 0.2 snapshot is out.  Here are some sample feeds.</summary>
    <author>
      <name>Mark Pilgrim</name>
      <url>http://diveintomark.org/</url><!-- 4 -->
      <email>f8dy@example.com</email>
    </author>
    <contributor>

      <name>Sam Ruby</name>
      <url>http://intertwingly.net/blog/</url>
      <email>rubys@example.com</email>
    </contributor>
    <contributor>
      <name>Joe Gregorio</name>
      <url>http://bitworking.org/</url>

      <email>joe@example.com</email>
    </contributor>
    <content type="application/xhtml+xml" mode="xml" xml:lang="en-us"><!-- 5 xml:lang attribute -->
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>The Atom 0.2 snapshot is out ... [snip]</p>
      </div>
    </content>
  </entry>
</feed>

we could map the commented elements and attributes to the following nouns:

  1. tagline

  2. generator@name

  3. entry.modified

  4. entry.author.url

  5. entry.content@xml:lang

Note that there cannot be mappings for anything inside the <content> element, since that is the boundary of what our XML structure concerns itself with.

'.' was chosen as a separator for elements because it is readable and convenient, and we are unlikely to generate XML element names containing it.

'@' was chosen as a separator for attributes by analogy with XPath. '.' could be used instead, and this may prove clearer - the natural reading of 'generator@name' is almost exactly the opposite of the intention here.

This mapping does not consider extension elements or attributes in different namespaces. It is intended that search structures that do not need a vocabulary would be used in these cases.
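
The mapping above can be sketched in a few lines of Python using the standard library's ElementTree. This is an illustration under assumptions rather than a definitive implementation: the function names are invented here, the 'feed' level is dropped from the noun as in the proposal, anything below <content> is ignored, and elements or attributes from other namespaces are skipped (apart from the xml: namespace, which the xml:lang example needs).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def local_name(tag):
    # ElementTree spells qualified names as '{namespace}local'.
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def attribute_name(key):
    if not key.startswith("{"):
        return key
    if key.startswith("{" + XML_NS + "}"):
        return "xml:" + local_name(key)  # e.g. xml:lang
    return None  # attribute from an extension namespace: not in the vocabulary


def nouns(feed_xml):
    """Return the sorted list of nouns reachable from the <feed> element."""
    found = set()

    def walk(element, path):
        for child in element:
            if not child.tag.startswith("{" + ATOM_NS + "}"):
                continue  # extension elements are outside the vocabulary
            child_path = path + [local_name(child.tag)]
            noun = ".".join(child_path)
            found.add(noun)
            for key in child.attrib:
                name = attribute_name(key)
                if name is not None:
                    found.add(noun + "@" + name)
            if local_name(child.tag) != "content":
                walk(child, child_path)  # <content> is the boundary

    walk(ET.fromstring(feed_xml), [])
    return sorted(found)


SAMPLE = """<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <tagline>A lot of effort went into making this effortless</tagline>
  <generator name="Movable Type">http://www.movabletype.org/?v=2.64</generator>
  <entry>
    <modified>2003-08-05T18:30:02Z</modified>
    <author><url>http://diveintomark.org/</url></author>
    <content type="application/xhtml+xml" mode="xml" xml:lang="en-us">
      <div xmlns="http://www.w3.org/1999/xhtml"><p>[snip]</p></div>
    </content>
  </entry>
</feed>"""

vocabulary = nouns(SAMPLE)
```

For the cut-down sample, the resulting vocabulary includes all five nouns listed above, and nothing from inside <content>.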

Need we consider anything outside <entry>?

There is an open question: is there anything outside the <entry> element itself we might want to filter on? With the current XML structure, it seems unlikely, but this may change. If not, clearly a common subpart of the mapped nouns could be dropped (e.g. the 'entry.' prefix in the above proposal; nouns not sharing it would not be accessible anyway).

Search structures

GET query

The GET interface effectively gives a very simple search structure, where a (small) set of filters is provided, all of which must be satisfied for an entry to appear in the search results.

Each filter takes the form of a boolean-valued binary operator from the search syntax, acting on (the value of) a noun from the vocabulary and a literal value. For instance, you might filter using the contains operator, the 'entry.author.name' noun and the literal string 'James' to find all entries written by someone called James.

The filters are expressed in the GET query string as '&'-separated attribute=value pairs, with the attribute formed from the noun and operator, and the value being the literal value from the filter. The attribute name is constructed as 'search-' noun ['-' operator], with an omitted operator taken to be the equality operator.

For example, with a query API URI of /search:

GET /search?search-modified-order-gt=15
GET /search?search-entry.title=My+entry
GET /search?search-entry.author.name-contains=James&search-entry.modified-gt=2003-08-05
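
A server might turn such a query string back into filters roughly as follows. This is a sketch only. The operator set (including the names 'eq' and 'lt') is an assumption, as is resolving the noun/operator split by checking the text after the last '-' against that set; entries are represented as plain dictionaries keyed by noun, and values are compared as strings, which is a simplification.

```python
from urllib.parse import parse_qsl

PREFIX = "search-"
# Hypothetical operator set; only 'contains' and 'gt' appear in the examples.
OPERATORS = {
    "eq": lambda actual, wanted: actual == wanted,
    "contains": lambda actual, wanted: wanted in actual,
    "gt": lambda actual, wanted: actual > wanted,
    "lt": lambda actual, wanted: actual < wanted,
}


def parse_filters(query_string):
    """Return a list of (noun, operator, value) triples."""
    filters = []
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if not name.startswith(PREFIX):
            continue  # not a search filter
        rest = name[len(PREFIX):]
        noun, sep, suffix = rest.rpartition("-")
        if sep and suffix in OPERATORS:
            filters.append((noun, suffix, value))
        else:
            filters.append((rest, "eq", value))  # omitted operator: equality
    return filters


def matches(entry, filters):
    """An entry appears in the results only if every filter is satisfied."""
    return all(
        noun in entry and OPERATORS[op](entry[noun], wanted)
        for noun, op, wanted in filters
    )


filters = parse_filters(
    "search-entry.author.name-contains=James&search-entry.modified-gt=2003-08-05"
)
entry = {
    "entry.author.name": "James Aylett",
    "entry.modified": "2003-08-05T18:30:02Z",
}
found = matches(entry, filters)
```

Splitting on the last '-' keeps nouns that themselves contain a hyphen usable, provided no noun ends in something that looks like an operator name.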

Document-centric

The POST data could contain the skeleton of a <feed> element, but with the contents of the different searchable parts of the XML being filter expressions in the relevant search syntax.

An example, using XPath as the search syntax (derived from the "Atom 0.2 snapshot"):

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <!-- Put no clause on the title -->
  <title />
  <link />
  <modified />

  <entry>

    <title />

    <author>
      <!-- Use the XPath 'contains' function to find all entries where the author's name contains 'James' -->
      <name>contains(., 'James')</name>
    </author>

    <!-- Get entries where 'modified' is later than 5 August 2003 -->
    <modified>. &gt; 2003-08-05</modified>
  </entry>
</feed>

Any elements provided in the POSTed search data would be supplied for entries returned in the search. For example, the above search might return:

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.2" xmlns="http://purl.org/atom/ns#search-results">
  <title>My Feed</title>
  <link>http://example.com/myfeed/</link>
  <modified>2003-08-05T12:29:29Z</modified>

  <entry>
    <title>Mumble mumble title</title>
    <author>
      <name>James Aylett</name>
    </author>
    <modified>2003-08-05T18:30:02Z</modified>
  </entry>

  <entry>
    <title>Another entry</title>
    <author>
      <name>James Aylett</name>
    </author>
    <modified>2003-08-07T12:17:31Z</modified>
  </entry>
</feed>

(A different namespace needs to be used, which implies another schema, because we may omit elements that the usual XML format declares as required.)
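
As a rough illustration of how a server could read such a skeleton, the sketch below separates the elements requested in the results from the filter expressions attached to them. The function name and the returned shapes are hypothetical, and the filter expressions are collected as text only; evaluating them is left to whichever search syntax is in use.

```python
import xml.etree.ElementTree as ET


def read_skeleton(search_xml):
    """Return (requested, filters): the leaf elements to supply in the
    results, and the filter expression for each leaf that carries one."""
    requested, filters = [], {}

    def walk(element, path):
        for child in element:
            child_path = path + [child.tag.split("}", 1)[-1]]
            if len(child):  # container such as <entry> or <author>
                walk(child, child_path)
                continue
            noun = ".".join(child_path)
            requested.append(noun)
            expression = (child.text or "").strip()
            if expression:  # empty elements put no clause on the search
                filters[noun] = expression

    walk(ET.fromstring(search_xml), [])
    return requested, filters


SKELETON = """<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <title />
  <link />
  <modified />
  <entry>
    <title />
    <author>
      <name>contains(., 'James')</name>
    </author>
    <modified>. &gt; 2003-08-05</modified>
  </entry>
</feed>"""

requested, filters = read_skeleton(SKELETON)
```

Here every leaf element is requested in the output, while only entry.author.name and entry.modified carry a filter.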

Query-centric

A simpler structure than the above would focus on the query in the POST data. It needs to give the query itself (in whatever search syntax is chosen), and also potentially which elements of the feed, entry and so forth to supply in the results. This is more compact than the document-centric structure, and has the advantage of grouping all filters into one place.

An example, using XQuery as the syntax:

<?xml version="1.0" encoding="utf-8"?>
<search-entries xmlns="http://purl.org/atom/ns#search-query-centric">
  <query>//entry[fn:contains(author/name, "James") and modified gt "2003-08-05"]</query>
</search-entries>

For this structure, it appears much harder to find a convenient way of asking for only some elements to appear in the result document. It might be best to restrict this structure to just the query itself, perhaps with a choice of result-by-reference (as with the current GET interface in the draft API) and result-inline (as our examples here work).
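
If the structure is restricted to just the query, reading the POST data becomes very simple. A minimal sketch, assuming the element names and namespace from the example above; the function name and the error handling are invented for illustration:

```python
import xml.etree.ElementTree as ET

QUERY_NS = "http://purl.org/atom/ns#search-query-centric"


def read_query(search_xml):
    """Return the query text from a query-centric search document."""
    root = ET.fromstring(search_xml)
    if root.tag != "{%s}search-entries" % QUERY_NS:
        raise ValueError("not a query-centric search document")
    query = root.findtext("{%s}query" % QUERY_NS)
    if query is None or not query.strip():
        raise ValueError("missing or empty <query>")
    return query.strip()


REQUEST = """<search-entries xmlns="http://purl.org/atom/ns#search-query-centric">
  <query>//entry[fn:contains(author/name, "James") and modified gt "2003-08-05"]</query>
</search-entries>"""

query = read_query(REQUEST)
```

The query string is then handed unchanged to whatever evaluates the chosen search syntax.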

Search syntaxes

Simple GET

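The simple GET interface is designed to work with the GET query structure, and so need only define its operators. At least those used in the examples above are expected to be required: equality (the default when the operator is omitted), 'contains' and 'gt'.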

XPath

XPath is a fairly natural syntax to consider for an XML-based format. An example was given above.

XQuery

XQuery is a fairly natural syntax to consider for an XML-represented database. An example was given above.

Discussion

[JamesAylett] I want to leave this all here to get comments, but my feelings at the moment are that for the vocabulary we should just consider within <entry>, use '.' to separate both attributes and elements, give access to everything for the GET interface, and start locking down meta-nouns. If necessary, we could come up with a way for the GET structure to complain that it doesn't recognise all your nouns (and return nothing). Given there isn't that much even in the maximal example of 0.2 on Mark's site, and I'm explicitly barring extensions from the vocabulary, this shouldn't be too onerous (but is really powerful).

I also think, after much consideration, that we really are going to need at least those three search structures, as they really seem to solve different types of problem.

[JoeGregorio] Thoughts on finding entries to edit

[JamesAylett] Nice. Having the entryURI (i.e. a URI where the Pie XML representation is available) in the main feed document strikes me as a major step forward; advocating having it linked from the HTML version is even better. This might make search an unnecessary part of the editing facet of PieApi, so that search is a completely orthogonal facet.


CategoryApi