It’s just data

Show Me...

Danny Ayers: So how would you do this: “Show me all the posts tagged ‘sun’ in the last month by the bloggers aggregated at Planet Intertwingy”

It occurred to me that I had all the parts readily available to mock up something for this by applying XPath to the canonicalized Atom entries that Venus caches.

To answer Danny’s question, the following query turns up a number of results:

//atom:category[@term='Sun']

Which I guess provides an answer of sorts to Tim’s question, namely “any scheme you feel like”.
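For anyone who wants to try the idea offline, here is a minimal sketch of that query applied to a handful of entries. It uses Python's standard xml.etree.ElementTree, which is an assumption on my part and only supports a subset of XPath (hence the leading dot in the path). The sample entries and the matching_titles helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix binding for the query; this URI is the Atom 1.0 namespace.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# ElementTree needs a relative path, so '//...' becomes './/...'.
QUERY = ".//atom:category[@term='Sun']"

# Hypothetical entries standing in for the cached, canonicalized files.
ENTRIES = [
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title>Tagged Sun</title><category term="Sun"/></entry>',
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title>Tagged sun</title>'
    '<category term="sun" scheme="http://example.com/tags"/></entry>',
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title>Untagged</title></entry>',
]


def matching_titles(entries, query):
    """Return the title of every entry in which the query selects a node."""
    titles = []
    for xml_text in entries:
        entry = ET.fromstring(xml_text)
        if entry.find(query, NS) is not None:
            titles.append(entry.findtext("atom:title", default="", namespaces=NS))
    return titles


print(matching_titles(ENTRIES, QUERY))
```

The predicate looks only at the term attribute, so the scheme is ignored, and the comparison is case-sensitive: an entry tagged ‘sun’ does not match ‘Sun’.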

You can also use this to find posts which contain “♫” characters in titles (Tim’s not the only one!), or use MathML (Hi Jacques!).

I’m not sure if I am going to continue with this experiment, but while it is up, you are welcome to play with it.  Note: this implementation currently makes ample use of features like XPath so it is unlikely to work on IE just yet; in fact it may not work on any browser but Mozilla.

Most of the fun in creating this came not from the server side query logic, but from getting the client side AJAX aspects working.  For client to server, I settled on application/x-www-form-urlencoded, and discovered that escape doesn’t handle non-ASCII characters and that encodeURIComponent should be used instead.  For server to client, I use XML, which turns out to be quite a reasonable way to represent a result set consisting of Atom 1.0 entries.  As the text constructs have already been canonicalized to XHTML, importNode can be used directly even if the content contains MathML and/or SVG.
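To illustrate the encoding point from the receiving end, here is a small sketch using Python's urllib.parse. The form field name xpath and the sample query are assumptions, not what the live page sends. It shows that UTF-8 percent-encoding (the approach encodeURIComponent takes for non-ASCII characters) round-trips through a standard form parser, while the %uXXXX form that escape emits does not.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query containing a non-ASCII character.
query = "//atom:title[contains(., '♫')]"

# urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
body = urlencode({"xpath": query})

# A standard form parser on the server recovers the original string.
decoded = parse_qs(body)["xpath"][0]

# JavaScript's escape() emits %u266B for this character, which is not
# valid percent-encoding, so the parser leaves it undecoded.
legacy = parse_qs("xpath=%u266B")["xpath"][0]

print(body)
print(decoded == query)
print(legacy)
```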

If I do continue, I’ll likely add alternative query syntaxes (like simple word matches or regexps), and add embellishments (like a sidebar that sorts sources by name and dynamically updates counters of posts scanned and posts found).

But even with a query syntax that only a geek could love, I have a renewed enthusiasm for the serendipity that aggressive normalization enables.


Can you expose:

function ns(prefix) {
  if (prefix == 'atom') return 'http://www.w3.org/2005/Atom';
  if (prefix == 'xhtml') return 'http://www.w3.org/1999/xhtml';
  return null;
}

To the input page somehow (and show what prefixes are bound to which namespaces), otherwise you can’t “find posts which... use MathML”, no?

I have, er...:

_imp.xml.xpathNamespaces.gecko = function (wrapper, namespacemap)
{
	wrapper._namespaceresolver = (function (prefix)
		{
			return namespacemap[prefix] || null;
		});
}

Pass a dict of prefix/namespace pairs, which the wrapper turns into a resolver func for document.evaluate.

Posted by anon at

Can you expose: ... To the input page somehow (and show what prefixes are bound to which namespaces), otherwise you can’t “find posts which... use MathML”, no?

That’s client side, so it only needs to support the prefixes that I happen to use in my hardcoded XPath expressions.  More relevant to you is the following:

ctxt.xpathRegisterNs('atom','http://www.w3.org/2005/Atom')
ctxt.xpathRegisterNs('xhtml','http://www.w3.org/1999/xhtml')
ctxt.xpathRegisterNs('mathml','http://www.w3.org/1998/Math/MathML')
ctxt.xpathRegisterNs('svg','http://www.w3.org/2000/svg')
ctxt.xpathRegisterNs('xlink','http://www.w3.org/1999/xlink')
ctxt.xpathRegisterNs('planet','http://planet.intertwingly.net/')

Which means that one can find a list of posts which include MathML with the following query:

//mathml:*

Yes, in the fullness of time, making that list something that you can discover by means other than perusing the source code would be a good thing to do.
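As a rough sketch of both ideas (a discoverable list of bindings, and a test for whether an entry uses a given namespace), the following uses Python's standard xml.etree.ElementTree rather than the bindings shown above. The NAMESPACES dict covers only a subset of the registered prefixes, and the sample entries and helper names are hypothetical. Walking the tree stands in for evaluating //prefix:* directly.

```python
import xml.etree.ElementTree as ET

# A subset of the prefix bindings registered on the server.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "xhtml": "http://www.w3.org/1999/xhtml",
    "mathml": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
}

# Hypothetical entries: one with MathML in its content, one without.
MATH_ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>Math</title>'
    '<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    "</div></content></entry>"
)
PLAIN_ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>Plain</title></entry>'
)


def prefixes_used(entry_xml):
    """Return the sorted prefixes whose namespace has an element in the entry."""
    root = ET.fromstring(entry_xml)
    # ElementTree spells qualified tags as '{namespace-uri}local-name'.
    uris = {el.tag[1:].split("}", 1)[0]
            for el in root.iter() if el.tag.startswith("{")}
    return sorted(p for p, uri in NAMESPACES.items() if uri in uris)


def uses(entry_xml, prefix):
    """Rough equivalent of asking whether '//prefix:*' selects anything."""
    return prefix in prefixes_used(entry_xml)


# Listing the bindings is one way to make them discoverable.
for prefix, uri in sorted(NAMESPACES.items()):
    print(prefix, uri)

print(uses(MATH_ENTRY, "mathml"), uses(PLAIN_ENTRY, "mathml"))
```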

Posted by Sam Ruby at

Your xpath() function should call evaluate() on the node’s owner document. Fx trunk throws a WRONG_DOCUMENT_ERR since you try to use the page’s document to evaluate an XHR node.

Posted by Robert Sayre at

Your xpath() function should call evaluate() on the node’s owner document.

Fixed.  Thanks!

Posted by Sam Ruby at

Ah.. er.. yeah.. brain rot. In browser-mode at the moment and somehow mentally morphed the js into Where The Problem Was. Actually reading the code would have shown it was a server-side-on-all-entries thing not client-side-on-current-feed thing.

Posted by Anon at

Sam, regarding the client ⇔ server communication, why are you using an HTTP POST to fetch the feed, and why are you using a custom wrapper element?  The service seems to respond just fine to GET in that I can link you directly to the posts answering Danny’s question.  Likewise, I don’t see the need for a custom wrapper element – you’re essentially returning an Atom feed so why not return one?  While you’re at it, why not build this functionality directly into Venus such that a user can query planet and get back a content-negotiated representation (XHTML or Atom) of the feed?  It’d allow for ad-hoc feed customization and let users sift for what they want to read from all of the sources on the planet.  Want only posts mentioning OpenID?  No problem.

Posted by Justin at

why are you using an HTTP POST to fetch the feed

Mainly because at this early point I am concerned about whether this will be performant and scale, and until I am sure, I want to discourage links.  Not that secrets last very long on the Internet.  :-)

At the moment, I’m parsing each entry individually, and applying the XPath expression to each document.  I may find that I get considerable speedup by concatenating the files in groups of 10 or 100, parsing the batch, and then selecting from that document the entries which match the supplied expression.
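A minimal sketch of the batching idea, again with Python's standard xml.etree.ElementTree as a stand-in. It assumes each cached entry is a bare element with no XML declaration, so that several can be concatenated inside a wrapper; the batch wrapper name, make_entry, and query_batched are all hypothetical. Whether this is actually faster is something only measurement would show.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}
QUERY = ".//atom:category[@term='Sun']"


def make_entry(n):
    """Build a hypothetical cached entry; every fifth one is tagged 'Sun'."""
    tag = '<category term="Sun"/>' if n % 5 == 0 else ""
    return ('<entry xmlns="http://www.w3.org/2005/Atom">'
            "<id>urn:example:%d</id>%s</entry>" % (n, tag))


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_batched(entry_docs, query, size=10):
    """Parse entries in groups and return the entry elements that match."""
    matches = []
    for group in batches(entry_docs, size):
        # One parse per group instead of one parse per entry.
        wrapper = ET.fromstring("<batch>" + "".join(group) + "</batch>")
        for entry in wrapper.findall("atom:entry", NS):
            if entry.find(query, NS) is not None:
                matches.append(entry)
    return matches


entries = [make_entry(n) for n in range(25)]
found = query_batched(entries, QUERY, size=10)
print([e.findtext("atom:id", namespaces=NS) for e in found])
```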

I’d also like to look into whether it makes sense to build in support for Feed Paging and Archiving.

and why are you using a custom wrapper element?

If it becomes a full feed, I will want to make sure that it is valid (has an id, etc).  That shouldn’t be a problem.  In fact, the client is already designed in a way that it doesn’t care what the wrapper element is.

The service seems to respond just fine to GET

At the moment, but if that becomes an issue, I will wire that off.  Conversely, convince me of what it will enable, and I may invest more time into it.

Better yet, feel free to deploy this and experiment with it on your server.  All my code, and even my subscription lists, are there for the taking...

why not build this functionality directly into Venus

This likely will be included into Venus at some point, particularly if it matures to the point where others seem interested in deploying it.

Posted by Sam Ruby at

(’Man’s name is Ayers, not Ayres. (Rob/Danny crossover?))

Posted by Aristotle Pagaltzis at

Ayers.

Posted by Phil Wilson at

Man’s name is Ayers, not Ayres

Fixed.  Thanks!

Posted by Sam Ruby at

Hmmm. Well, for some reason, the MathML seems particularly screwed up in this application (try, e.g., this query and compare with the MathML in the original).

It seems to be a font issue, and it’s not (simply) a side-effect of injecting MathML nodes into the DOM. That seems to work fine.

[There’s an unrelated cosmetic issue which could be fixed with a

.numberedEq {float:right}

in your stylesheet. Entirely my fault.]

Posted by Jacques Distler at

Shown! (...now about that 'RDF Tax'...)

Me : So how would you do this: "Show me all the posts tagged ‘sun’ in the last month by the bloggers aggregated at Planet Intertwingy"? Sam Ruby : //atom:category[@term='Sun'] The background is a couple of days ago Tim Bray proposed a URN scheme for...

Excerpt from Planet RDF at

Searchable Planet

It occurred to me that supporting XPath addresses the 20% side of the 80/20 rule, but that is easy enough to rectify: here is a more general planet intertwingly query page. It needs a lit... [more]

Trackback from Sam Ruby at
