http://intertwingly.net/blog/1601.atom ../favicon.ico Sam Ruby It’s just data Sam Ruby rubys@intertwingly.net /blog/ 2008-04-27T07:24:21-04:00 tag:intertwingly.net,2004:1601 Fun with XPath

Last twenty weblog entries which:

2003-09-26T12:02:26-04:00
tag:intertwingly.net,2004:1601-1064613823 http://www.kuro5hin.org/user/Carnage4Life/diary 12-228-162-69.client.attbi.com form Dare Obasanjo Fun with XPath

Last 20 with a comment by me:

http://www.intertwingly.net/blog/?q=//atom:feed[contains(atom:entry/atom:author,%20'Obasanjo')]

Not bad. It seems your XPath engine doesn't support multiple boolean expressions in the predicate. I tried the following query but kept getting 404s:

http://www.intertwingly.net/blog/?q=//atom:feed[contains(atom:entry/atom:author,%20'Winer')%20and%20contains(atom:entry/atom:author,%20'Pilgrim')%20]

2003-09-26T13:03:00-04:00
tag:intertwingly.net,2004:1601-1064617192 http://icite.net/blog/ 205.141.209.19 form Jay Fienberg Fun with XPath

Seeing that both //atom and //xhtml nodes are supported, how does your search engine handle those? Does it look at a single data set that includes both atom and xhtml versions of your posts, or does it route to two different data sets?

2003-09-26T13:59:52-04:00
tag:intertwingly.net,2004:1601-1064617861 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath
Dare, it seems that Mark Pilgrim simply signs his name as Mark.  A more reliable indicator would be the presence of "diveintomark" in the url, thus: posts containing comments by both Dave Winer and Mark Pilgrim.
2003-09-26T14:11:01-04:00
tag:intertwingly.net,2004:1601-1064621407 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath

Jay, single dataset.  The atom version of my posts contains the xhtml version of my posts, inside the <atom:content> element.

2003-09-26T15:10:07-04:00
tag:intertwingly.net,2004:1601-1064628792 http://www.decafbad.com/blog 68.75.212.38 form l.m.orchard Fun with XPath

Wow, nice stuff.  You're doing this with a lots-of-little-files XML repository?

Keep doing this, and I might just have to clean up my blogging data and start playing with this.  :)

2003-09-26T17:13:12-04:00
tag:intertwingly.net,2004:1601-1064630973 http://dannyayers.com host89-204.pool80182.interbusiness.it form Danny Fun with XPath

Very nice. You won't of course forget this is tying the relational semantics to a tree structure, and that has inherent limitations...

2003-09-26T17:49:33-04:00
tag:intertwingly.net,2004:1601-1064638890 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath

Les: yes, lots of little files.

Danny: care to identify a tangible limitation?  I like a good challenge...

2003-09-26T20:01:30-04:00
tag:intertwingly.net,2004:1601-1064646395 http://icite.net/blog/ adsl-63-201-94-189.dsl.snfc21.pacbell.net form Jay Fienberg Fun with XPath

Thanks for the info Sam.

To pick up your challenge to Danny about relational vs tree:

How about: show all the entries that have the same first word (i.e., without specifying what that word is).

This is a kind-of relational (recursive) join query.

But, in general, I bet that searching for matches on blog entries probably isn't really a case where the relational / tree structure limitations can be really explored.

The blog entries are, in one sense, essentially a single table (see my "table" syndication format view of my blog at: http://icite.net/blog/200309/really_tabular_synidcation.html ).

2003-09-26T22:06:35-04:00
tag:intertwingly.net,2004:1601-1064678442 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath

Jay, I may be misunderstanding what you are suggesting, but that sounds to me like something XSLT excels at.  In pseudo-code, what one can do with XSLT is:

foreach entry
  $id=id
  $word=substring-before(entry.content,' ')
  foreach preceding::entry(word=$word)
    print match($id,id)
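
For readers who want to try the idea outside XSLT, here is a minimal Python sketch of the pseudo-code above, using only the standard library. It is not the code behind this site: the sample feed, the Atom 1.0 namespace URI, and the names `first_word` and `same_first_word_pairs` are all assumptions made for illustration. Unlike the pseudo-code, it also skips entries whose content has no space (an empty first word), so that such entries are not all reported as matching each other.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace. A feed from 2003 may use a different URI.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample data, not taken from the real feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>e1</id><content>Wow nice stuff</content></entry>
  <entry><id>e2</id><content>Thanks for the info</content></entry>
  <entry><id>e3</id><content>Wow again Sam</content></entry>
  <entry><id>e4</id><content>Thanks Sam</content></entry>
</feed>"""


def first_word(text):
    """Mirror XPath substring-before(text, ' '): empty string if no space."""
    head, sep, _ = (text or "").partition(" ")
    return head if sep else ""


def same_first_word_pairs(feed_xml):
    """Pair each entry with every preceding entry that shares its first word."""
    entries = ET.fromstring(feed_xml).findall(ATOM + "entry")
    seen = []   # (id, first word) of the entries visited so far
    pairs = []
    for entry in entries:
        entry_id = entry.findtext(ATOM + "id")
        word = first_word(entry.findtext(ATOM + "content"))
        for earlier_id, earlier_word in seen:
            # Skip empty first words (a deviation from the pseudo-code).
            if word and word == earlier_word:
                pairs.append((entry_id, earlier_id))
        seen.append((entry_id, word))
    return pairs


if __name__ == "__main__":
    for current_id, earlier_id in same_first_word_pairs(SAMPLE_FEED):
        print(current_id, "matches", earlier_id)
```

The inner loop plays the role of `preceding::entry`: each entry is only compared with the entries that came before it in document order, so every matching pair is reported once.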
2003-09-27T07:00:00-04:00
tag:intertwingly.net,2004:1601-1064690042 http://www.kuro5hin.org/user/Carnage4Life/diary tide103.microsoft.com form Dare Obasanjo Fun with XPath

Sam,
  I'm trying to figure out why your query is so complex. Why not:

  for $e in collection("atom-files-directory")//atom:entry
  let $word := substring-before($e/atom:content/text(), ' ')
  where $word = $id
  return $e

My XQuery is a little rusty since I haven't kept up with the spec drafts but that should work. 

PS: You aren't accepting posts sent to your blog via the CommentAPI. Is this a bug on your end or mine?

2003-09-27T12:56:07-04:00
tag:intertwingly.net,2004:1601-1064702284 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath

Dare, I may not fully understand Jay's example, but he did indicate that a join was required.

P.S.  I just tried a few test posts via the Comment API, and they appeared to work.  Can you capture a wire trace?

2003-09-27T13:38:04-04:00
tag:intertwingly.net,2004:1601-1064711180 http://icite.net/blog/ adsl-63-201-94-189.dsl.snfc21.pacbell.net form Jay Fienberg Fun with XPath

Sam, your XSLT pseudo-code looks like it will work for what I was thinking it wouldn't work for, so I was wrong about this being an example of a limitation of a tree structure.

For my example, I was thinking of a query in SQL like:

select a.id, b.id from entries a join entries b on substr(a.entry, 1, locate(' ', a.entry) - 1) = substr(b.entry, 1, locate(' ', b.entry) - 1) where a.id <> b.id

And I was thinking that this couldn't be expressed in a single XPath statement. But, SQL vs XPath is not the same issue as graph vs tree anyway.
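
To make the self-join concrete, here is a small sketch that runs the same idea against an in-memory SQLite database from Python. The table layout, the sample rows, and the function name `self_join_first_word` are assumptions for illustration, and SQLite's `instr` stands in for `locate`, which SQLite does not provide.

```python
import sqlite3

# Hypothetical sample rows: (id, entry text).
ROWS = [
    ("e1", "Wow nice stuff"),
    ("e2", "Thanks for the info"),
    ("e3", "Wow again Sam"),
    ("e4", "Thanks Sam"),
]


def self_join_first_word(rows):
    """Return pairs of entry ids whose text starts with the same word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, entry TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    query = """
        SELECT a.id, b.id
        FROM entries a JOIN entries b
          ON substr(a.entry, 1, instr(a.entry, ' ') - 1)
           = substr(b.entry, 1, instr(b.entry, ' ') - 1)
        WHERE a.id < b.id          -- report each pair once, never a row with itself
        ORDER BY a.id, b.id
    """
    pairs = conn.execute(query).fetchall()
    conn.close()
    return pairs


if __name__ == "__main__":
    for left, right in self_join_first_word(ROWS):
        print(left, "and", right, "share a first word")
```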

2003-09-27T16:06:20-04:00
tag:intertwingly.net,2004:1601-1064808569 http://www.kuro5hin.org/user/Carnage4Life/diary 12-228-162-238.client.attbi.com form Dare Obasanjo Fun with XPath

OK, I see where I misunderstood his example. The XQuery should be

for $e in collection("atom-files-directory")//atom:entry
let $word := substring-before($e/atom:content/text(), ' ')
where some $e2 in collection("atom-files-directory")//atom:entry
      satisfies (
        (: skip the entry itself, otherwise every entry matches :)
        not($e2 is $e)
        and substring-before($e2/atom:content/text(), ' ') = $word
      )
return $e

2003-09-28T19:09:00-04:00
tag:intertwingly.net,2004:1601-1064845815 http://www.virtuelvis.com/quark/ 160.67.143.14 form Asbjørn Ulsberg Fun with XPath

This is brilliant stuff, Sam. I see that expressions that haven't been executed (and therefore aren't cached) take some time. Do you have any thoughts on this and on the DoS issue of XPath searches? What would you do to prevent your site from being DoS-attacked with complex queries?

2003-09-29T05:30:15-04:00
tag:intertwingly.net,2004:1601-1064872825 http://www.intertwingly.net/blog/ rdu57-27-066.nc.rr.com form Sam Ruby Fun with XPath

The short answer is: if you try it, I'll block your ip address.  ;-)

The longer answer is: there are many ways to do a DoS against my site or any site.  As you note, I do have a cache, so I can easily put a per-day cap on the number of unique queries I will serve (effectively disabling new unique queries for a day or so), enabling the rest of the site to be served.
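
A per-day cap of this kind could be sketched as follows. This is an illustration under stated assumptions, not the code running on this site: the class name `CappedQueryCache`, the exception `QueryCapReached`, and the `run_query` callback are all hypothetical, and the cache here is a plain in-memory dictionary.

```python
import datetime


class QueryCapReached(Exception):
    """Raised when no more new unique queries will be run today."""


class CappedQueryCache:
    """Serve cached results freely; run at most N new unique queries per day."""

    def __init__(self, run_query, max_new_per_day=100):
        self._run_query = run_query      # hypothetical callback that evaluates a query
        self._max_new = max_new_per_day
        self._cache = {}
        self._day = None
        self._new_today = 0

    def get(self, query, today=None):
        today = today or datetime.date.today()
        if today != self._day:           # first request of a new day resets the counter
            self._day = today
            self._new_today = 0
        if query in self._cache:         # repeat queries never count against the cap
            return self._cache[query]
        if self._new_today >= self._max_new:
            raise QueryCapReached(query)
        self._new_today += 1
        result = self._run_query(query)
        self._cache[query] = result
        return result
```

Because cached queries are answered before the cap is checked, links that have already been followed keep working even after new queries are switched off for the day.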

2003-09-29T13:00:25-04:00
tag:intertwingly.net,2004:1601-1064878131 http://www.xmldatabases.org ip68-3-48-11.ph.ph.cox.net form Kimbro Staken Fun with XPath

The DOS issue is a little overstated. The example that was given for Syncato exploited a bug in Pathan that results in an infinite query execution time. This is fixed in a newer Pathan release that the site hasn't yet been upgraded to. Syncato also has a cache that would mitigate any repeat requests on the same query (assuming it's not exploiting a bug). Regardless it is always possible to DOS a site that generates content dynamically.

XPath may be slightly worse than previous tools, but this should not in any way dissuade anyone from exploring its potential. I can think of dozens of reasons why you "shouldn't" be doing this kind of thing, but I made an explicit decision to shove those aside and focus on the exploration of what power this kind of thing brings.

2003-09-29T14:28:51-04:00
tag:intertwingly.net,2004:1601-1064955089 chromium.sabren.com trackback Sam Ruby http://www.intertwingly.net/blog/1603.html Data Flow

Data flow of comments to feeds, focusing on how indexing and caching work. Les Orchard types in this comment without needing to worry about formatting.  It is stored here in blosxom format as well formed XHTML.  The index page is regenerated with an up...

2003-09-30T11:51:29-04:00
tag:intertwingly.net,2004:1601-1064990943 http://www.crushingblow.com/ ip68-3-41-5.ph.ph.cox.net form Taylor House Fun with XPath
Paul Ford, of Ftrain.com, has been using an XML-based content management system for years. Although his is messy and not-so-dynamic, it performs much like Syncato.
2003-09-30T21:49:03-04:00
tag:intertwingly.net,2004:1601-1065026216 http://www.virtuelvis.com/quark/ 160.67.143.14 form Asbjørn Ulsberg Fun with XPath

On the DoS issue: maybe the answer is just to keep the execution timeout for XPath queries low. Then, heavy queries will be terminated (and not take up much CPU time), and fast queries will be executed and preferably cached afterwards.

2003-10-01T07:36:56-04:00
tag:intertwingly.net,2004:1601-1065659244 chromium.sabren.com trackback Sam Ruby http://www.intertwingly.net/blog/1606.html Atom2Yaml

The goal is to support these queries.  It will be interesting to see how _why handles the second one given that he is currently cheating on the content element.  ;-)...

2003-10-08T15:27:24-04:00
tag:intertwingly.net,2004:1601-1066012993 pingback Dare Obasanjo aka Carnage4Life - Top 3 Features I Want To Add To RSS Bandit Top 3 Features I Want To Add To RSS Bandit

Early on when I started working on RSS Bandit I use to take my cues for feature from other .NET aggregators like Syndirella and SharpReader. However in the past couple of months I've realized that RSS Bandit is more featureful and provides more ...

2003-10-12T17:43:13-04:00
tag:intertwingly.net,2004:1601-1066249852 http://blog.bitflux.ch/?p=1398&c=1 excerpt Bitflux Blog Render Services; Enhanced XHTML

Recently I had a business meeting where someone liked very much the SlideML presentation format - they struggle with Powerpoint. As I also showed them the KAYWA Blogsoftware, the question came up, if one could write SlideML via the Bloginterface. I...

2003-10-15T11:30:00-04:00
tag:intertwingly.net,2004:1601-1066433159 66.70.189.63 trackback Raw http://dannyayers.com/archives/001967.html Beyond XPath

Sam Ruby has some RDF questions. Typically I'm too knackered to give a proper answer right now. But if any RDF......

2003-10-17T14:25:59-04:00
tag:intertwingly.net,2004:1601-1066728718 pingback Simon Willison: Using XPath to mine XHTML Using XPath to mine XHTML

This morning, I finally decided to install libxml2 and see what all the fuss was about, in particular with respect to XPath. What followed is best described as an enlightening experience. XPath is a beautifully elegant way of adressing "nodes" ...

2003-10-21T00:31:58-04:00
tag:intertwingly.net,2004:1601-1066746973 red.gradwell.net trackback magpiebrain http://www.magpiebrain.com/archives/000106.html What XPath is, and why its a Good Thing

For a while now some colleagues have been raving about XPath, but I must admit its something I’ve never really looked into. In a brief post Simon has managed to not only explain what XPath is, but also why its......

2003-10-21T05:36:13-04:00
tag:intertwingly.net,2004:1601-1066752745 66.70.189.63 trackback Raw http://dannyayers.com/archives/001981.html Content Management and Data Mining with RDF, XPath, XHTML and the rest...

Simon Willinson has a good post about using XPath to mine XHTML. In it he says "XHTML is an ideal......

2003-10-21T07:12:25-04:00
tag:intertwingly.net,2004:1601-1074263413 http://dealmeida.net/blosxom/en/Links/links_2004-01-15 excerpt dealmeida.net 2004-01-15 links

PGP Signing FOAF Files XHTML 1.0 Symbol Character References XHTML 1.0 Latin-1 Character References Languages/xml/xpath XHTML 1.0 Special Character References Foaf-check XHTML Web Design for Beginners - Part 2 Fun with XPath...

2004-01-16T04:30:13-05:00
tag:intertwingly.net,2004:1601-1092903308 http://marc.blogs.it/archives/2004/08/ming_the_mechan.html excerpt Marc's Voice Ming the Mechanic on Micro-Content.....

Flemming Funch raps it out. My reply below.... "Microcontent" seems to be one of the buzzwords now. So, what is that, really? Jakob Nielsen, interface guru, used it (first?) in 1998 about stuff like titles, headlines and subject lines. The idea being...

2004-08-18T23:15:08-04:00
tag:intertwingly.net,2004:1601-1138940967 http://philwilson.org/blog/ 87.115.233.150.bbplus.dyn.plus.net form Phil Wilson Fun with XPath

So, apologies to your Apache log, but I see you still support the XHTML queries, but the atom-namespace sample queries you provide are now (wrongly, in some cases, I think) broken. Is this particular feature of your blog no longer working, or is my XPath just too poor?

2006-02-02T18:29:27-05:00