It’s just data

Producing and consuming XML content

Jon Udell: The kinds of searches shown here are fun, up to a point. But the novelty quickly wears off because the only XML available for searching is metadata (channel titles, item titles, dates), not content. Here's where the other shoe drops. I've long dreamed of using RSS to produce and consume XML content. We're so close.

If I want to search content, I want full text search. Searching XHTML content isn't of much use to me given that there isn't enough useful semantics embedded in XHTML tags for me to primarily perform searches based on them.

The majority of the searches I want done are based on content, such as trying to find all references to the phrase "CommentAPI" or "RSS Bandit" in all the feeds I'm currently subscribed to.
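As a rough illustration of the kind of full-text phrase search described above, here is a minimal sketch using only the Python standard library. The sample feed, the `find_phrase` helper, and the choice to search item titles plus `description` text are all assumptions made for the example; a real aggregator would fetch and index every subscribed feed rather than scan one string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; stands in for one of the subscribed feeds.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Comment support</title>
      <description>&lt;p&gt;RSS Bandit now supports the &lt;b&gt;CommentAPI&lt;/b&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Unrelated</title>
      <description>Nothing to see here.</description>
    </item>
  </channel>
</rss>"""

TAG = re.compile(r"<[^>]+>")  # crude markup stripper, good enough for a sketch


def find_phrase(feed_xml, phrase):
    """Return the titles of items whose title or description contains phrase."""
    needle = phrase.lower()
    hits = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        # Drop the embedded markup and collapse whitespace: only the words matter.
        text = " ".join(TAG.sub(" ", title + " " + body).split())
        if needle in text.lower():
            hits.append(title)
    return hits
```

Note that the markup is thrown away before matching, which is the point being made: the search runs on the words, not on the tags.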

So far the [X]HTML elements people have gotten much mileage from using as the basis of search are links which have been put to good use by both Google and SharpReader. It is likely that <cite> elements may also prove useful; only time will tell. As for the rest of [X]HTML, it doesn't help much, if at all.

The "XML will make smarter search engines" was the most bogus claim about XML made during the hype years which I hoped had died the ignoble death it deserved.

Posted by Dare Obasanjo at

Dare: <a href=""> tags are of extreme value to search engines *today*.  When Google spiders my rss 0.91 feed, it doesn't find any.  When it spiders my rss 2.0 feed, it does.

Posted by Sam Ruby at

And I said

> So far the [X]HTML elements people have gotten much mileage from using as the basis of search are links which have been put to good use by both Google and SharpReader.



Posted by Dare Obasanjo at

Jim McGee asked a question, 'bout a year or so ago, about how to encourage people to use weblogs as knowledge-logs, or k-logs.

Lotta people were talking up the advantages of why OTHER people should "get it" and start using k-logs.  But nobody seemed to have a clear reason WHY or HOW...  Mebbe same applies to XML-search.

Seems to me that both the why and how can be provided by the READER, better than the author.  Full-text searches suffer, frequently, by too many extraneous hits.  Wastes a lotta processor too, to show me 2 bazillion hits for the keywords I used...  But if I could enter keywords and multiple categories to file articles/blogs under...?  In addition to categorizing by author and keywords the author provides, like subject-matter, privacy, importance, and reply needed/time urgency and such.. amongst other standard aids to the reader...??

(Some-a these would probably be more beneficial if/when RSS becomes used to send email.)
 

And if this could help prioritize, like Ben Hammersley suggested he'd like-ta see, so any incoming blogs on Iraq could be automatically sent to "File 13" (or marked as skipped).. Well, that'd give me the incentive to try to library-ize things, so that *I* could re-reference them.  Help me control the flow of all this short-term knowledge comin' at me from all directions.  Help me turn it into a long-term advantage.

Posted by jt at

SharpReader certainly knows how to decode content:encoded.  I have my doubts that Google does likewise.  Dare, do you have any evidence to the contrary?

Posted by Sam Ruby at

Sam,
I meant [X]HTML links in general not just embedded in RSS.

Google's search is primarily based on understanding links in [X]HTML documents; similarly, SharpReader's threaded view of related items is based on understanding [X]HTML links embedded in RSS documents.
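To make "understanding [X]HTML links embedded in RSS documents" concrete, here is a minimal sketch of pulling link targets out of each item's content:encoded element with the Python standard library. It is not how SharpReader or Google actually work; the sample feed, `LinkCollector`, and `links_by_item` are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of the RSS content module that defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical sample feed: the description is plain text, the links live
# only in content:encoded.
SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Linking out</title>
      <description>Plain text summary, no markup.</description>
      <content:encoded><![CDATA[<p>See <a href="http://example.com/a">this post</a>
        and <a href="http://example.com/b">that one</a>.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_by_item(feed_xml):
    """Map each item title to the links found in its content:encoded markup."""
    result = {}
    for item in ET.fromstring(feed_xml).iter("item"):
        encoded = item.findtext("{%s}encoded" % CONTENT_NS, default="")
        collector = LinkCollector()
        collector.feed(encoded)
        collector.close()
        result[item.findtext("title", default="")] = collector.links
    return result
```

A consumer that reads only `description` would find no links in this item, which is the difference between the two feed flavors raised earlier in the thread.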

Posted by Dare Obasanjo at

[p="codefragment"]???  Someone needs to teach Jon about the [code] element in (X)HTML.

Posted by Mark at

> [p="codefragment"]???  Someone
> needs to teach Jon about the [code]
> element in (X)HTML.

Good point, forgot all about that, thanks.

In which case, how about:

[code class="java|python|c#"]

- Jon

Posted by Jon Udell at

It seems to me that doing what
Jon is doing, which is using tags with specific classes, is a far softer (and
easier) way of bootstrapping the semantic web and things like that.

It might be worth using namespaces inside the class attributes though,
such as [code class="lang:python"] where lang could be a namespace
defined in one of the link elements in a document.
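A consumer of such markup might read the prefixed class values along the following lines. This is only a sketch under stated assumptions: the `prefix:value` convention is the one proposed above, not an existing standard; `CodeLanguageFinder` and `code_languages` are made-up names; and resolving the prefix against a link element is not modeled at all.

```python
from html.parser import HTMLParser


class CodeLanguageFinder(HTMLParser):
    """Collect the value of each prefix:value class token on <code> elements."""

    def __init__(self, prefix="lang"):
        super().__init__()
        self.prefix = prefix      # assumed prefix, as in class="lang:python"
        self.languages = []

    def handle_starttag(self, tag, attrs):
        if tag != "code":
            return
        # A class attribute may hold several space-separated tokens.
        for token in (dict(attrs).get("class") or "").split():
            name, sep, value = token.partition(":")
            if sep and name == self.prefix and value:
                self.languages.append(value)


def code_languages(html):
    """Return the languages declared on <code> elements, in document order."""
    finder = CodeLanguageFinder()
    finder.feed(html)
    finder.close()
    return finder.languages
```

For example, `code_languages('<code class="lang:python">print(1)</code>')` would pick out `python`, while an unclassed `<code>` element contributes nothing.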

I wish the mozilla html editor would allow associating a class with a
tag; for example, instead of heading 1, I'd like to be able
to say chapter or section, and that would get translated in the source
to [h1 class="chapter"] instead.

Take it a bit further and one could write a mozilla overlay which does
interesting stuff with classes using CSS and XUL. For example, when it
sees [div class="person", span class="name" id="Rahul"..] it could
pop up a context menu which allows email/IM/latex-letter etc to
that person depending upon local capabilities, as well as the ability
to add to the local database/address-book.

I'm writing this post in mozilla's editing component, BTW, activated using the composite add-on to mozilla, available here: http://vietdev.sourceforge.net/vinamozie/mo_installer.php , so all this would be particularly useful!

Posted by Rahul Dave at

I didn't understand but about half what Rahul was saying, but have been told I have good instincts, which are that this would be more-than useful.

Didn't get any takers on my comment, but fergot to mention one-a the advantages I could see from this approach.

Identifying, by author/category/keywords, which feeds You want to pull ASAP, which to allow pushed on as-resources-available basis, and which to ignore.
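One way to picture that three-way sorting is a small rule table keyed on author, category, and keyword. The sketch below is an illustration only, not a feature of any existing aggregator: the `Rule` class, the `triage` function, the action names ("pull", "defer", "ignore"), and the item dictionary layout are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    action: str                                     # "pull", "defer", or "ignore"
    authors: set = field(default_factory=set)
    categories: set = field(default_factory=set)
    keywords: set = field(default_factory=set)      # expected in lowercase

    def matches(self, item):
        """True if the item matches on author, category, or a title keyword."""
        title = item.get("title", "").lower()
        return (
            item.get("author") in self.authors
            or bool(self.categories & set(item.get("categories", [])))
            or any(word in title for word in self.keywords)
        )


def triage(item, rules, default="defer"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(item):
            return rule.action
    return default
```

Rules are checked in order, so a reader who wants "ignore" to win over "pull" simply lists the ignore rules first.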

Posted by jt at
