It’s just data

URI Equivalence

In researching how Atom and the FeedValidator should handle URI equivalence, I took a look at how language environments with built-in URI classes implement equality methods.

testuri.java produces:

 http://example.com/          http://example.com           false
 HTTP://example.com/          http://example.com/          true
 http://example.com/          http://example.com:/         true
 http://example.com/          http://example.com:80/       false
 http://example.com/          http://Example.com/          true
 http://example.com/~smith/   http://example.com/%7Esmith/ false
 http://example.com/~smith/   http://example.com/%7esmith/ false
 http://example.com/%7Esmith/ http://example.com/%7esmith/ true
 http://example.com/%C3%87    http://example.com/C%CC%A7   false

testuri.cs produces:

 http://example.com/          http://example.com           True
 HTTP://example.com/          http://example.com/          True
 http://example.com/          http://example.com:/         True
 http://example.com/          http://example.com:80/       True
 http://example.com/          http://Example.com/          True
 http://example.com/~smith/   http://example.com/%7Esmith/ True
 http://example.com/~smith/   http://example.com/%7esmith/ True
 http://example.com/%7Esmith/ http://example.com/%7esmith/ True
 http://example.com/%C3%87    http://example.com/C%CC%A7   True

Update: testuri.pl produces:

 http://example.com/          http://example.com           1
 HTTP://example.com/          http://example.com/          1
 http://example.com/          http://example.com:/         1
 http://example.com/          http://example.com:80/       1
 http://example.com/          http://Example.com/          1
 http://example.com/~smith/   http://example.com/%7Esmith/ 1
 http://example.com/~smith/   http://example.com/%7esmith/ 1
 http://example.com/%7Esmith/ http://example.com/%7esmith/ 1
 http://example.com/%C3%87    http://example.com/C%CC%A7   0
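The last row in each table is a different kind of test from the rest: `%C3%87` is the UTF-8 encoding of the precomposed character U+00C7, while `C%CC%A7` is a plain "C" followed by the combining cedilla U+0327. A minimal Python sketch of what a comparator would have to do to treat them as equal follows. The helper name `nfc_form` is made up for illustration, and decoding the whole URI with `unquote` is a simplification that would be lossy for escaped reserved characters such as `%2F`.

```python
import unicodedata
from urllib.parse import unquote

def nfc_form(uri):
    """Percent-decode as UTF-8, then apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", unquote(uri))

a = "http://example.com/%C3%87"   # precomposed U+00C7
b = "http://example.com/C%CC%A7"  # "C" followed by combining U+0327

print(unquote(a) == unquote(b))    # False: different code point sequences
print(nfc_form(a) == nfc_form(b))  # True: identical after NFC
```

Percent-decoding alone is not enough here; the two strings only match once both are brought to the same normalization form.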

Java is totally borked.  The first seven examples are straight from section 3.2.3 of RFC 2616; they should all return true.

Test 8 was recently discussed on atom-syntax, and should also return true, although this is not explicitly clear from reading RFC 2396bis.

Posted by Mark at

Gack.  Just hypothetically, if someone wanted to write a carefully done URI comparator, I suppose the cleanest thing would be to subclass URI... maybe not.  Since it doesn't really have any exposed fields that seem useful, you might just as well do a URIEquivalenceChecker class with a single static method taking two URIs and some way of expressing how hard you want to try...
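One way to picture "a single method plus some way of expressing how hard you want to try" is the rough Python sketch below. The names `Effort`, `normalize` and `equivalent` are invented for illustration and are not from any existing library. It only covers syntax-level steps (case folding and percent-encoding) and, as a simplification, lowercases the whole authority, including any userinfo.

```python
import re
from enum import IntEnum
from urllib.parse import urlsplit, urlunsplit

class Effort(IntEnum):
    EXACT = 0    # plain string comparison
    CASE = 1     # also ignore case in scheme and authority
    ESCAPES = 2  # also normalize percent-encoding

_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def _fix_escape(match):
    # Decode escapes of unreserved characters; uppercase the hex of the rest.
    char = chr(int(match.group(1), 16))
    return char if char in _UNRESERVED else "%" + match.group(1).upper()

def normalize(uri, effort):
    if effort >= Effort.ESCAPES:
        uri = _ESCAPE.sub(_fix_escape, uri)
    if effort >= Effort.CASE:
        parts = urlsplit(uri)  # urlsplit already lowercases the scheme
        uri = urlunsplit(parts._replace(netloc=parts.netloc.lower()))
    return uri

def equivalent(a, b, effort=Effort.ESCAPES):
    return normalize(a, effort) == normalize(b, effort)

print(equivalent("HTTP://example.com/", "http://Example.com/", Effort.EXACT))
print(equivalent("HTTP://example.com/", "http://Example.com/", Effort.CASE))
print(equivalent("http://example.com/~smith/", "http://example.com/%7esmith/"))
```

Scheme-specific rules such as default ports are deliberately left out of this sketch, so `http://example.com/` and `http://example.com:80/` still compare as different at every level.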

Posted by Tim Bray at

There are a finite number of URI schemes.  It should be possible to write a URI compare module/class that takes the quirks of each scheme into account.
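As a rough illustration of the per-scheme idea, here is a small Python sketch built around a lookup table keyed by scheme. The table `DEFAULT_PORTS` and the functions `scheme_key` and `same_resource` are hypothetical names, only `http` and `https` are filled in, and IPv6 literals are not handled. Schemes missing from the table fall back to comparing the parsed parts as they are.

```python
from urllib.parse import urlsplit

# Hypothetical quirks table: schemes listed here get http-style rules
# (case-insensitive host, default port, empty path means "/").
DEFAULT_PORTS = {"http": "80", "https": "443"}

def scheme_key(uri):
    parts = urlsplit(uri)  # the scheme comes back lowercased
    userinfo, _, hostport = parts.netloc.rpartition("@")
    host, _, port = hostport.partition(":")  # no IPv6 literal support here
    path = parts.path
    default = DEFAULT_PORTS.get(parts.scheme)
    if default is not None:
        host = host.lower()
        if port == default:
            port = ""   # an explicit default port equals no port
        if path == "":
            path = "/"  # an empty path equals "/"
    return (parts.scheme, userinfo, host, port, path,
            parts.query, parts.fragment)

def same_resource(a, b):
    return scheme_key(a) == scheme_key(b)

print(same_resource("http://example.com/", "http://example.com"))
print(same_resource("http://example.com/", "http://example.com:80/"))
print(same_resource("https://example.com/", "https://example.com:80/"))
```

Supporting another scheme would then be a matter of adding an entry (or a richer rule object) for it, rather than touching the comparison itself.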

Now that I've pretty much run out of useful features to implement for the Universal Feed Parser, maybe I'll work on this next.

Posted by Mark at

The "irc:" scheme does not appear in the assigned list, which amazed me: I've always used it in Mozilla, and it can be found in the wild.

Posted by Santiago Gala at

Perhaps I should have qualified: "registered" URI schemes.  The irc:// scheme has had several drafts over the years, but never made it to final RFC status.

Posted by Mark at

A few more can be found here, though that list still does not include feed:, tag:, or urn:uuid, all of which can be found out in the wild.

Makes me wonder if the registration process is broken to the point where the concept of registered schemes is becoming less and less relevant.

Posted by Sam Ruby at

Sam Ruby: URI Equivalence

Excerpt from del.icio.us/tag/web at

Bookmarks

Some interesting recent reads: Bertrand on rhino shell; Stefano on Semantic web specs; Sam with some tests on URI equivalency (and more by clicking through the comments...); Observations from Paul Graham via Brian...... [more]

Trackback from Marc, himself, his blogs, and you reading them.

at

Sam Ruby: In researching how Atom and the FeedValidator should handle URI equivalence, I took a look at how language environments with built in URI classes implement equality methods. Randy: Question? How would the Python URI type or class do? Is it...

Excerpt from RSS at

More URI Equivalence

Mark: Java is totally borked.  Tim Bray: Gack.  Just hypothetically, if someone wanted to write carefully-done URI comparator (in Java). Randy: Bookmarked by someone....

Excerpt from RSS at

Sticky, it’s not just data.

I think I underestimated how sticky Movable Type is. Vendors love things that make their product sticky. If developers really appreciated this, software products would be even more sticky. Instead developers hate sticky; they call it things like...

Excerpt from Ascription is an Anathema to any Enthusiasm at

Preserving Identity

Mark Pilgrim's Identifying Atom article indirectly makes three assertions about what would be ideal in a syndication protocol with respect to ids, which I will paraphrase thus:  IDs are mandatory the semantics on how/when IDs are to be generated and wh... [more]

Trackback from Sam Ruby

at

Preserving Identity

Preserving Identity. Mark Pilgrim's Identifying Atom article indirectly makes three assertions about what would be ideal in a syndication protocol with respect to ids, which I will paraphrase thus: IDs are mandatory the semantics on how/when IDs are...

Excerpt from Tralla.org : Search : Debian at

Inspired by Jeremy Smith, I've added a perl example.

Posted by Sam Ruby at

GentleCMS Development Log: Part 4

I’ve been up to no good again. I keep changing my directory structure around. Nothing feels quite right, but each time I change it, it seems a bit better than the last time. In any case, my svn repository for this project is now something of a...

Excerpt from Sporkmonger at

In Java, equals is very closely linked to hashCode.

If you encode before comparing in .equals, then you must encode before calculating your hashCode() for storage and retrieval in HashMap/Hashtable/HashSet etc. That leaves you with a pretty slow hashCode() function, especially if you have to resize the hashmap. So URI may “suck” for performance reasons. If you don’t encode in hashCode() too, then [link] goes into your HashSet and [link] won’t get it out.

Note that this definitely means you never want URL to be a key in your hashmap! :) Wanna take bets on what hashCode() does to create a hash value? (I haven’t checked yet.)
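The same contract can be shown outside Java. The sketch below is a Python analogue, not Java code: `__eq__` and `__hash__` play the roles of equals and the hash method, and both class names are made up. `NaiveKey` normalizes only when comparing, while `UriKey` normalizes once in the constructor and caches the hash, which is one possible answer to the cost concern. Using `unquote` as the normalization step is a simplification.

```python
from urllib.parse import unquote

class NaiveKey:
    """Normalizes in __eq__ only, so equal keys can hash differently."""
    def __init__(self, raw):
        self.raw = raw
    def __eq__(self, other):
        return (isinstance(other, NaiveKey)
                and unquote(self.raw) == unquote(other.raw))
    def __hash__(self):
        return hash(self.raw)  # inconsistent with __eq__

class UriKey:
    """Normalizes once up front; equality and hash use the same form."""
    def __init__(self, raw):
        self.raw = raw
        self._canon = unquote(raw)      # computed a single time
        self._hash = hash(self._canon)  # cached, so lookups stay cheap
    def __eq__(self, other):
        return isinstance(other, UriKey) and self._canon == other._canon
    def __hash__(self):
        return self._hash

stored = "http://example.com/~smith/"
probe = "http://example.com/%7Esmith/"

# The naive keys compare equal, yet the set lookup misses unless the
# two raw-string hashes happen to collide.
print(NaiveKey(stored) == NaiveKey(probe))
print(NaiveKey(probe) in {NaiveKey(stored)})

# With a consistent hash the lookup finds the stored entry.
print(UriKey(probe) in {UriKey(stored)})
```

The trade-off is that every key pays the normalization cost at construction time, whether or not it is ever compared.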

Posted by anonymous at

Related discussion

Posted by Sam Ruby at

Bogtha on Mr. Gosling - why did you make URL equals suck?!?

Wow. That’s monumentally bad for any library, let alone the standard library. Whoever thought that would be a good idea? > Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck. Sorry,...

Excerpt from programming: what's new online at

Legolas-the-elf on Ask Reddit: To counter todays negativity: What's the best code you've ever read, or the best programmer you've worked with?

Some of the most bizarre behaviour I’ve seen is in the JDK. `java.net.URL`, when comparing for equality, resolves hostnames and considers two URLs using different hostnames to be equal if they are using the same IP address. Unless you don’t happen...

Excerpt from programming at

Related: [link]

Posted by uo at
