In researching how Atom and the FeedValidator should handle URI equivalence, I took a look at how language environments with built-in URI classes implement their equality methods.
testuri.java produces:
http://example.com/ http://example.com false
HTTP://example.com/ http://example.com/ true
http://example.com/ http://example.com:/ true
http://example.com/ http://example.com:80/ false
http://example.com/ http://Example.com/ true
http://example.com/~smith/ http://example.com/%7Esmith/ false
http://example.com/~smith/ http://example.com/%7esmith/ false
http://example.com/%7Esmith/ http://example.com/%7esmith/ true
http://example.com/%C3%87 http://example.com/C%CC%A7 false
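The source of testuri.java is not reproduced here, so what follows is only a guess at its shape, assuming it compares each pair with java.net.URI.equals. The class name TestUriSketch and the helper same are made up for illustration.

```java
import java.net.URI;

// Hypothetical reconstruction of a harness like testuri.java (not the original source).
class TestUriSketch {
    // The nine pairs from the output listed above.
    static final String[][] PAIRS = {
        {"http://example.com/", "http://example.com"},
        {"HTTP://example.com/", "http://example.com/"},
        {"http://example.com/", "http://example.com:/"},
        {"http://example.com/", "http://example.com:80/"},
        {"http://example.com/", "http://Example.com/"},
        {"http://example.com/~smith/", "http://example.com/%7Esmith/"},
        {"http://example.com/~smith/", "http://example.com/%7esmith/"},
        {"http://example.com/%7Esmith/", "http://example.com/%7esmith/"},
        {"http://example.com/%C3%87", "http://example.com/C%CC%A7"},
    };

    // Assumption: the built-in equality under test is java.net.URI.equals.
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            System.out.println(pair[0] + " " + pair[1] + " " + same(pair[0], pair[1]));
        }
    }
}
```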
testuri.cs produces:
http://example.com/ http://example.com True
HTTP://example.com/ http://example.com/ True
http://example.com/ http://example.com:/ True
http://example.com/ http://example.com:80/ True
http://example.com/ http://Example.com/ True
http://example.com/~smith/ http://example.com/%7Esmith/ True
http://example.com/~smith/ http://example.com/%7esmith/ True
http://example.com/%7Esmith/ http://example.com/%7esmith/ True
http://example.com/%C3%87 http://example.com/C%CC%A7 True
Update: testuri.pl produces:
http://example.com/ http://example.com 1
HTTP://example.com/ http://example.com/ 1
http://example.com/ http://example.com:/ 1
http://example.com/ http://example.com:80/ 1
http://example.com/ http://Example.com/ 1
http://example.com/~smith/ http://example.com/%7Esmith/ 1
http://example.com/~smith/ http://example.com/%7esmith/ 1
http://example.com/%7Esmith/ http://example.com/%7esmith/ 1
http://example.com/%C3%87 http://example.com/C%CC%A7 0
Java is totally borked. The first seven examples are straight from section 3.2.3 of RFC 2616; they should all return true.
Test 8 was recently discussed on atom-syntax, and should also return true, although this is not explicitly clear from reading RFC 2396bis.
There are a finite number of URI schemes. It should be possible to write a URI compare module/class that takes the quirks of each scheme into account.
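To make the idea concrete, here is a rough sketch of what such a compare class might look like in Java. It is not a finished design: the class name UriCompareSketch, the tiny DEFAULT_PORTS table, and the helper names are all assumptions for illustration, and it only handles server-based URIs such as http. It lower-cases the scheme and host, drops a port that matches the scheme's default, treats an empty path as "/", and decodes percent-escapes of unreserved characters while upper-casing the hex digits of the rest. It does not attempt Unicode normalization, so the last pair in the tests above would still compare unequal.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Sketch only: a scheme-aware comparison for server-based URIs (http, https, ftp).
class UriCompareSketch {
    // Hypothetical per-scheme quirk table; a real module would cover many more schemes.
    static final Map<String, Integer> DEFAULT_PORTS = Map.of("http", 80, "https", 443, "ftp", 21);

    static boolean equivalent(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    static String normalize(String text) {
        URI u = URI.create(text);
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("sketch handles absolute, server-based URIs only: " + text);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();                       // -1 when absent or empty
        Integer defaultPort = DEFAULT_PORTS.get(scheme);
        if (defaultPort != null && defaultPort == port) {
            port = -1;                                // an explicit default port is redundant
        }
        String path = normalizeEscapes(u.getRawPath());
        if (path.isEmpty()) {
            path = "/";                               // an empty path is equivalent to "/"
        }
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (u.getRawUserInfo() != null) sb.append(normalizeEscapes(u.getRawUserInfo())).append('@');
        sb.append(host);
        if (port != -1) sb.append(':').append(port);
        sb.append(path);
        if (u.getRawQuery() != null) sb.append('?').append(normalizeEscapes(u.getRawQuery()));
        if (u.getRawFragment() != null) sb.append('#').append(normalizeEscapes(u.getRawFragment()));
        return sb.toString();
    }

    // Decode escapes of unreserved characters; upper-case the hex digits of all other escapes.
    static String normalizeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int value = hi * 16 + lo;
                    if (isUnreserved(value)) {
                        out.append((char) value);
                    } else {
                        out.append('%')
                           .append(Character.toUpperCase(s.charAt(i + 1)))
                           .append(Character.toUpperCase(s.charAt(i + 2)));
                    }
                    i += 2;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    static boolean isUnreserved(int v) {
        return (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
            || (v >= '0' && v <= '9') || "-._~".indexOf(v) >= 0;
    }
}
```

With this sketch, UriCompareSketch.equivalent("http://example.com/~smith/", "http://Example.com:80/%7esmith/") comes out true, because both sides normalize to the same string.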
Now that I've pretty much run out of useful features to implement for the Universal Feed Parser, maybe I'll work on this next.
In Java, equals() is very closely linked to hashCode().
If you encode before comparing in equals(), then you must also encode before calculating hashCode() for storage and retrieval in HashMap/Hashtable/HashSet etc. That leaves you with a pretty slow hashCode() function, especially if you have to resize the hash map. So URI may “suck” for performance reasons. If you don’t encode in hashCode() too, then [link] goes into your hash set and [link] won’t get it out.
Note that this definitely means you never want URL to be a key in your HashMap! :) Wanna take bets on what hashCode() does to create a hash value? (I haven’t checked yet.)
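One way around the cost, which is my own suggestion rather than anything java.net.URI does, is to normalize once when the key is constructed and have both methods work from that cached form. A minimal sketch follows. NormalizedUriKey and canonicalize are hypothetical names, and the normalization is a deliberately crude stand-in that assumes absolute http-style URIs: it lower-cases the scheme and authority and only unescapes %7E.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch: equals() and hashCode() both use one canonical string computed up front.
final class NormalizedUriKey {
    private final String canonical; // computed once in the constructor
    private final int hash;         // cached so rehashing on resize stays cheap

    NormalizedUriKey(String text) {
        this.canonical = canonicalize(text);
        this.hash = canonical.hashCode();
    }

    // Stand-in normalization (an assumption of this sketch, not a full algorithm).
    static String canonicalize(String text) {
        URI u = URI.create(text);
        String head = u.getScheme().toLowerCase(Locale.ROOT)
            + "://" + u.getRawAuthority().toLowerCase(Locale.ROOT);
        String path = u.getRawPath().replaceAll("(?i)%7e", "~");
        return head + path;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof NormalizedUriKey
            && canonical.equals(((NormalizedUriKey) other).canonical);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    public static void main(String[] args) {
        Set<NormalizedUriKey> seen = new HashSet<>();
        seen.add(new NormalizedUriKey("http://example.com/~smith/"));
        // The escaped spelling finds the entry stored under the unescaped one.
        System.out.println(seen.contains(new NormalizedUriKey("HTTP://Example.com/%7esmith/")));
    }
}
```

The trade-off is that each key pays the normalization cost at construction and carries an extra string around, in exchange for equals() and hashCode() that cannot disagree.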