I’ve got portions of HTML5lib working on Ruby 1.9, enough to pass Mars's unit tests. My initial reaction to Ruby 1.9’s support isn’t favorable. I definitely like Python 3K's Unicode support better. This feels closer to Python 2.5. In fact, I think I prefer Ruby 1.8’s non-support for Unicode over Ruby 1.9’s “support”.
The problem is one that is all too familiar to Python programmers: you can have a fully unit-tested library, and the moment somebody passes you a bad string, you will fall over.
An example that fails with Ruby 1.9:
[0x2639].pack('U') + "\u2639"
The error that is produced is ArgumentError: character encodings differ. The left hand side specifies packing as UTF-8. The right hand side is expressed as Unicode, which Ruby represents as #<Encoding:UTF-8>. The problem is that the left hand side is actually stored as #<Encoding:ASCII-8BIT>, which is a misnomer. In many ways this mirrors Python 2.x’s <type 'str'> vs <type 'unicode'>, except that with Ruby 1.9 both Strings are the same type.
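The mismatch can still be reproduced on a current Ruby, though the details have shifted: pack('U') now returns a UTF-8 string, so the ASCII-8BIT tagging that 1.9.0 applied has to be forced explicitly here, and the exception class is Encoding::CompatibilityError rather than ArgumentError. A minimal sketch under those assumptions:

```ruby
# On Ruby 1.9.0, pack('U') itself returned an ASCII-8BIT string; on
# current Rubies it returns UTF-8, so we force the old tagging here.
left  = [0x2639].pack('U').force_encoding('ASCII-8BIT')
right = "\u2639"                      # literal tagged UTF-8

p left.encoding.name                  # "ASCII-8BIT"
p right.encoding.name                 # "UTF-8"

begin
  left + right                        # incompatible: neither side is 7-bit clean
rescue Encoding::CompatibilityError => e
  p e.class                           # the modern analogue of the ArgumentError above
end
```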
Ruby 1.9 both mitigates and compounds the problem by providing a number of implicit conversions. Sometimes. Take a look at this code which produces this output. Specifically, look at rows 2 and 4, where two Strings, of the same type, encoding, length, and value produce different results when concatenated with UTF-8 strings. This type of magic destroys any confidence I have in unit testing as a viable strategy.
Update: no magic, just a bug.
My preference would be that #<Encoding:ASCII-8BIT> be abolished, in favor of #<Encoding:ASCII-7BIT> and a separate Bytes class. Generally, programmers would only see objects of class Bytes if they do “binary” file I/O, explicitly create constants of that type, or invoke methods such as String#bytes.
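For what it’s worth, String#bytes already exposes the byte-level view, albeit as a sequence of integers rather than a distinct Bytes class:

```ruby
s = "\u2639"                 # U+2639, three bytes in UTF-8
p s.length                   # 1  (characters)
p s.bytes.to_a               # [226, 152, 185]  (the UTF-8 bytes E2 98 B9)
```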
Other suggestions:
Array#pack('U') should behave like .map {|n| n.chr('UTF-8')}.join
If Ruby is going to support the specification of the default encoding on the command line, it should support Locale environment variables too.
If REXML is going to remain in the core libraries for Ruby, it should have a thorough audit. As XML is defined in terms of Unicode, REXML should never return binary strings. It also needs to be checked to prevent things like this from showing through:
rexml/element.rb:555: warning: Hash#index is deprecated; use Hash#key
Frankly, I’m a bit concerned that REXML is essentially unmaintained at this point: the mailing list is unresponsive, bug reports appear to be addressed only sporadically, and new releases all too often seem to produce regressions.
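As an aside, on current Rubies the Array#pack('U') suggestion above already holds: pack('U') was changed after 1.9.0 to return UTF-8 strings, making it equivalent to the map/chr/join form. A quick check:

```ruby
codepoints = [0x2639, 0x263A]
a = codepoints.pack('U*')                          # pack as UTF-8 characters
b = codepoints.map { |n| n.chr('UTF-8') }.join     # the suggested equivalent

p a == b                     # true
p a.encoding.name            # "UTF-8" on current Rubies (ASCII-8BIT on 1.9.0)
```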
Sam,
It sounds like your complaint is with Array.pack and the rexml library, not with all of Unicode in Ruby 1.9.
Given that the point of Array.pack is to serialize data into byte strings, I think its behavior is probably correct as it is. Admittedly confusing, though. A documentation clarification is probably in order. (Though pack() has always been a confusing method!)
Instead of using pack to convert Unicode codepoints to strings, try the Integer#chr method, with the desired encoding as an argument. (Your comment system won’t allow me to enter an example: it must think that I’m embedding JS or something).
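The chr form mentioned here looks like this (sketched in, since the comment system ate the original example):

```ruby
smiley = 0x2639.chr('UTF-8')   # Integer#chr with an explicit encoding
p smiley                       # "☹"
p smiley.encoding.name         # "UTF-8"
```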
I don’t know anything about the rexml library. But the 1.9.0 is not really expected to be stable yet, and I suspect that there are a number of libraries that haven’t been carefully ported yet.
Like so much of Ruby, I think you’ve got to give the Unicode support a chance to grow on you. I don’t understand why Matz made some of the choices he did, but they seem to work okay. Keep in mind, too, that the goal was not just to support Unicode but also to support Japanese encodings as well. So some of the design decisions might make a lot more sense to programmers who have to work with SJIS and EUC every day.
Finally, Ruby does inherit the default external encoding from the locale if you don’t specify an encoding with -K, -E or --encoding. This is the encoding assumed when you read from a file and do not specify a different encoding. (It is not used when you write to a file or read or write from a socket or pipe, however.) It respects the standard LC_CTYPE, LC_ALL, and LANG variables. Encoding.default_external returns the value. Encoding.locale_encoding didn’t make it into 1.9.0, but it is in the current sources and returns the default encoding for the locale even if -K, -E, or --encoding is specified.
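Sketched on a current Ruby (the exact values depend on the environment’s locale, so the outputs below are only examples):

```ruby
# Locale-derived defaults; output varies with LANG/LC_ALL/LC_CTYPE.
p Encoding.default_external        # e.g. #<Encoding:UTF-8>
p Encoding.locale_charmap          # e.g. "UTF-8" (the raw charmap name)
```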
(I attempt to explain all this in The Ruby Programming Language which should be in bookstores in about a month. I’m making the last-minute changes today.)
David Flanagan
I clearly am aware of Fixnum#chr, as evidenced by my first example.
For a pack-free example of inconsistent behavior, compare test1.rb with test2.rb. The character encodings only differ when UTF-8 is explicitly specified???!!!
Respecting LANG is good news for data files. Based on the example above, I take it that it doesn’t work for program files. Sorry for being unclear, that’s what I was referring to.
Sam,
Sorry that I didn’t read your post more carefully to see that you were already using chr. Given that chr exists to convert codepoints to characters, pack seems like a hacky way to attempt the same thing.
If a string literal contains a Unicode \u escape, then it will have utf-8 encoding.
Otherwise, if a string literal only has 7-bit ASCII characters, then it will have ASCII-8BIT encoding--essentially the legacy encoding from Ruby 1.8.
Strings that are not 7-bit clean take their encoding from the source encoding of the file. The source encoding is specified with the coding comment you have in your test1.rb. Files that do not have a coding comment like that take their source encoding from the -K, -E, or --encoding command-line option. And if none of those are specified, then they are assumed to be ASCII-encoded. So if you run test2.rb with -Ku it ought to work the same as test1.rb.
The fact that the meaning of a string literal is dependent on the source encoding means that it is really important to start your Ruby programs with a coding comment. And it also helps to explain why the source encoding of a file is not derived from the locale--changing the locale could break the program.
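The source-encoding dependence described above can be sketched by evaluating the same literal under two forced source encodings. This uses eval purely as a stand-in for two differently-encoded source files, and note that on Rubies since 2.0 the default source encoding is UTF-8, and 7-bit-clean literals get US-ASCII rather than the ASCII-8BIT of 1.9.0:

```ruby
# The same 7-bit-clean literal takes its encoding from the source encoding:
ascii_src = %q{"abc".encoding.name}.force_encoding('US-ASCII')
utf8_src  = %q{"abc".encoding.name}.force_encoding('UTF-8')
p eval(ascii_src)   # "US-ASCII"
p eval(utf8_src)    # "UTF-8"

# ...but a \u escape forces UTF-8 regardless of the source encoding:
p eval('"\u2639".encoding.name'.force_encoding('US-ASCII'))  # "UTF-8"
```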
Does this clarify anything? I’m not sure whether I’m actually addressing your point here or not.
Sam,
I was wrong in my first comment about Encoding.locale_encoding. That is a newly-added internal method, not exposed by the API. You can use Encoding.locale_charmap to obtain the encoding name (as a string) for the current locale, if you need, for some reason, to distinguish it from Encoding.default_external.
David
The fact that the meaning of a string literal is dependent on the source encoding means that it is really important to start your Ruby programs with a coding comment.
Did you actually try test1.rb? That code throws an exception if the coding comment is present, and works when it is not present. It took me quite a while to figure out why REXML (which uses pack, by the way) worked when the exact same code copied into my source file (which uses the recommended coding comment) did not.
I am still at a loss why the second row and fourth row differ (but, again, only if the really important coding comment is present). And if the coding comment is not present, you get a completely different set of results.
One thing I like about Python and Ruby is that they are both approachable. But for the life of me, Ruby 1.9’s behavior in this area is virtually unpredictable.