It’s just data

Ruby 1.9 Strings — Updated

My confusion from yesterday was due to a bug, which was promptly fixed (test case, fix).

Now that I understand what is intended, the situation is a lot clearer.  In Python 3.0, there are two types of strings, Bytes and Unicode, and the determination of the type is static.  With Ruby 1.9, there is one type of string, and the associated encoding is mutable.  The internal state of a given sequence of bytes with respect to the current encoding is one of UNKNOWN, 7BIT, VALID, or BROKEN.  UNKNOWN is a mechanism to delay the binding.  The combination of the bug and the delayed binding made the situation confusing, because whether the result was correct depended on the order in which the operations were performed.
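The following is a minimal sketch of this model in Ruby, assuming Ruby 1.9 or later. As far as I know, the four states are MRI's internal code-range flags and are not exposed directly. `code_range_of` is therefore a hypothetical helper that approximates them using the public predicates `ascii_only?` and `valid_encoding?`. The last few lines show the general shape of the ordering problem: the state is computed lazily and cached, so an in-place edit has to update it. They are an illustration only, not the original test case.

```ruby
# Sketch for Ruby 1.9 and later: one String class, and the encoding is a
# mutable label on the bytes. code_range_of is a hypothetical helper that
# maps public predicates onto the internal states named above. Calling
# either predicate forces a scan, so UNKNOWN is never observed from Ruby.
def code_range_of(str)
  return :"7bit" if str.ascii_only?
  str.valid_encoding? ? :valid : :broken
end

# "café" as raw UTF-8 bytes; pack returns a binary (ASCII-8BIT) string.
bytes = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*")

s = bytes.dup.force_encoding("UTF-8")  # relabel a copy; bytes unchanged
puts s.class                           # String
puts s.length                          # 4 characters
puts s.bytesize                        # 5 bytes
p code_range_of(s)                     # :valid

s.force_encoding("ISO-8859-1")         # relabel the same object in place
puts s.length                          # 5: each byte is one Latin-1 character
p code_range_of(s)                     # :valid

lone = [0xC3].pack("C").force_encoding("UTF-8")  # truncated UTF-8 sequence
p code_range_of(lone)                            # :broken

# The shape of the gsub! problem: the state is computed and cached, and it
# must be recomputed after an in-place edit. On a fixed Ruby, the bang and
# non-bang forms agree.
t = "abc".dup
p code_range_of(t)                     # :"7bit" (and, in MRI, now cached)
t.gsub!("b", "\u00E9")                 # in-place edit adds a multibyte char
p code_range_of(t)                     # :valid
p code_range_of("abc".gsub("b", "\u00E9"))  # :valid
```

The helper reports `:"7bit"` before `:valid` because an ASCII-only string is also valid in any ASCII-compatible encoding. This matches the idea that 7BIT is the more specific of the two states.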

The bug affected gsub!, but not sub, sub!, or gsub.  In the released 1.9.0 version of Ruby, gsub! did not update the state of the resulting string.  Oops.  Now that this is corrected, everything works as expected, for some values of expected.  Things I was not previously aware of:

The net result of all this is that any sequence of operations that produces a runtime exception in Ruby 1.9 would also produce a runtime exception in Python 3.0.  The reverse does not hold: some use cases that are entirely safe would raise an exception in Python 3.0 but do not in Ruby 1.9.  Such an approach is entirely consistent with a dynamic language.
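Here is a sketch of one such case, assuming Ruby 1.9 or later; the variable names are illustrative. Concatenating a UTF-8 string with a binary string succeeds when the binary side happens to be pure ASCII. It raises `Encoding::CompatibilityError` only when both sides contain non-ASCII bytes in differing encodings. In Python 3.0, by contrast, adding a `bytes` object to a `str` raises `TypeError` regardless of content.

```ruby
# Sketch, assuming Ruby 1.9 or later: whether concatenation raises depends
# on the bytes actually present, not only on the encodings involved.
utf8   = "caf\u00E9"          # UTF-8 string containing a non-ASCII character
ascii  = [0x21].pack("C")     # binary (ASCII-8BIT) string "!", ASCII-only
binary = [0xFF].pack("C")     # binary string holding a non-ASCII byte

ok = utf8 + ascii             # allowed: the binary side is pure ASCII
puts ok.encoding              # UTF-8

raised = begin
  utf8 + binary               # both sides non-ASCII, encodings differ
  false
rescue Encoding::CompatibilityError
  true
end
puts raised                   # true
```

The first concatenation is an example of a use case that is safe in Ruby 1.9 but would be rejected outright by Python 3.0's static split between text and bytes.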


sjs on 3 + 1 = 2: I think I prefer Ruby 1.8’s non-support for Unicode over Ruby 1.9’s “support”.

Sam Ruby has also updated the article, and has a new post about Ruby 1.9 strings. It was a bug in `gsub!`....

Excerpt from programming: what's new online at

Sam Ruby: Ruby 1.9 Strings - Updated

Sam Ruby: Ruby 1.9 Strings—Updated. A follow up to yesterday’s post: Sam’s principal complaints about Ruby 1.9’s character encoding support were down to a bug which has now been fixed....

Excerpt from Simon Willison's Weblog at

Sam Ruby: Ruby 1.9 Strings — Updated

A useful explanation of some of the details of how Ruby 1.9 handles unicode...

Excerpt from del.icio.us/tag/ruby at

Sam, I feel like you’ve given us only the barest taste of the detail about Ruby Unicode. Where do we get the whole enchilada? For example, what happens if I concatenate strings with different encodings? How fast is character addressing? What version of Unicode is supported? What is the relationship to non-Unicode character sets? What character sets and encodings are supported? Is there a document that answers these kinds of questions?

Posted by Paul Prescod at

Where do we get the whole enchilada?

If I knew that, I would simply have pointed to it.

Posted by Sam Ruby at

links for 2007-12-30

Sam Ruby: Ruby 1.9 Strings — Updated. A useful explanation of some of the details of how Ruby 1.9 handles unicode (tags: ruby strings unicode)...

Excerpt from a work on process at

Ruby 1.9: Not For Rails

Do NOT install or upgrade to Ruby 1.9 if you’re using Ruby for Rails development. There, that warning ought to suffice. On Dec. 25 Matz announced that a development release of Ruby 1.9 was available in which the Ruby 1.9 spec has been frozen:...

Excerpt from Binary Code at

Unicode Strings and byte buffers

Prior to Unicode there was ASCII or ISO 8859-1 (except for Microsoft that used their own encoding to lock-in users) and string manipulation was not hard. Now, Unicode is the future since everyone wants an easy solution to integrate all the...

Excerpt from edpeur public mind dump at
