intertwingly

It’s just data

Ruby 1.9 Strings — Updated


My confusion from yesterday was due to a bug, which was promptly fixed (test case, fix).

Now that I understand what is intended, the situation is a lot clearer. In Python 3.0, there are two types of strings, bytes and Unicode, and the type of a given string is determined statically. In Ruby 1.9, there is one type of string, and its associated encoding is mutable. The internal state of a given sequence of bytes with respect to the current encoding is one of UNKNOWN, 7BIT, VALID, or BROKEN. UNKNOWN is a mechanism for delaying that determination until it is needed, and the combination of the bug and this delayed binding made the situation confusing, as the correctness of the result depended on the order in which operations were performed.
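Ruby does not expose this internal state directly, but it can be roughly approximated with the public `valid_encoding?` and `ascii_only?` methods. The sketch below is a hypothetical helper (the name `code_range` is mine, not Ruby's) and assumes a current MRI. It first shows a single String object changing state as its encoding is reassigned, then runs the same substitution through `sub`, `sub!`, `gsub`, and `gsub!` after resolving the cached state, which is the kind of order-dependent sequence that can expose stale bookkeeping. It is an illustration, not the original test case.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Ruby's API): classify a string's bytes
# relative to its *current* encoding using only public String methods.
# MRI caches this answer internally as a "code range" (UNKNOWN, 7BIT,
# VALID, BROKEN). UNKNOWN is never observable from here, because asking
# either question forces the scan that resolves it.
def code_range(str)
  return :broken unless str.valid_encoding?
  str.ascii_only? ? :seven_bit : :valid
end

# One String object; force_encoding changes its encoding in place.
s = "caf\u00e9".dup
puts code_range(s)                      # prints valid: C3 A9 is well-formed UTF-8
s.force_encoding(Encoding::US_ASCII)
puts code_range(s)                      # prints broken: US-ASCII rejects bytes above 0x7F
s.force_encoding(Encoding::ASCII_8BIT)
puts code_range(s)                      # prints valid: any byte sequence is valid binary

# Order-dependence check: resolve the cached state first, then substitute.
# With correct bookkeeping, every variant agrees with a fresh computation.
results = {}
[:sub, :sub!, :gsub, :gsub!].each do |meth|
  str = "abc".dup
  str.ascii_only?                       # in MRI, resolves and caches the state as 7-bit
  ret = str.public_send(meth, "b", "\u00e9")
  # Bang methods modify str in place; the others return a new string.
  results[meth] = code_range(meth.to_s.end_with?("!") ? str : ret)
end
p results
```

On a Ruby with correct bookkeeping all four entries come out as `:valid`; a stale 7-bit cache left behind by an in-place method would presumably show up as `:seven_bit` for that method instead.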

The bug affected gsub!, but not sub, sub!, or gsub. In the released 1.9.0 version of Ruby, gsub! did not update the state of the resulting string. Oops. Now that that is corrected, everything works as expected, for some values of expected. Things I was not previously aware of:

The net result of all this is that any sequence of operations that produces a runtime exception in Ruby 1.9 would also produce a runtime exception in Python 3.0. Conversely, some entirely safe use cases that would raise an exception in Python 3.0 do not raise one in Ruby 1.9. This approach is entirely consistent with a dynamic language.
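To illustrate the difference, here is a minimal sketch assuming a current Ruby, where the error class is `Encoding::CompatibilityError`. Combining a UTF-8 string with a binary string succeeds when the binary side happens to contain only 7-bit bytes, and raises only when both sides carry non-ASCII bytes in incompatible encodings. By contrast, `b"abc" + "abc"` in Python 3 raises `TypeError` regardless of content. The helper name `concat_result` is hypothetical.

```ruby
# encoding: utf-8

# Hypothetical helper: report the encoding of a + b, or the error class
# if Ruby refuses to combine them. Ruby decides based on the actual bytes
# at run time, not on the declared types alone.
def concat_result(a, b)
  (a + b).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

utf8        = "caf\u00e9"                                     # UTF-8, non-ASCII
binary_7bit = "abc".dup.force_encoding(Encoding::ASCII_8BIT)  # binary, ASCII-only bytes
binary_high = "\xFF".dup.force_encoding(Encoding::ASCII_8BIT) # binary, one high byte

puts concat_result(utf8, binary_7bit) # prints UTF-8: the binary side is 7-bit
puts concat_result(utf8, binary_high) # prints Encoding::CompatibilityError
```

The 7-bit case succeeds in either order, because an ASCII-only operand is compatible with any ASCII-compatible encoding; only a genuinely ambiguous combination fails.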