It’s just data

Python vs Parrot

In many ways, it seems like Python and Parrot are from different planets.

In Python, the general approach seems to be to reduce everything possible to a canonical form as early as possible, and then deal with everything consistently.

In Parrot, the general approach seems to be to leave everything in its original form as long as possible, and then deal with everything separately.

Example

A simple example, incrementing a in Python:

 LOAD_FAST a
 LOAD_CONST 1
 BINARY_ADD
 STORE_FAST a

And in Parrot:

 find_lex P1, a
 add P1, P1, 1

The first difference you may notice is that Python is stack based and Parrot is register based, but the difference I want to focus on is the add operation itself.  In Python, the BINARY_ADD operation is generic and can handle everything from integers to floating point to strings.  For this to work, the numeric 1 must first be converted to an object (a process often referred to as boxing) and pushed on the stack.  BINARY_ADD will then pull two objects off of the stack, unboxing as appropriate, do the appropriate operation, box up the result, and then push it back on the stack.
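To see the generic add for yourself, the standard library's dis module will disassemble a function.  This is a minimal sketch, assuming a reasonably modern CPython; opcode names differ between versions (recent releases fold BINARY_ADD into a generic BINARY_OP), so the listing you get will not match the one above exactly.  The function name increment is mine, purely for illustration.

```python
import dis


def increment(a):
    a = a + 1
    return a


# Collect the opcode names CPython generated for the function body.
opnames = [instr.opname for instr in dis.get_instructions(increment)]

# Whatever the version, one generic "binary" opcode performs the add;
# nothing in the bytecode says that the operands are integers.
generic_adds = [name for name in opnames if name.startswith("BINARY")]

# Print the full listing for inspection.
dis.dis(increment)
```

The point to look for is that there is exactly one kind of add instruction, regardless of what a turns out to hold at runtime.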

In Parrot, the boxing and unboxing is deferred: there is a separate and unique opcode for adding an integer to an object (a.k.a. PMC).  This is in addition to opcodes that add a floating point number to an object, an integer to an integer, an object to an object, etc.  This requires more special cases to be handled, but the payoff is that this additional development work eliminates runtime work.  In this case, all of the boxing and unboxing can be avoided.

In absolute terms, boxing and unboxing are not very expensive.  But relative to the operation they surround (in this case, a simple integer addition), the cost can be very significant.

With the way Parrot is structured, much of the development overhead can be eliminated.  Not every object class that wishes to provide an add operator needs to implement an add_int method.  By using a common base class, a generic add_int method can be provided that boxes up the int and calls a single add method designed to work on objects.  This lets subclasses for which add_int is common enough to be worth optimizing implement it directly, without burdening all other subclasses with the need to do so.
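The base class trick can be sketched in Python itself.  Everything here is hypothetical: the class names PyObject, PyInt and PyFloat and the add/add_int methods only mirror the idea, not Parrot's actual PMC interface.

```python
class PyObject:
    """Hypothetical common base class (illustrative names only)."""

    def add(self, other):
        raise NotImplementedError

    def add_int(self, n):
        # Generic fallback: box the raw int, then reuse the object-to-object add.
        return self.add(PyInt(n))


class PyInt(PyObject):
    def __init__(self, value):
        self.value = value

    def add(self, other):
        return PyInt(self.value + other.value)

    def add_int(self, n):
        # Optimized override: adding a raw int is common enough to skip boxing.
        return PyInt(self.value + n)


class PyFloat(PyObject):
    def __init__(self, value):
        self.value = value

    def add(self, other):
        return PyFloat(self.value + other.value)

    # No add_int here: the inherited fallback boxes the int and calls add.


fast = PyInt(2).add_int(3)        # takes the optimized path
slow = PyFloat(1.5).add_int(2)    # takes the boxing fallback
```

Only PyInt pays the development cost of the special case; PyFloat gets a working add_int for free.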

Goals

The first goal of a Python on Parrot implementation needs to be fidelity to the CPython implementation.  Otherwise, you are simply implementing a Python-like language on top of another runtime.  Such a language would not be able to make use of the full range of existing Python libraries and scripts.

However, that goal, by itself, is insufficient.  There already is a CPython implementation.  Potential secondary goals include better performance and better integration with other languages.  Both of those goals ultimately require some trade-offs to be made with respect to the first goal.

Most of the performance trade-offs can be made without compromise to functionality.  The best cases are when common scenarios are made significantly faster at a marginal expense to less common scenarios.

The integration scenarios are trickier.  Perl's integer divide has different semantics than Python's, particularly for negative numbers.  What does dividing a Perl integer by a Python integer mean?  If two Perl integers are passed to a Python function which attempts to divide them, what should be done?
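To make the mismatch concrete, here is a small sketch.  Python's integer division rounds toward negative infinity; I am assuming the Perl side truncates toward zero, as C-style integer division does.  The function names floor_div and trunc_div are mine.

```python
def floor_div(a, b):
    # Python's integer division rounds toward negative infinity.
    return a // b


def trunc_div(a, b):
    # C-style integer division rounds toward zero (assumed here to be
    # what the Perl integer divide produces).
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(a, b, floor_div(a, b), trunc_div(a, b))
```

The two agree whenever the operands have the same sign and differ by one otherwise, which is exactly the kind of bug that hides until somebody passes in a negative number.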

The same operation that does binary arithmetic also does string concatenation in Python.
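A two-line illustration, using operator.add as the function form of Python's + (Perl, by contrast, spells addition and concatenation differently, as + and .):

```python
import operator

# operator.add is the function form of "+"; the same call handles both cases.
total = operator.add(1, 2)
joined = operator.add("py", "thon")
```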

These are admittedly edge cases.  But such edge cases abound.  Python has a dict as a fundamental data type.  Perl has a hash.  Keys of Python dictionaries can be any immutable value.  Keys of Perl5 hashes can only be strings.  More significant is the impact of Duck Typing.  If somebody passes a Perl hash to a Python function, the Python function expects there to be a fair number of methods at its disposal.  How much of this can be papered over, and how much of this will show through, is still a research topic at the moment.
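Here is a sketch of how the duck typing problem shows through.  PerlHash is a hypothetical stand-in I made up for a foreign hash that supports item access with string keys and nothing else; it is not how any real bridge represents a Perl hash.

```python
class PerlHash:
    """Hypothetical stand-in for a Perl 5 hash as seen from Python."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("PerlHash keys must be strings")
        self._data[key] = value


def summarize(mapping):
    # Typical Python code: quietly assumes the dict method items() exists.
    return sorted(mapping.items())


h = PerlHash()
h["a"] = 1

try:
    summarize(h)
    outcome = "worked"
except AttributeError:
    outcome = "missing method"
```

Indexing works fine, but the first Python function that reaches for one of dict's many methods falls over.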

Fundamentals

To date, I've found a number of areas that are more fundamentally different between Python and Parrot than any of the examples above.  The two implementations of Python on Parrot that I have looked at, namely pie-thon and Pirate, approach these differently.

The first deals with the extent of the Python canonicalization mentioned above.  In Parrot, instances may have properties, methods, and attributes.  In Python, there are only attributes.  This is possible as functions, methods, and even classes are also objects in Python, so each are possible values for a given attribute.
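A short demonstration of the "only attributes" point.  Greeter, shout and Helper are made-up names for illustration.

```python
class Greeter:
    greeting = "hello"            # plain data attribute

    def greet(self):              # a method is a function stored as an attribute
        return self.greeting


def shout(self):
    return self.greeting.upper()


g = Greeter()

# Methods are looked up with the same mechanism as any other attribute.
bound = getattr(g, "greet")

# A function, or even a class, can be assigned to an attribute after the fact.
Greeter.shout = shout
Greeter.Helper = type("Helper", (), {})
```

After this runs, g.shout() works just as if shout had been written inside the class body.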

In the pie-thon implementation of Python on Parrot, all methods are attributes.  In Pirate, all methods are properties.  The implication is that, from the perspective of a language like Perl, such Python objects will have no methods defined.

This can be dealt with by implementing a find_method method in PyClass that searches first the set of methods, and then the attributes/properties.
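The lookup order can be sketched as follows.  This is Python rather than Parrot code, and the methods and attributes dictionaries are my own simplification, so treat it as an outline of the idea and not as Parrot's actual interface.

```python
class PyClass:
    """Hypothetical sketch of the two-step method lookup."""

    def __init__(self, methods=None, attributes=None):
        self.methods = dict(methods or {})
        self.attributes = dict(attributes or {})

    def find_method(self, name):
        # 1. Real methods win.
        if name in self.methods:
            return self.methods[name]
        # 2. Otherwise fall back to a callable stored as an attribute/property.
        value = self.attributes.get(name)
        if callable(value):
            return value
        return None


def native_m():
    return "native"


def python_m():
    return "python"


cls = PyClass(methods={"m": native_m},
              attributes={"m": python_m, "n": python_m, "x": 42})
```

With this in place, a caller from another language asking for n finds the Python function even though it was never registered as a method.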

More troublesome is the issue of naming.  In Parrot, the presumption is that all subroutines and classes are globally named.  In Python, such names are lexically scoped.  It is quite legal to have multiple methods in the same scope with the same name.  In fact, the syntax to define a class in Python really only creates an anonymous class object and assigns it to a (lexically scoped) variable.  The only names that are global in Python are module names.  Modules in Python are used to address much the same types of problems that namespaces do in Parrot, but again, are fundamentally different.
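The "anonymous class object bound to a variable" claim can be shown with the three-argument form of the built-in type().  The function name make is mine; the rest is standard Python.

```python
def make():
    # The class statement...
    class c:
        def m(self):
            return 0

    first = c

    # ...is roughly equivalent to building a class object by hand and
    # binding it to a local name, here rebinding the same name c.
    c = type("c", (), {"m": lambda self: 1})

    return first, c


first, second = make()
```

Both objects report the name c, yet they are distinct classes, and neither name exists outside of make.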

Here's an example that can't be handled by pie-thon currently:

 def f(x):
 
   class c:
     def m(self): return 0
 
   if x<0:
     class c:
       def m(self): return -1
 
   if x>0:
     class c:
       def m(self): return +1
 
   return c()
 
 print f(7).m(), f(0).m(), f(-7).m()

But even that can be largely masked by clever compilers.  Pirate addresses this with a bit of name mangling.

A difference that can't be masked at all is a difference that isn't there.  In Python there is no vocabulary for "new-ing" up an instance of a class.  Instead, the __call__ method on the class is expected to act like a factory.  Non-Python libraries will either be required to mimic this behavior, or an alternate syntax (perhaps a Parrot module which exports a new function) will need to be provided.
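In Python terms, the factory behavior and one possible shape for an exported new function look like this.  The helper new is hypothetical, and Point is just a sample class.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Ordinary instantiation is just calling the class object...
p1 = Point(1, 2)

# ...which goes through __call__ on the class's own type (its metaclass).
p2 = type(Point).__call__(Point, 3, 4)


def new(cls, *args, **kwargs):
    """Hypothetical helper that a module might export for other languages."""
    return cls(*args, **kwargs)


p3 = new(Point, 5, 6)
```

All three spellings end up in the same place, so a wrapper like new costs nothing on the Python side.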

Status and Plans

Michal Wallace has given me commit access to Pirate, and I've made a number of small fixes.  But mostly, I've been holding back until I can get a new set of Python-specific classes implemented and committed.

Leopold Tötsch has been committing (most of) my patches to Parrot, and now I am ready to have a largish one committed.  It is mostly new Python-specific dynclass sources, with some small mods to the system to make it work.  Once that is committed, I'll update Pirate, and both will once again pass all defined tests.

At that point, I plan to do two activities.  One is to refactor as much of the existing logic in Pirate into Parrot dynclasses as possible.  The other is to expand the test suite and use that to drive the addition of new functionality.  Two sources of tests will be the parrotbench and the CPython unit test suite.


I think... I think we can get all this working. Some of it's going to be a matter of "The languages fight it out!" and there's not much that can be done there. Some of it's a matter of getting definitions and behaviours straight and providing a means to hide enough stuff that the Right Thing can generally be done painlessly. And some of it may well require some changes to Parrot, which isn't a problem either.

I think it can be worked out reasonably well. Lemme think on it some.

Posted by Dan at

I've been trying to follow you (from a distance). Just commenting that CVS parrot has not built for me on linux-ppc for a week or so:


blib/lib/libparrot.a(jit_cpu.o)(.text+0x2722): In function `Parrot_end_jit':
src/jit_cpu.c:74: undefined reference to `Parrot_ppc_jit_restore_nonvolatile_registers'
blib/lib/libparrot.a(jit_cpu.o)(.text+0x2726):src/jit_cpu.c:74: undefined reference to `Parrot_ppc_jit_restore_nonvolatile_registers'
blib/lib/libparrot.a(jit_cpu.o)(.text+0x274e):src/jit_cpu.c:74: undefined reference to `Parrot_ppc_jit_restore_nonvolatile_registers'
blib/lib/libparrot.a(jit_cpu.o)(.text+0x2752):src/jit_cpu.c:74: undefined reference to `Parrot_ppc_jit_restore_nonvolatile_registers'
blib/lib/libparrot.a(jit_cpu.o)(.text+0x2772):src/jit_cpu.c:74: undefined reference to `Parrot_ppc_jit_restore_nonvolatile_registers'
blib/lib/libparrot.a(jit_cpu.o)(.text+0x2776):src/jit_cpu.c:74: more undefined references to `Parrot_ppc_jit_restore_nonvolatile_registers' follow
collect2: ld returned 1 exit status

I think the problem is that people working on the ppc arch are working with either darwin or aix. I'll give it a look if I can find time.

Posted by Santiago Gala at

RE: Python vs Parrot

Sam,
  It seems you are always working on the next big thing: Web services, Apache, Open Source, XML syndication and now dynamic languages. I have to agree that from a technical perspective, dynamic languages look like they are set to break into the mainstream of developer consciousness.

Interesting stuff.

Message from Dare Obasanjo at


The type issues don't seem too terribly bad.  At least, from a Python perspective, Duck Typing should paper over a lot of the problems.  To tackle what I think is the most difficult type mismatch: Perl's mutable strings.  These are largely incompatible with Python, because string immutability is deeply ingrained in Python.  Perl strings can't be used as Python strings.  But if str(aPerlString) returns an equivalent immutable Python string, and __cmp__ is implemented for these two, and all those other operators, then usually it will be fine.  Programmers will have to be careful, lest they keep a mutable string around without realizing it (and it gets changed underneath them), but otherwise it should be fine.

Or, with Perl hashes, it's easy to imagine them in Python -- they just implement appropriate __getitem__ and __setitem__ methods, and throw exceptions if you try to use something other than string keys.  This isn't incompatible with Python, or even all that unusual.  And Python dictionaries can contain mutable keys, they just have the typical restriction that if you redefine equality you must redefine the hash; and if equality is based on mutable aspects, then there's no accurate hash, and so the hash method (__hash__?) is changed to raise an error to avoid confusion on the part of the programmer.

Still in some cases you probably want to put the logic into the operator and not the object.  E.g., if Python and Perl integer division are different, the kind of division you use should be based on the language, not the type of the object.  One way to deal with this would be to wrap foreign objects; e.g., you could define some wrapper from Perl integers to Python integers.  This seems slow and tiresome, but would allow for pleasant interfaces on all platforms (if you put in the work to create wrappers that make objects conform to the conventions of all the various Parrot languages).  I haven't thought that through very much, though.

I'm not sure how the Perl side will look.  It doesn't seem like Perl objects can pretend to be primitive types quite the same (though I'm not at all experienced with Perl).  Of course, maybe that's not an issue in Perl 6, which is the real target.

Re: the increment example.  How can you compile that to "add P1, P1, 1", when you don't know the type of P1?  Assuming "add" only works with integers... did I misread, and "add" is generic?  If so, what realistic cases are there where you would know enough at compile time to use the less generic commands?

Posted by Ian Bicking at

Ian,

The Parrot assembler picks the "add" opcode to match the operand types, so there are a bunch of different opcodes under the hood despite them sharing the same mnemonic. The assembler figures all this irritating stuff out for us.

Posted by Martin Atkins at

Hackathon Day 2

Here are some highlights of the second day of the Hackathon:  I did end up spending some time helping people with PGP keys, and the web of trust now extends to ASF developers from Sri Lanka and Japan, among other places.  I was sitting at a table that... [more]

Trackback from Ted Leung on the air at

Python has two integer division "modes": the vestigial default mode of C-like "floor" division, where the remainder is dropped (1/2 => 0), and "true" division, where 1/2 => 0.5.  The goal is to deprecate floor division with the '/' operator and replace it with true division.  Those who want the old behavior will be able to get it with the '//' operator.  The Python "special methods" for these are __div__/__truediv__ and __floordiv__, with __div__ vs. __truediv__ being selected by from __future__ import division.  I would think Perl integers would simply emulate Python floats and always do "__truediv__".  Given that the goal in Python is to have integer division more closely mimic the Perl model in the future, I'd say that this particular issue is less important than initially indicated.

Posted by Shahms King at
