intertwingly

It’s just data

Python vs Parrot


In many ways, it seems like Python and Parrot are from different planets.

In Python, the general approach seems to be to reduce everything possible to a canonical form as early as possible, and then deal with everything consistently.

In Parrot, the general approach seems to be to leave everything in its original form as long as possible, and then deal with everything separately.

Example

A simple example, incrementing a in Python:

 LOAD_FAST a
 LOAD_CONST 1
 BINARY_ADD
 STORE_FAST a

And in Parrot:

 find_lex P1, a
 add P1, P1, 1

The first difference you may notice is that Python is stack based and Parrot is register based, but the difference I want to focus on is the add operation itself.  In Python, the BINARY_ADD operation is generic and can handle everything from integers to floating point numbers to strings.  For this to work, the numeric 1 must first be converted to an object (a process often referred to as boxing) and pushed on the stack.  BINARY_ADD will then pull two objects off of the stack, unbox them as appropriate, perform the operation, box up the result, and push it back on the stack.
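The instruction stream for a snippet like this can be listed with the standard `dis` module.  This is a minimal sketch: the function name `increment` is an assumption made for illustration, and opcode names vary between CPython versions (since 3.11, for instance, `BINARY_ADD` is no longer emitted as a separate opcode), so no particular listing is shown.

```python
import dis


def increment(a):
    # The statement of interest: load a, load the constant 1, add, store a.
    a = a + 1
    return a


# Collect the opcode names CPython generated for the function body.
opnames = [instr.opname for instr in dis.get_instructions(increment)]

# Print the full disassembly; the exact output depends on the interpreter version.
dis.dis(increment)
```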

In Parrot, the boxing and unboxing are deferred: there is a separate and unique opcode for adding an integer to an object (a.k.a. PMC).  This is in addition to opcodes that add a floating point number to an object, an integer to an integer, an object to an object, etc.  This requires more special cases to be handled, but the payoff is that this additional development work eliminates runtime work.  In this case, all of the boxing and unboxing can be avoided.
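The contrast between the two strategies can be modeled in a few lines of Python.  This is a toy sketch only: `PMC`, `box`, `unbox`, `generic_add`, and `add_pmc_int` are hypothetical names chosen for illustration and are not taken from either runtime.

```python
class PMC:
    """Stand-in for a boxed object (hypothetical, for illustration only)."""

    def __init__(self, value):
        self.value = value


def box(n):
    # Wrap a native value in an object.
    return PMC(n)


def unbox(p):
    # Pull the native value back out of the object.
    return p.value


def generic_add(left, right):
    # Generic style: both operands arrive boxed and the result leaves boxed.
    return box(unbox(left) + unbox(right))


def add_pmc_int(dest, src, n):
    # Specialised style: the native int n is never boxed, and the result
    # is written into an existing destination object.
    dest.value = src.value + n
    return dest


a = PMC(41)
via_generic = generic_add(a, box(1))   # needs a temporary box for the constant 1
via_special = add_pmc_int(a, a, 1)     # no temporary box at all
```

The generic path allocates two extra objects (one for the constant, one for the result) that the specialised path never creates.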

In absolute terms, boxing and unboxing are not very expensive.  But relative to the operation they surround (in this case, a simple integer addition), the cost can be very significant.

With the way Parrot is structured, much of the development overhead can be eliminated.  Not every object class that wishes to provide an add operator needs to implement an add_int method.  By using a common base class, a generic add_int method can be provided that boxes up the int and calls a single add method designed to work on objects.  This lets subclasses for which add_int is common enough to be worth optimizing implement it directly, without burdening all other subclasses with the need to do so.
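The base-class technique can be sketched in Python as follows.  The class names `PyObjectBase`, `PyFloat`, and `PyInt` are assumptions made for this illustration and do not describe the actual Parrot class hierarchy.

```python
class PyObjectBase:
    """Hypothetical common base class providing a generic add_int."""

    def add(self, other):
        raise NotImplementedError("subclasses provide object + object")

    def add_int(self, n):
        # Generic fallback: box the native int, then reuse the
        # object-to-object add that every subclass must supply.
        return self.add(PyInt(n))


class PyFloat(PyObjectBase):
    """Implements only add; add_int is inherited from the base class."""

    def __init__(self, value):
        self.value = float(value)

    def add(self, other):
        return PyFloat(self.value + other.value)


class PyInt(PyObjectBase):
    """Overrides add_int because int + int is common enough to optimise."""

    def __init__(self, value):
        self.value = int(value)

    def add(self, other):
        # Simplification: assumes other also holds an integer.
        return PyInt(self.value + other.value)

    def add_int(self, n):
        # Direct path: no temporary box is created for n.
        return PyInt(self.value + n)


slow_path = PyFloat(1.5).add_int(2)   # goes through the generic fallback
fast_path = PyInt(41).add_int(1)      # uses the optimised override
```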

Goals

The first goal of a Python on Parrot implementation needs to be fidelity to the CPython implementation.  Otherwise, you are simply implementing a Python-like language on top of another runtime.  Such a language would not be able to make use of the full range of existing Python libraries and scripts.

However, that goal, by itself, is insufficient.  There already is a CPython implementation.  Potential secondary goals include better performance and better integration with other languages.  Both of those goals ultimately require some trade-offs to be made with respect to the first goal.

Most of the performance trade-offs can be made without compromising functionality.  The best cases are those where common scenarios are made significantly faster at a marginal expense to less common scenarios.

The integration scenarios are trickier.  Perl's integer divide has different semantics than Python's, particularly for negative numbers.  What does dividing a Perl integer by a Python integer mean?  If two Perl integers are passed to a Python function which attempts to divide them, what should be done?
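To make the mismatch concrete, here is a small sketch of the two rounding rules side by side.  It assumes the Perl behavior in question is truncation toward zero (as Perl's `int()` does); `floor_div` and `trunc_div` are hypothetical helper names, not part of either language.

```python
def floor_div(a, b):
    """Python's integer division: rounds toward negative infinity."""
    return a // b


def trunc_div(a, b):
    """Integer division that rounds toward zero (assumed Perl-style)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


# The two rules agree when the signs match and differ when they do not.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, floor_div(a, b), trunc_div(a, b))
```

For -7 divided by 2, the first rule gives -4 and the second gives -3, so a mixed-language runtime has to pick one per call site.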

The same operation that does binary arithmetic also does string concatenation in Python.
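A short illustration using the standard `operator` module, whose `add` function is the functional form of `+`.  Perl, by contrast, uses `+` for numeric addition and `.` for concatenation.

```python
import operator

# One operation, dispatched on the operand types at run time.
numeric = operator.add(1, 2)          # integer addition
textual = operator.add("py", "thon")  # string concatenation

print(numeric, textual)
```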

These are admittedly edge cases.  But such edge cases abound.  Python has a dict as a fundamental data type.  Perl has a hash.  Keys of Python dictionaries can be any hashable (in practice, immutable) value.  Keys of Perl 5 hashes can only be strings.  More significant is the impact of Duck Typing.  If somebody passes a Perl hash to a Python function, the Python function expects there to be a fair number of methods at its disposal.  How much of this can be papered over, and how much will show through, is still a research topic at the moment.
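Both problems can be sketched with a toy stand-in for a foreign hash.  `StringKeyedHash` and `summarize` are hypothetical names invented for this illustration; the class only models string-coerced keys and bare item access, and is not how Parrot actually exposes a Perl hash.

```python
class StringKeyedHash:
    """Hypothetical stand-in for a hash whose keys are always strings."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        # Every key is coerced to a string before it is stored.
        self._data[str(key)] = value

    def __getitem__(self, key):
        return self._data[str(key)]


def summarize(mapping):
    # Typical duck-typed Python: assumes the dict methods are available.
    return list(mapping.keys())


native = {(1, 2): "tuple key", 1: "int key"}

foreign = StringKeyedHash()
foreign[1] = "int key"
foreign["1"] = "string key"   # collides with the int key 1 after coercion

print(summarize(native))      # works: a dict has keys()

try:
    summarize(foreign)
except AttributeError as error:
    # The foreign object supports item access but not the rest of the protocol.
    print("missing method:", error)
```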

Fundamentals

To date, I've found a number of areas that are more fundamentally different between Python and Parrot than any of the examples above.  The two implementations of Python on Parrot that I have looked at, namely pie-thon and Pirate, approach these differently.

The first deals with the extent of the Python canonicalization mentioned above.  In Parrot, instances may have properties, methods, and attributes.  In Python, there are only attributes.  This is possible because functions, methods, and even classes are also objects in Python, so each is a possible value for a given attribute.
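A brief sketch of what "only attributes" means in practice.  The names `Greeter`, `hello`, and `Inner` are made up for the example.

```python
class Greeter:
    pass


def hello(self):
    return "hello"


class Inner:
    pass


# A plain function stored as a class attribute behaves as a method.
Greeter.greet = hello
# A class is just another value that an attribute can hold.
Greeter.Helper = Inner

g = Greeter()
# A built-in function stored as an instance attribute.
g.callback = len

print(g.greet(), g.callback([1, 2, 3]), g.Helper)
```

All three are found through the same attribute lookup; nothing marks `greet` as a method apart from the fact that its value is callable.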

In the pie-thon implementation of Python on Parrot, all methods are attributes.  In Pirate, all methods are properties.  The implication is that, from the perspective of a language like Perl, such Python objects will have no methods defined.

This can be dealt with by implementing a find_method method in PyClass that searches first the set of methods, and then the attributes/properties.
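The lookup order can be modeled in Python as a rough sketch.  The `PyClass` below is a toy written for this illustration, not the Parrot class of the same name; its two dictionaries and the `callable` check on the fallback are assumptions.

```python
class PyClass:
    """Toy model of a class with separate method and attribute tables."""

    def __init__(self, methods=None, attributes=None):
        self.methods = dict(methods or {})
        self.attributes = dict(attributes or {})

    def find_method(self, name):
        # First choice: a real entry in the method table.
        if name in self.methods:
            return self.methods[name]
        # Fallback: an attribute that happens to hold something callable.
        candidate = self.attributes.get(name)
        if callable(candidate):
            return candidate
        return None


cls = PyClass(
    methods={"native": lambda: "from methods"},
    attributes={"m": lambda: "from attributes", "size": 3},
)
```

With this in place, a caller that only knows about methods still finds `m`, while the non-callable `size` attribute is not mistaken for one.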

More troublesome is the issue of naming.  In Parrot, the presumption is that all subroutines and classes are globally named.  In Python, such names are lexically scoped.  It is quite legal to have multiple methods in the same scope with the same name.  In fact, the syntax to define a class in Python really only creates an anonymous class object and assigns it to a (lexically scoped) variable.  The only names that are global in Python are module names.  Modules in Python are used to address much the same types of problems that namespaces do in Parrot but, again, the two are fundamentally different.
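The point about class definitions can be seen by writing the same thing two ways.  This is a small sketch; `make` and `make_explicit` are hypothetical function names.

```python
def make():
    # The class statement builds a class object and binds it to the local name c.
    class c:
        def m(self):
            return 0
    return c


def make_explicit():
    # Roughly the same thing spelled out: create the object, then assign it.
    c = type("c", (), {"m": lambda self: 0})
    return c


first = make()
second = make()

# Each call produces a distinct class object, even though both are labelled "c".
print(first is second, first.__name__, second.__name__)
```

The `__name__` string is only a label carried by the object; the binding itself is an ordinary local variable, so nothing prevents two unrelated classes from sharing it.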

Here's an example that can't be handled by pie-thon currently:

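 def f(x):
 
   # Default definition of c, bound to a local name inside f.
   class c:
     def m(self): return 0
 
   # Rebind the same local name to a different class when x is negative...
   if x<0:
     class c:
       def m(self): return -1
 
   # ...or when x is positive.
   if x>0:
     class c:
       def m(self): return +1
 
   return c()
 
 print(f(7).m(), f(0).m(), f(-7).m())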

But even that can be largely masked by clever compilers.  Pirate addresses this with a bit of name mangling.
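The general idea behind name mangling can be sketched as below.  Pirate's actual scheme is not described here, so this is purely a generic illustration: the `mangle` helper, its counter, and the name format are all assumptions.

```python
import itertools

# A single counter shared by every definition the compiler encounters.
_counter = itertools.count(1)


def mangle(scope, name):
    """Return a globally unique name for a lexically scoped definition.

    Hypothetical scheme: enclosing scope, original name, running number.
    """
    return "{}.{}#{}".format(scope, name, next(_counter))


# Three class statements that all bind the local name c inside f
# would each receive their own global name.
names = [mangle("f", "c") for _ in range(3)]
print(names)
```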

A difference that can't be masked at all is a difference that isn't there.  In Python there is no vocabulary for "new-ing" up an instance of a class.  Instead, the __call__ method on the class is expected to act like a factory.  Non-Python libraries will either be required to mimic this behavior, or an alternate syntax (perhaps a Parrot module which exports a new function) will need to be provided.
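In Python terms, the factory behavior and one possible shape for an exported helper look like this.  The `new` function here is a hypothetical sketch written in Python for illustration; it is not an existing Parrot module or API.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


# The usual way to construct an instance: call the class.
p = Point(1, 2)

# What the call syntax amounts to: the __call__ of the class's type
# acts as the factory.
q = type(Point).__call__(Point, 3, 4)


def new(cls, *args, **kwargs):
    # Hypothetical helper for callers that expect an explicit "new" operation.
    return cls(*args, **kwargs)


r = new(Point, 5, 6)
print(p.x, q.x, r.x)
```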

Status and Plans

Michal Wallace has given me commit access to Pirate, and I've made a number of small fixes.  But mostly, I've been holding back until I can get a new set of Python-specific classes implemented and committed.

Leopold Tötsch has been committing (most of) my patches to Parrot, and now I am ready to have a largish one committed.  It is mostly new Python-specific dynclass sources, with some small modifications to the system to make it work.  Once that is committed, I'll update Pirate, and both will once again pass all defined tests.

At that point, I plan to pursue two activities.  One is to refactor as much of the existing logic in Pirate into Parrot dynclasses as possible.  The other is to expand the test suite and use that to drive the addition of new functionality.  Two sources of tests will be parrotbench and the CPython unit test suite.