The approach used for the pie-thon involved mdis.py to compile a
Python source and disassemble the resulting Python bytecode, and
pie-thon.pl to translate the equivalent of Python assembler language
to PIR. This is then compiled into Parrot Byte Code, which is what is
ultimately executed.
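The first stage of that pipeline, compiling Python source and disassembling the resulting bytecode, can be sketched with Python's standard dis module. This is only an illustration on a modern Python 3, not the code of mdis.py; the sample statement and the "<example>" filename are made up. Opcode names differ between Python versions, so the exact listing will vary.

```python
import dis

# Hypothetical one-line program; any Python source would do.
source = "x = a + b\n"

# Compile the source text to a code object (Python bytecode).
code = compile(source, "<example>", "exec")

# Disassemble: one Instruction per bytecode operation.
instructions = list(dis.get_instructions(code))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    # opname is the "assembler" mnemonic, argrepr its readable operand.
    print(ins.opname, ins.argrepr)
```

A translator in the style of pie-thon.pl would then map each mnemonic in such a listing to equivalent PIR.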
This is good for a proof of concept, but in the opinion of
Leopold Toetsch, a real implementation of a compiler should
use the AST to produce Parrot bytecode. The overall
approach would look something like this [adapted from
Jim Hugunin's presentation on IronPython]:
Michal Sabren's pirate (currently stalled) goes directly from Python
AST to PIR, which would still need to be compiled.
Leopold took Michal's approach and created ast2past.py, which
compiles source to Python's AST and then converts it to Parrot's AST,
which, as near as I can tell, is unimplemented. I'm not even clear on
what a Virtual Machine's AST is.
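For comparison, here is a minimal sketch of what the Python side of an AST-based approach starts from, using the standard ast module of a modern Python 3. Older tools may have used a different API, so treat this as an illustration rather than what pirate or ast2past.py actually do. The sample statement is made up.

```python
import ast

# Hypothetical one-line program, parsed into Python's own AST.
tree = ast.parse("x = a + b\n")

# The module body holds one statement: an assignment.
assign = tree.body[0]

# Its right-hand side is a binary operation node with an Add operator.
binop = assign.value

# Dump the whole tree as text to see the node structure.
print(ast.dump(tree))
```

A compiler working at this level walks nodes such as Assign and BinOp directly instead of reconstructing intent from stack-oriented bytecode.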
Potential Approach
It seems to me that one should settle on an interface to Parrot,
and work backwards from that point in Python. At the moment,
the PIR interface seems the most viable (though I freely admit that
may be due to my ignorance at this point). If so, the first
effort would be to upgrade pirate to produce PIR equivalent to that
which is produced by pie-thon, using the
Python tests to guide development.
The result would still depend on the Python compiler (written in
C). In order to host this on Parrot, a Python Parser would
need to be written. It could leverage the existing tokenizer
as a Python Scanner.
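As a rough sketch of what "tokenizer as scanner" means, the standard tokenize module of a modern Python 3 already turns source text into a token stream that a hand-written parser could consume. The sample statement is made up, and nothing here is specific to Parrot.

```python
import io
import tokenize

# Hypothetical one-line program to scan.
source = "x = a + b\n"

# generate_tokens takes a readline callable and yields TokenInfo tuples.
readline = io.StringIO(source).readline
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(readline)
]

for kind, text in tokens:
    # kind is the token class (NAME, OP, NEWLINE, ...), text the lexeme.
    print(kind, repr(text))
```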
Ultimately, when self hosted, the need for IL in any textual
form could be dispensed with.
Is this innovative? Why use parrot over the CLR for any existing OS? CLR implementations run on Mac, Linux, FreeBSD, and even Windows.
The biggest issue with the CLR is that there are two separate implementations: Rotor and Mono. Rotor has not been updated since 2002 and implements the API in the ECMA standard only. So Remoting, XML, and other advanced features of the .NET API are not implemented. These portions of the API, however, were all put in a patent application. Mono has implemented a majority of this API meaning that there may be patent infringement issues with using it.
There are already a couple attempts at a python parser written
in python. Last time I worked on pirate, everything that worked
with the official parser also worked with one of the python-based parsers stolen from PyPy.
The reason that an AST based approach was not used in pie-thon
was that Dan Sugalski felt that the rules limited him to using
bytecode, and when the bet was made that also seemed like the
simplest approach. When I announced pirate, he said on the python
developers list that he couldn't use it. (Dan's also a bytecode/
VM wizard, so I think he wanted to play to his strengths)
It turns out that the name pirate as a parrot compiler was already taken: by Klaas-Jan Stol's Lua compiler. I can't find his paper at the moment, but reading it showed that pretty much all compilers are going to do the same thing: convert the same sorts of abstract syntax trees to the same sorts of bytecode. We figured it would save people time if we could provide a single AST tool/library with front ends for the various languages. Sort of like gcc for parrot. Leo liked the idea and decided it ought to be part of parrot proper, so the "parrot AST" idea is his take on it.
But... It turns out that the real stalling point is the objects.
What's really needed to get python running on parrot is for someone
to expose the PythonObject code to parrot by wrapping them as parrot PMCs. I'm not a skilled C programmer, but was able to
create a small proof of concept. If someone could finish the job, there should
be no reason why python objects and modules couldn't live happily in the parrot VM.
The other main work to be done is optimization and cross-language
support. Pirate is an incredibly wasteful compiler. A lot of the
bytecode it generates could be simplified. I was hoping to leave
that job to Leo and Dan and friends. :)
As for cross language support: functions have a beautiful, streamlined interface in the calling conventions. That part
should be easy, but the object system is another story. I haven't
looked at it in a year, but when it was announced, it would
have prevented using objects across languages. Dan agreed that
it was bad, but didn't want to address it at the time. I don't
know where it stands now, but I think there's a reason I haven't
seen any working cross-language OO parrot examples yet.
Anyway, if you want python on parrot for speed, this should be
a straightforward (and small) job for a competent C programmer.
If you want python on parrot so you can talk to perl and ruby
and php, then you might need some diplomatic skills as well...
Parrot is written with dynamically-typed, late-binding languages in mind, while the CLR is (in my impression) really written in the style of the JVM (though the CLR is reportedly better than the JVM for this case). Parrot also (I think) uses a different model for the VM, with registers instead of a stack. I don't know if it'll be better than a stack-based approach, but it's possible.
But... It turns out that the real stalling point is the objects. What's really needed to get python running on parrot is for someone to expose the PythonObject code to parrot by wrapping them as parrot PMCs. I'm not a skilled C programmer, but was able to create a small proof of concept. If someone could finish the job, there should be no reason why python objects and modules couldn't live happily in the parrot VM.
I am a competent enough C programmer to wrap PythonObjects as PMCs (I previously wrapped Java jobjects as PHP4/Zend objects). I also know enough to not want to do this. Among other things, any reasonable application will quickly find a need for this to be two way (i.e., wrapping PMCs as PythonObjects), at which point garbage collection becomes problematic.
rubys@rubix:~/pirate$ python PirateTest.py
.................................FF.......................
======================================================================
FAIL: test_microthreads (__main__.PirateTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "PirateTest.py", line 665, in test_microthreads
self.assertEquals(res, "a b a b a b a b ")
File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
raise self.failureException, \
AssertionError: 'Null PMC access in invoke()a b ' != 'a b a b a b a b '
======================================================================
FAIL: test_microthreads_more (__main__.PirateTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "PirateTest.py", line 683, in test_microthreads_more
self.assertEquals(res, "0 1 2 ")
File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
raise self.failureException, \
AssertionError: 'Null PMC access in invoke()0 ' != '0 1 2 '
----------------------------------------------------------------------
Ran 58 tests in 5.489s
FAILED (failures=2)
rubys@rubix:~/parrot/languages/python$ python test_microthreads.py
a b a b a b a b
rubys@rubix:~/parrot/languages/python$ ./pie-thon test_microthreads.py
Undefined subroutine &main::LOAD_CLOSURE called at pie-thon.pl line 413.
error:imcc:parse error, unexpected $end, expecting '\n'
in file 'test_microthreads.pir' line 102
Parrot VM: Can't stat test_microthreads.pbc, code 2.
error:imcc:main: Packfile loading failed
The two failing pirate tests use generators for a simple microthread demo. In python, invoking a generator routine returns a generator object with a "next" method. The object system wasn't implemented when I wrote the code, so I faked it. Looks like generators will need to be repaired. You can just comment out those tests if you want, and just avoid generators. (Frankly I'm surprised that's all that's broken!)
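For readers who haven't seen the idiom, here is a minimal sketch of generator-based microthreads that produces the string the first failing test expects. It is not the code from PirateTest.py; the names worker and run_round_robin are made up for illustration. It is written for a modern Python 3, where the generator's "next" method is spelled __next__ and is normally called through the next() builtin.

```python
def worker(name, count):
    """A microthread: emit its name, then hand control back to the scheduler."""
    for _ in range(count):
        yield name + " "


def run_round_robin(threads):
    """Step each generator in turn until all are exhausted; return the joined output."""
    output = []
    pending = list(threads)
    while pending:
        still_running = []
        for thread in pending:
            try:
                # Calling the generator function returned a generator object;
                # next() resumes it until its next yield.
                output.append(next(thread))
                still_running.append(thread)
            except StopIteration:
                # This microthread has finished; drop it from the schedule.
                pass
        pending = still_running
    return "".join(output)


res = run_round_robin([worker("a", 4), worker("b", 4)])
print(repr(res))
```

The point for pirate is that calling worker(...) has to return an object with a working next method, which is exactly what had to be faked before the object system existed.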
If you're interested in wrapping python objects, there are two main issues that I know about. PythonObjects can stand on their own for the most part, so really shouldn't care what virtual machine they live in. However, they do seem to expect two services from the VM: garbage collection and management of the global interpreter lock. Python has standard functions for these things, that will need to be replaced at some point. In my proof of concept (which simply attempted to show you could use a PythonObject without a running python VM), I simply stubbed out whatever was missing, and that seemed to work.
Re. "registers instead of a stack" (from Ian), the main reason for doing it that way is to support continuations.
Some of the benefits of continuations are:
Thread-like behavior without threads, locks, or (necessarily) queues for communication.
Particularly, producer/consumer pipelines without needing threads.
For simulations and games, the ability to have 100-100k objects that can interact with each other (again, like threads, but simpler).
Continuation-based web applications, which have a far simpler flow-of-control (see Squeak's Seaside and Ruby's Borges).
Continuations are a part of Stackless Python, but iirc, Stackless is still stalled around calling C routines. Continuations are a primary design requirement in Parrot.
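The producer/consumer point can be sketched in plain Python with generators, which are a much more limited mechanism than full continuations but show the same flow of control without threads, locks, or queues. The function names below are made up for illustration and have nothing to do with Parrot's own API.

```python
def producer(limit):
    """Produce the integers 0 .. limit-1, one at a time, on demand."""
    for value in range(limit):
        yield value


def squarer(values):
    """Middle stage: transform each item as it passes through."""
    for value in values:
        yield value * value


def consumer(values):
    """Final stage: pull items through the pipeline and sum them."""
    total = 0
    for value in values:
        total += value
    return total


# Each stage runs only when the next stage asks for an item.
total = consumer(squarer(producer(4)))
print(total)
```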
For the sake of completeness, here's a link to my report on the Lua compiler for Parrot: [link]. Please note the compiler wasn't really finished, but it was a good proof of concept (my first real compiler project, so it's not that strange it didn't work out completely).
Of course, I'd be glad to answer any questions about the project.
PythonObjects can stand on their own for the most part, so really shouldn't care what virtual machine they live in.
Michal, I now understand what you are getting at, and I agree. However, my first priority would be to upgrade the support to match the current version of Parrot, and to focus on getting the python unit tests to pass. Here's an example:
rubys@rubix:~/pirate$ python pirate.py test_tokenize.py
error:imcc:op not found 'add_p_p_sc' (add<3>)
in file '-' line 38
By comparison, here are the pie-thon results:
rubys@rubix:~/parrot/languages/python$ ./pie-thon test_tokenize.py
Use of uninitialized value in string ne at pie-thon.pl line 470.
Use of uninitialized value in string eq at pie-thon.pl line 471.
Use of uninitialized value in concatenation (.) or string at pie-thon.pl line 478.
... my first priority would be to upgrade the support
to match the current version of Parrot, and to focus
on getting the python unit tests to pass.
Re-syncing with parrot makes sense. But the error you listed isn't a parrot version problem. It's coming from
this line in test_tokenize.py:
f = file(findfile('tokenize_tests' + os.extsep + 'txt'))
os.extsep doesn't exist in parrot's vm, so pirate doesn't
know how to cope. (Neither does "file", "findfile", or "os" itself)
There is almost no chance that the python unit tests will work correctly without PythonObjects. You could recode all the builtin functions (like file here) by hand in parrot if you wanted, but it seems like it would be so much easier to just wrap the builtins module.
Same thing with "import". Since imports are dynamic, they require a parser at runtime. The official parser would need to be wrapped, and the parsers written in python all use objects.
Looking at pie-thon.pl, it seems that someone's already made a stab at creating pythonic PMC's (Py_object, Py_int, and so on in %type_map). Using these Py_ PMC's instead of the old perl-style structures would be a good next step for pirate.
I've come across a few references to AST during the past few weeks, but hadn't really sorted out what it really was. The name is pretty descriptive, I'll admit, but I hadn't seen any examples. But yesterday Sam Ruby posted......
Yep -- parrot's bytecode loading system inspects suffixes if it doesn't otherwise recognize what the file you've fed it is. Right now .pasm, .imc, .pbc, and .past are on the list of known suffixes. Hopefully we'll add .z-code soon...
In this post, Sam Ruby explains in a very simple way how Python support in Parrot (pyrate) is designed. As far as I know, and I don't know much about this topic, its future is not very clear (or at least is nowhere near) and the current state of development...