Personal thoughts about Pyston's outcome (kevmod.com)
103 points by open-source-ux on Feb 19, 2017 | 67 comments


Key takeaway: "the [speed] difficulties come from Python's extremely rich object model, not from anything about its dynamic scopes or dynamic types. The problem is that every operation in Python will typically have multiple points at which the user can override the behavior, and these features are used, often very extensively."
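
To make that concrete, here is a toy sketch (class and function names invented for illustration) of two such override points: user-controlled isinstance, and mutating a function in place:

    class Fake(type):
        def __instancecheck__(cls, obj):
            return True                    # isinstance() is now user-controlled

    class AlwaysMatches(metaclass=Fake):
        pass

    print(isinstance(42, AlwaysMatches))   # True

    def f():
        return 1

    f.__code__ = (lambda: 2).__code__      # mutate the function in place
    print(f())                             # 2

An optimizer can't even assume isinstance means what it usually means.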

Also, Kevin and Pyston's team are too modest to state this directly in this article: they made good and steady progress, but it took them a long time. A lot of that time was spent making them experts on the Python runtime and Python JIT options. With several more years of full-time work we could have a JIT that we all use instead of CPython.

Pyston showed that certain features of Python make it slow and difficult to JIT. The best-case scenario for Python's future would have been for Kevin and Pyston to work with the Python core team to decide how to tweak language features for speed (perhaps marking certain object modifications as "slow" in the same way Rust has an "unsafe" marker). With Pyston shutting down, I think the horse of Python speedups is out of the barn and is never coming back.

More discussion: https://news.ycombinator.com/item?id=13534992


I don't get why it seems like languages are designed for convenience then later down the road billions of hours are spent trying to get better performance.

I have seen this happen over and over with PHP, Ruby, JS, and now Python.

The best solution is to stop using a slow language for these types of applications. Personally I don't see a point to investing so much in these projects, since all it does is encourage developers to continue their bad behavior.

"Oh don't worry about it, they're working on a JIT if performance is ever an issue" -- lead dev deciding to write something that is expected to handle 10k transactions per second a few years later in Python.

If performance is ever going to be important just use a high performance language and be done with it. Then you don't have to spend years and millions reinventing the wheel like Facebook did with PHP.


> I don't get why it seems like languages are designed for convenience then later down the road billions of hours are spent trying to get better performance.

It's easy to explain.

1. Most languages are designed for the personal satisfaction of the author, not for commercial success. Perl, PHP, and Ruby were all just personal projects. Python was a language intended for education, then pivoted towards Perl's niche in the 90s.

2. The vast majority of languages that are designed will fail and be abandoned without anyone having ever heard of them.

3. Designing a pretty and elegant language is hard, but fun. You don't even need to implement it.

4. Designing a language that is deliberately amenable to optimization is also hard, but not fun (especially when optimization has to be prioritized over elegance). Proving that your design actually yields good performance is also hard, and not fun. Furthermore, the only point of designing such a language is to hope to someday fully implement it, which is very hard and extremely not fun. And hoping to implement such a high-performance language in a cross-platform way is, for any single person, basically impossible.

The modern resurgence of high-performance languages like Rust is because, thanks to retargetable backends like LLVM, the idea of a hobbyist designing a new high-performance cross-platform language has gone from "basically impossible" to "unbelievably tedious, though possible, but also on some level your language has to basically pretend that it's still C". The consolidation of platforms over the past few decades also means that there are fewer targets to support in the first place, and hence fewer C compilers than ever, so if you just want to transpile directly to C you can -- though you'll still spend the rest of your life adding hacks to your compiler to work around bugs in every combination of platform/C compiler/C compiler version out there.


> Designing a pretty and elegant language is hard, but fun. You don't even need to implement it.

The parts that make Python slow make it easier to hack on, but they don't make it elegant in any meaningful sense.

> Designing a language that is deliberately amenable to optimization is also hard, but not fun (especially when optimization has to be prioritized over elegance).

Elegant languages are usually amenable to being efficiently implemented without the implementor having to do anything special. That's because elegant languages guide you towards using the simplest, least powerful features that are powerful enough to solve your problem. For example, in Standard ML, I don't use higher-order functions that much, not because they suck (they don't), but rather because first-order functions and functors (which can't be recursive) are powerful enough for most things I want to do.

On the other hand, in a language like Python, it can be very difficult to find features that solve exactly your problem and nothing more, so often you have to reach for super-powerful hammers (operator overloading, metaprogramming, etc.). These hammers are almost invariably more general than the intrinsic demands of your problem, but, since the language designer can't provide an implementation that only works for your use cases, you must pay the price of the unnecessary generality of the feature.
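
As a toy illustration of that kind of hammer (names invented here): say you only want to log attribute reads on one object. Python's tool for that is __getattr__, which drags every attribute lookup on the proxy through a fully dynamic path:

    class LoggingProxy:
        def __init__(self, wrapped):
            self._wrapped = wrapped

        def __getattr__(self, name):       # called for any attribute not found normally
            print("accessed", name)
            return getattr(self._wrapped, name)

    p = LoggingProxy([1, 2, 3])
    p.append(4)                            # prints "accessed append", then appends

The feature is far more general than the problem, and the implementation has to pay for that generality on every lookup.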


Every single company that has invested heavily in JITs for PHP, Ruby, Python etc. has done so because they have vast codebases, often millions of lines, written in those languages, and rewriting them is infeasible: getting a 20% performance increase for everything from a dozen man-years is vastly cheaper than trying to optimize their entire codebase similarly, let alone rewrite large parts of the application in another language.

Why start writing something in one of those languages? They have high-quality frameworks that make it quite easy to ship early and ship fast. The vast majority of web-based startups fail, or at best stagnate with tens of thousands of users, and for them the performance of the common interpreters is fine. If we want to see a phase shift away from them, we need frameworks that allow quick prototyping written in languages which are more easily optimised (and, honestly, JavaScript really isn't bad in terms of writing a JITing compiler).


The point I was trying to make is that the idea that a JIT or compiled version will come along in a few years drives people to use slow languages for high-performance projects in the first place. This creates a cycle in which that usage makes the JIT's eventual arrival a self-fulfilling prophecy.

The irony is that the companies investing the most money into eventual JIT/compilation are the same ones that wrote code believing it would "come around" one day.

The only way to stop the madness is to avoid using these languages for high performance code in the first place.

Having used many fast/slow languages over the years, I'll say I don't really notice any increase in productivity one way or the other, and no difference in the richness of the library ecosystem.

Are there any studies that show more dynamic languages are more productive? Usually the argument is made that the more abstracted the language is from the hardware the better it is to work in, but I don't notice a productivity difference in C#/Java vs Python/Perl.


I'm not sure people have this idea that a JIT or AOT-compiled version will come along; certainly ten years ago nobody really had that impression. Has the rise of the JS VMs really changed this so much?

To take FB as an example: AFAIK, it was written in PHP because Zuckerberg knew PHP. There was no expectation it would "come around" and it continued to be in PHP because bills had to be paid and rewriting everything would stop new features from being launched and hopefully reach profitability.

If you're a small startup your first goal is almost always to hit profitability, and then your goal is almost always to maintain that. As a result, language choices are frequently made by what founders know rather than any technical merit.


Try a language with manual memory management and then you may notice a difference.


Instead of rewriting its whole codebase at once, if a company chooses to first target the performance-critical parts of its software stack, slowly replacing old code with new, then it doesn't sound so financially irrational. Isn't that what's happening at Dropbox?


For three reasons:

1. Massive codebases already exist in these languages, and the cost of a rewrite, retraining all of the engineers, and building new tooling is simply more expensive than making these languages run fast. In the case of V8, the cost for Google would have been convincing the entire web not to use JavaScript.

2. These languages are, in fact, the best tool for the job: their semantics fit the problem, and writing in another language is going to slow down the team (in the long run) more than making the language faster.

3. Opportunity cost is key when starting. I've seen this first hand with C* vs. Riak, and K8s vs. Mesos -- sometimes, choosing the language that lets you run faster, even at some sacrifice, will put you ahead early in the game. You can figure out later how to replace all your duct tape, while dancing on the corpses of the competition.


Can you say more about K8s vs Mesos?


K8s is written in Go, Mesos in C++ - I believe that's what OP was referring to.


> I don't get why it seems like languages are designed for convenience then later down the road billions of hours are spent trying to get better performance.

Because CPUs are so fast there is usually no need to worry about performance, and clarity of code and convenience are more important.

Also you have to remember that Python "grew up" from the early 90s through the early 2000s. That was when most CPUs were not multicore and CPU speeds were doubling almost every year. You could write something in Python and, in a few years, it would speed up just by moving to better hardware.

You had the alternative of "here are 2000 lines of C++ with templates, pointers, references, inheritance, constructors of various types, and a kind of meh standard library" vs "here are 500 lines of Python that you can read, understand, and fix in much less time". Python was slower, but that was OK: getting faster machines was easier and cheaper than work-hours spent poring over bugs and segfaults in the C++ code.

The main reason we talk about Python and speed is that it was widely adopted, and in some cases companies wanted more "speed" from it. Then multi-CPU machines became a commodity and there was more talk about the GIL and such.


> I don't get why it seems like languages are designed for convenience then later down the road billions of hours are spent trying to get better performance.

Usually, it's many orders of magnitude less than billions of hours, and the reason is that raw machine language is fast, while all higher-level languages are designed around ideas of desirable abstractions.


There's a similar comment from the earlier thread [1]:

"I urge designers of future dynamic languages to study Lua and LuaJIT if you want your language to not inhibit JIT development."

and [2]:

"Agreed. Also Smalltalk's memory model was a lot easier to optimize than languages like Python and Ruby. (Java inherited some of that.) Not having syntactic sugar makes optimization easier. In addition, language designers need to make parser tools easy to make and robust."

[1] https://news.ycombinator.com/item?id=13541702

[2] https://news.ycombinator.com/item?id=13543573


With regards to JS, the performance improvements did actually happen...


> I don't get why it seems like languages are designed for convenience then later down the road billions of hours are spent trying to get better performance.

It's a tradeoff - what are you trying to accomplish, and what would you rather spend your time doing?

More flexible languages are often also more expressive and have better tooling (e.g. for a CRUD app it's hard to find something nicer than Django / Rails), but tend to be slower. At the other end of the spectrum are more performance-oriented languages where everything is way more verbose and less expressive: there's a lot of boilerplate, and writing code that does something relatively simple is more onerous. So depending on what phase of development you're in and what you're doing, the "obvious" choice can be different. And as companies grow and mature, those goals also shift.

Granted, that gap is closing with compiled / systems languages getting more and more expressive (Rust, Nim, Go?, C++11/17?), but it's also closing with scripting languages getting faster (very blanket statements here).

The fact that Python, and some Java, but decidedly not C++, is the de facto standard tool for scientists these days should give you pause. Machine learning, GPU computing, large-scale data processing with Spark - you can do it all with Python. Sure, it's all C and Fortran and Java behind the scenes, but who cares? No one wants to bother with that if they can avoid it. They just need to get their job done.

Another example, closer to your requests-per-second example, is Japronto (https://medium.freecodecamp.com/million-requests-per-second-...) - it turns out that if you take the time to optimize the event handling and the HTTP parsing, suddenly you're back in the ballpark of compiled languages. Sure, other bits of code are going to then become the bottleneck, but those can be made faster too.

The "high level developer interface + compiled and optimized internals" combo is just very hard to beat, because it's always going to be good enough where it counts.

Also, performance improvements for a language or for specific libraries are something that can be done as a centralized effort by people experienced and knowledgeable on that specific subject. Chances are most people aren't going to write a matrix multiplication that'll beat the BLAS stuff, so why bother unless you have an extremely good reason? This is much more efficient and happens "for free" from the point of view of the end-user developers (also computers are getting cheaper and faster, also for free). So why bother?

I think Facebook / HHVM / PHP was sort of the boundary of how far this can go, but I think it's a testament to it rather than against it. Granted, not every company is Facebook, but like I said, the gap is closing too. But I think more dynamic languages are always going to become "fast enough" faster than more static languages becoming more expressive and nimble, and will often beat the latter in available tooling (I remember "package management" and build tools in C++, ugh.)


Your comment is lucid and makes good sense.

Why readers are downvoting it shall remain a mystery.


One other key takeaway from the comments on Kevin's blog:

"Hi John, I share some of your disappointment in the outcome [ed: Dropbox switching hot Python paths to Go], but at the end of the day I’m an engineer because I want to make an impact as opposed to just working on interesting problems, and I think the Pyston decision reflects that [ed: the cost-benefit of future time invested vs amount of speedup we might gain is not good]. I think there are some other areas of Python performance that are very important, perhaps in the numerical computing space, and I’m currently taking a look at that. Who knows how this will all work out!"

One personal comment: I still think that with two dedicated full-time engineers with the experience the Pyston guys have, some syntax tweaks could be proposed that make python more efficient. That would require working with the Python core team, which is sort of a bear given the amount of effort and time required for PEP 3118 and PEP 523. Which gets back to the question of making an impact.


I'd be curious if someone could share some examples. My poor understanding of Kevin's claim is that, essentially, Python is too dynamic, particularly in the core library. It seems like the approach to that is devirtualization, perhaps something along the lines of Julia. But Kevin also points out that using LLVM was a failure for the Pyston project.

If anyone fills in some of the blanks, thank you in advance.


Michael Kennedy mentioned on a podcast that Dropbox is considering Python 3 with type hints. Pyston is Python 2 only, and it would take considerable effort to make it work for Python 3.
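
For anyone unfamiliar, the type hints in question look like this (a minimal sketch; a checker like mypy verifies them statically, and they cost nothing at runtime):

    from typing import List

    def total_bytes(sizes: List[int]) -> int:
        return sum(sizes)

    total_bytes(["4096"])   # mypy flags this call statically (it would also fail at runtime)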

PyPy is not as bad as they describe it in their motivation for starting Pyston. PyPy now has cpyext, and Python 3.5 support is funded by Mozilla.

Another option is a pluggable JIT for CPython, such as Pyjion. This is already possible in Python 3.6.

Currently you can use Cython for hot spots in your CPython apps and even release the GIL.
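
A rough sketch of what that looks like in Cython's "pure Python" mode (the cython shadow module lets the same file run under plain CPython, while compiling it with cythonize turns the loop into C; the function itself is invented for illustration):

    import cython

    @cython.locals(i=cython.int, total=cython.longlong, n=cython.int)
    def sum_to(n):
        total = 0
        for i in range(n):   # becomes a C loop over C ints when compiled
            total += i
        return total

Releasing the GIL additionally requires marking nogil sections in the Cython source.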


"the [speed] difficulties come from Python's extremely rich object model, not from anything about its dynamic scopes or dynamic types. The problem is that every operation in Python will typically have multiple points at which the user can override the behavior, and these features are used, often very extensively. Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in-place, or even something as banal as overriding isinstance."

It's not typing that's the problem, it's dynamic behavior like monkeypatchable methods which require dict lookups.
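
A toy example of why every call has to repeat the lookup:

    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()
    print(g.greet())                         # "hello"

    Greeter.greet = lambda self: "patched"   # rebind the method in the class dict
    print(g.greet())                         # "patched" -- same call site, new behavior

Since the class dict can change between any two calls, g.greet() can't simply be resolved once and cached naively.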


Nowhere did I state that typing is the problem. I was explaining a potential move to Python 3 with mypy.


"We needed a high level of compatibility and reasonable performance gains on complex, real-world workloads. I think this is a case that PyPy has not been able to crack, and in my opinion is why they are not enjoying higher levels of success."

It's a little tiring, as somebody who regularly puts production workloads onto PyPy, to hear this bullshit. The truth is that PyPy has been regularly spanking Pyston on benchmarks[1] and has been adding Python 3 support while Pyston has struggled to handle its Python 2 commitments.

It's great that you wanted to build a JIT. JITs are fun and cool. But if you just wanted shit to work, you could have used PyPy years ago and not jumped through all these hoops.

Yet again, the fact is that if you invested any code into C extension modules, you picked the wrong path. Good night, Pyston; you were a cool experiment and I hope that your contributors try sending patches to PyPy next time. (Pyjion, you're next.)

[1] https://pybenchmarks.org/u64q/benchmark.php?test=all&lang=py...


Not being able to use C extension modules is a deal-breaker for lots of large Python code bases.


I for one applaud Dropbox's devotion to Python. They are really helping to move the Python community forward.

I also think Kevin hits the nail on the head regarding what makes speeding up Python difficult. It's the same thing that makes writing Python so enjoyable: the object model.


This is not specific to just Python, but I have often wondered if it would be a productive exercise to write a static compiler for a dynamic language that also supports optional static typing. Presumably this compiler could generate code that is faster than an interpreter like CPython, but slower than what you can get out of a high-performance JIT, since a JIT can do dynamic optimizations and function inlining. However, once you add static type annotations, the static compiler can (in some cases) generate code that's actually faster than a JIT. Typically you'd only need to add type annotations to hot-spot code, and you could use profiling from previous runs to figure out what type annotations to use and where to put them.

Using a JIT to generate very fast code for dynamic (even highly dynamic) languages is a really interesting topic. But it seems to involve jumping through a lot of hoops to get performance that is a fraction of what you can get out of an early 1990's era C compiler (or Pascal compiler, for that matter).


The short answer is that performance is as much a property of the language itself as of the implementation. You can make a subset of Python really fast, but then you can't really call it Python anymore. Think of performance in terms of how much work has to be done for a function call, for instance. If the language specifies that all function calls are dynamic, all function arguments can be anything, all classes can be redefined, and so on, then those are the constraints that will hold you back when trying to make it run fast. Static constraints don't really help much, because even if you statically assert you're always passing a Person instance to a function, the class definition of Person can still change over time, and the instance itself can swap its parent classes, properties, methods, and so forth.
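
For instance (toy classes, invented for illustration):

    class Person:
        def name(self):
            return "alice"

    class Impostor:
        def name(self):
            return "bob"

    p = Person()
    p.__class__ = Impostor   # the instance swaps its class at runtime
    print(p.name())          # "bob"

So even a proven "this is a Person" fact pins down almost nothing about what the call will do.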

I agree that making simple and dynamic languages with hugely complicated JIT-compilers is pretty absurd. We need new and fundamentally better languages, that combine performance, expressiveness, and much better compile-time guarantees.


If you haven't already, you might be interested in 'The Art of the Metaobject Protocol' [1]. It takes you step by step through the design of a highly dynamic, yet still efficient design of an object system for Common Lisp and its metaobject protocol.

The key is designing the extension points and the API in a way that allows efficient implementation. In Python that never happened. It seems like they basically documented the implementation and said: 'here are the dicts containing the internals, feel free to do anything you want'. That's an approach that will never result in an efficient implementation without huge efforts.
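
That is barely an exaggeration -- the internals really are exposed dicts:

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    print(p.__dict__)     # {'x': 1, 'y': 2} -- the internals, fully exposed
    p.__dict__["z"] = 3   # ...and fully mutable
    print(p.z)            # 3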

[1] https://mitpress.mit.edu/books/art-metaobject-protocol


Thank you -- I'll check it out.

I'm familiar with CLOS, but I have never looked seriously into implementation strategies. I have read many research papers on compiler design, language design, and (multiple) dispatch implementations, and I have concluded that the languages we use day-to-day are decades behind what we would be using if we took advantage of this research.


This is roughly what SBCL does for Lisp, compilation to native code that can sometimes be improved with optional type annotations. Coincidentally, the native-code compiler itself is named Python (it predates Python the language).


> I have often wondered if it would be a productive exercise to write a static compiler for a dynamic language that also supports optional static typing.

Isn't this basically exactly what Cython is?


I'm on the Dart team, so this is a topic that's close to my heart.

> Presumably this compiler could generate code that is faster than an interpreter like CPython,

People have been writing static compilers for dynamic languages for ages. Lisp and Scheme have been compiled for a long time.

My understanding is even simple static compilation works tolerably well for something like Scheme, but object-oriented languages like Python are a different story.

In object-oriented languages, the majority of "calls" are potentially-polymorphic method invocations. In Python, even `a + b` may call some user-defined "+" method which must be looked up at runtime.
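
For example (a made-up class, just to show the dispatch):

    class Meters:
        def __init__(self, n):
            self.n = n

        def __add__(self, other):           # user-defined "+"
            return Meters(self.n + other.n)

    a, b = Meters(1), Meters(2)
    c = a + b                               # dispatches to Meters.__add__ at runtime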

If you statically compile that in a straightforward way, almost all of the resulting "static" code is just going to be calls to dynamic method lookup functions. Something like:

    Obj* temp1 = lookup("a");
    Obj* temp2 = lookup("b");
    Obj* temp3 = invoke(temp1, temp2, "+");
You won't really do anything useful at the level of native code, so you don't gain much from eliminating the interpreter's bytecode loop. In fact, you'll likely lose: the above chunk of C is a pretty verbose encoding of `a + b`.

You can look at bytecode as a compression format for dynamic code of that form. The runtime cost of "decompressing" it -- the bytecode interpret-and-decode loop -- may be a net win over the memory wasted by the larger native code size and the cache misses and other overhead caused by that.
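
You can see how compact the "compressed" form is with the dis module; for the `a + b` example (output is indicative and varies by CPython version):

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # prints something like:
    #   0 LOAD_FAST    0 (a)
    #   3 LOAD_FAST    1 (b)
    #   6 BINARY_ADD
    #   7 RETURN_VALUE

Four compact bytecode instructions versus the much larger native expansion above.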

In order to statically compile a very highly dynamic language, you really do need to do some static analysis to eliminate some of the dynamism. That analysis is either very costly (slow, giant monolithic whole program compiles that take ages as in MLton) or doesn't work very well (failing to infer good static types and resorting back to lots of slow dynamic code), or (most often) both.

With Dart, we have a very mature whole program compiler that does concrete type inference, and it's still both too slow to run and not effective enough at inferring good types.

> However, once you add static type annotations, the static compiler can (in some cases) generate code that's actually faster than a JIT.

You can do that, yes, but it requires a few things:

1. Your core libraries and idiomatic user code need to actually be fairly "static" in terms of the types they use at runtime. If the code really does use a lot of polymorphism and duck typing, all static analysis can do is tell you, "Yup. You gotta do this at runtime."

Python in particular is well-known for relying heavily on duck typing in its core libraries.

2. Your type system needs to be sound. Once you are within the bounds of static types, the compiler needs to be able to trust that those types are correct, with no soundness holes. Otherwise, it all falls apart.

3. You need to deal with the boundary between dynamic and typed code. Because of 1, you need to do validation when an object flows from dynamic-land into the typed area. That validation occurs at runtime and has its own cost. If you aren't careful, that runtime cost outweighs the perf benefits you get once you are in static code.

The "gradual typing" folks have been beating on this for years and it turns out to be really hard. Google "is gradual typing dead" to see some of the latest soul-searching.

With Dart, we had 1, since the language has always had optional static types, so typical Dart code is fairly Java-esque. We did not have 2, and that turned out to be a big problem. We have since come up with a new sound type system [1] to address that.

3 is hard, but our strategy has been to be more restrictive on what kinds of dynamic code we allow to flow into typed code. For example, you simply can't pass a List<dynamic> into a function that expects a List<int>.

By the time you do all of these things, I think you discover that you don't really have much dynamism left in your system. It is handy having a dynamic type for some places, like deserialization, JSON, etc. But if you care about static compilation and perf, actually being statically typed is just so much easier, and dynamic typing doesn't really offer that much value as an alternative.

[1]: http://news.dartlang.org/2017/01/sound-dart-and-strong-mode....


> In order to statically compile a very highly dynamic language, you really do need to do some static analysis to eliminate some of the dynamism.

Or you can require the user to add type annotations, if/when he wants fast code.

In SBCL (a compiler for Common Lisp):

  (defun count-even (array)
    (declare (optimize speed (safety 0)))   ; <- these are the optimisation settings for this function
    (declare ((simple-array fixnum) array)) ; <- the argument to the function is an array of fixed sized integers
    (loop for x across array count (evenp x)))
This compiles down to quite efficient assembly:

   > (disassemble 'count-even)
   ; disassembly for COUNT-EVEN
   ; Size: 65 bytes
   ; 05A7ABC2:       BB17001020       MOV EBX, 537919511            ; no-arg-parsing entry point
   ;      BC7:       498BF8           MOV RDI, R8
   ;      BCA:       31C9             XOR ECX, ECX
   ;      BCC:       31F6             XOR ESI, ESI
   ;      BCE:       31D2             XOR EDX, EDX
   ;      BD0:       498B70F9         MOV RSI, [R8-7]
   ;      BD4:       660F1F840000000000 NOP
   ;      BDD:       0F1F00           NOP
   ;      BE0: L0:   4839F1           CMP RCX, RSI
   ;      BE3:       7D18             JNL L1
   ;      BE5:       488B5C8F01       MOV RBX, [RDI+RCX*4+1]
   ;      BEA:       4883C102         ADD RCX, 2
   ;      BEE:       4883E302         AND RBX, 2
   ;      BF2:       4885DB           TEST RBX, RBX
   ;      BF5:       75E9             JNE L0
   ;      BF7:       4883C202         ADD RDX, 2
   ;      BFB:       EBE3             JMP L0
   ;      BFD: L1:   488BE5           MOV RSP, RBP
   ;      C00:       F8               CLC
   ;      C01:       5D               POP RBP
   ;      C02:       C3               RET
If you don't add the type annotation (the second declare form), the compiler will tell you that it can't optimise the code because of type uncertainty. Optimising a program becomes a dialogue between the user and the compiler.


The entire second half of the comment you've replied to addresses this approach at great length.


Cython gets a modest speedup on plain Python code merely from bypassing the bytecode stack machine.


What do you think of something like Julia?


I think Julia is a really cool language.

As far as I know, they don't compile ahead of time. They JIT based on the types that are seen at runtime, and from what I've heard, they generate a lot of code for various specializations.

That's a smart trade-off for a mathematical language designed for running on a desktop, but probably not super relevant here.


You mean something like Julia?


Isn't that what TypeScript is shooting for?


As far as I know there are no plans to directly compile (JIT or otherwise) TypeScript. TypeScript is always (again, as far as I know) transpiled into JavaScript, whereupon it gets JIT'd in the browser just like "normal" JavaScript.

The gains that come from using TypeScript are productivity gains: in big projects, static typing, generics, decorators, and interfaces are supposed to be a boost to productivity, not a boost to performance.


I guess that might change in a couple of years if TypeScript is still around and WebAssembly is widely available.


No. TypeScript's type system is intentionally unsound (it makes it easier to interop with JS and untyped code), which means they can't rely on types for efficiency.


It was my understanding that most/all of this unsoundness can be removed by telling the compiler to disallow a bunch of stuff using command-line args. Is this still not the case?

Also, even if you can't statically determine the types of some variables, TypeScript seems to be able to determine them for most. If most of the objects are statically known, would it give you a large part of the speedup of a totally static system? In Java and C# there are still some intentional holes in the type system, like the dynamic keyword or casting to Object.


> It was my understanding that most/all of this unsoundness can be removed by telling the compiler to disallow a bunch of stuff using command-line args. Is this still not the case?

Nope, it's never been the case, even with all the flags on. It's not about implicit casts, it's about soundness in the actual type rules. Here's an example:

    function append(array: Array<Object>) {
        array.push("not a number");
    }

    var nums: Array<number> = [];
    append(nums);
    console.log(nums[0].toFixed());
If you run this, it throws an exception because String does not have a "toFixed()" method.

> If most of the objects are statically known, would it give you a large part of the speedup of a totally static system?

Maybe, but it's hard to say. Once you have a single hole, if you can't check it at runtime right at the hole, then values of the wrong type can spread anywhere in the program.

> In Java and C# there are still some intentional holes in the type system, like the dynamic keyword or casting to Object

Right. In the above example, Java and C# would throw an exception when you added the string to the array. This check has some runtime cost, but works.

I believe TypeScript is also unsound around function types, which is harder to address because there's no easy place to know where to add the check.


All of this is not really relevant for TypeScript, as it just transpiles to JavaScript and ends up running on whatever JavaScript VM is in use.

Sure, sometimes you might be able to generate more efficient code by tailoring the emitted Javascript to the VM behaviour or using something like typed arrays for low level operations, but if the target language doesn't have support for certain things like type hinting, there is not much that could be gained.


It's hard for me to believe that it's easier (and not just more comfortable for the engineers) to start Pyston rather than rewrite their backend in Go or whatever. Is Dropbox really so big (not in sales, but in the size of their backend) that they should maintain their own Python implementation? Also, choosing Python for their backend really seems like the wrong call.

Compilers (especially JIT compilers) are hard to get right and even harder to make performant. Since I am interested in these areas, I would evaluate Graal/Truffle or Jython. I only looked briefly at it, but Jython seems very unoptimised, and the JVM has really improved its scripting-language support in recent years. Is there really no way to solve the native-extension problem of Jython?

But I may be wrong; any comments?


There is a Truffle implementation of Python, but I think it's a bit out of date https://bitbucket.org/ssllab/zippy. I can't find any papers now but I think they had results showing that it was faster than PyPy (which is consistent with what other Truffle languages have achieved relative to RPython versions).

The native extension problem can be solved in Truffle with the LLVM bitcode interpreter. We're using this to interpret Ruby's C extensions.


Thank you. What's the status of Truffle? I often hear news from the project, but I never read anything about its scope. Is it highly experimental? Are people using it in production? Is it intended to be used in production in the near future?

Also, if it's faster and the C-extension problem is solved by the LLVM interpreter, what's holding it back from being used?


Truffle is a large active project working on multiple language fronts at the same time (JS, Ruby, R, C). There's a pretty long tail of things to do to go from research prototype to something people could use, but that is what we are working on at the moment. Twitter is experimenting with Graal as a Java compiler in production.

The C extension problem is solved in theory. We're now working on making it something a bit more substantial than the research implementation we have talked about previously.


Quick question - the big focus of Truffle seems to be JS, Ruby, R. Even the GitHub org does not have Python - https://github.com/graalvm

Your comment today was the first time I realized there was a Truffle port for Python, and that it is NOT maintained by the parent Oracle org: https://github.com/securesystemslab/zippy

Any particular (technical) reason?


We just did that one as a collaboration with UCI. I don't know why. There isn't a master plan - these things evolve.


By way of comparison, Carakan (Opera's last JS VM) was under 10 man-years from start to shipping, including all testing resources, etc. (And yes, I realise man-years is a terrible way to judge this!)

How much of Dropbox could they rewrite for something quicker than CPython in the time taken to write a JIT? That's the ultimate business question, and there's plenty of evidence to suggest that in general JITs have been comparatively easy versus a rewrite. As the article says, they have millions of lines of code.

At the same time, at least from my knowledge of the languages, JS, Lua, and PHP are all easier to optimize than Python is (because you can do crazier stuff with Python's object model). JS and Lua also benefit from their incredibly small standard libraries, which means you don't need to potentially spend lots of time reimplementing libraries; CPython at least has most of its standard library implemented either in Python or through its C API, which means that if you support the C API and licensing allows, you can just pull much of it from there.


> It's hard for me to believe that it's easier (and not just more comfortable for the engineers) to start Pyston rather than rewrite their backend in Go or whatever.

You are correct, it is easier to rewrite in Go. That's what Dropbox learned after putting two FTEs on Pyston for a few years.

But the larger Python community outside of just Dropbox would have really benefited from these two guys continuing to work on Pyston.


Of course, getting a fast JIT compiler for Python would be awesome! I have been doing some Python lately and it's a nice language. I would guess there is such broad usage in industry that a committee funded by various companies could take ownership of the project and greatly increase the assigned resources.

I just question whether it's the right move for Dropbox. It seems like not actually tackling the problem, but instead trying everything to make it just disappear.


It seems like a smart move to hedge your bets like that, in case one approach doesn't work out.


Facebook's Hack could be an interesting comparison here.

Afaik it started out as an experiment, driven by one or two engineers, and got to a significant usability level before it got more resources.

Even if you have 10 or 30 engineers working on it, if they can speed up the code written by thousands of other developers, it can still materialize a huge gain.


"Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in-place, or even something as banal as overriding isinstance"

Ok, they're doing all of this in their codebase?

This seems like "trying too hard". Yes, they can do those things; it doesn't mean it's a good idea.
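
For reference, the frame-locals trick from the quote looks something like this (a minimal sketch):

    import sys

    def make_frame():
        secret = 42
        return sys._getframe()   # hand out a reference to our own frame

    frame = make_frame()
    # the function has returned, yet its locals are still inspectable:
    print(frame.f_locals["secret"])   # 42

Debuggers and some test frameworks rely on tricks like this, which is part of why the runtime can't simply optimize frames away.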


> Ok, they're doing all of this in their codebase?

They might not be, but the point in the blog suggests that standard libraries they consume do.

I wonder what the value proposition is in hunting down such stuff and fixing it at that level (and whether that would help performance in CPython, PyPy, Pyjion, etc.)


If you don't do this, then it's not Python.


I've yet to see (real-life) code that needs these.


It's useful when hacking around someone else's code that you aren't supposed to change. Not sure why Dropbox would need it, though.


IronPython uses the Microsoft DLR, which uses a JIT, so it's definitely possible that it would work.


They can't support C extensions, which are basically required for any real world Python.

Pyjion is a Microsoft CoreCLR JIT that can support C extensions. That's Python's next great hope for speedup via JIT.


Pyjion is a C-extension-based JIT for CPython. Pyjion only uses CoreCLR's RyuJIT as a JIT compiler backend. It is NOT aware of .NET types at all.

Note that you can use pythonnet for interop between CPython and .NET for both embedding and extending.


Work has stalled on Pyjion, and there aren't a lot of contributors or much money behind it at all. Don't let the hype get to you. As was fairly obvious with Pyston, PyPy is still where it's going to be at.


The DLR is just a dynamic runtime for dynamic languages, including IronPython, IronRuby, PowerShell, and COM interop calls. The JIT belongs to the CLR runtime, which compiles IL bytecode.



