
(author here) Hm very interesting, I hadn't seen this article!

I would say it's highly related, but doesn't contradict anything I'm saying. His examples are at a higher application level, in the domains of images and animation. I'm more focused on the system level: the basics of building code, distributing it, dynamically composing it, monitoring it, etc.

There's definitely less need for types and schemas at that level, and more need for computation in his application domains. (I guess in networking terms this is a "smart middlebox" problem, as opposed to passive ones.)

The commonality is that they are trying to avoid combinatorial explosions of code by having a shared representation, e.g. avoiding writing the same transformations for both PDF and JPG.

I'd also say that he is overly negative on a few fronts -- I agree with a lot of it, but I'd frame the evolution as a success, not a failure. I guess my point is that the narrow waist is basically the only way we know how to build (really, evolve) large-scale systems over long time frames.

We can do it well or do it badly, explicitly or implicitly... From the code and systems I've worked with, we could easily do a lot better. But I appreciate this article because identifying these kinds of problems is the first step.

----

Some random responses:

> Putting data inside a free-form key/value map doesn't change things much. It's barely an improvement over having an unknownData byte[] mix-in on each native type. It only pays off if you actually adopt a decomposable model and stick with it. That way the data is not unknown, but always provides a serviceable view on its own. Arguably this is the killer feature of a dynamic language. The benefit of "extensible data" is mainly "fully introspectable without recompilation."

This part feels overly negative ... I would just frame this as a tradeoff between static and dynamic. There could be reasons that dynamic doesn't work in his domain, but that doesn't mean it doesn't work elsewhere. I have a bunch of material in drafts about types and distributed systems, sort of related to this "Maybe Not" discussion: https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey (i.e. schemas/shapes should be decoupled from field presence; one is reusable and global and the other is local)
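To make that decoupling concrete, here's a tiny sketch (in Python, with made-up field names): the *shape* of each field is defined once, globally, while which fields must be *present* is a local, per-context decision.

```python
# Hypothetical sketch: field shapes are global and reusable;
# required presence is decided locally, per context.

SHAPES = {
    "email": str,     # the meaning/type of "email" is defined once
    "user_id": int,
    "nickname": str,
}

def check(record, required):
    """Validate shapes for whatever fields are present;
    require only the locally specified set."""
    for field, value in record.items():
        expected = SHAPES.get(field)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    missing = [f for f in required if f not in record]
    if missing:
        raise ValueError(f"missing: {missing}")
    return True

# A signup context requires email; an analytics context does not.
check({"email": "a@b.com", "user_id": 1}, required=["email"])
check({"user_id": 1}, required=["user_id"])
```

The point is that the schema is reusable across contexts; only the presence requirement varies.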

From what I know Clojure has some good framings of the versioning problem, with Spec and RDF inspired schemas, but I haven't used it. Rather than the brittle versioning model, Hickey frames software evolution as "strengthen a promise" and "relax a guarantee". I don't know if it would be useful in those domains but it's worth addressing.

I would be interested in unpacking the problem of differing PDF and JPG metadata a little more... Offhand I'm not sure why it's so difficult; I feel like it's mostly a problem in certain statically-typed systems.

I guess to put a big stake in the ground, you could say extensibility is inherently dynamic. Simply because you don't know what code is going to operate on your data 10 years from now. It hasn't been written yet, and the people who need it haven't been born yet. It's impossible in reality, not just in the type system :)

I think the difference is that his domains live within the confines of a single machine and a single application, so there is the expectation that you should be able to do better (have more type safety). Nobody expects to be able to reboot the Internet all at once and upgrade it. And even, say, at Google, nobody wants to recompile all the code in a cluster at once and reboot it.



Yes, I don’t agree with everything in the article, but it points out some sticky problems that can make things complicated.

One example that's related to programming languages is the evolution of an abstract syntax tree. We often expect new compilers to work with old code when a new kind of AST node is added, but to what extent should existing tools work with new code that has a node type they don't understand? Also, if your AST datatype is a library, does adding a new node type immediately cause compiler errors in every tool that uses the AST, or do those tools compile fine and break at runtime when given new code?
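A toy illustration of the two failure modes (hypothetical node kinds, Python tuples standing in for AST nodes): an exhaustive dispatch rejects a node kind it predates, while a visitor with a default case traverses it generically.

```python
# Toy AST nodes as (kind, children) tuples; "Match" is a newly added kind
# that didn't exist when these tools were written.
old_tree = ("If", [("Num", []), ("Num", [])])
new_tree = ("Match", [("Num", [])])

def count_nums_strict(node):
    """Exhaustive dispatch: breaks at runtime on a node kind it predates."""
    kind, children = node
    if kind == "Num":
        return 1
    if kind == "If":
        return sum(count_nums_strict(c) for c in children)
    raise ValueError(f"unknown node kind: {kind}")

def count_nums_tolerant(node):
    """Default case: unknown kinds are traversed generically, not rejected."""
    kind, children = node
    if kind == "Num":
        return 1
    return sum(count_nums_tolerant(c) for c in children)
```

Whether the tolerant behavior is safe depends on the tool: fine for counting, dangerous for, say, code generation.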

Sometimes as a workaround, “metadata” gets put in specially formatted comments, as a way of saying that most language tools can ignore it. Alternately there can be general-purpose annotations (as Java has).

Also, an intermediate representation often gets used as a "narrow waist", but IRs are tricky to design because "lowering" code will often discard high-level constructs.


Yes, the AST problem is one I've had a lot! It's hard.

(The whole Oil language is defined as a big ML-style statically typed data structure, but it's not exposed: https://www.oilshell.org/release/0.9.8/source-code.wwz/front...)

What I'd say is that we already have a narrow waist for every language -- text. That is the one that's stable! Having an additional stable AST format seems to be an insurmountable problem, especially in most statically typed languages.

The basic reason is that it's easier to evolve text in a backward compatible way than types. This is probably obvious to some people, but controversial to others... It has a lot to do with the "ignoring new data" problem discussed in the blog post. It would be nice to unpack this a bit and find some references.
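As a small illustration of why (my example, not from the post; a made-up key=value format): an old reader of a line-oriented text format naturally skips fields it doesn't know about, so new fields can be added without coordination.

```python
# A newer producer wrote this config; 'experimental_flag' was added
# after the reader below was written.
NEW_CONFIG = """\
name=oil
version=0.9.8
experimental_flag=on
"""

def read_known(text, known=("name", "version")):
    """Old reader of a key=value text format: unknown keys are
    silently ignored rather than being errors."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key in known:
            out[key] = value
    return out
```

Getting the same "ignore what you don't know" behavior out of a statically typed AST usually takes deliberate design, whereas text gives it to you almost for free.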

----

Related: I recently circulated this language implementation, which has a front end in OCaml, a backend in C++, and the IR is transmitted via protobuf.

https://old.reddit.com/r/ProgrammingLanguages/comments/ss3w6...

Protobuf has the "runtime field presence" feature, which would be useful for making a backward compatible AST/IR. That isn't done here, but it would be something to explore.
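Roughly what runtime field presence buys you, sketched with a plain Python dict standing in for a generated protobuf message (the real API uses generated classes with HasField-style checks):

```python
# Stand-in for a protobuf message: the reader can distinguish
# "field absent" from "field set", and branch on presence.
def lower(node_msg):
    """Lower an IR node; a newer front end may attach 'type_hint',
    while older front ends simply omit it."""
    if "type_hint" in node_msg:   # runtime presence check
        return f"typed<{node_msg['type_hint']}>"
    return "untyped"              # old producers keep working
```

So a new front end can start emitting extra information without breaking an old back end, and vice versa.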

-----

Also, thinking about the blog post a bit more, I'd say a huge number of decisions hinge on whether you can recompile the entire program or not.

If you can, you should. But if you can't, a lot of programmers seem to have a hard time with runtime composition. Sometimes they even deny that it exists or should exist!

In contrast, admins/operators/SREs deal with runtime composition almost exclusively. Programmers are more familiar with a static view and build-time composition. If I wanted to be ambitious, what I'm getting at with this series is a theory of, and guidelines for, runtime composition and versionless evolution.

e.g. even in Windows, you never recompile your COM components together, even though they are written in statically typed languages. Global builds and atomic upgrades don't scale, even on a single computer.


Yes, this gets into the static versus dynamic library debate.

There's one school of thought that says that a technically competent organization really should be able to recompile any of the code it runs, on demand. If you can't do that, you are missing source code or you don't know how to build it, and that's bad. It's not really your code, is it?

That's Google's primary approach, internally, though there are many exceptions. It fits in well with an open source approach. The Go language's toolchain uses static linking as a result of this philosophy.

But most people don't belong to an organization with that level of self-determination. Few people run Gentoo. We mostly run code we didn't compile ourselves, and if we're lucky there is a good source of security updates.

But there's still a question of whether you really need dynamic linking, or is having standard data formats between different processes enough? When you do upgrades, do you really need to replace DLLs, or is replacing entire statically linked binaries good enough?

If replacing entire binaries is your unit of granularity for upgrades, this leads to using something like protobufs to support incremental evolution. Instead of calling a DLL, you start a helper process and communicate with it using protobufs. The helper binary can be replaced, so that part of the system can be upgraded independently.
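The helper-process pattern might look roughly like this sketch, with JSON over a pipe standing in for protobufs (the helper is inlined as a `python -c` script so the example is self-contained; a real system would ship it as a separately built, separately upgradable binary):

```python
import json
import subprocess
import sys

# The "helper binary": reads one JSON request on stdin,
# writes one JSON response on stdout.
HELPER = (
    "import json,sys;"
    "req=json.load(sys.stdin);"
    "json.dump({'sum': sum(req['nums'])}, sys.stdout)"
)

def call_helper(nums):
    """Instead of calling into a DLL, start the helper process and
    exchange serialized messages with it."""
    proc = subprocess.run(
        [sys.executable, "-c", HELPER],
        input=json.dumps({"nums": nums}),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)["sum"]
```

The process boundary is what makes the independent upgrade possible: the caller only depends on the message format, not on the helper's language, compiler, or ABI.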

None of this really solves the AST problem. I don't think there are many development tools that would still work if the parser and AST were in a DLL and you replaced them? I guess it would help with minor fixes and cosmetic changes. The approach you tried with protobufs seems interesting but how awkward was it?


Hm I do see that point of view, but to me the trends seem to be in the opposite direction.

I'd question if that view even makes sense at Google, because like I said, nobody ever recompiles all the code in a cluster at once and reboots it. This is the "lack of atomic upgrade" problem, which tilts you toward a dynamic view. Some more links here; I will probably put them on the blog:

https://old.reddit.com/r/ProgrammingLanguages/comments/t0lze...

It's even more true if you're AirBNB or Lyft -- are you going to recompile Google Maps or Stripe? No, you use it dynamically through an API. Pretty much all organizations are like this now, not just software companies. They're a big pile of dynamically composed services, and so the static view has less and less power.

I was thinking today about a slogan: "Poorly Factored Software Is Eating the World." My first job was working on console games, where you shipped a DVD, never updated it, and it was incapable of talking to a network. But basically no software is like that anymore, including games and embedded systems.

What I see is that the world is becoming a single big distributed system and that actually influences what you'd consider technical details like the type systems! And static vs. dynamic libraries. There is pressure from applications on languages and operating systems.

-----

The DLL vs IPC question has a few different dimensions... I'd say Linux distros do it very poorly. The whole model of "named versions" and "constraint solving" is extremely brittle:

1. It reduces something that's multi-dimensional to one or two dimensions (a version number).

I wrote this a while back, and it's related to the Clojure and protobuf style of "versionless" software evolution:

https://github.com/oilshell/oil/wiki/Feature-Detection-Is-Be...

2. It means that you're often running a combination of software that nobody has ever tested! In Linux distros, testing is basically done by users.

Google does a little better -- you have a test cluster, and you deploy to that first, so ideally you will have tested the exact combination of versions before it reaches end users. And you have canarying and feedback too, so even when you have bugs their impact is more limited.

https://old.reddit.com/r/ProgrammingLanguages/comments/t0lze...
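A sketch of the feature-detection style from that wiki page (feature names are made up): instead of collapsing a multi-dimensional reality into one version number, ask the component what it can actually do.

```python
# Version-number style: collapse a multi-dimensional reality into
# one comparable number. Brittle: backports and forks break it.
def can_use_streaming_by_version(version):
    return version >= (2, 4)

# Feature-detection style: the component reports its capabilities,
# and the caller checks for the specific one it needs.
def can_use_streaming_by_features(features):
    return "streaming" in features
```

A distro running version 2.3 with the streaming fix backported passes the second check and fails the first, which is exactly the combination that version constraints get wrong.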

So while I favor IPC, I'd say you can probably do dynamic linking well, but there are some "incidental" reasons why it's done poorly right now, and has a bad reputation. It leads to breakage and is hard to deploy, but that's not fundamental.

----

I guess the overall frame I'm trying to fight is "static vs. dynamic". What I'd say is that static is local and domain-specific, and you should take advantage of it while you can. But dynamic is inevitable at a large scale and we need better mechanisms to deal with it, better ways of thinking about it, better terms and definitions, etc.

Shell is of course all about dynamic composition -- it's basically unheard of to recompile all the programs a shell script invokes :) The language is such that you can't even statically determine all the programs it invokes (but Oil is going in that direction for deployability / containers).

----

Oh, and the language project wasn't mine -- I just circulated it. I did two related experiments called "OHeap" that were protobuf/capnproto-like, which I described on the blog, but I'm not using them, and they were only mildly interesting. In Oil there's not a hard line between the front end and back end, so serializing the AST didn't end up being a major requirement.


Yes, there are no atomic upgrades. Even when a static binary is replaced, there are many instances of it running and the processes aren't restarted atomically.

However, in a protobuf-based system within a single organization, you can at least say that every deployed binary expects an API that was defined by some version of the same protobuf file. At Google there is (or was - it's been a while) a linear history of this file in source control somewhere. That limits the possible API variations in existence.

By contrast, in the acko.net article, he describes the ad-hoc variation that happens when many organizations make their own extensions to the same protocol, with little coordination with each other. (And yes, the web is like this too.)


Adding one more thing: I'd also say there's a big asymmetry in DLLs vs. static libraries. DLLs that can be independently upgraded pretty much have to use the C ABI, which is more like RPC/IPC. It doesn't really make sense to have a pure Rust or C++ DLL -- you lose all your types and type safety; you have to write glue.

So actually I'd say IPC and DLLs are closer together, and static libraries are on the other end of the spectrum. IPC and DLLs are both dynamic, and that's the important distinction. That's what a lot of the decisions in the acko.net article hinge on.



