
(author here) Hm very interesting, I hadn't seen this article!

I would say it's highly related, but doesn't contradict anything I'm saying. His examples are at a higher application level, in the domains of images and animation. I'm more focused on the system level: the basics of building code, distributing it, dynamically composing it, monitoring it, etc.

There's definitely less need for types and schemas at that level, and more need for computation in his application domains. (I guess in networking terms this is a "smart middlebox" problem, as opposed to passive ones.)

The commonality is that they are trying to avoid combinatorial explosions of code by having a shared representation, e.g. avoiding writing the same transformations for both PDF and JPG.

I'd also say that he is overly negative on a few fronts -- I agree with a lot of it, but I'd frame the evolution as a success, not a failure. I guess my point is that the narrow waist is basically the only way we know how to build (really, evolve) large-scale systems over long time frames.

We can do it well or do it badly, explicitly or implicitly... From the code and systems I've worked with, we could easily do a lot better. But I appreciate this article because identifying these kinds of problems is the first step.

----

Some random responses:

> Putting data inside a free-form key/value map doesn't change things much. It's barely an improvement over having an unknownData byte[] mix-in on each native type. It only pays off if you actually adopt a decomposable model and stick with it. That way the data is not unknown, but always provides a serviceable view on its own. Arguably this is the killer feature of a dynamic language. The benefit of "extensible data" is mainly "fully introspectable without recompilation."

This part feels overly negative ... I would just frame this as a tradeoff between static and dynamic. There could be reasons that dynamic doesn't work in his domain, but that doesn't mean it doesn't work elsewhere. I have a bunch of material in drafts about types and distributed systems, sort of related to this "Maybe Not" discussion: https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey (i.e. schemas/shapes should be decoupled from field presence; one is reusable and global and the other is local)
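To make that decoupling concrete, here's a tiny sketch (in Python, with made-up field names): the *shape* of each field is defined once, globally, while which fields must be *present* is a local, per-context decision.

```python
# Hypothetical sketch: field shapes are global and reusable;
# required presence is decided locally, per context.

SHAPES = {
    "email": str,     # the meaning/type of "email" is defined once
    "user_id": int,
    "nickname": str,
}

def check(record, required):
    """Validate shapes for whatever fields are present;
    require only the locally specified set."""
    for field, value in record.items():
        expected = SHAPES.get(field)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    missing = [f for f in required if f not in record]
    if missing:
        raise ValueError(f"missing: {missing}")
    return True

# A signup context requires email; an analytics context does not.
check({"email": "a@b.com", "user_id": 1}, required=["email"])
check({"user_id": 1}, required=["user_id"])
```

The point is that the schema is reusable across contexts; only the presence requirement varies.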

From what I know Clojure has some good framings of the versioning problem, with Spec and RDF inspired schemas, but I haven't used it. Rather than the brittle versioning model, Hickey frames software evolution as "strengthen a promise" and "relax a guarantee". I don't know if it would be useful in those domains but it's worth addressing.

I would be interested in unpacking the problem of differing PDF and JPG metadata a little more... Offhand I'm not sure why it's so difficult; I feel like it's mostly a problem in certain statically-typed systems.

I guess to put a big stake in the ground, you could say extensibility is inherently dynamic. Simply because you don't know what code is going to operate on your data 10 years from now. It hasn't been written yet, and the people who need it haven't been born yet. It's impossible in reality, not just in the type system :)

I think the difference is that his domains live within the confines of a single machine and a single application, so there is the expectation that you should be able to do better (have more type safety). Nobody expects to be able to reboot the Internet all at once and upgrade it. And even, say, at Google, nobody wants to recompile all the code in a cluster at once and reboot it.



Yes, I don’t agree with everything in the article, but it points out some sticky problems that can make things complicated.

One example that's related to programming languages is the evolution of an abstract syntax tree. We often expect new compilers to work with old code when a new kind of AST node is added, but to what extent should existing tools work with new code that has a node type they don't understand? Also, if your AST datatype is a library, does adding a new node type immediately cause compiler errors in every tool that uses the AST, or do those tools compile fine and break at runtime when given new code?
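A toy illustration of the two failure modes (hypothetical node kinds, Python tuples standing in for AST nodes): an exhaustive dispatch rejects a node kind it predates, while a visitor with a default case traverses it generically.

```python
# Toy AST nodes as (kind, children) tuples; "Match" is a newly added kind
# that didn't exist when these tools were written.
old_tree = ("If", [("Num", []), ("Num", [])])
new_tree = ("Match", [("Num", [])])

def count_nums_strict(node):
    """Exhaustive dispatch: breaks at runtime on a node kind it predates."""
    kind, children = node
    if kind == "Num":
        return 1
    if kind == "If":
        return sum(count_nums_strict(c) for c in children)
    raise ValueError(f"unknown node kind: {kind}")

def count_nums_tolerant(node):
    """Default case: unknown kinds are traversed generically, not rejected."""
    kind, children = node
    if kind == "Num":
        return 1
    return sum(count_nums_tolerant(c) for c in children)
```

Whether the tolerant behavior is safe depends on the tool: fine for counting, dangerous for, say, code generation.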

Sometimes as a workaround, “metadata” gets put in specially formatted comments, as a way of saying that most language tools can ignore it. Alternately there can be general-purpose annotations (as Java has).

Also, an intermediate representation often gets used as a "narrow waist", but IRs are tricky to design because "lowering" code will often discard high-level constructs.


Yes, the AST problem is one I've had a lot! It's hard.

(The whole Oil language is defined as a big ML-style statically typed data structure, but it's not exposed: https://www.oilshell.org/release/0.9.8/source-code.wwz/front...)

What I'd say is that we already have a narrow waist for every language -- text. That is the one that's stable! Having an additional stable AST format seems to be an insurmountable problem, especially in most statically typed languages.

The basic reason is that it's easier to evolve text in a backward compatible way than types. This is probably obvious to some people, but controversial to others... It has a lot to do with the "ignoring new data" problem discussed in the blog post. It would be nice to unpack this a bit and find some references.
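As a small illustration of why (my example, not from the post; a made-up key=value format): an old reader of a line-oriented text format naturally skips fields it doesn't know about, so new fields can be added without coordination.

```python
# A newer producer wrote this config; 'experimental_flag' was added
# after the reader below was written.
NEW_CONFIG = """\
name=oil
version=0.9.8
experimental_flag=on
"""

def read_known(text, known=("name", "version")):
    """Old reader of a key=value text format: unknown keys are
    silently ignored rather than being errors."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key in known:
            out[key] = value
    return out
```

Getting the same "ignore what you don't know" behavior out of a statically typed AST usually takes deliberate design, whereas text gives it to you almost for free.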

----

Related: I recently circulated this language implementation, which has a front end in OCaml, a backend in C++, and the IR is transmitted via protobuf.

https://old.reddit.com/r/ProgrammingLanguages/comments/ss3w6...

Protobuf has the "runtime field presence" feature, which would be useful for making a backward compatible AST/IR. That isn't done here, but it would be something to explore.
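Roughly what runtime field presence buys you, sketched with a plain Python dict standing in for a generated protobuf message (the real API uses generated classes with HasField-style checks):

```python
# Stand-in for a protobuf message: the reader can distinguish
# "field absent" from "field set", and branch on presence.
def lower(node_msg):
    """Lower an IR node; a newer front end may attach 'type_hint',
    while older front ends simply omit it."""
    if "type_hint" in node_msg:   # runtime presence check
        return f"typed<{node_msg['type_hint']}>"
    return "untyped"              # old producers keep working
```

So a new front end can start emitting extra information without breaking an old back end, and vice versa.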

-----

Also, thinking about the blog post a bit more, I'd say a huge number of decisions hinge on whether you can recompile the entire program or not.

If you can, you should. But if you can't, a lot of programmers seem to have a hard time with runtime composition. Sometimes they even deny that it exists or should exist!

In contrast, admins/operators/SREs deal with runtime composition almost exclusively. Programmers are more familiar with a static view and build-time composition. If I wanted to be ambitious, what I'm getting at with this series is a theory of, and guidelines for, runtime composition and versionless evolution.

e.g. even in Windows, you never recompile your COM components together, even though they are written in statically typed languages. Global builds and atomic upgrades don't scale, even on a single computer.


Yes, this gets into the static versus dynamic library debate.

There's one school of thought that says that a technically competent organization really should be able to recompile any of the code it runs, on demand. If you can't do that, you are missing source code or you don't know how to build it, and that's bad. It's not really your code, is it?

That's Google's primary approach, internally, though there are many exceptions. It fits in well with an open source approach. The Go language's toolchain uses static linking as a result of this philosophy.

But most people don't belong to an organization with that level of self-determination. Few people run Gentoo. We mostly run code we didn't compile ourselves, and if we're lucky there is a good source of security updates.

But there's still a question of whether you really need dynamic linking, or is having standard data formats between different processes enough? When you do upgrades, do you really need to replace DLLs, or is replacing entire statically linked binaries good enough?

If replacing entire binaries is your unit of granularity for upgrades, this leads to using something like protobufs to support incremental evolution. Instead of calling a DLL, you start a helper process and communicate with it using protobufs. The helper binary can be replaced, so that part of the system can be upgraded independently.
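The helper-process pattern might look roughly like this sketch, with JSON over a pipe standing in for protobufs (the helper is inlined as a `python -c` script so the example is self-contained; a real system would ship it as a separately built, separately upgradable binary):

```python
import json
import subprocess
import sys

# The "helper binary": reads one JSON request on stdin,
# writes one JSON response on stdout.
HELPER = (
    "import json,sys;"
    "req=json.load(sys.stdin);"
    "json.dump({'sum': sum(req['nums'])}, sys.stdout)"
)

def call_helper(nums):
    """Instead of calling into a DLL, start the helper process and
    exchange serialized messages with it."""
    proc = subprocess.run(
        [sys.executable, "-c", HELPER],
        input=json.dumps({"nums": nums}),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)["sum"]
```

The process boundary is what makes the independent upgrade possible: the caller only depends on the message format, not on the helper's language, compiler, or ABI.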

None of this really solves the AST problem. I don't think there are many development tools that would still work if the parser and AST were in a DLL and you replaced them? I guess it would help with minor fixes and cosmetic changes. The approach you tried with protobufs seems interesting but how awkward was it?


Hm I do see that point of view, but to me the trends seem to be in the opposite direction.

I'd question if that view even makes sense at Google, because like I said, nobody ever recompiles all the code in a cluster at once and reboots it. This is the "lack of atomic upgrade" problem, which tilts you toward a dynamic view. Some more links here; I will probably put them on the blog:

https://old.reddit.com/r/ProgrammingLanguages/comments/t0lze...

It's even more true if you're AirBNB or Lyft -- are you going to recompile Google Maps or Stripe? No, you use it dynamically through an API. Pretty much all organizations are like this now, not just software companies. They're a big pile of dynamically composed services, and so the static view has less and less power.

I was thinking today about a slogan: "Poorly Factored Software Is Eating the World." My first job was working on console games, where you shipped a DVD, never updated it, and it was incapable of talking to a network. But basically no software is like that anymore, including games and embedded systems.

What I see is that the world is becoming a single big distributed system and that actually influences what you'd consider technical details like the type systems! And static vs. dynamic libraries. There is pressure from applications on languages and operating systems.

-----

The DLL vs IPC question has a few different dimensions... I'd say Linux distros do it very poorly. The whole model of "named versions" and "constraint solving" is extremely brittle:

1. It reduces something that's multi-dimensional to one or two dimensions (a version number).

I wrote this a while back, and it's related to the Clojure and protobuf style of "versionless" software evolution:

https://github.com/oilshell/oil/wiki/Feature-Detection-Is-Be...

2. It means that you're often running a combination of software that nobody has ever tested! In Linux distros, testing is basically done by users.

Google does a little better -- you have a test cluster, and you deploy to that first, so ideally you will have tested the exact combination of versions before it reaches end users. And you have canarying and feedback too, so even when you have bugs their impact is more limited.

https://old.reddit.com/r/ProgrammingLanguages/comments/t0lze...
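A sketch of the feature-detection style from that wiki page (feature names are made up): instead of collapsing a multi-dimensional reality into one version number, ask the component what it can actually do.

```python
# Version-number style: collapse a multi-dimensional reality into
# one comparable number. Brittle: backports and forks break it.
def can_use_streaming_by_version(version):
    return version >= (2, 4)

# Feature-detection style: the component reports its capabilities,
# and the caller checks for the specific one it needs.
def can_use_streaming_by_features(features):
    return "streaming" in features
```

A distro running version 2.3 with the streaming fix backported passes the second check and fails the first, which is exactly the combination that version constraints get wrong.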

So while I favor IPC, I'd say you can probably do dynamic linking well, but there are some "incidental" reasons why it's done poorly right now, and has a bad reputation. It leads to breakage and is hard to deploy, but that's not fundamental.

----

I guess the overall frame I'm trying to fight is "static vs. dynamic". What I'd say is that static is local and domain-specific, and you should take advantage of it while you can. But dynamic is inevitable at a large scale and we need better mechanisms to deal with it, better ways of thinking about it, better terms and definitions, etc.

Shell is of course all about dynamic composition -- it's basically unheard of to recompile all the programs a shell script invokes :) The language is such that you can't even statically determine all the programs it invokes (but Oil is going in that direction for deployability / containers).

----

Oh, and the language project wasn't mine -- I just circulated it. I did two related experiments called "OHeap" that were protobuf/capnproto-like, which I described on the blog, but I'm not using them, and they were only mildly interesting. In Oil there's not a hard line between the front end and back end, so serializing the AST didn't end up being a major requirement.


Yes, there are no atomic upgrades. Even when a static binary is replaced, there are many instances of it running and the processes aren't restarted atomically.

However, in a protobuf-based system within a single organization, you can at least say that every deployed binary expects an API that was defined by some version of the same protobuf file. At Google there is (or was - it's been a while) a linear history of this file in source control somewhere. That limits the possible API variations in existence.

By contrast, in the acko.net article, he describes the ad-hoc variation that happens when many organizations make their own extensions to the same protocol, with little coordination with each other. (And yes, the web is like this too.)


Adding one more thing: I'd also say there's a big asymmetry in DLLs vs. static libraries. DLLs that can be independently upgraded pretty much have to use the C ABI, which is more like RPC/IPC. It doesn't really make sense to have a pure Rust or C++ DLL -- you lose all your types and type safety; you have to write glue.

So actually I'd say IPC and DLLs are closer together, and static libraries are on the other end of the spectrum. IPC and DLLs are both dynamic, and that's the important distinction. That's what a lot of the decisions in the acko.net article hinge on.



