RFC: Evolving Rust Through Epochs (github.com/aturon)
127 points by JoshTriplett on July 4, 2017 | hide | past | favorite | 44 comments


It's really language version pinning for source code.

Rust uses version pinning heavily. This gradually builds up a legacy code problem, as old versions never really go away. A recent article on HN pointed to some bloated non-Rust phone app which shipped with four different versions of some libraries.

How does this work when crates with different epochs are mixed? The business with "core Rust" suggests that all code must be compiled with the same core compiler, because the run-time model might change, but some crates may have different epochs.


> How does this work when crates with different epochs are mixed?

The summary addresses this:

> Each crate specifies the epoch it fits within (a bit like "C++11" or "C++14"), and the compiler can cope with multiple epochs being used throughout a dependency graph. Thus we still guarantee that your code will keep compiling on the latest stable release (modulo the usual caveats),

That is, a single compiler can handle multiple epochs, meaning that anything/everything can change, but it has to change in ways that keep compatibility with the behaviour of the old epochs.
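
Concretely, the RFC has each crate declare its epoch in its own manifest. A hypothetical Cargo.toml sketch (the key name and year are illustrative, not final at the time of the RFC):

```toml
[package]
name = "my_crate"
version = "0.1.0"
# Hypothetical: opts this crate into a newer epoch; crates that omit
# the key would default to the first epoch. Dependencies are free to
# declare a different epoch, and one compiler handles the whole graph.
epoch = "2018"
```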

Using crates compiled with different compiler versions (rather than different language versions within a single compiler) requires ABI stability, which is a whole different level beyond source/"API" stability.


> How does this work when crates with different epochs are mixed?

It Just Works. (The discussion contains some stuff about macros, which may be a bit more complex, but that's the core concept anyway.)

> the run-time model might change

Rust has no more runtime than C.

It's in fact the opposite: the only kinds of changes that can be made in epochs are basically to the parser; everything below that has to stay the same. This is for both technical-debt reasons and "mental debt" reasons; the core way that Rust works cannot change.


I think he means run-time semantics.

But anyway, the promise of stability means that the semantics can't be changed in a breaking way. So having epochs is no free pass to make arbitrary kinds of breaking changes, and the Rust people are well aware of that.


Other people have addressed the question about epochs, but I'd also note that we do work to mitigate the code bloat issues of multiple versions of libraries. cargo is designed to unify all dependencies on a library within each major version to a single instance (that is, you'll never build serde-1.0.0 and serde-1.1.0 together, but you may build serde-1.0.0 and serde-2.0.0).
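
As a made-up illustration of that unification rule (crate names and version numbers hypothetical): requirements that fall within the same semver-major line share one copy, while a different major line gets its own:

```toml
[dependencies]
# Both of these end up satisfied by a single shared copy of serde
# from the 1.x line (cargo resolves them to one version):
serde = "1.0"
# suppose some_crate's own manifest requires serde = "1.1"
some_crate = "0.3"
# A dependency elsewhere in the graph on serde = "2.0" would build a
# second, independent copy of serde alongside the 1.x one.
```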


> you'll never build serde-1.0.0 and serde-1.1.0 together

nit: unless you explicitly ask for this with an `=1.0.0` version dep, which is rare but can be pretty useful in edge cases.


That's not correct. cargo will fail to resolve your dependencies if that happens.

For example (cargo_metadata 0.2.1 depends on serde ^1.0.2):

    [dependencies]
    serde = "=1.0.0"
    cargo_metadata = "0.2.1"
Attempting to cargo fetch will result in this error message:

    error: failed to select a version for `serde` (required by `cargo_metadata`):
    all possible versions conflict with previously selected versions of `serde`
This is an intentional design decision in cargo that isn't well known.


Shamelessly reposting my comment https://github.com/rust-lang/rfcs/pull/2052#issuecomment-312..., but I'm concerned about using semver for the tools, and what is effectively just a major version for the language.

This is backwards. Every new language feature should get its own minor version, and if tools never drop support, there is no need for a (tool) major version.


I think I agree with this.


Exactly. This is just a bad use of "semver". It's still a "version", right? 2018 vs 2015?

If they're so concerned, maybe add another digit to the left.

The current version could be 1.1.42.0 and the "epoch" could be 1.2.3.9? Maybe breaking could be 2.0.0.0? I'd much rather see a unification around semver than create a different identifier called "epoch" only to stay radically faithful to a definition of semver.


I feel like this is only a problem if you stick religiously to semver. This problem more or less boils down to:

1) We want to indicate that a major update to the language has happened, so we want to bump the major version number

2) We can't bump the major version number, because we're not making breaking changes, so our semver dogma forbids it.

3) Let's add a completely new version number that's even MORE important than the major version number and call it "epoch", and let's not have that follow any semver semantics!

You have a major version number already. Just bump that. Don't make up a new one. There's no federal law saying you have to follow every rule of semver; you can just make the decision to release Rust 2.0.

Don't get me wrong, I think semver is a great policy for version numbering. But you have to recognize when it's causing more problems than it solves.


They are indeed making a breaking change, in that some new code would not have been valid previously, and some old code will not be valid after.

They are totally correct that since everything interfaces with each other, this is not a problem in practice. But they are worried about the stupid knee-jerk negative publicity that will occur—I don't dispute that.

But this is extra complexity to pander to idiots.


Ah yes, a leading super-major version to denote linking compatibility would be an excellent idea!


For a little language I designed, I decided that keeping the evolutionary pathway open means introducing new keywords once in a while. But we don't want to break existing code; code is expensive. Valuable. We don't want to break valuables!

So I added a mandatory file header to the language which (amongst other things) specified the language version.

I thought this was a win... with one exception that would have bothered me a lot if this had been a language intended to achieve world domination:

Mandatory version headers totally ruin your "hello, world!" comparison chart story ;-)


> Mandatory version headers totally ruin your "hello, world!" comparison chart story ;-)

Once, when the C standards committee was struggling with the semantics of "main" under Windows, I suggested that "main" not be part of the language at all. You would include <unix.h> or <posix.h> or <dos.h>, and get "int main(int argc, char *argv[]);" as a prototype, which you must define. Or you include <windows.h>, and "int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow);" is the function you must write. The runtime pulled in would call the appropriate function. "main" would no longer be a special case. This was generally agreed to be the correct solution, but too upsetting for beginners.


It'd be unfortunate to have to include a platform-specific header in otherwise portable code. Why have to include a <posix.h> versus <dos.h> if they both define main the same way? (Windows aside.)


You can use vanilla `main` for Windows apps too, just need to override the default subsystem when invoking the linker. I never understood the point of `WinMain`; IIRC there's nothing it does that you can't do just as well without it.


Or you can just tell Microsoft that they won't get special treatment and should use `int main(int argc, char* argv[])` ffs.


I believe this is the only correct answer, or at least the best one. Code isn't written in isolation; it is a function of the compiler used. And a large codebase will span many compiler versions.

1) Code written against a compiler should continue working

2) New code should be able to use new features and not pinned to an old compiler

3) The language / compiler should be allowed to innovate w/o maintaining strict backwards compatibility for all past code.

I am personally ok with and support having any large scale, long term project require multiple versions of the compiler to build.


> I am personally ok with and support having any large scale, long term project require multiple versions of the compiler to build.

Or even better, have every compiler support all versions of the language. It obviously requires some careful architecting to keep it maintainable, but that's better than every user of the compiler having to sort out versioning issues themselves.

Edit: I've re-read your comment, and I can't tell if we're in agreement or not :)


> have every compiler support all versions of the language

If there's an interface that makes it look like one compiler does this, I am fine with it, but a single codebase ... not so sure. Large systems collapse under their own weight; having to support all versions when current-4 already supports them seems kinda ridiculous.


Well, hopefully the changes between versions aren't too big, because at some point it doesn't make sense to consider them to be the same language.

The idea is that each version would still be translated into a common internal representation pretty early on after parsing, so it would almost be like supporting multiple versioned mime-types in a REST API.

And it's probably harder to do with languages that already exist, but if the spec is defined from the start with versioning in mind, then it might not be so difficult.


These are the things that I think Rust should really be concentrating on. How to make a vibrant language that can support 10-20-50 year codebases and still remain fresh.


If extensibility is a primary concern, it may be worth just putting sigils on all identifiers to make it completely unambiguous.

E.g. the LLVM IR representation takes that route, though admittedly it's not really intended for humans to write.


Contrast with FPC's compiler mode pragmas for different dialects. [0] FPC shows the effects of this story in the long term: there are several major divergences in behavior supported by the modes, and some individual features can be toggled per-unit or sometimes with a finer grain. [1] Since you're allowed to mix and match different units, the fragmentation poses little threat to any individual programmer, and it's possible to dial in assertions, optimizations, and calling conventions in a targeted fashion.

[0] https://www.freepascal.org/docs-html/user/userse33.html

[1] https://www.freepascal.org/docs-html/prog/progch1.html#progs...


That is part of the Pascal culture, we were already making use of such pragmas during the MS-DOS / VAX / Compaq / Apple days.


This is good news for those of us who want alternative implementations like gccrs. GCC (and software developers) can target static "epochs" (c89, c99, c11, ...) rather than a moving target.

Hopefully, this will be enough to make development of GCC's rust front-end viable again.


FWIW the GCC frontend development seems to have started up again. I'm hopeful!

However, epochs don't really help alternate frontends. Rust 1.42 on epoch 2015 will still have all the features that Rust 1.42 on the latest epoch has, except for the features that require an epoch shift (which will be a small handful).

But compiler versions already help here, targeting a compiler version gets you all of this.

I suspect if a compiler backend project nears completion there will be a lot more collaboration between rustc and the project in order to make evolution less painful.


> FWIW the GCC frontend development seems to have started up again. I'm hopeful!

I'm curious: any particular goals or results you hope to see out of the GCC frontend? Do they have any plans to share front-end code with rustc, such as the parser?


> any particular goals or results you hope to see out of the GCC frontend

- Support for weird architectures

- Potential for diverse double compiling

- Better crystallization of the language spec.


One thing I'm hoping is that it brings support for some more obscure arches.


The ones that GCC supports and LLVM doesn't?


Yup. There's a lot, in my understanding.


If the Rust compiler promises to support all previous epochs, can it ever remove the compiler code that implements deprecated language features?


No, and they've specifically specified it as always available. Just as GCC still supports C89.

However, they also specified this as only affecting translation/desugaring, which makes the additional code minimally invasive.


Epochs are primarily deprecated syntax behavior, so it's more like branches in the parser or other relatively self-contained bits. As the RFC says, the core language will continue to stay the same.

So I don't see this as leading to some major accumulation of cruft.


I work on the C# compiler and the answer over in that world is "no"; everything needs to be supported.


Perhaps it could move them to packages as compiler plugins? That could be more or less tricky depending on the piece though.


If "catch" becomes a keyword in a future epoch, how does code in that new epoch call functions with that name defined in libraries from previous epochs? Does Rust support stropping?


It can't; I presume code on an older epoch will have to expose it under an alternate name, or upgrade to the new epoch.

The epoch RFC doesn't attempt to solve this, but the hypothetical catch RFC can. It could, for example, provide an oldcatch!() macro that expands to the path `catch` instead of the keyword, making it possible to import or define such items. Or have some system of attributes.

Solutions are possible!
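
A minimal sketch of that macro idea in today's (epoch-2015) Rust, where `catch` is still an ordinary identifier; the `catch` function and the `oldcatch!` name are made up for illustration:

```rust
// Pretend this lives in a library on an old epoch, where `catch`
// is a plain identifier rather than a keyword.
fn catch(err: &str) -> String {
    format!("caught: {}", err)
}

// Hypothetical escape hatch: callers on a newer epoch (where `catch`
// would be a keyword) invoke the function through the macro instead
// of naming it directly.
macro_rules! oldcatch {
    ($arg:expr) => {
        catch($arg)
    };
}

fn main() {
    // prints "caught: boom"
    println!("{}", oldcatch!("boom"));
}
```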


Is it possible to have a syntax extension (or even macro?) that creates identifiers that wouldn't be legal in plain source code?


It can try to, but it will usually fail to parse (both macros and new-style syntax extensions go through a sort of re-parse step).

Old-style (AST-based, unstable, will go away) syntax extensions can do this.

But this isn't really relevant: the compiler can define custom syntax extensions that do whatever it wants :)


It'd be pretty easy to add an @identifier model like C#'s.


So it looks like you end up with two syntaxes for the same feature, one of which you can use without any extra declarations. From a user's perspective, why not just use the "do catch" syntax everywhere? What happens with that syntax, is it eventually deprecated?

It seems like you end up with an 'ugly' way and a nice way, under the assumption that the ugly way will be temporary, but really both ways end up sticking around, and users can and will use them interchangeably.

I see the motivation, but realistically it may be better to just stick with the not-so-nice syntax everywhere. Or at least have a good way to unambiguously namespace future keywords - instead of "do catch", say "futurerustkeyword catch" or something. Or maybe a per-file declaration like "use __future__::catch" ;)



