I hate having to individually update my Wordpress install, my Rubies, my system packages, my IDE (be it Eclipse or Android Studio), a separate TeXLive install from the OS packages, even Vim now has its own package management with Pathogen...
There must be some way to _unify_ this proliferation of software update mechanisms.
edit: It would be ironic in the extreme if Windows of all platforms manages to get this out the gate before the Linux community considering how awesome stuff like apt-get, yum, and emerge are... Oh well, just goes to show that open-source giveth with the one hand and taketh with the other.
Nix runs on any Linux distro because all packages are stored in /nix, so it's not trampling all over your files in /usr, /bin, etc. It also runs on OS X (packages work, but are a bit less well tested, and binary substitutes are currently not as easily available as they should be). In the past it has been made to run on BSD and on Windows through Cygwin, but those ports need a lot more work before being production-ready.
It's also significantly more flexible than other package managers due to how it can handle multiple versions of the same package in an elegant way. This makes unifying diverse package sets an actual possibility. Apt, yum, etc. are too limited to unify package sets in this way, and creating another layer above them in order to integrate them is hacky, ugly, and unlikely to work well.
They are the only package managers I know of capable of unifying package management with a single set of tools. Unifying multiple distinct package managers is a fool's errand.
I don't see how they can. Guix/Nix don't run on Windows, so they can't replace npm or pip or RubyGems, first off. But I don't even see anything about Guix/Nix that makes them especially capable of unifying Linux package management. (Heck, given that there are two of them, I don't see how either can be said to unify even the specific kind of Linux package management they do.)
Correct. On Windows you can't do any better than a swarm of language package managers. Without the ability to create a custom Windows distribution starting from a systems level package manager, there's no hope in having anything better.
Different package managers make different design decisions that make them incompatible with each other without sacrifice. Guix/Nix focus heavily on reproducibility and not relying on any third party binaries. These features would have to be thrown away if it unified pip, npm, etc. because they make no such guarantees.
Every package manager works differently, and trying to accommodate all of them with a unifying tool would require a lot of time wasted writing interfaces between them all. I don't see a way to do it without lowering the feature set to the least common denominator and settling for that. The real solution is to elevate our system package managers to the point of handling the important use cases that currently only language package managers provide, such as virtualenv/Bundler-style management (i.e. installing packages somewhere besides /). Nix/Guix accommodate all such use cases, which is why I promote them.
> These features would have to be thrown away if it unified pip, npm, etc. because they make no such guarantees.
Again I disagree. For example, in this case it would just mean that when installing from Nix repos you get the reproducibility etc. guarantees, installing from upstream repos gets you the vanilla version, and installing from OS repos gets you an integrated/patched version. But the installation process (from the user's point of view) could still be unified, because regardless of what happens behind the scenes the high-level stuff is pretty much the same; nix-env -i/apt-get install/pip install could definitely be unified under one umbrella tool.
> I don't see a way to do it without lowering the feature set to the least common denominator and settling with that.
And the common denominator would probably cover 80-90% of uses. For the rest you'd still have the option to delve deeper and use some implementation-specific tools if necessary.
If we've any hope to unify package management, we need to get to the essence of what package management is. It's really quite simple though - it's the ability to say that one piece of software depends upon another, and to have a piece of software which can automatically resolve the dependencies (which form a DAG). To construct our DAG we need a list of nodes (the packages), and a list of edges (the dependencies of a package).
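To make that concrete, here's a minimal sketch of such a resolver in Python (the package names and graph are invented for illustration): a depth-first post-order walk of the dependency DAG yields an order in which dependencies always come before the packages that need them.

```python
# Minimal dependency resolver over a DAG: given a package and a
# mapping of package -> direct dependencies, return an install order
# where every dependency precedes its dependants.
# Assumes the graph is acyclic (no cycle detection for brevity);
# all package names here are hypothetical.

def resolve(package, deps):
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)          # install dependencies first...
        order.append(name)      # ...then the package itself

    visit(package)
    return order

deps = {
    "wordpress": ["php", "mysql"],
    "php": ["libxml"],
    "mysql": ["libssl"],
}
print(resolve("wordpress", deps))
# -> ['libxml', 'php', 'libssl', 'mysql', 'wordpress']
```

Every real package manager does some variant of this walk; the disagreements start with how the nodes are identified, which is exactly the problem explored below.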
If we say that packages are basically just binary blobs of data (say, a .tar.*), then we might construct our database to identify (key) our package payloads. I'll use some pseudo pgsql for illustration purposes.
CREATE TABLE packages
(
payload bytea NOT NULL,
package_name character varying NOT NULL,
CONSTRAINT pk_package PRIMARY KEY (package_name)
);
CREATE TABLE package_dependency
(
dependant character varying NOT NULL,
dependency character varying NOT NULL,
CONSTRAINT package_dependency_dependant_dependency_key UNIQUE (dependant, dependency),
CONSTRAINT package_dependency_dependant_fkey FOREIGN KEY (dependant)
REFERENCES packages (package_name),
CONSTRAINT package_dependency_dependency_fkey FOREIGN KEY (dependency)
REFERENCES packages (package_name)
);
Simple. But we're missing a bit here. We need to update software, so a name is not sufficient to identify a dependency. Let's add that.
CREATE TABLE packages
(
payload bytea NOT NULL,
package_name character varying NOT NULL,
version integer NOT NULL,
CONSTRAINT pk_package PRIMARY KEY (package_name, version)
);
CREATE TABLE package_dependency
(
dependant character varying NOT NULL,
dependency character varying NOT NULL,
dependant_version integer NOT NULL,
dependency_version integer NOT NULL,
CONSTRAINT package_dependency_pkey UNIQUE (dependant, dependency, dependant_version, dependency_version),
CONSTRAINT package_dependency_dependant_fkey FOREIGN KEY (dependant, dependant_version)
REFERENCES packages (package_name, version),
CONSTRAINT package_dependency_dependency_fkey FOREIGN KEY (dependency, dependency_version)
REFERENCES packages (package_name, version)
);
Cool, now that we have a composite key we've got sufficient information to identify a dependency, right? Well no, we now have the problem that the same piece of software with the same version could be distributed by different vendors (with different dependency chains/configurations, etc). We had to modify the original solution to get here rather than extend it. Let's modify it again!
CREATE TABLE packages
(
payload bytea NOT NULL,
package_name character varying NOT NULL,
version integer NOT NULL,
vendor character varying NOT NULL,
CONSTRAINT pk_package PRIMARY KEY (package_name, version, vendor)
);
...
Great, now given a combo of package_name, version and vendor, we can uniquely identify a dependency without worry. All problems solved?
What now if a vendor has a customer with different needs, and must distribute two different derivations of the same package name and version? Do we add another field for "configuration", and if so, what type do we make it? Do we just rename the package and lose the relationship that exists between them? It should be blindingly obvious by now that we're just trying to add structure where it isn't really present, and we're making the solution to the problem more and more complicated.
Now take into account the possibility that Package Manager A implements dependencies using a tuple of (package_name, version, configuration), and Package Manager B implements dependencies using a tuple of (package_name, version, vendor). If we want to unify these models under a "one true package manager", then our OTPM needs to model things using (package_name, version, optional[vendor], optional[configuration]), and so forth. Multiply by N package managers with their own individual quirks and you get a "unified" model which is barely unified at all; the only structure to it is really our initial solution - packages with names.
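To see how degenerate that "unified" key gets, here's a sketch (Python; the field and package names are invented) where every manager-specific discriminator becomes an optional field, so the only part all backends actually share is the name and version:

```python
# Sketch of the "one true package manager" key after unifying two
# backends with incompatible identity models. Field names and the
# example packages are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class UnifiedKey:
    # The only fields every backend agrees on...
    package_name: str
    version: int
    # ...plus one optional field per backend quirk.
    vendor: Optional[str] = None         # used only by "Package Manager B"
    configuration: Optional[str] = None  # used only by "Package Manager A"

# Both keys claim to identify "openssl 1", but each discriminates
# using a field the other backend cannot interpret.
a = UnifiedKey("openssl", 1, configuration="fips")
b = UnifiedKey("openssl", 1, vendor="acme")
print(a == b)  # -> False
```

Every additional backend adds another optional column, and no backend can meaningfully compare the others' fields, which is the "barely unified at all" outcome described above.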
Here's a reduction in complexity: instead of using a name, which may be ambiguous and therefore requires us to constantly add fields and change the underlying model, let's propose some means of creating identifiers that are guaranteed unique. We can basically go back to our initial model:
CREATE TABLE packages
(
identity uuid NOT NULL,
payload bytea NOT NULL,
CONSTRAINT pk_package PRIMARY KEY (identity)
);
Now, if we want to integrate "Package Manager B" into this new model, we can extend our system. (Note keyword extend, not modify). We can implement a new table which references the base model.
CREATE TABLE pmBpackages
(
identity uuid NOT NULL,
package_name character varying NOT NULL,
version integer NOT NULL,
vendor character varying NOT NULL,
CONSTRAINT pmBpackages_identity_fkey FOREIGN KEY (identity)
REFERENCES packages (identity)
);
And as far as pmB is concerned, this is just an implementation detail - we can hide it from any users and just present the legacy view to them.
CREATE OR REPLACE VIEW "pm_B_view" AS
SELECT package_name, version, vendor, payload
FROM pmBpackages NATURAL JOIN packages;
So here's a challenge. Begin with the schema for "Package manager A" as implied above, and try to implement "Package manager B" by extension (not modification). The first point of struggle might be to notice that you need to invent "configuration" values, since "Package Manager A" requires them as part of the key, and they're NOT NULL.
Hopefully it becomes obvious now why trying to build another model on top of other overcomplicated models is the real fool's errand, because none of them so far have understood the essence of the problem.
I've glossed over how we might guarantee uniqueness for package identities so far. The solution is to use a cryptographic hash of the payload, under the (fair) assumption that a modern hashing algorithm is sufficiently collision resistant. There's nothing language/framework/operating-system specific about SHA-1 or whatever the choice of algorithm.
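As a sketch of that identity scheme (using SHA-256 rather than SHA-1, with made-up payloads standing in for real package tarballs):

```python
# Content-addressed package identity: the ID is a hash of the payload
# itself, so uniqueness needs no naming authority, vendor field, or
# configuration field. The payloads below are placeholders.

import hashlib

def package_identity(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

tarball_a = b"pretend this is a .tar.gz from vendor A"
tarball_b = b"pretend this is the same package, patched by vendor B"

print(package_identity(tarball_a))
# Identical payloads always hash to the same ID; any change at all
# (a patch, a different build configuration) yields a new ID.
```

Two vendors shipping different derivations of "the same" package simply get two different identities, which is exactly the disambiguation the name/version/vendor/configuration columns were straining to provide.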
I think you are massively overthinking this. Dependency tracking and all that jazz are just implementation details of individual backends, the unifying tool does not need to be aware of that at all.
Let's say we define a basic API with two verbs: SEARCH and INSTALL. When the user wants to install a package, the unifying tool first queries backends with SEARCH to see if they have such a package; after resolving which source/backend to install it from (a process that might involve user interaction), it invokes INSTALL on that backend. Nowhere in this process does the unifying tool need to know what black magic the backend did to get the package installed.
SEARCH is the reason package management is so broken. If a developer intends a particular piece of software to be installed as a dependency, he should convey exactly that to users, rather than vague criteria for which the user might hopefully get the right thing. The search needs to return only one result - the right one. If several package managers keep a package of the same name and version, we shouldn't need to keep adding criteria to narrow down our search until we get the right one (and someone could later add a package which meets all those criteria after you've published).
The heart of the problem is making a SEARCH which will always return one result, and to do this we either need to add N criteria which are collectively guaranteed to be unique - or to just stick a unique identifier there to begin with and treat the rest as information queryable from the content.
The use of a hash as an identifier solves several other problems, such as having multiple repositories where each one could host packages of the same name. While we continue to rely on search without some means of uniquely identifying packages, we keep the social burden of making sure our repositories don't conflict with other people's (usually the "official distro" repo - but when we're talking of unifying packages, that's a lot of repositories which could conflict, unless we add our "vendor" flag into the mix, etc).
Is it possible to unify package management un-intrusively or not? Call the un-intrusive path A and the intrusive path B.
A) If we were to do it in such a way that did not require us to modify all the existing package managers out there, then how do we do that? I think the solutions you are coming up with (a database of unique hashes and so on) might be a way forward. But don't we still have the problem of communicating with the underlying package managers?
B) What sort of intrusion? Would package managers have to adhere to some kind of standard or API or expose a minimum amount of surface area? How would you get package manager maintainers to sign up to something like that? A summit? Who would fund such a summit? Then we'd still need something to implement that API. Somebody elsewhere mentioned PackageKit, would this fit the bill? If not, is there something else that would? And would we still need to track installs and so on with something like what you're proposing.
Remember this would ideally need to work on Debian-like (.deb, apt-get) and RPM-like (.rpm, yum) and Gentoo (ebuilds, emerge) systems and so on and so forth ad nauseam ... I mean, I'm running Ubuntu, so after the Great Request for the Unification of Managing Packages Summit (GRUMPS) I want to still be using Synaptic, but I want to be able to drill down into sub-package managers. See what I mean?
Again I think you are overthinking this. It is perfectly fine for SEARCH to return multiple results; it is intended mostly for interactive use anyway. It won't impact dependency resolution in any way - that would continue to work as it does now. In other words, each backend does its own dependency resolution without any regard for any other system.
It's a disaster. The proper way to do it is distro packages, but for some reason every language, framework, ecosystem and individual developer wants to reinvent this particular wheel. I really don't understand why.
It's because individual tools usually don't want to be tied to assumptions made by one particular distro. I actively avoid using distro packages for 3rd party development libraries and such, especially when a good tool for accessing upstream sources (eg pip) is available.
I use packages for certain tools and platforms, and libraries if I feel the library is really something I want to be a standard part of the system environment. For example, I am more likely to use the distro package of a python library (if available) if I'm planning to use the library for a system administration task than if I am planning to use it for application development. I'm also likely to use distro packages for things like apache, nginx, postfix, unless I have some case-specific reason not to.
One technical reason is that I might use two different versions of the same library in different projects, and apt-get only allows me to have one at a time. I think npm and gem are brilliant in this regard.
Best of both worlds: Docker. I consider Docker an application packager.
You know, it is still simpler to make your own deb or rpm than to build an entirely different package system.
It is more a case of these different package systems being introduced on platforms that lack a native one. Then, through a combination of laziness/not wanting to build another package and recycling the already-built binaries, they gained traction on Linux systems too.
And there is a reason why distro packages move slower - people who have them deployed in production do not like breaking changes. If you want bleeding-edge packages, use bleeding-edge repos.
There's no need to unify the installation of those domain-specific package managers - those packages just need to provide proper machine-readable metadata to make it possible to build distribution packages out of them, so there's no need to create another 10 package data silos outside of the system package manager's control.
The very idea of placing dependencies in a local location (or, even worse, in a global location not controlled by the system package manager) is so rotten to the core that it simply deserves to die.
What was once supposed to be a tool giving developers easy access to dependencies has now crept into the area of operations, and deploying uncontrollable stuff (Docker containers, 3rd party package managers, statically linked applications, manually built packages, …) into production systems has become the norm.
Actually, Composer is what's usually used these days. Composer is per-project (like npm), though, so there's no point unifying it into a global package manager.
Have you ever tried ninite (https://ninite.com/) ?
Not perfect at all - but at least it works for some of the freeware utilities / programs I'm using on my windows PC.
Simply run the generated exe to install the program. Run it later to update the program.
Hum, you could be right. I tried to install Discourse from source a while back on CentOS and gave up. A while later I tried the container method (Docker I believe) and bingo. So much magic going on under the hood though, dunno how I feel about it...
This kind of reminds me of just statically linking everything. We have not done that because it is a waste of resources but containers are all right then. :)
(OK, I'm aware of the differences, but I don't see containers as a salvation for client-side deployment.)
> There must be some way to _unify_ this proliferation of software update mechanisms.
There are many ways. The problem is there are many different environments that all use different methods for different reasons.
To a solo developer with personal control over the entire stack, the operating system is merely one more tool in the toolbox. That dev can pick any distro he wants and then install and configure anything he wants (as root). To a postdoc researcher in a lab using the university's shared compute cluster, leveraging the OS might not be such an obvious and easy choice. Then of course there is the whole issue that Wordpress, Ruby, system software, and IDEs are all developed by very different groups of people.
But back to your question: configuration management tools can help with this problem. They do require some end-user investment at this stage, in no small part due to the reasons I identified above (there's not yet an ideal default that works for everyone).
Generally, configuration management tools encourage you to declare your software, modules, packages, and requirements in an abstraction layer and then have the config management tool handle the messy details of whether to use apt or yum or gems. The catch right now is that you will typically have to handle those decisions (to some extent) yourself.
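That abstraction layer can be sketched roughly like this (Python; the distro-to-manager mapping is simplified and purely illustrative, not how any particular config management tool is implemented): you declare what you want installed, and the tool translates it into the native package manager invocation.

```python
# Toy version of the dispatch a config management tool performs:
# declarative "ensure package X" -> native package manager command.
# The mapping below is deliberately simplified for illustration.

PACKAGE_MANAGERS = {
    "debian": ["apt-get", "install", "-y"],
    "ubuntu": ["apt-get", "install", "-y"],
    "centos": ["yum", "install", "-y"],
    "gentoo": ["emerge"],
}

def install_command(distro, package):
    """Return the command a tool would run to install `package`
    on `distro`, without the user ever naming the manager."""
    try:
        return PACKAGE_MANAGERS[distro] + [package]
    except KeyError:
        raise ValueError(f"no known package manager for {distro}")

print(install_command("ubuntu", "nginx"))
# -> ['apt-get', 'install', '-y', 'nginx']
```

The "catch" mentioned above lives in that mapping: someone still has to decide, per platform and per package, which manager and which package name to use.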
Aha! Lightbulb moment. I thought that Puppet and co. (chef and so on) were about pure configuration across multiple machines. I didn't realise they were about deployment as well. In that case I guess the problem is sort of solved but then I need to start thinking a layer higher than I have been.
It would be nice if the solution to this proliferation still allowed me to think at the level of [synaptic/apt-get/dpkg] on Debian/Ubuntu/... or [yum/rpm] on Redhat/CentOS/Suse/... and so on. Do you think this is unreasonable of me?
http://leto.electropoiesis.org/propaganda/plugins-and-packag...