Hacker News
How are you ever going to manage Python Packages? PyPi and pip (adku.com)
18 points by ajitvarma on Aug 17, 2011 | hide | past | favorite | 16 comments


This article is missing the best tool of the bunch: virtualenv. Part of the fundamental reason managing Python packages is such a pain (or Perl packages, or Ruby packages, etc.) is that they're installed once for the whole system.

Virtualenv allows you to sandbox all your packages in a directory local to your development/deployment environment. You no longer need to install anything on the base system, and you can have multiple virtualenvs side-by-side. In addition, you can copy a virtualenv from your build environment to your production environment by just copying the directory.
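The sandboxing idea can be sketched with the stdlib venv module (the later, built-in descendant of virtualenv; the mechanics are the same). Directory names here are hypothetical:

```python
import os
import tempfile
import venv

# Two sandboxed environments side by side, each with its own interpreter
# and site-packages, untouched by (and not touching) the system Python.
base = tempfile.mkdtemp()
for name in ("env-a", "env-b"):
    path = os.path.join(base, name)
    venv.create(path, with_pip=False)   # with_pip=False: fast and offline
    # Each environment carries its own config marking it as isolated.
    assert os.path.isfile(os.path.join(path, "pyvenv.cfg"))
```

Because each environment is just a directory, "multiple virtualenvs side-by-side" falls out for free.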

This gives you huge wins in versioning packages, testing out different versions alongside each other, and being able to strongly validate that what you tested in your staging environment is the same thing that got deployed to your production systems.

In general, I'm a very strong believer in pushing _everything_ your production environment relies on into a single directory on your production system, rather than installing it on the base system. It's actually rather surprising that more people haven't moved to this model, and that there aren't better tools to support it.


Hi, I'm the author of the article. We've definitely looked at virtualenv, but for right now we're only developing a single app. Thus far we've been able to get away with munging the global package list, but this could change in the near future. I tried playing around with virtualenv earlier this year, but I couldn't figure out how to push directories OR make virtualenvs reliably relocatable. The main problems for us were/are:

1) We develop on Macs

2) Our deploy environments are a mix of 32-bit/64-bit machines running Ubuntu

3) It wasn't clear whether each deployment required its own virtualenv built from scratch OR whether we should reuse a virtualenv (which seemed to defeat the whole point of using virtualenv)

Got any ideas on what we could do?


I've used virtualenv extensively for production deployment in a large environment (100+ servers), with an environment very similar to yours. We developed on Macs as well, and ran RHEL in production. All building/pushing to production was done from an RHEL machine similar to production. Some things that worked for us:

- We deployed tarballs to production machines which unpacked to a single directory that included a virtualenv that had both all our code and all the modules we relied on, and anything else that the code needed to function that wasn't installed in a very basic RHEL install.

- We used the --relocatable option in virtualenv to remove all references to absolute paths, which meant we could copy the virtualenv around to various directories and machines and still have it work.

- We had a series of makefiles that would make/update a virtualenv, and which worked on both mac and linux. We would use this both in development and when deploying. For several packages, we had to hand tune this to work on both mac and linux, but for most things, it Just Worked.

- When we deployed to production, we would unpack the tarball to a directory whose name included a version number for the push. We then had a symlink that pointed at the currently running version. This meant all we had to do to rollback a push was flip the symlink to the previous deployment directory, and restart. This rollback included any modules that changed (since they were in the virtualenv). I haven't seen any other way to reliably do this.
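The versioned-directory-plus-symlink scheme described above can be sketched in a few lines (paths hypothetical). The one subtlety is switching the link atomically so "current" never dangles mid-flip:

```python
import os
import tempfile

# Each push unpacks to releases/<version>/, and a "current" symlink points
# at the live one. Rollback is just re-pointing the symlink and restarting.
root = tempfile.mkdtemp()
for version in ("r100", "r101"):
    os.makedirs(os.path.join(root, "releases", version))

current = os.path.join(root, "current")

def activate(version):
    # Build the new link under a temp name, then rename it over the old
    # one; rename is atomic on POSIX, so "current" is never missing.
    tmp = current + ".tmp"
    os.symlink(os.path.join(root, "releases", version), tmp)
    os.replace(tmp, current)

activate("r101")   # deploy the new push
activate("r100")   # rollback: flip the symlink back and restart
print(os.path.basename(os.readlink(current)))  # -> r100
```

Because the virtualenv lives inside each release directory, the flip rolls back modules along with code.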

The one thing you mention that we didn't have to deal with was mixed 32/64-bit environments. One nasty solution is to have two build machines, one 32-bit and one 64-bit, but there's probably a better way...


> We deployed tarballs to production machines which unpacked to a single directory that included a virtualenv that had both all our code and all the modules we relied on, and anything else that the code needed to function that wasn't installed in a very basic RHEL install.

Ah, we're doing tarballs too, but only for our code. We're having pip munge the global package list at the beginning of every deployment (somewhat dicey, but with requirements files it's easy to roll back package upgrades/downgrades), which does mean that packages can change underneath a running Python process :O

> We used the --relocatable option in virtualenv to remove all references to absolute paths, which meant we could copy the virtualenv around to various directories and machines and still have it work.

Hmmm, this could work. I was under the impression --relocatable wasn't fully tested, but if it works for you, we'll definitely take a look at it :)

> We had a series of makefiles that would make/update a virtualenv, and which worked on both mac and linux. We would use this both in development and when deploying. For several packages, we had to hand tune this to work on both mac and linux, but for most things, it Just Worked.

Yeah, I took a look at virtualenvwrapper to handle the multi-environment thing, but that was primarily geared toward multiple apps. It probably got me more confused than necessary. I'll take a look at plain old virtualenv with scripts.

> When we deployed to production, we would unpack the tarball to a directory whose name included a version number for the push. We then had a symlink that pointed at the currently running version. This meant all we had to do to rollback a push was flip the symlink to the previous deployment directory, and restart. This rollback included any modules that changed (since they were in the virtualenv). I haven't seen any other way to reliably do this.

We do the exact same thing, minus the modules. We do have an issue with packages changing underneath versions, but as long as we don't bounce the servers, they should have the "old" modules already imported. We should fix this though :)

> The one thing you mention that we didn't have to deal with was mixed 32/64-bit environments. One nasty solution is to have two build machines, one 32-bit and one 64-bit, but there's probably a better way...

Yeah, our solution was to leave it to pip :)


I'm relatively new to using virtualenv myself, but I think you replicate a virtualenv on a different OS, architecture, or location by creating a new virtualenv on the target machine and then using pip to reinstall all the packages. You use "pip freeze" in your source environment to generate the package list, then "pip install -r <filename>" to reinstall the packages on the target.
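The freeze-and-replay step can be sketched like so (filename hypothetical; this only captures the pins on the source side, and assumes pip is importable via the running interpreter):

```python
import subprocess
import sys

# Capture the source environment's exact package pins. Running pip via
# the current interpreter guarantees we freeze *this* environment.
pins = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout

with open("requirements.txt", "w") as f:
    f.write(pins)

# On the target machine, inside the fresh virtualenv:
#   pip install -r requirements.txt
```

Note this rebuilds packages on the target rather than copying binaries, which is what makes it work across architectures.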


you can copy a virtualenv from your build environment to your production environment by just copying the directory

Not if you're developing on a different platform/architecture than you're deploying to (e.g. OS X vs. Linux). Unless you're only using pure Python modules, which excludes PIL and most database drivers.

Other than that, you're very right.


Yes, right. However, your build/staging environment should be the same platform/architecture as your production environment, for lots of reasons other than just this. Development environments can be different, of course...


If you target a particular Linux platform and have more than just pure Python package dependencies, it might be better to learn to use that distro's native package manager (RPM, for example).

Not sure if pip lets you run configuration scripts during various stages of install and un-install?

Does pip let you depend on C libraries and other system features?

How does pip handle obsoletes and transitive dependencies?


> Not sure if pip lets you run configuration scripts during various stages of install and un-install?

As far as I can tell, you can run config scripts during installation. A good example of this would be Cython. An even crazier example would be uwsgi. As for uninstall, I'm not sure. A lot of this has more to do with what setuptools/distribute can do, not necessarily pip. Pip just makes it easy to enumerate/install/uninstall with the aid of PyPI, setuptools, and distribute.

> Does pip let you depend on C libraries and other system features?

Pip isn't smart enough to track down C libraries. This is the part that kind of sucks, but I don't know how pip would even do this. Python is cross-platform by nature. Being able to hook into what Windows DLLs, RPMs, or Debian packages you have to see which C dependencies you've satisfied seems like an ambitious endeavor. It's already hard enough to get a SINGLE package management system working well on its own :) Usually when I find myself missing a C library, I read the Python package's installation notes, do apt-get or whatever to get all the dependencies I need, then run "pip install". I've had to do this for MySQL-python, pycrypto, and matplotlib thus far.

> How does pip handle obsoletes and transitive dependencies?

By obsolete, do you mean you need a newer version, or that you don't need the package at all anymore? As for transitive, do you mean a chain of dependencies? I haven't run into any issues just yet... I think pip just goes out and installs everything for you. This primarily depends on how well people implement their setup.py files, though.


Thanks for responding. Sounds like pip supports most of the features I was wondering about.

By transitive dependencies I mean that it will install C when installing package A, if A depends on B and B depends on C. It looks like it would do that.
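The A-depends-on-B-depends-on-C case can be sketched as a depth-first walk over a toy dependency graph, which is roughly what a resolver does (package names hypothetical):

```python
# Toy dependency graph: A depends on B, B depends on C.
DEPS = {"A": ["B"], "B": ["C"], "C": []}

def install_order(pkg, seen=None):
    """Return packages in install order: dependencies first,
    then the package itself, each listed only once."""
    if seen is None:
        seen = set()
    order = []
    for dep in DEPS[pkg]:
        if dep not in seen:
            order.extend(install_order(dep, seen))
    if pkg not in seen:
        seen.add(pkg)
        order.append(pkg)
    return order

print(install_order("A"))  # -> ['C', 'B', 'A']
```

So asking for A pulls in C transitively, provided each package's setup.py declares its direct dependencies.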

As for 'obsoletes', I meant the case (this is from the RPM 'world') where one package changes names or two packages are combined into one. So let's say you maintain package A. Then you change its name to B because Oracle's lawyers have threatened to break your knees over a trademark ;-). Everyone in the world depends on A, so you declare that B obsoletes A, so their packages will still install and correctly pull in your new package B. The same would happen if, say, you got together with another project and merged their project into yours; you would 'obsolete' their old package name. Yeah, this probably doesn't apply to pip as much as RPM; it's more a feature of the package index server than of pip itself, perhaps.


Yeah, I think transitive dependencies should be fine provided each Python package has a good setup.py.

As for obsoletes, you'll just have to manually uninstall the packages that were deprecated and install the new one.


I had a rather less flattering experience. It seems half the time I pull stuff using pip, it will fetch the wrong version or fail to compile, or not fetch essential dependencies...

It is still good, but nowhere near as great as gem or apt. But maybe that is just me being unlucky...


Often these are issues with improper setup.py configurations, by the authors of these packages. It's not uncommon for authors to forget to list their package dependencies in setup.py, for instance.

Forking these projects and fixing the setup.py is probably the best solution. If they're active projects and you'll want their updates, they're likely to accept your patch. If they're inactive, it doesn't really matter that you're using a fork anyway.


You can explicitly specify which version of a package you want to install. For example, if you wanted to install an older version of Tornado,

pip install tornado==1.1

OR if you know you ONLY want tornado 2.0

pip install tornado==2.0

From my experience, pip has been able to fetch PYTHON dependencies if the author of the package wrote their setup.py correctly. I agree that things could be better with regard to C dependencies (MySQL-python requiring libmysql5dev is particularly annoying), but this is hard to do, as every environment has its own way of naming dependent libs.
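The `==` pinning syntax above is exactly what goes into a requirements file, one pin per line. A minimal sketch of what such a file contains and how the pins decompose (versions illustrative):

```python
# A requirements file is just "name==version" lines; pinning every
# package is what makes deployments and rollbacks reproducible.
requirements = """\
tornado==1.1
MySQL-python==1.2.3
"""

pins = {}
for line in requirements.splitlines():
    line = line.split("#")[0].strip()  # drop comments and blank lines
    if line:
        name, _, version = line.partition("==")
        pins[name] = version

print(pins)  # -> {'tornado': '1.1', 'MySQL-python': '1.2.3'}
```

Feeding that file to `pip install -r requirements.txt` installs exactly those versions, nothing newer.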


Maybe it's lesser known outside of the Zope community, but buildout is essentially virtualenv + pip:

http://www.buildout.org/


Didn't know about requirement files. Useful!



