Hacker News
Dynamic software updating in C (kitsune-dsu.com)
63 points by billconan on July 1, 2015 | hide | past | favorite | 19 comments


This was work done as part of my phd thesis. You can take the code and samples that are here https://github.com/kitsune-dsu and play with them. However, I wouldn't consider what has been released ready for production use. The papers were the main product of this research and are your best resource if you're interested: http://www.cs.umd.edu/~hayden/papers/kitsune-draft.pdf

I moved on (graduated!) from the project in 2012 and the code that has been released is pretty much where I left it at that time. An undergrad collaborator was doing neat work on updating Tor, so that code continued to evolve a bit after I left.

I think there may still be folks at UMD working in some ways with Kitsune, but I'm not up on the details.


I was thinking of using this to update our graphics software, but then I realized I'd need to back up GPU state (textures, for example) and restore it before and after the update.

Lots of servers now use GPUs for deep learning, so adding GPU support to the framework might be a good feature.


If such state is kept alive while the process that created it is still running, then there is nothing to do: it will still be available to the updated program.


I was thinking about memory-mapped resources, like memory-mapped I/O, and resource references.

Maybe you are right; there might be no problem. I need to read the paper to understand it better.


Yes, memory mappings should be preserved, so I can't immediately think why it wouldn't work.


Thanks for the very interesting article. One thing I didn't find very clear from the paper was the rationale for using a framework (Kitsune and all that wrapper code) vs. just implementing the feature natively. In other words, what is the upside of using Kitsune to make e.g. Redis live-upgradable vs. just implementing it in Redis proper? [I believe Redis already uses a fork model for writing the data to disk / getting a snapshot of the data.] Is it much simpler when I am using Kitsune? Do I really not have to reason about what happens when the binary upgrades? Or is this mostly useful for retrofitting the feature onto software that wasn't designed with the use case in mind?


Kitsune provides useful abstractions for modifying your program for runtime updating and tools for automating state transformation between versions of your program.

It would be entirely reasonable to implement DSU all within an app's own codebase, particularly until there's a production-ready library/tool-set. The downside would be that you'd probably end up re-implementing a lot of what Kitsune provides.

For our purposes (evaluating Kitsune-style updating on a variety of server programs), it made sense that we'd want to have a common toolset that we applied to all of the programs.


Seems interesting! It would be nice to have a diff between the base version and the patch-ready version of the various software (e.g., Redis). It's hard to quickly understand what integrating this into a large, existing piece of software would involve.

That being said, I haven't read the paper so it may be clear as day there.


The paper describes the main changes required to make your program updatable, and is the best resource.

You can find all the modified programs (redis, etc) here: https://github.com/kitsune-dsu


To add: the number of changes tends to be very small, as reported in the paper. We are talking 100-300 LOC even for applications that are 100 KLOC. And these changes are robust in the sense that once you retrofit a program to include them, you rarely need to make further changes of that sort -- new versions will just work.


Ah, Kitsune. I studied its paper, as well as the one for Rubah (a DSU for Java), quite a bit while working on my bachelor's thesis, which was implementing DSU in pure Java. I was pretty surprised by how little code you needed to make it work (for the one program I was testing on, at least ...).

Unfortunately, due to some failed prototypes and a general lack of time, I barely managed to get the basics working; there are little to no tests, and I'm pretty sure that when I handed everything in I knew of a couple of critical bugs that hadn't been fixed.

The code isn't public right now, but if there's interest I may be able to make it open source. Just have to talk to a few people first.


Very cool, this is one of the reasons I love Erlang and Java. Hot-patching is so simple.


Honest question: how can you even do this (replace the running binary/Java code without closing file descriptors, etc.) in plain Java/on the JVM? [Without resorting to tricks where there is some trampoline code that actually does the connection handling and never gets updated. I realize that's kind of what the authors of the linked paper are doing too, but it's certainly possible to do this "the old-fashioned way", without such hacks, in native code on Linux.]


To answer my own question (though I'm not a Java expert, so probably talking out of ignorance): this paper [1] argues there are only two methods to do this on the stock JVM. One is putting in some trampoline code; the other is patching the JVM itself. So it looks like the stock JVM doesn't actually support restarting the program while keeping the old heap, FDs, etc. around.

[1] http://www.cs.umd.edu/~mwh/papers/rubah.pdf


I was just about to point you to this paper (I'm a co-author), but you beat me to it. The JVM has a fix-and-continue updating feature that permits replacing method bodies as long as (a) the method is not running, and (b) it has the same type as before. This approach is limiting and also slow, which directed us to use bytecode transformation instead. The Rubah implementation is pretty robust at this point, at least for research software. As for your reference to "trampoline code", I should clarify: you just need ways to "restart" your threads, as you do in Kitsune. You don't have to insert trampolines for every updated method, as some prior approaches require.


Thanks for the clarification! I understand that you got rid of the "trampoline"/dispatch code for calling methods by rewriting the respective bytecode at runtime. But you still need to register file descriptors and such with some rubah runtime code, and that piece of rubah runtime code doesn't get replaced on process upgrade, right?

I guess this really doesn't matter from the user's point of view; just for the sake of curiosity, I want to work out whether there is a fundamental difference (FWIW) between the two approaches. Rubah, as I understand it, hot-patches the code within the running process and keeps some kind of trampoline/dispatch code within that process to switch between the new and old versions (whether as actual explicit code or just implicitly in the rewritten bytecode), but never actually leaves the original process. In native code on Linux, by contrast, you would tell the kernel to start a proper new process and then pass all open FDs and program state to the new process (which obviously requires the program to be written in a way that lets it "continue where it left off"). And the latter is not possible on the stock JVM, if I understand correctly, because the JVM doesn't let you implement the part where you pass the file descriptors, right?


There's no need to register file descriptors, since these stay open during the upgrade -- the process identity doesn't change.

As far as a fundamental difference with the approach you propose: We actually tried it with an earlier system called Ekiden. We found that updating by migration was slower and more cumbersome than updating in place, but I don't believe there were fundamental issues. If you look at the related work section of the Kitsune paper you'll see a nice comparison with all known approaches, as well as Ekiden.


This is super cool. And I see the reasoning for the systems Erlang is designed for. What I wonder is whether this kind of thing is really necessary for most web architectures?


You can do a lot with load balancing on web architectures, but some things remain problematic. If you read the introduction of the Kitsune paper, linked from the kitsune-dsu.com site, you'll see some arguments about this.



