What’s Wrong with Git? A Conceptual Design Analysis

shadowmint · on Dec 30, 2013

Humm, well, you can summarize this paper with:

1) The proposition that all systems (and thus git also) should be composed of operations that are orthogonal (no side effects), general (can be used for multiple purposes) and that the system displays a high level of propriety (available operations are limited to those strictly required).

2) Git isn't very good and any of these because it has weird side effects, the operations it does have are often very specific, and there are a whole bunch of different ways of doing similar or related tasks. ...and the author really doesn't like the idea of staging. Which seems irrelevant to the argument, but turns up 4 or 5 times in the paper.

3) Gitless is amazing because it doesn't have the concept of staged changes. See (2) about how the author really likes this:

    The elimination of the staging area was enthusiastically received as a
    major reduction in complexity, though one student missed being able to
    stage files and then only diff those staged files prior to committing 
    (using git diff --staged). We believe this to be a limitation not so much
    in the conceptual model of Gitless but rather in the detailed functionality 
    of the gl diff command, which appears to be insufficiently versatile.

Well, I believe the author would be more plausible if they didn't have a clear bias towards a result they favoured.

How about looking up, for example, a bunch of things that people use staging for, and then objectively evaluating those, compared to the alternative in a 'no staging' world?

Perhaps you could also include potential limitations of using the orthogonal/general/propriety comparison as the single test for evaluating features? Like, does following that path lead to slower implementations, where the 'concepts' are inherently limiting to the implementation, like with bazaar.

Food for thought~

jwr · on Dec 30, 2013

I also found the paper to be very biased. I use the staging area all the time and like it a lot. Also, I really enjoy git's practical approach.

It seems the authors would find hg to be much more enjoyable. It has a stricter approach ("this is how you should do it, because conceptually this is what things should be"), which some people like.

stephen_g · on Dec 30, 2013

I really don't get the part, "Despite its widespread adoption, Git puzzles even experienced developers and is not regarded as easy to use"

Am I the only person who finds Git exceedingly simple? I had significant experience with Subversion, and then tried Git on some of my personal projects. I didn't find the learning curve very steep (perhaps some of the articles and documentation I read helped a lot, or maybe some of the graphical tools I used at first), but compared to SVN, git just feels to me how a version control system should work...

Sure there are some little quirks with the interface, but it didn't take all that long to work out...

SwellJoe · on Dec 30, 2013

I didn't find the switch any harder than learning about revision control in the first place. When I moved from RCS to CVS, it was pretty easy (though I was pretty weak in RCS, and so I had to learn a lot of new concepts). When I switched from CVS to Subversion, it was really easy (Subversion is designed from the ground up to just be a better CVS). When I switched from Subversion to git, it was a challenge...but, not a really big one.

There are some things I still don't get about git. And I still find myself with some weird commits that look like I'm committing what other people just committed (because my tree was behind theirs/HEAD, and I had commits that happened and then I had to pull in order to push...I still don't know how to avoid that, other than always pulling immediately before doing any commits and then pushing immediately after committing, which feels clunky).

I tinkered with some other distributed revision control systems before it was apparent git would win the majority of the mindshare (and actually before git even existed). Someone I worked on the Squid project with also happened to be an arch and then bzr developer (and I think now works at Canonical on DVCS), so I spent quite a bit of time using those. I also found them kinda confusing. But, not so much that I couldn't get work done.

Skinney · on Dec 30, 2013

> There are some things I still don't get about git. And I still find myself with some weird commits that look like I'm committing what other people just committed (because my tree was behind theirs/HEAD, and I had commits that happened and then I had to pull in order to push...I still don't know how to avoid that, other than always pulling immediately before doing any commits and then pushing immediately after committing, which feels clunky).

Git pull --rebase?

smharris65 · on Dec 30, 2013

When you commit other developers work you are merging in the wrong "direction".Please read: http://tech.novapost.fr/merging-the-right-way-en.html

kyberias · on Dec 30, 2013

With all due respect, could it be that people use only a very limited subset of git features (that are simple to use) and/or still don't understand how the features _really_ work. Personally, I've several times thought I understand git only to realize later that I've been mistaken although I've been relatively successful in using it. I hope that makes sense. :)

Thrall · on Dec 30, 2013

I found this to be one of git's greatest strengths; Beyond the basics, it doesn't force you to use features you don't need or want. Initially, I just committed snapshots in a single branch. Later, I found I wanted to do something experimental, so I learnt to branch, and so on…

Now, I would miss git's more advanced features if they weren't there, but when I started, I distinctly remember thinking, "For now, I just want to be able to change my code non-destructively, so I can easily revert it if I make mistakes". Git did that, and did it well.

tacticus · on Dec 30, 2013

Nope not the only person. I found git incredibly easy to get the first few times i used it.

wakeless · on Dec 30, 2013

I personally think that the Git approach is closer to how I envisaged a (non-distributed) VCS should work. I'm the same as you, coming from SVN it was a godsend.

stiff · on Dec 30, 2013

This seems to boil down to a criticism of the staging area, so it is strange its purpose is never clearly explained in the paper. The reason why the staging area exists, is, I think, that for larger teams working long term on a code-base, it is crucially important for the version control history to be very, very neat, with logically separate changes in separate commits, with very clean commit messages for each change, with the right changes going to the right branch (eg. you don't want to commit a change that can go live immediately to a branch that's going to be released in 3 months) and so on. That's also why git makes rebase such a big deal, I guess Linus spends a lot of time getting people to use the VCS right even after the changes themselves are more or less right, and thanks to rebase and the distributed model he at all is able to do corrections related to version control and branch/release management before changes enter the main repository.

People aren't really good in remembering about those things up front, that's why Git introduces the staging area, so that you can work as usual and only after you are finished with whatever was occupying your mind you can consider splitting your work into nice commits, which can be quite a task in itself to do right. If you remove the staging area, and want to incrementally build up a few commits from the changes you made, you end up having to pass in again and again a lot of parameters to the commands executed, first to git diff, then to git commit, and it's easy to do a mistake and diff something other than the changes that will actually be comitted in the end.

A lot of people, especially in small teams, use version control very sloppily, and then get confused about conflicts, changes are hard to track down in history etc. Remember that Git was build for maintaining Linux, which has an absolutely huge number of people working in parallel - in this case you really, really have to care about using the VCS tidily, actually understanding its concepts very well, and not just churn away commits, or you will just fail to integrate the changes correctly. So, frankly, while I like the general discussion in this paper and its approach, it seems to me a bit confused with respect to Git, I wonder whether the authors have any experience in doing long-term software development in a team and especially doing software integration and in using a VCS for that purpose. Once you have a few people, or more than one team working on a project, a few testing servers, and a few different release branches, the concepts in Git do make a lot of sense.

haberman · on Dec 30, 2013

Staging seems like a perilous way of creating clean commits. If you make a commit out of only some of the changes in your working tree, the result of a commit will be a filesystem state that never existed in your working directory, and thus was very likely not tested as committed.

exDM69 · on Dec 30, 2013

I work on a huge codebase using Git together with thousands of other developers. The staging area is a very welcome tool, because often I have to do little changes that I do not want to commit, most often working around some little mistakes by another developer on the other side of the world.

Fixing a small problem in a Makefile, adding a few debug prints somewhere, disabling an unrelated failing assert, and little unrelated fixes like that. It does not make sense to fix and commit them because that would be a distraction and duplicate work because someone is most likely working on it already. But I do need to make my builds pass and my tests run to get on with my work, using little fixes I do not want to include in my commit.

The staging are may be confusing to newbies but it is very useful in large projects with a big team.

chmike · on Dec 30, 2013

I'm not an expert Git user, but I'm impressed by gitless after reading this document.

The use case you refer to is one I frequently have. But I don't think staging is the right solution for that. For one it's limite to file granularity, for two your local changes can be lost by a reset.

So it seems preferable if we could store our local changes as commits in a branch. In this way we can easily identify these local changes and add new one.

When we commit we should be able to specify if the commit is just a local change to fix something for a test or a change to be published. I think git has the tools to do it with cherry pick, but maybe there is a simpler way.

asuffield · on Dec 30, 2013

Ah, you've made one conceptual error here that explains everything, and it's hiding right at the end: "tested as committed" implies that committing is some big special thing you do at the end after all your work is finished. You probably picked this idea up from cvs.

This is not how git is intended to be used and you'll hit a lot of friction if you try to work this way. You should be committing before testing - commit early, commit often, commit everything. git has the idea of easy, painless, zero-risk commits built in from the ground up. You never inconvenience or screw yourself over by committing. You can always edit a commit.

The workflow you are expected to use with git is: code, commit, code, commit, edit commits into a presentable form, test, edit commits more to fix issues, push, ask people to review and pull.

You can fix your conceptual model by replacing every instance of your notion of "commit" with "push". Pushing is the operation that you are expecting commit to be.

(The staging area is the editor for your commits)

haberman · on Dec 30, 2013

This doesn't really have anything to do with my point. Even if you later edit commits before pushing them, it is still the case that any partially-staged commit that you do ultimately push will reflect a filesystem state that never existed in your working directory (unless you manually checkout the intermediate commit later as a "detached head" and test that).

Maybe some people don't care that every public commit can actually build and run tests, but any broken intermediate tree will break git-bisect or similar tools.

structural · on Dec 30, 2013

As a matter of fact, manually checking out all the intermediate commits and building/testing those is precisely what our team does.

Except, it's not done manually, we have a set of tools that walks along a branch and tries to build each revision, storing the build/test status using the commit's SHA-1 hash as a key so that we don't waste effort rebuilding things unnecessarily. Then, this is used for our code reviews to verify that each commit on the branch to be integrated has a clean build/test status saved -- or an explanation why not.

We've given some thought into writing a version of git bisect that takes this cached data as input to select better trees to try given some set of known broken commits, but that hasn't happened quite yet.

wiremine · on Dec 30, 2013

In theory that would seem like a problem, but I've never run into that problem in practice. I think this is for two reasons:

1. It actually makes for _cleaner_ commits, because you can group changes in whatever order/grouping you see fit. So, if you need to group part of file A, all of file C, and part of file D into a commit, you can.

2. Committing changes and publishing those changes are two separate actions. So, you are free to group your work in whatever way you think makes sense, and then publish all those changes collectively. So, if you have 16 commits, you can push all 16 of those commits together. This reduces the odds a fellow committer gets a partial changeset.

Of course, git is a powerful tool, and it will happily let you commit and push a partially finished changeset. Great power, great responsibility, yada yada yada. See http://git-scm.com/book/ch6-4.html.

haberman · on Dec 30, 2013

> So, if you have 16 commits, you can push all 16 of those commits together. This reduces the odds a fellow committer gets a partial changeset.

Your argument here is that it doesn't matter if intermediate trees are broken, because no one will actually use the intermediate trees. But this approach will break anything that does use every intermediate tree, like an automated testing suite or git-bisect.

EDIT: this is a bad downvote. I have accurately paraphrased the parent's point, and made a valid counterpoint. If you're downvoting me, you probably didn't understand what I am saying.

Crito · on Dec 30, 2013

The idea is that you do not give intermediate broken commits to things that you do not want to have broken commits. That is why "rewriting history" is so great. Commit early and often, then publish only working code. Have your cake, and eat it too.

haberman · on Dec 30, 2013

> The idea is that you do not give intermediate broken commits to things that you do not want to have broken commits.

But how do you know if the intermediate commits (aka the ones you want to actually publish) are broken or not? If you partially staged them, you don't know because the tree you committed never existed in your working directory. The only way to know at that point is to checkout every intermediate state later (aka after the commit) and test it then.

Crito · on Dec 30, 2013

You test "intermediate" commits the same way you test any other sort of commit. They are just commits, not something special.

You don't seem to be comprehending the divide between committing something, and actually giving that something to the world. In git it is perfectly natural to test after a commit. If it doesn't work, you just edit the commit. Of course testing after pushing should be treated delicately...

haberman · on Dec 30, 2013

> You test "intermediate" commits the same way you test any other sort of commit. They are just commits, not something special.

An "intermediate" staged commit is special, as I have tried to explain many times now. If you stage a commit by using git-add to include only some of the changes in your working directory, you are committing a filesystem state that never existed in your working directory prior to committing it.

You can disagree that this matters, but you cannot disagree with the above statement. It is simply a fact.

Let's walk through a scenario. Suppose I am at the point in development where I actually want to create the commits that I will publicly push (ie. this is not a "commit early, commit often" scenario). All of my final changes are in my working directory, but for cleanliness I want to break them up into several commits by staging them.

   $ git status
   <git prints a bunch of "Changes not staged for commit">
   $ git add foo.c bar.c
   $ git status
   <git prints foo.c and bar.c as "to be committed", baz.c
   is still "not staged for commit">
   $ git commit

At this point I have committed my changed foo.c and bar.c, but the commit does not reflect my changes to baz.c. That means that I have not actually tested that the tree state at this commit works. If I run my tests right now, they will not reflect whether the commit is broken or not, because my working tree still includes my uncommitted and unstaged changes to baz.c.

I do have a couple of options now. I can stash my changes to baz.c and run the tests; then "stash apply" them if the tests pass. This is probably the best option for ensuring that my staged commit is not broken. I can also commit my changes to baz.c, then checkout HEAD^ and run the tests, but this solution forces me to mentally keep track of which commits I have tested; if there are many commits in the series, this is burdensome.

Both of these solutions are entirely possible; I'm not denying this. The "stash" solution is probably the best option for this. But that said, it is fundamentally true that when you (partially) stage a commit, you can't test your commit prior to committing it. So when you write your commit message and mentally perform the process of deciding that you like this commit enough for public consumption, you don't actually know if your tree is broken or not. Even though there are workarounds, I still think this is a bit clumsy.

Crito · on Dec 30, 2013

The point is that any test system that allows you to test commits will also allow you to test intermediate commits, since they literally are the same thing. A commit is a commit, it doesn't know if it is pointed to by a ref only, or if other commits point to it.

Making sure that you are testing with clean checkouts is important whether or not you are testing an "intermediate" commit. Hell, if you were using SVN, you could have ignored files that alter the behavior of the tests. Being wary of that sort of thing is important no matter what system you use, it isn't some special property of "intermediate" git commits.

haberman · on Dec 31, 2013

> The point is that any test system that allows you to test commits will also allow you to test intermediate commits

Sure, but such a "test system" is not a part of Git. For people who are using Git simply, with a repo on GitHub or similar and no custom infrastructure, the only "test system" they have is running their own tests manually before they push.

> Making sure that you are testing with clean checkouts is important whether or not you are testing an "intermediate" commit.

Yes, but the functionality of partial staging is fundamentally opposed to this, because it explicitly creates a commit that did not come from a "clean checkout." However I am giving up on the hope that you (or anonymous downvoters) will acknowledge or accept this simple point.

I'm a Git fan actually, but it's tiresome to debate with people who can't see both the plusses and minuses of their tools.

anon4 · on Dec 31, 2013

But you can do broken commits just as easily with svn - I've seen people commit all changed files, forgetting to add a new file and making an unworking commit. You can as well commit any files you want and make as much of a mess as with git. Except with git you can test your commit before it's pushed to the server.

haberman · on Dec 31, 2013

> But you can do broken commits just as easily with svn - I've seen people commit all changed files, forgetting to add a new file and making an unworking commit.

If you do that, you've made a mistake. No tool can save you from all mistakes (though they can help you out with warning messages and such).

Partially-staging a commit in Git is doing the same thing on purpose. It's functionality that explicitly helps you make this mistake.

etherealG · on Dec 30, 2013

if you don't trust yourself to make that judgement, try this workflow:

stage, commit, stash

now your working copy does match the state you committed, and you can test away to your hearts content. if you find a problem, fix it, and commit --amend. keep this cycle going till you're done.

stash pop, carry on working

almost every time i see a criticism of git it's about the way you use it not the tool itself.

caster_cp · on Dec 30, 2013

>almost every time i see a criticism of git it's about the way you use it not the tool itself.

In my point of view, you cannot separate "the way you use the tool" with the "tool itself". A hammmer is easy and straightforward to use because of its design features. That is what makes the hammer useful. Granted that there are complicated tools, that require a steep learning curve (a pipe organ, for instance), but that should not be the case of git.

The whole point is that the tool should make it easy for you to do what you intend to do. That's the whole point of criticizing an application (or tool, as you put it). I am pretty sure that you can make beautiful and exact things with Git, but the fact is that sometimes they are difficult to perform or counter-intuitive, and that's the crux of the criticism.

A tool should not be designed only to "allow" people to do certain things. It should also make these things easy and straight forward.

It's impressive how (most of the times) our usage of a tool is directly linked to how it was designed. Therefore, design features (like the ones proposed in the article) cannot be distinguished from the "core" of the tool, or the functionalities it allows one to perform. The design, in some sort of way, is the tool. And that's what conditions our usage of it.

revscat · on Dec 30, 2013

> The whole point is that the tool should make it easy for you to do what you intend to do. That's the whole point of criticizing an application (or tool, as you put it). I am pretty sure that you can make beautiful and exact things with Git, but the fact is that sometimes they are difficult to perform or counter-intuitive, and that's the crux of the criticism.

I'm not sure I agree. There are tools that are inherently difficult, because the problem they attempt to help with are inherently complex problems: architecture, MRIs, corporate taxation, managing pilot and crew schedules for airlines, etc.

Managing source code for any system of sufficient complexity falls squarely into this domain. Git tackles this -- nicely, I would argue. Among other needs, VCS's need to separate code changes into manageable chunks, store them in a compact manner, and be able to distribute those changes efficiently over a network.

Git handles these quite nicely. Separately, you would like developers to have the ability to commit changes in small, related chunks, all while simultaneously preventing conflicts -- or at least making them difficult. Git does this as well.

> A tool should not be designed only to "allow" people to do certain things. It should also make these things easy and straight forward.

Again, I'm not sure I agree with this premise for all cases. Tools should be as complicated as they need to be, and no more. The basic workflow behind git -- add, commit, pull/push -- is not overly complicated, and I must be honest in admitting that it puzzles me when it is otherwise claimed. Is it easy? Apparently not for some. My personal path was CVS, SVN, MKS, Perforce, then git, and it did not take me long to understand the benefits of git over the others I had used.

It was pretty straightforward. Different, but hardly intractable, especially for a tool which is so singularly important to me as a developer. In that case I do not mind complexity, given the flexibility that is gained and, frankly, since it's what I do for a living.

scpotter · on Dec 30, 2013

>In my point of view, you cannot separate "the way you use the tool" with the "tool itself".

Love this. The question then becomes do you change the too, the way you use it, or some of each.

>A tool should not be designed only to "allow" people to do certain things.

You lost me here, because that's the definition of design. There are many hammer designs, some more general purpose like the claw hammer and many others (tack, 2lb) with a more limited intended purpose. Git is designed for very large scale teams, and staging seems to be integral to that. The paper asserts the complexity of staging is unneeded, but in my view doesn't adequately demonstrate that to be true for large teams. The paper goes on to describe an alternate, simpler product which better meets the needs of simpler users using and alternate, incompatible conceptual design. This is fine, but I'd be more impressed if it started by fully demonstrating a concept is not needed for the intended users before removing that concept and then described a pathway from a simpler concept to the fully complex conceptual model.

Flimm · on Dec 30, 2013

I would, except git stash doesn't do what you expect if you have staged and unstaged changes in the same file.

See this question for more details: http://stackoverflow.com/questions/13780700/how-can-i-run-gi...

bonobo · on Dec 30, 2013

I don't think this problem applies though. You're stashing right after a commit, so your stage is empty at this point.

etherealG · on Dec 30, 2013

a fair point, i'd argue that it's a bug in stash's behaviour that disallows this. personally i've had experience with stash working ok in this scenario often in the field, so perhaps only certain split files cause the issue. most likely when the divergence of the stash and the commit are in very close proximity in the file.

worth noting here is that stash behaviour has improved a lot in more recent versions, e.g. stash pop can merge with your working tree now, whereas a while back it would just fail to apply and you'd be stuck having to do a patch and apply instead.

i think the workflow i outlined is intended to work correctly and eventually will for all cases rather than just the majority. hopefully that doesn't scare people away.

out of interest do you know of any other way to do a similar kind of thing with any other tool? i find having to deal with these edge cases still way better than the alternative of having no staging area and only being able to work on 1 thing at a time personally.

stinos · on Dec 30, 2013

and thus was very likely not tested as committed

that is an argument that can be resolved - hereby yielding staging a nice way of creating clean commits. The developper should simply build/unittest at least the project(s) directly involved in the commit. Apart from that, the CI system should take care of the rest. es if you consider git as a standalone tool your argument makes sense, but how often is git used like that, and not as a part of a complete build/test system?

stiff · on Dec 30, 2013

I am talking about one of two typical situations: either you want to commit all your changes to a single branch, but in more than one commit, or, you want to distribute your working directory changes between two different branches. In the first case the problem does not occur, in the second one you can run the tests on both branches after doing the commits, and before pushing rebase if necessary, or you can use git stash --keep-index, and good unit test coverage and CI infrastructure, a good idea regardless, should also help to stay out of trouble. If you keep in your working directory serious changes that are not meant to be comitted at all but used during tests, then Git can't be blamed for that I guess.

wonderzombie · on Dec 30, 2013

Why? You can stage the changes you want to commit, and stash the unstaged changes. Run tests, &c. Commit locally but do not push. If you test more and it's still broken, you can iterate on that with git commit --amend. The rest of your change will happily live on in the stash.

As others have pointed out, nobody sees your change because it's still only local. That means you could make 10 tiny commits as you work. Then, later on, rebase and squelch a series of smaller commits into a bigger one, and then push, send a pull request, whatever.

haberman · on Dec 30, 2013

> You can stage the changes you want to commit, and stash the unstaged changes.

Yes, I have acknowledged that this is a solution to the problem (https://news.ycombinator.com/item?id=6986000). It seems a bit roundabout and clumsy to me though, in the sense that it requires vigilance to do the right thing. Things that require vigilance are easy to forget or do incorrectly.

I haven't looked at Gitless yet, but I hope that it makes this a bit more streamlined.

> squelch a series of smaller commits into a bigger one

I think you mean "squash."

verteu · on Dec 30, 2013

New users find Git difficult because it has extensive hidden state. The effect of "git commit/diff/reset" is completely dependent on the invisible state of the stage/branch/history DAG.

A competent user always knows what "git status" will output. But novices don't even understand which hidden state they must keep track of.

wging · on Dec 30, 2013

One thing I've found helpful is adding git status information to my command prompt. (For this, I use oh-my-zsh.) I always know what branch I'm on, and whether it's dirty or not.

I still need to 'git status' (which I've aliased to 'gs') sometimes but more frequently I get what I need without it.

pedalpete · on Dec 30, 2013

By 'hidden state' do you mean that Git doesn't easily show the user where they are in the process? Is something like sourcetree a good solution?

verteu · on Dec 30, 2013

Exactly -- I've found SourceTree and the Visual Git Reference (http://marklodato.github.io/visual-git-guide/index-en.html) very useful to teach beginners.

midas007 · on Dec 30, 2013

It's an education gap. A step-by-step site with shell and corresponding diagrams of high-level behavior might help.

Mithaldu · on Dec 30, 2013

In short: This guy wrote a wrapper around git [1] that practically makes it SVN, then wrote a paper about how students don't like the staging area and actually got published. [2]

The real interesting story here is how peer review managed to let something slip through that ignores the main target demographic for git, professionally working and experienced developers; and also ignores such extremely simple hypotheses such as "some people think Git is hard because most learning materials about Git are bad at communicating".

[1] http://people.csail.mit.edu/sperezde/gitless [2] http://dl.acm.org/citation.cfm?doid=2509578.2509584

shepik · on Dec 30, 2013

The paper assumes that git is hard because there are lots of concepts, like "Tracked file", "Ignored file", "File staged for removal", "Untracked file" etc. They also think to remove index as a way to reduce complexity.

But those are not the reasons why git is hard.

pedalpete · on Dec 30, 2013

can you elaborate on why you think git is hard?

stavros · on Dec 30, 2013

In my opinion, it's hard because git isn't a VCS, it's a loosely-related bunch of commands to manipulate the on-disk format of a VCS.

Its commands are, conceptually, tools to perform operations on the repo, rather than tools to version your code.

blktiger · on Dec 30, 2013

I think the main reason git is hard is many of the most common commands do too many things. For example, the reset command "reset the current HEAD to a specified state" is too general. I don't think most people consider HEAD as both a working directory and a pointer to the current commit in the repository, instead they just think of it as a pointer to the current commit you are working off of. So they start by just memorizing a couple of commands that do what they need to.

reset -- file #unstage a file reset --hard #reset everything in the working directory

Later they might learn to use reset to undo the last commit.

reset --hard HEAD~1

But they probably still don't really understand what the reset command does. I consider it to be an overloaded command. There should be a command to unstage files, a command to move the HEAD of the current branch (which can keep the --hard and --soft options), and a command to just clear everything from the stage/working directory to start over.

Other commands are similarly overloaded which makes it hard for users to understand unless they learn the internals of git.

0xbadcafebee · on Dec 30, 2013

Git is basically 1000 different programs embedded in both an application and network programming framework.

Each program is also effectively developed independently such that one does not operate the way another does. So there are several learning curves: the one to understand the framework, and the one to understand each program, and then the one to understand how each program inter-operates with others.

Some people "pick it up" without difficulty, but it's because those people don't have to swallow the entire operational knowledge at once. To really grok how anything works you have to learn it from the ground up, instead of task by task. Git is the most convoluted open source project I have ever encountered (from a user's perspective).

shepik · on Dec 30, 2013

imo git is hard, because it is hard to think of history as a tree. also, problems we solve with git are harder than what we used to solve with subversion or cvs. i mean, most of the git problems i or my colleagues ever encountered happened because of a rebase, or rebase + merge, or merge + rebase, or similar combinations, when we tried to rewrite history to make it clean and readable, and we never even think about "clean history" back then.

You can grasp in about an hour concepts like "index", "staging area" and other that the article mentions.

Someone · on Dec 30, 2013

I don't think it is hard to think of history as a tree. SCM tools have been doing that for decades (with fewer branches)

Also, git history is more like a DAG.

Also, about that 'grasping in about an hour': way too often, reading some more confuses the hell out of you again. Witness:

You can grasp in about an hour concepts like "index", "staging area" and other that the article mentions.

So, is there a difference between index and staging area? Google "git index vs staging area" gives me top hit http://stackoverflow.com/questions/12138207/is-the-git-stagi..., which does not help me.

Second hit is http://stackoverflow.com/questions/4084921/what-does-the-git.... Again, far from a clear answer.

Further googling/clicking gets me to http://stackoverflow.com/questions/6716355/why-staging-direc....

And no, the git-scm book at http://git-scm.com/documentation did not help _me_ either. It seems to have banned the use of 'index' as the (almost? More or less?) synonym for staging area.

I think a large part of complaints about git are caused by its confusing user interface and confusing terminology. Yes, terminology may have been cleaned up officially, but 'the Internet' is littered with remnants of its history.

Apart from that, one thing that I find confusing about the staging area is that it is invisible. Consequently, there doesn't seem to be a way to build what would get committed on as 'git commit' (do a 'git add X', then edit X. 'git commit' will commit the old content of X, but 'make' will use the edited content of X. Or am I confused again?)

_ikke_ · on Dec 30, 2013

index and staging area refer to the same thing. The staging area is a high-level concept, while index is more of an implementation detail (exists in .git/index).

Skinney · on Dec 30, 2013

The rule of thumb is to only use rebase for local commits (commits that haven't been pushed). This causes pain if people have commited, or will commit, work on a branch that has been rebased remotely.

I've never really thought of Git history as a tree. I just think of it as a linked list of commits, and that I have more than one list available (branches).

yxhuvud · on Dec 30, 2013

What do you have if you have several lists with a common ancestor? You have a tree!

Skinney · on Dec 30, 2013

I wasn't disagreeing, I simply said I never looked at git's history in that way. While I certainly do have a tree, I don't necessarily perceive it as such, especially when using "git log --graph --format=oneline --first-parent"

banachtarski · on Dec 30, 2013

Really? Is git really that hard? Please. The man pages are super readable and explain pretty much everything. Nobody's complaining about awk, sed, find, and the like. Git exhibits functionality on par with those tools and the user should expect a similar degree of complexity.

Argorak · on Dec 30, 2013

There is an argument to be made that many people dislike sed, awk and find because they are hard to use. Which is fine, because there are other alternatives, thats why the are not complaining.

git is a must-use for any software project that uses it. Removing (unnecessary) complexity from it would be a benefit to everyone in the team.

The question is what unnecessary is, I find the staging area one of the best features in git, especially as it allows reviewing before commit.

dscrd · on Dec 30, 2013

It's the little, seemingly unnecessary complications, like http://bitflop.com/document/111

If there was awk, sed, find which did the same thing but were significantly easier to use, I would complain about them. At least if I was forced to use them.

banachtarski · on Dec 30, 2013

My point is that you'd be hard pressed to make awk, sed, and find simpler without removing functionality. In the same way, I think git is really compact considering its feature set. The issues like the one you point to in the article occur once in a blue moon.

YuriNiyazov · on Dec 30, 2013

Man, I was obscenely happy when I discovered "git stash", and these jokers want to get rid of it. Not interested.

wrs · on Dec 30, 2013

The word "merge" occurs only once in this paper, in the overview. That is an indicator of the superficiality of the analysis overall.

Yes, if your needs are so simple that you never perform a merge, Git is too complicated. If you work on a large team with a lot of parallel efforts going on, you know why all the Git functions are there, including the staging area.

I applaud the idea of a "single-user" or "training wheels" Git that has a simplified model like this, but claiming you're "analyzing Git" when you limit the domain of your analysis so severely is rather misleading.

EDIT: And also, I don't think the approach of working backwards from "here's how Git fails to support what the users thought Git was supposed to be doing" rather than forwards from "here's how Git fails to communicate to users what it's actually doing" is the most productive way to do this.

__david__ · on Dec 30, 2013

The authors seem confused by the purpose of the "Assume Unchanged" feature. According to the initial commit [1] it seems intended to be used as a speed optimization for crappy filesystems, and not as some way to avoid committing files with changes.

They also say:

> Of course, the user might make the set of files explicit on every single commit (leaving out the database configuration file), but this is laborious and error-prone.

I find this amusing given that I don't ever use "git add -u" or "git add -A" in my daily git life—"git add -p" is as close as I get (and commit-patch [2] is nice, too).

[1] https://github.com/git/git/commit/5f73076c1a9b4b8dc94f77eac9...

[2] http://porkrind.org/commit-patch/

midas007 · on Dec 30, 2013

Git absolutely fails at the claim of being good for remote developers on slow links. If the network drops during such an op, feel free to enjoy starting from scratch.

babas · on Dec 30, 2013

What "op" exactly? git push/pull? git push/pull are bandwidth efficient and atomic. A failed pull/push wont destroy your local/remote copy.

Argorak · on Dec 30, 2013

It tends to drop your connection on large packs.

Thats more a server configuration thing, many git hosts disconnect slow clients.

I was in Botswana for a few weeks in November and I can relate to that sentiment: git was unusable down there.

midas007 · on Dec 30, 2013

Yuck, BTDTBTTS. Satellite latency is eye-stabbing, almost as bad as the utterly worthless Mountain View's Google Wi-Fi. Local internet elsewhere, in random countries, GFL... bring your own or start a service (yeah, a friend made some serious cash putting up a service on some Greek island).

Anyone on OSX can feel some pain just by enabling Network Link Conditioner prefpane by creating a profile as follows:

  Download Bandwidth: 256 Kbps
  Downlink Packets Dropped: 90%
  Downlink Delay: 1000 ms

  Uplink Bandwidth: 256 Kbps
  Uplink Packets Dropped: 90%
  Uplink Delay: 1000 ms

  DNS Delay: 2000 ms

Setup instructions:

http://mattgemmell.com/network-link-conditioner-in-lion/

midas007 · on Dec 30, 2013

The issue is never being able to complete the fetch or push because of a spotty network.

It goes without saying that source control should not screw up my local or remote copies.

Shish2k · on Dec 30, 2013

Given that 90% of operations don't touch the network at all, I'm not sure how that counts as absolute fail. With the exception of doing an initial clone on a connection that's so slow and unreliable that "clone --depth 1" doesn't work, I can't think how git would stop a remote developer from working, only how it would help them...

joaomsa · on Dec 31, 2013

I tend to get around large network operations on slow connections by rsyncing the repository from the remote mirror, doing a git clone of the now local mirror with no hard links to a local version, and finally setting the remote origin of the local version to the original remote mirror.

I agree that these sorts of mental acrobatics would be unnecessary if git had proper resume support, but for me at least it's not a total showstopper.

dschiptsov · on Dec 30, 2013

One could almost smell MIT - emphasis on a proper methodology conceptual integrity and links to Gabriel's "Worse is better".) The guys made my day.)

NumberSix · on Dec 30, 2013

Git has many problems.

(1) Many git commands have many variants that do different, sometimes very different things.

(2) There are many different ways, including variants of many different base commands, to do very similar, but not identical things. These are not aliases for the same action, but rather commands with similar, overlapping, but still different effects. For example, should I do a "git reset", "git revert", "git checkout", or more complicated acrobatics with branches, merges, rebasing etc. just to discard some work that I don't want?

(3) One of the many adverse consequences of this is that if you Google how to do many things in Git, you will often find several conflicting answers. Unless you already know a lot about Git, in which case you don't need to Google how to do something in Git, this is very unhelpful.

(4) Git uses cryptic 40 character hexadecimal SHA-1 codes to identify commits rather than a sequential numbering system. This means for example that one cannot tell automatically from two SHA-1 codes which commit or file came before or after the other.

(5) Git's branching scheme makes it difficult to set up a traditional test/development/production system where a developer can easily checkout the production code for the system except for their sub-system AND the development version of one or more other sub-systems.

(6) Git's branching system tends to result in work being scattered across dozens of private branches belonging to different developers or teams making integration difficult.

(7) Git user interfaces fail to hide the low level, complicated command line interface from users. Something usually goes wrong that requires reverting to the command line to sort out what happened and fix it.

(8) It is difficult to attach human readable names such as "release candidate 1" to git commits. There are "tags" but by default they are not pushed to remote repositories. It is possible to set up repositories to block a push with human readable tags; the remote git user lacks permission to push a commit with the tags.

(9) Git's extreme complexity means it is used in very different ways at different companies and organizations. Different companies and organizations often add further systems on as wrappers around Git, e.g. the Gerrit code review system. For example, some Git users essentially never use rebase while others have a process that makes heavy use of rebase.

(10) Git enthusiasts usually respond to criticisms such as this by proceeding to explain or attempt to explain some complicated set of acrobatics in Git, often involving several cryptic variants of several commands, that may solve the problem but is impractical to remember and reuse or document.

(11) Git uses many different names for the same concept or component of Git such as index and staging area. This is confusing even after several months of using Git.

Crito · on Dec 30, 2013

1) True. This is a mild annoyance. Particularly I think that the -b flag of checkout should be removed and instead checkout functionality should be added to git-branch.

2) I don't see the problem. reset, revert, and commit each do different things...

3) This hasn't been my experience. When in doubt, go with the top SO answer I guess?

4) Absolutely not a problem; this is a feature. Sequential numbering is not a powerful enough concept to fit what is possible in git. Just for starters it begins to break down when you realize that relativity of simultaneity kicks in with DVC.

5) What? How? I do that all the time...

6) If your team is refusing to work with each other, that is an organizational failing. You've got some problems there that are not related to git.

7) Wait, what UI are you normally using?

8) I suppose I can see this would be annoying if you are frequently pushing very large numbers of tags. Though the second complaint does not make sense to me; why would you configure your repo to reject pushed tags if you want to push tags? If you want to do that, then don't do that.

9) "That's a feature."™

10) Zero cryptic commands above. I aim to please.

11) Eh, such is language. I wouldn't be opposed to tightening up the docs, but I don't believe anybody really has serious issues with this.

doubleshotmocha · on Dec 30, 2013

There's a lot of things I felt like I've said before when reading this

molalala · on Dec 30, 2013

love it