Linus on keeping a clean git history (2009)

lukev · on Oct 4, 2012

This highlights the only thing I don't like about Git. It's an immensely capable tool, but it gives no guidance regarding the right way to do things.

Our own teams have a set of practices which are similar to, but different from, what Linus outlines here. And different projects at my company use different practices from those.

The worst thing is that there's no way of enforcing these workflows or practices other than out-of-band social conventions. And so minor mistakes happen, all the time. Our Git projects are never as pretty as they should be.

In other words, Git provides an awesome set of primitives for source control. I'm not sure what it'd look like, but I'd like to see a product that built on those primitives to enforce a little more order on projects.

exDM69 · on Oct 4, 2012

> It's an immensely capable tool, but it gives no guidance regarding the right way to do things.

Maybe there isn't a "right way". A workflow that suits a simple desktop application is different from one used for a kernel or another product that has dozens of targets to worry about. Similarly, a web app that gets deployed in a controlled environment will most likely need a different way of working than an end-user application that goes into an app store to be downloaded and run on a variety of devices.

> Our own teams have a set of practices which are similar to, but different from, what Linus outlines here. And different projects at my company use different practices from those.

The culture around your product is probably very different from the kernel devs' culture so it makes sense for you to have a different model.

> The worst thing is that there's no way of enforcing these workflows or practices other than out-of-band social conventions. And so minor mistakes happen, all the time. Our Git projects are never as pretty as they should be.

Enforcing certain kinds of work flow would mean not allowing something that is currently possible. Crippling one workflow to standardize on another, while there is no clear evidence that one workflow would be the best for everyone.

Everyone has their own idea of what a clean history is, whether it's linear or has --no-ff merges for every feature. The most important thing is that it is useful. To me and my team that means that every commit on master should build on every target we have (dozens!) so "git bisect" won't be painful.
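One way to act on that rule is to check every commit a branch would add to master, not just its tip. A minimal sketch in a throwaway repository: `check` is a hypothetical stand-in for a real per-target build, and the branch names and placeholder identity are illustrative.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
echo ok > status.txt && git add status.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo ok >> status.txt && git commit -q -am "step 1"
echo broken > status.txt && git commit -q -am "step 2 (does not build)"
echo ok > status.txt && git commit -q -am "step 3 (fixes step 2)"

# hypothetical check standing in for "builds on every target"
check() { ! grep -q broken status.txt; }

# visit each commit master..feature, oldest first, and run the check there
broken=0
for c in $(git rev-list --reverse master..feature); do
  git checkout -q "$c"
  if ! check; then
    echo "broken: $(git log -1 --format=%s "$c")"
    broken=$((broken + 1))
  fi
done
git checkout -q feature
```

The tip of `feature` passes, but "step 2" does not; that is exactly the kind of commit that makes bisecting painful, and folding it into "step 3" with `git rebase -i` before merging would remove it.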

dkarl · on Oct 4, 2012

>> Our own teams have a set of practices which are similar to, but different from, what Linus outlines here. And different projects at my company use different practices from those.
> The culture around your product is probably very different from the kernel devs' culture so it makes sense for you to have a different model.
>> The worst thing is that there's no way of enforcing these workflows or practices other than out-of-band social conventions. And so minor mistakes happen, all the time. Our Git projects are never as pretty as they should be.

> Enforcing certain kinds of work flow would mean not allowing something that is currently possible. Crippling one workflow to standardize on another, while there is no clear evidence that one workflow would be the best for everyone.

I agree 100%. Tools that attempt to define culture are an enormous pain and often unusable outside the context understood by their creators. Tools that help you reinforce the culture you decide on for your project are wonderful, but they are rarely as un-opinionated as they need to be.

One thing that strikes me about source control culture is that in centralized environments people are very aggressive about installing pre-commit hooks to enforce rules, but I rarely see people using hooks for git, or even including hooks in their project as a suggestion for other developers to use. I wonder why not?
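Part of the answer may be that hooks live in `.git/hooks`, which is not versioned. Git 2.9 and later (well after this thread) let a project commit its hooks to an ordinary directory and have each developer opt in once with `core.hooksPath`. A sketch, where `.githooks` is just a conventional name and the hook reuses `git diff --cached --check` to reject staged whitespace errors:

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

# the hook is an ordinary tracked file
mkdir .githooks
cat > .githooks/pre-commit <<'EOF'
#!/bin/sh
# reject the commit if the staged changes add trailing whitespace
# (or any other whitespace error that `git diff --check` reports)
exec git diff --cached --check
EOF
chmod +x .githooks/pre-commit
git add .githooks && git commit -q -m "Add shared hooks"

# each developer opts in once per clone
git config core.hooksPath .githooks

printf 'clean line\n' > a.txt && git add a.txt && git commit -q -m "Clean change"
printf 'trailing space \n' > b.txt && git add b.txt
git commit -q -m "Sloppy change" || echo "rejected by the shared pre-commit hook"
```

Opting in is still voluntary and `git commit --no-verify` skips the hook, so this supports a convention rather than enforcing it.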

phogster · on Oct 4, 2012

>The culture around your product is probably very different from the kernel devs' culture so it makes sense for you to have a different model.

I think he meant he wants the ability to enforce a certain behavior within his own group.

fr0sty · on Oct 4, 2012

The amount of control you can exercise with hooks as well as the features available in repository management systems like gitolite should be more than adequate to enforce whatever policy you may dream up.

ajross · on Oct 4, 2012

I'm not sure exactly what you think a better tool would look like. By your own admission, there are multiple "right" ways to do branch management, and all of them are supported meaningfully by git. But, more or less by definition, a tool that enforced a "right" way to do things would disallow some of these.

So... I don't understand. Do you want a tool that makes the kernel branching style illegal, or one that breaks your own team's workflow? If you want one that supports both, how is that providing clarity about the "right" way to do things?

lukev · on Oct 4, 2012

It isn't hard to imagine an SCM tool (using Git internally) that enforces a specific set of curated operations for a particular workflow, that teams could agree to use for a given project. You could have different such tools for different workflows on different projects.

You could even write a meta-tool that allows administrators to define and reify a workflow which would then be enforced for developers on a project.

ajross · on Oct 4, 2012

Isn't that what all large projects are doing internally? Hell, big chunks of the git chrome (things like "git am", "git request-pull", "git send-email") are precisely an attempt to write scripts to automate core parts of the kernel workflow. GitHub added bits of its own, like "watching" a public repository and providing a central spot for pull requests to land, with discussion and review tools. I don't understand why you're so interested in "enforcement" when projects using these tools seem to be doing just fine.

What you're saying sounds to me a lot like the stuff we heard from Java nerds in the 90s (who certainly didn't invent it; Pascal and PL/1 nuts said much of the same stuff) -- the programming environment should be designed to force the user into a particular style. Our community has, for the most part, rejected that view in favor of dynamic systems with more flexibility. Why should SCM branch management be any different?

sanderjd · on Oct 4, 2012

I think it's a balance. There is value in having people working on a given project aligned on the same basic workflow. To achieve that for a team that is currently growing or is planning to grow, you have to document what that basic workflow should look like. That "document" can be a set of social mores that are loosely enforced through complaint and argument, or an actual document somewhere, or a tool like your parent is suggesting.

Such a tool, which makes the preferred workflow very easy and excursions outside it achievable but somewhat more difficult seems like a pretty good idea to me, and not at all as stuffy and prohibitive as you seem to fear.

ajross · on Oct 4, 2012

I don't disagree at all. But in reality, those tools exist and are all around us. What would be the value of putting that stuff into git itself? Why is it a shortcoming of git that it hasn't picked one?

saraid216 · on Oct 4, 2012

You might want to look into writing hooks, perhaps? The company I work at has some simple hooks that require you to have a line stating who code reviewed your commit. Sure, it can be bypassed, but there's social pressure not to do so.
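A minimal `commit-msg` hook in the spirit of that rule. Git passes the path of the proposed message file as `$1`; the `Reviewed-by:` spelling is an assumption borrowed from the kernel's sign-off convention, and `git commit --no-verify` is the bypass mentioned above. The repository and names are throwaway placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
# $1 is the file holding the proposed commit message
if ! grep -qE '^Reviewed-by: .+' "$1"; then
  echo "commit-msg: add a 'Reviewed-by: Name <email>' line" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

touch f && git add f
git commit -q -m "Add f" || echo "rejected: no reviewer line"
# the second -m becomes its own paragraph in the message
git commit -q -m "Add f" -m "Reviewed-by: Jane Doe <jane@example.com>"
```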

I've written my own hooks to do some automated testing on my changes, too. (I've got one that checks for trailing commas in my JavaScript files, for instance.)

JonnieCache · on Oct 4, 2012

http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flo...

https://github.com/nvie/gitflow

kscaldef · on Oct 4, 2012

gerrit's permission system goes a good ways in the direction you're talking about. (Unfortunately, it's rather baroque and poorly documented.) You can specify who can submit patches, who can approve them, who can merge, whether a repository allows merges at all or requires rebasing or cherry-picking, etc.

giulianob · on Oct 4, 2012

That's why I like Mercurial a bit more. From what I've noticed, it takes a bit more work to shoot yourself in the foot. They recently added the concept of "phases", so a commit is in the draft phase until you push it to an external repo. At that point its phase changes to public, and Mercurial won't let you rebase it without forcing. You can also mark a branch as private so it won't accidentally get pushed out, which is useful if you are doing some local prototyping.

stephen · on Oct 4, 2012

You can enforce workflow via server-side hooks if you get a bit creative, e.g.:

https://github.com/stephenh/git-central/blob/master/server/u...

Unfortunately I haven't done a lot with this project in a few years, since GitHub doesn't allow custom (e.g. bash) server-side hooks; you'd have to run your own git server.

(Edit to add...)

So, I understand your impression that it's impossible to enforce workflow in git, given GitHub doesn't support it, and most users probably don't want to write complex server-side hook scripts.

But it is actually possible.

It'd be nice if communities like git-flow/etc. codified their rules into server-side hooks that you could install, and maybe GitHub could even vet them (e.g. that the bash scripts won't nuke their servers) and provide them as out-of-the-box/opt-in options in the admin section of their repos. E.g. "Enforce git-flow in my repo".

fghh45sdfhr3 · on Oct 4, 2012

> It's an immensely capable tool, but it gives no guidance regarding the right way to do things.

There is no right way. Think about styling. Is there a right style? No. It is silly to argue over your code's appearance. HOWEVER! As soon as you start collaborating with people and reviewing code, a uniform style is a very nice thing to have.

Teamwork creates the need for shared conventions. And that's where your ability to convince your team members of the value of some standardizations comes into play.

> different projects at my company use different practices...

It sounds like your problem is not Git, but lack of organization. I am not sure a more restrictive scm would fix that. You need to find a good way to use Git, and then sell everyone on the benefits of process uniformity.

bcoates · on Oct 5, 2012

If there's no right style, then couldn't the SCM just pick one arbitrarily so I don't have to worry about spending time communicating about something irrelevant to getting the product shipped?

mcgwiz · on Oct 4, 2012

> The worst thing is that there's no way of enforcing these workflows or practices other than out-of-band social conventions.

False. It's called the Dictator and Lieutenants workflow. It's costly, but if you're in a position where you don't trust your own developers or your conventions are severe, then it's a price you have to pay.

If you can't afford it, hire trustworthy developers or dial back your conventions.

LnxPrgr3 · on Oct 4, 2012

On a repo I've been maintaining, I'm horribly tempted to revoke almost everyone's commit access, for two reasons: to add a code review step to the process, and to be able to keep the commit history reasonably clean.

It's low-tech, but a human gatekeeper's really your only hope for enforcing whatever conventions your project has.

exDM69 · on Oct 4, 2012

It makes a lot of sense to require that a commit must go through at least 1-2 human reviewers before getting merged to master. In addition to going through automated builds and tests, if applicable.

You need more than one person who can commit to master and is responsible for the merges, but you most certainly don't need every contributor to have commit access.

mcgwiz · on Oct 4, 2012

No it's not your only hope. Your other hopes are:

- hiring developers that appreciate and obey the conventions, or

- reducing the weight of the conventions.

Simply put, if you have conventions that the developers aren't following, your organization is dysfunctional in some way. Management should include the team when crafting the conventions, and management should take efforts to give the team time/resources to obey them.

qznc · on Oct 4, 2012

Well, git should be excellent in this case, because it is essentially the same workflow as Linus's. Everybody just pulls from each other's repos.

mattdeboard · on Oct 4, 2012

Based on what I've seen from popular open-source Python projects I've used in the past (Fabric, Haystack and basically anything else by Daniel Lindsley), having a single human gatekeeper is the express lane to hell. If you do that, make sure you have at least a few core contributors who can approve commits.

qznc · on Oct 4, 2012

Linus is the single gatekeeper of Linux.

EternalLight · on Oct 4, 2012

No, he isn't. He's the final gatekeeper, but there are several lower-level gatekeepers/maintainers for different parts.

At least according to rumors; I've never invested the time to get involved myself.

dfc · on Oct 4, 2012

There are an awful lot of maintainers for smaller individual pieces/subsystems of the kernel. Take a look at the MAINTAINERS file to see who is responsible for the smaller chunks:

  grep -B1 '^M:' /usr/src/linux-kernel/MAINTAINERS

jrochkind1 · on Oct 5, 2012

I've never even looked at any linux source code, but I still know there's too much of it to believe Linus reviews and signs off on every single commit personally.

misiti3780 · on Oct 4, 2012

http://nvie.com/posts/a-successful-git-branching-model/

guelo · on Oct 5, 2012

And the accompanying tools for it: https://github.com/nvie/gitflow

wickedchicken · on Oct 4, 2012

This is great for people who are that organized. I'm not, so I like the 'just merge everything into master' mentality. See http://scottchacon.com/2011/08/31/github-flow.html

DigitalJack · on Oct 4, 2012

My main issue with the described github-flow is that they push development branches to the server, and encourage that to be done very often.

And my issue with that is once you push something, it's off-limits to any kind of archaeology in the history. And that's not a "principle" thing. If you push your branch, do some rebasing and push again, you are in a world of hurt.

The operation will very likely fail, and recovery is a serious pain in the butt.

If you don't ever do any sort of archaeology, then that's great and it will work for you. I have had numerous occasions where I've tried some git merge or something and screwed things up. I've fixed it by putting my Indiana Jones hat on and digging in.

Being able to tamper with the history has gotten me out of trouble many times. The only time it has gotten me in to trouble is rewriting history that has been published.
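When a pushed branch does get rewritten, `git push --force-with-lease` is a safer spelling of the re-push than `--force`: it refuses to overwrite the remote branch if it no longer matches your last-fetched copy of it. A sketch with a local bare repository standing in for the server; the developer names and branch are hypothetical.

```shell
#!/bin/sh
set -eu
top=$(mktemp -d) && cd "$top"
git init -q --bare server.git          # stands in for the shared server
for who in alice bob; do
  git clone -q server.git "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

cd "$top/alice"
git checkout -q -b dev
git commit -q --allow-empty -m "wip" && git push -q origin dev

cd "$top/bob"
git fetch -q && git checkout -q -b dev origin/dev
git commit -q --allow-empty -m "bob's follow-up" && git push -q origin dev

# alice rewrites her pushed commit without fetching first
cd "$top/alice"
git commit -q --amend --allow-empty -m "wip, reworded"
git push -q --force-with-lease origin dev ||
  echo "refused: origin/dev has moved since alice last fetched"
```

A plain `git push --force` at the same point would have succeeded and silently dropped Bob's commit. The lease only protects you if you haven't fetched in the meantime, so it is a safety net, not a substitute for agreeing which branches may be rewritten.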

jrochkind1 · on Oct 5, 2012

And yet if you _don't_ push it to the server, then nobody else can see it. And don't you want other people to see development branches, to give feedback and even to collaborate on writing them?

In practice, what everyone does is they DO rewrite history on those pushed dev branches, and they TRY to avoid the world of hurt by some convention for keeping track of what branches are 'development branches', and knowing that their history can change, and thus not _pulling_ from these branches into anything except a branch that does nothing but track the dev branch. And then using 'rebase' in just the right way on your local copy of that dev branch, when you need to. And then winding up in that world of hurt when something goes wrong.

Contrary to all the git apologists in this thread, I think it is one of the biggest usability problems with git. I'm not familiar enough with the other DVCSs to know if they manage to do this better. I do know that, for all that, branching/merging is still a hell of a lot better than it was with svn.

What I myself tend to do is avoid ever rewriting history, sacrificing 'cleanness' for reliability and safety. Except when I'm working on a dev branch for an open source project where they insist upon it, and then I worry, and mess up a lot, and spend lots of time recovering from my mistakes.

misiti3780 · on Oct 4, 2012

which workflow are you referring to?

qznc · on Oct 4, 2012

Just read the two followup posts. It is not that clear within Linux as well.

vpeters25 · on Oct 4, 2012

"there's no way of enforcing these workflows or practices other than out-of-band social conventions"

I think this is exactly what Linus intended when he designed Git. He explained in a Google talk the way he controls what is committed to the kernel is by just pulling from people he trusts.

If you try to use git as a centralized version control system you lose control of what gets pushed regardless of how many rules and workflows you setup. Have devs send pull requests instead and don't accept/merge bad commits.

karategeek6 · on Oct 5, 2012

While I personally don't subscribe to the "one true way" philosophy, if you do (absolutely nothing wrong with that), you might be better off with mercurial than git.

dfc · on Oct 4, 2012

It would be handy if there were an option to git-rebase that would print a warning if you were about to rebase a commit by someone other than $(git config user.email)

saraid216 · on Oct 4, 2012

I'd suggest writing a git hook on pre-rebase.
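A minimal `pre-rebase` hook along those lines. Git calls it with the upstream as `$1` and, optionally, the branch being rebased as `$2`; a non-zero exit cancels the rebase. Refusing rather than merely warning is a design choice made here, and `git rebase --no-verify` overrides it. Repository, authors, and branch names are throwaway placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

cat > .git/hooks/pre-rebase <<'EOF'
#!/bin/sh
# $1 = upstream, $2 = branch being rebased (unset for the current branch)
me=$(git config user.email)
others=$(git log --format=%ae "$1..${2:-HEAD}" | grep -vxF "$me" | sort -u)
if [ -n "$others" ]; then
  echo "pre-rebase: this would rewrite commits by: $others" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-rebase

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature
git commit -q --allow-empty --author="Other Dev <other@example.com>" -m "Their commit"
git commit -q --allow-empty -m "My commit"
git checkout -q master && git commit -q --allow-empty -m "Master moves on"
git checkout -q feature

git rebase master || echo "rebase refused by the pre-rebase hook"
```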

uxp · on Oct 4, 2012

One could also write a pre-receive hook on their git server that denies force pushing, so it becomes impossible to overwrite published history. Combined with a gatekeeper approach of denying pushes to the master branch to all developers except an assigned reviewer, this helps foster the idea that unstable code should always remain local and not be published.
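A sketch of the force-push half as a `pre-receive` hook in a bare repository. Git feeds the hook one `<old> <new> <ref>` line per updated ref on stdin; rejecting any update where the old tip is not an ancestor of the new one blocks rewrites of published history. (For a plain bare repository, the built-in `receive.denyNonFastForwards` setting does the same job; a hook is where further policy, such as who may push to master, could grow.) Paths and names are placeholders.

```shell
#!/bin/sh
set -eu
top=$(mktemp -d) && cd "$top"
git init -q --bare server.git
cat > server.git/hooks/pre-receive <<'EOF'
#!/bin/sh
# stdin: one "<old-sha> <new-sha> <ref-name>" line per ref being pushed
is_zero() { case $1 in *[!0]*) return 1 ;; *) return 0 ;; esac; }
while read -r old new ref; do
  is_zero "$old" && continue          # new ref: nothing to overwrite
  if is_zero "$new"; then
    echo "pre-receive: deleting $ref is not allowed" >&2; exit 1
  fi
  if ! git merge-base --is-ancestor "$old" "$new"; then
    echo "pre-receive: non-fast-forward update of $ref rejected" >&2; exit 1
  fi
done
EOF
chmod +x server.git/hooks/pre-receive

git clone -q server.git work && cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "one" && git push -q origin master
git commit -q --allow-empty -m "two" && git push -q origin master

# rewrite published history, then try to force it onto the server
git reset -q --hard HEAD~1 && git commit -q --allow-empty -m "two, rewritten"
git push -q --force origin master || echo "force push rejected"
```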

mattdeboard · on Oct 4, 2012

Like lukev said, git is "an awesome set of primitives". How you build a workflow out of those primitives isn't set in stone (though, like most things, Linus has strong opinions on exactly how to use his products). This is basically what Github has done, with an extra layer of UI glitz, social, and (much-improved) notifications.

That said, IMO there is still quite a lot of room for customization in git workflow when using Github. For example, we don't "send patches around" as Linus says. Our private feature branches live on Github but we've adopted the convention that the "private" branch name is prefixed by who's working on it, e.g. mdeboard-oauth, jschmoe-url-routes. If it has someone's name at the front, don't touch it. That enables us to still use the "D" in DVCS while retaining the ability to safely rebase our own work to keep our history clean.

The only reason I'd want a git-based product to "enforce order" is a culture-related one: ensure that contributors/collaborators do things in line with the conventions we've established. However, IMO it's always better to have a conversation about that than work with an overly prescriptive tool.

silverlake · on Oct 4, 2012

I'm still new-ish to git and don't get why rebase is popular. If I do my work on a branch B, I can merge this branch into the master M. The merge point will have a succinct message "Bug Fix #1". You can print the history so it only shows these merge messages and not the messy history in the branches. Isn't this the same as rebase? That is, rebase removes the messy branch history. But I'd prefer to keep that history, but rarely use or display it. bisect can also ignore those branches and only use the merge points. Saving the branch history shouldn't be problem. What am I missing?
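What is described here works with stock Git: merge each branch with `--no-ff` so a merge commit always records it, then read the mainline with `--first-parent`. A sketch in a throwaway repository; branch and message names are illustrative.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

# messy work on a topic branch
git checkout -q -b bugfix-1
for msg in "try something" "oops" "actually fix it"; do
  git commit -q --allow-empty -m "$msg"
done

# --no-ff records a merge commit even though a fast-forward was possible
git checkout -q master
git merge -q --no-ff --no-edit -m "Bug Fix #1" bugfix-1

git log --oneline --first-parent   # mainline only: the merge and the initial commit
git log --oneline                  # everything, branch history included
```

Newer Git versions also accept `git bisect start --first-parent`, which keeps bisection on the mainline merge points.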

Jacquass12321 · on Oct 4, 2012

There are two major things I really gain out of rebasing frequently.

Firstly and most importantly, thanks to rebase I'm constantly working against the most recent mainline; merge pains are reduced by frequently dealing with smaller rebase merges instead of trying to do one massive merge at the end when I'm finished with a longer-lived task that might last a week or two. The more often you merge, the less painful it is.

Secondly, there's the history-cleaning part, which involves squashing. I believe the problem with viewing only the merge history of the mainline is that it will miss changes that were introduced by fast-forward, without a merge. And frankly, no one else on the team cares that I committed 6 times in the process of one task; they want to see all the code relevant to that task, ideally in one changeset.

There's a pretty reasonable summary over here http://blog.sourcetreeapp.com/2012/08/21/merge-or-rebase/

For certain teams rebase just makes a lot of sense.

jrochkind1 · on Oct 5, 2012

> merge pains are reduced by frequently dealing with smaller rebase merges instead of trying to do one massive merge at the end when I'm finished with a longer-lived task that might last a week or two. The more often you merge, the less painful it is.

You can take care of that just by doing frequent regular merges, no need to do rebase ever, and rebase doesn't make this part any easier, does it?

I think the 'cleaning part of history', and trying to avoid those annoying merge commits in the logs, is in fact the only reason to do rebases, no? It's obviously an important one to many people.

aidenn0 · on Oct 4, 2012

Let's compare git to SVN.

With SVN your only real option is to commit something that is working, right? If you commit something broken to SVN then you will likely get yelled at.

With git, you can make a few changes, then think "hmm that might not be the best way to fix it" do a commit and then rip out everything you just did and do it a different way.

Or maybe you added some instrumentation for debugging the problem, committed, then fixed the problem, committed, then removed the instrumentation.

In both cases git has let you save off information that you might need during the bugfix process, but ultimately isn't needed in the final history. With SVN, you likely wouldn't check in those intermediate steps so the final history in SVN would be a single commit of "Fix bug foo"

Is there any need for everyone to have these intermediate commits in their history? I guess that's a matter of taste. I think the main thought is that rebases improve the signal-to-noise ratio of the changelog.

krzyk · on Oct 4, 2012

Does using rebase or squash leave the history in my local repo and "squash" the commits to a single one in the master?

Sorry for basic questions but I'm new to git.

jcoby · on Oct 4, 2012

No and yes (respectively). Rebase rewrites the history of whatever branch you are on. Squash converts a merge into a single commit on another branch.

Say you are working on a feature in a branch and have made several commits. Now it's time to clean them up in preparation for a merge. You can either squash the entire branch into one revision into master and create a new commit message (git merge --squash) or you can rebase.
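The `--squash` route as a sketch: `git merge --squash` stages the branch's combined changes on master without committing or recording a merge, so the follow-up `git commit` produces one ordinary commit. Repository, branch, and file names are placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
for i in 1 2 3; do
  echo "step $i" >> feature.txt && git add feature.txt && git commit -q -m "wip $i"
done

git checkout -q master
git merge -q --squash feature      # stages the combined diff, commits nothing
git commit -q -m "Add feature"     # one new commit on master, no merge parent
```

Because no merge is recorded, Git does not consider `feature` merged afterwards, so deleting it needs `git branch -D` rather than `-d`.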

If you rebase, you can use interactive mode (git rebase -i) and rewrite your local history however you see fit. You can reorder commits, remove commits, merge commits, edit commits, edit commit messages, anything. It's extremely powerful and lets you make the history of your current branch into whatever you want it to be. I use rebase -i quite a bit to merge "typo" commits into the original commit. Used sparingly rebasing really helps to keep your timeline clean.

You can also rebase while in master but you should not go older than the newest shared commit (origin's HEAD) nor should you edit other people's code during a rebase.

adestefan · on Oct 4, 2012

Linus doesn't want all of your personal history. So he's okay with you rebasing your 15 commits that fixed bug X into one "fix for bug X," but never should anyone rebase someone else's history.

saraid216 · on Oct 4, 2012

> But I'd prefer to keep that history, but rarely use or display it.

So why have it?

> I'm still new-ish to git and don't get why rebase is popular.

My most common use case for rebase is actually to keep my private branches up to date with master. `git rebase master` or `git fetch origin && git rebase origin/master` are common tools for me when I'm doing private work for an extended period of time. This way, I don't have a point where my private branch diverges from master; my changes are always fresh and based off the latest and greatest.
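A sketch of that routine in a single throwaway repository, so `git rebase master` stands in for `git fetch origin && git rebase origin/master`; branch and file names are placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b private
echo "my change" > mine.txt && git add mine.txt && git commit -q -m "My change"

# meanwhile, master moves on
git checkout -q master
echo "their change" > theirs.txt && git add theirs.txt && git commit -q -m "Their change"

# replay the private commits on top of the new master tip
git checkout -q private
git rebase -q master
git log --oneline --graph --all
```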

Karunamon · on Oct 4, 2012

Because rarely != never?

uxp · on Oct 4, 2012

If you're in the position of rebasing a branch down to a single commit, but many of those commits contain messages that are useful, then keep those messages in the final commit message.

When running an interactive rebase (git rebase -i my_branch~5), you have many options available to you. Fixup folds the commit into the one before it and discards its commit message. I'll use this on commits that are literally just checkpoints in a commit stream where I'm about to do a destructive action to my code (like during refactoring; you know the ones: 'git commit -m "update"') and I want to ensure I have a point to reset my branch to if I start screwing everything up. Squash also combines the commit with the one before it (the line above it in the todo list), but preserves both commit messages for you to edit. Use this for commits that are relevant to the named branch you're working on and are descriptive of what the branch was trying to accomplish.
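A non-interactive sketch of both options. `git commit --fixup`/`--squash` mark a commit for folding into an earlier one, and `--autosquash` moves those marked commits into place in the todo list. Setting `GIT_SEQUENCE_EDITOR=:` and `GIT_EDITOR=:` accepts the todo list and the combined message unchanged, which a person would normally review in an editor. File names and messages are placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

echo "parser v1" > parser.txt && git add parser.txt
git commit -q -m "Add parser" -m "Handles the basic grammar."
target=$(git rev-parse HEAD)

echo "parser v1, typo fixed" > parser.txt
git commit -q -a --fixup="$target"        # its message is discarded when folded in

echo "parser v2" > parser.txt
git commit -q -a --squash="$target" -m "Also handle nested blocks."   # message kept

# fold both into "Add parser"; ':' accepts the todo list and message as-is
GIT_SEQUENCE_EDITOR=: GIT_EDITOR=: git rebase -q -i --autosquash "$target~1"
git log --format='%s%n%b---'
```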

Create a summary as the first sentence of the new commit message. It will show on any pretty-print log message output. Then literally combine all your commit messages into a paragraph (or more) for a detailed description. The information is still there, and it's still in the same place.

smithzvk · on Oct 4, 2012

So I'm relatively new to version control entirely, but in the last few years my group has been making a big push to institute Git. I have been wondering lately, however: how much history cleaning is expected/desirable?

When I develop, I split my commits into as many small changes as I can so that the commit messages are single topic. I thought that was basically the idea. Every once in a while I use rebase to combine a few commits that should have been done together as they all addressed the same issue. This all seems right to me. I am left with a clean history of everything I have done on a very fine grained time scale. But the large number of commits, each with little significance to whole program hides the large scale structure of the development.

However, I could use rebase to start combining loosely related commits, trading the time resolution for clarity in the commit history. There seems to be a continuum along this scale. Where is the proper place in that continuum to say this is clean enough? Also, I don't like making changes where I am losing perfectly good information.

I know that I can group certain commits by defining a branch, developing on it, then merging (non-fast-forward) back to the original. The branch should keep the grouping in the commit history. I suppose this can even be done after the fact using rebase with the proper amount of git-fu. Is branching with non-fast-forward merges the preferred method of grouping related commits in the history?

If so, this seems troubling, as it means that partially fixing something is difficult to do with a clean history. Until the piece of the program you wish to fix is completely working, it shouldn't be merged into master, because that would ruin the grouping of the related commits. This means there can't be any partial thoughts like fixing bugs as you find them, because presumably you might want to group all bug fixes of a function together, but have a distinct commit for each.

Now I'm more confused than when I started. Seriously, any references or advice on this sort of topic are welcome.

wickedchicken · on Oct 4, 2012

> However, I could use rebase to start combining loosely related commits, trading the time resolution for clarity in the commit history.

In general, your commits should be the smallest atomic operation that makes sense. When people talk about 'clean history,' they're talking about working in the awesome workflow git provides:

1. Write half-written broken code.
2. Fix that code up.
3. Add some more onto that.
4. Fix a typo!
5. Forgot to update the README.

Now, you could push that to master, but then the main master is littered with commit messages like 'oops' and 'typo.' Instead, you can rebase 5-1 onto the latest master, squash them together, and have one 'nice' commit that only has the cleaned up final changes.

This is one of the most powerful things about git: in a private repo, you can commit all kinds of garbage and half-written stuff without caring. When you want to make your stuff public, rebase and squash, then send it out. Be careful though! Only rebase your own private branches, or you're gonna have a bad time™.
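Besides `git rebase -i` with `squash`, a quick way to collapse every commit on a private branch into one is a soft reset to the point where the branch forked, followed by a single commit. A sketch, assuming the branch was cut from `master`; the commit messages and file names are placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
echo "# Project" > README && git add README && git commit -q -m "Initial commit"

git checkout -q -b topic
for step in "wip" "fix it up" "add more" "fix typo" "oops"; do
  echo "$step" >> work.txt && git add work.txt && git commit -q -m "$step"
done

# move the branch back to where it forked, keeping every change staged
git reset -q --soft "$(git merge-base master topic)"
git commit -q -m "Add work.txt"
git log --oneline master..topic
```

Unlike `rebase -i`, this is all-or-nothing: it cannot keep some of the commits separate.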

smithzvk · on Oct 4, 2012

Okay, that is basically keeping with my current understanding (though I'm not sure how much I live up to the "only have working history in the public repo" rule).

There is the other issue I raised, however: is there a good way to group a series of commits that happen to be working towards a single distinct goal? Using branches is a clear step in that direction, but it seems like a nightmare to perform a rebase like you described if the commits are interleaved and I would like the end result to group them via branches. That is confusing; hopefully this will clear it up:

1. Bugfix in function1.
2. Bugfix in function2.
3. New feature in function2.
4. Bugfix in function1.
5. Bugfix in function2.

...and we want in the end:

      /-- 1 ---- 4 ---\
  ---<                 >--HEAD
      \- 2 -- 3 -- 5 -/

Can rebase do this easily? Is this a good idea (it seems like it is to me)? The programmer would have to confirm that the code works at every state.

wickedchicken · on Oct 4, 2012

So I'm not sure if I understand correctly, but let me put it this way: with a little more git craziness, you can crack apart a commit and separate it into two. This is good if you did two unrelated changes to a file, committed that, and realized you wanted two separate commits later.

The basic process is:

1. git rebase -i, and change the commit's line to 'edit'.
2. git reset HEAD^; this 'undoes' the commit and leaves the changes in your working directory as if you had written the code but hadn't committed it yet.
3. git status
4. git add -p <filename>; this lets you stage changes a chunk (hunk) at a time. Stage all the hunks that belong in commit one and skip the parts you want for commit two.
5. git commit (do not use git commit -a here) and write the message for your first commit.
6. Now your working directory holds all the changes for commit two; git commit -a if you want all of them.
7. git rebase --continue
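The steps above, run non-interactively as a sketch. The `sed` sequence editor stands in for editing the todo list by hand, and splitting by whole files stands in for `git add -p` hunk selection, which needs a terminal. Repository and file names are placeholders.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

echo "change A" > a.txt && echo "change B" > b.txt
git add a.txt b.txt && git commit -q -m "Two unrelated changes"
echo "later" > c.txt && git add c.txt && git commit -q -m "Later commit"

# step 1: mark the mixed commit (first line of the todo list) as 'edit'
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i HEAD~2
# step 2: undo the commit but keep its changes in the working tree
git reset -q HEAD^
# steps 4-6: commit the pieces separately
git add a.txt && git commit -q -m "Change A"
git add b.txt && git commit -q -m "Change B"
# step 7: replay the remaining commits
git rebase --continue
git log --oneline
```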

This page[1] has a more concise answer, but leaves out the git commit -p part.

Note that if you mess up in rebase-land, you can always git rebase --abort. If you come out of the rebase and everything looks lost ('oh god I lost my data!'), use git reflog and pull up the hash of where you were before. Your data is still there.
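A sketch of that recovery in a throwaway repository: a hard reset "loses" a commit, and the reflog still records where `HEAD` was one move earlier.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"
echo "precious" > work.txt && git add work.txt && git commit -q -m "Precious work"

git reset -q --hard HEAD~1      # oops: the commit is gone from the branch
git reflog -n 3                 # ...but the reflog still lists it
git reset -q --hard 'HEAD@{1}'  # HEAD as it was one move ago
cat work.txt
```

Reflog entries do expire eventually (by default, entries no longer reachable from a branch are kept for 30 days), so this is a short-term safety net.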

Another note: if your commits are already separate, you can use rebase to selectively squash and reorder them. Read the manual on git rebase -i, if you rearrange commits and only squash some I think you'll get what I'm talking about.

[1] http://stackoverflow.com/questions/6217156/how-to-break-a-pr...

lmm · on Oct 4, 2012

Switching branches is cheap, I'd say the "right" way to get a tree like you want is to have two or even five branches all the time you're working. But I suspect you could make two branches and cherry-pick different sets of commits onto them to get the result you're after. To my mind it wouldn't be worth the effort though; how often do you really care whether the code worked with only 1 and 4 applied?
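A sketch of the cherry-pick approach for the five-commit example: record the history as it happened on a scratch branch, rebuild one branch per function by cherry-picking, then merge both back with explicit merge commits. The branch names are hypothetical, and the result uses two merge commits rather than the single join in the diagram.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"
touch function1.txt function2.txt
git add . && git commit -q -m "Initial commit"

# record a one-line change to a file as its own commit
change() { echo "$2" >> "$1" && git commit -q -am "$2"; }

# the history as it actually happened, on a scratch branch
git checkout -q -b scratch
change function1.txt "1: bugfix in function1"
change function2.txt "2: bugfix in function2"
change function2.txt "3: new feature in function2"
change function1.txt "4: bugfix in function1"
change function2.txt "5: bugfix in function2"

# regroup: one branch per function, cherry-picking one commit at a time
git checkout -q -b fn1 master
for c in scratch~4 scratch~1; do git cherry-pick "$c" >/dev/null; done
git checkout -q -b fn2 master
for c in scratch~3 scratch~2 scratch; do git cherry-pick "$c" >/dev/null; done

# join both groups back into master with explicit merge commits
git checkout -q master
git merge -q --no-ff --no-edit -m "Merge function1 fixes" fn1
git merge -q --no-ff --no-edit -m "Merge function2 work" fn2
git log --oneline --graph
```

Each cherry-picked commit is a new commit, so you would still want to check that the intermediate states on `fn1` and `fn2` work.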

smithzvk · on Oct 5, 2012

Right, I would say that it isn't worth the effort. Also, I probably never care about the code with only 1 and 4 applied. So perhaps branches aren't the right way to do what I am describing.

I always saw VC as a systematic way to keep a log of my development so that I could figure out where I may have broken my code. For this purpose, having some sort of meta-data where commits can be grouped would be nice. It would also work to do something like always end my commit messages with some kind of meta-data tag that I could grep the log for. I was just wondering if there was a prescribed/built-in way for Git to handle this.
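A stock-Git approximation of that tag idea: put a marker line in each message body and filter with `git log --grep`. The `Area:` key is a made-up convention for illustration, not a Git feature.

```shell
#!/bin/sh
set -eu
# throwaway repository; the identity below is a placeholder
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

# the second -m becomes a separate paragraph holding the tag
git commit -q --allow-empty -m "Bugfix in function1" -m "Area: function1"
git commit -q --allow-empty -m "Bugfix in function2" -m "Area: function2"
git commit -q --allow-empty -m "New feature in function2" -m "Area: function2"
git commit -q --allow-empty -m "Another bugfix in function1" -m "Area: function1"

git log --oneline --grep='Area: function1'
```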

lmm · on Oct 7, 2012

git-bisect is the standard tool for figuring out where you broke something. I don't know what it does with branching histories though, I tend to effectively linearise my history by rebasing each branch on the trunk head before merging it.

exDM69 · on Oct 4, 2012

> I have been wondering lately, however: how much history cleaning is expected/desirable?

After you've published your work and someone else has checked it out, you don't want to touch your history unless there is a serious problem.

But when you're working on something, you can commit all you want, and do many commits. Then at some point you put your work up for reviews and get feedback. Then you fix the feedback and commit as many times as you need to. When your code is good enough to be merged into master, you should clean up the history a little with rebase.

You should at least try to squash and rebase your commits so that there will not be any commit in the master history that is completely broken. The whole point of having a history is that you're able to go back. E.g. you might want to search for the point in history where a problem originated (git bisect can automate this with a "binary search"). You cannot effectively do that if your history is full of commits that do not work (e.g. won't build or will crash all tests).

To recap: never change published history unless there is a serious issue (like you committed your database password to github). But you can and should change your local history before you publish to master so that there are no broken commits that make it difficult to walk back in history.
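One way to enforce "no broken commit reaches master" before publishing is `git rebase -i --exec`, which runs a command after each replayed commit and stops at the first failure. A minimal sketch, assuming a throwaway repository and a hypothetical check, `grep -q ok app.txt`, standing in for a real build or test suite.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo ok > app.txt
git add app.txt
git commit -q -m "base"

# A local feature branch whose middle commit is broken.
git checkout -q -b feature
echo "ok, v1" > app.txt
git commit -q -am "step 1"
echo "oops" > app.txt
git commit -q -am "step 2 (breaks the build)"
echo "ok, v3" > app.txt
git commit -q -am "step 3"

# Master moves on in the meantime.
git checkout -q master
echo readme > README
git add README
git commit -q -m "master moves on"
git checkout -q feature

# Replay feature onto master, running the check after every commit.
if GIT_SEQUENCE_EDITOR=: git rebase -i --exec "grep -q ok app.txt" master; then
  result="every commit passes"
else
  # The rebase stops right after the commit whose check failed.
  result="broken: $(git log -1 --format=%s)"
  git rebase --abort
fi
echo "$result"
```

In practice you would fix or squash the broken commit at that stop and `git rebase --continue` instead of aborting.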

saraid216 · on Oct 4, 2012

My workflow when working on a large project or doing multiple commits looks roughly like this:

  git checkout -b featurebranch
  git commit -am "foo"
  git commit -am "bar"
  git rebase master # to update my personal history with public history
  git commit -am "baz"

I've used different flavors of merging it back in, though. Method 1 is to `git checkout master; git diff master..featurebranch | git apply`. Method 2 is `git rebase -i HEAD~10; git checkout master; git cherry-pick featurebranch`. I'm sure there are other and better methods, but those are the ones I've used recently that I like.

After I collapse a branch down into a single commit (I rarely want a branch to become multiple commits), I typically use `git commit --amend` to modify the commit message to something fitting and push it upstream. --reset-author is also good there: it sets the author date to the current date/time, rather than keeping that of the first commit you squashed.
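Method 1 plus the amend step can be sketched as below, assuming a throwaway repository and hypothetical commits foo, bar and baz. `master..featurebranch` in `git diff` compares the two branch tips directly, so it yields only the branch's own changes when `featurebranch` has just been rebased onto `master`, as in the workflow above; `git diff master...featurebranch` (three dots) diffs from the merge base instead.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b featurebranch
echo foo >> app.txt
git commit -q -am "foo"
echo bar >> app.txt
git commit -q -am "bar"
echo baz > new.txt
git add new.txt
git commit -q -m "baz"

# Method 1: apply the whole branch to master as one uncommitted patch.
git checkout -q master
git diff master..featurebranch | git apply
git add -A                       # git apply leaves new files unstaged
git commit -q -m "wip"
# Give the single commit a fitting message and fresh authorship/date.
git commit -q --amend --reset-author -m "Add feature: foo, bar and baz"
git log --oneline
```

Git also has a built-in route to the same single commit: `git merge --squash featurebranch` stages the branch's combined changes on master without creating a merge commit, after which you commit as usual.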

easy_rider · on Oct 4, 2012

Funny. I was just finishing a chat with a colleague about a git strategy for a coming new release of a production product, then saw this post on top. I've been working on it without collaboration for about half a year now, so that's easy. I've had mixed experience with both rebasing and pull strategies before that. I've found rebasing to be a lot better when working with tightly coupled code, and pulling to be a lot cleaner when you want to cherry-pick and revert to previous states. Rebase is indeed a destroyer.

We've now decided to use this model, while only deleting feature branches after RC acceptance.

http://nvie.com/posts/a-successful-git-branching-model/

My colleague just suggested rebasing regularly from the develop branch while developing features: "I'm working on a branch. Someone (e.g. you) updates the develop branch. I will have no info on whether that is related to my stuff or not, so I should rebase regularly onto the latest version of the develop branch."

I'm kinda clueless now. Git is really powerful and flexible in strategies, and that adds to the complexity.
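The colleague's suggestion in sketch form, assuming a throwaway repository with local `develop` and `feature` branches. With a shared remote you would usually `git fetch` first and rebase onto `origin/develop` instead.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
echo base > app.txt
git add app.txt
git commit -q -m "base"

# My feature branch starts from develop.
git checkout -q -b feature
echo mine > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile a colleague's change lands on develop.
git checkout -q develop
echo theirs > other.txt
git add other.txt
git commit -q -m "colleague change"

# Replay my commits on top of the latest develop.
git checkout -q feature
git rebase -q develop
git log --oneline --graph
```

The feature branch stays a straight line on top of develop, so when it is finally merged there is nothing left to reconcile.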

leeoniya · on Oct 4, 2012

here's a more recent rant: https://github.com/torvalds/linux/pull/17#issuecomment-56599...

jrochkind1 · on Oct 5, 2012

oh yeah, perfectly straightforward, only took several thousand words to confusingly explain.

Nope, not simple. Yep, this is a git usability problem.

In the ruby/github world, people generally violate this and DO rewrite 'public' history in order to get 'cleanness', primarily because almost ALL history is 'public', since you tend to show people work in progress on github, or just push it there to have a reliable copy in the cloud. And yes, this sometimes leads to madness.

chris_wot · on Oct 5, 2012

Unintentional contradiction two messages down the thread: Linus says "But note: none of these rules should be absolutely black-and-white. Nothing in life ever is."

Or perhaps intentional. I can never tell when I read a Linus fiat.

http://www.mail-archive.com/dri-devel@lists.sourceforge.net/...

mibbitier · on Oct 4, 2012

git is so overly complex (coming from svn).

pm215 · on Oct 4, 2012

I think that for people with an svn background there are three different issues that all hit at once:

* distributed rather than centralised version control brings a new set of concepts to understand

* git is flexible enough to support many different workflows. This means you have to actually choose one, and choice is difficult especially when you're just trying to get to grips with a new tool. svn has much more of a "one standard way to do it" approach

* git's UI is in places confusing, inconsistent and occasionally just randomly and unnecessarily different from most other version control systems

The first two are 'essential complexity'; the third is more 'accidental complexity'. In any case I feel it's having to deal with all three sources of confusion that makes the svn->git transition tricky for many people.

mibbitier · on Oct 4, 2012

Don't most people actually end up using git in a centralised manner though? E.g. the rise of github.

I can totally see git is ridiculously powerful, and general purpose. I just wish it'd default to what most people want a bit more.

kbolino · on Oct 4, 2012

"Distributed" is not the same as "ad hoc". In virtually all workflows, whether using distributed or centralized RCS, there will be a master copy. The difference between distributed and centralized is whether that master copy is the only copy.

aidenn0 · on Oct 4, 2012

In my experience, git is more complex than svn, but not needlessly so. In any sufficiently long-running project, I've wanted features that git has and svn doesn't.

mibbitier · on Oct 4, 2012

As a relative newbie to git,

Why do I get prompted to enter a commit message when I'm just doing a git pull?

Why do I have to explicitly add every file I want to commit each time? Why can't it just default to "everything under the current dir" like svn does?

tmhedberg · on Oct 4, 2012

> Why do I get prompted to enter a commit message when I'm just doing a git pull?

Because `git pull` == `git fetch` + `git merge`. If you have local commits that upstream doesn't have, and upstream has commits that you don't (i.e. the two histories have diverged), then pulling involves merging the divergent history, thus creating a new merge commit. And a merge commit, like any commit, needs a message.
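The divergent case can be reproduced with two hypothetical clones, `alice` and `bob`, of a local bare repository. This sketch passes `--no-rebase` to ask for the merge explicitly and `--no-edit` to accept the generated merge message instead of opening an editor.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)
cd "$work"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"
cd "$work"
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Alice publishes a commit.
cd "$work/alice"
echo a > a.txt
git add a.txt
git commit -q -m "from alice"
git push -q origin master

# Bob commits locally without fetching first, so the histories diverge.
cd "$work/bob"
echo b > b.txt
git add b.txt
git commit -q -m "from bob"

# fetch + merge: the merge needs a commit of its own, hence a message.
git pull -q --no-rebase --no-edit origin master
git log --oneline --graph
```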

> Why do I have to explicitly add every file I want to commit each time?

Because Git has an intermediate staging area (the "index") between your working directory and the committed history. This is a great feature; one of the most useful aspects of Git, in fact. The side effect is that you must add your changes to the index before committing, but this is a small price to pay for the huge increase in flexibility the index affords.
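A minimal sketch of the index at work, assuming a throwaway repository: two files are modified, only one is staged, and the commit contains exactly what was staged.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
echo v1 > a.txt
echo v1 > b.txt
git add a.txt b.txt
git commit -q -m "base"

echo v2 > a.txt
echo v2 > b.txt
git add a.txt                   # stage only a.txt
git diff --cached --name-only   # index vs HEAD: a.txt
git diff --name-only            # working tree vs index: b.txt
git commit -q -m "update a"     # commits only what is in the index
git diff-tree --no-commit-id --name-only -r HEAD
```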

> Why can't it just default to "everything under the current dir" like svn does?

You can do `git add .` to add everything in the current directory without naming it all explicitly. Or you can use the `git commit -a` shortcut to add all modified tracked files and commit in a single command (the similar -A and -u options belong to `git add`; note that `commit -a` skips files Git doesn't track yet). This is hardly a significant increase in effort over `svn commit`.
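A sketch of the difference between `git commit -a` and `git add .`, assuming a throwaway repository: `-a` picks up modifications to files Git already tracks but skips brand-new ones, while `git add .` stages new files too.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "base"

echo v2 > tracked.txt        # modify a tracked file
echo new > untracked.txt     # create a new file Git doesn't know about
git commit -q -am "commit -a"
git status --short           # ?? untracked.txt: -a left it out

git add .                    # stages everything under the current directory
git commit -q -m "add the rest"
```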

drizzo4shizzo · on Oct 4, 2012

If you have commits in your local branch and you are doing a pull without --rebase, you can get merge commits, but I believe those messages should be generated for you(?). I almost always choose rebase over merge, so there are no merge commits; all my merges are fast-forward. Check your workflow.

Regarding your second question, you want "git add -A". Git gives you the ability to commit "some of what I've changed here", even within files (see git add -i). This facilitates a clean commit history by letting you control exactly what goes into each commit (even if you changed other files).

And even once you've made your commits to your private branch of course you can continue to change the order of them or combine them with interactive rebase... until you push...
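Pulling with `--rebase` replays your local commits on top of upstream instead of merging, so a divergent pull produces no merge commit. A sketch using two hypothetical clones, `alice` and `bob`, of a local bare repository.

```shell
set -e
# Throwaway identity so the commits don't depend on your git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)
cd "$work"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"
cd "$work"
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Alice publishes a commit.
cd "$work/alice"
echo a > a.txt
git add a.txt
git commit -q -m "from alice"
git push -q origin master

# Bob has an unpublished local commit.
cd "$work/bob"
echo b > b.txt
git add b.txt
git commit -q -m "from bob"

# fetch + rebase: Bob's commit is replayed on top of Alice's.
git pull -q --rebase origin master
git log --oneline --graph
```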

magoon · on Oct 4, 2012

> Why do I get prompted to enter a commit message when I'm just doing a git pull?

This shouldn't happen; you need to rethink how you're set up.

> Why do I have to explicitly add every file I want to commit each time? Why can't it just default to "everything under the current dir" like svn does?

You can use git commit -a

Edit: by "rethink" I mean research why this is happening, because it isn't by design. I'm sure somebody can help, I just don't have the answer for you.

akoumjian · on Oct 4, 2012

> Why do I get prompted to enter a commit message when I'm just doing a git pull?

Not sure. The only time this happens to me is when I need to fix a merge conflict.

> Why do I have to explicitly add every file I want to commit each time? Why can't it just default to "everything under the current dir" like svn does?

"git add ." adds everything below the current directory.

klj613-- · on Oct 5, 2012

1. `git pull` = fetch + merge. If the merge has conflicts, then you have to solve the conflicts and do the commit manually (entering the commit message). If the merge doesn't have conflicts, then the merge commit is automatic.

2. `git commit -m "foobar"` will commit the files from the stage/index. `git commit -am "foobar"` will commit all the modified tracked files (it will ignore untracked files).

easy_rider · on Oct 4, 2012

svn always seemed limited to me if your dev team grows beyond the capability of utilizing simple verbal communication to mitigate problems when merging.

mibbitier · on Oct 4, 2012

Personally, I've never been a fan of branching and merging. I don't think it works well at all for small groups. Maybe if you're in a big corp. though.

qu4z-2 · on Oct 5, 2012

What do you do if you want to commit a half-finished feature? Do you only commit in very large-grained chunks?

gosub · on Oct 5, 2012

git needs a "git propaganda" command. Instead of changing history, it would tell it in a different manner.

3825 · on Oct 4, 2012

I've heard some of these words...

jebblue · on Oct 5, 2012

I have tried to get git. Some people say one project per repo (which seems crazy, but I did it), others that many projects are OK; some say you need a main master repo, others that you don't; then there's the half dozen commands for things where with SVN it's one.

Now the most valuable thing to me in source control, history, I'm supposed to keep clean? That's like a sacred cow, you _don't_ mess with history.

>> That's fairly straightforward, no?

No _Linus_ it isn't. Git is hard to get right. If it wasn't for EGit I'd be lost. I tried Canonical's bzr and it is more understandable for ordinary humans.

All that aside I really like Linux. :)

klj613-- · on Oct 5, 2012

Best way to learn git is in the command line (get away from any GUI). And then play with repositories to see what the commands actually do.

"Don't mess with history"? I don't have to commit to my commits as long as my commits ain't public.

Rewriting history is a lie? Well, if you want to keep everything you do in history, maybe commit on each keystroke? That's insane.

Don't commit unless your ready to commit? Then that be hard to keep track of. Come time to commit you've got 50+ files modified good luck at doing decent commit messages.

jebblue · on Oct 6, 2012

>> Best way to learn git is in the command line (get away from any GUI).

I've used a lot of source control systems, and the best always have a GUI, so guess what? I want a GUI, unless the system's CLI is inherently intuitive, and as my comments make clear, I do not think git is intuitive at all.

>> I don't have to commit to my commits as long as my commits ain't public.

Huh?!?! I don't get that, it like makes no sense to me whatsoever. Why do you think I should even try to comprehend it?

>> Don't commit unless your ready to commit?

Are you suggesting I said or asked that??? Are you advising me? Seriously what?

>> Then that be hard to keep track of. Come time to commit you've got 50+ files modified good luck at doing decent commit messages.

Huh? I'm sorry, is that English? Because it doesn't make sense at all to me. Is it 50 lines changed, all clearly related? Is it 50 totally different changes?

klj613-- · on Oct 10, 2012

Personally I think it's better to use the CLI for git.

I commit very often however I rewrite the commits. In other words, I mess with my history and it is a good thing (My commits ain't final, in other words... "I do not commit to my commits").

In SVN I try not to commit too often because I do not want to commit (publish) changes which I may not want to keep.

With git I commit very often, in stages. Then I can remove or change those commits at a later stage. If I don't do this, I end up with a load of modified files (e.g. 50+), and either I commit them all in one go (bad) or I try to separate out each step I've taken over the past 12 hours and write decent commit messages (good).

Of course you could commit very often and create new commits to fix errors you've made in recent commits (rather than rewriting history). You could also merge master into feature-x every day (rather than rebasing), but then you'd have a history that looks chaotic and is hard to follow.

-

Honestly, when I started with git (the first VCS I learnt), I was lost, until one day I figured out how simple git is to use.