Git - the stupid content tracker

By: John McFarlane <john.mcfarlane@rockfloat.com>
Last updated: 08/09/2008 @ 01:11

Abstract:
Git is the scm developed by Linus Torvalds. It is a distributed system, designed initially for use with the Linux kernel.



1. Installation

Git is really easy to install on Gentoo:
user# sudo emerge -a dev-util/git
I would guess a quick apt-get install git would work on Debian-based systems, and good luck to you Fedora cats out there.

2. Introduction

Git is an scm using a distributed model. This is very different from most scms, which use a centralized model. The basic difference is that with Git there is no "server" that holds the repository. Each person has a copy of the entire repository, and each copy is just as "official" as any other.

This might not sound very different from the centralized model, but it really changes things a lot. I'm personally still trying to get used to it.


3. Set global configuration

Git keeps track of configuration settings both at the repository level, and at the user level. Here's how I recommend setting up your global configuration, as you will probably want these [sweet] settings for every repository you have:

user# git config --global user.name "Your Name"
user# git config --global user.email your@email.com

# You do want your shell to look sweet... don't you?
user# git config --global color.diff auto
user# git config --global color.status auto
user# git config --global color.branch auto

# Global ignores is another sweet feature
user# git config --global core.excludesfile ~/.gitignore

# Only use a pager when there's more than one page
# This is only for git >= 1.5.6.4
user# git config --global core.pager "less -FRSX"

# Here's an example gitignore file
user# cat <<EOF > ~/.gitignore
*.pyc
*.pyo
*.swo
*.swp
~*

EOF
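To see the two configuration levels side by side, here's a minimal sketch. It assumes a throwaway HOME directory and made-up names ("Global Name", "Repo Name") so your real settings are never touched; the repository-level value should win inside that repository.

```shell
# Assumption: a throwaway HOME, so the real ~/.gitconfig is never touched.
export HOME="$(mktemp -d)"
git config --global user.name "Global Name"

git init -q "$HOME/demo"
cd "$HOME/demo"
git config user.name "Repo Name"    # repository level, lands in .git/config

effective="$(git config --get user.name)"        # what Git uses in this repo
global="$(git config --global --get user.name)"  # user-level value is unchanged
echo "effective: $effective / global: $global"
```

Leaving off --global is all it takes to make a setting apply to a single repository.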
    

4. Your first repository

Let's just jump right in. Let's say you want to work on a little Python project that tests your knowledge of services and the tcp ports they use. We'd start off by doing something like this:

user# mkdir svctester
user# cd svctester
user# touch setup.py svctester.py Changelog
    
Now we want to place this project under version control:

user# git init      # Initialize a new repository
user# git add .     # Add all files to the index
user# git status    # Look at what is about to be committed
user# git commit -a # Commit the changes

# Next your default editor will open. Add your commit message and save the
# file to complete the commit.

    
Job well done, you now have your first Git repository. Let's take a second and look at what happened.

user# ls -1a
.
..
.git
Changelog
setup.py
svctester.py

    
What you'll see is your three code files, and a .git directory. This .git directory is known as the repository. The really cool thing [unlike svn] is that this directory holds all of Git's metadata. This means that if you create directories inside your codebase, they do not have hidden .git directories inside them. This is very different from svn, where you find your source code littered with .svn directories.
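If you want to check the "only one .git directory" claim for yourself, here's a quick sketch. The lib/utils directory and helper.py file are made up for the demo, and everything happens in a scratch directory.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical nested directories, just to have something to search through.
mkdir -p lib/utils
touch lib/utils/helper.py

# Count every .git entry anywhere under the project root.
git_dirs="$(find . -name .git | wc -l | tr -d ' ')"
echo "number of .git directories: $git_dirs"
```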

5. Your first branch

Something you'll learn about Git is that it's meant to make branching easy, really easy. First let's look to see what branches we have already:

user# git branch
* master
    
This is telling you that currently you have one branch named master. Let's create another one:
user# git branch get-services
What this did was create a new branch inside the repository based on the master branch, but it did not do anything to your working copy.

Now let's switch to our new branch:

user# git checkout get-services
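Here's a small sketch of the difference between creating a branch and switching to it. It runs in a scratch repository and does not assume what the default branch is called; it just asks Git each time.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
touch svctester.py
git add .
git commit -q -m "Initial commit"

start="$(git symbolic-ref --short HEAD)"           # the default branch, whatever its name
git branch get-services                            # creates the branch, nothing else
after_create="$(git symbolic-ref --short HEAD)"    # still on the default branch
git checkout -q get-services                       # now we actually switch
after_checkout="$(git symbolic-ref --short HEAD)"
echo "$start -> $after_create -> $after_checkout"
```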


6. Commit work and merge branch into the master branch

Let's go ahead and make some changes to our get-services branch:

user# echo "svc = open('/etc/services', 'r').readlines()" >> svctester.py
user# echo "svc = [s.strip() for s in svc]" >> svctester.py
    
Now we will commit the changes to the branch we are currently in, the get-services branch:

user# git commit -a -m "- Added svc list to hold services"
user# git checkout master
user# git merge get-services
user# git log

commit cfc176c947970bd5e890905f3486429a7d4ee8eb
Author: John McFarlane <john.mcfarlane@rockfloat.com>
Date:   Sun Jun 3 17:59:16 2007 -0400

- Added svc list to hold services

commit 00bb22909b4c0aeff1ffb2b3a0fc335a88ee68a9
Author: John McFarlane <john.mcfarlane@rockfloat.com>
Date:   Sun Jun 3 17:52:19 2007 -0400

- Initial commit

    
Clean house and delete our branch as we're finished with that task:
user# git branch -d get-services
Now let's look at our handiwork:
user# gitview
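If you don't have gitview handy, the same commit, merge and delete cycle can be checked from the shell. This is a sketch in a scratch repository. Since nothing was committed on the original branch in the meantime, the merge simply moves it forward to the tip of get-services (a fast-forward).

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# svctester" > svctester.py
git add .
git commit -q -m "- Initial commit"
main_branch="$(git symbolic-ref --short HEAD)"

git checkout -q -b get-services
echo "svc = open('/etc/services', 'r').readlines()" >> svctester.py
git commit -q -a -m "- Added svc list to hold services"

git checkout -q "$main_branch"
git merge -q get-services
merged_tip="$(git rev-parse HEAD)"
branch_tip="$(git rev-parse get-services)"

git branch -q -d get-services      # -d only deletes fully merged branches
commit_count="$(git rev-list HEAD | wc -l | tr -d ' ')"
echo "commits on $main_branch: $commit_count"
```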

7. Explore what exactly happens when you switch branches

With Git it's very important to understand what's happening. As a tool it provides you with a lot more control than svn or cvs. In fact you can completely destroy yourself if you're not careful. So let's spend some time looking at what's really happening.
Create a new branch to capture the user's input:

user# git branch get-userinput
user# git checkout get-userinput
    
Now add some code to get the user's input:

user# echo "input = raw_input('What port is smtp: ')" >> svctester.py
    
So now you wonder if it would be better to actually start by getting a random service out of our list, so you can use it as part of the question. To do this you create a branch for that purpose:

user# git branch get-randomservice
user# git checkout get-randomservice
    
Out of curiosity you check to see the status of things:
user# git status

# On branch get-randomservice
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   svctester.py
#
no changes added to commit (use "git add" and/or "git commit -a")

    
What the heck? Why does svctester.py seem to have changes made to it? We haven't made any changes to this file in this branch yet. Here's the deal: uncommitted changes are visible to all branches. They live in your working copy rather than in any branch, so when you switch from one branch to another, new or modified files come along with you (as long as the switch doesn't need to overwrite them).
Tip: When working with branches, always check which branch you are currently in before doing anything that changes the repository.
This means we should also see the same thing if we look at the status from the master branch:

user# git checkout master
user# git status

# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   svctester.py
#
no changes added to commit (use "git add" and/or "git commit -a")

    
But since this rule only applies to new and modified files, once the change is committed, it should not be visible to other branches:

user# git checkout get-userinput
user# git commit -a -m '- Capture user input on service question'
user# git checkout master
user# git status

# On branch master
nothing to commit (working directory clean)
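The whole experiment fits in a short script. This is a sketch in a scratch repository; it uses git status --porcelain, which prints one short line per changed file and nothing at all when the working copy is clean.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "svc = []" > svctester.py
git add .
git commit -q -m "Initial commit"
git branch get-userinput
git branch get-randomservice

# Modify the file on one branch without committing, then switch away.
git checkout -q get-userinput
echo "input = raw_input('What port is smtp: ')" >> svctester.py
git checkout -q get-randomservice
before="$(git status --porcelain)"    # the change followed us here

# Commit on the branch the change belongs to, then switch away again.
git checkout -q get-userinput
git commit -q -a -m "- Capture user input on service question"
git checkout -q get-randomservice
after="$(git status --porcelain)"     # clean: the change stayed behind
echo "before: [$before] after: [$after]"
```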

    

8. How to compare branches

At this point we have three branches:

user# git branch

  get-randomservice
  get-userinput
* master

    
So let's see what changes would happen if we were to move changes from the get-userinput branch into the master branch.

user# git diff master..get-userinput

diff --git a/svctester.py b/svctester.py
index 3e98f31..8dd13a5 100644
--- a/svctester.py
+++ b/svctester.py
@@ -1,2 +1,3 @@
 svc = open('/etc/services', 'r').readlines()
 svc = [s.strip() for s in svc]
+input = raw_input('What port is smtp: ')

    
What we just did is a bit confusing: master..get-userinput. What exactly did that do? Essentially, when you use the double dot notation you're asking Git to tell you the difference between your stuff (the first branch) and something else (the second). Because of this, the diff shows the changed line with a + sign in front of it. If we had used get-userinput..master it would have put a - sign in front of it.
Tip: Remember when using git diff to always put the thing to receive the changes as the first part of the double dot notation.
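To convince yourself about the direction, here's a sketch that counts the + and - lines in both orderings. It runs in a scratch repository; the grep patterns are my own way of skipping the +++ and --- file header lines, not anything Git provides.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "svc = []" > svctester.py
git add .
git commit -q -m "Initial commit"
base="$(git symbolic-ref --short HEAD)"

git checkout -q -b get-userinput
echo "input = raw_input('What port is smtp: ')" >> svctester.py
git commit -q -a -m "- Capture user input on service question"

# Receiving branch first: the new line shows up as an addition.
added="$(git diff "$base..get-userinput" | grep -c '^+[^+]')"
# Reversed order: the very same line shows up as a removal.
removed="$(git diff "get-userinput..$base" | grep -c '^-[^-]')"
echo "added: $added, removed when reversed: $removed"
```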
Let's go ahead and merge the changes into the master, and move on to getting the random service:

user# git checkout master
user# git merge get-userinput
user# git branch -d get-userinput  # Clean house
    
Take a look at our handiwork:
user# gitview
Now for those of you who have really strong attention to detail, you might have noticed something odd there. Why does the get-randomservice branch show up next to the second commit? The answer is this: a branch is just a pointer to a commit, and gitview shows each branch name next to the commit it currently points to. We never committed anything on get-randomservice, so it still points to the commit it was created from. This means that if we now delete the get-randomservice branch, it won't show up in gitview anymore.
user# git branch -d get-randomservice

9. How to shelve a branch

Once upon a time I reviewed Microsoft's successor to SourceSafe, named Team Foundation Server. It was complete poo. Really, I managed to destroy my project in about 5 minutes. Though as much as I hated it, it did have one nice feature that I've not yet seen elsewhere (though I've only used svn and cvs). The feature was called "shelving" and it essentially allowed you to simply push your code onto a shelf, so you could pick it up later. It was a really handy feature.
With Git you can do this pretty easily, here's how:

user# git branch experiment
user# git checkout experiment
user# echo "Testing with Git shelving" > testing
user# echo "for s in svc: print s" >> svctester.py
    
So now you have a new branch, and you've made some changes to it. But if you switch to another branch, your changes are visible. Let's shelve it for later:

user# git add .
user# git commit -a -m "Shelving - Added print of svc and a test file"
    
Now you won't see those two changes anywhere but inside the experiment branch. Now, when you're ready to resume your work you do this:

user# git checkout experiment
user# git reset --soft HEAD^
    
You are exactly where you were prior to shelving, pretty sweet. What happened was actually quite simple. We committed the changes, and later rolled the commit back. Because this process works by exploiting a commit, it is subject to commit rules. Meaning you must add unversioned files to the index before you commit if you want them shelved.
Tip: If you want to save your work for later, don't forget to use git add . before you commit [and later reset].
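Here's the shelve and unshelve round trip as one sketch, run in a scratch repository. One thing worth knowing: with --soft the changes come back already staged in the index, so git status will list them as changes to be committed.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "svc = []" > svctester.py
git add .
git commit -q -m "Initial commit"

git checkout -q -b experiment
echo "Testing with Git shelving" > testing
echo "for s in svc: print s" >> svctester.py
before="$(git rev-parse HEAD)"

# Shelve: add everything (new files included) and commit.
git add .
git commit -q -m "Shelving - Added print of svc and a test file"
shelved="$(git status --porcelain)"     # empty, the work is tucked away

# Unshelve: drop the commit but keep its contents.
git reset -q --soft HEAD^
after="$(git rev-parse HEAD)"
git status --short
```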
Go ahead and shelve again for later:

user# git commit -a -m "Shelving - Added print of svc and a test file"
    
Tip: Recent versions of Git include the ability to stash. It's pretty interesting, you should check it out.
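For the curious, here's a rough sketch of the same idea using stash instead of a throwaway commit. It assumes a Git new enough to support the -u option (which also stashes untracked files), and it runs in a scratch repository.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "svc = []" > svctester.py
git add .
git commit -q -m "Initial commit"

echo "Testing with Git shelving" > testing
echo "for s in svc: print s" >> svctester.py

git stash -u > /dev/null            # put everything on the shelf
during="$(git status --porcelain)"  # the working copy is clean again
git stash pop > /dev/null           # take it all back off the shelf
restored="$(cat testing)"
echo "restored: $restored"
```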

10. How do you publish a repository for other users?

Even though Git is a distributed system, it's still very nice to have a single place where commits happen. There are two main ways this happens:
  1. Pushing to a remote repository
  2. Asking others to pull from your repo
Let's quickly go over the basic flow of pushing, as it's the work flow that feels most natural to those [smart people] migrating from tools like svn/cvs/p4.
Here's how you would share a scripts repository:

# Create a home for all published repositories
user# sudo mkdir /var/git

# Give users in the gitusers group write access
user# sudo groupadd -f gitusers
user# sudo chown -R :gitusers /var/git
user# sudo chmod -R g+w /var/git

# Add people to the gitusers group
user# sudo gpasswd -a jmcfarlane gitusers
user# sudo gpasswd -a yourfriend gitusers

# Export a repository
user# git clone --bare /path/to/scripts-repo/.git /var/git/scripts.git

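# Create a post commit hook to maintain permissions
# (Git runs post-commit for local commits only; pushes into a bare
# repository run its post-receive and post-update hooks instead)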
user# cd /var/git/scripts.git/hooks
user# cp post-commit post-commit.ORIG
user# echo "chown -R :gitusers /var/git/scripts.git" >> post-commit
user# chmod +x post-commit

# Now someone can clone your newly published repo
friend# git clone ssh://hostname/var/git/scripts.git

# Let's go thru a super fast change process  :)
friend# git checkout -b cruft
friend# echo "foo-code" > cruft.bash
friend# git add .
friend# git commit -m "Initial commit of cruft.bash"
friend# git checkout master # Never edit files here directly
friend# git merge cruft # Pull in my changes
friend# git push # Push my changes to the server
friend# git branch -d cruft # Delete little branches when you can

# Pretend you already have a clone of the server too
user# cd ~/my/local/cloned/copy/of/scripts
user# git branch

# If I'm NOT on master, and have outstanding changes
user# git add .
user# git commit -m "Shelving my commits to pull from upstream"

# Pull down the new changes (from my friend)
user# git checkout master
user# git pull

# Go back to my branch, integrate the changes, and keep working
user# git checkout mybranch
user# git rebase master # This does all the magic
user# git reset HEAD^ # Unshelve
user# git log -n 1 # Validate my friend's change
user# git status # Damn that's sweet!
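You can rehearse the push and pull flow on a single machine before touching a real server. In this sketch, plain local paths under a scratch directory stand in for /var/git and the ssh URL, and the clone names friend and mine are made up for the demo.

```shell
work="$(mktemp -d)"

# The original repository that we want to publish.
git init -q "$work/scripts"
cd "$work/scripts"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "echo hello" > hello.bash
git add .
git commit -q -m "Initial commit"

# Publish it as a bare repository, then make two independent clones.
git clone -q --bare "$work/scripts" "$work/scripts.git"
git clone -q "$work/scripts.git" "$work/friend"
git clone -q "$work/scripts.git" "$work/mine"

# The friend works on a little branch, merges it, and pushes.
cd "$work/friend"
git config user.name "Friend"
git config user.email "friend@example.com"
main_branch="$(git symbolic-ref --short HEAD)"
git checkout -q -b cruft
echo "foo-code" > cruft.bash
git add .
git commit -q -m "Initial commit of cruft.bash"
git checkout -q "$main_branch"
git merge -q cruft
git push -q
git branch -q -d cruft

# Back in my own clone, pull down the friend's change.
cd "$work/mine"
git pull -q
pulled="$(cat cruft.bash)"
echo "pulled: $pulled"
```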
    

11. How to revert/undo changes

I'm a huge fan of svn revert. Here's how you do it in Git. Let's add some code:

user# git checkout -b revert # Shorthand to create and switch
user# mkdir test
user# echo "print 'test'" > test/test.py
user# git add .
user# git commit -a -m "Initial import of unit tests"
    
Now let's make a few changes:

user# echo "print 'wee'" >> test/test.py
user# mv test/test.py test/testing.py
    
Now let's revert those changes:
user# git checkout test
This will leave test/testing.py as an unversioned file that you need to manage yourself. If you really want to undo all changes you could just do:

user# rm -rf test
user# git checkout test
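Here's a sketch that shows exactly what is left behind after the checkout, run in a scratch repository. I've written the command as git checkout -- test; the -- just tells Git that test is a path and not a branch name, which matters if you ever have a branch called test.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir test
echo "print 'test'" > test/test.py
git add .
git commit -q -m "Initial import of unit tests"

# Make a mess: edit the file, then rename it behind Git's back.
echo "print 'wee'" >> test/test.py
mv test/test.py test/testing.py

git checkout -- test                  # bring back the committed test/test.py
restored="$(cat test/test.py)"
leftover="$(git status --porcelain)"  # the renamed copy is still lying around
echo "leftover: $leftover"
```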
    

12. How to retrieve a previous revision of a file or directory

Svn users are familiar with svn cat. With Git you fetch a previous version like this:

user# git show HEAD^^^:test/test.py
    
The command takes the usual style of revision, meaning you can use any of the following:
  1. HEAD + x number of ^ characters
  2. The SHA1 hash of a given revision
  3. The first few (maybe 5) characters of a given SHA1 hash
Tip: It's important to remember that when using git show you should always specify a path from the root of the repository, not from your current directory position.
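A tiny sketch of fetching an older version, run in a scratch repository with two made-up revisions of the file. Note that the path is still given from the repository root even though the command runs from inside the test directory.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir test
echo "print 'v1'" > test/test.py
git add .
git commit -q -m "First version"
echo "print 'v2'" > test/test.py
git commit -q -a -m "Second version"

cd test                                     # move into a subdirectory on purpose
previous="$(git show HEAD^:test/test.py)"   # one revision back
current="$(git show HEAD:test/test.py)"     # the latest committed version
echo "previous: $previous / current: $current"
```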

13. How to see what changed since a given date

Often you need to see what has changed in the last few days. Git is amazingly good at answering such a request. Any of the following are perfectly valid:

user# git whatchanged --since="yesterday"
user# git whatchanged --since="7/8/2007"
user# git whatchanged --since="yesterday at 4pm"
user# git whatchanged --since="last friday after 3pm"
user# git whatchanged --since="3 days ago"
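The same --since option also works with plain git log, which is what this sketch uses. To have something old to filter out, it backdates the first commit by setting the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables; the commit messages are made up for the demo.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# An old commit, backdated for the sake of the demo.
echo "one" > notes.txt
git add .
GIT_AUTHOR_DATE="2007-06-03 17:52:19 -0400" \
GIT_COMMITTER_DATE="2007-06-03 17:52:19 -0400" \
git commit -q -m "old change"

# A commit made right now.
echo "two" >> notes.txt
git commit -q -a -m "recent change"

# Only commits from the last three days are listed.
recent="$(git log --since="3 days ago" --format=%s)"
echo "$recent"
```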
    

14. How to view files changed per commit

For you svn users out there, you might be looking for how to also include the list of files changed with each commit, which for you is: svn log -v. With git you do it like so:
user# git log -r --name-status
The only tricky part is that [currently] --name-status is only in the git log man page as an example; it's not actually documented.
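Here's a sketch of what to expect, run in a scratch repository. Each file is listed under its commit with a status letter in front: A for added, M for modified. The --format=%s part is my own addition to keep the output down to the commit subject.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
touch setup.py svctester.py
git add .
git commit -q -m "Initial commit"
echo "svc = []" >> svctester.py
git commit -q -a -m "Added svc list"

# One subject line per commit, followed by its changed files.
report="$(git log --name-status --format=%s)"
printf '%s\n' "$report"
```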

15. How do you compare your changes with another branch?

If you want to compare your local branch with another one, just pass its name to diff:
user# git diff otherbranch

16. How do you create a post-commit hook?

They are pretty easy. See the section on how to publish a repository.
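If you just want to watch a hook fire, here's a minimal sketch in a scratch repository. The hook body (appending a line to a hook.log file) is made up for the demo. Keep in mind that post-commit only runs when you commit locally in that repository.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical hook body: append a line to hook.log after every commit.
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<EOF
#!/bin/sh
echo "commit recorded" >> "$repo/hook.log"
EOF
chmod +x .git/hooks/post-commit     # hooks must be executable to run

touch setup.py
git add setup.py
git commit -q -m "Initial commit"   # the hook runs right after this commit

hook_output="$(cat "$repo/hook.log")"
echo "$hook_output"
```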

Changelog:
Date               Description
07/16/2007 @ 23:30 Added steps for fetching file or dir by version, and fetching what has changed recently
12/01/2007 @ 11:38 Added git log -r --name-status
06/07/2008 @ 09:20 Added info on global config, sharing, and a few random tips
08/09/2008 @ 01:11 Added config setting to only use a pager when needed (Thanks jw)

This document was originally created on 07/16/2007


Conventions and tips for this howto document:
  1. scm = Source Control Management

Disclaimer:
This page is not endorsed by gentoo.org or any other cool cats. Any information provided in this document is to be used at your own risk.