Saturday 24 November 2012

Introduction To Git Concepts

This post is an introduction to (or a reminder of) Git concepts. It aims to ease the learning curve for those coming from a Subversion (or other) background. For more details, there is nothing like the official book.

Concepts

  • Git operates on repositories which contain a local database of files and corresponding file revisions.
  • Repositories contain files which can have 3 states:
    • Committed - The file is stored in the database.
    • Modified - The file has been modified, but has not been stored in the database.
    • Staged - The file has been modified and flagged for inclusion in the next commit.
  • A file can also be untracked by Git: it sits in the working directory, but Git keeps no history for it until it is added. The add command is used both to start tracking a file and to stage it.
  • When cloning or creating a Git repository in a local directory, this directory will contain:
    • A Git directory containing the local database and all meta information corresponding to the repository.
    • A staging area (also called the index): a file, kept inside the Git directory, recording what will be included in the next commit.
    • The remaining content is called the working directory, which contains files (extracted from the database) corresponding to a specific version of modifications stored in the database.
  • Typically, files are modified in the working directory, then staged for commit, then committed into the repository database.
  • It is possible to ignore files in a repository by listing them (or patterns matching them) in a .gitignore file in the working directory. Ignored files will not be stored in the local database.
  • It is possible to remove files from the Git repository, or to move them around in the working directory.
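
The file states described above can be observed with git status. The following is only a minimal sketch in a throwaway repository. The file names, the identity and the ignore pattern are made up for illustration.

```shell
# Walk one file through untracked -> staged -> committed -> modified.
# Everything happens in a temporary directory; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "first draft" > notes.txt
state_untracked=$(git status --porcelain notes.txt)   # "?? notes.txt"

git add notes.txt                                     # start tracking and stage
state_staged=$(git status --porcelain notes.txt)      # "A  notes.txt"

git commit -q -m "Add notes"
state_committed=$(git status --porcelain notes.txt)   # empty: nothing to report

echo "second draft" >> notes.txt
state_modified=$(git status --porcelain notes.txt)    # " M notes.txt"

# Ignored files are not reported as untracked.
echo "*.log" > .gitignore
echo "noise" > build.log
state_ignored=$(git status --porcelain build.log)     # empty
```

In the short status format, the first column describes the staging area and the second column the working directory, which is why a staged file and a modified file are reported differently.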

  • When cloning a repository locally, its origin (i.e., the original repository) is registered in the local repository as a remote repository. Several remote repositories can be attached to (added) and detached from (removed) your local Git repository.
  • One can fetch all data from a remote repository (including branches). This operation will not merge this data with your local work though.
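  • One can also pull all data from a remote repository, which is a fetch followed by an automatic merge into the current branch.
  • Pushing your repository content to a remote repository is like a pull in the other direction, except that no merging takes place on the remote side. The commits are transmitted and the remote branch is moved forward to them. If the remote branch contains commits you do not have locally, the push is rejected: you must first fetch and merge (or rebase) locally, then push again.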
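
To make the fetch / pull / push distinction concrete, here is a hedged sketch in which a bare repository on the local disk plays the role of the remote server. The directory names (server.git, alice, bob) and identities are invented for the example, and the branch name is pinned to master only to match this post.

```shell
# A bare repository stands in for the remote; two clones act as two users.
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git init -q alice
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"; git config user.email "alice@example.com"
echo one > readme.txt
git add readme.txt
git commit -q -m "First line"
git remote add origin "$work/server.git"     # attach a remote by hand
git push -q origin master

cd "$work"
git clone -q server.git bob                  # clone registers the remote itself
cd "$work/bob"
git config user.name "Bob"; git config user.email "bob@example.com"
remote_name=$(git remote)                    # origin

cd "$work/alice"
echo two >> readme.txt
git commit -q -am "Second line"
git push -q origin master

cd "$work/bob"
git fetch -q origin                          # downloads, working directory untouched
after_fetch=$(cat readme.txt)
git pull -q origin master                    # fetch + merge into the current branch
after_pull=$(cat readme.txt)

# Alice publishes again; Bob commits locally without pulling first.
cd "$work/alice"
echo three >> readme.txt
git commit -q -am "Third line"
git push -q origin master

cd "$work/bob"
echo bob > bob.txt
git add bob.txt
git commit -q -m "Bob's file"
if git push -q origin master 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected                       # remote has commits Bob lacks
fi
```

The last step illustrates that the remote does not merge on Bob's behalf: the push is refused until he integrates the new remote commits locally.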

  • One can create tags (of content at a specific version). Optionally, a tag can be annotated with information such as the tag creator, email, date and a message. It is possible to sign tags cryptographically too. Signed tags can be verified.
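
As an illustration, the difference between a plain tag and an annotated one shows up in the type of object Git stores. This is a small sketch with made-up tag names. The signing commands are left commented out because they assume a configured GPG key.

```shell
# Lightweight vs annotated tags in a throwaway repository (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Release candidate"

git tag v1.0-light                           # lightweight: only a name for a commit
git tag -a v1.0 -m "First release"           # annotated: stores tagger, date, message
# git tag -s v1.0-signed -m "Signed release" # signed: requires a GPG key
# git tag -v v1.0-signed                     # verifies the signature

light_type=$(git cat-file -t v1.0-light)     # commit
annotated_type=$(git cat-file -t v1.0)       # tag
```

Running git show v1.0 afterwards displays the tagger information and message stored with the annotated tag.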

  • One can create branches too. Technically speaking, these are pointers to a specific version in the repository database. The default branch is traditionally named master (written Master in this post).
  • In order to remember which branch you are working on, Git has a specific pointer called HEAD.
  • One can use the Git checkout command to switch back and forth between branches. This will update the content of the working directory accordingly.
  • Modifications made to files in different branches are recorded separately.
  • Once a branch has reached a stable level, it can be merged back to Master (or any other branch it came from). Then, it can be deleted.
  • Alternatively, a branch can be deleted without merging its content. The versions that only this branch contained become unreachable and are eventually discarded by Git, so this work should be considered lost.
  • Branches can evolve independently. If so, a merge operation will first find a common ancestor and create a new version (a merge commit) in Master (or any other target branch). This version will contain the modifications of both the target branch and the merged branch.
  • It is possible to work with remote branches from remote repositories too.
  • Rebasing is an alternative to merging. Instead of creating a merge version, it replays the commits of a branch on top of another branch (Master, for example), as if the work had been started from there. The resulting content is the same as with a merge, but the log history becomes linear: the replayed commits are new versions which replace the original ones. This functionality is mostly useful when there are multiple branches created from multiple branches in a Git repository. You may want to integrate a branch while keeping alive others having common ancestors.
  • Be careful with rebasing: because it rewrites commits, if other people were working on that branch (i.e., it is public) or on any sub-branch, their history will no longer match the rebased one and they will have to reconcile their work by hand.
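
The branch operations above can be tried side by side. The sketch below starts from one diverged history and integrates the same feature branch twice, once by merging and once by rebasing, on two scratch branches (merged and rebased). These branch names, the file names and the identity are assumptions made for the example.

```shell
# Diverge master and feature, then compare a merge with a rebase.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # pin the branch name used in this post
git config user.name "Example"
git config user.email "example@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature                   # create the branch and move HEAD to it
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature work"
current_branch=$(git symbolic-ref --short HEAD)   # feature

git checkout -q master                       # working directory follows the branch
files_on_master=$(ls)                        # base.txt only
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix on master"

# Merge: a new version with two parents joins both lines of work.
git checkout -q -b merged master
git merge -q --no-edit feature
merge_second_parent=$(git rev-parse HEAD^2)
feature_tip=$(git rev-parse feature)

# Rebase: the feature commit is replayed on top of master as a new commit.
git checkout -q -b rebased feature
git rebase -q master
rebased_parent=$(git rev-parse HEAD^)
rebased_tip=$(git rev-parse HEAD)
master_tip=$(git rev-parse master)

# A branch whose work has been merged can be deleted safely.
git checkout -q merged
git branch -d feature > /dev/null
remaining=$(git branch --list feature)       # empty
```

After the merge, the second parent of the new version is the tip of feature. After the rebase, the replayed commit sits directly on top of master and has a different identifier from the original, which is what causes trouble for anyone who already based work on the original.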

  • Fast forwarding is the process of moving a branch pointer forward without creating a merge version. For example, a branch A is created from Master and work is committed on A, while Master itself receives no new commit. Master then lags behind at an earlier version not containing the changes of A. Merging A back simply fast forwards the Master pointer to the latest version of A.
  • Stashing is the practice of saving unfinished work aside without committing it yet. This allows one to switch branches without committing work in progress.
  • Submodules are a means to import another Git project into your Git project while keeping the commits of both separate. This is useful when that other project is a library used in several Git projects.
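
A fast forward and a stash can both be observed in a few commands. This is a rough sketch with invented branch and file names. The submodule command is only shown as a comment, with placeholders, since it needs a second repository to point at.

```shell
# Fast-forward merge, then stash and restore work in progress.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "Topic work"

git checkout -q master
git merge -q --ff-only topic                 # master has no new commits: pointer just moves
master_tip=$(git rev-parse master)
topic_tip=$(git rev-parse topic)             # identical to master_tip

echo "half done" >> base.txt                 # unfinished work
git stash > /dev/null                        # put it aside without committing
dirty_after_stash=$(git status --porcelain)  # empty: working directory is clean
git checkout -q topic                        # free to switch branches
git checkout -q master
git stash pop > /dev/null                    # bring the unfinished work back
dirty_after_pop=$(git status --porcelain)    # " M base.txt"

# git submodule add <repository-url> <path>  # imports another project, commits stay separate
```

The --ff-only option makes the merge fail rather than create a merge version, which is a convenient way to check that a fast forward is actually possible.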

1 comment:

  1. Are you sure pushing causes merging on the Git server? I thought all merges were done in the working directory. That's why you can't push if there are changes on the remote.
