Christopher B. Browne's Home Page
cbbrowne@acm.org

Source Code Management

Christopher Browne


Table of Contents
1. Git

Tools for managing software source code are exceedingly useful. The most important of these are commonly known as "version control" tools, which track the history of modifications to a set of files.

1. Git

Git is a rather popular SCM system created by the developers of Linux. It has notably replaced BitKeeper as the tool used to manage Linux patches.

1.1. Cool Git Hacks

People are frequently using Git to do more or less novel things:

  • Remote Disk Usage Analysis using Git

  • git mail

  • DVCS-Autosync - A personal Dropbox replacement based on Git

    Dropbox has grown quite popular as a widely accessible service where you can "drop" your files. (Integration tools for mobile platforms are starting to appear, and this particular system is sadly not a replacement for those...)

    But it's not obvious that the operators of Dropbox are infinitely trustworthy in all ways. One might wish to run one's own file distribution service, and here is a nice example of such.

  • bup

    A backup system with a number of interesting qualities:

    • Breaks large files into chunks, detecting (much as rsync does) that successive, mostly-the-same copies of a file share most of their content, so the shared parts do not need to be copied again

    • Uses the Git packfile format for storage, so you can access data using Git methods

    • Writes packfiles directly, so it's faster than Git

    • Shares data automagically between backups.

      The really cool part is that this means that files that are identical are stored once, not multiple times, even though the source systems are unaware of each other. Thus, data in /usr/lib and /usr/share will tend to be "less redundant than expected", and the more sources you back up into one bup repository, the more redundancies, and hence, savings, you may expect to find.

      There is a problem with this - you can't trivially purge old data.

    • Can generate "par2 redundancy" data (Parchive recovery blocks), which allows recovery from corruption
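
The chunking and sharing described for bup can be illustrated with a small sketch. This is not bup's code: bup uses its own rsync-like rolling checksum and writes Git packfiles, whereas the sketch below swaps in a simple polynomial rolling hash and an in-memory dictionary. The names split_chunks and ChunkStore, and all the constants, are assumptions made for illustration only.

```python
import hashlib
import random

BASE = 1000003                        # multiplier for the polynomial rolling hash
MOD = 1 << 32                         # keep the hash within 32 bits
WINDOW = 48                           # number of bytes covered by the rolling window
MASK = (1 << 10) - 1                  # cut when the low 10 bits are all ones (~1 KiB chunks)
OUT_FACTOR = pow(BASE, WINDOW, MOD)   # weight of the byte that leaves the window


def split_chunks(data):
    """Split data at positions chosen by its content, not by fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD                        # bring the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD  # drop the oldest byte
        # Cut only once the chunk is at least WINDOW bytes long.
        if i + 1 - start >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                        # trailing partial chunk
    return chunks


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept exactly once."""

    def __init__(self):
        self.objects = {}             # SHA-1 hex digest -> chunk bytes

    def put(self, data):
        """Store data and return the list of chunk ids needed to rebuild it."""
        ids = []
        for chunk in split_chunks(data):
            key = hashlib.sha1(chunk).hexdigest()
            self.objects.setdefault(key, chunk)   # already-known chunks cost nothing
            ids.append(key)
        return ids

    def get(self, ids):
        """Reassemble the original bytes from a list of chunk ids."""
        return b"".join(self.objects[key] for key in ids)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(100000))
    edited = original[:50000] + b"a small edit" + original[50000:]
    store = ChunkStore()
    ids_a = store.put(original)
    ids_b = store.put(edited)
    print(len(ids_a), "chunks in first backup")
    print(len(set(ids_b) - set(ids_a)), "new chunks in second backup")
```

Because cut points depend only on the last few bytes seen, an insertion in the middle of a file disturbs the chunks around the edit and the rest line up again, so the second "backup" adds very little. The same store shared by several sources also explains the purge problem: a chunk can only be dropped once no backup refers to it.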

1.2. Document Archiving

  • git-annex

    git-annex allows managing files with git, without checking the file contents into git. This is useful for collections of music, videos, or documents, where it is not important to capture versions, but where you'd like to be able to migrate documents around, make sure there are multiple copies, and so on.

  • dspace.org - MIT/HP system for document archiving
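
The git-annex idea mentioned above, tracking a file without checking in its contents, can be sketched roughly as follows. This is a loose illustration and not git-annex's implementation: the function annex_add, the use of a bare SHA-256 digest as the key, and the directory layout are all hypothetical. The content moves into an object directory named by its hash, and a small symlink is left where the file used to be, so that only the symlink would need to be placed under version control.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def annex_add(path, store):
    """Move a file's content into store and leave a symlink in its place."""
    # Reads the whole file into memory; fine for a sketch, not for huge files.
    key = hashlib.sha256(path.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / key
    if target.exists():
        path.unlink()                          # content already stored once
    else:
        shutil.move(str(path), str(target))    # first copy: move it into the store
    # A relative link keeps working if the whole tree is moved elsewhere.
    path.symlink_to(os.path.relpath(target, path.parent))
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        song = root / "music" / "song.ogg"
        song.parent.mkdir()
        song.write_bytes(b"pretend this is a large audio file")
        annex_add(song, root / "annex" / "objects")
        print(song.name, "->", os.readlink(song))
```

Since the tracked link is tiny and names the content by hash, several clones could each record which copies of the content they hold without storing the large files in history.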
