Git's Guts

Let's explore the inside of git!

Daniel P. Wright

大久保　佳尚

@tataminomusi @danielpwright dpwright

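Hello. I'm Danny, a programmer here at Vitei. Today's theme is "Introduction to git", and we've been learning how to use git. While we're at it, I'd like to explain git's internals and how it is implemented.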

You don't actually need to know any of this to use git, but it's interesting. It can also help you track down the cause when something strange happens.

One request: what follows is a bit technical, and Japanese is not my native language. So whenever you think "I don't get it!", please ask. Don't hold back!

git

“The stupid content tracker”

追跡 (tsuiseki): tracking

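First, let's ask what git actually is. The git manual describes it like this: "the stupid content tracker". I like this definition. "Stupid" sounds like an insult, but in a sense it means "simple". git was built with the attitude: "I am as simple a system as possible, and using me well is your responsibility, the users!"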

So what are git's defining characteristics?

Distributed version control system

分散型 (bunsangata): distributed

Instead of one main server, everyone has their own repository. Repositories communicate with each other through pull and push. A server is just one more repository among them.

Lightweight branching model

軽量 (keiryō): lightweight

This was covered in detail in the previous session, but compared with other version control systems, git's branching model is very lightweight.

git add, git commit, git push

git pull, git fetch, git merge, git log, git rebase, git checkout, git reset, git diff,
git cherry-pick, git stash, git blame, git submodule, git branch, git show...

People say the CLI is hard to use and has too many commands! In fact, there are more than 160 of them. The ones listed here are only the everyday commands you use all the time.

However, the original git written by Linus Torvalds had only seven commands, and those seven could do everything git does! All seven together were written in about a thousand lines of C!

(Those commands were: update-cache (now update-index), show-diff (now diff-files), init-db, write-tree, read-tree, commit-tree, cat-file.)

Distributed, lightweight, with a complex but powerful and flexible command line. How do you build a system with features like these? What is git's secret? To find out, let's look inside git!

In fact

git is not a version control system.

It's a content-addressable filesystem.

A VCS is built on top of it, but the core of git is this content-addressable filesystem. ...That said, what exactly is a content-addressable filesystem?

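Traditional	Content-Addressable
You give data a name	The name is generated automatically from the data; the name stands for the data's contents
You read and write through the name	You retrieve data through the name; changing the data produces a new name for the changed data
Directories let you build a hierarchy	There is no inherent hierarchy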

連想記憶 (rensō kioku): content-addressable (literally "associative memory")
表す (arawasu): to represent

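First row: the name is effectively a one-to-one mapping of the content. The same content always gets the same name, and (barring hash collisions) different content gets a different name. In git's case the name is computed with the SHA1 algorithm, so people often say "SHA1" instead of "name" (e.g. a commit's SHA1, a file's SHA1).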

Second row: the original data is never overwritten, so you could also call it an immutable filesystem.

Third row: a content-addressable filesystem doesn't need a hierarchy, because different data can never be stored under the same name.
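To make "the name comes from the content" concrete, here is a small shell sketch that computes the name git would give a piece of data without calling git at all. It follows the way git forms object IDs: the SHA1 of a header `<type> <size>\0` followed by the raw bytes. The function names `object_id` and `blob_id_stdin` are made up for this example, and it assumes a `sha1sum` command (GNU coreutils or BusyBox) is installed.

```shell
# Hypothetical helpers for illustration; not part of git.
# object_id TYPE FILE: print the SHA1 git would use for FILE stored as TYPE.
object_id() {
  local type="$1" file="$2" size
  size=$(wc -c < "$file" | tr -d ' ')      # content length in bytes
  # Hash the header "<type> <size>\0" immediately followed by the content.
  { printf '%s %s\0' "$type" "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# blob_id_stdin: hash standard input as a blob, similar to
# `git hash-object --stdin` without -w (nothing is written anywhere).
blob_id_stdin() {
  local tmp
  tmp=$(mktemp)
  cat > "$tmp"
  object_id blob "$tmp"
  rm -f "$tmp"
}
```

`echo foo | blob_id_stdin` prints 257cc5642cb1a054f08cc83f2d943e56fd3ebe99, the same name git reports for `echo foo | git hash-object --stdin`. Changing even one byte of the input gives a completely different name, which is why storing a modified version never overwrites the original.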

Example: Traditional

        $ cd fs
        $ mkdir foo bar
        $ echo foo > foo/hoge
        $ ls -l foo            #See there is a file called 'hoge'
        $ cat foo/hoge         #See the contents of the file is 'foo'
        $ echo bar > bar/hoge
        $ ls -l bar
        $ cat bar/hoge
        $ tree
        $ echo baz >> foo/hoge
        $ cat foo/hoge         #Now "foo" is followed by "baz" on a new line; no way to restore the original
        $ cat bar/hoge

Example: Content-Addressable

        $ cd cafs
        $ git init             #We will be using git as our content-addressable fs
        $ git count-objects    #There are no objects
        $ echo foo | git hash-object -w --stdin
        $ git count-objects    #Now there is one object!
        $ git cat-file -p 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
        $ git cat-file -p 257c #As the number of objects increase you might need more characters...
        $ echo bar | git hash-object -w --stdin
        $ git count-objects    #Now there are two objects!
        $ git cat-file -p 5716
        $ echo `git cat-file -p 257c`baz | git hash-object -w --stdin
        $ git cat-file -p 9dab
        $ git count-objects    #Now there are three objects!
        $ find .git/objects -type f
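The `find` at the end of the example lists the loose object files. Each path is simply the object's SHA1 split after the first two hex digits (the file itself holds the zlib-compressed header and content, so `cat` on it shows binary). A sketch of the mapping in both directions; `sha_to_path` and `path_to_sha` are hypothetical names, and objects that `git gc` has packed live under `.git/objects/pack` instead and do not follow this layout. The `awk -F'/' '{ print $3$4 }'` trick used later in this talk does the same job as `path_to_sha`.

```shell
# sha_to_path SHA1: where git stores that loose object.
sha_to_path() {
  local sha="$1" rest prefix
  rest=${sha#??}              # everything after the first two characters
  prefix=${sha%"$rest"}       # ...so this is exactly the first two
  printf '.git/objects/%s/%s\n' "$prefix" "$rest"
}

# path_to_sha PATH: glue the last two path components back into the SHA1.
path_to_sha() {
  local file dir
  file=${1##*/}               # last component: 38 hex digits
  dir=${1%/*}
  dir=${dir##*/}              # second-to-last component: 2 hex digits
  printf '%s%s\n' "$dir" "$file"
}
```

`sha_to_path 257cc5642cb1a054f08cc83f2d943e56fd3ebe99` prints `.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99`, one of the files the `find` above lists.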

git's

core

is

just this!

But if this really were everything, git couldn't be used as a VCS, so it is built in layers. The lowest layer is the content-addressable filesystem. What do we need on top of it?

The answer is git's object model.

Objects come in

three

types

        # To get all object types...
        $ find .git/objects -type f | while read f; do o=$(echo $f | awk -F'/' '{ print $3$4 }'); echo $f $(git cat-file -t $o); done

Blob

All the objects we created earlier with hash-object were blobs
User data that you create and store yourself
Meaningless to git: just data
Binary or text, either way it becomes a blob

Tree

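An object for representing a file hierarchy
Used to take data out of the content-addressable filesystem and turn it back into a traditional filesystem
A tree points to any number of blobs and trees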

        $ cd gits-guts
        $ tree # This is the directory as seen through traditional fs eyes
        $ git cat-file -p 'HEAD^{tree}' # This is the tree object representing that directory, as recorded in the latest commit
        # Look at subdirectories, etc...
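A tree object is just a list of entries, each stored as `<mode> <name>\0` followed by the 20 raw bytes of the entry's SHA1 (mode 100644 for a regular file, 100755 for an executable, 40000 for a subdirectory). The sketch below builds that byte string and hashes it with a `tree <size>\0` header. `hex_to_bin` and `tree_id` are hypothetical names; the sketch assumes `sha1sum`, names without spaces, and input already sorted in git's order (git sorts entries by name, comparing a subdirectory as if its name ended in `/`).

```shell
# hex_to_bin HEX: write the raw bytes for a string of hex digits.
hex_to_bin() {
  local h="$1" pair
  while [ -n "$h" ]; do
    pair=${h%"${h#??}"}                     # first two hex digits
    h=${h#??}                               # the rest
    # Turn e.g. "25" into the octal escape "\045" and let printf emit that byte.
    printf "\\$(printf '%03o' "0x$pair")"
  done
}

# tree_id: read "<mode> <name> <sha1>" lines on stdin, print the tree's ID.
tree_id() {
  local tmp mode name hex
  tmp=$(mktemp)
  while read -r mode name hex || [ -n "$mode" ]; do
    printf '%s %s\0' "$mode" "$name" >> "$tmp"   # "<mode> <name>\0"
    hex_to_bin "$hex" >> "$tmp"                  # 20 raw SHA1 bytes
  done
  { printf 'tree %s\0' "$(wc -c < "$tmp" | tr -d ' ')"; cat "$tmp"; } \
    | sha1sum | cut -d' ' -f1
  rm -f "$tmp"
}
```

With no input, `tree_id` prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git's well-known empty tree. With git available you can cross-check other trees against `git mktree`, which reads `git ls-tree`-style lines and prints the ID of the tree it stores.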

Commit

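Contents

SHA1 of the top-level tree
SHA1(s) of the parent commit(s)
Author and authoring time
Committer and commit time
Commit message

A snapshot of one point in time
A commit points to exactly one tree
A commit points to any number of commits as its parents
A commit never points directly to blobs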

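Tree: no commit points to two or more trees; every commit points to exactly one. (Annotated tags are a different thing: they are stored as separate tag objects that point to another object, usually a commit, rather than to a tree.) Parents: a commit may have no parent at all, or two or more parents; a commit with two or more parents is a merge commit. Blobs: commits reach blobs only via trees.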

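These three object types are what gets stored in the content-addressable filesystem, but creating blobs and trees and storing them by hand would be tedious. That's what the next layer is for.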

        $ cd obj
        $ git cat-file -p 30047fba497d5a6bd3383d642c1e464461368199
        # Navigate from there...
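Because a commit is plain text, it is easy to show exactly what the fields listed above look like inside the object. The sketch prints a commit object's body and hashes it with the same `<type> <size>\0` header rule git uses for every object. `commit_body`, `object_id` and all argument values are hypothetical; real author and committer lines come from your configuration and the clock, and `sha1sum` is assumed to be installed.

```shell
# object_id TYPE FILE: SHA1 over "<type> <size>\0" followed by the file's bytes.
object_id() {
  local type="$1" file="$2" size
  size=$(wc -c < "$file" | tr -d ' ')
  { printf '%s %s\0' "$type" "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# commit_body TREE "Name <email>" UNIX_TIME TZ MESSAGE [PARENT...]
# Prints the text git stores inside a commit object.
commit_body() {
  local tree="$1" who="$2" when="$3" tz="$4" msg="$5" p
  shift 5
  printf 'tree %s\n' "$tree"
  for p in "$@"; do                  # none: root commit; two or more: merge commit
    printf 'parent %s\n' "$p"
  done
  printf 'author %s %s %s\n' "$who" "$when" "$tz"
  printf 'committer %s %s %s\n' "$who" "$when" "$tz"
  printf '\n%s\n' "$msg"             # a blank line separates headers from the message
}
```

For example, writing `commit_body "$tree" "A U Thor <author@example.com>" 1700000000 +0900 "Initial commit"` to a file and running `object_id commit` on it should give the same ID as `git hash-object -t commit` on that file.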

The Index

or

The Cache

or

The Staging Area

導く (michibiku): to lead, to guide

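A preparation area for making a commit
A cache of the working folder
It doesn't hold the file data itself. When you run git add, a blob is created and that blob's ID is recorded in the index.
git's speed depends heavily on the index, so its format has been changed many times for optimization. The versions in use today are 2, 3 and 4.
It has been an essential part of git from the very beginning. Trees are read into the index (read-tree) and written out from it (write-tree), rather than being built directly from the working folder.
Even if the whole working folder disappears, it can be restored from the index (for example with git checkout-index -a).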
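The index itself is a binary file, `.git/index`. According to git's index format documentation, it starts with a 12-byte header: the signature `DIRC`, a 32-bit version number and a 32-bit count of entries, both big-endian. This sketch reads only that header; `be32` and `index_header` are hypothetical names, and everything after the header (the per-file entries and extensions) is ignored.

```shell
# be32 FILE OFFSET: print the big-endian unsigned 32-bit integer at OFFSET.
be32() {
  # od prints the four bytes as decimal numbers; reuse them as $1..$4.
  set -- $(od -An -tu1 -j "$2" -N 4 "$1")
  [ $# -eq 4 ] || return 1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# index_header [FILE]: print the version and entry count of an index file.
index_header() {
  local f="${1:-.git/index}"
  if [ "$(head -c 4 "$f")" != "DIRC" ]; then
    echo "not an index file: $f" >&2
    return 1
  fi
  echo "version=$(be32 "$f" 4) entries=$(be32 "$f" 8)"
}
```

Run inside a repository right after `git update-index --add README.md` (as in the demo later), `index_header` should report a single entry.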

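That's about all of git's basic data, but one thing is still inconvenient: retrieving data by ID. Sure, once you have a tree you can get files by filename, but how do you get the tree? If you have a commit you can get its tree, but how do you get the commit? To navigate git's structure, we need the next layer.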

Ref

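Branches and tags are both kinds of ref.
The difference: when you commit, the active branch moves forward automatically.
A tag always keeps pointing at the same commit.
Refs are files in the .git/refs folder. The filename is the branch or tag name, and the content is the commit's SHA1.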

      $ cd refs
      $ git --no-pager log --oneline --decorate --graph
      # Point out tags, branches, remotes...
      $ tree .git/refs
      # See the same things are all there...
      $ cat .git/refs/heads/master
      # See it lists the sha1 of the master commit...
      $ vim README.md # Change it...
      $ git commit -a -m "Changed"
      $ git --no-pager log --oneline --decorate --graph
      $ cat .git/refs/heads/master
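Because refs are plain files, following one takes only a few lines of shell. The sketch below resolves a name such as `HEAD` or `refs/heads/master`: if the file starts with `ref: ` (a symbolic ref, which is how `.git/HEAD` normally names the current branch) it follows that, otherwise it prints the SHA1. Git can also move refs into a single `.git/packed-refs` file (for example during `git gc`), so the sketch falls back to that. `resolve_ref` is a hypothetical name; real git handles more cases, and `git rev-parse` is the proper tool for this job.

```shell
# resolve_ref GITDIR REF: print the SHA1 that REF ultimately points to.
resolve_ref() {
  local gitdir="$1" ref="$2" content
  if [ -f "$gitdir/$ref" ]; then
    content=$(cat "$gitdir/$ref")
    case "$content" in
      "ref: "*) resolve_ref "$gitdir" "${content#ref: }" ;;  # symbolic ref: follow it
      *)        printf '%s\n' "$content" ;;                   # plain ref: a SHA1
    esac
  elif [ -f "$gitdir/packed-refs" ]; then
    # packed-refs lines are "<sha1> <refname>"; comment lines start with '#'
    # and peeled-tag lines start with '^', so neither matches on field 2.
    awk -v r="$ref" '$2 == r { print $1; found = 1; exit } END { exit !found }' \
      "$gitdir/packed-refs"
  else
    return 1
  fi
}
```

In the demo above, `resolve_ref .git HEAD` should print the same SHA1 as `cat .git/refs/heads/master`.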

配管 (haikan)

Plumbing

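Commands that work directly on git's content-addressable filesystem and object model.
Awkward as a user interface, but their output is easy for programs to read, which makes them easy to script.
Most of the original seven commands were plumbing commands.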
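As an example of how scriptable plumbing output is, here is a sketch that reads the "raw" diff format printed by `git diff-files` (used in the demo later): each line is `:<src mode> <dst mode> <src sha1> <dst sha1> <status>`, a TAB, then the path. `parse_raw_diff` is a hypothetical name. It only handles simple statuses such as A, D and M; renames and copies add a score and a second path, and unusual file names get quoted, so for anything serious the `-z` output of `git diff-files` is the safer input.

```shell
# parse_raw_diff: read raw diff lines on stdin, print "<status> <path> <src> -> <dst>".
parse_raw_diff() {
  local tab meta path
  tab=$(printf '\t')
  while IFS="$tab" read -r meta path; do
    # Drop the leading ':' and split the metadata on spaces:
    # $1 src mode, $2 dst mode, $3 src sha1, $4 dst sha1, $5 status
    set -- ${meta#:}
    printf '%s %s %s -> %s\n' "$5" "$path" "$3" "$4"
  done
}
```

Piping `git diff-files` into `parse_raw_diff` prints one line per changed file, for example `M README.md <index blob> -> 000...` while the new version has not been added to the index yet.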

磁器 (jiki)

Porcelain

Commands designed as a user interface.
git add, git commit, git push, etc.
You probably know them already.

update-index (formerly update-cache)	apply checkout-index hash-object index-pack	merge-file merge-index mktag mktree
diff-files (formerly show-diff)	pack-objects prune-packed symbolic-ref unpack-objects	update-ref diff-index diff-tree for-each-ref
init-db*	ls-files ls-remote ls-tree merge-base	name-rev pack-redundant rev-list show-index
write-tree	show-ref tar-tree unpack-file var	verify-pack daemon fetch-pack http-backend
read-tree	send-pack update-server-info http-fetch http-push	parse-remote receive-pack shell upload-archive
commit-tree	upload-pack check-attr check-ref-format column	credential credential-cache credential-store fmt-merge-msg
cat-file	mailinfo mailsplit merge-one-file patch-id	peek-remote sh-i18n sh-setup stripspace

init-db is actually a porcelain command, but since it was one of the original seven, I listed it on this side (marked with *).

init	am archive bisect	bundle cherry-pick citool	clean describe diff	merge
clone	fetch format-patch gc	grep gui mv	notes revert rm	rebase
add	shortlog show submodule	tag gitk config	fast-export fast-import filter-branch	reset
commit	lost-found mergetool pack-refs	prune reflog relink	remote repack replace	log
push	repo-config annotate blame	cherry count-objects difftool	fsck get-tar-commit-id help	stash
pull	instaweb merge-tree rerere	rev-parse show-branch verify-tag	whatchanged gitweb archimport	checkout
status	cvsexportcommit cvsimport cvsserver	imap-send p4 quiltimport	request-pull send-email svn	branch

        # OK, some examples of how to do "normal git stuff" with just the original 7 commands!
        $ cd cmds

        # Initialise git.  This is synonymous with `git init`
        $ git init-db
        $ find .git/objects -type f

        # Let's create some content and add it to the staging area
        $ vim README.md # Write some stuff
        $ git update-index --add README.md

        # The blob file is made automatically when you add it to the index
        $ find .git/objects -type f
        $ find .git/objects -type f | awk -F'/' '{ print $3$4 }' | while read o; do echo $o $(git cat-file -t $o); done

        # We need to make a tree, though...
        $ git write-tree
        $ find .git/objects -type f | awk -F'/' '{ print $3$4 }' | while read o; do echo $o $(git cat-file -t $o); done
        $ git cat-file -p (SHA1)

        # OK, now we actually want to take a tree and make it into a commit object
        $ git commit-tree (SHA1) -m "Initial commit"
        $ git cat-file -p (SHA1)
        $ find .git/objects -type f | awk -F'/' '{ print $3$4 }' | while read o; do echo $o $(git cat-file -t $o); done

        # Let's cheat a bit and use the log to see if it worked
        $ git log

        # "bad default revision 'HEAD'"? (Newer versions of git instead say the
        # current branch does not have any commits yet.) What's that all about?
        $ tree .git

        # There's a file called HEAD in there; what's in it?
        $ cat .git/HEAD

        # OK, but there aren't any refs yet... let's try just putting the SHA1
        # in the file that HEAD names
        $ echo (SHA1) > .git/refs/heads/master
        $ git log

        # Awesome! It works.  Let's make a change.
        $ vim README.md #Change something
        $ git diff-files

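        # What does that all mean? Left to right:
        #   mode(src)       file mode recorded in the index, e.g. 100644
        #   mode(dst)       file mode in the working folder
        #   blob-sha1(src)
        #   blob-sha1(dst)
        #   status (A added, D deleted, M modified, etc)
        #   filename
        # The dst SHA1 is all zeros because the working-folder version hasn't
        # been hashed into a blob yet (it hasn't been added to the index)!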
        $ git update-index README.md
        $ git diff-files
        $ find .git/objects -type f | awk -F'/' '{ print $3$4 }' | while read o; do echo $o $(git cat-file -t $o); done
        $ git write-tree

        # This time we want to write our commit message in a proper text editor
        $ vim .git/COMMIT_EDITMSG #Write it
        $ git commit-tree (SHA1) -p (LAST_SHA1) -F .git/COMMIT_EDITMSG
        $ git log # Master hasn't been updated!

        # OK, so commit-tree doesn't update any refs because refs exist in a
        # layer above these seven commands.  But we can easily update it
        # automatically...
        $ vim README.md # Change it
        $ git update-index README.md
        $ vim .git/COMMIT_EDITMSG
        $ git commit-tree $(git write-tree) -p (LAST_SHA1) -F .git/COMMIT_EDITMSG > .git/refs/heads/master

        # Hmm, wouldn't it be easier to wrap this process up into a script?
        $ vim git-commit
            #!/bin/bash

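            set -e    # stop at the first failing command
            TREE=$(git write-tree)
            HEAD=$(cat .git/refs/heads/master)

            echo "" > .git/COMMIT_EDITMSG
            vim .git/COMMIT_EDITMSG
            # Create the commit first, then move master, so a failure can't leave master empty
            COMMIT=$(git commit-tree "$TREE" -p "$HEAD" -F .git/COMMIT_EDITMSG)
            echo "$COMMIT" > .git/refs/heads/master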

        $ chmod +x git-commit
        $ git update-index --add git-commit
        $ ./git-commit
        $ git log
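The `git-commit` script above always commits on master. Here is a hypothetical variant that instead follows `.git/HEAD` to whichever branch is checked out, still using only `write-tree` and `commit-tree` for the git work. The function names are invented for this sketch. It assumes the branch is a loose ref under `.git/refs` (a branch that exists only in `.git/packed-refs` would wrongly be treated as having no commits yet), and it uses `$EDITOR`, falling back to vim.

```shell
# head_target [GITDIR]: print the ref file HEAD points to (e.g. refs/heads/master),
# or "HEAD" itself when HEAD is detached and holds a SHA1 directly.
head_target() {
  local content
  content=$(cat "${1:-.git}/HEAD")
  case "$content" in
    "ref: "*) printf '%s\n' "${content#ref: }" ;;
    *)        printf '%s\n' HEAD ;;
  esac
}

# commit_on_current_branch: commit the index on whatever branch HEAD names.
commit_on_current_branch() {
  local target tree commit
  target=$(head_target .git) || return 1
  tree=$(git write-tree) || return 1
  : > .git/COMMIT_EDITMSG
  "${EDITOR:-vim}" .git/COMMIT_EDITMSG
  if [ -f ".git/$target" ]; then
    commit=$(git commit-tree "$tree" -p "$(cat ".git/$target")" -F .git/COMMIT_EDITMSG) || return 1
  else
    # The branch has no commits yet, so this is a root commit with no parent.
    commit=$(git commit-tree "$tree" -F .git/COMMIT_EDITMSG) || return 1
  fi
  mkdir -p "$(dirname ".git/$target")"
  printf '%s\n' "$commit" > ".git/$target"   # move the branch (or detached HEAD)
}
```

Writing the new commit ID only after `commit-tree` succeeds keeps the branch file from being truncated when something goes wrong, the same concern as in the script above.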