Git Internals - Git Objects
Here we follow the famous Git Internals - Git Objects tutorial. The example below
is meant to be complementary to it and in no way a substitute. It is assumed that, as a
preliminary step, an empty git repository has been created (git init).
Git blob
A blob is an object that stores data. This data could be, e.g., the content of a file or, more generally, a sequence of raw bytes with no specific structure or interpretation. A blob is identified by the SHA-1 hash of its data (prefixed by a short header that records the object type and the size of the data).
echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
The git hash-object command above outputs the SHA-1 hash computed for the 'test content'
string and, because of the -w flag, registers a blob in the git object store. Let us visualize it:
git dag -B --blobs-standalone
The above figure depicts one blob whose full hash is displayed in the tooltip (the meaning of “Standalone Blobs & Trees” will become clear shortly).
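To see where this hash comes from, the following sketch reproduces it without writing anything to the object store. It relies on the fact that git hashes a short header (the object type, the size of the content in bytes and a NUL byte) followed by the content itself. The use of sha1sum (GNU coreutils) is an assumption about the environment; any other SHA-1 tool would do.

```shell
# echo appends a newline, so the content 'test content\n' is 13 bytes long.
# Git hashes the header "blob 13" + NUL byte, followed by the content.
manual_hash=$(printf 'blob 13\0test content\n' | sha1sum | cut -d ' ' -f 1)

# Without -w, git hash-object only computes the hash (no repository is needed).
git_hash=$(echo 'test content' | git hash-object --stdin)

echo "$manual_hash"   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
echo "$git_hash"      # same value
```

Both values agree with the hash printed by git hash-object -w above.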
Following the tutorial, we create another blob, this time from a file (test.txt):
echo 'version 1' > test.txt
git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
git dag -B --blobs-standalone
Next, update the content of the test.txt file and register it in the git object
store:
echo 'version 2' > test.txt
git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
git dag -B --blobs-standalone
Note that the 83baae6 and 1f7a7a4 blobs do not contain information related to
the name of the file (test.txt) whose data they store.
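A quick way to check this is to store the same content under two different file names and compare the resulting hashes. The sketch below works in a throwaway repository created with mktemp; the second file name is a hypothetical one chosen for illustration.

```shell
# Work in a throwaway repository so that no real data is touched.
cd "$(mktemp -d)" && git init -q

echo 'version 1' > test.txt
cp test.txt another-name.txt   # hypothetical second file with the same content

hash_a=$(git hash-object -w test.txt)
hash_b=$(git hash-object -w another-name.txt)
echo "$hash_a"
echo "$hash_b"

# -t prints the object type, -p pretty-prints the stored content.
blob_type=$(git cat-file -t "$hash_a")
blob_content=$(git cat-file -p "$hash_a")
echo "$blob_type: $blob_content"
```

Both files map to the same blob (83baae6), and git cat-file shows only the content, with no trace of either file name.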
Git tree
A git tree object groups blobs and other trees together (much like a directory groups files and other directories). A tree object is normally created from the state of the staging area:
git update-index --add --cacheinfo 100644 83baae61804e65cc73a7201a7252750c76066a30 test.txt
git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
Like blobs, trees are identified by the SHA-1 hash of the data they contain.
git dag -T -B --trees-standalone --blobs-standalone
The tooltip of the 83baae6 blob now shows the actual name of the file whose data it
stores (the name has been retrieved from the containing tree object).
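The mapping from names to blobs can also be inspected directly with git cat-file. The following is a minimal sketch that rebuilds the same tree in a throwaway repository (the mktemp directory is an assumption made for illustration) and prints its entries.

```shell
cd "$(mktemp -d)" && git init -q

echo 'version 1' > test.txt
blob=$(git hash-object -w test.txt)

# Stage the blob under the name test.txt and write the index out as a tree.
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)
echo "$tree"   # d8329fc1cc938780ffdd9f94e0d364e0ea74f579

# Each tree entry lists a mode, an object type, a hash and a name.
git cat-file -p "$tree"
tree_names=$(git ls-tree --name-only "$tree")
```

The file name lives in the tree entry, not in the blob, which is why the same blob can appear under different names in different trees.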
In a similar way, we can create another tree that contains the second version of
test.txt as well as a new file:
echo 'new file' > new.txt
git update-index --add --cacheinfo 100644 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
git update-index --add new.txt
git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
git dag -T -B --trees-standalone --blobs-standalone
Next we create a tree that contains another tree:
git read-tree --prefix=bak d8329fc
git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
git dag -T -B --trees-standalone --blobs-standalone
Similar to a blob, a tree does not include information about its own name. However, if
it is contained in another tree, its name can be retrieved (see the tooltip of d8329fc
and compare it with the tooltips of 3c4e9cd and 0155eb4).
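The nesting can be listed recursively with git ls-tree. The sketch below is a simplified variant of the steps above (only one file, in a throwaway repository), so the hash of its outer tree differs from 3c4e9cd.

```shell
cd "$(mktemp -d)" && git init -q

echo 'version 1' > test.txt
git update-index --add test.txt
inner_tree=$(git write-tree)

# Read the first tree back into the index under the bak prefix,
# then write a new tree that contains it as a sub-tree.
git read-tree --prefix=bak "$inner_tree"
outer_tree=$(git write-tree)

# -r recurses into sub-trees, -t also prints the sub-tree entries themselves.
git ls-tree -r -t "$outer_tree"
names_recursive=$(git ls-tree -r --name-only "$outer_tree")
```

The name bak appears only as an entry of the outer tree; the inner tree object itself (d8329fc) is unchanged.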
Git commit
A commit object records who created a given tree, when and why, and which parent commit(s) it descends from. Each commit has exactly one associated tree (which may of course contain sub-trees).
# the variables are exported so that the git commands below can see them
export GIT_AUTHOR_NAME="First Last"
export GIT_AUTHOR_EMAIL="first.last.mail.com"
export GIT_COMMITTER_NAME="Nom Prenom"
export GIT_COMMITTER_EMAIL="nom.prenom@mail.com"
# by fixing the author and committer dates as well, we have reproducible commit hashes
export GIT_AUTHOR_DATE="01/01/25 09:00 +0100"
export GIT_COMMITTER_DATE="01/01/25 09:00 +0100"
SHA_FIRST_COMMIT=$(echo 'First commit' | git commit-tree d8329fc)
SHA_SECOND_COMMIT=$(echo 'Second commit' | git commit-tree 0155eb4 -p $SHA_FIRST_COMMIT)
SHA_THIRD_COMMIT=$(echo 'Third commit' | git commit-tree 3c4e9cd -p $SHA_SECOND_COMMIT)
echo $SHA_FIRST_COMMIT
echo $SHA_SECOND_COMMIT
echo $SHA_THIRD_COMMIT
fa26b470d9508bebe2029623de8770215ebb26a0
03c5025d075bbe625608593e3bf4671daebebcc4
aa6ef7bc380e3e98362b4276d24b8046b1f4f758
git dag -T -B -u -n 0 --trees-standalone --blobs-standalone
As with blobs and trees, commits are identified by the SHA-1 hash of the data they
contain (see their tooltips). Our three commits are currently unreachable from any
branch or tag (this is due to the nature of the plumbing command git commit-tree
that was used to create them). Furthermore, they do not even appear in the reflog;
because of this, their associated trees and blobs are included in the “Standalone Blobs
& Trees” cluster [1].
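The content of a commit object can be printed in the same way as that of a blob or a tree. The following sketch uses a throwaway repository and hypothetical example.com e-mail addresses, so its commit hashes differ from the ones listed above.

```shell
cd "$(mktemp -d)" && git init -q

# Hypothetical identities, exported so that git commit-tree can see them.
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
tree=$(git write-tree)

first=$(echo 'First commit' | git commit-tree "$tree")
second=$(echo 'Second commit' | git commit-tree "$tree" -p "$first")

commit_type=$(git cat-file -t "$second")   # commit
# Prints the tree, parent, author and committer lines, then the message.
git cat-file -p "$second"

# git commit-tree creates no branch or tag, so the list of refs is empty.
refs=$(git for-each-ref)
```

The parent line of the second commit carries the hash of the first one, which is how the history is chained together.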
Git tag
A tag is a label (with additional metadata) assigned to a particular point in the git history. This is the fourth (and last) type of git object, the other ones being blobs, trees and commits.
git tag first-commit -m "First commit" $SHA_FIRST_COMMIT
git dag -T -B -t -u -n 0 --trees-standalone --blobs-standalone
The colour of the first commit has changed as it is now reachable through our (annotated) tag. Because of this, its child tree and blob are no longer considered standalone.
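That an annotated tag is an object of its own can be checked with git cat-file. The sketch below recreates a single commit in a throwaway repository (hypothetical example.com identities) and tags it as above.

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
commit=$(echo 'First commit' | git commit-tree "$(git write-tree)")

# -m makes the tag annotated, i.e. a tag object is stored in the object store.
git tag first-commit -m "First commit" "$commit"

tag_type=$(git cat-file -t first-commit)           # tag
tag_object=$(git rev-parse first-commit)           # hash of the tag object
peeled=$(git rev-parse 'first-commit^{commit}')    # hash of the tagged commit

# Prints the object, type, tag and tagger lines, then the tag message.
git cat-file -p first-commit
```

The tag name resolves to the tag object, which in turn points to the commit.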
Branch
A branch is a label for the most recent commit of a given line of development. Unlike blobs, trees, commits and (annotated) tags, it is not a git object.
git branch main $SHA_THIRD_COMMIT
git dag -T -B -H -l -t -u --blobs-standalone
Note that the tooltip of the main branch is -> None; this implies that it doesn’t
track any remote branch.
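Both properties can be checked from the command line: a branch resolves directly to a commit hash, and asking for its upstream fails when none is configured. This is a minimal sketch in a throwaway repository with hypothetical example.com identities.

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
commit=$(echo 'First commit' | git commit-tree "$(git write-tree)")

git branch main "$commit"

# A branch is a reference: its name simply resolves to a commit hash.
branch_target=$(git rev-parse main)
git for-each-ref --format='%(refname) -> %(objectname)' refs/heads

# main@{upstream} cannot be resolved when no remote branch is tracked.
if git rev-parse --abbrev-ref 'main@{upstream}' >/dev/null 2>&1; then
  upstream="configured"
else
  upstream="none"
fi
echo "upstream: $upstream"
```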
Let us reset the main branch to point to the second commit. The third commit is now unreachable from any branch or tag; however, it is still reachable from the reflog (which records the reset operation), and thus its tree and blobs are not considered standalone.
git reset $SHA_SECOND_COMMIT
git dag -T -B -H -l -t -u --blobs-standalone
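The reflog entry created by the reset can be inspected as sketched below. The example uses a throwaway repository with two commits (hypothetical names and example.com identities) and assumes that HEAD refers to main, which is set explicitly here to be independent of the configured default branch name.

```shell
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main   # make sure HEAD refers to main
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
tree=$(git write-tree)
older=$(echo 'Older commit' | git commit-tree "$tree")
newer=$(echo 'Newer commit' | git commit-tree "$tree" -p "$older")

git branch main "$newer"
git reset -q "$older"

current=$(git rev-parse main)          # where main points now
previous=$(git rev-parse 'main@{1}')   # where main pointed before the reset

# Lists the reset and the branch creation, most recent entry first.
git reflog show main
```

The newer commit is no longer reachable from main, but the reflog still remembers it as main@{1}.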
If we check out the second commit, HEAD becomes detached (HEAD now points directly to the second commit, and its box in the visualized DAG has a border).
git checkout $SHA_SECOND_COMMIT
git dag -T -B -H -l -t -u --blobs-standalone
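Whether HEAD is attached to a branch or detached can be queried with git symbolic-ref, which fails when HEAD is not a symbolic reference. The following is a sketch in a throwaway repository with hypothetical example.com identities.

```shell
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main   # make sure HEAD refers to main
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
commit=$(echo 'First commit' | git commit-tree "$(git write-tree)")
git branch main "$commit"

head_before=$(git symbolic-ref -q HEAD)   # refs/heads/main

# Checking out a commit hash (rather than a branch name) detaches HEAD.
git checkout -q "$commit"

if head_ref=$(git symbolic-ref -q HEAD); then
  head_after="$head_ref"
else
  head_after="detached"
fi
head_commit=$(git rev-parse HEAD)
echo "before: $head_before, after: $head_after"
```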
Lightweight tag
Adding a lightweight tag to point to the currently unreachable third commit makes it reachable again (lightweight tags are colour-coded differently from annotated tags and their tooltip is different as well).
git tag third-commit $SHA_THIRD_COMMIT
git dag -T -B -H -l -t -u --blobs-standalone
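The difference between the two kinds of tags also shows up in the type of object each tag name resolves to. The sketch below uses a throwaway repository; the tag names annotated and lightweight are hypothetical.

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME="First Last" GIT_AUTHOR_EMAIL="first.last@example.com"
export GIT_COMMITTER_NAME="Nom Prenom" GIT_COMMITTER_EMAIL="nom.prenom@example.com"

echo 'version 1' > test.txt
git update-index --add test.txt
commit=$(echo 'First commit' | git commit-tree "$(git write-tree)")

git tag -m "Annotated tag" annotated "$commit"   # stores a tag object
git tag lightweight "$commit"                    # only creates a reference

annotated_type=$(git cat-file -t annotated)       # tag
lightweight_type=$(git cat-file -t lightweight)   # commit
lightweight_target=$(git rev-parse lightweight)

git for-each-ref --format='%(refname:short) %(objecttype)' refs/tags
```

A lightweight tag is thus comparable to a branch that does not move: a name that points directly at a commit.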
In the end, we have only one standalone blob left. Visualizing trees and blobs is
reasonable only for educational purposes and for small repositories. Skipping them
results in the following (here, the -u flag is superfluous as there are no unreachable commits):
git dag -H -l -t -u