Git Internals: How Git Actually Works Under the Hood

Most developers use Git daily but treat it as a black box. They memorize commands without understanding what happens underneath. This leads to panic during rebases, confusion during merge conflicts, and fear of git reflog.

Once you understand Git internals, the scary commands become ordinary data-structure operations. A branch is a movable reference. A commit is an immutable object in a graph. The index is the next snapshot being prepared. The reflog is a local record of where references used to point. This guide walks through those pieces and shows how Git stores, names, moves, compresses, and recovers your code.

The Four Pieces Git Manages

The official Git data model describes four core kinds of repository data:

Objects: blobs, trees, commits, and annotated tag objects.
References: branches, tags, remote-tracking branches, HEAD, and other names that point to objects or other refs.
The index: the staging area, stored mostly as a flat list of paths and object IDs.
Reflogs: local logs that record updates to refs so you can recover earlier positions.

Everything else Git does - branching, merging, rebasing, checkout, reset, garbage collection, fetch, push - is built on those pieces.

Git Is a Content-Addressable Filesystem

At its core, Git is a content-addressable object database. You give Git content and an object type; Git computes an object ID from the type, size, and content. In traditional repositories that ID is SHA-1. Newer repositories can be initialized with SHA-256, so it is more accurate to say object ID than to assume every repository is SHA-1 forever.

# Store stdin as a blob object and print its object ID
echo "Hello, Git" | git hash-object --stdin -w
# Example output: 3fa0d1ac21b29b96ee682541d4be0b3a0a89f5af

# Ask Git what type that object is
git cat-file -t 3fa0d1a
# Output: blob

# Pretty-print the object's contents
git cat-file -p 3fa0d1a
# Output: Hello, Git

The important detail: the filename is not part of a blob's identity. Two files with identical bytes share the same blob object. The path, mode, and directory structure live in tree objects.
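As a rough illustration of how that ID is derived, the sketch below reproduces a SHA-1 blob ID in Python using only the standard library. The helper name blob_id is made up for this example, and it covers SHA-1 repositories only.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with these bytes."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the hashed bytes include the trailing "\n".
print(blob_id(b"test content\n"))

# No filename is passed in anywhere: identical bytes always give identical IDs.
print(blob_id(b"same content\n") == blob_id(b"same content\n"))
```

Because the type and size are part of the hashed bytes, a blob and a tree with the same payload would still get different IDs.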

The Four Object Types

1. Blob: File Content

A blob stores file content. It does not store the filename, path, permissions, timestamps, owner, or commit message. If README.md and docs/intro.md contain exactly the same bytes, Git can point both tree entries at the same blob.

# The same content produces the same blob ID
echo "same content" | git hash-object --stdin
echo "same content" | git hash-object --stdin

2. Tree: Directory Listing

A tree represents a directory. It maps names to blobs, subtrees, or submodule commits, and it stores Git's limited file modes such as regular file, executable file, symlink, directory, and gitlink.

# View the root tree of the latest commit
git cat-file -p HEAD^{tree}

# Example:
# 100644 blob a1b2c3d4...   README.md
# 100644 blob e5f6a7b8...   package.json
# 040000 tree 1a2b3c4d...   src
# 160000 commit 9f8e7d6c... vendor/library  # submodule gitlink

3. Commit: Snapshot Plus History

A commit points to a top-level tree, zero or more parent commits, author and committer metadata, and a message. A normal commit has one parent, the first commit has none, and a merge commit has two or more. Git does not store a commit as a diff; when you ask for a diff, Git compares the commit's tree with its parent tree on demand.

git cat-file -p HEAD

# Example:
# tree 4b825dc642cb6eb9a060e54bf899d4e239f3b764
# parent 8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d2e1f0a9b
# author Vishal Anand <vishal@example.com> 1714233600 +0530
# committer Vishal Anand <vishal@example.com> 1714233600 +0530
#
# Add user authentication module
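The commit ID is derived from that text in the same way as other objects. The following sketch builds a commit body and hashes it; commit_id is a hypothetical helper, the identity string is a placeholder, and extras such as GPG signatures or encoding headers are left out.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Hash a commit body built from its tree, parents, identities, and message."""
    lines = ["tree " + tree]
    lines += ["parent " + parent for parent in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder identity and tree ID, used only for this illustration.
ident = "A U Thor <author@example.com> 1714233600 +0000"
tree = "4b825dc642cb6eb9a060e54bf899d4e239f3b764"

root = commit_id(tree, [], ident, ident, "Initial commit")
child = commit_id(tree, [root], ident, ident, "Second commit")
reworded = commit_id(tree, [root], ident, ident, "Second commit, reworded")

print(root, child, reworded, sep="\n")
```

Changing the message, a timestamp, or a parent changes the hashed bytes, which is why an amended or rebased commit always gets a new ID.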

4. Annotated Tag Object: A Named Release Record

Lightweight tags are refs that point directly at an object. Annotated tags are different: they create a real tag object that points at another object, records the tagger and date, and stores a tag message. This is why signed release tags carry more information than a simple branch-like pointer.

git tag -a v1.0.0 -m "Release v1.0.0"
git cat-file -p v1.0.0

# object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
# type commit
# tag v1.0.0
# tagger Vishal Anand <vishal@example.com> 1714233600 +0530
#
# Release v1.0.0

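Figure: Git object graph. The tag object v1.0.0 (with its message) points to a commit. The commit records a root tree, its parent commit(s), the author, and the message; the parent commit is the previous snapshot. The root tree maps README.md to a blob (the README bytes) and src to a subtree; the subtree (the src directory) points to the blob holding the main.ts bytes.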

Inside the .git Directory

A normal working tree has a .git directory at its root. Linked worktrees and submodules can instead have a small .git file that points to the real Git directory, but the layout below is the common shape:

.git/
  HEAD                 # symbolic ref, or direct object ID in detached HEAD
  config               # repository configuration
  index                # staging area (binary)
  objects/
    3f/a0d1ac21...      # loose object: first 2 hex chars are directory
    pack/
      pack-abc.pack     # many compressed objects
      pack-abc.idx      # offsets for random access into the pack
  refs/
    heads/main          # local branch ref
    tags/v1.0.0         # tag ref
    remotes/origin/main # remote-tracking branch
  packed-refs           # compact storage for refs that are not loose files
  logs/
    HEAD                # HEAD reflog
    refs/heads/main     # branch reflog

Loose objects are stored below objects/ using the first two hex characters as the directory name. Packfiles collect many objects into compressed files with index files so Git can jump directly to a requested object.
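The loose-object layout can be sketched in a few lines of Python. This is a simplified model for SHA-1 repositories that writes into a temporary directory rather than a real repository, and the function names are invented for the example.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, kind, content):
    """Store an object as zlib-compressed bytes under objects/<2 hex>/<38 hex>."""
    store = kind.encode() + b" " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(zlib.compress(store))
    return oid, path


def read_loose_object(path):
    """Decompress a loose object and split its header from its content."""
    with open(path, "rb") as handle:
        store = zlib.decompress(handle.read())
    header, _, content = store.partition(b"\0")
    kind, size = header.decode().split(" ")
    assert int(size) == len(content)
    return kind, content


with tempfile.TemporaryDirectory() as git_dir:
    oid, path = write_loose_object(git_dir, "blob", b"test content\n")
    relative_path = os.path.relpath(path, git_dir)
    roundtrip = read_loose_object(path)
    print(relative_path)
    print(roundtrip)
```

The object ID is computed from the uncompressed bytes, so the compression level never affects an object's name.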

Refs, Branches, and HEAD

A branch is a reference to the latest commit in a line of work. In a small repository it may be a plain text file under .git/refs/heads/, but refs can also be stored in .git/packed-refs. So the useful mental model is not “a branch is always a 41-byte file”; it is “a branch is a movable name for a commit.”

# Resolve refs without depending on their storage format
git rev-parse main
git show-ref --heads

# Safely update a ref at the plumbing level
git update-ref refs/heads/experiment HEAD

HEAD usually points to the current branch as a symbolic ref:

cat .git/HEAD
# ref: refs/heads/main

When you check out a commit, tag, or remote-tracking branch directly, HEAD can point straight at a commit object ID. That is detached HEAD state. Commits made there are real commits, but no branch name will move with them unless you create one.
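The difference between an attached and a detached HEAD can be modeled with a plain dictionary. This is a toy sketch, not Git's ref storage: the short commit names (c2, c3, ...) and both function names are hypothetical.

```python
def resolve(refs, name="HEAD"):
    """Follow symbolic refs ("ref: ...") until reaching a commit ID."""
    value = refs[name]
    while value.startswith("ref: "):
        name = value[len("ref: "):]
        value = refs[name]
    return name, value


def record_commit(refs, new_id):
    """Move whatever HEAD ultimately names to the new commit."""
    name, _ = resolve(refs)
    refs[name] = new_id


refs = {"HEAD": "ref: refs/heads/main", "refs/heads/main": "c3"}

record_commit(refs, "c4")      # attached: the branch moves, HEAD itself is untouched
attached = dict(refs)

refs["HEAD"] = "c2"            # detached: HEAD now holds a commit ID directly
record_commit(refs, "c5")      # only HEAD moves; main stays where it was

print(attached)
print(refs)
```

In the detached case nothing except HEAD (and its reflog) names c5, which is why creating a branch before switching away is the usual advice.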

Figure: HEAD, local refs, and detached HEAD. HEAD (ref: refs/heads/main) points to refs/heads/main, which points to commit C3 in the chain C3, C2, C1. refs/remotes/origin/main records the last fetched remote state. A detached HEAD holds a commit ID directly. Remote-tracking refs move on fetch. Local branch refs move on commit, merge, reset, or update-ref. Detached HEAD moves directly, without a branch name.

The Three Trees: HEAD, Index, Working Directory

A practical way to understand add, commit, restore, checkout, and reset is Git's three-tree model:

HEAD: the last committed snapshot and the default parent of the next commit.
Index: the proposed next commit snapshot.
Working directory: your editable files on disk.

The index is not literally a recursive tree on disk. It is primarily a flat list of path entries. Each normal entry has a mode, an object ID, a stage number, and a path. During conflicts the same path can appear at stages 1, 2, and 3: common ancestor, ours, and theirs.

git ls-files --stage

# Normal staged entries use stage 0
# 100644 a1b2c3d4... 0  README.md

# A conflicted path can have multiple stages
# 100644 1111111... 1  app.ts   # common ancestor
# 100644 2222222... 2  app.ts   # ours
# 100644 3333333... 3  app.ts   # theirs
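A small Python model can show how the stage numbers behave. The IndexEntry type and both helpers are invented for this sketch, and the object IDs are the same placeholders used in the listing above; the real index is a binary file with more fields per entry.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    mode: str
    object_id: str
    stage: int
    path: str


index = [
    IndexEntry("100644", "a1b2c3d", 0, "README.md"),
    IndexEntry("100644", "1111111", 1, "app.ts"),  # common ancestor
    IndexEntry("100644", "2222222", 2, "app.ts"),  # ours
    IndexEntry("100644", "3333333", 3, "app.ts"),  # theirs
]


def conflicted_paths(entries):
    """A path is conflicted while it has any entry at a non-zero stage."""
    return sorted({entry.path for entry in entries if entry.stage != 0})


def resolve_conflict(entries, path, mode, object_id):
    """Roughly what staging a resolved file does: drop stages 1-3, add stage 0."""
    kept = [entry for entry in entries if entry.path != path]
    kept.append(IndexEntry(mode, object_id, 0, path))
    return sorted(kept, key=lambda entry: (entry.path, entry.stage))


print(conflicted_paths(index))
resolved = resolve_conflict(index, "app.ts", "100644", "4444444")
print(conflicted_paths(resolved))
```

A commit can only be written once no path has a non-zero stage, which matches the rule that all conflicts must be resolved and staged first.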

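Figure: The three trees. Working directory: editable files (sandbox). Index: proposed next commit, stored in .git/index. HEAD: last commit snapshot and the next commit's parent. git add copies content from the working directory to the index; git commit records the index as a new commit at HEAD. checkout / restore writes committed or indexed content back to the working tree; reset moves refs and may reset the index and working tree.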

How a Commit Is Created

The porcelain command git commit hides several plumbing steps:

git add writes new blob objects for changed file contents and records their IDs in the index.
git write-tree turns the index into tree objects.
git commit-tree creates a commit object that points to the root tree and parent commit.
The current branch ref is updated to the new commit, and the reflog records the move.

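# What porcelain roughly builds on (assumes main is checked out and already has a commit):
git hash-object -w README.md                 # write the file's content as a blob
git update-index --add README.md             # record the path and blob ID in the index
tree=$(git write-tree)                       # write tree objects from the index
commit=$(echo "Add README" | git commit-tree "$tree" -p HEAD)  # create the commit object
git update-ref refs/heads/main "$commit"     # move the branch to the new commit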

You normally should not create commits this way, but these commands explain what the higher-level workflow is doing.
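The same pipeline can be approximated in memory to show how a flat index becomes nested trees. Everything here is a simplified sketch: the objects dictionary stands in for .git/objects, every file is assumed to be a regular 100644 file, the identity string is a placeholder, and the function names only echo the plumbing commands they imitate.

```python
import hashlib

objects = {}  # object ID -> (type, body); stands in for .git/objects


def store(kind, body):
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    objects[oid] = (kind, body)
    return oid


def write_tree(index, prefix=""):
    """Build nested tree objects from a flat {path: blob_id} index."""
    entries, subdirs = {}, set()
    for path, blob in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])  # handled by a recursive call
        else:
            entries[rest] = ("100644", blob)
    for name in subdirs:
        entries[name] = ("40000", write_tree(index, prefix + name + "/"))

    def sort_key(item):
        name, (mode, _) = item
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        for name, (mode, oid) in sorted(entries.items(), key=sort_key)
    )
    return store("tree", body)


def commit_tree(tree, parents, message):
    ident = "A U Thor <author@example.com> 1714233600 +0000"  # placeholder identity
    lines = ["tree " + tree] + ["parent " + parent for parent in parents]
    lines += ["author " + ident, "committer " + ident, "", message]
    return store("commit", ("\n".join(lines) + "\n").encode())


index, refs = {}, {}
index["README.md"] = store("blob", b"hello\n")        # like: git add README.md
index["src/main.ts"] = store("blob", b"// entry\n")   # like: git add src/main.ts
root = write_tree(index)                               # like: git write-tree
parents = [refs["refs/heads/main"]] if "refs/heads/main" in refs else []
refs["refs/heads/main"] = commit_tree(root, parents, "Add README")  # commit-tree + update-ref

print(refs)
print(sorted(kind for kind, _ in objects.values()))
```

Two files in two directories produce five objects here: two blobs, two trees, and one commit.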

Reachability: What Keeps Objects Alive

Git objects are immutable. Updating a branch does not edit an old commit; it moves a ref to a new commit. Objects stay protected while they are reachable from a ref, tag, remote-tracking branch, stash, or reflog. Once an object is no longer reachable from any of those places, garbage collection may eventually prune it.

# Commits reachable from main
git rev-list main

# Objects that are not reachable from refs
git fsck --unreachable

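# Run housekeeping and prune unreachable objects immediately, skipping the default grace period (gc.pruneExpire).
# Objects still referenced by a reflog entry are kept until that entry expires.
git gc --prune=now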

This reachability rule is why git commit --amend, git rebase, and git reset feel like they rewrite history. They actually create or select different objects, then move refs.
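Reachability is a graph traversal, which a few lines of Python can illustrate. The sketch below only follows commit parents (real reachability also walks trees and blobs), and the commit names are hypothetical: C2 plays the role of an amended copy of C.

```python
def reachable(parents, tips):
    """Return every commit reachable from the given starting points."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# A <- B <- C, then C is amended into C2 and main moves to C2.
parents = {"A": [], "B": ["A"], "C": ["B"], "C2": ["B"]}
refs = {"refs/heads/main": "C2"}
reflog = ["C", "C2"]  # positions main has held, still remembered locally

from_refs = reachable(parents, refs.values())
protected = reachable(parents, list(refs.values()) + reflog)

unreachable_from_refs = set(parents) - from_refs
prunable = set(parents) - protected

print(unreachable_from_refs)
print(prunable)
```

The old commit C is invisible from the branch but still protected by the reflog, so it only becomes prunable after that reflog entry expires.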

How Merge Actually Works

A merge has two common outcomes. If the target branch is already an ancestor of the branch being merged, Git can do a fast-forward: it just moves the target branch ref forward. If both branches have new commits, Git finds the merge base and performs a three-way merge between the base, ours, and theirs. A true merge commit has two or more parents.

# Diverged history:
#   A --- B --- C (main)
#          \
#           D --- E (feature)

git switch main
git merge feature

# True merge:
#   A --- B --- C --- M (main)
#          \         /
#           D --- E (feature)

git cat-file -p HEAD
# parent c1c2c3c4...   # previous main
# parent e1e2e3e4...   # feature tip

During a conflict, the index stores multiple staged versions for the same path. When you resolve the file and run git add, Git replaces those stages with a normal stage-0 entry. The final git commit creates the merge commit.
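The decisions described in this section can be sketched as three small functions. This is a deliberately simplified model with invented names: it assumes the two commits share history, picks a single merge base, and compares whole blob IDs per path. When both sides changed the same file, Git goes further and attempts a line-level merge of the contents, reporting a conflict only where the changes overlap.

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(parents, ours, theirs):
    common = ancestors(parents, ours) & ancestors(parents, theirs)
    # Keep common ancestors that are not themselves ancestors of another one.
    best = [c for c in common
            if not any(c != other and c in ancestors(parents, other) for other in common)]
    return sorted(best)[0]  # criss-cross histories can yield several; take one


def merge_kind(parents, ours, theirs):
    base = merge_base(parents, ours, theirs)
    if base == theirs:
        return "already up to date"
    if base == ours:
        return "fast-forward"
    return "three-way merge"


def merge_path(base, ours, theirs):
    """Three-way rule for one path, comparing blob IDs (None means absent)."""
    if ours == theirs:
        return ours            # both sides agree
    if ours == base:
        return theirs          # only theirs changed it
    if theirs == base:
        return ours            # only ours changed it
    return "needs content merge"


parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(merge_base(parents, "C", "E"))
print(merge_kind(parents, "C", "E"))
print(merge_kind(parents, "B", "E"))
print(merge_path("b1", "b1", "b2"))
```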

Figure: Merge vs rebase. Merge keeps both lines: A, B, C on main and D, E on feature are joined by merge commit M, which has two parents. Rebase copies changes onto a new base: A, B, C are followed by D' and E'. D and E are old commits. D' and E' are new commits with new IDs.

How Rebase Actually Works

Rebase finds the common ancestor, records the changes introduced by each commit on your branch, resets the branch to the new base, and applies those changes in order. The resulting commits usually have different parents, committer timestamps, and object IDs even if their patches look the same.

# Before:
#   A --- B --- C (main)
#          \
#           D --- E (feature)

git switch feature
git rebase main

# After:
#   A --- B --- C (main)
#                \
#                 D' --- E' (feature)

The original D and E objects do not vanish immediately. They are just no longer named by the branch after it moves. The reflog normally gives you a recovery path until those entries expire and garbage collection can prune unreachable objects.

That is also the reason behind the common rule: do not rebase commits other people are already using unless the team has explicitly agreed to rewrite that shared history.
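A toy model makes the "new commits, same patches" point easier to see. In the sketch below a commit ID is just a hash of its parent, a patch label, and a commit time; that is not Git's object format, patches are labels rather than real diffs, and only linear branches are handled. All names are hypothetical.

```python
import hashlib


def make_commit(store, parent, patch, committed_at):
    """Create a toy commit whose ID depends on parent, patch, and commit time."""
    body = f"parent {parent}\npatch {patch}\ncommitted {committed_at}\n"
    commit_id = hashlib.sha1(body.encode()).hexdigest()[:7]
    store[commit_id] = {"parent": parent, "patch": patch}
    return commit_id


def rebase(store, tip, old_base, new_base, now):
    """Replay the commits in old_base..tip, oldest first, on top of new_base."""
    todo = []
    commit = tip
    while commit != old_base:
        todo.append(commit)
        commit = store[commit]["parent"]
    new_tip = new_base
    for old in reversed(todo):
        new_tip = make_commit(store, new_tip, store[old]["patch"], now)
    return new_tip


store = {}
a = make_commit(store, None, "add README", 100)
b = make_commit(store, a, "add build script", 110)
c = make_commit(store, b, "fix typo", 120)           # tip of main
d = make_commit(store, b, "start feature", 130)
e = make_commit(store, d, "finish feature", 140)     # tip of feature

new_e = rebase(store, tip=e, old_base=b, new_base=c, now=200)
new_d = store[new_e]["parent"]

print("old:", d, e)
print("new:", new_d, new_e)
```

The old commits are still in the store after the rebase; only the branch name has moved to the new copies.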

Reflog: Your Local Safety Net

Reflogs record updates to refs in your local repository. git reflog defaults to showing the HEAD reflog, and HEAD's reflog also records branch switches. Branches, remote-tracking branches, and other refs can have reflogs too. They are not pushed to remotes.

# View HEAD's recent positions
git reflog

# View a specific branch reflog
git reflog main

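# Example (illustrative IDs):
# abc1234 HEAD@{0}: commit: Add new feature
# def5678 HEAD@{1}: rebase: moving to main
# 9ab0cde HEAD@{2}: checkout: moving from main to feature
# 0fe1234 HEAD@{3}: reset: moving to HEAD~3
# 7c6b5a4 HEAD@{4}: commit: Work discarded by the reset

# HEAD@{3} is the state after the reset; the entry just before it, HEAD@{4}, is the lost work.
# Recover from the accidental reset (this discards uncommitted changes):
git reset --hard HEAD@{4}

# Or recover by creating a new branch instead, leaving the current branch alone:
git branch recovered-work HEAD@{4}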

Default expiry is commonly 90 days for reachable reflog entries and 30 days for unreachable entries, controlled by gc.reflogExpire and gc.reflogExpireUnreachable. Treat that as a recovery window, not a backup policy.
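The name@{n} syntax is easier to reason about with a small model of a ref and its log. The Ref class below is a hypothetical simplification: it keeps entries newest first, as git reflog prints them, and ignores timestamps and expiry.

```python
class Ref:
    """A ref plus a log of every update made to it, newest entry first."""

    def __init__(self):
        self.value = None
        self.log = []  # entries are (old_value, new_value, message)

    def update(self, new_value, message):
        self.log.insert(0, (self.value, new_value, message))
        self.value = new_value

    def at(self, n):
        """Value of name@{n}: where the ref pointed n updates ago."""
        return self.log[n][1]


head = Ref()
head.update("c1", "commit (initial): Start")
head.update("c2", "commit: Work")
head.update("c3", "commit: More work")
head.update("c1", "reset: moving to HEAD~2")   # the accidental reset

# @{0} is the result of the reset; @{1} is where the ref pointed just before it.
lost = head.at(1)
head.update(lost, "reset: moving to HEAD@{1}")

print(lost, head.value)
```

Every update, including the recovery itself, adds a new entry, so the indices shift by one after each command.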

Packfiles and Garbage Collection

Git initially writes objects as loose objects. Loose objects are compressed individually and stored below .git/objects. Over time, Git consolidates many objects into packfiles to reduce disk usage and improve lookup performance. A packfile stores many compressed objects, often with deltas between similar objects, and an .idx file maps object IDs to offsets inside the pack.

# Trigger repository housekeeping manually
git gc

# Inspect pack contents
git verify-pack -v .git/objects/pack/pack-*.idx

# See loose and packed object counts
git count-objects -v

git gc also performs housekeeping around unreachable objects, packed refs, reflogs, rerere metadata, stale worktrees, and sometimes ancillary indexes such as the commit-graph. Many porcelain commands can run lightweight automatic maintenance when repository thresholds are crossed; the default gc.auto threshold is 6700 loose objects.
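To give a feel for how an index file lets Git jump straight to an object, here is a simplified in-memory lookup. It borrows two ideas from pack indexes, sorted object IDs and a 256-entry fan-out table keyed by the first byte, but it is not the on-disk .idx format, and the IDs and offsets are synthetic.

```python
import bisect
import hashlib


def build_index(offsets_by_id):
    """Return (sorted IDs, matching offsets, fan-out table)."""
    ids = sorted(offsets_by_id)
    fanout = [0] * 256
    for oid in ids:
        fanout[int(oid[:2], 16)] += 1
    for byte in range(1, 256):
        fanout[byte] += fanout[byte - 1]  # cumulative: IDs with first byte <= byte
    return ids, [offsets_by_id[oid] for oid in ids], fanout


def find_offset(index, oid):
    """Binary-search only the slice of IDs that share the first byte."""
    ids, offsets, fanout = index
    first = int(oid[:2], 16)
    low = fanout[first - 1] if first else 0
    high = fanout[first]
    position = bisect.bisect_left(ids, oid, low, high)
    if position < high and ids[position] == oid:
        return offsets[position]
    return None


# Synthetic object IDs and pack offsets, for illustration only.
packed = {hashlib.sha1(str(n).encode()).hexdigest(): n * 100 for n in range(1000)}
index = build_index(packed)

wanted = hashlib.sha1(b"42").hexdigest()
print(find_offset(index, wanted))
print(find_offset(index, hashlib.sha1(b"missing").hexdigest()))
```

The fan-out table narrows the search to a small slice before the binary search starts, which keeps lookups fast even in packs with millions of objects.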

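Figure: Object lifecycle. A new object starts out loose and zlib-compressed. While a ref, tag, or reflog entry protects it, it is reachable, and git gc moves it into a packfile (compressed, with deltas). When a commit, tag, or ref is deleted or orphaned, the object becomes unreachable (no protecting name). Once the reflog window ends it is expired, and a later prune deletes it.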

Useful Plumbing Commands

Command	Purpose
`git cat-file -t SHA`	Show object type: blob, tree, commit, or tag.
`git cat-file -p SHA`	Pretty-print object contents.
`git hash-object -w FILE`	Compute a blob's object ID; with -w, also write the blob to the object database.
`git update-index`	Manipulate the index at the plumbing level.
`git write-tree`	Create tree objects from the current index.
`git commit-tree`	Create a commit object from a tree and parent(s).
`git ls-files --stage`	Show index entries and conflict stages.
`git rev-parse HEAD`	Resolve a revision or ref to an object ID.
`git show-ref`	List refs and the object IDs they point to.
`git symbolic-ref HEAD`	Read or update symbolic refs such as HEAD.
`git merge-base A B`	Find the common ancestor used for a three-way merge or rebase.
`git verify-pack -v`	Inspect objects and deltas inside a pack index.
`git fsck`	Verify object database integrity and find unreachable objects.
`git reflog <ref>`	Show previous values of a local ref.

Key Takeaways

Git stores immutable objects: blobs, trees, commits, and annotated tag objects.
Object IDs are content-derived: SHA-1 is traditional, SHA-256 repositories exist, and filenames are not part of blob identity.
Commits are snapshots, not diffs: Git calculates diffs by comparing trees when you ask.
Branches are movable refs: they may be loose files or packed refs, but conceptually they point to commits.
HEAD is usually symbolic: in detached HEAD it points directly to a commit instead of a branch.
The index is the proposed next commit: conflict stages explain why merge conflicts are more structured than they look.
Merge preserves topology: a true merge commit records multiple parents; a fast-forward is just a ref move.
Rebase creates new commits: old commits usually remain recoverable through reflogs for a while.
Packfiles keep Git compact: git gc consolidates objects, compresses similar data, and eventually prunes unreachable objects.

Official References Used

This article was checked against the official Git book and the Git manual pages.

Understanding Git internals transforms it from a scary tool into a simple one. A branch is a pointer-like ref. A commit is an immutable snapshot in a graph. The index prepares the next snapshot. The reflog remembers where local refs have been. Once you see the data structures, Git commands stop being incantations and start being logical operations on names, objects, and reachability.
