Git, Snapshots, and Names

Central metaphor: Git = immutable snapshot DAG + movable names. Files are incidental; the core objects are snapshots and pointers. Basic operations like push/pull/rebase/merge/amend/fixup are just name moves plus new snapshots.

Motivation

Everyone has experienced the final_final_v2.zip problem. Shared drives and Dropbox give everyone access to the same files, but they don’t answer the deeper questions: who changed what, when, and why? How do you manage multiple versions of a file without getting confused? What if two people edit the same file differently? What if you need to go back, not just to yesterday’s version, but to a precise state from last week?

Git is designed to solve these problems. It isn’t a fancier shared folder. It’s a history machine, tuned for source code.

9.1 The central metaphor

Git is two things:

An immutable directed acyclic graph (DAG) of snapshots.
A set of movable names (references) pointing into that graph.

Internally, Git’s object model (blobs, trees, and commits) exists solely to capture snapshots and link them together into a history. This design underpins everything else in Git.

Before diving into these object types, we should clarify what a “snapshot” means in Git. A snapshot in Git is not literally a photograph, nor is it simply the contents of a few files. It represents the entire state of your project at one moment: which files exist, how they are organized into directories, and what each file contains. Each commit captures one such snapshot, and a repository’s history is the graph of these frozen states linked together.

9.1.1 Cast of characters

blob: the raw content of a file.
tree: the directory structure of a snapshot, mapping names to blobs or subtrees. (This is different from the “graph” in DAG: a tree here means a file hierarchy inside one snapshot, not the larger graph of commits.)
commit: metadata + a pointer to one tree + parent commits.
ref: a human-readable name pointing to a commit (like main).
HEAD: the special ref for “the branch I’m on now.” It normally points at a branch name rather than directly at a commit.
index: the staging area for the next commit.
working directory (sometimes called working tree): your files on disk.

Snapshots are the durable objects. Names are how we interact with them.
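One way to see why snapshots are durable is that an object's name is derived from its content. The following is a minimal Python sketch, assuming Git's default SHA-1 object format; git_object_id is a helper name chosen here for illustration, not part of Git.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

empty_blob = git_object_id("blob", b"")
same_again = git_object_id("blob", b"")
other_blob = git_object_id("blob", b"hello\n")

print(empty_blob)
print(empty_blob == same_again)   # True: same content, same id
print(empty_blob == other_blob)   # False: different content, different id
```

Because the id is computed from the content, "editing" an object can only ever produce a different object with a different id. The original stays exactly as it was.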

9.2 Scene 1: Making a snapshot

The unit of history in Git is the snapshot. Each commit freezes the whole project state, and commits are tied together into a graph that shows how one state led to another.

In everyday language we talk about “changes” to files. In Git’s model, a “change” simply means that one snapshot of the project differs from another: a file has different content, a file was added, a file was removed, or the directory structure itself shifted. Commits record those differences implicitly by pointing to a new snapshot.

git add stages changes into the index.
git commit writes a new tree, then a commit pointing to it.
Finally Git moves a name (your branch) to the new commit.

o <-- main, HEAD
|
o
|
o
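The three steps above (stage, write a snapshot, move a name) can be sketched as a toy model in Python. ToyRepo and its fields are hypothetical names for illustration; real Git stores objects quite differently, so read this as a picture of the idea rather than of the implementation.

```python
import hashlib

class ToyRepo:
    """A toy model (not Git's real storage): objects are immutable, names move."""

    def __init__(self):
        self.objects = {}            # id -> immutable tree or commit
        self.index = {}              # staging area: path -> content
        self.refs = {"main": None}   # branch name -> commit id
        self.head = "main"           # HEAD names the current branch

    def _store(self, obj):
        oid = hashlib.sha1(repr(obj).encode()).hexdigest()
        self.objects[oid] = obj
        return oid

    def add(self, path, content):
        """Like `git add`: stage content in the index."""
        self.index[path] = content

    def commit(self, message):
        """Like `git commit`: write a tree, write a commit, move the branch."""
        tree = self._store(("tree", tuple(sorted(self.index.items()))))
        parent = self.refs[self.head]
        new = self._store(("commit", tree, parent, message))
        self.refs[self.head] = new   # the only thing that "moves"
        return new

repo = ToyRepo()
repo.add("README", "hello")
first = repo.commit("initial")
repo.add("README", "hello, world")
second = repo.commit("expand greeting")

print(repo.refs["main"] == second)        # True: main now names the new commit
print(repo.objects[second][2] == first)   # True: its parent is the old commit
```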

9.3 Scene 2: Undo and safety nets

Mistakes are inevitable. Git’s design lets you undo changes safely. Because commits are immutable, “undo” does not destroy history. It either moves names (branch pointers; see the next section) or adds a new commit that reverses an earlier one.

git reset: moves a branch name (and optionally the index and working tree) backward or forward. It rewinds history locally. Safe if the branch has not been shared.
git revert: makes a new commit that undoes the changes from an earlier commit. It never deletes history, so it is the right choice when you need to “undo” something on a shared branch like main.
git reflog: local log of where HEAD and branch refs have been.
git stash: temporarily save uncommitted work as hidden commits.

These features are the safety net. If you “lose” something, it’s usually just that the branch name moved away from it. The commit is still there, and git reflog can find it, at least until Git eventually garbage-collects commits that nothing refers to.
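To make the "name moved away" idea concrete, here is a small Python sketch. It assumes a toy history of three commits named c1, c2, and c3; the move helper and the reflog list are illustrative stand-ins, not Git's actual data structures.

```python
# Toy model: commit ids are plain strings; each maps to its parent.
parents = {"c1": None, "c2": "c1", "c3": "c2"}
refs = {"main": "c3"}
reflog = []   # every place main has pointed, oldest first

def move(name, target):
    """Move a branch name and remember where it used to point."""
    reflog.append(refs[name])
    refs[name] = target

# Roughly like `git reset --hard HEAD~1`: main steps back to c3's parent.
move("main", parents[refs["main"]])
print(refs["main"])       # c2
print("c3" in parents)    # True: the commit itself was not deleted

# Recover by moving the name back to the position the reflog remembered.
move("main", reflog[-1])
print(refs["main"])       # c3
```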

9.4 Scene 3: Branches as names

A branch is a pointer to a commit. Creating a branch just makes a new name.

When commits are added on that branch, the name moves forward.

Fast-forward merges are trivial: if one branch is already ahead of another, merging just moves the name forward.
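A fast-forward can be sketched in a few lines of Python. The commit ids A through D and the helpers is_ancestor and fast_forward are made up for this example. The point is only that "already ahead" means "reachable by following parent links".

```python
# Toy history: each commit id maps to a list of parent ids.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
refs = {"main": "B", "feature": "D"}

def is_ancestor(older, newer):
    """True if `older` is reachable from `newer` by following parent links."""
    stack, seen = [newer], set()
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def fast_forward(branch, other):
    """Move `branch` to `other` only if no history would be left behind."""
    if not is_ancestor(refs[branch], refs[other]):
        raise ValueError("branches have diverged; a real merge is needed")
    refs[branch] = refs[other]

fast_forward("main", "feature")
print(refs["main"])   # D
```

No new commit is created here. The only change is which commit the name main points at.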

9.5 Scene 4: Remotes

One of the most important aspects of Git and version control is collaborating with others.

A remote is a named link to another copy of the repository, usually hosted on GitHub.

origin: your fork, where you push.
upstream: the course repository, where you fetch from.

git fetch upstream downloads any new commits and moves the remote-tracking names (like upstream/main) to match the latest state of the upstream repo. Your local branches are untouched. Then git rebase upstream/main replays your work on top of the new base. Finally, git push origin my-branch moves the name on your fork.

What does git pull do? It is shorthand for git fetch plus a merge (or rebase, if configured) of the remote branch into your current one.
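The effect of a fetch on names can be sketched with two toy repositories in Python. The dictionaries and the fetch function below are illustrative assumptions, not Git's transfer protocol. What matters is which names change and which do not.

```python
# Two toy repositories; each maps commit id -> parent id, plus its own names.
upstream = {"commits": {"A": None, "B": "A", "C": "B"}, "refs": {"main": "C"}}
local = {
    "commits": {"A": None, "B": "A", "X": "B"},
    "refs": {"main": "X", "upstream/main": "B"},
}

def fetch(local_repo, remote_repo, remote_name):
    """Copy missing commits, then update only the remote-tracking names."""
    for commit_id, parent in remote_repo["commits"].items():
        local_repo["commits"].setdefault(commit_id, parent)
    for branch, commit_id in remote_repo["refs"].items():
        local_repo["refs"][f"{remote_name}/{branch}"] = commit_id

fetch(local, upstream, "upstream")
print(local["refs"]["upstream/main"])   # C: the remote-tracking name moved
print(local["refs"]["main"])            # X: the local branch did not
```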

9.6 Scene 5: Merging vs. rebasing

If two different snapshots try to change the same lines of a file, Git cannot combine them automatically. This is reported as a conflict, and you must edit the file to decide which version (or what combination) is correct. Once resolved, the merge continues as usual with a new commit.

Suppose development diverges:

A---B---C---E   (main)
       \
        D       (feature)

Two ways to integrate:

Merge: make a new commit with two parents, preserving both branches’ histories (both lines of development).
Rebase: replay D on top of E, making a new commit D' with the same changes but a different parent. Produces a linear story.

Rebase:
A---B---C---E---D'

Merge records what actually happened. Rebase rewrites history to look simpler. Which is better? Merges can clutter, but are safe. Rebases keep history tidy, but should only be used on private branches (not on commits others may already have).

How to remember: in a rebase, the branch you specify is the new base. Your commits are lifted up and replayed “on top” of it. If you can picture which history should be on the bottom (the base) and which set of commits should be placed above, the mechanics of rebase are easier to recall.
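The difference between the two strategies shows up clearly in the parent links. The Python sketch below rebuilds the example history, assuming for illustration that D forks from C (the diagram could also be read otherwise). make_commit is a hypothetical helper whose ids depend on a commit's message and parents, loosely mirroring how real commit ids depend on their content.

```python
import hashlib

def make_commit(store, message, parents):
    """Store an immutable commit; its id depends on its message and parents."""
    text = f"{message}|{','.join(parents)}"
    commit_id = hashlib.sha1(text.encode()).hexdigest()
    store[commit_id] = {"message": message, "parents": list(parents)}
    return commit_id

store = {}
a = make_commit(store, "A", [])
b = make_commit(store, "B", [a])
c = make_commit(store, "C", [b])
e = make_commit(store, "E", [c])   # tip of main
d = make_commit(store, "D", [c])   # tip of feature (assumed to fork from C)

# Merge: one new commit with two parents; D itself is untouched.
merge = make_commit(store, "Merge feature into main", [e, d])

# Rebase: a new commit D' with D's message but E as its parent.
d_prime = make_commit(store, "D", [e])

print(store[merge]["parents"] == [e, d])   # True
print(d_prime != d)                        # True: D' is a different commit
print(d in store)                          # True: the original D still exists
```

This is why rebasing shared commits causes trouble: anyone who already has D holds a commit that your rewritten history no longer contains.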

9.7 Scene 6: History hygiene

Commits should tell a story. Git has tools to adjust history before you share.

git commit --amend: replace the most recent commit with an adjusted one.
git commit --fixup <sha>: create a fix targeted at an earlier commit.
git rebase -i --autosquash: fold fixups into their targets and reorder history.
git push --force-with-lease: update a remote branch after rewriting, refusing to overwrite it if someone else has pushed there since your last fetch.
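The fixup and autosquash pair works by naming convention: a fixup commit's subject is "fixup! " followed by the subject of the commit it targets. The Python sketch below is a simplified model of the reordering only. The autosquash function and the example subjects are invented for illustration, and it assumes every fixup has a matching target in the list.

```python
def autosquash(subjects):
    """Reorder a todo list in the spirit of --autosquash (simplified).

    Each 'fixup! X' entry moves to just after the entry whose subject is X.
    Assumes every fixup's target subject is present in the list.
    """
    prefix = "fixup! "
    todo = [("pick", s) for s in subjects if not s.startswith(prefix)]
    for s in subjects:
        if s.startswith(prefix):
            target = s[len(prefix):]
            i = next(k for k, (_, subj) in enumerate(todo) if subj == target)
            i += 1
            # Skip past fixups already placed after this target.
            while i < len(todo) and todo[i][0] == "fixup":
                i += 1
            todo.insert(i, ("fixup", s))
    return todo

for action, subject in autosquash(["Add parser", "Add tests", "fixup! Add parser"]):
    print(action, subject)
# pick Add parser
# fixup fixup! Add parser
# pick Add tests
```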

This process allows you to present a clean history in the end, without losing the benefit of your intermediate steps.

This flexibility in editing history is one of Git’s killer features. It also fulfills the promise from the Motivation section: never again losing work to “final_final_v2.zip” disasters. Because commits are immutable and references can always be moved around, you can experiment freely and still recover any earlier committed state.

9.8 Scene 7: Beyond the basics

GitHub adds layers on top of the raw graph.

Pull requests: a proposal to move a branch name on the shared repository (e.g. merging your feature branch into main).
Actions: automation triggered by repository events such as a push (tests, deployments).

Git also has git bisect, a command that uses binary search over the DAG to find the exact commit where a bug was introduced. It is one of the “killer features” made possible by treating history as immutable snapshots.
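The search idea behind bisecting can be sketched in Python for the simplest case: a linear history. The bisect function, the commit names c1 through c16, and the is_bad test are assumptions made for this example. Real git bisect also handles branching histories, which this sketch does not.

```python
def bisect(commits, is_bad):
    """Return the first bad commit in a linear history, oldest first.

    Assumes commits[0] is good, commits[-1] is bad, and that once the bug
    appears it stays (every commit after the first bad one is also bad).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]

history = [f"c{i}" for i in range(1, 17)]   # c1 .. c16
first_bad = bisect(history, lambda c: int(c[1:]) >= 11)
print(first_bad)   # c11
```

With 16 commits the loop tests only 4 of them. This works because each snapshot is frozen: checking out an old commit gives exactly the state that was recorded.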
