Tax Day? git add 1040.pdf

April 7, 2026

When I have an important PDF to store, like my tax returns or a contract, I prefer to store it in git rather than something like Dropbox, Google Drive or iCloud.

There’s the warm, fuzzy feeling I get when enacting the add / commit / push ritual, one of tranquility, permanence and order. I am hermetically sealing my document in a robust time capsule. Banish the thought, the sheer horror, of a bug causing the deletion of this document and its remote copy!

I also get a golden opportunity to add a commit message. If a document is worth holding onto, it’s worth adding a word or two before the context slips away.

On the other side of this equation, it’s useful to have git-style snapshots of my document set at any point in time. There might be important signals in groups of documents being added or removed in a single commit.

When working on behalf of a company, the same logic applies, only more so. Git supports the sharing patterns we all know well.

In general, git is an expressive and deliberate way to hold onto documents, and it suits me. I realize this won’t be true for everyone, but it’s worth a try, if you haven’t tried it already. Git: it’s good for more than just code!

E2EE server-side git storage

Problem: which server to push to?

I don’t like the default idea, GitHub, since I don’t trust cloud providers and their many layers of infrastructure to keep my documents secret. We all know the hazards.

What about self-hosting? You could run your own git server, but then you need to worry about availability, backups, NAT-punching, and credentialing your team so they can get onto your LAN.

For many people, managed hosting is the right play. The key unlock for “storing important files in cloud-managed git” is end-to-end encrypted git storage. The client encrypts the commit, with its many constituent objects, and the server sees encrypted blobs. File names, branch names and commit hashes remain secret just like file contents.

Inspired by Keybase git, I wrote a new system called FOKS. FOKS git has similar encryption properties and works well across multiple devices and teams. But FOKS has new features, like federation, PQ-security, teams of teams, YubiKey support, and more. FOKS is fully open-source, server and all. Lastly, it takes a different implementation approach to git. It doesn’t build upon a file system abstraction but instead implements git as a layer on top of an encrypted key-value store.

How does FOKS’s encrypted git work?

First, a quick recap of how git normally works. When you run git add and git commit, git turns your deltas into git objects and writes these objects to a simple database under the .git directory. A data structure encapsulates the preexisting state of the repository, the new deltas, and the commit message and metadata. The hash of this data structure is the new “commit hash”. Git updates a human-readable pointer in .git/refs (like main or my-branch-3) to point to this new commit hash. In other words, these operations change .git to reflect your checkout. Operations like git checkout and git reset do the opposite.
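To make the object store concrete: git names a blob by hashing a short header plus the file’s bytes. A minimal Python sketch (real git also zlib-compresses the object before storing it under .git/objects):

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    """Compute the object hash git assigns to a file's contents.

    Git hashes b'blob <size>\\0' + data; the hex digest names the object
    under .git/objects. (Real git also zlib-compresses the stored copy.)
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Matches `echo "hello" | git hash-object --stdin`
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The same header-plus-content scheme (with `tree` and `commit` headers) names every node in the commit graph, which is why a commit hash pins down the entire history beneath it.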

Other git commands like git push and git fetch sync the local .git directory with a remote server. They do this via git’s remote helper protocol:

[Diagram: on your machine, git add / commit and git checkout move state between the working directory (your files on disk) and the local .git object store via git internals; git push and git fetch then sync the local .git with a git server over the internet, speaking the remote helper protocol.]

On git fetch, the git client calls into the remote helper protocol to start at a branch name, and sync all objects into .git so that the client can reconstruct the commit graph and the checkout for that branch. git push does the opposite, pushing objects to the server that it doesn’t already have, so that other clients pulling from the server can reconstruct the commit graph that the current client has.

A design goal of FOKS git is to maintain compatibility with existing git clients. Hence, the remote helper protocol provides the logical integration point. The FOKS flow works as follows:

[Diagram: the same local picture, but push and fetch now speak the remote helper protocol to the FOKS agent, which encrypts and decrypts; the agent puts and gets encrypted blobs over the internet to the FOKS KV server.]

Consider an example where the remote helper asks to push a branch issue-42. This call gets routed to the FOKS agent, which walks the commit graph starting from the root of issue-42, and stopping when it reaches objects the server already has. Each object is encoded as a key-value pair: the key is an HMAC of the object hash, and the value is the encryption of the object itself. The secret keys used in the HMAC and encryption are shared among the devices of the users who have access to this repository. These keys rotate whenever a user revokes a device or a team evicts a user who had access to the repository.
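A minimal sketch of this key-value encoding, with a toy SHA-256-based stream cipher standing in for the real authenticated encryption (all names here are hypothetical, and this is an illustration of the shape of the scheme, not FOKS’s actual crypto):

```python
import hmac, hashlib, os

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256 (illustration only, not FOKS's cipher)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(mac_key: bytes, enc_key: bytes, obj_hash: bytes, obj: bytes):
    """Encode one git object as an encrypted key-value pair.

    Key:   HMAC of the object hash, so the server never learns real hashes.
    Value: encrypt-then-MAC, so a malicious server cannot forge objects.
    """
    kv_key = hmac.new(mac_key, obj_hash, hashlib.sha256).digest()
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(obj, _keystream(enc_key, nonce, len(obj))))
    tag = hmac.new(mac_key, kv_key + nonce + ct, hashlib.sha256).digest()
    return kv_key, nonce + ct + tag

def unseal(mac_key: bytes, enc_key: bytes, kv_key: bytes, value: bytes) -> bytes:
    """Verify and decrypt a value fetched back from the server."""
    nonce, ct, tag = value[:16], value[16:-32], value[-32:]
    want = hmac.new(mac_key, kv_key + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, want):
        raise ValueError("server returned a tampered blob")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))
```

Note that the key derivation is deterministic: two clients holding the shared keys compute the same HMAC for the same object hash, which is what lets a client ask the server “do you already have this object?” without revealing the hash itself.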

Crucially, all encryption is authenticated. A malicious server cannot inject bogus commits without knowing the shared secret keys.

The agent pushes each key-value pair to the FOKS KV server, after confirming the server doesn’t already have it. Isn’t this a server round-trip for every node in the commit graph, and therefore insanely slow? Yes; we will get to a key optimization in a bit.

After all the objects are in place, the agent pushes the reference issue-42 to the server as a key-value pair. The key is the HMAC of issue-42; the value is the encryption of issue-42’s commit hash.

git fetch does the opposite. The remote helper asks to fetch a branch name, like issue-42. Using the user’s (or group’s) shared key, the agent HMACs issue-42 to get the key for the reference and requests it from the server, which returns the encrypted commit hash. The agent decrypts it to recover the commit hash, HMACs that hash to form the next key, and requests the corresponding value from the remote KV store. Decrypting this value yields the commit object, and the agent then recursively walks the tree, populating the local .git directory with the objects needed to check out issue-42.
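The fetch walk can be sketched as follows, with an in-memory dict standing in for the KV server, and the decryption and object-parsing steps passed in as stand-ins (all names hypothetical):

```python
import hmac, hashlib

def fetch_branch(server, mac_key, decrypt, children, ref_name):
    """Resolve an encrypted ref, then walk the object graph it roots.

    server:   dict of HMAC'd keys -> encrypted values (stand-in KV store)
    decrypt:  authenticated decryption under the shared key (stand-in)
    children: given a decrypted object, the hashes it references
              (stand-in for parsing commit parents and tree entries)
    """
    h = lambda b: hmac.new(mac_key, b, hashlib.sha256).digest()
    commit_hash = decrypt(server[h(ref_name.encode())])  # the encrypted ref
    objects, todo = {}, [commit_hash]
    while todo:
        oh = todo.pop()
        if oh in objects:
            continue                       # already fetched on this walk
        obj = decrypt(server[h(oh)])
        objects[oh] = obj                  # a real agent writes .git/objects
        todo.extend(children(obj))
    return objects
```

The walk bottoms out at objects the client already has locally; in this sketch, a real agent would check .git before issuing each request.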

Isn’t this insanely slow?

Yes. Modern git servers like GitHub have the benefit of fully understanding the structure of git repositories, and can deliver everything the client lacks in a single packfile. The typical flow is for a remote client to request a commit hash, along with a list of commits it already has. The server can walk the commit graph too, and responds with a single archive (or “packfile” in git-speak) containing all the objects the client lacks. The client writes this packfile into .git and voilà, finito. Checkout can extract the objects from the packfile when needed.

Our encrypted KV server is, by contrast, in the dark and therefore unable to guess which objects the client needs. In addition, the end-to-end flow lives within the confines of the existing remote helper protocol, from which we cannot deviate. Come hell or high water, the client must start with a remote branch name, and leave the local .git directory in a state where the checkout of that branch works.

Our optimization is as follows. We mentioned packfiles above, which can be quite large, as they contain many objects. We could naively program clients to sync down all packfiles, oblivious to what they contain. This would reduce round trips but would waste bandwidth, since those packfiles might correspond to branches that the client doesn’t care about. However, every packfile has an associated index file, which is a list of all the object hashes contained in the packfile. Index files are typically quite small, so there is little downside to eagerly slurping these down from the server.

Thus, when a client is searching for objects on the server, it first downloads all index files, scans them for the object it needs, and then downloads only the packfile containing that object, on the supposition that the packfile also contains the other objects in the desired commit graph. Under the hood, standard git repack tries to optimize packfiles so that this proximity assumption holds.
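Reduced to code, the optimization is a small lookup over eagerly fetched index listings (hypothetical names; `indices` maps each packfile name to the set of object hashes its .idx file lists):

```python
def locate_packfile(indices, want):
    """Scan the small, eagerly-downloaded .idx listings to decide which
    large packfile to fetch; only that one gets downloaded."""
    for pack, hashes in indices.items():
        if want in hashes:
            return pack
    return None  # object not on the server; fall back to per-object fetch

def objects_to_push(local_objects, indices):
    """Push-side use of the same indices: the union of all listed hashes is
    a local manifest of the server's inventory, checked with no round trip."""
    known = set().union(*indices.values()) if indices else set()
    return [o for o in local_objects if o not in known]
```

Both directions of the sync benefit: fetches pull one packfile instead of thousands of blobs, and pushes skip objects the server is already known to hold.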

On the push side, FOKS git clients eagerly repack the local .git directory and push new packfiles to the server. There is a small privacy hitch here: if you push a commit on branch A, some commits on branch B might go along for the ride. In a team context, work on a branch might become visible to other team members before it’s explicitly pushed. FOKS users get a warning to this effect and can choose to disable this optimization if they want.

The packfile index optimization also speeds up push operations. We said above that clients should not push objects the server already has. Doing so would waste bandwidth. The indices provide a local manifest of the server’s object inventory and can be checked without a round trip.

Try it today

Getting started with FOKS is easy:

$ brew install foks                                   # macOS (Homebrew)
$ curl -fsSL https://pkgs.foks.pub/install.sh | sh    # Linux
$ winget install foks                                 # Windows
$ foks signup -u max-bialystok
$ foks git create taxes
$ git remote add origin foks://foks.app/max-bialystok/taxes
$ git add 2025-irs-1040.pdf 2025-nys-it201.pdf
$ git commit -m "First return; new accountant --- Leo Bloom"
$ git push origin main

Give it a whirl this tax season!

✌️️ Max 🔑

Credits

Thanks to Chris Coyne for reading drafts of this post.

FAQ for FOKS

What about mobile?

That would be great; mobile support is hopefully coming soon.

What about LFS?

In current use cases, it’s desirable to have a full checkout of the repository, so we haven’t needed LFS. We’re not opposed to supporting it in the future if there’s demand.

What about signing commits?

It actually does make sense to sign commits in addition to authenticated encryption, as this would prevent members of a team from impersonating each other. We haven’t implemented this yet, but it’s possible. Each user would get an SSH signing key that all their devices could access. Other team members could verify that signatures correspond to these advertised keys. One major snag here is this only makes sense if the repository is using SHA256 hashes rather than the horrible SHA1 default. Note, however, that FOKS’s current authenticated encryption, layered underneath the git layer, does not rely on the security of SHA1.

What about indexing my documents?

In a bygone era, we needed Google or Dropbox to index PDFs so that we could access them via search. Those days are over! LLMs are the new way. Now your document store is a bag of PDFs, some text files describing the graph structure of your documents, and text access logs recording how you accessed them. These files fit perfectly in git:

diff --git a/wiki/log.md b/wiki/log.md
index b84b989..2e65a3e 100644
--- a/wiki/log.md
+++ b/wiki/log.md

+## [2026-04-07] query | Personal taxable income chart 2020–2024
+
+Question: graph my taxable income over the last 5 years.
+Pulled Form 1040 line 15 from each of 2020-2024 via `pdftotext -layout` on
+[`personal/<year>/return/`](../personal/). Rendered a bar chart with matplotlib
+(via `uv run --with matplotlib`) and filed the chart + data table back to the
+wiki at [`concepts/taxable-income-history.md`](concepts/taxable-income-history.md);
+chart PNG at [`charts/taxable-income-2020-2024.png`](charts/taxable-income-2020-2024.png).
+
+| Year | Taxable income                |
+| ---- | ----------------------------- |
+| 2020 | $54,906                       |
+| 2021 | $95,796                       |
+| 2022 | $87,096                       |
+| 2023 | $80,068                       |
+| 2024 | $79,379 (as originally filed) |
+
+2024 carries the same amendment caveat as the carryover query — the on-disk
+1040-X is signature pages only, and the implied amended TI is ~$79k. Flagged
+in the new concept page.

Doesn’t using LLMs for indexing leak my data to the LLM provider?

Not if you run your own model locally.