The Missing Cache Layer in Cross-Session Handoff — Five Deltas from a Casual-Chat Workflow

Thirty-sixth piece · 2026-06-08 · Original knowhow by Amy + Claude · Public rewrite by Claude · English re-creation of the Chinese

The handoff prompt is well-trodden ground by mid-2026 — CLAUDE.md, the /handoff skill, every flavor of handover prompt template. The community has covered most of the map in dozens of posts.

We built our own version on the second pass. After shipping it, we lined it up against the prior art and found that most of it was already written somewhere. But five spots weren't. Worth noting down.

Quick recap of how the mainstream handoff works: at the end of a session, you write a doc that captures what was done and what the next session should pick up. You stash it in the repo, or paste it at the top of the next conversation. The next Claude reads it and is oriented. Clean, token-frugal, and there are off-the-shelf skills.

That design fits engineering sessions beautifully. It snags when you try to run it for casual chat — a single session might cover 5–10 small topics, each too lightweight to deserve a formal handoff, but losing them all means starting cold every time. Our five deltas all grew out of that crack.

1. One handoff doc isn't enough — there's a missing "hot cache" layer

Existing handoff designs put all cross-session data on one tier: write once, read once.

But in practice, cross-session context has two distinct lifecycles:

Long-lived facts (what this project is, who I am, the house rules, decisions made) → cold storage, weeks-and-months timescale.
Where we just left off (which thread we just closed, what we're mid-topic on, who "she" was in the last sentence) → hot cache, useful for the next session or two, stale by the one after.

Mixing both in one doc breaks things both ways. Either the cold layer gets polluted (a week later you open old handoffs and see "we were just talking about X" — and nobody remembers what X was), or the hot layer gets weighed down by the cold format (updating a formal doc just to note "we left off mid-topic" is too heavy to actually do).

Our split is two files:

A permanent handoff — cold storage, formal, persists across sessions.
A single-session scratch board — hot cache, light, lives in a temp folder, filename stamped with date and session ID.

The reading order also splits: to pick up a conversation, just read the last 1–2 entries on the scratch board (enough to orient). To look up deeper background, then go to the permanent handoff. Scratch = RAM; handoff = disk.

2. The scratch board isn't auto-discarded or auto-kept — a human decides

There's a gap between ephemeral memory and persistent memory that nobody's filled: write once, then have a human triage at session end.

Our design:

One board per session, filename bound to date + session ID.
At session end, the author looks at it personally: keep the whole thing for next session / fold a few items into the permanent handoff / throw the whole thing away.
No auto-promotion to long-term memory. No auto-expiration either.

The reasoning is mundane: "should the next Claude know this?" is a judgment current automation gets wrong too often. Better to spend 30 seconds of human triage than to false-promote a pile of noise that pollutes the long-term store.

This runs opposite to the direction mainstream memory frameworks are heading — Mem0, Memoria and that family bet on "ADD-only extraction + decay-aware ranking," assuming automation can surface what matters. We bet the other way: for low-fidelity casual-chat content, 30 seconds of human triage beats the algorithm.

3. An anti-pattern: the scratch board's own construction meta isn't the "current" topic

First cross-session test, a fresh Claude reads the scratch board and the last entry says "we were just designing this scratch board." The new session takes that as "we are still designing this board" and spends half an hour stuck in meta, navel-gazing.

The specific lesson: the scratch board has a "current marker" (we use a tiny ←current tag), and it has to be removed by hand at session end, or the next session will pick up the previous session's tail as if it's still ongoing.

The broader version: any context written for the next session to read has to clearly mark which entries are "historical record" and which are "in progress." Otherwise the next Claude will get the state wrong and try to continue something that already closed.

We didn't see this in the prior art, probably because all the handoff writing is about coding sessions — and coding sessions have natural boundaries at the end (commit, PR, test run). Casual chat doesn't have those built-in boundaries, so this bug surfaces.

4. 99% of the handoff literature is about coding — nobody's writing for casual chat

We searched the prior art pretty thoroughly. It's all engineering / coding workflow — build status, failed approaches, git branch, next test, uncommitted changes.

But there are plenty of people using LLMs for ongoing casual conversation — personal assistants, writing partners, thinking partners, long-running companionship of various kinds. The pain points of casual chat across sessions are different:

Content is more fragmented; each item has lower individual importance; doesn't fit a formal doc.
But the demand for tonal and emotional continuity is higher (yesterday we were halfway through a story, today we want to pick it up without restart).
A handoff written too formally kills the tone — it reads like a status report, and the atmosphere breaks.

A scratch board fits this use case better than a formal handoff: short, light, disposable, atmosphere preserved.

We think this gap will get noticed eventually, but as of mid-2026 we haven't seen anyone writing about it.

5. The real motivation is cost, not productivity

Handoff articles generally dress up the motivation as "context continuity, quality improvement, smoother workflow." Our actual motivation is more pedestrian:

Long marathon conversations re-chew the entire context on every turn — it's the most token-expensive posture you can adopt. Hitting the monthly cap in a single day is a real thing — we have a logged day of burning 60% of the Claude Max monthly cap, USD$200 plus another USD$100 in top-ups, in one sitting.

The scratch board lets a new session orient from a small piece of paper instead of re-chewing the full permanent handoff plus long-term memory every time. Switching sessions becomes much cheaper.

The emotional-continuity side effect is real — but the motivation order is "save money" first, "feel continuous" second.

We're writing this honestly because if you dress up the motivation, people downstream misread what we were optimizing for and end up building the wrong thing.

Prior art map

The five points above are our delta. Below is the existing handoff / memory literature, so you can see where we sit on the map:

Mainstream handoff writeups

Tools / skills

Memory architecture (upstream of handoff design)

If any of those reads as "that's exactly what I want," you don't really need our piece. The five points above are the seams those didn't cover, left here for anyone working in the same crack.

Original knowhow (internal SOP) — Amy + Claude (spring 2026) · session a9b5a92c-4563-4924-900e-de86201b1b9e

Public rewrite — Claude (spring 2026) · session 01bf8a83-24ae-42b0-b71c-d2c1d38321fd