
The Atlas — Design

A documentation navigation assistant for the Lavern repo: a FastAPI server that binds the markdown documentation to the code it describes, with curated orientation, drift detection, and a feedback loop back to AI agents.

Status: approved design (2026-06-04). Decisions below were resolved in a grilling session; each table row is a settled decision, not an open question.

1. Problem

The repo has ~37 markdown documents (root-level, docs/, evals/, scattered READMEs) plus 67 agent prompts, describing ~137K LOC across src/ and viz/. Raw markdown is hard to navigate: there is no orientation layer, no doc↔code binding, and docs drift from the code (TECH_DEBT_AUDIT flags CLAUDE.md as 1–7 items behind reality). The Atlas is the third documentation tool, sitting beside:

docs/explore — hand-authored editorial HTML site (narrative, no discovery)
docs/explorer — generated + curated SPA for agents/prompts (manifest + curation + click-to-source)

The Atlas owns the markdown-docs + code-binding domain. It does not duplicate the explorer's agent coverage — it deep-links into it.

2. Resolved decisions

| Decision | Resolution |
| --- | --- |
| LLM role | None at runtime. All summaries/journeys authored at build time, checked in |
| Audience | Repo owner first, contributors second; technical tone, checked into the repo |
| Server | FastAPI + uvicorn (docs/atlas/requirements.txt; first pip deps in the repo) |
| Frontend | Server-rendered Jinja2 + htmx. Python does markdown rendering, slicing, highlighting; browser stays thin. No node toolchain |
| vs. explorer | Sibling. Agent prompts indexed/searchable in Atlas; detail views deep-link to explorer (`#/a/{artifact-id}`) for curated diagrams/assessments |
| Core view | Doc + code 50/50: rendered doc left; code paths in prose become chips; click → right panel shows the AST-sliced symbol/file via htmx swap |
| Code slicing | tree-sitter (`tree-sitter-typescript`): exact spans for exported functions/classes/consts; symbol outline per file; doc-mentioned symbols jump to definition |
| Code renderer | Pygments server-side highlighting + tree-sitter fold ranges emitted as nested collapsible regions (~30 lines of vanilla fold JS) |
| Home | Curated atlas: category cards with authored overviews, per-doc summary paragraphs, journeys |
| Journeys (4) | First hour with the repo · How the engine works · Clawern deep-dive (incl. v1→v3.4 eval-arc timeline) · Integrating & operating |
| Diagrams | Explorer-style clickable flow diagrams (server-rendered SVG) for flow-heavy docs: 4-stage pipeline, managed-agents migration stages, Clawern lighthouse |
| Drift | Paths + symbols verified automatically (broken refs styled broken in doc view) + curated `claims.yaml` counters + a `/drift` dashboard |
| Corpus | All repo MDs + CHANGELOG + scattered READMEs/templates + 67 agent prompts (search-only depth) + `site/` HTML (text-extracted for search, linked out) |
| Search | SQLite FTS5 (stdlib) over full text + headings + summaries + symbols; ranked, snippets, scoped filters (docs / code / claims) |
| Freshness | Index at startup + Reindex button (`POST /api/reindex`, atomic registry swap) |
| Design language | Explorer editorial light/dark (cream serif, CSS-variable theming), so the doc tools feel like one suite |
| Comments | Light structure, any anchor (see §5). SQLite storage, single-user, clipboard export to agent |
| Reusability | Config-driven engine + METHOD.md (see §6). Engine never references lavernDev specifics |
| Name & home | The Atlas, docs/atlas/ |
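
To make the Search decision concrete, here is a minimal sketch of a ranked FTS5 query with snippets, using only the standard library. The table name `docs`, its columns, and the function names are hypothetical, and the sketch assumes the Python build's SQLite was compiled with FTS5 (common, but not guaranteed).

```python
import sqlite3


def build_index(rows):
    """Build a throwaway in-memory FTS5 index from (doc_id, title, body) rows."""
    db = sqlite3.connect(":memory:")
    # doc_id is stored but not tokenized; title and body are searchable.
    db.execute("CREATE VIRTUAL TABLE docs USING fts5(doc_id UNINDEXED, title, body)")
    db.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
    return db


def search(db, query, limit=10):
    """Return (doc_id, snippet) pairs, best match first."""
    sql = """
        SELECT doc_id, snippet(docs, 2, '[', ']', '...', 8)
        FROM docs
        WHERE docs MATCH ?
        ORDER BY bm25(docs)   -- lower bm25 score means a better match
        LIMIT ?
    """
    return db.execute(sql, (query, limit)).fetchall()
```

Because the index lives in its own virtual table, dropping and rebuilding it on reindex would not need to touch any other table in the same database file.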

3. Architecture

docs/atlas/
├── app.py                  # FastAPI app: page routes + htmx partial routes + JSON API
├── atlas.config.yaml       # ← ALL repo-specific wiring (see §6)
├── engine/                 # repo-agnostic
│   ├── config.py           #   load/validate atlas.config.yaml
│   ├── indexer.py          #   markdown scan → headings, doc→doc links, doc→code refs
│   ├── symbols.py          #   tree-sitter symbol index (lazy, per referenced file)
│   ├── drift.py            #   path/symbol verification + claims.yaml counters
│   ├── fts.py              #   SQLite FTS5 build + query
│   ├── render.py           #   markdown-it-py rendering, Pygments + fold-region emission, SVG diagrams
│   ├── comments.py         #   comment CRUD + brief/bundle generation
│   └── registry.py         #   in-memory index registry with atomic swap on reindex
├── curated/
│   ├── atlas.yaml          #   category overviews + per-doc summaries / read-when lines
│   ├── journeys.yaml       #   4 journeys, ordered stops with authored narration
│   ├── diagrams.yaml       #   flow graphs (nodes anchored to real doc headings / code symbols)
│   └── claims.yaml         #   quantitative doc claims → deterministic counters
├── templates/              # Jinja2: base, home, doc (50/50), journey, drift, search, feedback
├── static/                 # styles.css (explorer theming), htmx.min.js (vendored), fold.js
├── data/
│   └── atlas.db            # FTS5 index (rebuildable) + comments (durable) — gitignored
├── verify.py               # integrity harness (explorer's verify.cjs equivalent)
├── METHOD.md               # porting recipe for reuse in other repos
├── README.md
└── requirements.txt        # fastapi, uvicorn, jinja2, markdown-it-py, pygments,
                            # tree-sitter, tree-sitter-typescript, pyyaml

Data flow

Startup / reindex: indexer scans corpus globs from config → extracts headings, doc→doc links, doc→code references (path-like strings in prose) → symbols.py parses referenced TS/TSX files with tree-sitter → drift.py verifies every reference and runs claims counters → fts.py rebuilds the FTS5 table → registry swaps atomically.
Doc view: GET /doc/{id} renders markdown server-side; every detected code reference becomes a chip. GET /partial/code?path=…&symbol=… (htmx) returns the highlighted, foldable slice for the right panel.
Drift: broken references render with broken styling inline; /drift aggregates all failures (missing paths, missing symbols, failed claims).
Comments: anchored to doc section / code symbol / diagram node / journey stop / whole doc; stored in atlas.db; clipboard export produces agent-ready briefs.
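
The reference-extraction and verification steps in the startup flow could look roughly like the sketch below. The regular expression is a hypothetical stand-in for the detection patterns that the design keeps in config, and `find_code_refs` is an illustrative name, not part of the engine.

```python
import re
from pathlib import Path

# Hypothetical pattern: paths under src/ or viz/ that end in .ts or .tsx.
# In the design, patterns like this live in atlas.config.yaml, not in code.
PATH_RE = re.compile(r"\b(?:src|viz)/[\w./-]+\.tsx?\b")


def find_code_refs(markdown: str, repo_root: Path):
    """Return (path, exists) for each distinct path-like token, in order of first mention."""
    seen = {}
    for match in PATH_RE.finditer(markdown):
        path = match.group(0)
        if path not in seen:
            # A reference is "broken" when the file is absent from the real tree.
            seen[path] = (repo_root / path).is_file()
    return list(seen.items())
```

A doc view could render each `True` entry as a clickable chip and each `False` entry with broken styling, and feed the `False` entries to the drift dashboard.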

Key implementation notes

Reference detection: path-like tokens (src/**.ts, viz/**.tsx, etc.) matched against the real tree; symbol names matched when they appear near a path mention. Detection patterns live in config, not code.
Slicing: tree-sitter gives byte spans for top-level (and nested) declarations. The code partial renders the whole file with fold regions, auto-expanded and scrolled to the target symbol.
Atomic registry swap (lq-ai pattern): reindex builds a complete new index object, then swaps one attribute — in-flight requests keep the old index.
FTS5 vs durable data: the FTS index is disposable (rebuilt every reindex); the comments table is durable. Both live in atlas.db; reindex must never touch comments.
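
The atomic swap described above can be sketched in a few lines. The `Index` fields and the `build_docs` callback are illustrative assumptions; the point is only that readers take one attribute read and writers publish with one attribute assignment.

```python
import threading
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Index:
    """Immutable snapshot of everything a request needs (fields are illustrative)."""
    docs: dict = field(default_factory=dict)
    generation: int = 0


class Registry:
    def __init__(self):
        self._current = Index()
        self._reindex_lock = threading.Lock()  # serialises concurrent reindex calls

    def get(self) -> Index:
        # One attribute read; the caller keeps using the snapshot it obtained.
        return self._current

    def reindex(self, build_docs) -> Index:
        with self._reindex_lock:
            # Build the complete new index off to the side ...
            new = Index(docs=build_docs(), generation=self._current.generation + 1)
            # ... then publish it with a single attribute assignment.
            self._current = new
            return new
```

A request handler that called `get()` before the swap continues to see the old snapshot until it finishes, which is the behaviour the note asks for.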

4. Curation layer (authored content)

All authored at build time, checked into curated/:

Per-doc summary — one paragraph + a "read this when…" line, for every corpus doc (~37; agent prompts are exempt — they get explorer deep-links instead).
Category overviews — 2–3 sentences per atlas category (Architecture & Design, Getting Started, Integration, Audits & Quality, Evaluation, Operations, Governance, Developer Tools).
Journeys — 4 sequenced paths; every stop has authored narration explaining why this stop, what to notice, and where it connects.
Diagrams — 3–4 clickable flow diagrams; every node anchors to a real doc heading or code symbol (verify.py fails on dangling anchors) with a hand-written one-line summary.
Claims — each entry: source doc + quoted claim + deterministic counter (glob/symbol-count) + expected value, e.g. "CLAUDE.md says 27 route modules" → glob('src/api/routes/*.ts').
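
As an illustration of the Claims entry, a glob-based counter might be checked like this. The `Claim` field names and the result shape are assumptions for the sketch; the actual `claims.yaml` schema is not specified here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Claim:
    source_doc: str   # doc that makes the claim
    quote: str        # the claim as written
    glob: str         # deterministic counter: files matching this pattern
    expected: int     # the number the doc states


def check_claim(claim: Claim, repo_root: Path) -> dict:
    """Count files matching the claim's glob and compare with the stated value."""
    actual = sum(1 for p in repo_root.glob(claim.glob) if p.is_file())
    return {
        "claim": claim.quote,
        "expected": claim.expected,
        "actual": actual,
        "drifted": actual != claim.expected,
    }
```

A mismatch is reported as drift rather than raised as an error, which keeps a stale doc from blocking the server.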

5. Commenting & feedback loop

Single-user, local, no auth.

Shape

Comment {
  id, created_at, updated_at,
  anchor: { kind: doc-section | code-symbol | diagram-node | journey-stop | doc,
            doc_id?, heading?, path?, symbol?, diagram_id?, node_id?, journey_id?, stop_id? },
  type: improve-doc | question | fix-drift | idea,
  body: string,
  quote: string?,           # auto-captured selected text at comment time
  status: open | in-progress | resolved,
  resolution_note: string?
}

UI: hover affordance (💬) on headings, symbol outlines, diagram nodes, journey stops; a Feedback page lists all comments filterable by status / type / doc.

Agent handoff (clipboard):

Per-comment Copy → markdown brief: the comment + anchor + the sliced doc/code context, ready to paste into a Claude Code session.
Export all → one bundle of every open comment with contexts.
Status is updated manually in the UI. The schema deliberately doesn't preclude a future MCP layer (list/resolve tools), but that layer is out of scope for v1.
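
The clipboard handoff could be generated along these lines. This is a sketch under simplifying assumptions: the anchor is flattened to a `kind` plus a single reference string, and the brief layout is invented for illustration rather than taken from the design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: int
    type: str                  # improve-doc | question | fix-drift | idea
    body: str
    anchor_kind: str           # doc-section | code-symbol | diagram-node | journey-stop | doc
    anchor_ref: str            # flattened anchor such as "CLAUDE.md#Routes" (simplification)
    quote: Optional[str] = None
    status: str = "open"       # open | in-progress | resolved


def to_brief(comment: Comment, context: str) -> str:
    """Render one comment, its anchor, and its sliced context as a markdown brief."""
    lines = [
        f"## Comment {comment.id} ({comment.type}, {comment.status})",
        f"Anchor: {comment.anchor_kind} at {comment.anchor_ref}",
    ]
    if comment.quote:
        lines.append(f"> {comment.quote}")
    lines += ["", comment.body, "", "Context:", context]
    return "\n".join(lines)


def export_open(comments, contexts) -> str:
    """Bundle every open comment; contexts maps comment id to its sliced context."""
    return "\n\n".join(
        to_brief(c, contexts[c.id]) for c in comments if c.status == "open"
    )
```

Keeping the brief as plain markdown means the same string works for the per-comment Copy button and for the Export all bundle.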

6. Reusability contract

Hard rule from day one: engine code never references this repo. Repo-specifics live in:

atlas.config.yaml — repo root, corpus globs (include/exclude), source roots, language map (extension → tree-sitter grammar + Pygments lexer), reference-detection patterns, sibling-tool links (explorer deep-link base URL), branding strings, curated dir path, claims file path, DB path, port.
curated/ — all authored YAML.

Porting recipe (METHOD.md): copy engine/ + templates/ + static/ + app.py + verify.py → write a new atlas.config.yaml → author new curated/ content → run verify.py → serve. Adding a language takes two steps: pip install its tree-sitter grammar wheel, then add one config entry. Extraction to a pip package is deferred until a second repo actually uses it.
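
One way the config contract might be enforced is a small validating loader. The key names below are hypothetical (the real `atlas.config.yaml` schema may differ), and the function takes an already-parsed dict, such as the result of `yaml.safe_load`, so the sketch stays dependency-free.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical subset of the keys listed in the reusability contract.
REQUIRED_KEYS = ("repo_root", "corpus_include", "source_roots", "languages", "db_path", "port")


@dataclass(frozen=True)
class AtlasConfig:
    repo_root: Path
    corpus_include: tuple
    source_roots: tuple
    languages: dict   # extension -> {"grammar": ..., "lexer": ...}
    db_path: Path
    port: int


def load_config(raw: dict) -> AtlasConfig:
    """Validate a parsed config mapping and return a typed, immutable config."""
    missing = [k for k in REQUIRED_KEYS if k not in raw]
    if missing:
        raise ValueError(f"atlas config is missing keys: {', '.join(missing)}")
    for ext, entry in raw["languages"].items():
        if not {"grammar", "lexer"} <= set(entry):
            raise ValueError(f"language entry {ext!r} needs 'grammar' and 'lexer'")
    return AtlasConfig(
        repo_root=Path(raw["repo_root"]),
        corpus_include=tuple(raw["corpus_include"]),
        source_roots=tuple(raw["source_roots"]),
        languages=dict(raw["languages"]),
        db_path=Path(raw["db_path"]),
        port=int(raw["port"]),
    )
```

Under this shape, wiring a new language is one more entry in `languages`, and the engine code reads only from `AtlasConfig`.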

7. Verification (`verify.py`)

Fails loudly on:

curated summaries that don't cover the corpus (or reference missing docs)
journey stops pointing at missing docs, headings, or symbols
diagram nodes with dangling anchors
claims whose counters error (distinct from claims that fail — those are drift, reported not fatal)
config globs matching zero files
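
A few of the checks above can be sketched as a pure function that collects errors, plus a tiny reporter that turns them into an exit code. The argument shapes and message wording are assumptions for illustration; the real harness would read them from curated/ and the index.

```python
import sys


def verify(corpus_ids, summaries, journey_stops, diagram_anchors, known_anchors):
    """Return a list of human-readable integrity errors (an empty list means pass)."""
    errors = []
    corpus = set(corpus_ids)
    # Summaries must cover the corpus exactly: no gaps, no orphans.
    for doc_id in sorted(corpus - set(summaries)):
        errors.append(f"no curated summary for {doc_id}")
    for doc_id in sorted(set(summaries) - corpus):
        errors.append(f"summary references missing doc {doc_id}")
    for stop in journey_stops:
        if stop not in corpus:
            errors.append(f"journey stop points at missing doc {stop}")
    for anchor in diagram_anchors:
        if anchor not in known_anchors:
            errors.append(f"diagram node has dangling anchor {anchor}")
    return errors


def report(errors) -> int:
    """Print each failure to stderr and return a process exit code."""
    for message in errors:
        print(f"FAIL: {message}", file=sys.stderr)
    return 1 if errors else 0
```

A script entry point could then end with `sys.exit(report(verify(...)))`, so any integrity error fails the run loudly while drifted claims stay out of this path.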

8. Build phases

Engine skeleton — config loader, indexer, registry, FastAPI app, home + sidebar tree (uncurated), doc rendering without code panel.
Doc 50/50 view — reference detection → chips; tree-sitter slicing; Pygments + fold-region rendering; htmx code panel.
Search + drift — FTS5 index + search UI; path/symbol verification; claims engine; /drift dashboard; inline broken-ref styling; reindex endpoint.
Commenting — schema, CRUD, anchor affordances, Feedback page, clipboard briefs.
Curation pass — author curated/: ~37 summaries, category overviews, 4 journeys, 3–4 diagrams, claims.yaml seeded from TECH_DEBT_AUDIT's known drift.
Hardening — verify.py, METHOD.md, README, explorer cross-links.

9. Out of scope (v1)

Runtime LLM features (chat, Q&A, auto-summarization)
Multi-user comments, auth, collaboration
MCP server for the feedback loop
pip packaging / standalone repo
Editing docs from the Atlas (read + comment only)
Languages beyond TS/TSX (architecture supports them; not wired)