The Atlas — Design
A documentation navigation assistant for the Lavern repo. A FastAPI server that binds the markdown documentation to the code it describes, with curated orientation, drift detection, and a feedback loop back to AI agents.
Status: approved design (2026-06-04). Decisions below were resolved in a grilling session; each table row is a settled decision, not an open question.
1. Problem
The repo has ~37 markdown documents (root-level, docs/, evals/, scattered READMEs) plus
67 agent prompts, describing ~137K LOC across src/ and viz/. Raw markdown is hard to
navigate: there is no orientation layer, no doc↔code binding, and docs drift from the code
(TECH_DEBT_AUDIT flags CLAUDE.md as 1–7 items behind reality). The Atlas is the third
documentation tool, sitting beside:
- docs/explore — hand-authored editorial HTML site (narrative, no discovery)
- docs/explorer — generated + curated SPA for agents/prompts (manifest + curation + click-to-source)
The Atlas owns the markdown-docs + code-binding domain. It does not duplicate the explorer's agent coverage — it deep-links into it.
2. Resolved decisions
| Decision | Resolution |
|---|---|
| LLM role | None at runtime. All summaries/journeys authored at build time, checked in |
| Audience | Repo owner first, contributors second — technical tone, checked into the repo |
| Server | FastAPI + uvicorn (docs/atlas/requirements.txt; first pip deps in the repo) |
| Frontend | Server-rendered Jinja2 + htmx. Python does markdown rendering, slicing, highlighting; browser stays thin. No node toolchain |
| vs. explorer | Sibling. Agent prompts indexed/searchable in Atlas; detail views deep-link to explorer (#/a/{artifact-id}) for curated diagrams/assessments |
| Core view | Doc + code 50/50: rendered doc left; code paths in prose become chips; click → right panel shows the AST-sliced symbol/file via htmx swap |
| Code slicing | tree-sitter (tree-sitter-typescript) — exact spans for exported functions/classes/consts; symbol outline per file; doc-mentioned symbols jump to definition |
| Code renderer | Pygments server-side highlighting + tree-sitter fold ranges emitted as nested collapsible regions (~30 lines of vanilla fold JS) |
| Home | Curated atlas: category cards with authored overviews, per-doc summary paragraphs, journeys |
| Journeys (4) | First hour with the repo · How the engine works · Clawern deep-dive (incl. v1→v3.4 eval-arc timeline) · Integrating & operating |
| Diagrams | Explorer-style clickable flow diagrams (server-rendered SVG) for flow-heavy docs: 4-stage pipeline, managed-agents migration stages, Clawern lighthouse |
| Drift | Paths + symbols verified automatically (broken refs styled broken in doc view) + curated claims.yaml counters + a /drift dashboard |
| Corpus | All repo MDs + CHANGELOG + scattered READMEs/templates + 67 agent prompts (search-only depth) + site/ HTML (text-extracted for search, linked out) |
| Search | SQLite FTS5 (stdlib) over full text + headings + summaries + symbols; ranked, snippets, scoped filters (docs / code / claims) |
| Freshness | Index at startup + Reindex button (POST /api/reindex, atomic registry swap) |
| Design language | Explorer editorial light/dark (cream serif, CSS-variable theming) — the doc tools feel like one suite |
| Comments | Light structure, any anchor (see §5). SQLite storage, single-user, clipboard export to agent |
| Reusability | Config-driven engine + METHOD.md (see §6). Engine never references lavernDev specifics |
| Name & home | The Atlas, docs/atlas/ |
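The Search row can be illustrated with a minimal sketch that uses only the standard library's sqlite3 module. The table name docs_fts, its columns, and the rebuild_fts / search helpers are hypothetical, and the sketch assumes the local SQLite build has FTS5 compiled in.

```python
import sqlite3

def rebuild_fts(conn, docs):
    """Drop and rebuild the disposable FTS table from (doc_id, heading, body) rows."""
    with conn:  # commits on success, rolls back if an insert raises
        conn.execute("DROP TABLE IF EXISTS docs_fts")
        conn.execute(
            "CREATE VIRTUAL TABLE docs_fts USING fts5(doc_id UNINDEXED, heading, body)"
        )
        conn.executemany("INSERT INTO docs_fts VALUES (?, ?, ?)", docs)

def search(conn, query, limit=10):
    """Return (doc_id, heading, snippet) rows, best match first."""
    sql = (
        # snippet() args: table, column index (2 = body), open mark,
        # close mark, ellipsis text, max tokens in the snippet
        "SELECT doc_id, heading, snippet(docs_fts, 2, '[', ']', '...', 8) "
        "FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()
```

Because rebuild_fts only ever names docs_fts, any other table in the same database file is left alone across reindexes.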
3. Architecture
docs/atlas/
├── app.py # FastAPI app: page routes + htmx partial routes + JSON API
├── atlas.config.yaml # ← ALL repo-specific wiring (see §6)
├── engine/ # repo-agnostic
│ ├── config.py # load/validate atlas.config.yaml
│ ├── indexer.py # markdown scan → headings, doc→doc links, doc→code refs
│ ├── symbols.py # tree-sitter symbol index (lazy, per referenced file)
│ ├── drift.py # path/symbol verification + claims.yaml counters
│ ├── fts.py # SQLite FTS5 build + query
│ ├── render.py # markdown-it-py rendering, Pygments + fold-region emission, SVG diagrams
│ ├── comments.py # comment CRUD + brief/bundle generation
│ └── registry.py # in-memory index registry with atomic swap on reindex
├── curated/
│ ├── atlas.yaml # category overviews + per-doc summaries / read-when lines
│ ├── journeys.yaml # 4 journeys, ordered stops with authored narration
│ ├── diagrams.yaml # flow graphs (nodes anchored to real doc headings / code symbols)
│ └── claims.yaml # quantitative doc claims → deterministic counters
├── templates/ # Jinja2: base, home, doc (50/50), journey, drift, search, feedback
├── static/ # styles.css (explorer theming), htmx.min.js (vendored), fold.js
├── data/
│ └── atlas.db # FTS5 index (rebuildable) + comments (durable) — gitignored
├── verify.py # integrity harness (explorer's verify.cjs equivalent)
├── METHOD.md # porting recipe for reuse in other repos
├── README.md
└── requirements.txt # fastapi, uvicorn, jinja2, markdown-it-py, pygments,
# tree-sitter, tree-sitter-typescript, pyyaml
Data flow
- Startup / reindex: indexer scans corpus globs from config → extracts headings, doc→doc links, doc→code references (path-like strings in prose) → symbols.py parses referenced TS/TSX files with tree-sitter → drift.py verifies every reference and runs claims counters → fts.py rebuilds the FTS5 table → registry swaps atomically.
- Doc view: GET /doc/{id} renders markdown server-side; every detected code reference becomes a chip. GET /partial/code?path=…&symbol=… (htmx) returns the highlighted, foldable slice for the right panel.
- Drift: broken references render with broken styling inline; /drift aggregates all failures (missing paths, missing symbols, failed claims).
- Comments: anchored to doc section / code symbol / diagram node / journey stop / whole doc; stored in atlas.db; clipboard export produces agent-ready briefs.
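The "registry swaps atomically" step of the startup / reindex flow could look roughly like the sketch below. The Index and Registry classes and their fields are hypothetical stand-ins for whatever engine/registry.py ends up holding; the point is only that a new index is built in full before one attribute is reassigned.

```python
import threading
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Index:
    """Snapshot produced by one indexing run (fields are illustrative)."""
    generation: int
    docs: dict = field(default_factory=dict)

class Registry:
    def __init__(self, build):
        self._build = build            # callable: generation -> Index
        self._lock = threading.Lock()  # serialises reindex calls; readers never lock
        self.current = build(0)

    def reindex(self):
        with self._lock:
            # Build the complete replacement first; nothing is visible yet.
            fresh = self._build(self.current.generation + 1)
            # The swap itself is a single attribute assignment.
            self.current = fresh
        return fresh
```

A request handler would read registry.current once into a local variable and use that local for the rest of the request, so a reindex that lands mid-request does not change what the handler sees.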
Key implementation notes
- Reference detection: path-like tokens (src/**.ts, viz/**.tsx, etc.) matched against the real tree; symbol names matched when they appear near a path mention. Detection patterns live in config, not code.
- Slicing: tree-sitter gives byte spans for top-level (and nested) declarations. The code partial renders the whole file with fold regions, auto-expanded and scrolled to the target symbol.
- Atomic registry swap (lq-ai pattern): reindex builds a complete new index object, then swaps one attribute — in-flight requests keep the old index.
- FTS5 vs durable data: the FTS index is disposable (rebuilt every reindex); the comments table is durable. Both live in atlas.db; reindex must never touch comments.
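As a rough illustration of the reference-detection note, the sketch below finds path-like tokens in prose and checks each against the real tree. The regular expression is a hypothetical stand-in (the design keeps the real patterns in atlas.config.yaml), and detect_refs is an invented helper name.

```python
import re
from pathlib import Path

# Hypothetical pattern: paths under src/ or viz/ ending in .ts or .tsx.
PATH_RE = re.compile(r"\b(?:src|viz)/[\w./-]+\.tsx?\b")

def detect_refs(markdown: str, repo_root: Path):
    """Return [(path, exists)] for each distinct path-like token, in order of appearance."""
    seen, refs = set(), []
    for match in PATH_RE.finditer(markdown):
        rel = match.group(0)
        if rel not in seen:
            seen.add(rel)
            # A token only counts as a live reference if the file is really there.
            refs.append((rel, (repo_root / rel).is_file()))
    return refs
```

Entries whose second element is False are the ones the doc view would style as broken and the /drift dashboard would list as missing paths.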
4. Curation layer (authored content)
All authored at build time, checked into curated/:
- Per-doc summary — one paragraph + a "read this when…" line, for every corpus doc (~37; agent prompts are exempt — they get explorer deep-links instead).
- Category overviews — 2–3 sentences per atlas category (Architecture & Design, Getting Started, Integration, Audits & Quality, Evaluation, Operations, Governance, Developer Tools).
- Journeys — 4 sequenced paths; every stop has authored narration explaining why this stop, what to notice, and where it connects.
- Diagrams — 3–4 clickable flow diagrams; every node anchors to a real doc heading or code symbol (verify.py fails on dangling anchors) with a hand-written one-line summary.
- Claims — each entry: source doc + quoted claim + deterministic counter
(glob/symbol-count) + expected value, e.g. "CLAUDE.md says 27 route modules" →
glob('src/api/routes/*.ts').
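A claim entry and its glob counter might be modelled as in the sketch below. The Claim fields and the check_claim helper are assumptions made for illustration; only the idea of source doc + quoted claim + deterministic counter + expected value comes from the design.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Claim:
    source_doc: str  # doc that makes the claim
    quote: str       # the quoted claim text
    glob: str        # deterministic counter: number of files matching this glob
    expected: int    # the value the doc asserts

def check_claim(claim: Claim, repo_root: Path) -> dict:
    """Run the counter and report whether the doc's number still holds."""
    actual = sum(1 for p in repo_root.glob(claim.glob) if p.is_file())
    return {
        "doc": claim.source_doc,
        "expected": claim.expected,
        "actual": actual,
        "ok": actual == claim.expected,
    }
```

For the example above, Claim("CLAUDE.md", "27 route modules", "src/api/routes/*.ts", 27) would report ok only while exactly 27 matching files exist; any other count is drift.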
5. Commenting & feedback loop
Single-user, local, no auth.
Shape
Comment {
id, created_at, updated_at,
anchor: { kind: doc-section | code-symbol | diagram-node | journey-stop | doc,
doc_id?, heading?, path?, symbol?, diagram_id?, node_id?, journey_id?, stop_id? },
type: improve-doc | question | fix-drift | idea,
body: string,
quote: string?, # auto-captured selected text at comment time
status: open | in-progress | resolved,
resolution_note: string?
}
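One possible Python rendering of this shape uses dataclasses with the enumerations checked at construction time. The integer id, the ISO-string timestamps, and the "open" default status are assumptions; the design does not fix them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

ANCHOR_KINDS = {"doc-section", "code-symbol", "diagram-node", "journey-stop", "doc"}
COMMENT_TYPES = {"improve-doc", "question", "fix-drift", "idea"}
STATUSES = {"open", "in-progress", "resolved"}

def _now() -> str:
    return datetime.now(timezone.utc).isoformat()

@dataclass
class Anchor:
    kind: str
    doc_id: Optional[str] = None
    heading: Optional[str] = None
    path: Optional[str] = None
    symbol: Optional[str] = None
    diagram_id: Optional[str] = None
    node_id: Optional[str] = None
    journey_id: Optional[str] = None
    stop_id: Optional[str] = None

@dataclass
class Comment:
    id: int
    anchor: Anchor
    type: str
    body: str
    quote: Optional[str] = None          # selected text captured at comment time
    status: str = "open"                 # assumed default for new comments
    resolution_note: Optional[str] = None
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)

    def __post_init__(self):
        # Reject values outside the enumerations listed in the shape above.
        for value, allowed in ((self.anchor.kind, ANCHOR_KINDS),
                               (self.type, COMMENT_TYPES),
                               (self.status, STATUSES)):
            if value not in allowed:
                raise ValueError(f"{value!r} not in {sorted(allowed)}")
```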
UI: hover affordance (💬) on headings, symbol outlines, diagram nodes, journey stops; a Feedback page lists all comments filterable by status / type / doc.
Agent handoff (clipboard):
- Per-comment Copy → markdown brief: the comment + anchor + the sliced doc/code context, ready to paste into a Claude Code session.
- Export all → one bundle of every open comment with contexts.
- Status updated manually in the UI. The schema deliberately doesn't preclude a future MCP layer (list/resolve tools) — out of scope for v1.
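The clipboard handoff could be produced by something like the sketch below. The brief layout, the comment_brief / export_open names, and the context_for callback are all hypothetical; the design only specifies that a brief carries the comment, its anchor, and the sliced context.

```python
def comment_brief(comment: dict, context: str) -> str:
    """Render one comment plus its sliced context as a paste-ready markdown brief."""
    anchor = comment["anchor"]
    # Join every populated anchor field except the kind, e.g. "CLAUDE.md / Routes".
    target = " / ".join(str(v) for k, v in anchor.items() if k != "kind" and v)
    lines = [f"## [{comment['type']}] {anchor['kind']}: {target}", "", comment["body"]]
    if comment.get("quote"):
        lines += ["", f"> {comment['quote']}"]
    # Indent the context by four spaces so markdown treats it as a code block.
    lines += ["", "Context:", ""] + ["    " + ln for ln in context.splitlines()]
    return "\n".join(lines)

def export_open(comments, context_for) -> str:
    """Bundle a brief for every open comment, separated by horizontal rules."""
    briefs = [comment_brief(c, context_for(c)) for c in comments if c["status"] == "open"]
    return "\n\n---\n\n".join(briefs)
```

Keeping the brief as plain markdown means the same string works for the per-comment Copy button and for the Export all bundle.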
6. Reusability contract
Hard rule from day one: engine code never references this repo. Repo-specifics live in:
- atlas.config.yaml: repo root, corpus globs (include/exclude), source roots, language map (extension → tree-sitter grammar + Pygments lexer), reference-detection patterns, sibling-tool links (explorer deep-link base URL), branding strings, curated dir path, claims file path, DB path, port.
- curated/: all authored YAML.
Porting recipe (METHOD.md): copy engine/ + templates/ + static/ + app.py +
verify.py → write a new atlas.config.yaml → author new curated/ content → run
verify.py → serve. Adding a language = pip install its tree-sitter grammar wheel + one
config entry. Extraction to a pip package is deferred until a second repo actually uses it.
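To make the "one config entry" idea concrete, here is a small sketch of how the engine might consume the language map once the YAML has been parsed into a dict. The key names (languages, grammar, lexer), the helper names, and the example values are assumptions, not the actual schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass(frozen=True)
class Language:
    grammar: str  # name of the tree-sitter grammar package
    lexer: str    # Pygments lexer alias

def load_languages(config: dict) -> dict:
    """Build the extension -> Language map from an already-parsed config mapping."""
    if "languages" not in config:
        raise ValueError("config has no 'languages' section")
    return {ext: Language(**spec) for ext, spec in config["languages"].items()}

def language_for(path: str, languages: dict):
    """Return the Language wired for this file's extension, or None if unwired."""
    return languages.get(PurePosixPath(path).suffix)

# Stands in for the parsed YAML; the values are illustrative only.
CONFIG = {
    "languages": {
        ".ts": {"grammar": "tree-sitter-typescript", "lexer": "typescript"},
        ".tsx": {"grammar": "tree-sitter-typescript", "lexer": "typescript"},
    }
}
```

Under this shape, wiring another language is one more key in the mapping; engine code that calls language_for never has to change.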
7. Verification (verify.py)
Fails loudly on:
- curated summaries that don't cover the corpus (or reference missing docs)
- journey stops pointing at missing docs, headings, or symbols
- diagram nodes with dangling anchors
- claims whose counters error (distinct from claims that fail — those are drift, reported not fatal)
- config globs matching zero files
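The split between fatal integrity errors and reported drift can be sketched as a pure function. The argument names and message strings are hypothetical, and the sketch covers only three of the checks listed above (summary coverage, dangling anchors, claim counters).

```python
def verify(corpus_ids, summaries, anchors, known_targets, claim_results):
    """Return (fatal, drift): fatal problems fail the run, drift is only reported.

    claim_results maps a claim name to True (holds), False (fails),
    or an Exception instance (the counter itself errored).
    """
    fatal, drift = [], []
    for doc_id in sorted(set(corpus_ids) - set(summaries)):
        fatal.append(f"no curated summary for {doc_id}")
    for doc_id in sorted(set(summaries) - set(corpus_ids)):
        fatal.append(f"summary references missing doc {doc_id}")
    for anchor in anchors:
        if anchor not in known_targets:
            fatal.append(f"dangling anchor {anchor}")
    for name, result in claim_results.items():
        if isinstance(result, Exception):
            fatal.append(f"claim counter errored: {name}: {result}")
        elif not result:
            drift.append(f"claim failed: {name}")
    return fatal, drift
```

A command-line wrapper would print both lists and exit non-zero only when the fatal list is non-empty, which keeps a stale number in a doc from blocking the build.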
8. Build phases
- Engine skeleton: config loader, indexer, registry, FastAPI app, home + sidebar tree (uncurated), doc rendering without code panel.
- Doc 50/50 view: reference detection → chips; tree-sitter slicing; Pygments + fold-region rendering; htmx code panel.
- Search + drift: FTS5 index + search UI; path/symbol verification; claims engine; /drift dashboard; inline broken-ref styling; reindex endpoint.
- Commenting: schema, CRUD, anchor affordances, Feedback page, clipboard briefs.
- Curation pass: author curated/ (~37 summaries, category overviews, 4 journeys, 3–4 diagrams, claims.yaml seeded from TECH_DEBT_AUDIT's known drift).
- Hardening: verify.py, METHOD.md, README, explorer cross-links.
9. Out of scope (v1)
- Runtime LLM features (chat, Q&A, auto-summarization)
- Multi-user comments, auth, collaboration
- MCP server for the feedback loop
- pip packaging / standalone repo
- Editing docs from the Atlas (read + comment only)
- Languages beyond TS/TSX (architecture supports them; not wired)