The Atlas — Design

The Atlas is a documentation navigation assistant for the Lavern repo: a FastAPI server that binds the markdown documentation to the code it describes, with curated orientation, drift detection, and a feedback loop back to AI agents.

Status: approved design (2026-06-04). Decisions below were resolved in a grilling session; each table row is a settled decision, not an open question.


1. Problem

The repo has ~37 markdown documents (root-level, docs/, evals/, scattered READMEs) plus 67 agent prompts, describing ~137K LOC across src/ and viz/. Raw markdown is hard to navigate: there is no orientation layer, no doc↔code binding, and docs drift from the code (TECH_DEBT_AUDIT flags CLAUDE.md as 1–7 items behind reality). The Atlas is the third documentation tool, sitting beside:

  • docs/explore — hand-authored editorial HTML site (narrative, no discovery)
  • docs/explorer — generated + curated SPA for agents/prompts (manifest + curation + click-to-source)

The Atlas owns the markdown-docs + code-binding domain. It does not duplicate the explorer's agent coverage — it deep-links into it.

2. Resolved decisions

| Decision | Resolution |
|---|---|
| LLM role | None at runtime. All summaries/journeys authored at build time, checked in |
| Audience | Repo owner first, contributors second; technical tone, checked into the repo |
| Server | FastAPI + uvicorn (docs/atlas/requirements.txt; first pip deps in the repo) |
| Frontend | Server-rendered Jinja2 + htmx. Python does markdown rendering, slicing, highlighting; browser stays thin. No node toolchain |
| vs. explorer | Sibling. Agent prompts indexed/searchable in Atlas; detail views deep-link to explorer (#/a/{artifact-id}) for curated diagrams/assessments |
| Core view | Doc + code 50/50: rendered doc left; code paths in prose become chips; click → right panel shows the AST-sliced symbol/file via htmx swap |
| Code slicing | tree-sitter (tree-sitter-typescript): exact spans for exported functions/classes/consts; symbol outline per file; doc-mentioned symbols jump to definition |
| Code renderer | Pygments server-side highlighting + tree-sitter fold ranges emitted as nested collapsible regions (~30 lines of vanilla fold JS) |
| Home | Curated atlas: category cards with authored overviews, per-doc summary paragraphs, journeys |
| Journeys (4) | First hour with the repo · How the engine works · Clawern deep-dive (incl. v1→v3.4 eval-arc timeline) · Integrating & operating |
| Diagrams | Explorer-style clickable flow diagrams (server-rendered SVG) for flow-heavy docs: 4-stage pipeline, managed-agents migration stages, Clawern lighthouse |
| Drift | Paths + symbols verified automatically (broken refs styled broken in doc view) + curated claims.yaml counters + a /drift dashboard |
| Corpus | All repo MDs + CHANGELOG + scattered READMEs/templates + 67 agent prompts (search-only depth) + site/ HTML (text-extracted for search, linked out) |
| Search | SQLite FTS5 (stdlib) over full text + headings + summaries + symbols; ranked, snippets, scoped filters (docs / code / claims) |
| Freshness | Index at startup + Reindex button (POST /api/reindex, atomic registry swap) |
| Design language | Explorer editorial light/dark (cream serif, CSS-variable theming); the doc tools feel like one suite |
| Comments | Light structure, any anchor (see §5). SQLite storage, single-user, clipboard export to agent |
| Reusability | Config-driven engine + METHOD.md (see §6). Engine never references lavernDev specifics |
| Name & home | The Atlas, docs/atlas/ |
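
The Search row can be made concrete with a small sketch. It is an illustration only, not the engine's fts.py: the table name docs_fts, its columns, and the function names are assumptions, and it relies on the local sqlite3 build having FTS5 compiled in. Rebuilding drops and recreates only the FTS table, so any other table in the same database file is left alone.

```python
import sqlite3


def rebuild_fts(conn, rows):
    """Drop and recreate only the FTS table (hypothetical name: docs_fts).

    rows: iterable of (doc_id, scope, title, body) tuples.
    """
    with conn:  # one transaction: a failed rebuild leaves the old index in place
        conn.execute("DROP TABLE IF EXISTS docs_fts")
        conn.execute(
            "CREATE VIRTUAL TABLE docs_fts "
            "USING fts5(doc_id UNINDEXED, scope UNINDEXED, title, body)"
        )
        conn.executemany("INSERT INTO docs_fts VALUES (?, ?, ?, ?)", rows)


def search(conn, query, scope=None, limit=10):
    """Return (doc_id, snippet) pairs, best match first, optionally scoped."""
    sql = (
        "SELECT doc_id, snippet(docs_fts, 3, '[', ']', '...', 8) "
        "FROM docs_fts WHERE docs_fts MATCH ?"
    )
    params = [query]
    if scope is not None:
        sql += " AND scope = ?"
        params.append(scope)
    # bm25() returns lower values for better matches, so ascending order ranks them.
    sql += " ORDER BY bm25(docs_fts) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Calling search(conn, "swap", scope="code") would restrict results to rows indexed with scope "code"; the scope values themselves (docs / code / claims) come from the table above.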

3. Architecture

docs/atlas/
├── app.py                  # FastAPI app: page routes + htmx partial routes + JSON API
├── atlas.config.yaml       # ← ALL repo-specific wiring (see §6)
├── engine/                 # repo-agnostic
│   ├── config.py           #   load/validate atlas.config.yaml
│   ├── indexer.py          #   markdown scan → headings, doc→doc links, doc→code refs
│   ├── symbols.py          #   tree-sitter symbol index (lazy, per referenced file)
│   ├── drift.py            #   path/symbol verification + claims.yaml counters
│   ├── fts.py              #   SQLite FTS5 build + query
│   ├── render.py           #   markdown-it-py rendering, Pygments + fold-region emission, SVG diagrams
│   ├── comments.py         #   comment CRUD + brief/bundle generation
│   └── registry.py         #   in-memory index registry with atomic swap on reindex
├── curated/
│   ├── atlas.yaml          #   category overviews + per-doc summaries / read-when lines
│   ├── journeys.yaml       #   4 journeys, ordered stops with authored narration
│   ├── diagrams.yaml       #   flow graphs (nodes anchored to real doc headings / code symbols)
│   └── claims.yaml         #   quantitative doc claims → deterministic counters
├── templates/              # Jinja2: base, home, doc (50/50), journey, drift, search, feedback
├── static/                 # styles.css (explorer theming), htmx.min.js (vendored), fold.js
├── data/
│   └── atlas.db            # FTS5 index (rebuildable) + comments (durable) — gitignored
├── verify.py               # integrity harness (explorer's verify.cjs equivalent)
├── METHOD.md               # porting recipe for reuse in other repos
├── README.md
└── requirements.txt        # fastapi, uvicorn, jinja2, markdown-it-py, pygments,
                            # tree-sitter, tree-sitter-typescript, pyyaml

Data flow

  1. Startup / reindex: indexer scans corpus globs from config → extracts headings, doc→doc links, doc→code references (path-like strings in prose) → symbols.py parses referenced TS/TSX files with tree-sitter → drift.py verifies every reference and runs claims counters → fts.py rebuilds the FTS5 table → registry swaps atomically.
  2. Doc view: GET /doc/{id} renders markdown server-side; every detected code reference becomes a chip. GET /partial/code?path=…&symbol=… (htmx) returns the highlighted, foldable slice for the right panel.
  3. Drift: broken references render with broken styling inline; /drift aggregates all failures (missing paths, missing symbols, failed claims).
  4. Comments: anchored to doc section / code symbol / diagram node / journey stop / whole doc; stored in atlas.db; clipboard export produces agent-ready briefs.
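
Steps 1 to 3 hinge on turning path-like strings in prose into verified references. A minimal sketch of that detection follows. The regular expression and the function name are assumptions for illustration; the design keeps the real patterns in atlas.config.yaml, and symbol matching is left out here.

```python
import re
from pathlib import Path

# Hypothetical pattern: TS/TSX paths under src/ or viz/.
# In the design, detection patterns live in config, not code.
PATH_PATTERN = re.compile(r"\b(?:src|viz)/[\w./-]+\.tsx?\b")


def detect_references(markdown_text, repo_root):
    """Return [(path, exists)] for each distinct path-like token, in first-mention order.

    exists=True would render as a clickable chip; exists=False as a broken reference.
    """
    root = Path(repo_root)
    seen = {}
    for match in PATH_PATTERN.finditer(markdown_text):
        token = match.group(0)
        if token not in seen:
            seen[token] = (root / token).is_file()
    return list(seen.items())
```

Checking each token against the real tree is what keeps false positives harmless: a string that merely looks like a path is reported as broken rather than silently linked.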

Key implementation notes

  • Reference detection: path-like tokens (src/**.ts, viz/**.tsx, etc.) matched against the real tree; symbol names matched when they appear near a path mention. Detection patterns live in config, not code.
  • Slicing: tree-sitter gives byte spans for top-level (and nested) declarations. The code partial renders the whole file with fold regions, auto-expanded and scrolled to the target symbol.
  • Atomic registry swap (lq-ai pattern): reindex builds a complete new index object, then swaps one attribute — in-flight requests keep the old index.
  • FTS5 vs durable data: the FTS index is disposable (rebuilt every reindex); the comments table is durable. Both live in atlas.db; reindex must never touch comments.
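
The atomic registry swap described above can be sketched in a few lines. The class and attribute names are hypothetical, and the sketch assumes CPython, where rebinding a single attribute is effectively atomic from the point of view of other threads.

```python
import threading


class Index:
    """Snapshot of everything one reindex produced (hypothetical shape)."""

    def __init__(self, docs):
        self.docs = dict(docs)


class Registry:
    """Holds the current Index; reindex builds a new one and swaps it in."""

    def __init__(self, index):
        self._index = index
        self._build_lock = threading.Lock()  # serialises rebuilds, never blocks reads

    @property
    def current(self):
        # Request handlers grab this reference once and use it for the whole request.
        return self._index

    def reindex(self, build):
        with self._build_lock:
            new_index = build()      # may be slow; readers still see the old index
            self._index = new_index  # the swap: one attribute assignment
        return new_index
```

A handler that read registry.current before the swap keeps a complete, consistent old index until it finishes, which is the property the note above asks for.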

4. Curation layer (authored content)

All authored at build time, checked into curated/:

  • Per-doc summary — one paragraph + a "read this when…" line, for every corpus doc (~37; agent prompts are exempt — they get explorer deep-links instead).
  • Category overviews — 2–3 sentences per atlas category (Architecture & Design, Getting Started, Integration, Audits & Quality, Evaluation, Operations, Governance, Developer Tools).
  • Journeys — 4 sequenced paths; every stop has authored narration explaining why this stop, what to notice, and where it connects.
  • Diagrams — 3–4 clickable flow diagrams; every node anchors to a real doc heading or code symbol (verify.py fails on dangling anchors) with a hand-written one-line summary.
  • Claims — each entry: source doc + quoted claim + deterministic counter (glob/symbol-count) + expected value, e.g. "CLAUDE.md says 27 route modules" → glob('src/api/routes/*.ts').
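
A claim entry and its counter might look like the following sketch. The field names and the glob-only counter are assumptions (the design also mentions symbol-count counters, which are omitted); the example value is the "27 route modules" claim quoted above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Claim:
    source_doc: str  # doc that makes the claim
    quote: str       # the quoted claim text
    glob: str        # deterministic counter: files matching this glob
    expected: int    # the number the doc states


def check_claim(claim, repo_root):
    """Return (actual_count, holds). A mismatch is drift, not an error."""
    actual = sum(1 for p in Path(repo_root).glob(claim.glob) if p.is_file())
    return actual, actual == claim.expected


route_claim = Claim(
    source_doc="CLAUDE.md",
    quote="27 route modules",
    glob="src/api/routes/*.ts",
    expected=27,
)
```

Returning the actual count alongside the boolean lets a drift dashboard show both the claimed and the observed value.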

5. Commenting & feedback loop

Single-user, local, no auth.

Shape

Comment {
  id, created_at, updated_at,
  anchor: { kind: doc-section | code-symbol | diagram-node | journey-stop | doc,
            doc_id?, heading?, path?, symbol?, diagram_id?, node_id?, journey_id?, stop_id? },
  type: improve-doc | question | fix-drift | idea,
  body: string,
  quote: string?,           # auto-captured selected text at comment time
  status: open | in-progress | resolved,
  resolution_note: string?
}
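
As a rough Python rendering of the shape above, the following sketch uses dataclasses with simple validation. The uuid-based id, ISO-8601 timestamps, and the default status of "open" are assumptions, not settled decisions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

ANCHOR_KINDS = {"doc-section", "code-symbol", "diagram-node", "journey-stop", "doc"}
COMMENT_TYPES = {"improve-doc", "question", "fix-drift", "idea"}
STATUSES = {"open", "in-progress", "resolved"}


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Anchor:
    kind: str
    doc_id: Optional[str] = None
    heading: Optional[str] = None
    path: Optional[str] = None
    symbol: Optional[str] = None
    diagram_id: Optional[str] = None
    node_id: Optional[str] = None
    journey_id: Optional[str] = None
    stop_id: Optional[str] = None

    def __post_init__(self):
        if self.kind not in ANCHOR_KINDS:
            raise ValueError(f"unknown anchor kind: {self.kind!r}")


@dataclass
class Comment:
    anchor: Anchor
    type: str
    body: str
    quote: Optional[str] = None            # selected text captured at comment time
    status: str = "open"                   # assumed default
    resolution_note: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # assumed id scheme
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)

    def __post_init__(self):
        if self.type not in COMMENT_TYPES:
            raise ValueError(f"unknown comment type: {self.type!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
```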

UI: hover affordance (💬) on headings, symbol outlines, diagram nodes, journey stops; a Feedback page lists all comments filterable by status / type / doc.

Agent handoff (clipboard):

  • Per-comment Copy → markdown brief: the comment + anchor + the sliced doc/code context, ready to paste into a Claude Code session.
  • Export all → one bundle of every open comment with contexts.
  • Status updated manually in the UI. The schema deliberately doesn't preclude a future MCP layer (list/resolve tools) — out of scope for v1.
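
The clipboard handoff could be produced by something like the sketch below. The brief layout is an assumption, as are the function names; comments are plain dicts with the fields from the shape in this section, and the sliced context is passed in as a string.

```python
def render_brief(comment, context):
    """Render one comment plus its sliced doc/code context as a markdown brief."""
    anchor = comment["anchor"]
    target = ", ".join(f"{k}={v}" for k, v in anchor.items() if k != "kind" and v)
    lines = [
        f"## [{comment['type']}] {anchor['kind']}: {target}",
        "",
        comment["body"],
    ]
    if comment.get("quote"):
        lines += ["", f"> {comment['quote']}"]
    # Context goes in an indented code block so it survives pasting unchanged.
    lines += ["", "Context:", ""]
    lines += ["    " + line for line in context.splitlines()]
    return "\n".join(lines)


def render_bundle(comments_with_context):
    """Join the briefs of every open comment into one export."""
    return "\n\n---\n\n".join(
        render_brief(comment, context)
        for comment, context in comments_with_context
        if comment["status"] == "open"
    )
```

Only open comments enter the bundle, matching the "Export all" rule above; in-progress and resolved ones are skipped.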

6. Reusability contract

Hard rule from day one: engine code never references this repo. Repo-specifics live in:

  • atlas.config.yaml — repo root, corpus globs (include/exclude), source roots, language map (extension → tree-sitter grammar + Pygments lexer), reference-detection patterns, sibling-tool links (explorer deep-link base URL), branding strings, curated dir path, claims file path, DB path, port.
  • curated/ — all authored YAML.

Porting recipe (METHOD.md): copy engine/ + templates/ + static/ + app.py + verify.py → write a new atlas.config.yaml → author new curated/ content → run verify.py → serve. Adding a language = pip install its tree-sitter grammar wheel + one config entry. Extraction to a pip package is deferred until a second repo actually uses it.

7. Verification (verify.py)

Fails loudly on:

  • curated summaries that don't cover the corpus (or reference missing docs)
  • journey stops pointing at missing docs, headings, or symbols
  • diagram nodes with dangling anchors
  • claims whose counters raise an error (distinct from claims whose counted value differs from the expected one; those are drift, reported but not fatal)
  • config globs matching zero files
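
Two of these checks, summary coverage and empty globs, can be sketched as follows. This is not the actual verify.py: the function signature and the message wording are assumptions, and summaries is assumed to be a mapping from repo-relative doc path to summary text.

```python
from pathlib import Path


def verify(repo_root, corpus_globs, summaries):
    """Return a list of failure messages; an empty list means these checks pass."""
    root = Path(repo_root)
    failures = []
    corpus = set()
    for pattern in corpus_globs:
        matched = {
            p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
        }
        if not matched:
            failures.append(f"config glob matches zero files: {pattern}")
        corpus |= matched
    summarised = set(summaries)
    for doc in sorted(corpus - summarised):
        failures.append(f"no curated summary for: {doc}")
    for doc in sorted(summarised - corpus):
        failures.append(f"summary references missing doc: {doc}")
    return failures
```

A harness built on this would print the messages and exit non-zero whenever the list is non-empty, which is one way to "fail loudly".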

8. Build phases

  1. Engine skeleton — config loader, indexer, registry, FastAPI app, home + sidebar tree (uncurated), doc rendering without code panel.
  2. Doc 50/50 view — reference detection → chips; tree-sitter slicing; Pygments + fold-region rendering; htmx code panel.
  3. Search + drift — FTS5 index + search UI; path/symbol verification; claims engine; /drift dashboard; inline broken-ref styling; reindex endpoint.
  4. Commenting — schema, CRUD, anchor affordances, Feedback page, clipboard briefs.
  5. Curation pass — author curated/: ~37 summaries, category overviews, 4 journeys, 3–4 diagrams, claims.yaml seeded from TECH_DEBT_AUDIT's known drift.
  6. Hardening — verify.py, METHOD.md, README, explorer cross-links.

9. Out of scope (v1)

  • Runtime LLM features (chat, Q&A, auto-summarization)
  • Multi-user comments, auth, collaboration
  • MCP server for the feedback loop
  • pip packaging / standalone repo
  • Editing docs from the Atlas (read + comment only)
  • Languages beyond TS/TSX (architecture supports them; not wired)