The Atlas Lavern's documentation, bound to its code

Integrating & operating

The system's edges: providers and tools, the API surface, the remote bridge, datasets, and the two live migration roadmaps.

Lavern → Azure migration research

Scope. Lavern dashboard, API, knowledge base, sessions, matters, auth, billing, and audit data move to Azure. Clawern stays on the client — it remains a local Node.js daemon on the user's machine, optionally pointing at the cloud KB for shared firm precedents. This document is the research artifact behind docs/explore/azure-migration.html.

Repository: github.com/Altien/lavernDev · Main branch: main


1. Constraints fixed up-front

These are inputs to the design, not findings.

Decision | Value | Rationale
Vectors | Postgres pgvector owns the vector index (in-DB) | Keeps KB retrieval in the same store as the relational data: no separate search service to operate or pay for. Hybrid (tsvector full-text ranking + vector) in one database.
Compute | Azure Container Apps for API + workers | Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings.
Identity | Entra ID (workforce / B2B) | Org-tenant aware. Works with conditional access, no separate user store, MFA from the tenant.
Database | Azure Database for PostgreSQL Flexible Server (with pgvector) | Schema-compatible target for the current SQLite tables, mature, and hosts the KB vector index in-DB via the allow-listed pgvector extension.
Async messaging | Azure Service Bus for guaranteed-delivery work, Event Grid for blob/event fan-out | Service Bus is the right primitive for cross-pod orchestration signals; Event Grid is right for "blob landed" triggers.
Region | Single primary region for v1; design for multi-region but defer | Cost and latency picked over DR until usage warrants it.
LLM provider | Anthropic direct (unchanged); Mistral path preserved | Chat/agent inference stays on Anthropic. Azure OpenAI is used only for KB embeddings (text-embedding-3), never for agent reasoning.

2. Target architecture (one picture, in words)

Public traffic enters via Azure Front Door (TLS termination, WAF, global anycast). Front Door fronts a Container Apps environment running:

  • lavern-api container app: the Fastify HTTP + WebSocket server, the same one in src/api/server.ts.
  • lavern-worker container app: a long-running orchestration worker that pulls Service Bus messages and runs workflows (today this happens in the API process; we split it out).
  • lavern-jobs (KEDA-scaled Container Apps job): short-lived bursts (KB ingest, derivative rendering, large workflow runs).

Behind them, all on the same VNET with Private Endpoints:

  • PostgreSQL Flexible Server for relational data (src/db/database.ts).
  • Postgres pgvector holds the KB vector index alongside the relational tables: the SQLite FTS5 + n-gram re-rank in src/knowledge-base/retriever.ts becomes a Postgres hybrid query (tsvector full-text rank + pgvector cosine distance).
  • Azure Blob Storage for user-generated content (uploads, deliverables, audit bundles, derivatives, log archive).
  • Azure Service Bus for cross-instance events and work scheduling.
  • Azure Key Vault for any non-managed secrets (Anthropic API keys, Azure OpenAI embeddings key, Stripe webhook secrets, SMTP credentials).
  • Application Insights + Log Analytics for logs, traces, metrics.
  • Azure Cache for Redis (optional, v2) — only if we move session state off in-process.

Egress to the public internet (Anthropic, Mistral, Stripe, Google OAuth where used) goes through Container Apps' managed egress.


3. Compute — Azure Container Apps

Why Container Apps over the alternatives

Option | Why it wins | Why it loses
Container Apps | Container-native, long-lived WebSocket OK, KEDA scale incl. scale-to-zero on apps that don't hold WebSockets, Dapr if we ever want it, managed Envoy ingress. Cheapest target for a Fastify image. | No fine-grained sidecar control, less mature than AKS.
App Service for Containers | Mature, simple. | Scale-to-zero awkward; per-instance sticky needed for WebSocket; KEDA not native.
AKS | Maximum control. | Operationally heavy; we don't need cluster-level features yet.

How the current process splits

The current single Fastify process does several things:

  1. Serve HTTP + WebSocket (src/api/server.ts, src/api/ws-handler.ts)
  2. Hold live SessionState in a Map (src/session/session-manager.ts)
  3. Run the orchestration loop (src/orchestrator.ts, src/dispatch.ts)
  4. Dispatch agents (Claude Agent SDK calls)
  5. Emit events on the in-process EventBus (src/events/event-bus.ts)
  6. Archive completed sessions to SQLite

For Azure, we split it into three workloads (two container apps and one Container Apps job):

Workload | Replicas | Scales on | Holds
lavern-api | min 2, max N | HTTP / WS concurrency | Session router + WebSocket connections + REST routes
lavern-worker | min 1, max M | Service Bus queue depth | Active session orchestration loops
lavern-jobs | 0..K | KEDA (queue length) | One-shot tasks (KB ingest, large workflow batches, derivatives)
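The "Scales on" column can be made concrete for the two queue-driven workloads. With a queue-length trigger, KEDA feeds the queue depth to the Horizontal Pod Autoscaler, which for an average-value target sizes the workload at roughly ceil(queueLength / targetPerReplica), clamped to the configured bounds. A minimal sketch of that arithmetic only, ignoring the HPA's tolerance band and stabilisation windows; `ScaleRule`, `desiredReplicas`, and all bounds are hypothetical, and `targetPerReplica` plays roughly the role of the `messageCount` target in KEDA's Azure Service Bus scaler. lavern-jobs runs as event-driven job executions rather than HPA-managed replicas, so for it the result is only an approximation of how many executions start.

```typescript
// Hypothetical scale bounds for a queue-driven workload.
interface ScaleRule {
  minReplicas: number;      // floor: 1 for lavern-worker, 0 for lavern-jobs
  maxReplicas: number;      // ceiling: M or K in the table
  targetPerReplica: number; // queue messages one replica should handle
}

// HPA-style sizing for an average-value target: ceil(total / target), clamped.
function desiredReplicas(queueLength: number, rule: ScaleRule): number {
  const raw = Math.ceil(queueLength / rule.targetPerReplica);
  return Math.min(rule.maxReplicas, Math.max(rule.minReplicas, raw));
}

const worker: ScaleRule = { minReplicas: 1, maxReplicas: 10, targetPerReplica: 5 };
const jobs: ScaleRule = { minReplicas: 0, maxReplicas: 20, targetPerReplica: 1 };

console.log(desiredReplicas(0, worker));  // 1: the worker never drops below its floor
console.log(desiredReplicas(23, worker)); // 5: ceil(23 / 5)
console.log(desiredReplicas(0, jobs));    // 0: idle jobs scale to zero
console.log(desiredReplicas(500, jobs));  // 20: capped at maxReplicas
```

The floor of 1 on lavern-worker reflects that an active orchestration loop must always have somewhere to run; only the one-shot jobs can afford to sit at zero.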

This is the moment to stop tying live session state to whichever process happened to serve the request. See §8 (sessions).

WebSocket considerations

@fastify/websocket works inside Container Apps without modification. Two choices:

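  1. Sticky routing + in-pod session state. An affinity cookie, ARR-style, set by the Container Apps ingress (its sticky-sessions setting, available only in single-revision mode) pins each client to one replica; Front Door's own affinity only pins a client to an origin, not to a replica behind it. Simple, but limits horizontal scaling and disallows transparent rollouts mid-session.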
  2. Connection broker (Azure Web PubSub). Browsers connect to Web PubSub directly; the API publishes events to a per-session topic. Pods are stateless re: clients; rollouts are clean. The current ws-handler.ts becomes an event publisher.

Recommendation: Start with sticky routing (option 1) on day 1 — keeps the migration small. Migrate to Azure Web PubSub when we hit horizontal scale or zero-downtime requirements that sticky can't satisfy.


4. Relational data — Postgres Flexible Server

What moves

Every table currently defined in src/db/database.ts:

SQLite table | Notes for Postgres port
users | UUID PKs (already TEXT in SQLite); add citext for email.
auth_tokens | Becomes mostly redundant under Entra ID. Keep for service-to-service tokens only.
session_archive | Direct port. summary_json becomes jsonb. Add GIN index for query.
matters | Direct port. data_json becomes jsonb.
api_clients | Direct port.
kb_collections, kb_documents | Direct port. kb_documents stays the document-of-record.
kb_chunks | Keep. Add a content_tsv tsvector column (lexical/FTS) and an embedding vector(1536) column (pgvector). Chunks + their vectors live in Postgres. See §6.
shared_agents, shared_teams | Direct port. profile_json → jsonb.
user_usage, billing_events, billable_hours, daily_spend | Direct port. Money columns become numeric(12,4).
audit_log | Direct port. Consider partitioning by month at scale.
waitlist | Direct port. Probably retire when out of waitlist.
user_tokens | OAuth refresh tokens: keep, or replace entirely with Entra-issued tokens.

SQLite-isms that need attention

  • TEXT NOT NULL everywhere — convert to text NOT NULL. Postgres is strict about type; SQLite isn't.
  • JSON columns stored as TEXT today (data_json, summary_json, profile_json, metadata) — promote to jsonb and add GIN indexes where we actually query into them.
  • No date type in SQLite — current created_at TEXT columns become timestamptz. Migration must parse ISO strings.
  • REFERENCES users(id): keep. Nullable FK columns (anonymous sessions) work in Postgres as they do in SQLite; what needs a decision is delete behaviour. Add ON DELETE SET NULL where the relationship is genuinely optional, so deleting a user clears the reference instead of being rejected by the constraint.
  • FTS5 virtual table: gone. KB search stays in Postgres, as a tsvector column plus a pgvector column on kb_chunks (see §6).
  • Concurrency: better-sqlite3 is synchronous; the Postgres driver (pg / postgres-js) is async. The DB layer needs an await-everywhere pass. This is the biggest mechanical change in the migration.
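What the await-everywhere pass looks like at a call site: every persistence method returns a Promise, even where the SQLite adapter could answer synchronously, so a Postgres driver can later sit behind the same interface. A minimal sketch; `Matter`, `MatterStore`, `SyncBackedMatterStore`, and `renameMatter` are hypothetical names, and the in-memory Map stands in for synchronous better-sqlite3 calls rather than reproducing the real schema.

```typescript
// Hypothetical row shape for the matters table after the port.
interface Matter {
  id: string;
  userId: string;
  data: Record<string, unknown>; // data_json, parsed (jsonb in Postgres)
}

// Every method returns a Promise so a Postgres adapter can sit behind it later.
interface MatterStore {
  get(id: string): Promise<Matter | undefined>;
  put(matter: Matter): Promise<void>;
  listForUser(userId: string): Promise<Matter[]>;
}

// SQLite-style adapter: the underlying calls are synchronous (as in
// better-sqlite3), so the async methods just return already-resolved Promises.
class SyncBackedMatterStore implements MatterStore {
  private readonly rows = new Map<string, Matter>(); // stand-in for the SQLite table

  async get(id: string): Promise<Matter | undefined> {
    return this.rows.get(id);
  }

  async put(matter: Matter): Promise<void> {
    this.rows.set(matter.id, matter);
  }

  async listForUser(userId: string): Promise<Matter[]> {
    return Array.from(this.rows.values()).filter((m) => m.userId === userId);
  }
}

// Call sites change from `store.get(id)` to `await store.get(id)`.
async function renameMatter(store: MatterStore, id: string, title: string): Promise<boolean> {
  const matter = await store.get(id);
  if (!matter) return false;
  await store.put({ ...matter, data: { ...matter.data, title } });
  return true;
}
```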

Migration mechanics

  • Use pgloader for the bulk move (handles SQLite quirks well), then run a TypeScript script for the JSON-column transforms (text → jsonb).
  • Run dual-write for a window if we need zero-downtime cutover; otherwise a short maintenance window is honest and far simpler.
  • Connection pooling via PgBouncer (Flexible Server has a built-in pooler endpoint).
  • Use Managed Identity from Container Apps to authenticate to Postgres — no password in the connection string.
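The TypeScript pass after pgloader is mostly parsing. A minimal sketch for one table, assuming a hypothetical session_archive row shape in which created_at holds either an ISO-8601 string or SQLite's datetime('now') format (UTC, no zone marker) and summary_json holds JSON text; binding the result into an INSERT is left out. `transformArchiveRow` and `parseSqliteTimestamp` are hypothetical names. Failing on unparseable values rather than defaulting them is deliberate: a silent default would corrupt audit chronology.

```typescript
// A session_archive row as it comes out of SQLite: everything is TEXT.
interface SqliteArchiveRow {
  id: string;
  summary_json: string | null;
  created_at: string;
}

// The same row shaped for Postgres: jsonb and timestamptz.
interface PgArchiveRow {
  id: string;
  summary: unknown; // serialised to jsonb by the driver
  createdAt: Date;  // bound as timestamptz
}

// SQLite's datetime('now') yields "YYYY-MM-DD HH:MM:SS" in UTC with no zone
// marker; normalise it so JavaScript does not parse it as local time.
function parseSqliteTimestamp(value: string): Date {
  const bare = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?$/;
  return new Date(bare.test(value) ? value.replace(" ", "T") + "Z" : value);
}

function transformArchiveRow(row: SqliteArchiveRow): PgArchiveRow {
  const createdAt = parseSqliteTimestamp(row.created_at);
  if (Number.isNaN(createdAt.getTime())) {
    throw new Error(`row ${row.id}: unparseable created_at "${row.created_at}"`);
  }
  let summary: unknown = null;
  if (row.summary_json !== null && row.summary_json !== "") {
    try {
      summary = JSON.parse(row.summary_json);
    } catch {
      throw new Error(`row ${row.id}: summary_json is not valid JSON`);
    }
  }
  return { id: row.id, summary, createdAt };
}
```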

Why not Hyperscale (Citus) or Azure SQL

  • Hyperscale (Citus) is overkill until we shard. Flexible Server scales vertically well into the hundreds of GB and tens of thousands of TPS, which is well past where we will need a redesign for unrelated reasons.
  • Azure SQL would mean rewriting the schema for T-SQL idioms (no jsonb, different default semantics). Postgres is closer to SQLite in spirit.

5. Blob Storage — what becomes blob-native vs stays code-resident

This is the question that needs the most defending. The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that.

Recommendation: split by content authority

Stays code-resident (versioned in git, shipped with the container image):

  • src/agents/ — the 67 agent prompts. They are the product; they belong with the code that calls them.
  • src/workflows/templates/ — the 9 workflow definitions. Same argument.
  • SOUL.md — the default firm soul. Per-user override already comes from the database; the default is a code asset.
  • src/mcp/tools/ — MCP tool implementations.

Why not Blob?

  1. Change control. Today, modifying an agent prompt requires a pull request, reviewed and merged. Moving them to Blob means anyone with Blob write can change agent behaviour silently, in production, without leaving a trace in source control.
  2. Atomic deploys. Code + prompts ship together. If a prompt change breaks a workflow, you roll back one artifact. With Blob-resident prompts, the prompt version and the code version drift.
  3. Local development. Devs can run the engine on their machine without provisioning Blob, importing seed data, or syncing with prod.
  4. Test fixtures. The existing test suite (1,677 tests, tests/) reads prompts directly from disk. Moving them to Blob means mocking Blob in tests — pure overhead.

Goes to Blob (user-generated, runtime-mutable):

Category | Container | Lifecycle
Uploaded source documents | lavern-uploads/<user>/<sessionId>/ | Soft-delete after session archive; hard-delete after retention window.
Final deliverables | lavern-deliveries/<user>/<sessionId>/ | Retained per matter retention policy.
Audit bundles | lavern-audit/<user>/<sessionId>/ | Retain per the user's data class (regulated → cold tier, 7y).
KB document originals | lavern-kb-originals/<user>/<collectionId>/ | Until KB doc is deleted. The chunks + vectors live in Postgres; the original file lives in Blob.
Derivatives (HTML/DOCX/PDF renders) | lavern-derivatives/<user>/<sessionId>/ | Soft-delete after 30 days; users can regenerate.
Log archives | lavern-logs/ | Cool tier after 30d; archive tier after 180d.
Custom agent avatars / share OG images | lavern-public/ | Public read; the only public container.
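The per-user layout above can be encoded once so every writer produces the same keys. A minimal sketch; `BlobCategory`, `blobLocation`, and `safeSegment` are hypothetical helpers, the container names come from the table, and `scopeId` generalises the table's <sessionId> / <collectionId> segment. lavern-logs/ and lavern-public/ are left out because they are not per-user. Rejecting path separators and dot segments keeps a crafted name from escaping the user's prefix, both in URLs and in a local-FS adapter.

```typescript
// Per-user containers from the table; lavern-logs and lavern-public are not per-user.
type BlobCategory = "uploads" | "deliveries" | "audit" | "kb-originals" | "derivatives";

interface BlobLocation {
  container: string; // e.g. "lavern-uploads"
  blobName: string;  // "<user>/<scopeId>/<file>"
}

// Reject empty or dot segments and path separators so no segment can
// change the prefix it sits under.
function safeSegment(value: string, label: string): string {
  if (value === "" || value === "." || value === ".." || /[\/\\]/.test(value)) {
    throw new Error(`invalid ${label}: "${value}"`);
  }
  return value;
}

// scopeId is the sessionId, or the collectionId for kb-originals.
function blobLocation(category: BlobCategory, userId: string, scopeId: string, fileName: string): BlobLocation {
  return {
    container: `lavern-${category}`,
    blobName: [
      safeSegment(userId, "userId"),
      safeSegment(scopeId, "scopeId"),
      safeSegment(fileName, "fileName"),
    ].join("/"),
  };
}
```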

Files that currently write to disk and need to move

From a grep over src/ (writeFileSync / mkdirSync):

File | What it writes today | Azure target
src/utils/logger.ts | Rotating log files with gzip | Application Insights (live), Blob lavern-logs/ (archive).
src/utils/audit-persistence.ts | Audit log JSONL | Blob lavern-audit/; query via Log Analytics if we want it searchable.
src/workflows/executor.ts | Intermediate artefacts | Blob lavern-deliveries/<user>/<sessionId>/intermediate/.
src/mcp/tools/baselines.ts | Baseline JSON | Blob lavern-baselines/, keyed by user.
src/mcp/tools/report-card.ts | Final report | Blob lavern-deliveries/.
src/mcp/tools/legal-md-compiler.ts | Compiled markdown | Blob lavern-deliveries/.
src/providers/mistral-executor.ts | Provider artefacts | Blob lavern-deliveries/.
src/providers/local-executor.ts | Local-pipeline artefacts | Blob lavern-deliveries/.
src/api/middleware/validation.ts | Captured invalid-request bodies | Application Insights (debug) only; no Blob.

Access pattern

Container Apps uses Managed Identity with a Storage Blob Data Contributor role on the containers it needs. No connection strings, no shared keys.

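Front-end direct upload via user-delegation SAS issued by the API after authorisation: the file never traverses the API process for the upload, only for orchestration. Requesting the user delegation key is a storage-account-level action, so the API's identity also needs Storage Blob Delegator (or another role that grants it) at account scope; container-scoped role assignments aren't enough for that step.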
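The upload hand-off can be sketched as: authorise against the session, derive the blob name on the server, then ask for a short-lived, create/write-only URL. A minimal sketch; `SasSigner` is a hypothetical seam standing in for the user-delegation SAS call in @azure/storage-blob (not reproduced here), `issueUploadUrl` is a hypothetical name, and the 10-minute expiry and the file-name rule are assumptions.

```typescript
interface SasRequest {
  container: string;
  blobName: string;
  permissions: "cw"; // create + write only: no read, list, or delete
  expiresOn: Date;
}

// Hypothetical seam: in production this would wrap a user-delegation SAS
// signed under the API's Managed Identity.
type SasSigner = (req: SasRequest) => string;

interface UploadGrant {
  url: string;
  blobName: string;
  expiresOn: Date;
}

const UPLOAD_TTL_MS = 10 * 60 * 1000; // assumption: 10-minute upload window

function issueUploadUrl(
  requesterId: string,
  sessionOwnerId: string,
  sessionId: string, // server-issued, so not re-validated here
  fileName: string,
  sign: SasSigner,
  now: Date = new Date(),
): UploadGrant {
  // Authorise first: only the session owner may upload into it.
  if (requesterId !== sessionOwnerId) throw new Error("forbidden");
  // Conservative name rule (assumption): word chars, dot, hyphen, space; no "..".
  if (!/^[\w.\- ]{1,200}$/.test(fileName) || fileName.includes("..")) {
    throw new Error("invalid file name");
  }
  // The blob name is derived server-side; the client never picks the prefix.
  const blobName = `${sessionOwnerId}/${sessionId}/${fileName}`;
  const expiresOn = new Date(now.getTime() + UPLOAD_TTL_MS);
  const url = sign({ container: "lavern-uploads", blobName, permissions: "cw", expiresOn });
  return { url, blobName, expiresOn };
}
```

Leaving read permission off the upload URL means a leaked link can't be used to download what was uploaded through it.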


6. Vector + search — Postgres pgvector

Decision (supersedes the original v1 draft of this section). Vectors live in Postgres via pgvector, not Azure AI Search. This keeps KB retrieval in one store with the relational data, removes the AI Search service (and its ~$75/mo floor), keeps feature parity with today's lexical KB, and adds semantic recall on top. pgvector is an allow-listed extension on Flexible Server: add VECTOR to the azure.extensions server parameter, then run CREATE EXTENSION vector.

Where the KB lives

The pipeline keeps its shape: src/knowledge-base/indexer.ts parses and section-chunks; src/knowledge-base/retriever.ts runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector column; a new pgvector column adds the semantic half. No separate search service.

kb_chunks schema (Postgres)

  • id: text (PK). <documentId>_<chunkIndex>.
  • document_id: text (FK → kb_documents).
  • collection_id: text (indexed). Filter.
  • user_id: text (indexed). Security-trim every query.
  • heading: text. Section heading.
  • content: text. Full chunk text.
  • content_tsv: tsvector (GIN index). Lexical half, generated from heading || ' ' || content. The generation expression must name a text search configuration explicitly, because single-argument to_tsvector is not immutable.
  • embedding: vector(1536) (HNSW index with vector_cosine_ops, so the <=> cosine operator can use it). Semantic half: Azure OpenAI text-embedding-3-small.
  • doc_type, jurisdiction: text (indexed). Filters.
  • level, word_count: int.

Embedding model — Azure OpenAI text-embedding-3 (decided)

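This is the one place Azure OpenAI enters the architecture; agent reasoning stays on Anthropic (§13). text-embedding-3-small (1,536 dimensions) is the default cost/quality pick; text-embedding-3-large (up to 3,072 dimensions, reducible via the dimensions parameter) if retrieval quality demands it. Moving to -large means re-embedding the corpus and changing the vector(1536) column, and pgvector's HNSW index covers vector columns only up to 2,000 dimensions, so either request at most 2,000 dimensions or store halfvec. The AOAI endpoint sits behind a Private Endpoint; authenticate via Managed Identity (preferred) or a Key Vault–held key.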

Indexing pipeline

Keep our section-aware chunker: the structure detector in src/documents/structure-detector.ts does real work for legal documents, and a generic chunker would degrade citations. On blob upload, Event Grid → Service Bus → a KEDA-scaled lavern-jobs execution, which runs:

  1. Parse + section-chunk (existing code).
  2. Call Azure OpenAI embeddings for each chunk (batched).
  3. INSERT chunk rows into Postgres; content_tsv is DB-generated, embedding carries the vector.

Hybrid query

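One Postgres query fuses lexical + vector. Lexical rank comes from ts_rank_cd over content_tsv; semantic rank from embedding <=> $3 (pgvector's cosine distance operator); the two ranked lists are combined with Reciprocal Rank Fusion (RRF):

-- $1 = query text, $2 = user_id, $3 = query embedding, $4 = top-k
-- plainto_tsquery($1) uses default_text_search_config, which must match
-- the configuration named in content_tsv's generated expression.
WITH lex AS (
  SELECT id,
         row_number() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC) AS r
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY r
  LIMIT 40
),
vec AS (
  SELECT id,
         row_number() OVER (ORDER BY embedding <=> $3::vector) AS r
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector  -- served by the HNSW index
  LIMIT 40
)
-- RRF: each list contributes 1 / (60 + rank); an id absent from a list adds 0
SELECT id,
       COALESCE(1.0 / (60 + lex.r), 0) + COALESCE(1.0 / (60 + vec.r), 0) AS score
FROM lex
FULL OUTER JOIN vec USING (id)
ORDER BY score DESC
LIMIT $4;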

user_id is filtered in every branch — the user-scoping guarantee in retriever.ts is preserved. The existing legal-synonym expansion stays as pre-processing on the lexical side; the n-gram re-rank is superseded by the vector half.
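Reciprocal Rank Fusion scores each id as the sum of 1 / (k + rank) over the lists it appears in, so a chunk ranked well by both halves beats one ranked first by only one of them. The same arithmetic in TypeScript is handy for unit-testing fusion, or for fusing in retriever.ts instead of SQL. A sketch; `rrfFuse` is a hypothetical helper, and k = 60 is the constant conventionally used with RRF, not a value tuned for Lavern.

```typescript
// Reciprocal Rank Fusion over ranked id lists (best first, rank is 1-based):
// score(id) = sum over lists containing id of 1 / (k + rank).
function rrfFuse(rankedLists: string[][], topK: number, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // ties broken by id
    .slice(0, topK)
    .map(([id]) => id);
}

const lexical = ["c7", "c2", "c9"];  // ts_rank_cd order
const semantic = ["c2", "c4", "c7"]; // cosine-distance order
console.log(rrfFuse([lexical, semantic], 3)); // [ 'c2', 'c7', 'c4' ]
```

c2 wins because it sits near the top of both lists (1/62 + 1/61), edging out c7 (1/61 + 1/63), even though c7 is the lexical leader.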

Scale note

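pgvector with an HNSW index serves into the millions of vectors before tuning pressure, well beyond a single-region Lavern KB. One caveat applies earlier: the user_id filter is applied after the HNSW scan, so a user who owns a small share of a large index can get fewer than 40 vector hits; raising hnsw.ef_search, pgvector 0.8's iterative index scans (hnsw.iterative_scan), or partitioning by user are the documented mitigations. If the corpus ever outgrows pgvector, Azure AI Search becomes a re-evaluation, not a v1 requirement.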


7. Async messaging — Service Bus + Event Grid

Where Service Bus replaces the in-process EventBus

The current EventBus in src/events/event-bus.ts is a Node.js EventEmitter. Two consumer classes today:

  1. WebSocket fan-out — clients subscribed to /api/sessions/:id/events receive every event for that session.
  2. In-process logger / audit / cost tracker.

The EventBus stays exactly as it is inside a single pod. We add an outbound bridge for events that need to cross pods.

Event type | Crosses pods? | Goes to Service Bus?
session_start, session_end | Yes (worker → api) | Yes: topic lavern-sessions
agent_start, agent_stop | Maybe (if WS pod ≠ worker pod) | Yes: topic lavern-agent-activity
finding_posted, challenge_posted, response_posted | Yes | Yes
debate_resolved | Yes | Yes
gate_requested, gate_decided | Yes: gate decided in api, worker waiting | Yes: queue (point-to-point) lavern-gates
verification_run | No (worker-local) | No
tool_used | No | No (App Insights metric instead)

Topology: Service Bus topic lavern-events with subscriptions per consumer (WebSocket fan-out, audit, cost). Service Bus queue lavern-gates for gate-decision request/reply, because the worker is genuinely blocked on it.
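The routing table can live as data beside the outbound bridge, so adding a member to the event-type union forces a routing decision at compile time. A minimal sketch; `ROUTES`, `bridgeEvent`, and the `BusSender` seam over a Service Bus sender are hypothetical. The table names no destination for the finding/challenge/response/debate rows, so here they go to the lavern-events topic from the topology note; that mapping is an assumption. Carrying sessionId in the message properties lets api subscriptions filter per session (§8).

```typescript
type EventType =
  | "session_start" | "session_end"
  | "agent_start" | "agent_stop"
  | "finding_posted" | "challenge_posted" | "response_posted"
  | "debate_resolved"
  | "gate_requested" | "gate_decided"
  | "verification_run" | "tool_used";

type Route =
  | { kind: "topic"; name: string }
  | { kind: "queue"; name: string }
  | { kind: "local" }; // in-process EventEmitter only

// Mirrors the routing table. Rows with no named destination use the
// lavern-events topic from the topology note (assumption).
const ROUTES: Record<EventType, Route> = {
  session_start: { kind: "topic", name: "lavern-sessions" },
  session_end: { kind: "topic", name: "lavern-sessions" },
  agent_start: { kind: "topic", name: "lavern-agent-activity" },
  agent_stop: { kind: "topic", name: "lavern-agent-activity" },
  finding_posted: { kind: "topic", name: "lavern-events" },
  challenge_posted: { kind: "topic", name: "lavern-events" },
  response_posted: { kind: "topic", name: "lavern-events" },
  debate_resolved: { kind: "topic", name: "lavern-events" },
  gate_requested: { kind: "queue", name: "lavern-gates" },
  gate_decided: { kind: "queue", name: "lavern-gates" },
  verification_run: { kind: "local" },
  tool_used: { kind: "local" },
};

interface BusMessage {
  body: unknown;
  // Properties that subscription rules can filter on.
  applicationProperties: { eventType: EventType; sessionId: string };
}

// Hypothetical seam over a Service Bus sender for the named topic or queue.
type BusSender = (destination: string, message: BusMessage) => Promise<void>;

// Returns true if the event left the pod.
async function bridgeEvent(type: EventType, sessionId: string, payload: unknown, send: BusSender): Promise<boolean> {
  const route = ROUTES[type];
  if (route.kind === "local") return false;
  await send(route.name, { body: payload, applicationProperties: { eventType: type, sessionId } });
  return true;
}
```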

Where Event Grid fits

  • Blob upload triggers: user uploads a KB document → Event Grid → Service Bus → lavern-jobs KEDA-scales and runs the indexer.
  • Same for derivatives: user requests a DOCX render → enqueue → job runs → blob lands → Event Grid → API tells the client via WS.
  • Stripe webhooks route through Event Grid too if we want guaranteed retry and dead-lettering.

Why both, not just one

Service Bus is the right answer for guaranteed-delivery work scheduling. Event Grid is the right answer for fan-out at high event rates with cheap-per-event pricing. They compose — Event Grid feeds Service Bus for our heavy patterns.


8. Sessions — the hardest design call

Live sessions are an in-process Map today in src/session/session-manager.ts. TTL eviction, 100-session cap per pod. When the engagement completes, state is archived to SQLite.

When we split api / worker / jobs, sessions need a home that isn't "the api pod that happened to handle the first request."

Three options, in increasing strangeness

Option A — Sticky to api pod (no shared state). Affinity cookie binds a client to one api pod for the life of the engagement. The worker that runs orchestration is that same pod. Simplest; we keep SessionState exactly as it is.

  • Pro: zero code change.
  • Con: doesn't horizontally scale per session; api replicas can't recycle until their sessions expire; loses one core benefit of splitting the worker out.

Option B — Worker owns the session; api is stateless. Each engagement lives in a single lavern-worker replica. The api pod that has the WebSocket subscribes to a Service Bus topic filtered by sessionId. SessionState exists only in the worker.

  • Pro: clean separation; api is fully stateless and scales freely.
  • Con: gate-decision path is async — api receives the decision and posts to Service Bus, worker picks it up. We already model this with src/gates/gate-resolver.ts, so the lift is bounded.

Option C — Externalised SessionState (Redis). Move the SessionState object into Azure Cache for Redis. Any pod can pick up any session.

  • Pro: maximum elasticity.
  • Con: serialisation cost on every read/write; the SessionState is currently a rich object graph not designed for this.

Recommendation: Option B — worker owns the session, api is stateless. This is the cleanest target architecture and the gate-decision asynchrony is already most of the way there.

Drop the in-process Map's role of "the live store"; replace it with a per-pod working set scoped to that worker's active sessions. Archive on completion goes to Postgres just as today.
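On the worker, the asynchronous gate path reduces to parking one promise per outstanding gate and settling it when the decision message arrives. A minimal sketch; `PendingGates`, `GateDecision`, and the timeout behaviour are hypothetical and not taken from src/gates/gate-resolver.ts.

```typescript
interface GateDecision {
  gateId: string; // correlation id shared by gate_requested and gate_decided
  approved: boolean;
  decidedBy: string;
}

interface Waiter {
  resolve: (decision: GateDecision) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingGates {
  private readonly waiting = new Map<string, Waiter>();

  // Called by the orchestration loop right after it publishes gate_requested.
  wait(gateId: string, timeoutMs: number): Promise<GateDecision> {
    return new Promise<GateDecision>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiting.delete(gateId);
        reject(new Error(`gate ${gateId} timed out`));
      }, timeoutMs);
      this.waiting.set(gateId, { resolve, timer });
    });
  }

  // Called by the queue consumer for each gate_decided message. Returns false
  // for gates this worker isn't waiting on (unknown, timed out, or redelivered).
  deliver(decision: GateDecision): boolean {
    const waiter = this.waiting.get(decision.gateId);
    if (!waiter) return false;
    clearTimeout(waiter.timer);
    this.waiting.delete(decision.gateId);
    waiter.resolve(decision);
    return true;
  }
}
```

Two points the sketch leaves to the real consumer. With several lavern-worker replicas reading one queue, a decision can reach a worker that doesn't own the gate; Service Bus message sessions keyed on the Lavern sessionId are one way to pin delivery. And because the waits live in memory, the consumer should complete a message only when `deliver` returns true.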


9. Identity — Entra ID workforce / B2B

What changes in the codebase

Currently src/api/middleware/auth.ts handles cookie + Bearer auth and runs a no-op LOCAL-MODE. The auth-shaped routes (auth-routes.ts, google-auth.ts) only register when LAVERN_AUTH_ENABLED=true.

For Entra ID:

  1. Replace the cookie/Bearer middleware with MSAL.js on the frontend (auth-code flow with PKCE) and JWT validation on the backend against the tenant's published signing keys (for example with the jose library; @azure/identity acquires tokens rather than validating incoming ones, and passport-azure-ad is no longer maintained).
  2. The users table sticks around as the application-side mirror of Entra users, keyed by Entra oid (paired with tid if we admit more than one tenant) and populated on first sign-in. We still need it because Lavern's user_usage, billing_events, matters, kb_collections all FK to users.id.
  3. Authorisation (what a user can do) stays application-side: Entra tells us who, Lavern decides what. App-roles in Entra optional; we can also keep our own roles table.
  4. Service-to-service authentication uses Managed Identity end-to-end: Container Apps to Postgres, to Blob, to Azure OpenAI (embeddings), to Service Bus, to Key Vault. No Azure-service credentials in any environment variable; third-party keys arrive through Key Vault references (§10).
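After a JWT library has verified a token's signature against the tenant's published keys, its claims still need checking before the users mirror is touched. A minimal sketch of that second half, assuming the API's app registration issues v2.0 access tokens (issuer https://login.microsoftonline.com/<tid>/v2.0); `checkEntraClaims`, `ClaimPolicy`, and the tenant allow-list are hypothetical. It returns the (tid, oid) pair used to key the users row.

```typescript
// Claims of interest in an Entra ID v2.0 access token (signature already verified).
interface EntraClaims {
  iss: string;
  aud: string;
  tid: string; // tenant id
  oid: string; // user object id
  exp: number; // seconds since the epoch
  nbf?: number;
}

interface ClaimPolicy {
  audience: string;            // the API's app registration (assumption)
  allowedTenants: Set<string>; // firm tenants admitted to Lavern
  clockSkewSeconds: number;
}

function checkEntraClaims(c: EntraClaims, policy: ClaimPolicy, nowSeconds: number): { tid: string; oid: string } {
  if (c.aud !== policy.audience) throw new Error("wrong audience");
  if (!policy.allowedTenants.has(c.tid)) throw new Error("tenant not admitted");
  // v2.0 issuers embed the tenant id; it must agree with tid.
  if (c.iss !== `https://login.microsoftonline.com/${c.tid}/v2.0`) throw new Error("issuer/tenant mismatch");
  if (nowSeconds > c.exp + policy.clockSkewSeconds) throw new Error("token expired");
  if (c.nbf !== undefined && nowSeconds + policy.clockSkewSeconds < c.nbf) throw new Error("token not yet valid");
  if (!c.oid) throw new Error("missing oid");
  // Key for the users mirror: upsert on first sign-in.
  return { tid: c.tid, oid: c.oid };
}
```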

Edge cases

  • API clients (api_clients table) — these are non-human callers (webhooks, integrations). Keep the existing API-key model; rotate via Key Vault references. Don't try to put non-humans into Entra unless they belong there.
  • OAuth refresh tokens (user_tokens) — if we still integrate Google Drive / Gmail / Calendar for the user, those stay on the existing OAuth flow; Entra is identity, not third-party authorisation.
  • Stripe customer mapping: users.stripe_customer_id survives unchanged.

10. Secrets — Key Vault

Today's secret surface from a grep over process.env:

Secret | Source today | Azure target
Anthropic API key | ANTHROPIC_API_KEY env | Key Vault, referenced by a Container Apps secret.
Mistral API key | MISTRAL_API_KEY env | Key Vault.
Stripe secret key, webhook secret | env | Key Vault.
SMTP credentials | env | Key Vault.
Database connection | conn string in env | Managed Identity: host and database stay in config, but no password.
Azure OpenAI embeddings key | n/a (new) | Managed Identity to AOAI, or a Key Vault–held key if MI isn't available.
Telegram bot token (Clawern) | env on user's machine | Stays on user's machine; not migrated.
Storage account keys | n/a | Managed Identity.
LAVERN_MANAGED_AGENTS_BRIDGE_SECRET | env | Key Vault.

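Container Apps references Key Vault secrets by URI, so rotations propagate to running revisions without a redeploy, provided the reference is versionless (it doesn't pin a secret version) and we allow for the platform's polling delay.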


11. Observability — Application Insights + Log Analytics

Today: src/utils/logger.ts writes structured JSON to stdout and a rotating local file. src/utils/sentry.ts captures errors to Sentry if configured.

Azure target:

  • Application Insights SDK initialised at process start. The current createLogger becomes a thin adapter that emits structured trace events to App Insights.
  • OpenTelemetry-based instrumentation for Fastify, the HTTP client, and SDK calls, exported to App Insights through the Azure Monitor OpenTelemetry distro (@azure/monitor-opentelemetry).
  • Custom events for product analytics: session_started, workflow_completed, verification_failed — fed off the EventBus.
  • Log Analytics workspace behind App Insights for KQL queries and dashboards.
  • Sentry stays as the error capture channel if we want richer crash grouping than App Insights provides; otherwise drop it.

Cost note: App Insights ingestion is the most variable cost in this whole architecture. Set ingestion sampling at 50% for verbose levels, 100% for warnings/errors. Disable in dev.
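The sampling rule above, sketched as a per-item decision: warnings and errors always go; verbose items are kept by a stable hash of the operation id, so a sampled-in request keeps all of its traces together across pods. A sketch of the rule only; in practice the Application Insights / OpenTelemetry sampler would more likely be configured than hand-written, and `shouldSend` and the 32-bit FNV-1a hash are illustrative choices.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// 32-bit FNV-1a over UTF-16 code units: cheap and deterministic, so every
// pod makes the same keep/drop decision for a given operation id.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function shouldSend(level: Level, operationId: string, verbosePercent = 50): boolean {
  if (level === "warn" || level === "error") return true; // always keep warnings/errors
  // Bucket the operation into 0..99 and keep the lowest verbosePercent buckets.
  return fnv1a(operationId) % 100 < verbosePercent;
}
```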


12. Networking — Front Door + Private Endpoints

  • Azure Front Door Premium in front of Container Apps for TLS termination, WAF, global anycast, and managed certs on the custom domain.
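  • Private Endpoints for Postgres, Azure OpenAI, Blob, Key Vault. None of these are reachable from the public internet. Service Bus is not on this list because its Private Endpoints need the Premium tier; on the Standard tier costed in §15 it keeps a public endpoint, secured by Managed Identity.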
  • VNET integration on Container Apps — outbound goes through the VNET; inbound is via the Container Apps ingress, which Front Door fronts.
  • WebSocket through Front Door works without modification.
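  • Egress allow-list: api.anthropic.com, api.mistral.ai, api.stripe.com, accounts.google.com if we keep Google OAuth. Everything else blocked at the firewall. Container Apps' default managed egress doesn't filter by hostname, so enforcing this means routing the environment's outbound traffic through Azure Firewall (user-defined routes on a workload-profiles environment).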

13. LLM provider plane

Stays on Anthropic direct via @anthropic-ai/sdk. The dual-provider abstraction in src/providers/ keeps Mistral as the EU-sovereign option.

Azure OpenAI — embeddings only. AOAI is not added to the agent provider set; the model behind AOAI is OpenAI's, not Anthropic's, and swapping Claude for GPT would undercut the product. The single use of AOAI is generating KB chunk embeddings for pgvector (§6) — a non-reasoning, mechanical call. Agent inference stays on Anthropic; Mistral remains the EU-sovereign agent option. If a future customer requires AOAI for inference specifically, the provider abstraction is already in place to add it.


14. CI/CD & infrastructure-as-code

  • Bicep (or Terraform if there's a house preference) for everything: Container Apps environment, registry, Postgres (with pgvector enabled via azure.extensions), Azure OpenAI (embeddings deployment), Blob, Service Bus, Key Vault, Front Door, App Insights, Log Analytics, networking.
  • GitHub Actions building the container, pushing to Azure Container Registry, deploying via az containerapp update --revision-suffix <sha> for blue-green.
  • Per-environment: dev (single small SKU of everything), staging (mirror prod), prod.
  • Database migrations: a separate lavern-migrate job that runs Postgres migrations via drizzle-kit or node-pg-migrate before the api revision swaps.
  • CI credentials: GitHub Actions authenticates to Azure through OIDC federation, so there are no service principal passwords in the repo.

15. Cost shape (rough OOM)

Per month, single region, modest load (say, ~50 active firms, ~5,000 documents indexed):

Service | SKU | Est. monthly
Container Apps (api + worker + jobs) | 2 always-on small + bursts | $200–400
Postgres Flexible Server (+ pgvector) | D2ds_v5 (2 vCPU / 8 GB), bumped for the vector index | $150
Azure OpenAI embeddings | text-embedding-3-small, usage-based | $5–20
Blob Storage | Hot tier, ~500 GB + ops | $30
Service Bus | Standard tier | $10
Front Door | Premium, base + per-request | $330 + traffic
Key Vault | Standard | <$5
App Insights | 5 GB/day cap | $80
Total | | ~$805–1,025 + Front Door traffic

Keeping vectors in pgvector removes the Azure AI Search line (was ~$250/mo S1) at the cost of a modest Postgres SKU bump for the HNSW index — a net saving of roughly $200–230/mo. Agent LLM spend lives separately (passes through to Anthropic / Mistral usage costs); embeddings are the only Azure-side model cost and are tiny.
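Summing the line items makes the band easy to re-check as SKUs change. A small sketch with the table's figures; `LineItem` and `total` are hypothetical names, Key Vault's "<$5" is read as $0–5, and Front Door traffic is excluded because the table gives no figure for it.

```typescript
interface LineItem {
  service: string;
  low: number;  // USD per month
  high: number; // USD per month; equal to low for fixed figures
}

const items: LineItem[] = [
  { service: "Container Apps", low: 200, high: 400 },
  { service: "Postgres Flexible Server", low: 150, high: 150 },
  { service: "Azure OpenAI embeddings", low: 5, high: 20 },
  { service: "Blob Storage", low: 30, high: 30 },
  { service: "Service Bus", low: 10, high: 10 },
  { service: "Front Door Premium (base only)", low: 330, high: 330 },
  { service: "Key Vault", low: 0, high: 5 },
  { service: "App Insights", low: 80, high: 80 },
];

function total(lines: LineItem[]): { low: number; high: number } {
  return lines.reduce(
    (acc, l) => ({ low: acc.low + l.low, high: acc.high + l.high }),
    { low: 0, high: 0 },
  );
}

console.log(total(items)); // { low: 805, high: 1025 }, before Front Door traffic
```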

Cost levers:

  • Drop Front Door to Standard if WAF is overkill (-$200).
  • Move audit + log archive to cool/archive tiers (-$10–20).
  • Right-size Postgres up only when we see actual pressure; the vector index is the main memory driver.

16. Migration plan — three phases

Phase 0 — preparation (1–2 weeks)

  • Add an explicit persistence interface to src/db/ so the engine doesn't talk to better-sqlite3 directly. SQLite and Postgres adapters behind the same shape.
  • Add a blob abstraction for file IO: an interface that today wraps the local FS, tomorrow wraps @azure/storage-blob.
  • Add a search abstraction wrapping the FTS5 layer; tomorrow wraps the Postgres tsvector + pgvector hybrid query.
  • These three abstractions land in the existing repo, behind feature flags. No Azure infra needed yet.
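The blob abstraction can be as small as three async methods. A minimal sketch; `BlobStore`, `MemoryBlobStore`, and `saveDeliverable` are hypothetical, and the in-memory adapter stands in for both today's local-FS adapter and the planned @azure/storage-blob adapter, neither of which is reproduced. Engine code that depends only on the interface needs no change when the adapter is swapped behind the feature flag.

```typescript
interface BlobStore {
  put(container: string, name: string, data: Uint8Array, contentType: string): Promise<void>;
  get(container: string, name: string): Promise<Uint8Array | undefined>;
  list(container: string, prefix: string): Promise<string[]>;
}

// Test/dev adapter; a local-FS or Azure adapter would implement the same shape.
class MemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, { data: Uint8Array; contentType: string }>();

  private key(container: string, name: string): string {
    return `${container}/${name}`;
  }

  async put(container: string, name: string, data: Uint8Array, contentType: string): Promise<void> {
    // Copy so later mutation of the caller's buffer doesn't change the stored blob.
    this.blobs.set(this.key(container, name), { data: data.slice(), contentType });
  }

  async get(container: string, name: string): Promise<Uint8Array | undefined> {
    return this.blobs.get(this.key(container, name))?.data;
  }

  async list(container: string, prefix: string): Promise<string[]> {
    const start = `${container}/${prefix}`;
    return Array.from(this.blobs.keys())
      .filter((k) => k.startsWith(start))
      .map((k) => k.slice(container.length + 1))
      .sort();
  }
}

// Engine code depends only on the interface, e.g. when writing a deliverable.
async function saveDeliverable(store: BlobStore, userId: string, sessionId: string, file: string, bytes: Uint8Array): Promise<void> {
  await store.put("lavern-deliveries", `${userId}/${sessionId}/${file}`, bytes, "text/markdown");
}
```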

Phase 1 — Azure infra up, dual-write (2–3 weeks)

  • Bicep stack deployed to a dev subscription.
  • Container Apps running the api image; Postgres (with pgvector) live but with sample data only.
  • Migration scripts to backfill from SQLite to Postgres in dev, including a one-time pass to generate embeddings for existing KB chunks via Azure OpenAI.
  • Dual-write enabled for new sessions in a canary tenant.
  • Validate WebSocket through Front Door, gate-decision through Service Bus, KB query through the Postgres hybrid search.

Phase 2 — cutover (1 week)

  • Production data migration window.
  • DNS swing to Front Door.
  • Sentry alerting, App Insights dashboards live.
  • SQLite kept read-only for 30 days as rollback insurance.

Phase 3 — decommission (1 week, 30+ days later)

  • Remove SQLite path from the code.
  • Retire local file write paths (audit, logger, baselines, deliveries) in favour of Blob.
  • Lock SQLite snapshot to an archive.

17. Open questions / decisions still to make

  1. Multi-tenancy model. Single Azure tenancy (one subscription, one resource group, userId everywhere) or per-firm subscription? Single-tenancy is much simpler and is the default unless a firm specifically asks for isolation.
  2. Data residency. Do we need to pin some firms to West Europe vs UK South vs East US? If yes, we need a per-tenant region attribute in the user record and routing logic in Front Door (Routes by header / cookie).
  3. Backup / DR. Postgres point-in-time restore on Flexible Server is built-in (7–35 days) and now covers the KB vectors too (one store). Do we want geo-redundant storage for Blob and a read replica for the DB? Probably yes for production.
  4. Audit-bundle retention. Some firms will want 7y retention with legal-hold immutability. Blob immutability policies cover this if we decide it's a hard product requirement.
  5. Web PubSub now or later. Sticky routing buys us months. Past a few thousand concurrent WebSocket sessions, the math flips.
  6. Cowork mode interop. The browser File System Access API path (viz/src/cowork/) keeps working unchanged — local files never touch the server. Still worth confirming the privacy promise survives the cloud-hosted positioning.
  7. Clawern → cloud KB. Should Clawern's precedent-board.ts optionally sync to the cloud KB for shared firm precedents, or stay strictly local? This is a product decision, not an architecture one.

Appendix A — file-by-file change summary

File | Change
src/db/database.ts | Becomes the SQLite adapter; add a Postgres adapter behind the same interface.
src/knowledge-base/indexer.ts | Adds an Azure OpenAI embedding call per chunk (batched); writes chunk rows with their embedding to the Postgres kb_chunks table (content_tsv is generated in the database).
src/knowledge-base/retriever.ts | FTS5 MATCH becomes a Postgres tsvector + pgvector hybrid (RRF) query; synonym expansion and user-scoping retained.
src/api/middleware/auth.ts | Replaced by Entra ID JWT validation.
src/api/server.ts | No change to handler shape; deploy target changes.
src/session/session-manager.ts | Moves to the lavern-worker app; api becomes stateless.
src/events/event-bus.ts | In-process EventEmitter kept; outbound bridge to Service Bus added.
src/gates/ | Async resolver becomes default; sync resolver retained for CLI.
src/utils/logger.ts | Becomes App Insights adapter.
src/utils/audit-persistence.ts | Writes to Blob lavern-audit/.
src/workflows/executor.ts | Intermediate artefacts to Blob; control flow unchanged.
src/api/routes/knowledge-base.ts | Upload returns a SAS URL; chunking + embedding + Postgres insert happen via Event Grid → job.
src/agents/ | No change: prompts stay code-resident.
src/workflows/templates/ | No change: templates stay code-resident.
SOUL.md | No change: default firm soul stays code-resident; per-user override stays in DB.
src/claw/ | No change: Clawern is out of scope; runs locally as today.
viz/src/cowork/ | No change: still uses File System Access API; local files never touch cloud.

Appendix B — services not chosen, and why

Considered | Why not
Azure SQL | Schema rewrite for T-SQL; Postgres is closer to SQLite ergonomically, has jsonb, and its ON CONFLICT … excluded.* upserts (already used in src/db/database.ts) port verbatim. Also rules itself out by lacking pgvector.
Azure AI Search | Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in pgvector: one store, feature parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors.
Cosmos DB for KB chunks | Chunks + vectors live in Postgres; Cosmos would add a second store.
AKS | Operationally heavy; Container Apps covers our needs.
Azure Functions | Cold start hurts WebSocket and orchestration loops. Container Apps with KEDA gives us the autoscale without the cold-start tax.
App Service for Containers | WebSocket sticky-only; less elegant scale.
Azure OpenAI (for agent inference) | Gives OpenAI's models, not Anthropic's; swapping Claude for GPT undercuts the product. AOAI is used only for KB embeddings (§6), never agent reasoning.
API Management | Overkill for our REST surface. Front Door + Container Apps ingress suffices. Revisit if we expose a public partner API.
Azure Communication Services / SignalR | Communication Services targets voice, video, chat, and SMS rather than app event fan-out. SignalR Service is built around the SignalR hub protocol (chiefly .NET clients), which we don't need; Web PubSub covers plain WebSocket clients like ours.
Azure Cache for Redis in v1 | Worker-owned sessions avoid the need; revisit if we move to fully stateless workers.