
Lavern → Azure

How to move Lavern's dashboard, engagement engine, knowledge base, sessions, matters, auth, and audit data onto Azure. Clawern stays on the client — see Clawern for why that's a product call, not a limitation. Every code reference below links to the file in Altien/lavernDev on GitHub.

Constraints fixed up-front

These are inputs, not findings — set during research planning so the design has somewhere firm to stand.

Decision	Value	Rationale
Vectors	Postgres `pgvector` owns the vector index (in-DB)	Keeps KB retrieval in the same store as the relational data — no separate search service to operate or pay for. Hybrid (`tsvector` full-text ranking + vector) in one database; Postgres's built-in `ts_rank` functions are not BM25.
Compute	Azure Container Apps	Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings.
Identity	Entra ID (workforce / B2B)	Org-tenant aware. Conditional access, MFA from the tenant, no separate user store to operate.
Database	PostgreSQL Flexible Server (with `pgvector`)	Schema-compatible target for the SQLite tables in `src/db/database.ts`; mature; `jsonb`; hosts the KB vector index in-DB via the allow-listed `pgvector` extension.
Async messaging	Service Bus + Event Grid	Service Bus for guaranteed-delivery work and gate-decision handoffs. Event Grid for blob-arrival fan-out.
Scope	Dashboard + KB; Clawern stays local	Clawern's whole identity is on-device — daemon, menu bar, Telegram, filesystem watch. Lifting it loses the privacy claim.

Target architecture in one paragraph

Public traffic arrives at Azure Front Door — TLS, WAF, global anycast — which fronts a Container Apps environment running three container apps: lavern-api (Fastify + WebSocket, today's src/api/server.ts), lavern-worker (orchestration loop), and lavern-jobs (KEDA-scaled short tasks: KB ingest, derivative rendering). All three sit on a private VNET with Private Endpoints to PostgreSQL Flexible Server (KB vectors via pgvector), Azure OpenAI (embeddings only), Blob Storage, Service Bus, Key Vault, and Application Insights. Identity is Entra ID for users; Managed Identity end-to-end for everything else, so no service-to-service password lives in an environment variable.

Compute — Azure Container Apps

Container Apps wins over App Service for Containers (sticky-only WebSocket, no native KEDA) and AKS (overkill for a small fleet). The one thing Container Apps doesn't give us is fine-grained sidecar control; we don't need any.

How the current process splits

Today, a single Fastify process serves HTTP + WebSocket, holds live SessionState in a Map, runs the orchestrator (src/orchestrator.ts, src/dispatch.ts), dispatches agents through the Claude Agent SDK, and emits events on the in-process EventBus. For Azure, we split into three container apps, each with its own image, scale rule, and revision history:

App	Replicas	Scales on	Holds
`lavern-api`	min 2	HTTP / WS concurrency	REST routes, WebSocket fan-out, session router
`lavern-worker`	min 1	Service Bus queue depth	Active orchestration loops, live SessionState
`lavern-jobs`	0 .. N	KEDA (queue length)	KB ingest, derivative render, batch workflows
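
The "Scales on" column can be read as a target-per-replica rule. A minimal sketch of that arithmetic, assuming the usual KEDA/HPA shape (desired = ceil(queue depth / target per replica), clamped to min and max). The target of 5 messages per replica and the ceiling of 10 replicas are illustrative assumptions, not values from this plan.

```typescript
// Hypothetical scale rule: field names and numbers are illustrative only.
interface ScaleRule {
  minReplicas: number;
  maxReplicas: number;
  targetPerReplica: number; // queue messages one replica is expected to absorb
}

function desiredReplicas(queueDepth: number, rule: ScaleRule): number {
  if (rule.targetPerReplica <= 0) {
    throw new Error("targetPerReplica must be positive");
  }
  const raw = Math.ceil(Math.max(0, queueDepth) / rule.targetPerReplica);
  // Clamp to the configured floor and ceiling.
  return Math.min(rule.maxReplicas, Math.max(rule.minReplicas, raw));
}

// lavern-jobs scales from zero; lavern-worker keeps at least one replica.
const jobsRule: ScaleRule = { minReplicas: 0, maxReplicas: 10, targetPerReplica: 5 };
const workerRule: ScaleRule = { minReplicas: 1, maxReplicas: 10, targetPerReplica: 5 };
```

With these assumed numbers, an empty queue leaves lavern-jobs at zero replicas and lavern-worker at one, and a backlog of 12 messages asks for three replicas of either.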

WebSocket

@fastify/websocket works in Container Apps without modification. Two options for cross-pod fan-out: sticky routing (Front Door affinity cookie, simple) or Azure Web PubSub (browsers connect to PubSub directly, API publishes events; stateless pods, clean rollouts). Recommendation: start with sticky, migrate to Web PubSub when concurrent connections start fighting the rollout story.

Relational data — Postgres Flexible Server

Every table in src/db/database.ts ports straight across with three categories of edit.

SQLite table	Postgres treatment
`users`	Direct; add `citext` on email; `oid` column to mirror the Entra object ID (the `oid` claim, which is stable across apps, unlike the per-app `sub` claim).
`auth_tokens`	Mostly redundant under Entra. Keep only for service-to-service tokens.
`session_archive`	Direct port. `summary_json` becomes `jsonb` + GIN index.
`matters`	Direct port. `data_json` becomes `jsonb`.
`kb_collections`, `kb_documents`	Direct port; `kb_documents` stays the document-of-record.
`kb_chunks`	Kept. Add a `content_tsv tsvector` column (lexical/FTS) and an `embedding vector(1536)` column (`pgvector`). Chunks + vectors live in Postgres.
`shared_agents`, `shared_teams`	Direct port; JSON payloads → `jsonb`.
`user_usage`, `billing_events`, `billable_hours`, `daily_spend`	Direct port; money columns → `numeric(12,4)`.
`audit_log`	Direct port; partition by month at scale.

SQLite-isms to watch

JSON columns stored as TEXT today (data_json, summary_json, profile_json, metadata) promote to jsonb. Add GIN indexes where we actually query into them.
Dates live as ISO TEXT; convert to timestamptz.
FTS5 virtual table becomes a Postgres tsvector column (GIN index); a pgvector column adds the semantic half. The KB search path stays in Postgres.
Concurrency is the big mechanical change: better-sqlite3 is synchronous; the Postgres driver isn't. The DB layer needs an await-everywhere pass.

Mechanics: pgloader for the bulk move, then a TypeScript script for JSON-column transforms. Connection pooling via PgBouncer (Flexible Server's built-in endpoint). Auth to Postgres via Managed Identity from Container Apps — no password in any connection string.
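
The JSON-column and date transforms are mechanical enough to sketch. The following is a minimal example of the per-row conversion the TypeScript script would do, assuming a hypothetical `matters`-like row with a `data_json` TEXT column and a `created_at` ISO TEXT column; the real column names live in src/db/database.ts and may differ.

```typescript
// Hypothetical row shapes: column names follow the conventions described
// above but are assumptions about the real schema.
interface SqliteMatterRow {
  id: string;
  data_json: string | null; // JSON stored as TEXT in SQLite
  created_at: string;       // ISO-8601 TEXT in SQLite
}

interface PgMatterRow {
  id: string;
  data_json: unknown; // parsed value destined for a jsonb column (null if absent)
  created_at: Date;   // destined for a timestamptz column
}

function toPgMatterRow(row: SqliteMatterRow): PgMatterRow {
  const createdAt = new Date(row.created_at);
  if (Number.isNaN(createdAt.getTime())) {
    throw new Error(`Row ${row.id}: unparseable date "${row.created_at}"`);
  }
  let data: unknown = null;
  if (row.data_json !== null && row.data_json !== "") {
    try {
      data = JSON.parse(row.data_json);
    } catch {
      // Fail loudly so bad rows are fixed at source rather than silently nulled.
      throw new Error(`Row ${row.id}: data_json is not valid JSON`);
    }
  }
  return { id: row.id, data_json: data, created_at: createdAt };
}
```

One caveat worth checking before the bulk move: timestamps written without an explicit offset are parsed by `Date` in an implementation-dependent way, so any such values should be normalised to UTC first.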

Blob Storage — the prompts/workflows question, answered

The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that, with a defensible split.

Stays code-resident — versioned in git, shipped with the image

src/agents/ — the 67 agent prompts. They are the product; they belong with the code that calls them.
src/workflows/templates/ — the 9 workflow definitions. Same argument.
SOUL.md — default firm soul. Per-user override already lives in the database; the default is a code asset.
src/mcp/tools/ — MCP tool implementations.

Why not Blob?

Change control. Modifying an agent prompt today requires a PR. Move them to Blob and anyone with write access can change agent behaviour silently, in production, with no trace in source control.
Atomic deploys. Code + prompts ship together. If a prompt change breaks a workflow, you roll back one artifact, not two.
Local development. Devs can run the engine on their machine without provisioning Blob or syncing seed data.
Tests. 1,677 tests in tests/ read prompts from disk. Blob-resident prompts mean mocking Blob in every test.

Goes to Blob — user-generated, runtime-mutable

Category	Container	Lifecycle
Uploaded source documents	`lavern-uploads`	Soft-delete after archive; hard-delete after retention window.
Final deliverables	`lavern-deliveries`	Per-matter retention.
Audit bundles	`lavern-audit`	Per data class. Regulated → cold tier, 7y, immutability policy.
KB document originals	`lavern-kb-originals`	Until KB doc is deleted. Chunks + vectors live in Postgres; original stays in Blob.
Derivatives (HTML/DOCX/PDF)	`lavern-derivatives`	Soft-delete after 30 days; users regenerate on demand.
Log archives	`lavern-logs`	Cool tier after 30d; archive tier after 180d.
Avatars, share OG images	`lavern-public`	Public read. The only public container.

What writes to disk today, and where it ends up

File	Today	Azure
`src/utils/logger.ts`	Rotating log files	App Insights (live); Blob `lavern-logs/` (archive)
`src/utils/audit-persistence.ts`	JSONL audit log	Blob `lavern-audit/`
`src/workflows/executor.ts`	Intermediate artefacts	Blob `lavern-deliveries/<sessionId>/intermediate/`
`src/mcp/tools/baselines.ts`	Baseline JSON	Blob `lavern-baselines/`
`src/mcp/tools/report-card.ts`	Final report	Blob `lavern-deliveries/`
`src/mcp/tools/legal-md-compiler.ts`	Compiled markdown	Blob `lavern-deliveries/`
`src/providers/mistral-executor.ts`	Provider artefacts	Blob `lavern-deliveries/`

Front-end uploads use a user-delegation SAS issued by the API after authorisation — the file doesn't traverse the API process for the upload itself, only for orchestration.
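
A minimal sketch of what the API decides before it issues that SAS: which blob the client may write, with which permissions, and for how long. The function, the `<userId>/<uploadId>/<file>` naming scheme, and the 15-minute lifetime are all hypothetical. The signing itself is left out; in @azure/storage-blob it would go through `BlobServiceClient.getUserDelegationKey` and `generateBlobSASQueryParameters`.

```typescript
// Hypothetical helper: plans the blob name and SAS parameters the API would
// request. It does not call Azure; signing is left to the storage SDK.
interface UploadPlan {
  container: string;
  blobName: string;
  permissions: string; // "cw" = create + write; no read, list, or delete
  startsOn: Date;
  expiresOn: Date;
}

function planUpload(
  userId: string,
  uploadId: string,
  fileName: string,
  now: Date,
  ttlMinutes = 15,
): UploadPlan {
  // Keep only the final path segment and a conservative character set so a
  // client-supplied name cannot escape the user's prefix.
  const base = fileName.split(/[\\\/]/).pop() ?? "";
  const safe = base.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "") || "upload";
  return {
    container: "lavern-uploads",
    blobName: `${userId}/${uploadId}/${safe}`,
    permissions: "cw",
    startsOn: now,
    expiresOn: new Date(now.getTime() + ttlMinutes * 60000),
  };
}
```

Scoping the SAS to a single blob with write-only permissions means a leaked URL cannot be used to read or enumerate other uploads.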

Vector + search — Postgres `pgvector`

Decision (supersedes the original v1 draft of this section): vectors live in Postgres via pgvector, not Azure AI Search. This keeps KB retrieval in one store with the relational data, removes the AI Search service (and its ~$75/mo floor), gives genuine feature parity with today's lexical KB, and adds semantic recall on top. pgvector is an allow-listed extension on Flexible Server (azure.extensions = VECTOR).

The pipeline keeps its shape: src/knowledge-base/indexer.ts parses and section-chunks; src/knowledge-base/retriever.ts runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector column; a new pgvector column adds the semantic half. No separate search service.

`kb_chunks` schema (Postgres)

Column	Type	Notes
`id`	text (PK)	`<documentId>_<chunkIndex>`
`document_id`	text (FK → `kb_documents`)	—
`collection_id`, `user_id`	text (indexed)	Security-trim every query on `user_id`
`heading`, `content`	text	Section heading from structure detector + chunk text
`content_tsv`	tsvector (GIN index)	Lexical half — generated from `heading || ' ' || content`
`embedding`	vector(1536) (HNSW index)	Semantic half — Azure OpenAI `text-embedding-3-small`
`doc_type`, `jurisdiction`	text (indexed)	From `kb_documents`

Embedding model — Azure OpenAI `text-embedding-3` (decided)

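This is the one place Azure OpenAI enters the architecture; agent reasoning stays on Anthropic. text-embedding-3-small (1,536-dim) is the default cost/quality pick; text-embedding-3-large (up to 3,072-dim, reducible) if retrieval quality demands it. pgvector's HNSW index on the `vector` type supports at most 2,000 dimensions, so the large model would have to be requested at a reduced size (the embeddings API's `dimensions` parameter) or stored as `halfvec`. The AOAI endpoint sits behind a Private Endpoint; authenticate via Managed Identity (preferred) or a Key Vault–held key.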

Indexing pipeline

Push from a worker. Event Grid fires on Blob upload → Service Bus → lavern-jobs KEDA-scales → our parser + section chunker (existing code) runs → Azure OpenAI embeds each chunk → rows are inserted into Postgres (content_tsv DB-generated, embedding carrying the vector). We keep our legal-aware chunking, which materially affects citation quality.
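
A small sketch of the last step, shaping a chunk for insertion. It assumes the `<documentId>_<chunkIndex>` id convention and the 1,536-dimension column from the schema, and that the embedding is passed to Postgres as pgvector's bracketed text literal. The function names are hypothetical, and the embedding call itself is out of scope here.

```typescript
const EMBEDDING_DIM = 1536; // text-embedding-3-small, matching vector(1536)

// Chunk id convention from the kb_chunks schema: <documentId>_<chunkIndex>.
function chunkId(documentId: string, chunkIndex: number): string {
  if (!Number.isInteger(chunkIndex) || chunkIndex < 0) {
    throw new Error("chunkIndex must be a non-negative integer");
  }
  return `${documentId}_${chunkIndex}`;
}

// pgvector accepts a bracketed, comma-separated text literal, e.g. '[0.1,0.2]'.
function toVectorLiteral(embedding: readonly number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((v) => Number.isFinite(v))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}
```

Rejecting wrong-sized vectors in the job gives a clearer failure than letting the insert bounce off the column type, which matters if the embedding model or its dimension setting ever changes.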

Hybrid query in one statement

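WITH lex AS (
  -- Lexical half: full-text matches, ranked 1..40 by ts_rank_cd
  SELECT id,
         row_number() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC) AS r
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY r LIMIT 40
),
vec AS (
  -- Semantic half: nearest neighbours by cosine distance, ranked 1..40
  SELECT id,
         row_number() OVER (ORDER BY embedding <=> $3::vector) AS r
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector LIMIT 40
)
-- RRF-fuse lex + vec on id, return top-k ($4); 60 is the conventional RRF constant
SELECT id,
       COALESCE(1.0 / (60 + lex.r), 0) + COALESCE(1.0 / (60 + vec.r), 0) AS score
FROM lex
FULL OUTER JOIN vec USING (id)
ORDER BY score DESC
LIMIT $4;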

user_id is filtered in every branch — the user-scoping guarantee in retriever.ts is preserved. Legal-synonym expansion stays as lexical pre-processing; the n-gram re-rank is superseded by the vector half. pgvector with an HNSW index serves into the millions of vectors before tuning pressure — if the corpus ever outgrows that, Azure AI Search becomes a re-evaluation, not a v1 requirement.
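
Reciprocal Rank Fusion (RRF) works on ranks, not raw scores, which is why the lexical and vector scores never need to share a scale. A minimal application-side sketch of the fusion step, assuming each input list holds unique chunk ids in best-first order and the common default constant k = 60; the function name is hypothetical.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank is 1-based. k = 60 is a common default, assumed here.
function rrfFuse(
  rankedLists: readonly (readonly string[])[],
  topK: number,
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // tie-break on id
    .slice(0, topK)
    .map(([id]) => id);
}

// A chunk that appears in both halves outranks one that tops a single half.
const fusedIds = rrfFuse([["a", "b", "c"], ["b", "d", "a"]], 3);
// fusedIds is ["b", "a", "d"]
```

Here "b" wins with 1/62 + 1/61, just ahead of "a" at 1/61 + 1/63, while "d" and "c" each score from one list only.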

Async messaging — Service Bus + Event Grid

The in-process EventBus stays exactly as it is inside a single pod. We add an outbound bridge for events that need to cross pods. Most don't — verbose per-step events (tool_used) stay in-pod and surface as App Insights metrics. Cross-pod handoffs go through Service Bus.

Event class	Channel	Why
session lifecycle, findings, debate	Service Bus topic `lavern-events`	WS fan-out + audit consume separately
gate request/response	Service Bus queue `lavern-gates`	Worker blocked; needs guaranteed delivery
blob upload (KB doc, source doc)	Event Grid → Service Bus → KEDA job	Triggers ingest / derivative jobs
tool_used, verification_run	App Insights metric only	Worker-local, high-volume, observability concern
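
The outbound bridge reduces to a lookup from event type to channel. A minimal sketch, assuming hypothetical event names for the lifecycle, findings, debate, and gate classes (only `session_started`, `tool_used`, and `verification_run` appear in this document) and assuming unknown events default to in-pod metrics. Blob uploads are absent because they arrive via Event Grid rather than the EventBus.

```typescript
type Channel = "topic:lavern-events" | "queue:lavern-gates" | "metric-only";

// Event names other than session_started, tool_used, and verification_run
// are hypothetical stand-ins for the event classes in the table above.
const CHANNEL_BY_EVENT = new Map<string, Channel>([
  ["session_started", "topic:lavern-events"],
  ["finding_added", "topic:lavern-events"],
  ["debate_turn", "topic:lavern-events"],
  ["gate_requested", "queue:lavern-gates"],
  ["gate_resolved", "queue:lavern-gates"],
  ["tool_used", "metric-only"],
  ["verification_run", "metric-only"],
]);

function channelFor(eventType: string): Channel {
  // Assumption: anything unlisted stays in-pod, so nothing crosses pods by accident.
  return CHANNEL_BY_EVENT.get(eventType) ?? "metric-only";
}
```

Keeping the mapping in one table makes the "most events don't cross pods" rule auditable: adding a cross-pod event is a one-line, reviewable change.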

Sessions — the hardest design call

Live sessions are held in an in-process Map today (session-manager.ts: TTL eviction, 100-session cap, hydrated on demand). When we split api / worker / jobs, sessions need a home that isn't "the api pod that happened to handle the first request."

Option	Pro	Con
A. Sticky to api pod	Zero code change	No per-session horizontal scale; api can't recycle until session expires
B. Worker owns the session; api stateless (recommended)	Clean target architecture, api scales freely	Gate-decision becomes async (we mostly model this already via `src/gates/`)
C. Externalise to Redis	Maximum elasticity	Serialisation cost; SessionState wasn't designed for it

Recommendation: option B. Worker holds live state; api routes WebSocket subscriptions to Service Bus topics filtered by sessionId. Archive on completion goes to Postgres, same as today.
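
On the api side, option B leaves only a routing table: which open sockets care about which sessionId. A minimal in-memory sketch of that fan-out, with plain callbacks standing in for WebSocket connections and for the Service Bus receiver. All names here are hypothetical.

```typescript
// Hypothetical message and listener shapes; real Service Bus receivers and
// WebSocket objects are replaced by plain callbacks.
interface SessionMessage {
  sessionId: string;
  type: string;
  payload: unknown;
}
type SessionListener = (msg: SessionMessage) => void;

class SessionFanout {
  private readonly bySession = new Map<string, Set<SessionListener>>();

  // Called when a browser opens a WebSocket for a session; returns an unsubscribe.
  subscribe(sessionId: string, listener: SessionListener): () => void {
    const listeners = this.bySession.get(sessionId) ?? new Set<SessionListener>();
    listeners.add(listener);
    this.bySession.set(sessionId, listeners);
    return () => {
      listeners.delete(listener);
      // Only drop the map entry if it still points at this (now empty) set.
      if (listeners.size === 0 && this.bySession.get(sessionId) === listeners) {
        this.bySession.delete(sessionId);
      }
    };
  }

  // Called for each message received from the topic; returns how many listeners got it.
  dispatch(msg: SessionMessage): number {
    const listeners = this.bySession.get(msg.sessionId);
    if (!listeners) return 0;
    listeners.forEach((listener) => listener(msg));
    return listeners.size;
  }
}
```

Because this table holds sockets rather than SessionState, an api pod can be recycled at any time; clients reconnect and resubscribe, and the worker's state is untouched.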

Identity — Entra ID workforce / B2B

The current auth.ts middleware handles cookie + Bearer and has a no-op LOCAL-MODE. The auth-shaped routes in auth-routes.ts and google-auth.ts only register when LAVERN_AUTH_ENABLED=true.

Frontend: MSAL.js, auth-code flow with PKCE.
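Backend: JWT validation against the Entra tenant's published signing keys (JWKS) in a Fastify auth plugin. @azure/identity acquires outbound tokens (Managed Identity); it does not validate inbound user tokens, so a JWT verification library is needed alongside it.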
Application user table: users stays as the app-side mirror keyed by Entra oid. Populated on first sign-in. Needed because matters, billing_events, kb_collections, et al. all FK to it.
Service-to-service: Managed Identity end-to-end — Container Apps → Postgres, Blob, Azure OpenAI (embeddings), Service Bus, Key Vault. No service account secrets in env.
API clients (the api_clients table) keep their existing key model; rotated via Key Vault references.
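
Once a token's signature has been verified, the backend still has to check the claims before trusting the `oid`. A minimal sketch of those checks, assuming single-tenant v2.0 access tokens (issuer `https://login.microsoftonline.com/<tenantId>/v2.0`). Signature verification is deliberately out of scope, and the type and function names are hypothetical.

```typescript
// Claims as they appear in an Entra ID v2.0 access token AFTER the signature
// has been verified elsewhere; this function does not verify signatures.
interface EntraClaims {
  iss: string;
  aud: string;
  tid: string;
  oid: string;
  exp: number; // seconds since epoch
  preferred_username?: string;
}

interface AuthConfig {
  tenantId: string;
  audience: string; // the API's application (client) id or app id URI
}

type ClaimsResult =
  | { ok: true; oid: string; loginHint: string | null }
  | { ok: false; reason: string };

function checkClaims(claims: EntraClaims, cfg: AuthConfig, nowSeconds: number): ClaimsResult {
  const expectedIssuer = `https://login.microsoftonline.com/${cfg.tenantId}/v2.0`;
  if (claims.iss !== expectedIssuer) return { ok: false, reason: "issuer mismatch" };
  if (claims.tid !== cfg.tenantId) return { ok: false, reason: "tenant mismatch" };
  if (claims.aud !== cfg.audience) return { ok: false, reason: "audience mismatch" };
  if (claims.exp <= nowSeconds) return { ok: false, reason: "token expired" };
  if (!claims.oid) return { ok: false, reason: "missing oid" };
  // oid keys the users mirror; preferred_username is display-only, not an identifier.
  return { ok: true, oid: claims.oid, loginHint: claims.preferred_username ?? null };
}
```

A B2B setup that accepts several customer tenants would replace the single `tenantId` comparison with an allow-list, which is a design decision this sketch does not make.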

Secrets — Key Vault

Secret	Today	Azure
Anthropic API key	env	Key Vault → Container Apps secret reference
Mistral API key	env	Key Vault
Stripe secret + webhook secret	env	Key Vault
SMTP credentials	env	Key Vault
Database connection	env	Managed Identity (no password in the connection string)
Storage account keys	n/a	Managed Identity
Azure OpenAI embeddings key	n/a (new)	Managed Identity (or Key Vault key if MI unavailable)
`LAVERN_MANAGED_AGENTS_BRIDGE_SECRET`	env	Key Vault

Observability — App Insights + Log Analytics

src/utils/logger.ts becomes a thin App Insights adapter — its existing shape stays. OpenTelemetry on Fastify + http client + SDK calls feeds App Insights traces. Product analytics (session_started, workflow_completed, verification_failed) ride off the EventBus as custom events. Log Analytics provides KQL behind App Insights. Sentry can stay as the crash channel if richer grouping matters.

Cost note: App Insights ingestion is the most variable cost in this architecture. Set sampling at 50% for verbose, 100% for warnings and errors. Off in dev.
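
The sampling rule in the cost note can be expressed as a small deterministic function. This sketch assumes that "info" is treated like verbose and that "off in dev" means nothing is sent from dev; both are readings of the note, not confirmed settings. It decides per operation id, so all telemetry for one operation is kept or dropped together. The hash is plain FNV-1a, chosen for illustration.

```typescript
type Severity = "verbose" | "info" | "warning" | "error";

// Rates from the cost note: 50% for verbose, 100% for warnings and errors.
// Treating "info" like verbose and sending nothing from dev are assumptions.
function sampleRate(severity: Severity, environment: string): number {
  if (environment === "dev") return 0;
  return severity === "warning" || severity === "error" ? 1 : 0.5;
}

// 32-bit FNV-1a hash, mapped to [0, 1).
function hashToUnit(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash / 0x100000000;
}

// Same operation id always gets the same decision at a given rate.
function shouldSend(operationId: string, severity: Severity, environment: string): boolean {
  return hashToUnit(operationId) < sampleRate(severity, environment);
}
```

Warnings and errors always pass because every hash value is below 1, and dev never sends because no hash value is below 0.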

Networking — Front Door + Private Endpoints

Front Door Premium in front of Container Apps — TLS, WAF, anycast, managed certs. WebSocket through Front Door works without modification.
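Private Endpoints for Postgres, Azure OpenAI, Blob, Service Bus, Key Vault. None reachable from the public internet. Service Bus supports Private Endpoints only on the Premium tier, so the Standard tier in the cost table would need revisiting.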
VNET integration on Container Apps; outbound through the VNET.
Egress allow-list: api.anthropic.com, api.mistral.ai, api.stripe.com, accounts.google.com (if Google OAuth is retained). Everything else blocked.

LLM provider plane

Stays on Anthropic direct. The dual-provider abstraction in src/providers/ keeps Mistral as the EU-sovereign option. Adding Azure OpenAI for agent inference in v1 is plumbing without product benefit; its only v1 role is KB embeddings. The abstraction already supports adding AOAI without disturbing other paths if a future customer needs it.

CI/CD & infrastructure-as-code

Bicep for everything (or Terraform if there's a house preference): Container Apps env, ACR, Postgres (with pgvector via azure.extensions), Azure OpenAI (embeddings deployment), Blob, Service Bus, Key Vault, Front Door, App Insights, networking.
GitHub Actions with OIDC federation to Azure — no service-principal passwords in repo. Build container, push to ACR, az containerapp update --revision-suffix <sha> for blue-green.
Per-environment: dev (small SKUs of everything), staging (mirror), prod.
DB migrations in a separate lavern-migrate job that runs before the api swap.

Cost shape — rough monthly OOM

Single region, modest load (~50 active firms, ~5,000 documents indexed):

Service	SKU	Est. monthly
Container Apps (api + worker + jobs)	2 always-on small + bursts	$200–400
Postgres Flexible Server (+ `pgvector`)	D2ds_v5 (2 vCPU / 8 GB) — bumped for vector index	$150
Azure OpenAI embeddings	`text-embedding-3-small`, usage-based	$5–20
Blob Storage	Hot ~500 GB + ops	$30
Service Bus	Standard	$10
Front Door Premium	Base + traffic	$330+
Key Vault	Standard	<$5
App Insights	5 GB/day cap	$80
Total		~$810–1,025 + Front Door traffic

Keeping vectors in pgvector removes the Azure AI Search line (was ~$250/mo S1) for a modest Postgres SKU bump — a net saving of roughly $200–230/mo. Agent LLM spend lives separately and passes through; embeddings are the only Azure-side model cost and are tiny. Cost levers: Front Door → Standard saves ~$200; audit and log archive in cool/archive tiers; right-size Postgres only on actual pressure (the vector index is the main memory driver).
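
As a check on the table, the line items can be summed at their stated bounds. This sketch reads "<$5" as a 0 to 5 range and "$330+" as its 330 base; both readings are assumptions, and Front Door traffic above the base is not included.

```typescript
// Line items copied from the cost table above, in USD per month.
// "<$5" is read as 0-5 and "$330+" as its 330 base; both are assumptions.
const COST_LINES: ReadonlyArray<{ service: string; low: number; high: number }> = [
  { service: "Container Apps", low: 200, high: 400 },
  { service: "Postgres Flexible Server", low: 150, high: 150 },
  { service: "Azure OpenAI embeddings", low: 5, high: 20 },
  { service: "Blob Storage", low: 30, high: 30 },
  { service: "Service Bus", low: 10, high: 10 },
  { service: "Front Door Premium", low: 330, high: 330 },
  { service: "Key Vault", low: 0, high: 5 },
  { service: "App Insights", low: 80, high: 80 },
];

function totalRange(
  lines: ReadonlyArray<{ low: number; high: number }>,
): { low: number; high: number } {
  return lines.reduce(
    (acc, line) => ({ low: acc.low + line.low, high: acc.high + line.high }),
    { low: 0, high: 0 },
  );
}

const monthlyTotal = totalRange(COST_LINES); // { low: 805, high: 1025 }
```

The listed items sum to $805 at the low end and $1,025 at the high end; any Front Door traffic charge sits on top of that.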

Migration plan — four phases

Phase 0 — Preparation · 1–2 weeks

Add a persistence interface in front of src/db/ — SQLite and Postgres adapters behind the same shape.
Add a blob abstraction wrapping file IO; today local FS, tomorrow @azure/storage-blob.
Add a search abstraction wrapping the FTS5 layer; tomorrow the Postgres tsvector + pgvector hybrid query.
All three behind feature flags. No Azure infra yet.
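
The persistence interface is also where the synchronous-to-async change gets absorbed: if the interface is promise-based from day one, the SQLite adapter can wrap its synchronous calls and callers never change again. A minimal sketch with an in-memory stand-in; the interface, its method names, and the flag values are hypothetical, not the repo's.

```typescript
// Hypothetical interface: method names are illustrative, not the repo's.
interface MatterRecord {
  id: string;
  userId: string;
  data: unknown;
}

interface MatterStore {
  get(id: string): Promise<MatterRecord | null>;
  upsert(record: MatterRecord): Promise<void>;
  listByUser(userId: string): Promise<MatterRecord[]>;
}

// In-memory stand-in. A SQLite adapter would wrap its synchronous calls the
// same way; a Postgres adapter would await real queries.
class InMemoryMatterStore implements MatterStore {
  private readonly rows = new Map<string, MatterRecord>();

  async get(id: string): Promise<MatterRecord | null> {
    return this.rows.get(id) ?? null;
  }

  async upsert(record: MatterRecord): Promise<void> {
    this.rows.set(record.id, { ...record });
  }

  async listByUser(userId: string): Promise<MatterRecord[]> {
    return Array.from(this.rows.values()).filter((r) => r.userId === userId);
  }
}

// The feature flag picks the adapter; callers only ever see MatterStore.
type StoreFlag = "sqlite" | "postgres";

function selectStore(flag: StoreFlag, adapters: Record<StoreFlag, MatterStore>): MatterStore {
  return adapters[flag];
}
```

The same in-memory adapter doubles as a test fake, which keeps the existing test suite independent of either database.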

Phase 1 — Azure up, dual-write · 2–3 weeks

Bicep stack deployed to a dev subscription.
Container Apps running the api image; Postgres (with pgvector) live with sample data.
Migration scripts backfill from SQLite → Postgres in dev, including a one-time pass to embed existing KB chunks via Azure OpenAI.
Dual-write enabled in a canary tenant. Validate WebSocket through Front Door, gate via Service Bus, KB query via the Postgres hybrid search.

Phase 2 — Cutover · 1 week

Production data migration window.
DNS swing to Front Door.
App Insights dashboards + Sentry alerts live.
SQLite kept read-only for 30 days as rollback insurance.

Phase 3 — Decommission · 1 week, 30+ days later

Remove SQLite path from code.
Retire local file-write paths (audit, logger, baselines, deliveries) in favour of Blob.
Archive the final SQLite snapshot.

Open questions

Multi-tenancy: single subscription + userId everywhere, or per-firm subscription for isolation-sensitive customers? Default to the single shared subscription unless a firm specifically asks otherwise.
Data residency: per-tenant region (West Europe vs UK South vs East US) requires routing logic in Front Door and a region attribute on users.
DR: Postgres Flexible Server has built-in PITR (7–35d), now covering the KB vectors too (one store). Do we want geo-redundant Blob and a DB read replica for prod? Likely yes.
Audit retention: Blob immutability policies (legal hold, 7y) — switch on if firms ask.
Web PubSub timing: sticky routing now; PubSub when concurrent WebSockets force the issue.
Cowork interop: viz/src/cowork/ keeps working unchanged — local files never touch the server. Privacy promise survives.
Clawern → cloud KB: should the local precedent board optionally sync to the cloud KB for shared firm precedents? Product call, not architecture.

Services considered but not chosen

Service	Why not
Azure SQL	T-SQL rewrite; Postgres is closer to SQLite ergonomically, has `jsonb`, and its `ON CONFLICT … excluded.*` upserts (already used in `src/db/database.ts`) port verbatim. Also lacks `pgvector`.
Azure AI Search	Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in `pgvector` — one store, feature-parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors.
Cosmos DB for KB chunks	Chunks + vectors live in Postgres; Cosmos would be a parallel store.
AKS	Operationally heavy; Container Apps suffices.
Azure Functions	Cold start hurts orchestration loops and WebSockets.
App Service for Containers	WebSocket sticky-only, no native KEDA.
Azure OpenAI (for agent inference)	Gives OpenAI's models, not Anthropic's — swapping Claude for GPT undercuts the product. Used only for KB embeddings, never agent reasoning.
API Management	Overkill for our REST surface; Front Door + Container Apps ingress suffices.
Azure Cache for Redis (v1)	Worker-owned sessions avoid the need.

File-by-file change summary

File	Change
`src/db/database.ts`	Becomes the SQLite adapter; add Postgres adapter behind interface.
`src/knowledge-base/indexer.ts`	Adds an Azure OpenAI embedding call per chunk; writes `content_tsv` + `embedding` to the Postgres `kb_chunks` table.
`src/knowledge-base/retriever.ts`	FTS5 `MATCH` becomes a Postgres `tsvector` + `pgvector` hybrid (RRF) query; synonym expansion and user-scoping retained.
`src/api/middleware/auth.ts`	Replaced by Entra ID JWT validation.
`src/api/server.ts`	Handler shape unchanged; deploy target changes.
`src/session/session-manager.ts`	Moves to `lavern-worker`; api becomes stateless.
`src/events/event-bus.ts`	In-pod EventEmitter retained; outbound bridge to Service Bus added.
`src/gates/`	Async resolver becomes default; sync resolver retained for CLI.
`src/utils/logger.ts`	Becomes App Insights adapter.
`src/utils/audit-persistence.ts`	Writes to Blob `lavern-audit/`.
`src/workflows/executor.ts`	Intermediate artefacts to Blob; control flow unchanged.
`src/api/routes/knowledge-base.ts`	Upload returns a SAS URL; indexing via Event Grid → job.
`src/agents/`	No change. Prompts stay code-resident.
`src/workflows/templates/`	No change. Templates stay code-resident.
`SOUL.md`	No change. Default firm soul stays code-resident.
`src/claw/`	No change. Clawern stays local — see Clawern.
`viz/src/cowork/`	No change. Local-only FS Access API path survives.

Research artifact: docs/azure-migration.md · Repo: github.com/Altien/lavernDev · Scope decision: Dashboard + KB to Azure; Clawern stays local.