Integrating & operating

The system's edges: providers and tools, the API surface, the remote bridge, datasets, and the two live migration roadmaps.

Lavern → Azure migration research

Scope. Lavern dashboard, API, knowledge base, sessions, matters, auth, billing, and audit data move to Azure. Clawern stays on the client — it remains a local Node.js daemon on the user's machine, optionally pointing at the cloud KB for shared firm precedents. This document is the research artifact behind docs/explore/azure-migration.html.

Repository: github.com/Altien/lavernDev · Main branch: main

1. Constraints fixed up-front

These are inputs to the design, not findings.

Decision	Value	Rationale
Vectors	Postgres `pgvector` owns the vector index (in-DB)	Keeps KB retrieval in the same store as the relational data — no separate search service to operate or pay for. Hybrid (`tsvector` full-text ranking + vector) in one database. Note that Postgres ranks with `ts_rank` / `ts_rank_cd`, which is not BM25.
Compute	Azure Container Apps for API + workers	Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings.
Identity	Entra ID (workforce / B2B)	Org-tenant aware. Works with conditional access, no separate user store, MFA from the tenant.
Database	Azure Database for PostgreSQL Flexible Server (with `pgvector`)	Schema-compatible target for the current SQLite tables, mature, and hosts the KB vector index in-DB via the allow-listed `pgvector` extension.
Async messaging	Azure Service Bus for guaranteed-delivery work, Event Grid for blob/event fan-out	Service Bus is the right primitive for cross-pod orchestration signals; Event Grid is right for "blob landed" triggers.
Region	Single primary region for v1; design for multi-region but defer	Cost and latency picked over DR until usage warrants it.
LLM provider	Anthropic direct (unchanged); Mistral path preserved	Chat/agent inference stays on Anthropic. Azure OpenAI is used only for KB embeddings (`text-embedding-3`), never for agent reasoning.

2. Target architecture (one picture, in words)

Public traffic enters via Azure Front Door (TLS termination, WAF, global anycast). Front Door fronts a Container Apps environment running:

lavern-api revision — Fastify HTTP + WebSocket server, the same one in src/api/server.ts.
lavern-worker revision — long-running orchestration worker that pulls Service Bus messages and runs workflows (today this happens in the API process; we split it).
lavern-jobs (KEDA-scaled job) — short-lived bursts (KB ingest, derivative rendering, large workflow runs).

Behind them, all on the same VNET with Private Endpoints:

PostgreSQL Flexible Server for relational data (src/db/database.ts).
Postgres pgvector holds the KB vector index alongside the relational tables — the SQLite FTS5 + n-gram re-rank in src/knowledge-base/retriever.ts becomes a Postgres hybrid query (tsvector full-text ranking via ts_rank_cd + pgvector cosine).
Azure Blob Storage for user-generated content (uploads, deliverables, audit bundles, derivatives, log archive).
Azure Service Bus for cross-instance events and work scheduling.
Azure Key Vault for any non-managed secrets (Anthropic API keys, Azure OpenAI embeddings key, Stripe webhook secrets, SMTP credentials).
Application Insights + Log Analytics for logs, traces, metrics.
Azure Cache for Redis (optional, v2) — only if we move session state off in-process.

Egress to the public internet (Anthropic, Mistral, Stripe, Google OAuth where used) goes through Container Apps' managed egress.

3. Compute — Azure Container Apps

Why Container Apps over the alternatives

Option	Why it wins	Why it loses
Container Apps	Container-native, long-lived WebSocket OK, KEDA scale incl. scale-to-zero on non-WS revisions, Dapr if we ever want it, managed envoy ingress. Cheapest target for a Fastify image.	No fine-grained sidecar control, less mature than AKS.
App Service for Containers	Mature, simple.	Scale-to-zero awkward; per-instance sticky needed for WebSocket; KEDA not native.
AKS	Maximum control.	Operationally heavy; we don't need cluster-level features yet.

How the current process splits

The current single Fastify process does several things:

Serve HTTP + WebSocket (src/api/server.ts, src/api/ws-handler.ts)
Hold live SessionState in a Map (src/session/session-manager.ts)
Run the orchestration loop (src/orchestrator.ts, src/dispatch.ts)
Dispatch agents (Claude Agent SDK calls)
Emit events on the in-process EventBus (src/events/event-bus.ts)
Archive completed sessions to SQLite

For Azure, we split into three revisions:

Revision	Replicas	Scales on	Holds
`lavern-api`	min 2, max N	HTTP / WS concurrency	Session router + WebSocket connections + REST routes
`lavern-worker`	min 1, max M	Service Bus queue depth	Active session orchestration loops
`lavern-jobs`	0..K	KEDA (queue length)	One-shot tasks (KB ingest, large workflow batches, derivatives)

This is the moment to make the orchestrator stateless across pods. See §8 (sessions).

WebSocket considerations

@fastify/websocket works inside Container Apps without modification. Two choices:

Sticky routing + in-pod session state. Front Door affinity cookie, ARR-style. Simple, but limits horizontal scaling and disallows transparent rollouts mid-session.
Connection broker (Azure Web PubSub). Browsers connect to Web PubSub directly; the API publishes events to a per-session topic. Pods are stateless re: clients; rollouts are clean. The current ws-handler.ts becomes an event publisher.

Recommendation: Start with sticky routing (option 1) on day 1 — keeps the migration small. Migrate to Azure Web PubSub when we hit horizontal scale or zero-downtime requirements that sticky can't satisfy.

4. Relational data — Postgres Flexible Server

What moves

Every table currently defined in src/db/database.ts:

SQLite table	Notes for Postgres port
`users`	UUID PKs (already TEXT in SQLite); add citext for email.
`auth_tokens`	Becomes mostly redundant under Entra ID. Keep for service-to-service tokens only.
`session_archive`	Direct port. `summary_json` becomes `jsonb`. Add GIN index for query.
`matters`	Direct port. `data_json` becomes `jsonb`.
`api_clients`	Direct port.
`kb_collections`, `kb_documents`	Direct port. `kb_documents` stays the document-of-record.
`kb_chunks`	Keep. Add a `content_tsv tsvector` column (lexical/FTS) and an `embedding vector(1536)` column (`pgvector`). Chunks + their vectors live in Postgres. See §6.
`shared_agents`, `shared_teams`	Direct port. `profile_json` → `jsonb`.
`user_usage`, `billing_events`, `billable_hours`, `daily_spend`	Direct port. Money columns become `numeric(12,4)`.
`audit_log`	Direct port. Consider partitioning by month at scale.
`waitlist`	Direct port. Probably retire when out of waitlist.
`user_tokens`	OAuth refresh tokens — keep, or replace entirely with Entra-issued tokens.

SQLite-isms that need attention

TEXT NOT NULL everywhere — convert to text NOT NULL. Postgres is strict about type; SQLite isn't.
JSON columns stored as TEXT today (data_json, summary_json, profile_json, metadata) — promote to jsonb and add GIN indexes where we actually query into them.
No date type in SQLite — current created_at TEXT columns become timestamptz. Migration must parse ISO strings.
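REFERENCES users(id) — keep. SQLite enforces foreign keys only when PRAGMA foreign_keys is on, so existing rows may hold dangling references that Postgres will reject at load time; check for these before the bulk move. Where the relationship is genuinely optional (anonymous sessions), keep the column nullable and use ON DELETE SET NULL.
FTS5 virtual table — gone. The KB search path moves to a Postgres tsvector + pgvector hybrid query in the same database (see §6).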
Concurrency — better-sqlite3 is synchronous; Postgres driver (pg / postgres-js) is async. The DB layer needs an await-everywhere pass. This is the biggest mechanical change in the migration.

Migration mechanics

Use pgloader for the bulk move (handles SQLite quirks well), then run a TypeScript script for the JSON-column transforms (text → jsonb).
Run dual-write for a window if we need zero-downtime cutover; otherwise a short maintenance window is honest and far simpler.
Connection pooling via PgBouncer (Flexible Server has a built-in pooler endpoint).
Use Managed Identity from Container Apps to authenticate to Postgres — no password in the connection string.
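
The per-row transform in that TypeScript pass could look roughly like the sketch below. It is a minimal illustration, not code from the repository: `transformRow` and the two column sets are hypothetical names, and the column lists are assumed from the JSON and timestamp columns mentioned under "SQLite-isms".

```typescript
// Hypothetical helper for the post-pgloader pass: converts SQLite TEXT values
// into shapes a Postgres driver can bind as jsonb / timestamptz.
type SqliteRow = Record<string, string | number | null>;
type PgValue = string | number | null | Date | object;

// Assumed column names, taken from the columns named in this section.
const JSON_COLUMNS = new Set(["data_json", "summary_json", "profile_json", "metadata"]);
const TIMESTAMP_COLUMNS = new Set(["created_at"]);

function transformRow(row: SqliteRow): Record<string, PgValue> {
  const out: Record<string, PgValue> = {};
  for (const [column, value] of Object.entries(row)) {
    if (value === null) {
      out[column] = null;
    } else if (JSON_COLUMNS.has(column) && typeof value === "string") {
      // JSON.parse throws on malformed text, which surfaces bad rows early.
      out[column] = JSON.parse(value) as object;
    } else if (TIMESTAMP_COLUMNS.has(column) && typeof value === "string") {
      const parsed = new Date(value);
      if (Number.isNaN(parsed.getTime())) {
        throw new Error(`Unparseable timestamp in ${column}: ${value}`);
      }
      out[column] = parsed;
    } else {
      out[column] = value;
    }
  }
  return out;
}
```

Failing loudly on a malformed value is a deliberate choice here: a migration that silently writes NULL for a bad timestamp is harder to audit than one that stops on the offending row.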

Why not Hyperscale (Citus) or Azure SQL

Hyperscale (Citus) is overkill until we shard. Flexible Server scales vertically well into the hundreds of GB and tens of thousands of TPS; we would need a redesign for unrelated reasons long before we reach that ceiling.
Azure SQL would mean rewriting the schema for T-SQL idioms (no jsonb, different default semantics). Postgres is closer to SQLite in spirit.

5. Blob Storage — what becomes blob-native vs stays code-resident

This is the question that needs the most defending. The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that.

Recommendation: split by content authority

Stays code-resident (versioned in git, shipped with the container image):

src/agents/ — the 67 agent prompts. They are the product; they belong with the code that calls them.
src/workflows/templates/ — the 9 workflow definitions. Same argument.
SOUL.md — the default firm soul. Per-user override already comes from the database; the default is a code asset.
src/mcp/tools/ — MCP tool implementations.

Why not Blob?

Change control. Today, modifying an agent prompt requires a pull request, reviewed and merged. Moving them to Blob means anyone with Blob write can change agent behaviour silently, in production, without leaving a trace in source control.
Atomic deploys. Code + prompts ship together. If a prompt change breaks a workflow, you roll back one artifact. With Blob-resident prompts, the prompt version and the code version drift.
Local development. Devs can run the engine on their machine without provisioning Blob, importing seed data, or syncing with prod.
Test fixtures. The existing test suite (1,677 tests, tests/) reads prompts directly from disk. Moving them to Blob means mocking Blob in tests — pure overhead.

Goes to Blob (user-generated, runtime-mutable):

Category	Container	Lifecycle
Uploaded source documents	`lavern-uploads/<user>/<sessionId>/`	Soft-delete after session archive; hard-delete after retention window.
Final deliverables	`lavern-deliveries/<user>/<sessionId>/`	Retained per matter retention policy.
Audit bundles	`lavern-audit/<user>/<sessionId>/`	Retain per the user's data class (regulated → cold tier, 7y).
KB document originals	`lavern-kb-originals/<user>/<collectionId>/`	Until KB doc is deleted. The chunks + vectors live in Postgres; the original file lives in Blob.
Derivatives (HTML/DOCX/PDF renders)	`lavern-derivatives/<user>/<sessionId>/`	Soft-delete after 30 days; users can regenerate.
Log archives	`lavern-logs/`	Cool tier after 30d; archive tier after 180d.
Custom agent avatars / share OG images	`lavern-public/`	Public read; the only public container.
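
To make the per-user layout concrete, here is a small sketch of how a blob location might be built from the table above. `blobLocation` and `assertSafeSegment` are hypothetical helpers, and treating the first path element as the Blob container and the rest as the blob name is an assumption about how the layout maps onto Azure Storage.

```typescript
// Containers from the table above that follow the <user>/<scope>/ layout.
type UserScopedContainer =
  | "lavern-uploads"
  | "lavern-deliveries"
  | "lavern-audit"
  | "lavern-kb-originals"
  | "lavern-derivatives";

interface BlobLocation {
  container: UserScopedContainer;
  blobName: string;
}

// Reject anything that could escape the <user>/<scope>/ prefix.
function assertSafeSegment(segment: string): void {
  if (segment.length === 0 || segment === "." || segment === ".." || /[\/\\]/.test(segment)) {
    throw new Error(`Unsafe blob path segment: "${segment}"`);
  }
}

// scopeId is a sessionId or collectionId depending on the container.
function blobLocation(
  container: UserScopedContainer,
  userId: string,
  scopeId: string,
  fileName: string,
): BlobLocation {
  [userId, scopeId, fileName].forEach((segment) => assertSafeSegment(segment));
  return { container, blobName: `${userId}/${scopeId}/${fileName}` };
}
```

Centralising the path construction in one function keeps the user prefix from being assembled ad hoc at each call site, which matters because the prefix is what scopes a user's data.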

Files that currently write to disk and need to move

From a grep over src/ (writeFileSync / mkdirSync):

File	What it writes today	Azure target
src/utils/logger.ts	Rotating log files with gzip	Application Insights (live), Blob `lavern-logs/` (archive).
src/utils/audit-persistence.ts	Audit log JSONL	Blob `lavern-audit/`; query via Log Analytics if we want it searchable.
src/workflows/executor.ts	Intermediate artefacts	Blob `lavern-deliveries/<sessionId>/intermediate/`.
src/mcp/tools/baselines.ts	Baseline JSON	Blob `lavern-baselines/`, keyed by user.
src/mcp/tools/report-card.ts	Final report	Blob `lavern-deliveries/`.
src/mcp/tools/legal-md-compiler.ts	Compiled markdown	Blob `lavern-deliveries/`.
src/providers/mistral-executor.ts	Provider artefacts	Blob `lavern-deliveries/`.
src/providers/local-executor.ts	Local-pipeline artefacts	Blob `lavern-deliveries/`.
src/api/middleware/validation.ts	Captured invalid-request bodies	Application Insights (debug) only — no Blob.

Access pattern

Container Apps uses Managed Identity with a Storage Blob Data Contributor role on the containers it needs. No connection strings, no shared keys.

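Front-end direct upload via user-delegation SAS issued by the API after authorisation — the file never traverses the API process for the upload, only for orchestration. Requesting the user delegation key is a storage-account-level operation, so the role assignment that permits it needs to be scoped at the account (or above), not only on individual containers.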

6. Vector + search — Postgres `pgvector`

Decision (supersedes the original v1 draft of this section). Vectors live in Postgres via pgvector, not Azure AI Search. This keeps KB retrieval in one store with the relational data, removes the AI Search service (and its ~$75/mo floor), is genuine feature-parity with today's lexical KB, and adds semantic recall on top. pgvector is an allow-listed extension on Flexible Server (azure.extensions = VECTOR).

Where the KB lives

The pipeline keeps its shape: src/knowledge-base/indexer.ts parses and section-chunks; src/knowledge-base/retriever.ts runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector column; a new pgvector column adds the semantic half. No separate search service.

`kb_chunks` schema (Postgres)

Column	Type	Notes
`id`	`text` (PK)	`<documentId>_<chunkIndex>`.
`document_id`	`text` (FK → `kb_documents`)
`collection_id`	`text` (indexed)	Filter.
`user_id`	`text` (indexed)	Security-trim every query.
`heading`	`text`	Section heading.
`content`	`text`	Full chunk text.
`content_tsv`	`tsvector` (GIN index)	Lexical half — generated from `heading || ' ' || content`.
`embedding`	`vector(1536)` (HNSW index)	Semantic half — Azure OpenAI `text-embedding-3-small`.
`doc_type`, `jurisdiction`	`text` (indexed)	Filters.
`level`, `word_count`	`int`

Embedding model — Azure OpenAI `text-embedding-3` (decided)

This is the one place Azure OpenAI enters the architecture; agent reasoning stays on Anthropic (§13). text-embedding-3-small (1,536-dim) is the default cost/quality pick; text-embedding-3-large (up to 3,072-dim, reducible) if retrieval quality demands it. pgvector's HNSW index on the vector type is limited to 2,000 dimensions, so the large model would need its output reduced below that (or a halfvec column) to stay indexable. The AOAI endpoint sits behind a Private Endpoint; authenticate via Managed Identity (preferred) or a Key Vault–held key.

Indexing pipeline

Keep our section-aware chunker — the structure detector in src/documents/structure-detector.ts does real work for legal documents and a generic chunker would degrade citations. On blob upload, Event Grid → Service Bus → KEDA-scaled lavern-jobs run:

Parse + section-chunk (existing code).
Call Azure OpenAI embeddings for each chunk (batched).
INSERT chunk rows into Postgres; content_tsv is DB-generated, embedding carries the vector.
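
Steps 2 and 3 need two small mechanical helpers: splitting chunks into batches for the embeddings call, and serialising an embedding for the INSERT. The sketch below is illustrative only. `toBatches` and `toVectorLiteral` are hypothetical names, and the batch size is left to the caller because the right value depends on the AOAI deployment's limits.

```typescript
// Split chunks into fixed-size batches so each embeddings request stays bounded.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// pgvector accepts a text literal such as '[0.1,0.2,0.3]' that can be bound as a
// string parameter and cast with ::vector. expectedDim defaults to the
// vector(1536) column width used in the kb_chunks schema.
function toVectorLiteral(embedding: readonly number[], expectedDim = 1536): string {
  if (embedding.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

Checking the dimension before the INSERT gives a clearer error than the one Postgres would raise if the embedding model or its `dimensions` setting drifted from the column definition.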

Hybrid query

One Postgres query fuses lexical + vector. Lexical rank via ts_rank_cd over content_tsv; semantic via embedding <=> $queryVec (cosine); combined with Reciprocal Rank Fusion:

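WITH lex AS (
  SELECT id,
         rank() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC) AS r
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC LIMIT 40
),
vec AS (
  SELECT id,
         rank() OVER (ORDER BY embedding <=> $3::vector) AS r
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector LIMIT 40
)
-- RRF-fuse lex + vec on id, return top-k ($4); 60 is the conventional RRF constant
SELECT COALESCE(lex.id, vec.id) AS id,
       COALESCE(1.0 / (60 + lex.r), 0.0) + COALESCE(1.0 / (60 + vec.r), 0.0) AS score
FROM lex
FULL OUTER JOIN vec ON lex.id = vec.id
ORDER BY score DESC
LIMIT $4;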

user_id is filtered in every branch — the user-scoping guarantee in retriever.ts is preserved. The existing legal-synonym expansion stays as pre-processing on the lexical side; the n-gram re-rank is superseded by the vector half.
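
For intuition, Reciprocal Rank Fusion itself is only a few lines. The sketch below fuses ranked id lists in application code. It is not how retriever.ts works today, `rrfFuse` is a hypothetical name, and k = 60 is the commonly used default rather than a tuned value.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// with rank starting at 1. Ids are assumed unique within each list.
function rrfFuse(
  rankedLists: readonly (readonly string[])[],
  topK: number,
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    // Highest fused score first; ties broken by id so the order is deterministic.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topK)
    .map(([id]) => id);
}

// Lexical and vector result lists, best match first.
const fused = rrfFuse([["a", "b", "c"], ["b", "d", "a"]], 3);
// fused is ["b", "a", "d"]: "b" and "a" appear in both lists, and "b" ranks
// higher on aggregate (1/62 + 1/61 versus 1/61 + 1/63).
```

Because RRF uses only ranks, the lexical and cosine scores never need to be normalised against each other, which is the main reason to prefer it over a weighted sum of raw scores.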

Scale note

pgvector with an HNSW index serves into the millions of vectors before tuning pressure — well beyond a single-region Lavern KB. If the corpus ever outgrows that, Azure AI Search becomes a re-evaluation, not a v1 requirement.

7. Async messaging — Service Bus + Event Grid

Where Service Bus replaces the in-process EventBus

The current EventBus in src/events/event-bus.ts is a Node.js EventEmitter. Two consumer classes today:

WebSocket fan-out — clients subscribed to /api/sessions/:id/events receive every event for that session.
In-process logger / audit / cost tracker.

The EventBus stays exactly as it is inside a single pod. We add an outbound bridge for events that need to cross pods.

Event type	Crosses pods?	Goes to Service Bus?
`session_start`, `session_end`	Yes (worker → api)	Yes — topic `lavern-sessions`
`agent_start`, `agent_stop`	Maybe (if WS pod ≠ worker pod)	Yes — topic `lavern-agent-activity`
`finding_posted`, `challenge_posted`, `response_posted`	Yes	Yes
`debate_resolved`	Yes	Yes
`gate_requested`, `gate_decided`	Yes — gate decided in api, worker waiting	Yes — queue (point-to-point) `lavern-gates`
`verification_run`	No (worker-local)	No
`tool_used`	No	No (App Insights metric instead)

Topology: Service Bus topic lavern-events with subscriptions per consumer (WebSocket fan-out, audit, cost). Service Bus queue lavern-gates for gate-decision request/reply, because the worker is genuinely blocked on it.
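
The routing table above can be expressed as a lookup that the outbound bridge consults for each event. A minimal sketch: `routeEvent` and the `Destination` labels are hypothetical, and defaulting unknown event types to in-process delivery is an assumption, not a decision recorded here.

```typescript
type Destination = "service-bus-topic" | "service-bus-queue" | "in-process-only";

// Mirrors the "Goes to Service Bus?" column of the table above.
const EVENT_ROUTES = new Map<string, Destination>([
  ["session_start", "service-bus-topic"],
  ["session_end", "service-bus-topic"],
  ["agent_start", "service-bus-topic"],
  ["agent_stop", "service-bus-topic"],
  ["finding_posted", "service-bus-topic"],
  ["challenge_posted", "service-bus-topic"],
  ["response_posted", "service-bus-topic"],
  ["debate_resolved", "service-bus-topic"],
  // Gate decisions are point-to-point: the worker is blocked waiting on them.
  ["gate_requested", "service-bus-queue"],
  ["gate_decided", "service-bus-queue"],
  ["verification_run", "in-process-only"],
  ["tool_used", "in-process-only"],
]);

// Assumption: event types missing from the table stay inside the pod until
// someone explicitly decides they should cross pods.
function routeEvent(eventType: string): Destination {
  return EVENT_ROUTES.get(eventType) ?? "in-process-only";
}
```

Keeping the table as data rather than scattered conditionals means a new event type forces a visible one-line routing decision.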

Where Event Grid fits

Blob upload triggers: user uploads a KB document → Event Grid → Service Bus → lavern-jobs KEDA-scales and runs the indexer.
Same for derivatives: user requests a DOCX render → enqueue → job runs → blob lands → Event Grid → API tells the client via WS.
Stripe webhooks route through Event Grid too if we want guaranteed retry and dead-lettering.
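
On the consuming side, the job has to turn a blob-created notification back into a user and collection. Event Grid storage events carry a subject of the form /blobServices/default/containers/<container>/blobs/<blob path>. The sketch below parses that; `toIngestJob` and the `IngestJob` shape are hypothetical, and the <user>/<collectionId>/ layout is the one proposed in §5.

```typescript
// Only the two Event Grid fields this sketch needs.
interface BlobCreatedEvent {
  eventType: string;
  subject: string;
}

interface IngestJob {
  userId: string;
  collectionId: string;
  blobName: string;
}

const SUBJECT_PATTERN = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/;

// Returns null for anything that is not a new KB original, so the caller can
// complete the message without doing any work.
function toIngestJob(event: BlobCreatedEvent): IngestJob | null {
  if (event.eventType !== "Microsoft.Storage.BlobCreated") return null;
  const match = SUBJECT_PATTERN.exec(event.subject);
  if (match === null) return null;
  const container = match[1];
  const blobName = match[2];
  if (container !== "lavern-kb-originals" || blobName === undefined) return null;
  const [userId, collectionId, ...rest] = blobName.split("/");
  if (!userId || !collectionId || rest.length === 0) return null;
  return { userId, collectionId, blobName };
}
```

Deriving userId from the blob path rather than from a message field means the indexer's user scoping follows where the file actually landed.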

Why both, not just one

Service Bus is the right answer for guaranteed-delivery work scheduling. Event Grid is the right answer for fan-out at high event rates with cheap-per-event pricing. They compose — Event Grid feeds Service Bus for our heavy patterns.

8. Sessions — the hardest design call

Live sessions are an in-process Map today in src/session/session-manager.ts. TTL eviction, 100-session cap per pod. When the engagement completes, state is archived to SQLite.

When we split api / worker / jobs, sessions need a home that isn't "the api pod that happened to handle the first request."

Three options, in increasing strangeness

Option A — Sticky to api pod (no shared state). Affinity cookie binds a client to one api pod for the life of the engagement. The worker that runs orchestration is that same pod. Simplest; we keep SessionState exactly as it is.

Pro: zero code change.
Con: doesn't horizontally scale per session; api revision can't recycle until session expires; loses one core benefit of splitting the worker out.

Option B — Worker owns the session; api is stateless. Each engagement lives in a single lavern-worker replica. The api pod that has the WebSocket subscribes to a Service Bus topic filtered by sessionId. SessionState exists only in the worker.

Pro: clean separation; api is fully stateless and scales freely.
Con: gate-decision path is async — api receives the decision and posts to Service Bus, worker picks it up. We already model this with src/gates/gate-resolver.ts, so the lift is bounded.

Option C — Externalised SessionState (Redis). Move the SessionState object into Azure Cache for Redis. Any pod can pick up any session.

Pro: maximum elasticity.
Con: serialisation cost on every read/write; the SessionState is currently a rich object graph not designed for this.

Recommendation: Option B — worker owns the session, api is stateless. This is the cleanest target architecture and the gate-decision asynchrony is already most of the way there.

Drop the in-process Map's role of "the live store"; replace it with a per-pod working set scoped to that worker's active sessions. Archive on completion goes to Postgres just as today.
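
A per-pod working set along those lines might look like the sketch below. It is an illustration under assumptions, not the session-manager.ts implementation: `WorkerSessionSet` is a hypothetical name, the 100-session cap is carried over from today's per-pod limit, and the 30-minute TTL is a placeholder.

```typescript
interface WorkingSetEntry<S> {
  state: S;
  lastTouchedMs: number;
}

class WorkerSessionSet<S> {
  private readonly entries = new Map<string, WorkingSetEntry<S>>();
  private readonly maxSessions: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so eviction can be tested without waiting.
  constructor(maxSessions = 100, ttlMs = 30 * 60 * 1000, now: () => number = () => Date.now()) {
    this.maxSessions = maxSessions;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns false when this replica is full, so the caller can leave the
  // session for another worker instead of overcommitting.
  tryAdopt(sessionId: string, state: S): boolean {
    this.evictExpired();
    if (!this.entries.has(sessionId) && this.entries.size >= this.maxSessions) {
      return false;
    }
    this.entries.set(sessionId, { state, lastTouchedMs: this.now() });
    return true;
  }

  // Reading a session counts as activity and refreshes its TTL.
  get(sessionId: string): S | undefined {
    const entry = this.entries.get(sessionId);
    if (entry === undefined) return undefined;
    entry.lastTouchedMs = this.now();
    return entry.state;
  }

  // Called after the session has been archived on completion.
  release(sessionId: string): void {
    this.entries.delete(sessionId);
  }

  private evictExpired(): void {
    const cutoff = this.now() - this.ttlMs;
    this.entries.forEach((entry, id) => {
      if (entry.lastTouchedMs < cutoff) this.entries.delete(id);
    });
  }
}
```

Returning false at capacity rather than throwing lets the worker treat a full pod as a normal scheduling outcome.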

9. Identity — Entra ID workforce / B2B

What changes in the codebase

Currently src/api/middleware/auth.ts handles cookie + Bearer auth and runs a no-op LOCAL-MODE. The auth-shaped routes (auth-routes.ts, google-auth.ts) only register when LAVERN_AUTH_ENABLED=true.

For Entra ID:

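Replace the cookie/Bearer middleware with MSAL.js on the frontend (auth-code flow with PKCE) and JWT validation on the backend: verify the signature against the tenant's published signing keys, then check issuer, audience, and expiry. passport-azure-ad is one option, though its maintenance status should be checked before adopting it; @azure/identity acquires tokens for outbound calls rather than validating inbound ones.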
The users table sticks around as the application-side mirror of Entra users — keyed by Entra oid, populated on first sign-in. We still need it because Lavern's user_usage, billing_events, matters, kb_collections all FK to users.id.
Authorisation (what a user can do) stays application-side — Entra tells us who, Lavern decides what. App-roles in Entra optional; we can also keep our own roles table.
Service-to-service authentication uses Managed Identity end-to-end — Container Apps to Postgres, to Blob, to Azure OpenAI (embeddings), to Service Bus, to Key Vault. Zero secrets in any environment variable.

Edge cases

API clients (api_clients table) — these are non-human callers (webhooks, integrations). Keep the existing API-key model; rotate via Key Vault references. Don't try to put non-humans into Entra unless they belong there.
OAuth refresh tokens (user_tokens) — if we still integrate Google Drive / Gmail / Calendar for the user, those stay on the existing OAuth flow; Entra is identity, not third-party authorisation.
Stripe customer mapping — users.stripe_customer_id survives unchanged.
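
The "application-side mirror" of Entra users amounts to an upsert keyed by the token's oid claim. A minimal sketch, with an in-memory Map standing in for the users table: `mirrorUser` and the `UserRow` shape are hypothetical, and the claims are assumed to have been validated before this is called.

```typescript
// Subset of Entra ID token claims. oid is the stable per-tenant object id.
interface EntraClaims {
  oid: string;
  tid: string;
  name?: string;
  preferred_username?: string;
}

interface UserRow {
  id: string;
  entraOid: string;
  entraTenantId: string;
  email: string | null;
  displayName: string | null;
}

// Upsert on sign-in. The internal users.id is kept stable across sign-ins so
// existing foreign keys (user_usage, matters, kb_collections) stay valid.
function mirrorUser(
  usersByOid: Map<string, UserRow>,
  claims: EntraClaims,
  newId: () => string,
): UserRow {
  const existing = usersByOid.get(claims.oid);
  const row: UserRow = {
    id: existing?.id ?? newId(),
    entraOid: claims.oid,
    entraTenantId: claims.tid,
    // preferred_username is usually, but not guaranteed to be, an email address.
    email: claims.preferred_username ?? existing?.email ?? null,
    displayName: claims.name ?? existing?.displayName ?? null,
  };
  usersByOid.set(claims.oid, row);
  return row;
}
```

In Postgres this would be an INSERT with ON CONFLICT on the oid column; the point of the sketch is only that identity comes from Entra while users.id remains Lavern's own key.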

10. Secrets — Key Vault

Today's secret surface from a grep over process.env:

Secret	Source today	Azure target
Anthropic API key	`ANTHROPIC_API_KEY` env	Key Vault, referenced by Container Apps secret.
Mistral API key	`MISTRAL_API_KEY` env	Key Vault.
Stripe secret key, webhook secret	env	Key Vault.
SMTP credentials	env	Key Vault.
Database connection	conn string in env	Managed Identity — no string at all.
Azure OpenAI embeddings key	n/a (new)	Managed Identity to AOAI — or a Key Vault–held key if MI isn't available.
Telegram bot token (Clawern)	env on user's machine	Stays on user's machine; not migrated.
Storage account keys	n/a	Managed Identity.
`LAVERN_MANAGED_AGENTS_BRIDGE_SECRET`	env	Key Vault.

Container Apps references Key Vault secrets by URI. If the reference omits the secret version, rotations propagate to running revisions without a redeploy, though not instantly.

11. Observability — Application Insights + Log Analytics

Today: src/utils/logger.ts writes structured JSON to stdout and a rotating local file. src/utils/sentry.ts captures errors to Sentry if configured.

Azure target:

Application Insights SDK initialised at process start. The current createLogger becomes a thin adapter that emits structured trace events to App Insights.
OpenTelemetry-based instrumentation for Fastify, http client, SDK calls. App Insights connector ingests.
Custom events for product analytics: session_started, workflow_completed, verification_failed — fed off the EventBus.
Log Analytics workspace behind App Insights for KQL queries and dashboards.
Sentry stays as the error capture channel if we want richer crash grouping than App Insights provides; otherwise drop it.

Cost note: App Insights ingestion is the most variable cost in this whole architecture. Set ingestion sampling at 50% for verbose levels, 100% for warnings/errors. Disable in dev.
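
The sampling policy in the cost note reduces to a per-level rate. The sketch below shows that policy as an application-side decision. It is illustrative only: App Insights and OpenTelemetry have their own sampling configuration, `shouldIngest` is a hypothetical name, and treating "verbose" as debug plus info is an assumption.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// 50% for verbose levels, 100% for warnings and errors.
const SAMPLE_RATE: Record<LogLevel, number> = {
  debug: 0.5,
  info: 0.5,
  warn: 1,
  error: 1,
};

// roll is injectable for testing. Math.random() returns a value in [0, 1),
// so a rate of 1 always passes.
function shouldIngest(level: LogLevel, roll: number = Math.random()): boolean {
  return roll < SAMPLE_RATE[level];
}
```

If all events from one session should be kept or dropped together, the roll would need to be derived from the session id rather than drawn per event.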

12. Networking — Front Door + Private Endpoints

Azure Front Door Premium in front of Container Apps for TLS termination, WAF, global anycast, and managed certs on the custom domain.
Private Endpoints for Postgres, Azure OpenAI, Blob, Key Vault. None of these are reachable from the public internet.
VNET integration on Container Apps — outbound goes through the VNET; inbound is via the Container Apps ingress, which Front Door fronts.
WebSocket through Front Door works without modification.
Egress allow-list: api.anthropic.com, api.mistral.ai, api.stripe.com, accounts.google.com if we keep Google OAuth. Everything else blocked at the firewall.

13. LLM provider plane

Stays on Anthropic direct via @anthropic-ai/sdk. The dual-provider abstraction in src/providers/ keeps Mistral as the EU-sovereign option.

Azure OpenAI — embeddings only. AOAI is not added to the agent provider set; the model behind AOAI is OpenAI's, not Anthropic's, and swapping Claude for GPT would undercut the product. The single use of AOAI is generating KB chunk embeddings for pgvector (§6) — a non-reasoning, mechanical call. Agent inference stays on Anthropic; Mistral remains the EU-sovereign agent option. If a future customer requires AOAI for inference specifically, the provider abstraction is already in place to add it.

14. CI/CD & infrastructure-as-code

Bicep (or Terraform if there's a house preference) for everything: Container Apps environment, registry, Postgres (with pgvector enabled via azure.extensions), Azure OpenAI (embeddings deployment), Blob, Service Bus, Key Vault, Front Door, App Insights, Log Analytics, networking.
GitHub Actions building the container, pushing to Azure Container Registry, deploying via az containerapp update --revision-suffix <sha> for blue-green.
Per-environment: dev (single small SKU of everything), staging (mirror prod), prod.
Database migrations: a separate lavern-migrate job that runs Postgres migrations via drizzle-kit or node-pg-migrate before the api revision swaps.
CI credentials: GitHub Actions OIDC federation to Azure — no service principal passwords in repo.

15. Cost shape (rough OOM)

Per month, single region, modest load (say, ~50 active firms, ~5,000 documents indexed):

Service	SKU	Est. monthly
Container Apps (api+worker+jobs)	2 always-on small + bursts	$200–400
Postgres Flexible Server (+ `pgvector`)	D2ds_v5 (2vCPU/8GB) — bumped for vector index	$150
Azure OpenAI embeddings	`text-embedding-3-small`, usage-based	$5–20
Blob Storage	Hot tier ~500GB + ops	$30
Service Bus	Standard tier	$10
Front Door Premium	Base + per-request	$330 + traffic
Key Vault	Standard	<$5
App Insights	5GB/day cap	$80
Total		~$810–1,025

Keeping vectors in pgvector removes the Azure AI Search line (was ~$250/mo S1) at the cost of a modest Postgres SKU bump for the HNSW index — a net saving of roughly $200–230/mo. Agent LLM spend lives separately (passes through to Anthropic / Mistral usage costs); embeddings are the only Azure-side model cost and are tiny.

Cost levers:

Drop Front Door to Standard if WAF is overkill (-$200).
Move audit + log archive to cool/archive tiers (-$10–20).
Right-size Postgres up only when we see actual pressure; the vector index is the main memory driver.

16. Migration plan — four phases

Phase 0 — preparation (1–2 weeks)

Add an explicit persistence interface to src/db/ so the engine doesn't talk to better-sqlite3 directly. SQLite and Postgres adapters behind the same shape.
Add a blob abstraction for file IO: an interface that today wraps the local FS, tomorrow wraps @azure/storage-blob.
Add a search abstraction wrapping the FTS5 layer; tomorrow wraps the Postgres tsvector + pgvector hybrid query.
These three abstractions land in the existing repo, behind feature flags. No Azure infra needed yet.
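
As an example of the shape these abstractions could take, here is a sketch of the blob interface with a flag-selected backend. Every name in it is hypothetical, and an in-memory store stands in for the local-FS adapter so the example has no file-system dependency. The real adapters would wrap node:fs and @azure/storage-blob.

```typescript
// The engine depends on this interface only, never on a concrete backend.
interface BlobStore {
  put(key: string, data: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | undefined>;
}

// Stand-in for the local adapter.
class InMemoryBlobStore implements BlobStore {
  private readonly objects = new Map<string, Uint8Array>();

  async put(key: string, data: Uint8Array): Promise<void> {
    // Copy so later mutation of the caller's buffer cannot change stored data.
    this.objects.set(key, data.slice());
  }

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.objects.get(key);
  }
}

type BlobBackend = "local" | "azure";

// The feature flag picks a factory. Backends are constructed lazily, so an
// unconfigured Azure adapter costs nothing while the flag says "local".
function createBlobStore(
  backend: BlobBackend,
  factories: Record<BlobBackend, () => BlobStore>,
): BlobStore {
  return factories[backend]();
}
```

Making the interface async from the start, even for the local backend, means Phase 1 can swap in the Azure adapter without another pass over call sites.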

Phase 1 — Azure infra up, dual-write (2–3 weeks)

Bicep stack deployed to a dev subscription.
Container Apps running the api image; Postgres (with pgvector) live but with sample data only.
Migration scripts to backfill from SQLite to Postgres in dev, including a one-time pass to generate embeddings for existing KB chunks via Azure OpenAI.
Dual-write enabled for new sessions in a canary tenant.
Validate WebSocket through Front Door, gate-decision through Service Bus, KB query through the Postgres hybrid search.

Phase 2 — cutover (1 week)

Production data migration window.
DNS swing to Front Door.
Sentry alerting, App Insights dashboards live.
SQLite kept read-only for 30 days as rollback insurance.

Phase 3 — decommission (1 week, 30+ days later)

Remove SQLite path from the code.
Retire local file write paths (audit, logger, baselines, deliveries) in favour of Blob.
Lock SQLite snapshot to an archive.

17. Open questions / decisions still to make

Multi-tenancy model. Shared deployment (one subscription, one resource group, userId everywhere) or per-firm subscription? The shared model is much simpler and is the default unless a firm specifically asks for isolation.
Data residency. Do we need to pin some firms to West Europe vs UK South vs East US? If yes, we need a per-tenant region attribute in the user record and routing logic in Front Door (Routes by header / cookie).
Backup / DR. Postgres point-in-time restore on Flexible Server is built-in (7–35 days) and now covers the KB vectors too (one store). Do we want geo-redundant storage for Blob and a read replica for the DB? Probably yes for production.
Audit-bundle retention. Some firms will want 7y retention with legal-hold immutability. Blob immutability policies cover this if we decide it's a hard product requirement.
Web PubSub now or later. Sticky routing buys us months. Past a few thousand concurrent WebSocket sessions, the math flips.
Cowork mode interop. The browser File System Access API path (viz/src/cowork/) keeps working unchanged — local files never touch the server. Still worth confirming the privacy promise survives the cloud-hosted positioning.
Clawern → cloud KB. Should Clawern's precedent-board.ts optionally sync to the cloud KB for shared firm precedents, or stay strictly local? This is a product decision, not an architecture one.

Appendix A — file-by-file change summary

File	Change
src/db/database.ts	Becomes the SQLite adapter; add Postgres adapter behind interface.
src/knowledge-base/indexer.ts	Adds an Azure OpenAI embedding call per chunk; writes `content_tsv` + `embedding` to the Postgres `kb_chunks` table.
src/knowledge-base/retriever.ts	FTS5 `MATCH` becomes a Postgres `tsvector` + `pgvector` hybrid (RRF) query; synonym expansion and user-scoping retained.
src/api/middleware/auth.ts	Replaced by Entra ID JWT validation.
src/api/server.ts	No change to handler shape; deploy target changes.
src/session/session-manager.ts	Moves to `lavern-worker` revision; api becomes stateless.
src/events/event-bus.ts	In-process EventEmitter kept; outbound bridge to Service Bus added.
src/gates/	Async resolver becomes default; sync resolver retained for CLI.
src/utils/logger.ts	Becomes App Insights adapter.
src/utils/audit-persistence.ts	Writes to Blob `lavern-audit/`.
src/workflows/executor.ts	Intermediate artefacts to Blob; control flow unchanged.
src/api/routes/knowledge-base.ts	Upload returns a SAS URL; chunking + embedding + Postgres insert happen via Event Grid → job.
src/agents/	No change — prompts stay code-resident.
src/workflows/templates/	No change — templates stay code-resident.
SOUL.md	No change — default firm soul stays code-resident; per-user override stays in DB.
src/claw/	No change — Clawern is out of scope; runs locally as today.
viz/src/cowork/	No change — still uses File System Access API; local files never touch cloud.

Appendix B — services not chosen, and why

Considered	Why not
Azure SQL	Schema rewrite for T-SQL; Postgres is closer to SQLite ergonomically, has `jsonb`, and its `ON CONFLICT … excluded.*` upserts (already used in src/db/database.ts) port verbatim. Also rules itself out by lacking `pgvector`.
Azure AI Search	Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in `pgvector` — one store, feature-parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors.
Cosmos DB for KB chunks	Chunks + vectors live in Postgres; Cosmos would add a second store.
AKS	Operationally heavy; Container Apps covers our needs.
Azure Functions	Cold start hurts WebSocket and orchestration loops. Container Apps with KEDA gives us the autoscale without the cold-start tax.
App Service for Containers	WebSocket sticky-only; less elegant scale.
Azure OpenAI (for agent inference)	Gives OpenAI's models, not Anthropic's — swapping Claude for GPT undercuts the product. AOAI is used only for KB embeddings (§6), never agent reasoning.
API Management	Overkill for our REST surface. Front Door + Container Apps ingress suffices. Revisit if we expose a public partner API.
Azure Communication Services / SignalR	Web PubSub is the modern replacement; SignalR remains for legacy .NET SignalR-hub interop, which we don't need.
Azure Cache for Redis in v1	Worker-owned sessions avoid the need; revisit if we move to fully stateless workers.