Lavern → Azure

How to move Lavern's dashboard, engagement engine, knowledge base, sessions, matters, auth, and audit data onto Azure. Clawern stays on the client — see Clawern for why that's a product call, not a limitation. Every code reference below links to the file in Altien/lavernDev on GitHub.

Constraints fixed up-front

These are inputs, not findings — set during research planning so the design has somewhere firm to stand.

| Decision | Value | Rationale |
| --- | --- | --- |
| Vectors | Postgres pgvector owns the vector index (in-DB) | Keeps KB retrieval in the same store as the relational data; no separate search service to operate or pay for. Hybrid (tsvector full-text ranking + vector) in one database. |
| Compute | Azure Container Apps | Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings. |
| Identity | Entra ID (workforce / B2B) | Org-tenant aware. Conditional access, MFA from the tenant, no separate user store to operate. |
| Database | PostgreSQL Flexible Server (with pgvector) | Schema-compatible target for the SQLite tables in src/db/database.ts; mature; jsonb; hosts the KB vector index in-DB via the allow-listed pgvector extension. |
| Async messaging | Service Bus + Event Grid | Service Bus for guaranteed-delivery work and gate-decision handoffs. Event Grid for blob-arrival fan-out. |
| Scope | Dashboard + KB; Clawern stays local | Clawern's whole identity is on-device: daemon, menu bar, Telegram, filesystem watch. Lifting it loses the privacy claim. |

Target architecture in one paragraph

Public traffic arrives at Azure Front Door (TLS, WAF, global anycast), which fronts a Container Apps environment running three container apps: lavern-api (Fastify + WebSocket, today's src/api/server.ts), lavern-worker (orchestration loop), and lavern-jobs (KEDA-scaled short tasks: KB ingest, derivative rendering). All three sit on a private VNET with Private Endpoints to PostgreSQL Flexible Server (KB vectors via pgvector), Azure OpenAI (embeddings only), Blob Storage, Service Bus, Key Vault, and Application Insights. Identity is Entra ID for users and Managed Identity end-to-end for everything else, so no service-to-service password lives in an environment variable.

Compute — Azure Container Apps

Container Apps wins over App Service for Containers (sticky-only WebSocket, no native KEDA) and AKS (overkill for a small fleet). The one thing Container Apps doesn't give us is fine-grained sidecar control; we don't need any.

How the current process splits

Today, a single Fastify process serves HTTP + WebSocket, holds live SessionState in a Map, runs the orchestrator (src/orchestrator.ts, src/dispatch.ts), dispatches agents through the Claude Agent SDK, and emits events on the in-process EventBus. For Azure, we split it into three container apps:

| App | Replicas | Scales on | Holds |
| --- | --- | --- | --- |
| lavern-api | min 2 | HTTP / WS concurrency | REST routes, WebSocket fan-out, session router |
| lavern-worker | min 1 | Service Bus queue depth | Active orchestration loops, live SessionState |
| lavern-jobs | 0 .. N | KEDA (queue length) | KB ingest, derivative render, batch workflows |

WebSocket

@fastify/websocket works in Container Apps without modification. Two options for cross-pod fan-out: sticky routing (Front Door affinity cookie, simple) or Azure Web PubSub (browsers connect to PubSub directly, API publishes events; stateless pods, clean rollouts). Recommendation: start with sticky, migrate to Web PubSub when concurrent connections start fighting the rollout story.
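Both fan-out options can sit behind one small interface, so the later move to Web PubSub touches a single module. The following is a minimal sketch under that assumption; FanOut and InProcessFanOut are hypothetical names, not code from the repo. The in-process variant is all sticky routing needs, because every socket for a session lands on the same pod.

```typescript
// Hypothetical names: FanOut and InProcessFanOut are illustrative only.
type Listener = (event: unknown) => void;

interface FanOut {
  subscribe(sessionId: string, listener: Listener): () => void;
  publish(sessionId: string, event: unknown): number;
}

// Sticky-routing variant: all sockets for a session live on this pod, so an
// in-memory map is enough. A Web PubSub variant would implement the same
// interface and publish to the service instead.
class InProcessFanOut implements FanOut {
  private readonly listeners = new Map<string, Set<Listener>>();

  subscribe(sessionId: string, listener: Listener): () => void {
    const set = this.listeners.get(sessionId) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(sessionId, set);
    // Unsubscribe handle, intended for the socket's close handler.
    return () => {
      set.delete(listener);
      // Only drop the map entry if it still points at this set.
      if (set.size === 0 && this.listeners.get(sessionId) === set) {
        this.listeners.delete(sessionId);
      }
    };
  }

  // Returns how many listeners received the event.
  publish(sessionId: string, event: unknown): number {
    const set = this.listeners.get(sessionId);
    if (!set) return 0;
    set.forEach((listener) => listener(event));
    return set.size;
  }
}
```

Keeping the unsubscribe handle next to the socket means a pod that is draining during a rollout can release listeners as connections close.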

Relational data — Postgres Flexible Server

Every table in src/db/database.ts ports straight across with three categories of edit.

| SQLite table | Postgres treatment |
| --- | --- |
| users | Direct; add citext on email; oid column to mirror Entra subject. |
| auth_tokens | Mostly redundant under Entra. Keep only for service-to-service tokens. |
| session_archive | Direct port. summary_json becomes jsonb + GIN index. |
| matters | Direct port. data_json becomes jsonb. |
| kb_collections, kb_documents | Direct port; kb_documents stays the document-of-record. |
| kb_chunks | Kept. Add a content_tsv tsvector column (lexical/FTS) and an embedding vector(1536) column (pgvector). Chunks + vectors live in Postgres. |
| shared_agents, shared_teams | Direct port; JSON payloads → jsonb. |
| user_usage, billing_events, billable_hours, daily_spend | Direct port; money columns → numeric(12,4). |
| audit_log | Direct port; partition by month at scale. |

SQLite-isms to watch

Mechanics: pgloader for the bulk move, then a TypeScript script for JSON-column transforms. Connection pooling via PgBouncer (Flexible Server's built-in endpoint). Auth to Postgres via Managed Identity from Container Apps — no password in any connection string.
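The JSON-column pass after pgloader could look roughly like the sketch below. It assumes the SQLite JSON columns are TEXT and that money values arrive as JavaScript numbers; toJsonb and toNumeric4 are hypothetical helper names, not functions from the repo.

```typescript
// Hypothetical helpers for the post-pgloader transform pass.

// SQLite stores JSON as TEXT. Parse it so the driver sends a real jsonb value,
// and surface bad rows instead of silently writing a JSON string scalar.
type JsonResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

function toJsonb(raw: string | null): JsonResult {
  if (raw === null || raw.trim() === "") return { ok: true, value: null };
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Money columns become numeric(12,4). Format as a fixed four-decimal string so
// the value is not re-interpreted as a float on the way in.
function toNumeric4(amount: number): string {
  if (!isFinite(amount)) throw new RangeError(`not a finite amount: ${amount}`);
  return amount.toFixed(4);
}
```

Returning a result object rather than throwing lets the script log every malformed row in one run and leave the decision about repair or rejection to a human.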

Blob Storage — the prompts/workflows question, answered

The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that, with a defensible split.

Stays code-resident — versioned in git, shipped with the image

Why not Blob?

  1. Change control. Modifying an agent prompt today requires a PR. Move them to Blob and anyone with write access can change agent behaviour silently, in production, with no trace in source control.
  2. Atomic deploys. Code + prompts ship together. If a prompt change breaks a workflow, you roll back one artifact, not two.
  3. Local development. Devs can run the engine on their machine without provisioning Blob or syncing seed data.
  4. Tests. 1,677 tests in tests/ read prompts from disk. Blob-resident prompts mean mocking Blob in every test.

Goes to Blob — user-generated, runtime-mutable

| Category | Container | Lifecycle |
| --- | --- | --- |
| Uploaded source documents | lavern-uploads | Soft-delete after archive; hard-delete after retention window. |
| Final deliverables | lavern-deliveries | Per-matter retention. |
| Audit bundles | lavern-audit | Per data class. Regulated → cold tier, 7y, immutability policy. |
| KB document originals | lavern-kb-originals | Until KB doc is deleted. Chunks + vectors live in Postgres; original stays in Blob. |
| Derivatives (HTML/DOCX/PDF) | lavern-derivatives | Soft-delete after 30 days; users regenerate on demand. |
| Log archives | lavern-logs | Cool tier after 30d; archive tier after 180d. |
| Avatars, share OG images | lavern-public | Public read. The only public container. |
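In practice these rules would most likely be expressed as a storage lifecycle management policy rather than application code. The sketch below only encodes the two numeric thresholds from the table so they can be asserted in tests; logTier and derivativeExpired are hypothetical names.

```typescript
// Illustrative only: thresholds come from the table; names are hypothetical.
type Tier = "hot" | "cool" | "archive";

// lavern-logs: cool after 30 days, archive after 180 days. "After" is read as
// strictly greater than, matching the daysAfterModificationGreaterThan
// condition used by Azure lifecycle rules.
function logTier(ageDays: number): Tier {
  if (ageDays > 180) return "archive";
  if (ageDays > 30) return "cool";
  return "hot";
}

// lavern-derivatives: eligible for soft-delete once older than 30 days.
function derivativeExpired(createdAt: Date, now: Date): boolean {
  const msPerDay = 24 * 60 * 60 * 1000;
  return (now.getTime() - createdAt.getTime()) / msPerDay > 30;
}
```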

What writes to disk today, and where it ends up

| File | Today | Azure |
| --- | --- | --- |
| src/utils/logger.ts | Rotating log files | App Insights (live); Blob lavern-logs/ (archive) |
| src/utils/audit-persistence.ts | JSONL audit log | Blob lavern-audit/ |
| src/workflows/executor.ts | Intermediate artefacts | Blob lavern-deliveries/<sessionId>/intermediate/ |
| src/mcp/tools/baselines.ts | Baseline JSON | Blob lavern-baselines/ |
| src/mcp/tools/report-card.ts | Final report | Blob lavern-deliveries/ |
| src/mcp/tools/legal-md-compiler.ts | Compiled markdown | Blob lavern-deliveries/ |
| src/providers/mistral-executor.ts | Provider artefacts | Blob lavern-deliveries/ |

Front-end uploads use a user-delegation SAS issued by the API after authorisation — the file doesn't traverse the API process for the upload itself, only for orchestration.
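Before the API asks Storage for a SAS it has to decide where the blob goes and how long the grant lasts. A minimal sketch of that planning step follows. planUpload, the userId/uploadId path layout, and the 15-minute window are all assumptions for illustration; the resulting fields are the kind of inputs that generateBlobSASQueryParameters in @azure/storage-blob takes alongside a user delegation key.

```typescript
// Hypothetical sketch: planUpload and its field names are illustrative.
interface UploadPlan {
  container: "lavern-uploads" | "lavern-kb-originals";
  blobName: string;
  permissions: string; // SAS permission letters: c = create, w = write
  startsOn: Date;
  expiresOn: Date;
}

function planUpload(
  kind: "source" | "kb",
  userId: string,
  uploadId: string,
  fileName: string,
  now: Date,
  ttlMinutes = 15, // assumption: a short window is enough for a browser upload
): UploadPlan {
  // Keep only the final path segment and a conservative character set so a
  // crafted file name cannot escape the user's prefix.
  const base = fileName.split(/[\\/]/).pop() ?? "";
  const safe = base.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "") || "upload";
  return {
    container: kind === "kb" ? "lavern-kb-originals" : "lavern-uploads",
    blobName: `${userId}/${uploadId}/${safe}`,
    permissions: "cw",
    startsOn: new Date(now.getTime() - 60000), // tolerate small clock skew
    expiresOn: new Date(now.getTime() + ttlMinutes * 60000),
  };
}
```

Scoping the SAS to a single blob name with create and write only means a leaked URL cannot read or list anything else in the container.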

Vector + search — Postgres pgvector

Decision (supersedes the original v1 draft of this section): vectors live in Postgres via pgvector, not Azure AI Search. This keeps KB retrieval in one store with the relational data, removes the AI Search service (and its ~$75/mo floor), retains feature parity with today's lexical KB, and adds semantic recall on top. pgvector is an allow-listed extension on Flexible Server: add VECTOR to the azure.extensions server parameter, then run CREATE EXTENSION vector in the database.

The pipeline keeps its shape: src/knowledge-base/indexer.ts parses and section-chunks; src/knowledge-base/retriever.ts runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector column; a new pgvector column adds the semantic half. No separate search service.

kb_chunks schema (Postgres)

| Column | Type | Notes |
| --- | --- | --- |
| id | text (PK) | <documentId>_<chunkIndex> |
| document_id | text (FK → kb_documents) | |
| collection_id, user_id | text (indexed) | Security-trim every query on user_id |
| heading, content | text | Section heading from structure detector + chunk text |
| content_tsv | tsvector (GIN index) | Lexical half, generated from heading and content joined by a space |
| embedding | vector(1536) (HNSW index) | Semantic half: Azure OpenAI text-embedding-3-small |
| doc_type, jurisdiction | text (indexed) | From kb_documents |
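As a migration constant, the schema above might be written roughly as follows. This is a sketch, not the project's migration: the 'english' text-search configuration, kb_documents(id) as the referenced key, the NOT NULL choices, and the index names are all assumptions. The cosine operator class is chosen to match the <=> operator used at query time.

```typescript
// Sketch only. Assumed: 'english' config, kb_documents(id), NOT NULL choices,
// and index names. None of these are confirmed by the source.
const KB_CHUNKS_DDL = `
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
  id            text PRIMARY KEY,
  document_id   text NOT NULL REFERENCES kb_documents (id),
  collection_id text NOT NULL,
  user_id       text NOT NULL,
  heading       text,
  content       text NOT NULL,
  content_tsv   tsvector GENERATED ALWAYS AS (
                  to_tsvector('english', coalesce(heading, '') || ' ' || content)
                ) STORED,
  embedding     vector(1536),
  doc_type      text,
  jurisdiction  text
);

CREATE INDEX kb_chunks_tsv_idx       ON kb_chunks USING gin (content_tsv);
CREATE INDEX kb_chunks_embedding_idx ON kb_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX kb_chunks_scope_idx     ON kb_chunks (user_id, collection_id);
CREATE INDEX kb_chunks_facet_idx     ON kb_chunks (doc_type, jurisdiction);
`;

// Chunk ids follow the <documentId>_<chunkIndex> convention from the table.
function chunkId(documentId: string, chunkIndex: number): string {
  if (chunkIndex < 0 || Math.floor(chunkIndex) !== chunkIndex) {
    throw new RangeError(`chunkIndex must be a non-negative integer: ${chunkIndex}`);
  }
  return `${documentId}_${chunkIndex}`;
}
```

The coalesce around heading matters because a NULL heading would otherwise make the whole concatenation NULL and leave the chunk invisible to lexical search.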

Embedding model — Azure OpenAI text-embedding-3 (decided)

This is the one place Azure OpenAI enters the architecture; agent reasoning stays on Anthropic. text-embedding-3-small (1,536-dim) is the default cost/quality pick; text-embedding-3-large (up to 3,072-dim, reducible) if retrieval quality demands it. The AOAI endpoint sits behind a Private Endpoint; authenticate via Managed Identity (preferred) or a Key Vault–held key.
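Two small helpers tend to sit between the embeddings call and the insert: one to batch chunk texts per request, one to turn the returned array into the text form pgvector accepts. The sketch below assumes the default 1,536-dimension model; toVectorLiteral and batches are hypothetical names.

```typescript
const EMBEDDING_DIM = 1536; // text-embedding-3-small

// pgvector accepts a text literal of the form '[0.1,0.2,...]', which can be
// bound as a query parameter and cast with ::vector.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new RangeError(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  for (const x of embedding) {
    if (!isFinite(x)) throw new RangeError("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Split chunk texts into fixed-size batches so one embeddings request can carry
// several inputs. The batch size is a tuning choice, not a documented limit.
function batches<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) throw new RangeError("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

Checking the dimension before the insert turns a model or deployment mix-up into an early, readable error rather than a failed write deep in the job.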

Indexing pipeline

Push from a worker. Event Grid fires on Blob upload → Service Bus → lavern-jobs KEDA-scales → our parser + section chunker (existing code) runs → Azure OpenAI embeds each chunk → rows are inserted into Postgres (content_tsv DB-generated, embedding carrying the vector). We keep our legal-aware chunking, which materially affects citation quality.
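The first step of the ingest job is turning the blob-created event into a unit of work. A minimal sketch follows, assuming the event arrives in the Event Grid schema with the usual storage subject format; IngestJob and toIngestJob are hypothetical, and only the fields used here are declared.

```typescript
// Minimal shape of an Event Grid blob-created event; only fields used here.
interface BlobCreatedEvent {
  eventType: string;
  subject: string; // "/blobServices/default/containers/<container>/blobs/<path>"
  data: { url: string };
}

// Hypothetical job description handed to the parser + chunker.
interface IngestJob {
  container: string;
  blobPath: string;
  url: string;
}

function toIngestJob(event: BlobCreatedEvent): IngestJob | null {
  if (event.eventType !== "Microsoft.Storage.BlobCreated") return null;
  const match = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/.exec(event.subject);
  if (!match) return null;
  const container = match[1];
  const blobPath = match[2];
  // Only KB originals trigger indexing in this sketch; other containers are ignored.
  if (container !== "lavern-kb-originals") return null;
  return { container, blobPath, url: event.data.url };
}
```

Returning null for anything unexpected lets the job complete the message without work, instead of retrying an event it will never be able to process.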

Hybrid query in one statement

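WITH lex AS (
  SELECT id,
         row_number() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC) AS r
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY r LIMIT 40
),
vec AS (
  SELECT id,
         row_number() OVER (ORDER BY embedding <=> $3::vector) AS r
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector LIMIT 40
)
-- RRF-fuse lex + vec on id: each branch contributes 1 / (k + rank).
-- k = 60 is a common default, not a tuned value; $4 is the top-k to return.
SELECT id,
       COALESCE(1.0 / (60 + lex.r), 0) + COALESCE(1.0 / (60 + vec.r), 0) AS score
FROM lex
FULL OUTER JOIN vec USING (id)
ORDER BY score DESC
LIMIT $4;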

user_id is filtered in every branch — the user-scoping guarantee in retriever.ts is preserved. Legal-synonym expansion stays as lexical pre-processing; the n-gram re-rank is superseded by the vector half. pgvector with an HNSW index serves into the millions of vectors before tuning pressure — if the corpus ever outgrows that, Azure AI Search becomes a re-evaluation, not a v1 requirement.
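The RRF fusion step named in the query is easy to check in isolation. The sketch below implements reciprocal rank fusion over two ranked id lists in TypeScript; rrfFuse is a hypothetical name and k = 60 is the commonly used constant, treated here as a tunable assumption.

```typescript
// Reciprocal rank fusion over two ranked id lists (best match first).
// Each list contributes 1 / (k + rank) for every id it contains.
function rrfFuse(
  lex: string[],
  vec: string[],
  topK: number,
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  const add = (ids: string[]): void => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  };
  add(lex);
  add(vec);

  const fused: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Highest score first; ties broken by id so the output is deterministic.
  fused.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return fused.slice(0, topK);
}
```

Because the fusion uses ranks rather than raw scores, the lexical and vector branches never need their scores normalised against each other; a chunk found by both branches simply accumulates two contributions.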

Async messaging — Service Bus + Event Grid

The in-process EventBus stays exactly as it is inside a single pod. We add an outbound bridge for events that need to cross pods. Most don't — verbose per-step events (tool_used) stay in-pod and surface as App Insights metrics. Cross-pod handoffs go through Service Bus.

| Event class | Channel | Why |
| --- | --- | --- |
| session lifecycle, findings, debate | Service Bus topic lavern-events | WS fan-out + audit consume separately |
| gate request/response | Service Bus queue lavern-gates | Worker blocked; needs guaranteed delivery |
| blob upload (KB doc, source doc) | Event Grid → Service Bus → KEDA job | Triggers ingest / derivative jobs |
| tool_used, verification_run | App Insights metric only | Worker-local, high-volume, observability concern |
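The outbound bridge mostly reduces to a routing decision per event type. A sketch of that decision follows. Only tool_used and verification_run are event names taken from the table; the others (session_started, gate_request, and so on) are hypothetical stand-ins for the event classes listed. Blob uploads are absent because they originate from Event Grid, not the in-process EventBus.

```typescript
type Channel =
  | { kind: "topic"; name: "lavern-events" }
  | { kind: "queue"; name: "lavern-gates" }
  | { kind: "metric" }; // stays in-pod, surfaces as an App Insights metric

// Event names other than tool_used and verification_run are hypothetical.
function channelFor(eventType: string): Channel {
  switch (eventType) {
    case "gate_request":
    case "gate_response":
      return { kind: "queue", name: "lavern-gates" };
    case "session_started":
    case "session_completed":
    case "finding_added":
    case "debate_turn":
      return { kind: "topic", name: "lavern-events" };
    case "tool_used":
    case "verification_run":
    default:
      // Anything not explicitly bridged stays worker-local.
      return { kind: "metric" };
  }
}
```

Defaulting unknown events to the in-pod path keeps a newly added verbose event from flooding Service Bus until someone decides it needs to cross pods.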

Sessions — the hardest design call

Live sessions live in an in-process Map today (session-manager.ts: TTL eviction, 100-session cap, hydrated on demand). When we split api / worker / jobs, sessions need a home that isn't "the api pod that happened to handle the first request."

| Option | Pro | Con |
| --- | --- | --- |
| A. Sticky to api pod | Zero code change | No per-session horizontal scale; api can't recycle until session expires |
| B. Worker owns the session; api stateless (recommended) | Clean target architecture, api scales freely | Gate-decision becomes async (we mostly model this already via src/gates/) |
| C. Externalise to Redis | Maximum elasticity | Serialisation cost; SessionState wasn't designed for it |

Recommendation: option B. Worker holds live state; api routes WebSocket subscriptions to Service Bus topics filtered by sessionId. Archive on completion goes to Postgres, same as today.
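Under option B the worker blocks on a gate while the decision travels back over Service Bus, which delivers at least once in peek-lock mode, so duplicate decisions have to be harmless. The sketch below shows one way to keep resolution idempotent; GateRegistry is a hypothetical class, and the real resolver in src/gates/ may look quite different.

```typescript
// Hypothetical sketch of the worker side of an async gate.
type Decision = "approve" | "reject";

class GateRegistry {
  private readonly pending = new Map<string, (decision: Decision) => void>();

  // Called by the orchestration loop when it blocks on a gate.
  open(gateId: string, onDecision: (decision: Decision) => void): void {
    if (this.pending.has(gateId)) throw new Error(`gate already open: ${gateId}`);
    this.pending.set(gateId, onDecision);
  }

  // Called for each decision message received from the lavern-gates queue.
  // Returns false for unknown, duplicate, or late decisions, which are no-ops.
  resolve(gateId: string, decision: Decision): boolean {
    const onDecision = this.pending.get(gateId);
    if (!onDecision) return false;
    // Remove before invoking so a re-entrant duplicate cannot fire twice.
    this.pending.delete(gateId);
    onDecision(decision);
    return true;
  }
}
```

The boolean return gives the message handler a cheap signal for logging duplicates without treating them as errors.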

Identity — Entra ID workforce / B2B

The current auth.ts middleware handles cookie + Bearer and runs a no-op LOCAL-MODE. The auth-shaped routes in auth-routes.ts and google-auth.ts only register when LAVERN_AUTH_ENABLED=true.
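Once the middleware validates Entra ID tokens instead, the part specific to Lavern is small: map verified claims to a user. The sketch below covers only that step and assumes signature, issuer, and expiry checks have already passed in a JWT library. principalFromClaims and Principal are hypothetical names; aud, tid, oid, and preferred_username are standard Entra token claims.

```typescript
// Sketch only: runs after a JWT library has verified signature, issuer, expiry.
interface EntraClaims {
  aud?: string;
  tid?: string; // tenant id
  oid?: string; // stable object id of the user within the tenant
  preferred_username?: string;
}

interface Principal {
  oid: string; // intended to match the oid column on users
  tenantId: string;
  email: string | null;
}

function principalFromClaims(
  claims: EntraClaims,
  expectedAudience: string,
  allowedTenants: string[],
): Principal | null {
  if (claims.aud !== expectedAudience) return null;
  if (!claims.tid || allowedTenants.indexOf(claims.tid) === -1) return null;
  if (!claims.oid) return null;
  return {
    oid: claims.oid,
    tenantId: claims.tid,
    // preferred_username is usually, but not always, an email address.
    email: claims.preferred_username ? claims.preferred_username.toLowerCase() : null,
  };
}
```

Keying users on oid rather than email avoids breaking accounts when a firm renames a mailbox.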

Secrets — Key Vault

| Secret | Today | Azure |
| --- | --- | --- |
| Anthropic API key | env | Key Vault → Container Apps secret reference |
| Mistral API key | env | Key Vault |
| Stripe secret + webhook secret | env | Key Vault |
| SMTP credentials | env | Key Vault |
| Database connection | env | Managed Identity (no string at all) |
| Storage account keys | n/a | Managed Identity |
| Azure OpenAI embeddings key | n/a (new) | Managed Identity (or Key Vault key if MI unavailable) |
| LAVERN_MANAGED_AGENTS_BRIDGE_SECRET | env | Key Vault |

Observability — App Insights + Log Analytics

src/utils/logger.ts becomes a thin App Insights adapter — its existing shape stays. OpenTelemetry on Fastify + http client + SDK calls feeds App Insights traces. Product analytics (session_started, workflow_completed, verification_failed) ride off the EventBus as custom events. Log Analytics provides KQL behind App Insights. Sentry can stay as the crash channel if richer grouping matters.

Cost note: App Insights ingestion is the most variable cost in this architecture. Set sampling at 50% for verbose, 100% for warnings and errors. Off in dev.
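The sampling rule in the cost note fits in one function. The sketch below treats debug and info as "verbose", which is an assumption, and shouldSend is a hypothetical name; the real adapter might instead rely on the sampling settings of the telemetry SDK.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// rand is injected so the decision is testable; production code would pass
// Math.random().
function shouldSend(level: Level, environment: "dev" | "prod", rand: number): boolean {
  if (environment === "dev") return false; // telemetry off in dev
  if (level === "warn" || level === "error") return true; // always kept
  return rand < 0.5; // keep 50% of verbose telemetry
}
```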

Networking — Front Door + Private Endpoints

LLM provider plane

Stays on Anthropic direct. The dual-provider abstraction in src/providers/ keeps Mistral as the EU-sovereign option. Adding Azure OpenAI for agent inference in v1 is plumbing without product benefit; in this design it serves KB embeddings only. The abstraction already supports adding AOAI as an inference provider without disturbing other paths if a future customer needs it.

CI/CD & infrastructure-as-code

Cost shape — rough monthly OOM

Single region, modest load (~50 active firms, ~5,000 documents indexed):

| Service | SKU | Est. monthly |
| --- | --- | --- |
| Container Apps (api + worker + jobs) | 2 always-on small + bursts | $200–400 |
| Postgres Flexible Server (+ pgvector) | D2ds_v5 (2 vCPU / 8 GB), bumped for vector index | $150 |
| Azure OpenAI embeddings | text-embedding-3-small, usage-based | $5–20 |
| Blob Storage | Hot ~500 GB + ops | $30 |
| Service Bus | Standard | $10 |
| Front Door Premium | Base + traffic | $330+ |
| Key Vault | Standard | <$5 |
| App Insights | 5 GB/day cap | $80 |
| Total | | ~$810–1,075 |

Keeping vectors in pgvector removes the Azure AI Search line (was ~$250/mo S1) for a modest Postgres SKU bump — a net saving of roughly $200–230/mo. Agent LLM spend lives separately and passes through; embeddings are the only Azure-side model cost and are tiny. Cost levers: Front Door → Standard saves ~$200; audit and log archive in cool/archive tiers; right-size Postgres only on actual pressure (the vector index is the main memory driver).

Migration plan — three phases

Phase 0 — Preparation · 1–2 weeks

Phase 1 — Azure up, dual-write · 2–3 weeks

Phase 2 — Cutover · 1 week

Phase 3 — Decommission · 1 week, 30+ days later

Open questions

  1. Multi-tenancy: single subscription + userId everywhere, or per-firm subscription for isolation-sensitive customers? Default to the single shared subscription unless a firm specifically asks otherwise.
  2. Data residency: per-tenant region (West Europe vs UK South vs East US) requires routing logic in Front Door and a region attribute on users.
  3. DR: Postgres Flexible Server has built-in PITR (7–35d), now covering the KB vectors too (one store). Do we want geo-redundant Blob and a DB read replica for prod? Likely yes.
  4. Audit retention: Blob immutability policies (legal hold, 7y) — switch on if firms ask.
  5. Web PubSub timing: sticky routing now; PubSub when concurrent WebSockets force the issue.
  6. Cowork interop: viz/src/cowork/ keeps working unchanged — local files never touch the server. Privacy promise survives.
  7. Clawern → cloud KB: should the local precedent board optionally sync to the cloud KB for shared firm precedents? Product call, not architecture.

Services considered but not chosen

| Service | Why not |
| --- | --- |
| Azure SQL | T-SQL rewrite; Postgres is closer to SQLite ergonomically, has jsonb, and its ON CONFLICT … excluded.* upserts (already used in src/db/database.ts) port verbatim. Also lacks pgvector. |
| Azure AI Search | Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in pgvector: one store, feature parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors. |
| Cosmos DB for KB chunks | Chunks + vectors live in Postgres; Cosmos would be a parallel store. |
| AKS | Operationally heavy; Container Apps suffices. |
| Azure Functions | Cold start hurts orchestration loops and WebSockets. |
| App Service for Containers | WebSocket sticky-only, no native KEDA. |
| Azure OpenAI (for agent inference) | Gives OpenAI's models, not Anthropic's; swapping Claude for GPT undercuts the product. Used only for KB embeddings, never agent reasoning. |
| API Management | Overkill for our REST surface; Front Door + Container Apps ingress suffices. |
| Azure Cache for Redis (v1) | Worker-owned sessions avoid the need. |

File-by-file change summary

| File | Change |
| --- | --- |
| src/db/database.ts | Becomes the SQLite adapter; add Postgres adapter behind interface. |
| src/knowledge-base/indexer.ts | Adds an Azure OpenAI embedding call per chunk; writes content_tsv + embedding to the Postgres kb_chunks table. |
| src/knowledge-base/retriever.ts | FTS5 MATCH becomes a Postgres tsvector + pgvector hybrid (RRF) query; synonym expansion and user-scoping retained. |
| src/api/middleware/auth.ts | Replaced by Entra ID JWT validation. |
| src/api/server.ts | Handler shape unchanged; deploy target changes. |
| src/session/session-manager.ts | Moves to lavern-worker; api becomes stateless. |
| src/events/event-bus.ts | In-pod EventEmitter retained; outbound bridge to Service Bus added. |
| src/gates/ | Async resolver becomes default; sync resolver retained for CLI. |
| src/utils/logger.ts | Becomes App Insights adapter. |
| src/utils/audit-persistence.ts | Writes to Blob lavern-audit/. |
| src/workflows/executor.ts | Intermediate artefacts to Blob; control flow unchanged. |
| src/api/routes/knowledge-base.ts | Upload returns a SAS URL; indexing via Event Grid → job. |
| src/agents/ | No change. Prompts stay code-resident. |
| src/workflows/templates/ | No change. Templates stay code-resident. |
| SOUL.md | No change. Default firm soul stays code-resident. |
| src/claw/ | No change. Clawern stays local; see Clawern. |
| viz/src/cowork/ | No change. Local-only FS Access API path survives. |