How to move Lavern's dashboard, engagement engine, knowledge base, sessions, matters, auth, and audit data onto Azure. Clawern stays on the client — see Clawern for why that's a product call, not a limitation. Every code reference below links to the file in Altien/lavernDev on GitHub.
These are inputs, not findings — set during research planning so the design has somewhere firm to stand.
| Decision | Value | Rationale |
|---|---|---|
| Vectors | Postgres pgvector owns the vector index (in-DB) | Keeps KB retrieval in the same store as the relational data; no separate search service to operate or pay for. Hybrid (tsvector full-text ranking + vector) in one database. |
| Compute | Azure Container Apps | Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings. |
| Identity | Entra ID (workforce / B2B) | Org-tenant aware. Conditional access, MFA from the tenant, no separate user store to operate. |
| Database | PostgreSQL Flexible Server (with pgvector) | Schema-compatible target for the SQLite tables in src/db/database.ts; mature; jsonb; hosts the KB vector index in-DB via the allow-listed pgvector extension. |
| Async messaging | Service Bus + Event Grid | Service Bus for guaranteed-delivery work and gate-decision handoffs. Event Grid for blob-arrival fan-out. |
| Scope | Dashboard + KB; Clawern stays local | Clawern's whole identity is on-device — daemon, menu bar, Telegram, filesystem watch. Lifting it loses the privacy claim. |
Public traffic arrives at Azure Front Door — TLS,
WAF, global anycast — which fronts a Container Apps
environment running three revisions: lavern-api
(Fastify + WebSocket, today's
src/api/server.ts),
lavern-worker (orchestration loop), and
lavern-jobs (KEDA-scaled short tasks: KB ingest,
derivative rendering). All three sit on a private VNET with Private
Endpoints to PostgreSQL Flexible Server (KB vectors
via pgvector), Azure OpenAI (embeddings
only), Blob Storage,
Service Bus, Key Vault, and
Application Insights. Identity is
Entra ID for users; Managed Identity
end-to-end for everything else, so no service-to-service password
lives in an environment variable.
Container Apps wins over App Service for Containers (sticky-only WebSocket, no native KEDA) and AKS (overkill for a small fleet). The one thing Container Apps doesn't give us is fine-grained sidecar control; we don't need any.
Today, a single Fastify process serves HTTP + WebSocket, holds live
SessionState
in a Map, runs the orchestrator
(src/orchestrator.ts,
src/dispatch.ts),
dispatches agents through the Claude Agent SDK, and emits events on
the in-process
EventBus.
For Azure, we split into three revisions:
| Revision | Replicas | Scales on | Holds |
|---|---|---|---|
| lavern-api | min 2 | HTTP / WS concurrency | REST routes, WebSocket fan-out, session router |
| lavern-worker | min 1 | Service Bus queue depth | Active orchestration loops, live SessionState |
| lavern-jobs | 0 .. N | KEDA (queue length) | KB ingest, derivative render, batch workflows |
@fastify/websocket
works in Container Apps without modification. Two options for
cross-pod fan-out: sticky routing (Front Door
affinity cookie, simple) or Azure Web PubSub
(browsers connect to PubSub directly, API publishes events;
stateless pods, clean rollouts). Recommendation: start with sticky,
migrate to Web PubSub when concurrent connections start fighting
the rollout story.
Every table in
src/db/database.ts
ports straight across with three categories of edit.
| SQLite table | Postgres treatment |
|---|---|
| users | Direct; add citext on email; oid column to mirror Entra subject. |
| auth_tokens | Mostly redundant under Entra. Keep only for service-to-service tokens. |
| session_archive | Direct port. summary_json becomes jsonb + GIN index. |
| matters | Direct port. data_json becomes jsonb. |
| kb_collections, kb_documents | Direct port; kb_documents stays the document-of-record. |
| kb_chunks | Kept. Add a content_tsv tsvector column (lexical/FTS) and an embedding vector(1536) column (pgvector). Chunks + vectors live in Postgres. |
| shared_agents, shared_teams | Direct port; JSON payloads → jsonb. |
| user_usage, billing_events, billable_hours, daily_spend | Direct port; money columns → numeric(12,4). |
| audit_log | Direct port; partition by month at scale. |

- JSON columns stored as TEXT today (data_json, summary_json, profile_json, metadata) promote to jsonb. Add GIN indexes where we actually query into them.
- Timestamps stored as TEXT convert to timestamptz.
- The FTS5 index becomes a tsvector column (GIN index); a pgvector column adds the semantic half. The KB search path stays in Postgres.

better-sqlite3 is synchronous; the Postgres driver isn't. The DB layer needs an await-everywhere pass.
Mechanics: pgloader for the bulk move, then a TypeScript
script for JSON-column transforms. Connection pooling via PgBouncer
(Flexible Server's built-in endpoint). Auth to Postgres via
Managed Identity from Container Apps — no password
in any connection string.
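
The TypeScript pass that follows pgloader could be as small as two pure functions. This is a minimal sketch, not code from the repo: `toTimestamptz` and `toJsonb` are hypothetical names, and it assumes SQLite stored timestamps as UTC text in the `YYYY-MM-DD HH:MM:SS` shape that SQLite's `datetime('now')` produces.

```typescript
// Hypothetical helpers for the post-pgloader transform pass.
// Assumption: SQLite timestamps are UTC text such as "2024-03-01 09:15:00".

/** Convert SQLite UTC timestamp text to an ISO 8601 string usable as timestamptz input. */
function toTimestamptz(sqliteText: string): string {
  const m = /^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(\.\d+)?Z?$/.exec(sqliteText.trim());
  if (!m) throw new Error("Unrecognised timestamp: " + sqliteText);
  // Append "Z" so the value is read as UTC rather than server-local time.
  const iso = m[1] + "T" + m[2] + (m[3] || "") + "Z";
  if (isNaN(Date.parse(iso))) throw new Error("Invalid timestamp: " + sqliteText);
  return iso;
}

/** Parse a TEXT JSON column. Empty or null maps to SQL NULL; bad JSON is reported, not swallowed. */
function toJsonb(text: string | null): unknown {
  if (text === null || text.trim() === "") return null;
  try {
    return JSON.parse(text);
  } catch (e) {
    throw new Error("Column is not valid JSON: " + text.slice(0, 40));
  }
}
```

Failing loudly on malformed rows is a deliberate choice here: a migration that silently nulls bad JSON would hide data loss until someone queries the column.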
The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that, with a defensible split.

- src/agents/: the 67 agent prompts. They are the product; they belong with the code that calls them.
- src/workflows/templates/: the 9 workflow definitions. Same argument.
- SOUL.md: default firm soul. Per-user override already lives in the database; the default is a code asset.
- src/mcp/tools/: MCP tool implementations.

Why not Blob?

- tests/ read prompts from disk. Blob-resident prompts mean mocking Blob in every test.

| Category | Container | Lifecycle |
|---|---|---|
| Uploaded source documents | lavern-uploads | Soft-delete after archive; hard-delete after retention window. |
| Final deliverables | lavern-deliveries | Per-matter retention. |
| Audit bundles | lavern-audit | Per data class. Regulated → cold tier, 7y, immutability policy. |
| KB document originals | lavern-kb-originals | Until KB doc is deleted. Chunks + vectors live in Postgres; original stays in Blob. |
| Derivatives (HTML/DOCX/PDF) | lavern-derivatives | Soft-delete after 30 days; users regenerate on demand. |
| Log archives | lavern-logs | Cool tier after 30d; archive tier after 180d. |
| Avatars, share OG images | lavern-public | Public read. The only public container. |
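
The two lifecycle rules in the table that carry explicit day counts can be written down as data for review. This is only a sketch: in Azure these would normally be a storage-account lifecycle management policy, and `lifecycle` and `actionFor` are hypothetical names. It assumes blobs start in the hot tier and that "after 30d" means strictly older than 30 days.

```typescript
type BlobAction = "hot" | "cool" | "archive" | "soft-delete";

interface LifecycleStep {
  afterDays: number;
  action: BlobAction;
}

// Only the containers whose windows the table states numerically.
// Steps must be listed in ascending afterDays order.
const lifecycle: { [container: string]: LifecycleStep[] } = {
  "lavern-logs": [
    { afterDays: 30, action: "cool" },
    { afterDays: 180, action: "archive" },
  ],
  "lavern-derivatives": [{ afterDays: 30, action: "soft-delete" }],
};

/** Return the action that applies to a blob of the given age; "hot" until the first threshold passes. */
function actionFor(container: string, ageDays: number): BlobAction {
  const steps = lifecycle[container] || [];
  let current: BlobAction = "hot"; // assumption: everything starts hot
  for (const step of steps) {
    // The last threshold the blob has passed wins.
    if (ageDays > step.afterDays) current = step.action;
  }
  return current;
}
```

Containers with policy-driven windows (per-matter retention, regulated audit bundles) are left out on purpose, since their day counts are not fixed in this document.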

| File | Today | Azure |
|---|---|---|
| src/utils/logger.ts | Rotating log files | App Insights (live); Blob lavern-logs/ (archive) |
| src/utils/audit-persistence.ts | JSONL audit log | Blob lavern-audit/ |
| src/workflows/executor.ts | Intermediate artefacts | Blob lavern-deliveries/<sessionId>/intermediate/ |
| src/mcp/tools/baselines.ts | Baseline JSON | Blob lavern-baselines/ |
| src/mcp/tools/report-card.ts | Final report | Blob lavern-deliveries/ |
| src/mcp/tools/legal-md-compiler.ts | Compiled markdown | Blob lavern-deliveries/ |
| src/providers/mistral-executor.ts | Provider artefacts | Blob lavern-deliveries/ |
Front-end uploads use a user-delegation SAS issued by the API after authorisation — the file doesn't traverse the API process for the upload itself, only for orchestration.
pgvector
Decision (supersedes the original v1 draft of this section):
vectors live in Postgres via pgvector, not Azure
AI Search. This keeps KB retrieval in one store with the relational data,
removes the AI Search service (and its ~$75/mo floor), is genuine
feature-parity with today's lexical KB, and adds semantic recall on top.
pgvector is an allow-listed extension on Flexible Server
(azure.extensions = VECTOR).
The pipeline keeps its shape:
src/knowledge-base/indexer.ts
parses and section-chunks;
src/knowledge-base/retriever.ts
runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector
column; a new pgvector column adds the semantic half. No separate
search service.
kb_chunks schema (Postgres)

| Column | Type | Notes |
|---|---|---|
| id | text (PK) | <documentId>_<chunkIndex> |
| document_id | text (FK → kb_documents) | |
| collection_id, user_id | text (indexed) | Security-trim every query on user_id |
| heading, content | text | Section heading from structure detector + chunk text |
| content_tsv | tsvector (GIN index) | Lexical half: generated from heading \|\| ' ' \|\| content |
| embedding | vector(1536) (HNSW index) | Semantic half: Azure OpenAI text-embedding-3-small |
| doc_type, jurisdiction | text (indexed) | From kb_documents |
text-embedding-3 (decided)
This is the one place Azure OpenAI enters the architecture;
agent reasoning stays on Anthropic. text-embedding-3-small
(1,536-dim) is the default cost/quality pick; text-embedding-3-large
(up to 3,072-dim, reducible) if retrieval quality demands it. The AOAI endpoint
sits behind a Private Endpoint; authenticate via Managed Identity (preferred)
or a Key Vault–held key.
Push from a worker. Event Grid fires on Blob
upload → Service Bus → lavern-jobs KEDA-scales → our
parser + section chunker (existing code) runs → Azure OpenAI embeds each
chunk → rows are inserted into Postgres (content_tsv DB-generated,
embedding carrying the vector). We keep our legal-aware chunking,
which materially affects citation quality.
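
The last step of that pipeline, pairing each chunk with its embedding and shaping it for insert, might look like the sketch below. The `Section` and `KbChunkRow` shapes and both function names are assumptions for illustration and cover only a subset of the kb_chunks columns; the real types live in src/knowledge-base/ and may differ. The embedding is serialised as the bracketed text literal pgvector accepts, and content_tsv is omitted because the database generates it.

```typescript
// Assumed shapes for illustration only.
interface Section {
  heading: string;
  content: string;
}

interface KbChunkRow {
  id: string; // <documentId>_<chunkIndex>
  document_id: string;
  user_id: string;
  heading: string;
  content: string;
  embedding: string; // pgvector text literal, e.g. "[0.1,0.2]"
}

const EMBEDDING_DIM = 1536; // text-embedding-3-small

/** Format a number[] as the text literal pgvector accepts for a vector column. */
function toVectorLiteral(v: number[]): string {
  return "[" + v.join(",") + "]";
}

/** Build insertable rows, rejecting any vector whose length does not match the column. */
function toChunkRows(
  documentId: string,
  userId: string,
  sections: Section[],
  vectors: number[][],
  dim: number = EMBEDDING_DIM
): KbChunkRow[] {
  if (sections.length !== vectors.length) throw new Error("one vector per section required");
  return sections.map((s, i) => {
    if (vectors[i].length !== dim) {
      throw new Error("chunk " + i + ": expected " + dim + " dims, got " + vectors[i].length);
    }
    return {
      id: documentId + "_" + i,
      document_id: documentId,
      user_id: userId,
      heading: s.heading,
      content: s.content,
      embedding: toVectorLiteral(vectors[i]),
    };
  });
}
```

Checking the dimension before the insert keeps a misconfigured embeddings deployment from surfacing as a database error halfway through a batch.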
```sql
WITH lex AS (
  SELECT id, ts_rank_cd(content_tsv, plainto_tsquery($1)) AS s
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY s DESC LIMIT 40
),
vec AS (
  SELECT id, 1 - (embedding <=> $3::vector) AS s
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector LIMIT 40
)
-- RRF-fuse lex + vec on id, return top-k
```
user_id is filtered in every branch — the
user-scoping guarantee in
retriever.ts
is preserved. Legal-synonym expansion stays as lexical pre-processing; the
n-gram re-rank is superseded by the vector half. pgvector with an
HNSW index serves into the millions of vectors before tuning pressure — if the
corpus ever outgrows that, Azure AI Search becomes a re-evaluation, not a v1
requirement.
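
The fusion step that the query leaves as a comment could run in the retriever rather than in SQL. The sketch below is one way to do reciprocal rank fusion over the two candidate lists; `rrfFuse` is a hypothetical name, and the smoothing constant `k = 60` is the conventional default rather than a value this design has settled on.

```typescript
interface Ranked {
  id: string;
}

/**
 * Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
 * where rank is the 1-based position of the id in each list.
 * Raw lexical and vector scores are ignored; only the ordering matters.
 */
function rrfFuse(lists: Ranked[][], topK: number, k: number = 60): { id: string; score: number }[] {
  const scores: { [id: string]: number } = {};
  for (const list of lists) {
    list.forEach((item, index) => {
      scores[item.id] = (scores[item.id] || 0) + 1 / (k + index + 1);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id: id, score: scores[id] }))
    // Highest fused score first; ties broken by id so output is deterministic.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : 1))
    .slice(0, topK);
}
```

Because RRF works on ranks, the `ts_rank_cd` and cosine-similarity scores never need to be normalised against each other, which is the main reason to prefer it over a weighted sum here.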
The
in-process EventBus
stays exactly as it is inside a single pod. We add an
outbound bridge for events that need to cross pods. Most don't —
verbose per-step events (tool_used) stay in-pod and
surface as App Insights metrics. Cross-pod handoffs go through
Service Bus.
| Event class | Channel | Why |
|---|---|---|
| session lifecycle, findings, debate | Service Bus topic lavern-events | WS fan-out + audit consume separately |
| gate request/response | Service Bus queue lavern-gates | Worker blocked; needs guaranteed delivery |
| blob upload (KB doc, source doc) | Event Grid → Service Bus → KEDA job | Triggers ingest / derivative jobs |
| tool_used, verification_run | App Insights metric only | Worker-local, high-volume, observability concern |
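
The outbound bridge amounts to a routing decision per event. A minimal sketch follows, assuming event types can be matched by prefix; apart from `tool_used` and `verification_run`, the event names and the `channelFor` helper are hypothetical. Blob uploads are absent because they arrive through Event Grid rather than the EventBus.

```typescript
type Channel = "topic:lavern-events" | "queue:lavern-gates" | "metric" | "in-pod";

// Hypothetical prefixes, except tool_used and verification_run which the table names.
const routes: { prefix: string; channel: Channel }[] = [
  { prefix: "gate_", channel: "queue:lavern-gates" },
  { prefix: "session_", channel: "topic:lavern-events" },
  { prefix: "finding_", channel: "topic:lavern-events" },
  { prefix: "debate_", channel: "topic:lavern-events" },
  { prefix: "tool_used", channel: "metric" },
  { prefix: "verification_run", channel: "metric" },
];

/** Decide where an EventBus event goes once it leaves the emitting code. */
function channelFor(eventType: string): Channel {
  for (const route of routes) {
    if (eventType.indexOf(route.prefix) === 0) return route.channel;
  }
  // Default: most events never need to cross pods.
  return "in-pod";
}
```

Defaulting to in-pod means a newly added event type stays local until someone routes it explicitly, which keeps Service Bus volume predictable.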
Live sessions live in an in-process Map today
(session-manager.ts:
TTL eviction, 100-session cap, hydrated on demand). When we split
api / worker / jobs, sessions need a home that isn't "the api pod
that happened to handle the first request."
| Option | Pro | Con |
|---|---|---|
| A. Sticky to api pod | Zero code change | No per-session horizontal scale; api can't recycle until session expires |
| B. Worker owns the session; api stateless (recommended) | Clean target architecture, api scales freely | Gate-decision becomes async (we mostly model this already via src/gates/) |
| C. Externalise to Redis | Maximum elasticity | Serialisation cost; SessionState wasn't designed for it |
Recommendation: option B. Worker holds live state;
api routes WebSocket subscriptions to Service Bus topics filtered by
sessionId. Archive on completion goes to Postgres,
same as today.
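
Under option B the api pod holds only socket subscriptions, never SessionState. The sketch below shows that shape with an in-memory router plus a helper that builds a Service Bus SQL rule filter. `SessionRouter`, `sessionFilter`, and the `sessionId` application property are assumptions for illustration, not existing code.

```typescript
interface BusEvent {
  sessionId: string;
  type: string;
  body: unknown;
}

type Send = (payload: string) => void;

/** Build a SQL rule filter matching one session's events (assumes a sessionId application property). */
function sessionFilter(sessionId: string): string {
  // SQL filter string literals use single quotes; an embedded quote is doubled.
  return "sessionId = '" + sessionId.replace(/'/g, "''") + "'";
}

/** Per-pod map of sessionId to connected sockets. Holds no session state. */
class SessionRouter {
  private sockets: { [sessionId: string]: Send[] } = {};

  /** Register a socket; returns an unsubscribe function to call on close. */
  subscribe(sessionId: string, send: Send): () => void {
    const list = this.sockets[sessionId] || (this.sockets[sessionId] = []);
    list.push(send);
    return () => {
      const i = list.indexOf(send);
      if (i >= 0) list.splice(i, 1);
    };
  }

  /** Forward a bus event to every socket watching its session; returns how many received it. */
  dispatch(event: BusEvent): number {
    const list = this.sockets[event.sessionId] || [];
    const payload = JSON.stringify({ type: event.type, body: event.body });
    list.forEach((send) => send(payload));
    return list.length;
  }
}
```

Because the router can be rebuilt from reconnecting sockets alone, an api pod can be recycled mid-session without touching the worker that owns the state.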
The current
auth.ts
middleware handles cookie + Bearer and runs a no-op
LOCAL-MODE. The auth-shaped routes in
auth-routes.ts
and
google-auth.ts
only register when LAVERN_AUTH_ENABLED=true.

- @azure/identity + a Fastify auth plugin.
- users stays as the app-side mirror keyed by Entra oid. Populated on first sign-in. Needed because matters, billing_events, kb_collections, et al. all FK to it.
- API clients (api_clients table) keep their existing key model; rotated via Key Vault references.

| Secret | Today | Azure |
|---|---|---|
| Anthropic API key | env | Key Vault → Container Apps secret reference |
| Mistral API key | env | Key Vault |
| Stripe secret + webhook secret | env | Key Vault |
| SMTP credentials | env | Key Vault |
| Database connection | env | Managed Identity (no string at all) |
| Storage account keys | n/a | Managed Identity |
| Azure OpenAI embeddings key | n/a (new) | Managed Identity (or Key Vault key if MI unavailable) |
| LAVERN_MANAGED_AGENTS_BRIDGE_SECRET | env | Key Vault |
src/utils/logger.ts
becomes a thin App Insights adapter — its existing shape stays.
OpenTelemetry on Fastify + http client + SDK calls feeds App
Insights traces. Product analytics (session_started,
workflow_completed, verification_failed) ride off the EventBus as
custom events. Log Analytics provides KQL behind App Insights.
Sentry can stay as the crash channel if richer grouping matters.
Cost note: App Insights ingestion is the most variable cost in this architecture. Set sampling at 50% for verbose, 100% for warnings and errors. Off in dev.
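
The sampling rule in the cost note is small enough to express directly. This is a sketch under assumptions: the level and environment names are illustrative, and `sampleRate` and `shouldSend` are hypothetical helpers rather than App Insights SDK calls. The random source is injectable so the rule can be tested deterministically.

```typescript
type Level = "verbose" | "warning" | "error";
type Env = "dev" | "staging" | "prod";

/** Fraction of telemetry items to keep for a given level and environment. */
function sampleRate(level: Level, env: Env): number {
  if (env === "dev") return 0; // telemetry off in dev
  return level === "verbose" ? 0.5 : 1; // warnings and errors are always kept
}

/** Decide whether to send one item. `random` must return a value in [0, 1). */
function shouldSend(level: Level, env: Env, random: () => number = Math.random): boolean {
  return random() < sampleRate(level, env);
}
```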
api.anthropic.com, api.mistral.ai, api.stripe.com, accounts.google.com (if Google OAuth is retained). Everything else blocked.
Stays on Anthropic direct. The dual-provider abstraction in
src/providers/
keeps Mistral as the EU-sovereign option. Adding Azure OpenAI in v1
is plumbing without product benefit. The abstraction already
supports adding AOAI without disturbing other paths if a future
customer needs it.

- Postgres (pgvector via azure.extensions), Azure OpenAI (embeddings deployment), Blob, Service Bus, Key Vault, Front Door, App Insights, networking.
- az containerapp update --revision-suffix <sha> for blue-green.
- dev (small SKUs of everything), staging (mirror), prod.
- lavern-migrate job that runs before the api swap.

Single region, modest load (~50 active firms, ~5,000 documents indexed):

| Service | SKU | Est. monthly |
|---|---|---|
| Container Apps (api + worker + jobs) | 2 always-on small + bursts | $200–400 |
| Postgres Flexible Server (+ pgvector) | D2ds_v5 (2 vCPU / 8 GB), bumped for vector index | $150 |
| Azure OpenAI embeddings | text-embedding-3-small, usage-based | $5–20 |
| Blob Storage | Hot ~500 GB + ops | $30 |
| Service Bus | Standard | $10 |
| Front Door Premium | Base + traffic | $330+ |
| Key Vault | Standard | <$5 |
| App Insights | 5 GB/day cap | $80 |
| Total | | ~$810–1,075 |
Keeping vectors in pgvector removes the Azure AI Search line
(was ~$250/mo S1) for a modest Postgres SKU bump — a net saving of roughly
$200–230/mo. Agent LLM spend lives separately and passes through; embeddings
are the only Azure-side model cost and are tiny. Cost levers: Front Door →
Standard saves ~$200; audit and log archive in cool/archive tiers; right-size
Postgres only on actual pressure (the vector index is the main memory driver).

- src/db/: SQLite and Postgres adapters behind the same shape.
- @azure/storage-blob.
- tsvector + pgvector hybrid query.
- Postgres (pgvector) live with sample data.
- userId everywhere, or per-firm subscription for isolation-sensitive customers? Single-tenant unless a firm specifically asks otherwise.
- users.
- viz/src/cowork/ keeps working unchanged; local files never touch the server. Privacy promise survives.

| Service | Why not |
|---|---|
| Azure SQL | T-SQL rewrite; Postgres is closer to SQLite ergonomically, has jsonb, and its ON CONFLICT … excluded.* upserts (already used in src/db/database.ts) port verbatim. Also lacks pgvector. |
| Azure AI Search | Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in pgvector — one store, feature-parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors. |
| Cosmos DB for KB chunks | Chunks + vectors live in Postgres; Cosmos would be a parallel store. |
| AKS | Operationally heavy; Container Apps suffices. |
| Azure Functions | Cold start hurts orchestration loops and WebSockets. |
| App Service for Containers | WebSocket sticky-only, no native KEDA. |
| Azure OpenAI (for agent inference) | Gives OpenAI's models, not Anthropic's — swapping Claude for GPT undercuts the product. Used only for KB embeddings, never agent reasoning. |
| API Management | Overkill for our REST surface; Front Door + Container Apps ingress suffices. |
| Azure Cache for Redis (v1) | Worker-owned sessions avoid the need. |
| File | Change |
|---|---|
| src/db/database.ts | Becomes the SQLite adapter; add Postgres adapter behind interface. |
| src/knowledge-base/indexer.ts | Adds an Azure OpenAI embedding call per chunk; writes content_tsv + embedding to the Postgres kb_chunks table. |
| src/knowledge-base/retriever.ts | FTS5 MATCH becomes a Postgres tsvector + pgvector hybrid (RRF) query; synonym expansion and user-scoping retained. |
| src/api/middleware/auth.ts | Replaced by Entra ID JWT validation. |
| src/api/server.ts | Handler shape unchanged; deploy target changes. |
| src/session/session-manager.ts | Moves to lavern-worker; api becomes stateless. |
| src/events/event-bus.ts | In-pod EventEmitter retained; outbound bridge to Service Bus added. |
| src/gates/ | Async resolver becomes default; sync resolver retained for CLI. |
| src/utils/logger.ts | Becomes App Insights adapter. |
| src/utils/audit-persistence.ts | Writes to Blob lavern-audit/. |
| src/workflows/executor.ts | Intermediate artefacts to Blob; control flow unchanged. |
| src/api/routes/knowledge-base.ts | Upload returns a SAS URL; indexing via Event Grid → job. |
| src/agents/ | No change. Prompts stay code-resident. |
| src/workflows/templates/ | No change. Templates stay code-resident. |
| SOUL.md | No change. Default firm soul stays code-resident. |
| src/claw/ | No change. Clawern stays local; see Clawern. |
| viz/src/cowork/ | No change. Local-only FS Access API path survives. |