Integrating & operating
The system's edges: providers and tools, the API surface, the remote bridge, datasets, and the two live migration roadmaps.
Lavern → Azure migration research
Scope. Lavern dashboard, API, knowledge base, sessions, matters, auth, billing, and audit data move to Azure. Clawern stays on the client — it remains a local Node.js daemon on the user's machine, optionally pointing at the cloud KB for shared firm precedents. This document is the research artifact behind docs/explore/azure-migration.html.
Repository: github.com/Altien/lavernDev · Main branch: main
1. Constraints fixed up-front
These are inputs to the design, not findings.
| Decision | Value | Rationale |
|---|---|---|
| Vectors | Postgres pgvector owns the vector index (in-DB) | Keeps KB retrieval in the same store as the relational data, with no separate search service to operate or pay for. Hybrid search (tsvector full-text ranking + vector similarity) runs in one database. |
| Compute | Azure Container Apps for API + workers | Container-native, long-lived WebSocket support, KEDA scale, native Service Bus & Event Grid bindings. |
| Identity | Entra ID (workforce / B2B) | Org-tenant aware. Works with conditional access, no separate user store, MFA from the tenant. |
| Database | Azure Database for PostgreSQL Flexible Server (with pgvector) | Schema-compatible target for the current SQLite tables, mature, and hosts the KB vector index in-DB via the allow-listed pgvector extension. |
| Async messaging | Azure Service Bus for guaranteed-delivery work, Event Grid for blob/event fan-out | Service Bus is the right primitive for cross-pod orchestration signals; Event Grid is right for "blob landed" triggers. |
| Region | Single primary region for v1; design for multi-region but defer | Cost and latency picked over DR until usage warrants it. |
| LLM provider | Anthropic direct (unchanged); Mistral path preserved | Chat/agent inference stays on Anthropic. Azure OpenAI is used only for KB embeddings (text-embedding-3), never for agent reasoning. |
2. Target architecture (one picture, in words)
Public traffic enters via Azure Front Door (TLS termination, WAF, global anycast). Front Door fronts a Container Apps environment running:
- lavern-api revision: Fastify HTTP + WebSocket server, the same one in src/api/server.ts.
- lavern-worker revision: long-running orchestration worker that pulls Service Bus messages and runs workflows (today this happens in the API process; we split it out).
- lavern-jobs (KEDA-scaled job): short-lived bursts (KB ingest, derivative rendering, large workflow runs).
Behind them, all on the same VNET with Private Endpoints:
- PostgreSQL Flexible Server for relational data (src/db/database.ts).
- Postgres pgvector holds the KB vector index alongside the relational tables: the SQLite FTS5 + n-gram re-rank in src/knowledge-base/retriever.ts becomes a Postgres hybrid query (tsvector full-text rank + pgvector cosine distance).
- Azure Blob Storage for user-generated content (uploads, deliverables, audit bundles, derivatives, log archive).
- Azure Service Bus for cross-instance events and work scheduling.
- Azure Key Vault for any non-managed secrets (Anthropic API keys, Azure OpenAI embeddings key, Stripe webhook secrets, SMTP credentials).
- Application Insights + Log Analytics for logs, traces, metrics.
- Azure Cache for Redis (optional, v2) — only if we move session state off in-process.
Egress to the public internet (Anthropic, Mistral, Stripe, Google OAuth where used) goes through Container Apps' managed egress.
3. Compute — Azure Container Apps
Why Container Apps over the alternatives
| Option | Why it wins | Why it loses |
|---|---|---|
| Container Apps | Container-native, long-lived WebSocket OK, KEDA scale incl. scale-to-zero on non-WS revisions, Dapr if we ever want it, managed Envoy ingress. Cheapest target for a Fastify image. | No fine-grained sidecar control, less mature than AKS. |
| App Service for Containers | Mature, simple. | Scale-to-zero awkward; per-instance sticky needed for WebSocket; KEDA not native. |
| AKS | Maximum control. | Operationally heavy; we don't need cluster-level features yet. |
How the current process splits
The current single Fastify process does several things:
- Serve HTTP + WebSocket (src/api/server.ts, src/api/ws-handler.ts)
- Hold live SessionState in a Map (src/session/session-manager.ts)
- Run the orchestration loop (src/orchestrator.ts, src/dispatch.ts)
- Dispatch agents (Claude Agent SDK calls)
- Emit events on the in-process EventBus (src/events/event-bus.ts)
- Archive completed sessions to SQLite
For Azure, we split into three revisions:
| Revision | Replicas | Scales on | Holds |
|---|---|---|---|
| lavern-api | min 2, max N | HTTP / WS concurrency | Session router + WebSocket connections + REST routes |
| lavern-worker | min 1, max M | Service Bus queue depth | Active session orchestration loops |
| lavern-jobs | 0..K | KEDA (queue length) | One-shot tasks (KB ingest, large workflow batches, derivatives) |
This is the moment to make the orchestrator stateless across pods. See §8 (sessions).
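To make the "Scales on" column concrete for the two queue-driven revisions, here is a minimal sketch of the rule a KEDA queue scaler effectively applies: desired replicas is the backlog divided by a per-replica target, rounded up and clamped to the revision's bounds. It deliberately ignores stabilisation windows, scale-rate limits and KEDA activation thresholds, and the `ScaleRule` values are assumptions rather than tuned settings; lavern-api would use an HTTP-concurrency rule instead.

```typescript
// Simplified model of queue-driven scaling (KEDA feeding the autoscaler an average-value target):
// desired = ceil(queueLength / messagesPerReplica), clamped to [minReplicas, maxReplicas].
// Ignores stabilisation windows, scale-rate limits and KEDA activation thresholds.
interface ScaleRule {
  minReplicas: number;        // 0 for lavern-jobs (scale to zero), 1 for lavern-worker
  maxReplicas: number;        // the M / K ceilings from the table (values below are assumed)
  messagesPerReplica: number; // backlog each replica is expected to carry (assumed)
}

function desiredReplicas(queueLength: number, rule: ScaleRule): number {
  if (rule.messagesPerReplica <= 0) {
    throw new Error("messagesPerReplica must be positive");
  }
  const raw = Math.ceil(Math.max(0, queueLength) / rule.messagesPerReplica);
  return Math.min(rule.maxReplicas, Math.max(rule.minReplicas, raw));
}

// Hypothetical settings for the two queue-scaled revisions.
const workerRule: ScaleRule = { minReplicas: 1, maxReplicas: 10, messagesPerReplica: 5 };
const jobsRule: ScaleRule = { minReplicas: 0, maxReplicas: 20, messagesPerReplica: 1 };

console.log(desiredReplicas(12, workerRule)); // 3
console.log(desiredReplicas(0, jobsRule));    // 0
```

The worker keeps a floor of one replica so an in-flight engagement always has a home, while jobs can drop to zero between bursts.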
WebSocket considerations
@fastify/websocket works inside Container Apps without modification. Two choices:
- Sticky routing + in-pod session state. Front Door affinity cookie, ARR-style. Simple, but limits horizontal scaling and disallows transparent rollouts mid-session.
- Connection broker (Azure Web PubSub). Browsers connect to Web PubSub directly; the API publishes events to a per-session group. Pods are stateless with respect to clients, so rollouts are clean. The current ws-handler.ts becomes an event publisher.
Recommendation: Start with sticky routing (option 1) on day 1 — keeps the migration small. Migrate to Azure Web PubSub when we hit horizontal scale or zero-downtime requirements that sticky can't satisfy.
4. Relational data — Postgres Flexible Server
What moves
Every table currently defined in src/db/database.ts:
| SQLite table | Notes for Postgres port |
|---|---|
| users | UUID PKs (already TEXT in SQLite); add citext for email. |
| auth_tokens | Becomes mostly redundant under Entra ID. Keep for service-to-service tokens only. |
| session_archive | Direct port. summary_json becomes jsonb. Add a GIN index for querying into it. |
| matters | Direct port. data_json becomes jsonb. |
| api_clients | Direct port. |
| kb_collections, kb_documents | Direct port. kb_documents stays the document-of-record. |
| kb_chunks | Keep. Add a content_tsv tsvector column (lexical/FTS) and an embedding vector(1536) column (pgvector). Chunks and their vectors live in Postgres. See §6. |
| shared_agents, shared_teams | Direct port. profile_json → jsonb. |
| user_usage, billing_events, billable_hours, daily_spend | Direct port. Money columns become numeric(12,4). |
| audit_log | Direct port. Consider partitioning by month at scale. |
| waitlist | Direct port. Probably retire once the waitlist closes. |
| user_tokens | OAuth refresh tokens for third-party integrations: keep. Entra-issued tokens cover sign-in, not access to those providers' APIs (see §9). |
SQLite-isms that need attention
- TEXT NOT NULL everywhere: convert to text NOT NULL. Postgres is strict about types; SQLite isn't.
- JSON columns stored as TEXT today (data_json, summary_json, profile_json, metadata): promote to jsonb and add GIN indexes where we actually query into them.
- No date type in SQLite: current created_at TEXT columns become timestamptz. The migration must parse the ISO strings.
- REFERENCES users(id): keep. Nullable FKs (anonymous sessions) carry over as-is, since Postgres also accepts NULL in a foreign-key column; add ON DELETE SET NULL where the relationship is genuinely optional, so deleting a user nulls the reference instead of failing.
- FTS5 virtual table: gone. KB search becomes a Postgres tsvector + pgvector hybrid query over kb_chunks (see §6).
- Concurrency: better-sqlite3 is synchronous; the Postgres drivers (pg / postgres-js) are async. The DB layer needs an await-everywhere pass. This is the biggest mechanical change in the migration.
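The type conversions above are regular enough to drive from a lookup in the migration tooling. A minimal sketch, assuming columns are read from SQLite's `PRAGMA table_info` output into a hypothetical `ColumnSpec`, and relying on this schema's naming conventions (`*_json` / `metadata` for JSON, `*_at` for timestamps); the money-column set is supplied by the caller because the source doesn't list those columns individually.

```typescript
// Maps a SQLite column (as reported by PRAGMA table_info) to a Postgres column definition,
// following the conversions listed above. The name-based heuristics are assumptions.
interface ColumnSpec {
  name: string;
  type: string;     // declared SQLite type, e.g. "TEXT", "INTEGER", "REAL"
  notnull: boolean;
  pk: boolean;
}

const JSON_COLUMNS: ReadonlySet<string> = new Set(["data_json", "summary_json", "profile_json", "metadata"]);

function postgresType(col: ColumnSpec, moneyColumns: ReadonlySet<string>): string {
  const declared = col.type.trim().toUpperCase();
  if (JSON_COLUMNS.has(col.name)) return "jsonb";                            // JSON stored as TEXT
  if (col.name.endsWith("_at") && declared === "TEXT") return "timestamptz"; // ISO-string dates
  if (col.name === "email") return "citext";                                 // needs CREATE EXTENSION citext
  if (moneyColumns.has(col.name)) return "numeric(12,4)";
  switch (declared) {
    case "TEXT": return "text";
    case "INTEGER": return "bigint";        // SQLite integers are 64-bit
    case "REAL": return "double precision";
    case "BLOB": return "bytea";
    default: return "text";                 // SQLite accepts arbitrary type names
  }
}

function columnDdl(col: ColumnSpec, moneyColumns: ReadonlySet<string> = new Set<string>()): string {
  const parts = [col.name, postgresType(col, moneyColumns)];
  if (col.pk) parts.push("PRIMARY KEY");
  else if (col.notnull) parts.push("NOT NULL");
  return parts.join(" ");
}

console.log(columnDdl({ name: "created_at", type: "TEXT", notnull: true, pk: false }));
// created_at timestamptz NOT NULL
```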
Migration mechanics
- Use pgloader for the bulk move (it handles SQLite quirks well), then run a TypeScript script for the JSON-column transforms (text → jsonb).
- Run dual-write for a window if we need zero-downtime cutover; otherwise a short maintenance window is honest and far simpler.
- Connection pooling via PgBouncer (Flexible Server has a built-in pooler endpoint).
- Use Managed Identity from Container Apps to authenticate to Postgres — no password in the connection string.
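The TypeScript pass that follows pgloader mostly needs to validate JSON text and normalise SQLite timestamps, reporting rows that fail instead of loading them silently. A minimal sketch of the per-row transform, assuming rows arrive as string-valued records; `transformRow` and `parseSqliteTimestamp` are hypothetical helpers, and writing results back is left out.

```typescript
// Per-row transform for the post-pgloader pass: JSON text is parsed (validating it) and SQLite
// timestamp text becomes a Date (bound as timestamptz). Bad values are reported, not loaded.
type RawRow = Record<string, string | null>;

interface TransformResult {
  row: Record<string, unknown>;
  errors: string[];
}

// Accepts "YYYY-MM-DD", "YYYY-MM-DD HH:MM[:SS[.fff]]" or the "T" form, with optional Z / +HH:MM.
const ISO_LIKE = /^(\d{4}-\d{2}-\d{2})(?:[T ](\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?))?(Z|[+-]\d{2}:\d{2})?$/;

function parseSqliteTimestamp(text: string): Date | null {
  const m = ISO_LIKE.exec(text.trim());
  if (!m) return null;
  const date = m[1];
  const time = m[2];
  const zone = m[3];
  // SQLite's CURRENT_TIMESTAMP is UTC with no zone suffix, so treat a missing zone as UTC.
  const iso = time ? `${date}T${time}${zone ?? "Z"}` : date;
  const ms = Date.parse(iso);
  return Number.isNaN(ms) ? null : new Date(ms);
}

function transformRow(raw: RawRow, jsonColumns: string[], timestampColumns: string[]): TransformResult {
  const row: Record<string, unknown> = { ...raw };
  const errors: string[] = [];
  for (const col of jsonColumns) {
    const text = raw[col];
    if (text == null) continue; // NULL stays NULL
    try {
      row[col] = JSON.parse(text);
    } catch {
      errors.push(`${col}: invalid JSON`);
    }
  }
  for (const col of timestampColumns) {
    const text = raw[col];
    if (text == null) continue;
    const parsed = parseSqliteTimestamp(text);
    if (parsed) row[col] = parsed;
    else errors.push(`${col}: unparseable timestamp "${text}"`);
  }
  return { row, errors };
}
```

One binding detail if the driver is node-postgres: it converts JavaScript arrays into Postgres arrays, so re-serialise parsed JSON with `JSON.stringify` before binding it to a jsonb parameter.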
Why not Hyperscale (Citus) or Azure SQL
- Hyperscale (Citus) is overkill until we shard. Flexible Server scales vertically well into the hundreds of GB and tens of thousands of TPS, which is well past where we will need a redesign for unrelated reasons.
- Azure SQL would mean rewriting the schema for T-SQL idioms (no jsonb, different default semantics). Postgres is closer to SQLite in spirit.
5. Blob Storage — what becomes blob-native vs stays code-resident
This is the question that needs the most defending. The brief said "all file native elements in here, prompts, workflows etc" go to Blob. I want to push back on part of that.
Recommendation: split by content authority
Stays code-resident (versioned in git, shipped with the container image):
- src/agents/ — the 67 agent prompts. They are the product; they belong with the code that calls them.
- src/workflows/templates/ — the 9 workflow definitions. Same argument.
- SOUL.md — the default firm soul. Per-user override already comes from the database; the default is a code asset.
- src/mcp/tools/ — MCP tool implementations.
Why not Blob?
- Change control. Today, modifying an agent prompt requires a pull request, reviewed and merged. Moving them to Blob means anyone with Blob write can change agent behaviour silently, in production, without leaving a trace in source control.
- Atomic deploys. Code + prompts ship together. If a prompt change breaks a workflow, you roll back one artifact. With Blob-resident prompts, the prompt version and the code version drift.
- Local development. Devs can run the engine on their machine without provisioning Blob, importing seed data, or syncing with prod.
- Test fixtures. The existing test suite (1,677 tests, tests/) reads prompts directly from disk. Moving them to Blob means mocking Blob in tests, which is pure overhead.
Goes to Blob (user-generated, runtime-mutable):
| Category | Container | Lifecycle |
|---|---|---|
| Uploaded source documents | lavern-uploads/<user>/<sessionId>/ | Soft-delete after session archive; hard-delete after retention window. |
| Final deliverables | lavern-deliveries/<user>/<sessionId>/ | Retained per matter retention policy. |
| Audit bundles | lavern-audit/<user>/<sessionId>/ | Retain per the user's data class (regulated → cold tier, 7y). |
| KB document originals | lavern-kb-originals/<user>/<collectionId>/ | Until the KB doc is deleted. The chunks and vectors live in Postgres; the original file lives in Blob. |
| Derivatives (HTML/DOCX/PDF renders) | lavern-derivatives/<user>/<sessionId>/ | Soft-delete after 30 days; users can regenerate. |
| Log archives | lavern-logs/ | Cool tier after 30d; archive tier after 180d. |
| Custom agent avatars / share OG images | lavern-public/ | Public read; the only public container. |
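The tiering in the Lifecycle column maps onto an Azure Storage lifecycle management policy. A minimal sketch that builds the policy document for the log-archive row (cool after 30 days, archive after 180), using the documented rule shape (`definition.actions.baseBlob` with `daysAfterModificationGreaterThan`, scoped by `prefixMatch`); the helper and rule names are hypothetical, and the JSON could be applied with `az storage account management-policy create --policy @policy.json`.

```typescript
// Builds an Azure Storage lifecycle management policy document. Rule and helper names are
// hypothetical; the JSON shape follows the management-policy schema.
interface TierRule {
  name: string;
  prefix: string; // "<container>/" or "<container>/<path prefix>"
  coolAfterDays?: number;
  archiveAfterDays?: number;
  deleteAfterDays?: number;
}

type DayAction = { daysAfterModificationGreaterThan: number };

function lifecyclePolicy(rules: TierRule[]) {
  return {
    rules: rules.map((r) => {
      const baseBlob: Record<string, DayAction> = {};
      if (r.coolAfterDays !== undefined) baseBlob["tierToCool"] = { daysAfterModificationGreaterThan: r.coolAfterDays };
      if (r.archiveAfterDays !== undefined) baseBlob["tierToArchive"] = { daysAfterModificationGreaterThan: r.archiveAfterDays };
      if (r.deleteAfterDays !== undefined) baseBlob["delete"] = { daysAfterModificationGreaterThan: r.deleteAfterDays };
      return {
        enabled: true,
        name: r.name,
        type: "Lifecycle",
        definition: {
          actions: { baseBlob },
          filters: { blobTypes: ["blockBlob"], prefixMatch: [r.prefix] },
        },
      };
    }),
  };
}

// The log-archive row: cool after 30 days, archive after 180 days, never deleted by policy.
const policy = lifecyclePolicy([
  { name: "logs-tiering", prefix: "lavern-logs/", coolAfterDays: 30, archiveAfterDays: 180 },
]);
console.log(JSON.stringify(policy, null, 2));
```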
Files that currently write to disk and need to move
From a grep over src/ (writeFileSync / mkdirSync):
| File | What it writes today | Azure target |
|---|---|---|
| src/utils/logger.ts | Rotating log files with gzip | Application Insights (live), Blob lavern-logs/ (archive). |
| src/utils/audit-persistence.ts | Audit log JSONL | Blob lavern-audit/; query via Log Analytics if we want it searchable. |
| src/workflows/executor.ts | Intermediate artefacts | Blob lavern-deliveries/<user>/<sessionId>/intermediate/. |
| src/mcp/tools/baselines.ts | Baseline JSON | Blob lavern-baselines/, keyed by user. |
| src/mcp/tools/report-card.ts | Final report | Blob lavern-deliveries/. |
| src/mcp/tools/legal-md-compiler.ts | Compiled markdown | Blob lavern-deliveries/. |
| src/providers/mistral-executor.ts | Provider artefacts | Blob lavern-deliveries/. |
| src/providers/local-executor.ts | Local-pipeline artefacts | Blob lavern-deliveries/. |
| src/api/middleware/validation.ts | Captured invalid-request bodies | Application Insights (debug) only — no Blob. |
Access pattern
Container Apps uses Managed Identity with a Storage Blob Data Contributor role on the containers it needs. No connection strings, no shared keys.
Front-end direct upload via user-delegation SAS issued by the API after authorisation — the file never traverses the API process for the upload, only for orchestration.
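A minimal sketch of the API-side half of that flow: authorise the caller, then narrow the grant to one blob path, create/write permission only, and a short expiry. `OwnsSession`, the 15-minute lifetime and the path-segment rules are assumptions; turning the grant into an actual user-delegation SAS is the Storage SDK's job and isn't shown.

```typescript
// API-side decision behind a direct upload: authorise, then scope the grant to one blob path,
// create/write permissions only, and a short lifetime.
interface UploadGrant {
  container: string;
  blobName: string;    // <user>/<sessionId>/<file>, matching the lavern-uploads layout
  permissions: string; // "cw" = create + write; no read, list or delete
  expiresOn: Date;
}

// Hypothetical ownership check; in Lavern this would consult the session/matter records.
type OwnsSession = (userId: string, sessionId: string) => boolean;

// Path segments must not contain "/" so a caller cannot escape its own prefix.
const SAFE_SEGMENT = /^[A-Za-z0-9._-]{1,200}$/;

function authoriseUpload(
  userId: string,
  sessionId: string,
  fileName: string,
  ownsSession: OwnsSession,
  now: Date = new Date(),
  ttlMinutes = 15,
): UploadGrant {
  for (const segment of [userId, sessionId, fileName]) {
    if (!SAFE_SEGMENT.test(segment) || segment.startsWith(".")) {
      throw new Error(`invalid path segment: ${segment}`);
    }
  }
  if (!ownsSession(userId, sessionId)) {
    throw new Error("forbidden: session not owned by caller");
  }
  return {
    container: "lavern-uploads",
    blobName: `${userId}/${sessionId}/${fileName}`,
    permissions: "cw",
    expiresOn: new Date(now.getTime() + ttlMinutes * 60 * 1000),
  };
}
```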
6. Vector + search — Postgres pgvector
Decision (supersedes the original v1 draft of this section). Vectors live in Postgres via pgvector, not Azure AI Search. This keeps KB retrieval in one store with the relational data, removes the AI Search service (and its ~$75/mo floor), gives genuine feature parity with today's lexical KB, and adds semantic recall on top. pgvector is an allow-listed extension on Flexible Server: add VECTOR to the azure.extensions server parameter, then run CREATE EXTENSION vector; in the database.
Where the KB lives
The pipeline keeps its shape: src/knowledge-base/indexer.ts parses and section-chunks; src/knowledge-base/retriever.ts runs retrieval. The SQLite FTS5 index becomes a Postgres tsvector column; a new pgvector column adds the semantic half. No separate search service.
kb_chunks schema (Postgres)
| Column | Type | Notes |
|---|---|---|
| id | text (PK) | <documentId>_<chunkIndex>. |
| document_id | text (FK → kb_documents) | |
| collection_id | text (indexed) | Filter. |
| user_id | text (indexed) | Security-trim every query. |
| heading | text | Section heading. |
| content | text | Full chunk text. |
| content_tsv | tsvector (GIN index) | Lexical half, DB-generated via to_tsvector over heading and content. |
| embedding | vector(1536) (HNSW index with vector_cosine_ops, matching the <=> operator used in queries) | Semantic half: Azure OpenAI text-embedding-3-small. |
| doc_type, jurisdiction | text (indexed) | Filters. |
| level, word_count | int | |
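From the application, the embedding column is bound as pgvector's text form, a bracketed comma-separated list such as `[0.1,0.2,0.3]`. A minimal sketch of serialising an embedding with a dimension check against `vector(1536)`, so a model or dimension mismatch fails before the `INSERT`; `toPgvectorLiteral` is a hypothetical helper.

```typescript
// Serialises an embedding into pgvector's text input format ("[x1,x2,...]"), checking the
// dimension against the vector(1536) column before the INSERT rather than inside Postgres.
const EMBEDDING_DIMS = 1536; // text-embedding-3-small

function toPgvectorLiteral(embedding: readonly number[], dims: number = EMBEDDING_DIMS): string {
  if (embedding.length !== dims) {
    throw new Error(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("pgvector rejects NaN and infinite values");
  }
  return `[${embedding.join(",")}]`;
}

console.log(toPgvectorLiteral([0.5, -1, 2], 3)); // [0.5,-1,2]
```

The resulting string is bound as an ordinary parameter and cast with `::vector`, the same cast the hybrid query applies to the query embedding.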
Embedding model — Azure OpenAI text-embedding-3 (decided)
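This is the one place Azure OpenAI enters the architecture; agent reasoning stays on Anthropic (§13). text-embedding-3-small (1,536-dim) is the default cost/quality pick; text-embedding-3-large (up to 3,072-dim, reducible) if retrieval quality demands it. pgvector's HNSW index supports at most 2,000 dimensions on the vector type, so moving to -large means requesting reduced dimensions from the model or indexing a halfvec cast, and in either case changing the vector(1536) column and its index. The AOAI endpoint sits behind a Private Endpoint; authenticate via Managed Identity (preferred) or a Key Vault–held key.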
Indexing pipeline
Keep our section-aware chunker — the structure detector in src/documents/structure-detector.ts does real work for legal documents and a generic chunker would degrade citations. On blob upload, Event Grid → Service Bus → KEDA-scaled lavern-jobs run:
- Parse + section-chunk (existing code).
- Call Azure OpenAI embeddings for each chunk (batched).
- INSERT chunk rows into Postgres; content_tsv is DB-generated, embedding carries the vector.
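The embed-and-insert steps can be sketched as a batched loop. This assumes a hypothetical `EmbedBatch` function standing in for the Azure OpenAI embeddings call (which accepts an array of inputs per request) and a batch size of 16 picked for illustration rather than taken from the deployment's limits. Embedding heading plus content is also an assumption, chosen to mirror the lexical column. Rows come back ready to insert, with `content_tsv` left to the database.

```typescript
interface Chunk {
  documentId: string;
  chunkIndex: number;
  heading: string;
  content: string;
}

interface ChunkRow extends Chunk {
  id: string;          // <documentId>_<chunkIndex>, as in the kb_chunks table
  embedding: number[];
}

// Stand-in for the Azure OpenAI embeddings call: returns one vector per input, in input order.
type EmbedBatch = (inputs: string[]) => Promise<number[][]>;

async function embedChunks(chunks: Chunk[], embed: EmbedBatch, batchSize = 16): Promise<ChunkRow[]> {
  const rows: ChunkRow[] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    // Assumption: embed heading + content, mirroring what the lexical column indexes.
    const vectors = await embed(batch.map((c) => `${c.heading}\n${c.content}`));
    if (vectors.length !== batch.length) {
      throw new Error(`expected ${batch.length} embeddings, got ${vectors.length}`);
    }
    batch.forEach((c, j) => {
      rows.push({ ...c, id: `${c.documentId}_${c.chunkIndex}`, embedding: vectors[j] });
    });
  }
  return rows;
}
```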
Hybrid query
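One Postgres query fuses lexical and vector retrieval. Lexical rank comes from ts_rank_cd over content_tsv, semantic rank from embedding <=> $3 (cosine distance), and the two rankings are combined with Reciprocal Rank Fusion:

```sql
-- $1 = query text, $2 = user id, $3 = query embedding in pgvector text form
WITH lex AS (
  SELECT id,
         row_number() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery($1)) DESC) AS r
  FROM kb_chunks
  WHERE user_id = $2 AND content_tsv @@ plainto_tsquery($1)
  ORDER BY r
  LIMIT 40
),
vec AS (
  SELECT id,
         row_number() OVER (ORDER BY embedding <=> $3::vector) AS r
  FROM kb_chunks
  WHERE user_id = $2
  ORDER BY embedding <=> $3::vector
  LIMIT 40
)
-- Reciprocal Rank Fusion: sum 1 / (k + rank) over both branches (k = 60), keep the top 10
SELECT id,
       coalesce(1.0 / (60 + lex.r), 0) + coalesce(1.0 / (60 + vec.r), 0) AS score
FROM lex
FULL OUTER JOIN vec USING (id)
ORDER BY score DESC
LIMIT 10;
```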
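user_id is filtered in every branch, so the user-scoping guarantee in retriever.ts is preserved. One caveat on the vector side: pgvector applies the filter after the approximate HNSW scan, which visits hnsw.ef_search candidates (40 by default), so a user with few chunks in a large shared index can get fewer than 40 vector hits; pgvector 0.8's iterative index scans (hnsw.iterative_scan) are the standard remedy. The existing legal-synonym expansion stays as pre-processing on the lexical side; the n-gram re-rank is superseded by the vector half.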
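Reciprocal Rank Fusion is simple enough to mirror in TypeScript, which is handy as a test oracle for the SQL or for fusing results if the two branches ever run as separate queries. A minimal sketch with the conventional constant k = 60; `reciprocalRankFusion` is a hypothetical helper, not existing retriever code.

```typescript
// Reciprocal Rank Fusion: every ranked list adds 1 / (k + rank) for each id it contains
// (rank starts at 1); ids absent from a list get nothing from it. k = 60 is the usual constant.
interface Fused {
  id: string;
  score: number;
}

function reciprocalRankFusion(rankings: string[][], k = 60, topK = 10): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id)) // deterministic tie-break by id
    .slice(0, topK);
}

const lexical = ["a", "b", "c"];  // ids in ts_rank_cd order
const semantic = ["c", "a", "d"]; // ids in cosine-distance order
console.log(reciprocalRankFusion([lexical, semantic]).map((r) => r.id)); // [ 'a', 'c', 'b', 'd' ]
```

An id that ranks well in both lists ("a", "c") beats one that tops only a single list, which is the property that makes RRF robust without calibrating the two score scales against each other.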
Scale note
pgvector with an HNSW index serves into the millions of vectors before tuning pressure — well beyond a single-region Lavern KB. If the corpus ever outgrows that, Azure AI Search becomes a re-evaluation, not a v1 requirement.
7. Async messaging — Service Bus + Event Grid
Where Service Bus replaces the in-process EventBus
The current EventBus in src/events/event-bus.ts is a Node.js EventEmitter. Two consumer classes today:
- WebSocket fan-out: clients subscribed to /api/sessions/:id/events receive every event for that session.
- In-process logger / audit / cost tracker.
The EventBus stays exactly as it is inside a single pod. We add an outbound bridge for events that need to cross pods.
| Event type | Crosses pods? | Goes to Service Bus? |
|---|---|---|
| session_start, session_end | Yes (worker → api) | Yes: topic lavern-sessions |
| agent_start, agent_stop | Maybe (if WS pod ≠ worker pod) | Yes: topic lavern-agent-activity |
| finding_posted, challenge_posted, response_posted | Yes | Yes |
| debate_resolved | Yes | Yes |
| gate_requested, gate_decided | Yes: gate decided in api, worker waiting | Yes: queue (point-to-point) lavern-gates |
| verification_run | No (worker-local) | No |
| tool_used | No | No (App Insights metric instead) |
Topology: Service Bus topic lavern-events with subscriptions per consumer (WebSocket fan-out, audit, cost). Service Bus queue lavern-gates for gate-decision request/reply, because the worker is genuinely blocked on it.
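Keeping the routing table in code as one exhaustive lookup means a new event type cannot ship without an explicit cross-pod decision, because the `Record` type below fails to compile if a type is missing. A minimal sketch; the `Route` shape is hypothetical, the destinations follow the table, and sending rows marked only "Yes" to `lavern-events` is an assumption.

```typescript
type Route =
  | { kind: "topic"; name: string }
  | { kind: "queue"; name: string }
  | { kind: "local" }   // stays on the in-process EventBus only
  | { kind: "metric" }; // App Insights metric instead of a message

type LavernEventType =
  | "session_start" | "session_end"
  | "agent_start" | "agent_stop"
  | "finding_posted" | "challenge_posted" | "response_posted"
  | "debate_resolved"
  | "gate_requested" | "gate_decided"
  | "verification_run" | "tool_used";

// Destination for rows the table marks "Yes" without naming a topic (assumption).
const DEFAULT_TOPIC: Route = { kind: "topic", name: "lavern-events" };

// Record<...> forces every event type to have an entry, so omissions fail at compile time.
const ROUTES: Record<LavernEventType, Route> = {
  session_start: { kind: "topic", name: "lavern-sessions" },
  session_end: { kind: "topic", name: "lavern-sessions" },
  agent_start: { kind: "topic", name: "lavern-agent-activity" },
  agent_stop: { kind: "topic", name: "lavern-agent-activity" },
  finding_posted: DEFAULT_TOPIC,
  challenge_posted: DEFAULT_TOPIC,
  response_posted: DEFAULT_TOPIC,
  debate_resolved: DEFAULT_TOPIC,
  gate_requested: { kind: "queue", name: "lavern-gates" },
  gate_decided: { kind: "queue", name: "lavern-gates" },
  verification_run: { kind: "local" },
  tool_used: { kind: "metric" },
};

// True when the outbound bridge must forward the event to Service Bus.
function crossesPods(type: LavernEventType): boolean {
  const route = ROUTES[type];
  return route.kind === "topic" || route.kind === "queue";
}
```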
Where Event Grid fits
- Blob upload triggers: user uploads a KB document → Event Grid → Service Bus → lavern-jobs KEDA-scales and runs the indexer.
- Same for derivatives: user requests a DOCX render → enqueue → job runs → blob lands → Event Grid → API tells the client via WS.
- Stripe webhooks route through Event Grid too if we want guaranteed retry and dead-lettering.
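For the KB trigger, the job only has to recognise a `Microsoft.Storage.BlobCreated` event on the originals container and recover the user and collection from the blob path, whose Event Grid `subject` has the form `/blobServices/default/containers/<container>/blobs/<path>`. A minimal sketch; the `IndexJob` message shape is hypothetical, and only the event fields used here are modelled.

```typescript
// Only the Event Grid event fields this handler uses.
interface GridEvent {
  id: string;
  eventType: string;
  subject: string;
  data: { url: string };
}

// Hypothetical message enqueued for the lavern-jobs indexer.
interface IndexJob {
  userId: string;
  collectionId: string;
  blobPath: string;
  sourceEventId: string; // Event Grid id, usable as a deduplication key
}

const BLOB_SUBJECT = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/;

function toIndexJob(ev: GridEvent): IndexJob | null {
  if (ev.eventType !== "Microsoft.Storage.BlobCreated") return null;
  const m = BLOB_SUBJECT.exec(ev.subject);
  if (!m || m[1] !== "lavern-kb-originals") return null;
  const blobPath = m[2];
  // Expected layout: <user>/<collectionId>/<file...>
  const [userId, collectionId, ...rest] = blobPath.split("/");
  if (!userId || !collectionId || rest.length === 0) return null;
  return { userId, collectionId, blobPath, sourceEventId: ev.id };
}
```

Carrying the Event Grid event id into the job gives the indexer a deduplication key, which matters because Event Grid delivers at least once.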
Why both, not just one
Service Bus is the right answer for guaranteed-delivery work scheduling. Event Grid is the right answer for fan-out at high event rates with cheap-per-event pricing. They compose — Event Grid feeds Service Bus for our heavy patterns.
8. Sessions — the hardest design call
Live sessions are an in-process Map today in src/session/session-manager.ts. TTL eviction, 100-session cap per pod. When the engagement completes, state is archived to SQLite.
When we split api / worker / jobs, sessions need a home that isn't "the api pod that happened to handle the first request."
Three options, in increasing strangeness
Option A — Sticky to api pod (no shared state). Affinity cookie binds a client to one api pod for the life of the engagement. The worker that runs orchestration is that same pod. Simplest; we keep SessionState exactly as it is.
- Pro: zero code change.
- Con: doesn't horizontally scale per session; api revision can't recycle until session expires; loses one core benefit of splitting the worker out.
Option B — Worker owns the session; api is stateless.
Each engagement lives in a single lavern-worker replica. The api pod that has the WebSocket subscribes to a Service Bus topic filtered by sessionId. SessionState exists only in the worker.
- Pro: clean separation; api is fully stateless and scales freely.
- Con: gate-decision path is async — api receives the decision and posts to Service Bus, worker picks it up. We already model this with src/gates/gate-resolver.ts, so the lift is bounded.
Option C — Externalised SessionState (Redis).
Move the SessionState object into Azure Cache for Redis. Any pod can pick up any session.
- Pro: maximum elasticity.
- Con: serialisation cost on every read/write; the SessionState is currently a rich object graph not designed for this.
Recommendation: Option B — worker owns the session, api is stateless. This is the cleanest target architecture and the gate-decision asynchrony is already most of the way there.
Drop the in-process Map's role of "the live store"; replace it with a per-pod working set scoped to that worker's active sessions. Archive on completion goes to Postgres just as today.
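On the worker, Option B comes down to a per-pod working set plus a way for an orchestration loop to park on a gate until its decision arrives from `lavern-gates`. A minimal sketch, assuming a hypothetical `GateDecision` message shape and leaving out the Service Bus receiver that would call `deliver`, gate timeouts, and recovery of pending gates after a pod restart.

```typescript
interface GateDecision {
  sessionId: string;
  gateId: string;
  approved: boolean;
  decidedBy: string;
}

// Per-pod working set: only the sessions this worker replica currently owns.
class WorkerSessions {
  private readonly active = new Set<string>();
  private readonly waiting = new Map<string, (decision: GateDecision) => void>();

  claim(sessionId: string): void {
    this.active.add(sessionId);
  }

  release(sessionId: string): void {
    this.active.delete(sessionId);
  }

  // Called by the orchestration loop; settles when the matching decision is delivered.
  waitForGate(sessionId: string, gateId: string): Promise<GateDecision> {
    if (!this.active.has(sessionId)) {
      return Promise.reject(new Error(`session ${sessionId} is not owned by this replica`));
    }
    return new Promise<GateDecision>((resolve) => {
      this.waiting.set(`${sessionId}:${gateId}`, resolve);
    });
  }

  // Called for each decision received from the gates queue. Returns false when nothing here is
  // waiting for it, so the caller can abandon the message rather than complete it.
  deliver(decision: GateDecision): boolean {
    const key = `${decision.sessionId}:${decision.gateId}`;
    const resolve = this.waiting.get(key);
    if (!resolve) return false;
    this.waiting.delete(key);
    resolve(decision);
    return true;
  }
}
```

One routing caveat: with competing consumers on a single queue, a decision can reach a replica that does not own the session (here `deliver` returns `false`). Service Bus message sessions, with the message's session ID set to the Lavern session id, are one way to pin delivery to the owning receiver.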
9. Identity — Entra ID workforce / B2B
What changes in the codebase
Currently src/api/middleware/auth.ts handles cookie + Bearer auth and runs a no-op LOCAL-MODE. The auth-shaped routes (auth-routes.ts, google-auth.ts) only register when LAVERN_AUTH_ENABLED=true.
For Entra ID:
- Replace the cookie/Bearer middleware with MSAL.js on the frontend (auth-code flow with PKCE) and JWT validation on the backend against the tenant's published signing keys (passport-azure-ad, or a generic JWT/JWKS library). @azure/identity is for acquiring tokens (the Managed Identity side below), not for validating incoming ones.
- The users table sticks around as the application-side mirror of Entra users, keyed by Entra oid and populated on first sign-in. We still need it because Lavern's user_usage, billing_events, matters and kb_collections all FK to users.id.
- Authorisation (what a user can do) stays application-side: Entra tells us who, Lavern decides what. Entra app roles are optional; we can also keep our own roles table.
- Service-to-service authentication uses Managed Identity end-to-end: Container Apps to Postgres, Blob, Azure OpenAI (embeddings), Service Bus and Key Vault. No credentials for these hops sit in any environment variable.
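Whichever library verifies the signature, the application still checks the claims that bind a token to this API and tenant, then uses `oid` to find or create the `users` mirror row. A minimal sketch over already-verified claims, assuming a single-tenant app registration, v2.0 access tokens (issuer `https://login.microsoftonline.com/<tenantId>/v2.0`) and a five-minute clock skew; it does not check signatures, and `checkEntraClaims` is a hypothetical helper.

```typescript
// Claim checks applied after a JWT library has verified the token's signature against the
// tenant's signing keys. Assumes a single-tenant app and v2.0 access tokens.
interface EntraClaims {
  iss: string;
  aud: string;
  tid: string;
  oid: string;
  exp: number;  // seconds since epoch
  nbf?: number; // seconds since epoch
}

interface EntraUserKey {
  tid: string;
  oid: string;
}

function checkEntraClaims(
  c: EntraClaims,
  tenantId: string,
  apiClientId: string,
  nowSec: number,
  skewSec = 300,
): EntraUserKey {
  if (c.iss !== `https://login.microsoftonline.com/${tenantId}/v2.0`) throw new Error("unexpected issuer");
  if (c.tid !== tenantId) throw new Error("token issued by another tenant");
  if (c.aud !== apiClientId) throw new Error("token not issued for this API");
  if (nowSec > c.exp + skewSec) throw new Error("token expired");
  if (c.nbf !== undefined && nowSec < c.nbf - skewSec) throw new Error("token not yet valid");
  if (!c.oid) throw new Error("missing oid claim");
  return { tid: c.tid, oid: c.oid }; // lookup/upsert key for the users mirror table
}
```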
Edge cases
- API clients (api_clients table): these are non-human callers (webhooks, integrations). Keep the existing API-key model; rotate via Key Vault references. Don't try to put non-humans into Entra unless they belong there.
- OAuth refresh tokens (user_tokens): if we still integrate Google Drive / Gmail / Calendar for the user, those stay on the existing OAuth flow; Entra is identity, not third-party authorisation.
- Stripe customer mapping: users.stripe_customer_id survives unchanged.
10. Secrets — Key Vault
Today's secret surface from a grep over process.env:
| Secret | Source today | Azure target |
|---|---|---|
| Anthropic API key | ANTHROPIC_API_KEY env | Key Vault, referenced by Container Apps secret. |
| Mistral API key | MISTRAL_API_KEY env | Key Vault. |
| Stripe secret key, webhook secret | env | Key Vault. |
| SMTP credentials | env | Key Vault. |
| Database connection | conn string in env | Managed Identity; no string at all. |
| Azure OpenAI embeddings key | n/a (new) | Managed Identity to AOAI, or a Key Vault–held key if MI isn't available. |
| Telegram bot token (Clawern) | env on user's machine | Stays on user's machine; not migrated. |
| Storage account keys | n/a | Managed Identity. |
| LAVERN_MANAGED_AGENTS_BRIDGE_SECRET | env | Key Vault. |
Container Apps resolves these as Key Vault references; with versionless secret URIs, rotations propagate to running revisions without redeploys.
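In the Container Apps resource definition, each Key Vault-backed secret is declared under `configuration.secrets` with a `keyVaultUrl` and the identity used to read it, and containers consume it through `env[].secretRef`. A minimal sketch that builds those two fragments from the inventory above; the vault name, secret names and the choice of the system-assigned identity (`"System"`) are assumptions, and `keyVaultSecretConfig` is a hypothetical helper.

```typescript
interface SecretBinding {
  envVar: string;          // variable name the app already reads (code unchanged)
  vaultSecretName: string; // secret name in Key Vault (lower-case, hyphenated)
}

// Builds the configuration.secrets and template.containers[].env fragments of a Container App.
function keyVaultSecretConfig(vaultName: string, bindings: SecretBinding[]) {
  return {
    secrets: bindings.map((b) => ({
      name: b.vaultSecretName, // Container Apps secret names: lower-case letters, digits and '-'
      // Versionless secret URI, so the app follows the latest version after a rotation.
      keyVaultUrl: `https://${vaultName}.vault.azure.net/secrets/${b.vaultSecretName}`,
      identity: "System", // the app's system-assigned managed identity
    })),
    env: bindings.map((b) => ({ name: b.envVar, secretRef: b.vaultSecretName })),
  };
}

const fragment = keyVaultSecretConfig("lavern-prod-kv", [
  { envVar: "ANTHROPIC_API_KEY", vaultSecretName: "anthropic-api-key" },
  { envVar: "MISTRAL_API_KEY", vaultSecretName: "mistral-api-key" },
  { envVar: "LAVERN_MANAGED_AGENTS_BRIDGE_SECRET", vaultSecretName: "managed-agents-bridge-secret" },
]);
console.log(JSON.stringify(fragment, null, 2));
```

The application keeps reading `process.env.ANTHROPIC_API_KEY` as today; only where the value comes from changes.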
11. Observability — Application Insights + Log Analytics
Today: src/utils/logger.ts writes structured JSON to stdout and a rotating local file. src/utils/sentry.ts captures errors to Sentry if configured.
Azure target:
- Application Insights SDK initialised at process start. The current createLogger becomes a thin adapter that emits structured trace events to App Insights.
- OpenTelemetry-based instrumentation for Fastify, the HTTP client and SDK calls; App Insights ingests it through its OpenTelemetry connector.
- Custom events for product analytics: session_started, workflow_completed, verification_failed, fed off the EventBus.
- Log Analytics workspace behind App Insights for KQL queries and dashboards.
- Sentry stays as the error capture channel if we want richer crash grouping than App Insights provides; otherwise drop it.
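Cost note: App Insights ingestion is the most variable cost in this whole architecture. Sample verbose levels at 50% and keep 100% of warnings/errors; because App Insights ingestion sampling applies one rate to the whole resource, the per-level split has to happen in the SDK/telemetry pipeline. Disable in dev.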
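The sampling split can be made deterministic per operation, so a sampled request keeps all of its verbose traces rather than a random half. A minimal sketch, assuming a hypothetical `Level` set (with info counted as verbose) and a string operation id, hashed with 32-bit FNV-1a; in practice this would live in the SDK's sampler or a telemetry processor rather than be hand-rolled.

```typescript
// Level-aware, per-operation sampling: warnings and above are always kept; verbose levels are
// kept for a fixed fraction of operations, chosen by hashing the operation id so that every
// event from one operation gets the same decision.
type Level = "trace" | "debug" | "info" | "warn" | "error" | "fatal";

const ALWAYS_KEEP: ReadonlySet<Level> = new Set<Level>(["warn", "error", "fatal"]);

// 32-bit FNV-1a over UTF-16 code units; fine for bucketing, not for security.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function shouldKeep(level: Level, operationId: string, verboseRate = 0.5): boolean {
  if (ALWAYS_KEEP.has(level)) return true;
  return fnv1a(operationId) / 0x100000000 < verboseRate; // maps the hash into [0, 1)
}
```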
12. Networking — Front Door + Private Endpoints
- Azure Front Door Premium in front of Container Apps for TLS termination, WAF, global anycast, and managed certs on the custom domain.
- Private Endpoints for Postgres, Azure OpenAI, Blob, Key Vault. None of these are reachable from the public internet.
- VNET integration on Container Apps — outbound goes through the VNET; inbound is via the Container Apps ingress, which Front Door fronts.
- WebSocket through Front Door works without modification.
- Egress allow-list: api.anthropic.com, api.mistral.ai, api.stripe.com, accounts.google.com if we keep Google OAuth. Everything else blocked at the firewall.
13. LLM provider plane
Stays on Anthropic direct via @anthropic-ai/sdk. The dual-provider abstraction in src/providers/ keeps Mistral as the EU-sovereign option.
Azure OpenAI — embeddings only. AOAI is not added to the agent provider set; the model behind AOAI is OpenAI's, not Anthropic's, and swapping Claude for GPT would undercut the product. The single use of AOAI is generating KB chunk embeddings for pgvector (§6) — a non-reasoning, mechanical call. Agent inference stays on Anthropic; Mistral remains the EU-sovereign agent option. If a future customer requires AOAI for inference specifically, the provider abstraction is already in place to add it.
14. CI/CD & infrastructure-as-code
- Bicep (or Terraform if there's a house preference) for everything: Container Apps environment, registry, Postgres (with pgvector enabled via azure.extensions), Azure OpenAI (embeddings deployment), Blob, Service Bus, Key Vault, Front Door, App Insights, Log Analytics, networking.
- GitHub Actions building the container, pushing to Azure Container Registry, and deploying via az containerapp update --revision-suffix <sha>. Blue-green needs multiple-revision mode, with traffic shifted to the new revision once it is healthy.
- Per-environment: dev (single small SKU of everything), staging (mirror prod), prod.
- Database migrations: a separate lavern-migrate job that runs Postgres migrations via drizzle-kit or node-pg-migrate before the api revision swaps.
- Deployment credentials: GitHub Actions OIDC federation to Azure, so no service principal passwords in the repo.
15. Cost shape (rough OOM)
Per month, single region, modest load (say, ~50 active firms, ~5,000 documents indexed):
| Service | SKU | Est. monthly |
|---|---|---|
| Container Apps (api+worker+jobs) | 2 always-on small + bursts | $200–400 |
| Postgres Flexible Server (+ pgvector) | D2ds_v5 (2vCPU/8GB), bumped for vector index | $150 |
| Azure OpenAI embeddings | text-embedding-3-small, usage-based | $5–20 |
| Blob Storage | Hot tier ~500GB + ops | $30 |
| Service Bus | Standard tier | $10 |
| Front Door Premium | Base + per-request | $330 + traffic |
| Key Vault | Standard | <$5 |
| App Insights | 5GB/day cap | $80 |
| Total | | ~$810–1,075 |
Keeping vectors in pgvector removes the Azure AI Search line (was ~$250/mo S1) at the cost of a modest Postgres SKU bump for the HNSW index — a net saving of roughly $200–230/mo. Agent LLM spend lives separately (passes through to Anthropic / Mistral usage costs); embeddings are the only Azure-side model cost and are tiny.
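Summing the line items as ranges is a quick check on the table's total. A minimal sketch using the figures above, counting "<$5" as $0–5 and leaving Front Door traffic out because it has no estimate. On those figures the items come to $805–1,025 a month, so the stated ~$810–1,075 presumably leaves about $50 at the top end for traffic.

```typescript
interface CostLine {
  service: string;
  low: number;  // USD per month
  high: number; // USD per month
}

const monthly: CostLine[] = [
  { service: "Container Apps", low: 200, high: 400 },
  { service: "Postgres Flexible Server", low: 150, high: 150 },
  { service: "Azure OpenAI embeddings", low: 5, high: 20 },
  { service: "Blob Storage", low: 30, high: 30 },
  { service: "Service Bus", low: 10, high: 10 },
  { service: "Front Door Premium (base only)", low: 330, high: 330 },
  { service: "Key Vault", low: 0, high: 5 },
  { service: "App Insights", low: 80, high: 80 },
];

function totalRange(lines: CostLine[]): { low: number; high: number } {
  return lines.reduce((acc, l) => ({ low: acc.low + l.low, high: acc.high + l.high }), { low: 0, high: 0 });
}

const total = totalRange(monthly);
console.log(total); // { low: 805, high: 1025 }, before Front Door traffic
```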
Cost levers:
- Drop Front Door to Standard if WAF is overkill (-$200).
- Move audit + log archive to cool/archive tiers (-$10–20).
- Right-size Postgres up only when we see actual pressure; the vector index is the main memory driver.
16. Migration plan — three phases
Phase 0 — preparation (1–2 weeks)
- Add an explicit persistence interface to src/db/ so the engine doesn't talk to better-sqlite3 directly. SQLite and Postgres adapters sit behind the same shape.
- Add a blob abstraction for file IO: an interface that today wraps the local FS and tomorrow wraps @azure/storage-blob.
- Add a search abstraction wrapping the FTS5 layer; tomorrow it wraps the Postgres tsvector + pgvector hybrid query.
- These three abstractions land in the existing repo, behind feature flags. No Azure infra needed yet.
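The blob abstraction can be a handful of async methods keyed by container and path, so the local-filesystem implementation and a later `@azure/storage-blob` one are interchangeable. A minimal sketch with an in-memory implementation, useful in tests; `BlobStore` and its method names are hypothetical, not an existing Lavern interface.

```typescript
interface BlobStore {
  put(container: string, path: string, body: Uint8Array, contentType?: string): Promise<void>;
  get(container: string, path: string): Promise<Uint8Array | null>;
  list(container: string, prefix: string): Promise<string[]>;
  delete(container: string, path: string): Promise<boolean>;
}

interface StoredBlob {
  body: Uint8Array;
  contentType: string | undefined;
}

// In-memory implementation for tests; hypothetical local-FS and Azure-backed classes would
// implement the same interface.
class MemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, StoredBlob>();

  private key(container: string, path: string): string {
    return `${container}/${path}`;
  }

  async put(container: string, path: string, body: Uint8Array, contentType?: string): Promise<void> {
    // Copy so later mutation of the caller's buffer doesn't change the stored blob.
    this.blobs.set(this.key(container, path), { body: body.slice(), contentType });
  }

  async get(container: string, path: string): Promise<Uint8Array | null> {
    const stored = this.blobs.get(this.key(container, path));
    return stored ? stored.body.slice() : null;
  }

  async list(container: string, prefix: string): Promise<string[]> {
    const start = `${container}/${prefix}`;
    return Array.from(this.blobs.keys())
      .filter((k) => k.startsWith(start))
      .map((k) => k.slice(container.length + 1))
      .sort();
  }

  async delete(container: string, path: string): Promise<boolean> {
    return this.blobs.delete(this.key(container, path));
  }
}
```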
Phase 1 — Azure infra up, dual-write (2–3 weeks)
- Bicep stack deployed to a dev subscription.
- Container Apps running the api image; Postgres (with pgvector) live but with sample data only.
- Migration scripts to backfill from SQLite to Postgres in dev, including a one-time pass to generate embeddings for existing KB chunks via Azure OpenAI.
- Dual-write enabled for new sessions in a canary tenant.
- Validate WebSocket through Front Door, gate-decision through Service Bus, KB query through the Postgres hybrid search.
Phase 2 — cutover (1 week)
- Production data migration window.
- DNS swing to Front Door.
- Sentry alerting, App Insights dashboards live.
- SQLite kept read-only for 30 days as rollback insurance.
Phase 3 — decommission (1 week, 30+ days later)
- Remove SQLite path from the code.
- Retire local file write paths (audit, logger, baselines, deliveries) in favour of Blob.
- Lock SQLite snapshot to an archive.
17. Open questions / decisions still to make
- Multi-tenancy model. One shared Azure deployment (one subscription, one resource group, userId everywhere) or a subscription per firm? The shared deployment is much simpler and is the default unless a firm specifically asks for isolation.
- Data residency. Do we need to pin some firms to West Europe vs UK South vs East US? If yes, we need a per-tenant region attribute in the user record and routing logic in Front Door (Routes by header / cookie).
- Backup / DR. Postgres point-in-time restore on Flexible Server is built-in (7–35 days) and now covers the KB vectors too (one store). Do we want geo-redundant storage for Blob and a read replica for the DB? Probably yes for production.
- Audit-bundle retention. Some firms will want 7y retention with legal-hold immutability. Blob immutability policies cover this if we decide it's a hard product requirement.
- Web PubSub now or later. Sticky routing buys us months. Past a few thousand concurrent WebSocket sessions, the math flips.
- Cowork mode interop. The browser File System Access API path (viz/src/cowork/) keeps working unchanged — local files never touch the server. Still worth confirming the privacy promise survives the cloud-hosted positioning.
- Clawern → cloud KB. Should Clawern's precedent-board.ts optionally sync to the cloud KB for shared firm precedents, or stay strictly local? This is a product decision, not an architecture one.
Appendix A — file-by-file change summary
| File | Change |
|---|---|
| src/db/database.ts | Becomes the SQLite adapter; add Postgres adapter behind interface. |
| src/knowledge-base/indexer.ts | Adds batched Azure OpenAI embedding calls for its chunks; writes chunk rows with their embedding to the Postgres kb_chunks table (content_tsv is DB-generated). |
| src/knowledge-base/retriever.ts | FTS5 MATCH becomes a Postgres tsvector + pgvector hybrid (RRF) query; synonym expansion and user-scoping retained. |
| src/api/middleware/auth.ts | Replaced by Entra ID JWT validation. |
| src/api/server.ts | No change to handler shape; deploy target changes. |
| src/session/session-manager.ts | Moves to lavern-worker revision; api becomes stateless. |
| src/events/event-bus.ts | In-process EventEmitter kept; outbound bridge to Service Bus added. |
| src/gates/ | Async resolver becomes default; sync resolver retained for CLI. |
| src/utils/logger.ts | Becomes App Insights adapter. |
| src/utils/audit-persistence.ts | Writes to Blob lavern-audit/. |
| src/workflows/executor.ts | Intermediate artefacts to Blob; control flow unchanged. |
| src/api/routes/knowledge-base.ts | Upload returns a SAS URL; chunking + embedding + Postgres insert happen via Event Grid → job. |
| src/agents/ | No change — prompts stay code-resident. |
| src/workflows/templates/ | No change — templates stay code-resident. |
| SOUL.md | No change — default firm soul stays code-resident; per-user override stays in DB. |
| src/claw/ | No change — Clawern is out of scope; runs locally as today. |
| viz/src/cowork/ | No change — still uses File System Access API; local files never touch cloud. |
Appendix B — services not chosen, and why
| Considered | Why not |
|---|---|
| Azure SQL | Schema rewrite for T-SQL; Postgres is closer to SQLite ergonomically, has jsonb, and its ON CONFLICT … excluded.* upserts (already used in src/db/database.ts) port verbatim. It also lacks pgvector, which §6 depends on. |
| Azure AI Search | Would own vectors in a separate service (~$75/mo floor, ~$250 S1 at our tier). We keep vectors in pgvector — one store, feature-parity with today's lexical KB, semantic recall added in-DB. Re-evaluate only past millions of vectors. |
| Cosmos DB for KB chunks | Chunks + vectors live in Postgres; Cosmos would add a second store. |
| AKS | Operationally heavy; Container Apps covers our needs. |
| Azure Functions | Cold start hurts WebSocket and orchestration loops. Container Apps with KEDA gives us the autoscale without the cold-start tax. |
| App Service for Containers | WebSocket sticky-only; less elegant scale. |
| Azure OpenAI (for agent inference) | Gives OpenAI's models, not Anthropic's — swapping Claude for GPT undercuts the product. AOAI is used only for KB embeddings (§6), never agent reasoning. |
| API Management | Overkill for our REST surface. Front Door + Container Apps ingress suffices. Revisit if we expose a public partner API. |
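| Azure Communication Services / SignalR | Azure SignalR Service requires the SignalR hub protocol and client libraries, which our plain WebSocket clients don't use; Web PubSub serves plain WebSocket clients and is the broker we'd adopt (§3). Azure Communication Services targets voice, video, SMS and chat, not app event streaming. |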
| Azure Cache for Redis in v1 | Worker-owned sessions avoid the need; revisit if we move to fully stateless workers. |