Case study · Kodori · Private Beta

Building Kodori, AI-native document management.

"AI document management your auditor can defend." A cloud-native DMS for law firms and AEC teams where AI agents are the substrate — not a feature — with a hash-chained audit trail, deny-always-wins legal holds, an AEC vertical module, DLP + anomaly detection, and 77 typed MCP tools. Built on Next.js, Neon Postgres, and content-addressable blob storage.

77 MCP tools
50 DB tables
9 monorepo packages
125K+ lines of TS
Product: Kodori (kodori.ai)
Industry: Legal & AEC · DMS
Services: Full-stack, AI integration
Status: Private beta
The challenge

Document management built for litigation.

Mid-market legal and AEC firms run on documents — contracts, matters, drawings, specs, holds. The incumbents (iManage, NetDocuments, and other established DMS vendors) own the folder metaphor and bolt AI on top. When litigation hits, exports take weeks and audit trails are fragile.

Pain points identified

What's broken

  • Folder trees deep enough to lose anything; final_final_v3 chaos
  • Manual metadata entry across every document and matter
  • Legal holds enforced by policy, not software — mistakes are catastrophic
  • Retention schedules guessed at; auto-disposal too risky to enable
  • AI features bolted on; no way to ask "all NDAs from Q4 with Acme"
  • Audit trails exist but tampering is hard to prove or disprove

Goals defined

What we built

  • Make the AI agent the interface, not a feature — ingest, classify, file, retrieve
  • Replace folder trees with collections-as-views; documents live once
  • Hash-chained audit log: tamper-evident by construction
  • Deny-always-wins legal holds enforced at UI, MCP, and retention layers
  • Hybrid search: Postgres full-text + pgvector semantic via RRF
  • Reversible agent transactions: most mutations have an inverse
Our solution

AI agents as the substrate.

Kodori treats AI agents as the substrate of the DMS, not an upsell. Every action is captured as an immutable, hash-chained event. Documents are stored content-addressably so dedup is automatic. Legal holds, retention classes, collections, AEC artifacts (RFIs, submittals, drawings), DLP findings, and anomaly review are all first-class concepts the agent reads and writes through 77 typed MCP tools.

01

Hybrid Search via RRF

Postgres full-text and pgvector semantic embeddings (1536-dim text-embedding-3-small) run in parallel and combine via Reciprocal Rank Fusion. Plain-language queries, exact phrases, or concept-based retrieval — with the path that matched each hit.

  • pgvector HNSW index, cosine distance
  • Reciprocal Rank Fusion blending
  • Multi-format extraction (PDF, Office, Illustrator, email)
  • Per-result match-path indicators
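
The fusion step described above can be sketched as follows. This is a minimal illustration rather than Kodori's actual code: the function name `reciprocalRankFusion`, the `FusedHit` shape, and the smoothing constant `k = 60` (a commonly used default) are assumptions. Each ranked list contributes 1 / (k + rank) to a document's score, and the sketch records which path surfaced each hit.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over ranked lists of 1 / (k + rank of d).
type MatchPath = "keyword" | "semantic";

interface FusedHit {
  id: string;
  score: number;
  paths: MatchPath[]; // which retrieval path(s) surfaced this document
}

function reciprocalRankFusion(
  keywordIds: string[], // full-text results, ranked best-first
  semanticIds: string[], // vector results, ranked best-first
  k = 60, // smoothing constant (assumed default, not taken from the source)
): FusedHit[] {
  const hits = new Map<string, FusedHit>();
  const add = (ids: string[], path: MatchPath): void => {
    ids.forEach((id, index) => {
      const hit = hits.get(id) ?? { id, score: 0, paths: [] };
      hit.score += 1 / (k + index + 1); // ranks are 1-based
      hit.paths.push(path);
      hits.set(id, hit);
    });
  };
  add(keywordIds, "keyword");
  add(semanticIds, "semantic");
  // Highest fused score first.
  return [...hits.values()].sort((a, b) => b.score - a.score);
}

// doc-b appears in both lists, so it outranks documents found by one path only.
const fused = reciprocalRankFusion(["doc-a", "doc-b"], ["doc-b", "doc-c"]);
```

Because RRF uses ranks rather than raw scores, the full-text and vector results need no score normalization before blending.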
02

Agent-Driven Classification

Claude Haiku 4.5 proposes metadata after extraction — sensitivity, collection, keywords, document type. Every suggestion has Accept / Dismiss buttons; accepting writes a durable event so the audit log credits the human, not the agent.

  • Anthropic vision for PDFs and images
  • Pure-JS extractors for .docx / .xlsx / .pptx
  • Illustrator .ai sniffing for embedded PDF
  • Claude Opus 4.6 for reasoning, Haiku 4.5 for classification
03

Deny-Always-Wins Legal Holds

Bind documents to a matter or litigation hold and the system refuses to delete or downgrade sensitivity. The UI disables Delete, the retention queue disables disposal, and MCP tools enforce the same gate server-side. Three independent enforcement points.

  • Hold-aware UI controls
  • MCP tool guards mirror UI gates
  • Retention review queue respects holds
  • Object-lock on S3 / R2 backing storage
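
One way to picture the shared gate is a single decision function that every layer calls. The sketch below is hypothetical: `checkHoldGate`, the `HoldAction` names, and the `DocumentState` shape are illustrative assumptions, not Kodori's API. It shows only the deny-always-wins rule: an active hold overrides any permission grant for destructive actions.

```typescript
// Hypothetical action names; the source only states that deletion, sensitivity
// downgrades, and retention disposal are refused for held documents.
type HoldAction = "delete" | "downgradeSensitivity" | "dispose" | "rename";

interface DocumentState {
  id: string;
  activeHoldIds: string[]; // legal holds currently bound to this document
}

interface GateDecision {
  allowed: boolean;
  reason?: string;
}

const BLOCKED_BY_HOLD: ReadonlySet<HoldAction> = new Set<HoldAction>([
  "delete",
  "downgradeSensitivity",
  "dispose",
]);

function checkHoldGate(
  doc: DocumentState,
  action: HoldAction,
  hasPermission: boolean,
): GateDecision {
  // Deny wins: the hold is checked before, and regardless of, any grant.
  if (doc.activeHoldIds.length > 0 && BLOCKED_BY_HOLD.has(action)) {
    return { allowed: false, reason: `blocked by legal hold ${doc.activeHoldIds[0]}` };
  }
  return hasPermission
    ? { allowed: true }
    : { allowed: false, reason: "caller lacks permission" };
}

const heldDoc: DocumentState = { id: "doc-1", activeHoldIds: ["hold-7"] };
const deleteDecision = checkHoldGate(heldDoc, "delete", true);
```

Calling the same function from the UI, the MCP tool handlers, and the retention queue would keep the three enforcement points from drifting apart.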
04

Hash-Chained Audit Trail

Every consequential mutation appends a row to the events table. Each row's prev_hash is the SHA-256 of the previous event, so altering any earlier row breaks the chain and is detectable. The same chain backs SOC 2, 21 CFR Part 11, GDPR, and FRCP discovery exports.

  • Append-only events table
  • SHA-256 prev_hash linkage
  • Reversible agent actions with inverse events
  • Replay-safe projection engine for views
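
The prev_hash linkage can be illustrated with a small in-memory sketch. The field names, the JSON-based serialization, and the all-zero genesis value are assumptions made for illustration; the source states only that each row stores the SHA-256 of the previous event.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  seq: number;
  type: string;
  payload: string; // serialized event body (a canonical serialization is assumed)
  prevHash: string; // SHA-256 of the previous event, or GENESIS for the first row
}

const GENESIS = "0".repeat(64); // assumed placeholder for the first event

function hashEvent(e: AuditEvent): string {
  return createHash("sha256")
    .update(JSON.stringify([e.seq, e.type, e.payload, e.prevHash]))
    .digest("hex");
}

// Append-only: a new event links to the hash of the current tail.
function appendEvent(chain: AuditEvent[], type: string, payload: string): AuditEvent {
  const prev = chain[chain.length - 1];
  const next: AuditEvent = {
    seq: chain.length,
    type,
    payload,
    prevHash: prev ? hashEvent(prev) : GENESIS,
  };
  chain.push(next);
  return next;
}

// Returns the index of the first event whose prevHash does not match, or -1 if intact.
function verifyChain(chain: AuditEvent[]): number {
  let expected = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const current = chain[i]!;
    if (current.prevHash !== expected) return i;
    expected = hashEvent(current);
  }
  return -1;
}

const demoChain: AuditEvent[] = [];
appendEvent(demoChain, "document.created", "a");
appendEvent(demoChain, "document.renamed", "b");
appendEvent(demoChain, "document.tombstoned", "c");
```

Editing the payload of the second event changes its hash, so verification fails at the third event. In a scheme like this, a change to the newest row is caught only once a later event links to it or the head hash is recorded somewhere else.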
Technical deep dive

Architecture and infrastructure.

Frontend & monorepo

Next.js 15.5 with shadcn/ui and Tailwind CSS 4. Turborepo + pnpm monorepo with 9 packages (core, db, events, agent, mcp, workflow, evals, migration, sdk) and 2 apps (web, sync-companion).

  • JetBrains Mono (display) + IBM Plex Sans (body)
  • Strict TypeScript: noUncheckedIndexedAccess + exactOptionalPropertyTypes
  • Server Components for live mutations; AI SDK streaming for chat
  • Zod at boundaries; internal code trusts internal types
  • Publishable @kumokodo/sdk client for external integrators

Backend & infrastructure

Next.js Route Handlers on Vercel for the API surface. Inngest for durable multi-step pipelines (extract → embed → auto-classify). Cloudflare Email Workers for HMAC-signed inbound email ingress.

  • Auth.js v5 (JWT, Google OAuth; Microsoft Entra scaffolded)
  • S3 / Cloudflare R2 with object-lock for legal holds
  • Per-tenant KMS keys for BYO-key encryption
  • Public REST API v1 with bearer-token auth + webhooks subsystem
  • OpenAPI 3.1 manifest at /api/openapi.json
  • Migration platform: external connectors + read-shadow + dual-write cut-over
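
For the HMAC-signed email ingress, the receiving API has to recompute the signature over the raw request body and compare it in constant time. The sketch below uses Node's `crypto` module; the choice of SHA-256, the hex encoding, and the function names are assumptions, since the source says only that inbound requests are HMAC-signed.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes a hex HMAC over the raw request body (SHA-256 is assumed here).
function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

function verifySignature(body: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signBody(body, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const sharedSecret = "example-secret"; // hypothetical; a real secret would come from configuration
const rawBody = JSON.stringify({ from: "client@example.com", subject: "Signed NDA" });
const goodSignature = signBody(rawBody, sharedSecret);
```

A route handler would reject the request before any parsing or filing when `verifySignature` returns false.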

MCP tool catalog (77)

Every agent capability is a typed MCP tool. The agent can do only what the same tool would let the user do via the API. Read tools never mutate; write tools always emit events.

  • Search & retrieval: hybridSearch, semanticSearch, searchKeyword, searchExternalContent, readDocument, listAllDocuments, listDocumentVersions, listDocumentEvents, recentActivity
  • Document lifecycle: createDocument, renameDocument, setDocumentSensitivity, setDocumentMetadata, setVersionLabel/Significance, tombstoneDocument, restoreDocument, checkOutDocument, releaseDocumentCheckOut, bulkDocumentOperations
  • Collections & permissions: createCollection, renameCollection, setCollectionRule, add/remove document, grant/revoke (collection) permissions
  • Holds: createLegalHold, add/remove documents, bulkAddDocumentsToLegalHold, releaseLegalHold, previewHoldCandidates, previewMatterConflicts
  • Retention: createRetentionClass, archive/unarchive, setDocumentRetentionClass, deferRetention, listRetentionReviewQueue
  • AEC vertical: setAecMetadata, extractDrawings, bulkExtractDrawings, extractCitations, bulkExtractCitations, linkResponseDocument
  • Governance: listAnomalies, acknowledge/dismissAnomaly, unpauseAnomalyAgent, listDlpFindings, decideDlpFinding, recordProduction, verifyAuditChain
  • Annotations: createAnnotation, listAnnotations, resolveAnnotation, reopenAnnotation, deleteAnnotation
  • Tenant ops: getTenantSettings, tenantUsageSummary, listMembers, listApiKeys, listWebhooks, helpKnowledge
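
The read/write split can be expressed in the type system. The following sketch is an assumption about how such a contract might look, not the real tool definitions: it borrows the names `readDocument` and `renameDocument` from the catalog, but the `ReadTool` / `WriteTool` shapes and the `callWrite` dispatcher are hypothetical and do not use any MCP SDK.

```typescript
interface ToolEvent {
  type: string;
  actor: string;
  payload: Record<string, unknown>;
}

// Read tools receive no emitter, so they have no channel for recording a mutation.
type ReadTool<I, O> = { kind: "read"; name: string; run: (input: I) => O };
type WriteTool<I, O> = {
  kind: "write";
  name: string;
  run: (input: I, emit: (e: ToolEvent) => void) => O;
};

const eventLog: ToolEvent[] = [];
const titles = new Map<string, string>([["doc-1", "Draft NDA"]]); // stand-in for the database

const readDocument: ReadTool<{ id: string }, string | undefined> = {
  kind: "read",
  name: "readDocument",
  run: ({ id }) => titles.get(id),
};

const renameDocument: WriteTool<{ id: string; title: string; actor: string }, void> = {
  kind: "write",
  name: "renameDocument",
  run: ({ id, title, actor }, emit) => {
    const previous = titles.get(id);
    if (previous === undefined) throw new Error(`unknown document ${id}`);
    titles.set(id, title);
    emit({ type: "document.renamed", actor, payload: { id, from: previous, to: title } });
  },
};

// The dispatcher refuses to accept a write that produced no event.
function callWrite<I, O>(tool: WriteTool<I, O>, input: I): O {
  const emitted: ToolEvent[] = [];
  const result = tool.run(input, (e) => emitted.push(e));
  if (emitted.length === 0) throw new Error(`${tool.name} mutated without emitting an event`);
  eventLog.push(...emitted);
  return result;
}
```

Routing every write through one dispatcher is a simple way to make "write tools always emit events" an invariant rather than a convention.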

Database & storage

Neon Postgres with pgvector pre-installed. Drizzle ORM 0.36. 50 tables covering tenants, users, document objects + versions + content + chunks + redactions + drawings + citations, collections, permissions, hash-chained events, API keys, webhooks, invites, legal holds + custodians, retention, metadata suggestions, saved searches + alerts, AEC (projects, RFIs, submittals, inspections, change orders), AP invoice workflow, anomalies, DLP findings, annotations, productions, privilege-log overrides, share links, migration jobs, external connectors + messages, pending deletions, tenant policies, and per-tenant KMS keys.

  • Content-addressable blob storage — SHA-256 is the canonical key
  • Automatic dedup: identical files never duplicate
  • Object-lock on blobs referenced by legal holds (S3 / R2 WORM)
  • Per-tenant Neon DB branches available for isolation
  • pgvector HNSW index for embeddings (1536 dims, cosine)
  • Per-tenant KMS for BYO-key envelope encryption
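
Content addressing and automatic dedup follow directly from using the SHA-256 digest as the storage key. A minimal in-memory sketch, with a `Map` standing in for S3 / R2 and a hypothetical `ContentAddressableStore` class:

```typescript
import { createHash } from "node:crypto";

class ContentAddressableStore {
  // Stand-in for the blob bucket: key is the hex SHA-256 of the content.
  private readonly blobs = new Map<string, Uint8Array>();

  put(content: Uint8Array): { key: string; deduplicated: boolean } {
    const key = createHash("sha256").update(content).digest("hex");
    const deduplicated = this.blobs.has(key);
    if (!deduplicated) this.blobs.set(key, content); // identical bytes are stored once
    return { key, deduplicated };
  }

  get(key: string): Uint8Array | undefined {
    return this.blobs.get(key);
  }

  get size(): number {
    return this.blobs.size;
  }
}

const store = new ContentAddressableStore();
const bytes = new TextEncoder().encode("abc");
const firstPut = store.put(bytes);
const secondPut = store.put(new TextEncoder().encode("abc")); // same content, separate upload
```

Two uploads of the same bytes resolve to the same key, so the second `put` writes nothing and document versions can simply reference the existing blob.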
Key features

Fifteen feature pillars.

Hybrid Keyword + Semantic Search

Plain language ("all NDAs from Q4 with Acme"), exact phrases, or concepts — results show which path matched.

Agent Auto-Classification

Claude Haiku proposes metadata; humans accept and the audit log credits the human, not the AI.

Collections-as-Views

No folder trees. Documents live once and appear in any number of saved metadata views — matter, project, quarter.
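
A collection in this model is a saved rule evaluated against document metadata rather than a container. The sketch below assumes a simple "all key/value pairs must match" rule; the `CollectionView` shape and `documentsIn` function are hypothetical, and the real rule language is not described in the source.

```typescript
interface DocMeta {
  id: string;
  metadata: Record<string, string>;
}

// A view is a named rule; here every key/value pair in the rule must match.
interface CollectionView {
  label: string;
  rule: Record<string, string>;
}

function documentsIn(view: CollectionView, docs: DocMeta[]): DocMeta[] {
  return docs.filter((doc) =>
    Object.entries(view.rule).every(([key, value]) => doc.metadata[key] === value),
  );
}

const allDocs: DocMeta[] = [
  { id: "nda-1", metadata: { matter: "acme", quarter: "Q4", type: "NDA" } },
  { id: "spec-9", metadata: { project: "tower-b", quarter: "Q4", type: "spec" } },
];

const acmeMatter: CollectionView = { label: "Acme matter", rule: { matter: "acme" } };
const q4: CollectionView = { label: "Q4", rule: { quarter: "Q4" } };
```

Here `nda-1` appears in both views while existing once in `allDocs`, which is the property that replaces copying files between folders.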

Deny-Always-Wins Holds

Three independent enforcement points: UI controls, MCP tool guards, retention queue. Held docs cannot be deleted.

Reversible Agent Actions

Most mutations have an inverse. From /audit, click Revert and Kodori dispatches the inverse — the chain stays intact.
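
Reverting without breaking the chain means appending an inverse event instead of deleting or editing the original. A small sketch under assumed event shapes (`AuditMutation`, `inverseOf`, and `revert` are illustrative names, and only three event types are modeled):

```typescript
type AuditMutation =
  | { type: "document.renamed"; id: string; from: string; to: string }
  | { type: "document.tombstoned"; id: string }
  | { type: "document.restored"; id: string };

// Each supported mutation maps to the mutation that undoes it.
function inverseOf(m: AuditMutation): AuditMutation {
  switch (m.type) {
    case "document.renamed":
      return { type: "document.renamed", id: m.id, from: m.to, to: m.from };
    case "document.tombstoned":
      return { type: "document.restored", id: m.id };
    case "document.restored":
      return { type: "document.tombstoned", id: m.id };
  }
}

// Revert appends the inverse; the original event is never rewritten or removed.
function revert(log: AuditMutation[], index: number): AuditMutation {
  const target = log[index];
  if (!target) throw new RangeError(`no event at index ${index}`);
  const inverse = inverseOf(target);
  log.push(inverse);
  return inverse;
}

const mutationLog: AuditMutation[] = [
  { type: "document.renamed", id: "doc-1", from: "Draft NDA", to: "Final NDA" },
];
```

After `revert(mutationLog, 0)` the log holds two events, the rename and its reversal, so the history of both actions remains auditable.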

SOC 2 + 21 CFR Part 11 Posture

Hash-chained audit, immutable events, sensitivity badges, signed-action attribution, GDPR-ready exports.

Multi-Format Extraction

Anthropic vision for PDFs and images; pure-JS for Office; native parsing for text/JSON/CSV/XML; Illustrator .ai sniffing.

Email Ingress

Cloudflare Email Workers with HMAC-signed inbound API. Forward a thread; Kodori files, classifies, and emits events.

Inngest Durable Workflows

Extract → embed → classify pipeline runs as crash-safe, retryable Inngest jobs with full observability.

Retention Review Queue

Define classes; when retention elapses, docs surface for human-confirmed disposition. Never auto-tombstones.
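
The queue logic can be sketched as a pure function over retention dates. The field names and the `canDispose` flag below are assumptions; the sketch shows only that elapsed documents are surfaced for a human decision, that deferrals push the date out, and that held documents are listed with disposal disabled.

```typescript
interface RetainedDoc {
  id: string;
  retainUntil: Date;
  deferredUntil?: Date; // set when a reviewer defers retention (assumed field)
  onLegalHold: boolean;
}

interface ReviewItem {
  id: string;
  canDispose: boolean; // false while a legal hold is active
}

// Surfaces documents whose retention has elapsed; it never disposes of anything itself.
function buildReviewQueue(docs: RetainedDoc[], asOf: Date): ReviewItem[] {
  return docs
    .filter((doc) => (doc.deferredUntil ?? doc.retainUntil).getTime() <= asOf.getTime())
    .map((doc) => ({ id: doc.id, canDispose: !doc.onLegalHold }));
}

const asOf = new Date("2025-01-01T00:00:00Z");
const elapsed = new Date("2024-06-01T00:00:00Z");
const reviewQueue = buildReviewQueue(
  [
    { id: "expired", retainUntil: elapsed, onLegalHold: false },
    { id: "expired-held", retainUntil: elapsed, onLegalHold: true },
    { id: "deferred", retainUntil: elapsed, deferredUntil: new Date("2026-01-01T00:00:00Z"), onLegalHold: false },
    { id: "active", retainUntil: new Date("2030-01-01T00:00:00Z"), onLegalHold: false },
  ],
  asOf,
);
```

Only `expired` and `expired-held` reach the queue, and the held one cannot be disposed of until its hold is released.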

Public REST API v1

Bearer tokens, cursor-paginated listings, /me + /search + /documents endpoints, ACL-respecting, OpenAPI manifest.

Per-Tenant DB Branches

Neon's preview branches enable per-tenant isolation when enterprise customers require it, without abandoning the multi-tenant architecture.

AEC Vertical Module

First-class RFIs, submittals, inspections, change orders, drawings, and project metadata. Tools to extract drawings + citations and link response documents to RFIs / submittals.

Anomaly Detection + DLP

Anomaly agent flags unusual mutation patterns for step-up approval. DLP scanner surfaces sensitivity findings; reviewers decide accept / dismiss / redact, all hash-chained.

Collaborative Annotations

Inline annotations with resolve / reopen lifecycle. Threads attach to documents, version-aware, ACL-trimmed, and emit events into the audit chain.

Plans & pricing

Per-seat. No surprises.

No implementation fees, no per-document charges. Early Access is free during private beta; Stripe billing wires up at GA.

Early Access

Free

Design partners

  • Up to 10 GB documents + 5 users
  • Google SSO, unlimited agent queries
  • Full search, extraction, collections, audit
  • Community support

Team

$30/user/mo

10–50 seat legal & AEC teams

  • Everything in Early Access
  • Unlimited storage (fair-use)
  • Microsoft SSO + SCIM
  • Ethical walls + deny-always-wins holds
  • Audit export + 1-business-day SLA

Enterprise

Custom

Regulated enterprises

  • Everything in Team
  • BYO S3 / R2 with object lock
  • HIPAA BAA + 21 CFR Part 11 posture
  • SAML SSO via WorkOS + SCIM
  • Named CSM + migration assistance
Built with

Technology stack.

Next.js 15 · TypeScript (strict) · Tailwind 4 · shadcn/ui · Drizzle ORM · Neon Postgres · pgvector · Anthropic Claude · Vercel AI SDK · MCP · Inngest · S3 / R2 · Auth.js v5 · Cloudflare Email Workers · Turborepo + pnpm
Private beta

Want early access?

Kodori is in private beta with design partners in legal and AEC. Request access to evaluate, or contact us if you'd like a similar regulated-industry AI product built for your firm.

Build with us

Want a project like this?

We build production-ready AI applications and SaaS platforms. Let's discuss your next project.