Hernán Pérez Rodal · Engineering · 3 min read

RAG over 10+ databases: what production taught us

Why vector-only RAG doesn't scale in compliance, how we designed hybrid retrieval across multiple stores, and the architectural decisions that worked in production.

TL;DR — In 2026 the consensus is clear: vector-only RAG doesn’t scale in production. At Darwin we built an agentic compliance system with hybrid retrieval across 10+ databases. The best decisions we made had nothing to do with the model — they were about the data layer.

The problem

Our agentic compliance system has to answer questions like:

“Which mango lots from producer X, processed between May and July, met the CTEs required by FSMA 204, and which ones have evidence gaps?”

A single question that combines:

  • Regulatory knowledge (FSMA 204 CTE/KDE definitions — structured text)
  • Traceability events (CTEs recorded with timestamp, geolocation, supplier, lot)
  • Internal business rules (gap analysis, risk scoring — dynamic logic)
  • Relationships between entities (producer → plant → shipment → retailer)

A single vector store with embeddings of everything mixed together can’t answer that well. The right answer requires joining structured data + semantic retrieval + aggregate computation.
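To make that concrete, here is a rough sketch of what answering that question actually involves. The helpers (`sql_lots`, `search_regulation`, `score_gaps`) are hypothetical placeholders for the retrievals described below, not code from our stack:

```python
# Hypothetical sketch: the answer is assembled from three different kinds of
# retrieval, none of which a single embedding search can deliver on its own.
def answer_lot_compliance(producer_id: str, start: str, end: str) -> dict:
    # Structured query (exact filters, joins): which lots fall in the window
    lots = sql_lots(producer_id=producer_id, processed_between=(start, end))

    # Semantic retrieval: which CTEs/KDEs FSMA 204 requires for those events
    required = search_regulation("FSMA 204 CTE/KDE requirements for fresh mango")

    # Aggregate computation: compare recorded events against requirements
    gaps = score_gaps(lots, required)

    return {"lots": lots, "required_ctes": required, "evidence_gaps": gaps}
```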

The architecture

Our retrieval stack:

| Source | Type | Usage |
| --- | --- | --- |
| Qdrant | Vector store | Regulations, doctrine, historical cases (unstructured data) |
| PostgreSQL | Relational | Traceability events (CTE/KDE with timestamps, IDs, geo) |
| Firebase/Firestore | Document | Per-customer config, UI state |
| Cloud Storage | Blob | Original PDFs, audit trails, digital evidence |
| On-chain (Polygon) | Immutable | Critical attestations, digital signatures |

The agent doesn’t know where each thing lives — the orchestrator resolves it.
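A minimal sketch of that resolution layer (the names and structure are illustrative, not our production code):

```python
# Illustrative only: the agent asks for a *kind* of data; the orchestrator maps
# that to the store that owns it. Function bodies are stubbed.
from typing import Callable

def search_regulations(q: str) -> list[dict]: ...   # Qdrant: semantic search
def query_events(q: str) -> list[dict]: ...         # PostgreSQL: filters, joins, aggregates
def load_customer_config(q: str) -> list[dict]: ... # Firestore: per-customer config
def fetch_evidence(q: str) -> list[dict]: ...       # Cloud Storage / on-chain proofs

ROUTES: dict[str, Callable[[str], list[dict]]] = {
    "regulation": search_regulations,
    "events": query_events,
    "config": load_customer_config,
    "evidence": fetch_evidence,
}

def resolve(kind: str, query: str) -> list[dict]:
    # The agent only names what it needs; the registry picks the store.
    return ROUTES[kind](query)
```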

LangGraph as orchestrator

We use LangGraph to route queries in multiple steps:

  1. Classify — what kind of question it is (regulatory / operational / mixed)
  2. Plan — which retrievals are needed (vector + SQL + graph traversal)
  3. Fan-out — execute retrievals in parallel
  4. Synthesize — pass results to the LLM with structured context
  5. Validate — guardrails to prevent hallucinations on numeric data

The key step was #2: giving the LLM a query planner that decides the retrieval strategy before anything is fetched. Without it, the model hallucinates data or pulls in irrelevant context.
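In LangGraph terms, the skeleton looks roughly like this. The state schema and node bodies are simplified stand-ins, not our production graph:

```python
# Simplified LangGraph skeleton of the five steps. Node bodies are stubbed.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict, total=False):
    question: str
    question_type: str   # "regulatory" | "operational" | "mixed"
    plan: list[str]      # which retrievals to run, e.g. ["vector", "sql"]
    retrieved: dict      # results keyed by retrieval kind
    answer: str

def classify(state: AgentState) -> dict: ...
def plan(state: AgentState) -> dict: ...        # the query planner: decides the strategy
def fan_out(state: AgentState) -> dict: ...     # runs the planned retrievals in parallel
def synthesize(state: AgentState) -> dict: ...
def validate(state: AgentState) -> dict: ...    # numeric guardrails

graph = StateGraph(AgentState)
for name, node in [("classify", classify), ("plan", plan), ("fan_out", fan_out),
                   ("synthesize", synthesize), ("validate", validate)]:
    graph.add_node(name, node)

graph.add_edge(START, "classify")
graph.add_edge("classify", "plan")
graph.add_edge("plan", "fan_out")
graph.add_edge("fan_out", "synthesize")
graph.add_edge("synthesize", "validate")
graph.add_edge("validate", END)

app = graph.compile()
```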

What didn’t work

Vector-only with aggressive chunking — our first attempt. It failed on two fronts:

  • Counting / aggregation queries (how many lots? weekly average?) — the LLM made up numbers when they weren’t explicitly in context
  • Relational joins (producer X + time window Y + certification Z) — impossible without a structured query

The fix wasn’t “better chunking” — it was separating semantic retrieval from structured queries.

What did work

Explicit query routing → the planner decides whether a question requires vector search, SQL, graph traversal, or a mix.
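Here is the rough shape of the plan we ask the model to emit before any retrieval runs (field names are illustrative):

```python
# Illustrative plan schema the planner LLM fills in via structured output.
from typing import Literal
from pydantic import BaseModel, Field

class RetrievalStep(BaseModel):
    kind: Literal["vector", "sql", "graph"]
    target: str                      # collection, table, or relationship to traverse
    query: str                       # natural-language or templated query for that store

class RetrievalPlan(BaseModel):
    steps: list[RetrievalStep] = Field(default_factory=list)
    needs_aggregation: bool = False  # if True, numbers must come from SQL, never the LLM
```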

Numeric guardrails → if the LLM’s answer contains numbers, we verify they match what the structured query returned. If not, fail fast instead of returning wrong data.
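A stripped-down sketch of that check; it ignores units, rounding, and formatted numbers for brevity:

```python
# Numeric guardrail sketch: every number in the generated answer must also
# appear in the structured (SQL) results, otherwise we fail fast.
import re

def extract_numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def check_numeric_grounding(answer: str, structured_results: list[dict]) -> None:
    grounded = extract_numbers(" ".join(str(v) for row in structured_results
                                        for v in row.values()))
    ungrounded = extract_numbers(answer) - grounded
    if ungrounded:
        # Fail fast instead of returning a potentially hallucinated figure.
        raise ValueError(f"Numbers not backed by structured data: {sorted(ungrounded)}")
```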

Semantic caching at the similar-question level → cuts LLM costs by ~40% without hurting quality.
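Conceptually, the cache is just a nearest-neighbour lookup over question embeddings. A toy version (the `embed` call and the 0.92 threshold are placeholders):

```python
# Toy semantic cache: reuse an earlier answer when a new question is close
# enough in embedding space.
import numpy as np

CACHE: list[tuple[np.ndarray, str]] = []   # (question embedding, cached answer)

def embed(text: str) -> np.ndarray: ...    # placeholder for the embedding model call

def cached_answer(question: str, threshold: float = 0.92) -> str | None:
    q = embed(question)
    for vec, answer in CACHE:
        sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
        if sim >= threshold:
            return answer
    return None

def remember(question: str, answer: str) -> None:
    CACHE.append((embed(question), answer))
```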

Full-trace observability with OpenTelemetry → every query is tracked end-to-end (planner → retrieval → LLM → guardrails). Critical for debugging.
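The spans mirror the pipeline stages, so a slow or failing query can be pinned to one stage. A minimal sketch with the OpenTelemetry API (exporter setup omitted; the stage functions are stubs):

```python
# Minimal tracing sketch: one root span per query, a child span per stage.
from opentelemetry import trace

tracer = trace.get_tracer("compliance.rag")

def make_plan(question: str) -> dict: ...
def run_retrievals(plan: dict) -> dict: ...
def generate_answer(question: str, context: dict) -> str: ...
def run_guardrails(answer: str, context: dict) -> None: ...

def answer_question(question: str) -> str:
    with tracer.start_as_current_span("query") as root:
        root.set_attribute("question.chars", len(question))
        with tracer.start_as_current_span("planner"):
            plan = make_plan(question)
        with tracer.start_as_current_span("retrieval"):
            context = run_retrievals(plan)
        with tracer.start_as_current_span("llm"):
            answer = generate_answer(question, context)
        with tracer.start_as_current_span("guardrails"):
            run_guardrails(answer, context)
        return answer
```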

Lessons learned

  1. The bottleneck of RAG in production isn’t retrieval — it’s deciding which retrieval to use
  2. Numeric guardrails save lives when the correctness of an answer drives regulatory decisions
  3. LangGraph beats linear chains for orchestrating conditional retrievals
  4. Multi-store + planner > single vector store with better chunking
  5. LLMs will hallucinate on structured aggregations — no matter how good the model is

What’s next?

The next iteration is to replace some of the planner's rules with a router fine-tuned on real production examples. An LLM planner is flexible but expensive — distilling its decisions into a smaller model is the logical next step.

If you’re building RAG for regulated domains, my advice is: start with the query planner, not the vector store.


Are you building something similar? Let’s talk — we’re open to sharing architecture and learning from other cases.
