RAG Systems That Don't Hallucinate: Engineering Zero-Trust AI
How we engineered the Aya Knowledge Base with grounding scores, provenance tracking, and hallucination detection to build enterprise RAG that clients actually trust.
The Trust Problem in Enterprise RAG
Every enterprise wants RAG. Very few trust the answers it produces. When we started building the Aya Knowledge Base at AlysAI -- a retrieval-augmented generation system for enterprise clients handling compliance-sensitive documents -- the primary concern from every stakeholder was the same: "How do I know this answer is not hallucinated?"
It is a fair question. Standard RAG implementations retrieve relevant chunks, stuff them into a prompt, and hope the LLM synthesizes a faithful answer. In practice, LLMs confabulate details, merge information from unrelated chunks, and present fabricated content with absolute confidence. For an enterprise dealing with regulatory filings, legal contracts, or medical protocols, a single hallucination can mean compliance violations and real financial damage.
This post describes the zero-trust architecture we built for Aya -- a system where every claim in every answer is verified, scored, and traceable to its source document.
The Zero-Trust RAG Architecture
The core principle is simple: treat the LLM as an untrusted component. The LLM generates candidate answers, but a separate verification pipeline validates every factual claim before the answer reaches the user. The architecture has four stages (a rough end-to-end sketch follows the list):
- Retrieval with relevance scoring and chunk provenance
- Generation with inline citation requirements
- Verification with grounding score computation
- Filtering with hallucination threshold enforcement
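Before diving into each stage, here is a rough sketch of how the pieces fit together. GENERATION_PROMPT, GroundingVerifier, and assemble_verified_response appear later in this post; the retriever and llm objects, the format_chunks helper, the top_k value, and the chunk-tagging scheme (first eight characters of the chunk hash) are placeholders for illustration, not our exact interfaces.

def answer_query(question: str, retriever, llm, verifier: GroundingVerifier) -> dict:
    # Stage 1: hybrid retrieval with provenance (retriever is assumed to
    # return RetrievedChunk objects).
    chunks = retriever.retrieve(question, top_k=8)
    chunk_map = {c.chunk_hash[:8]: c for c in chunks}

    # Stage 2: constrained generation with inline [ChunkID] citations.
    prompt = GENERATION_PROMPT.format(
        formatted_chunks=format_chunks(chunk_map),
        user_question=question,
    )
    draft_answer = llm.generate(prompt)

    # Stage 3: decompose the draft into claims and score each claim
    # against the chunk it cites.
    grounding_results = verifier.verify_answer(draft_answer, chunk_map)
    claims = [r.claim for r in grounding_results]

    # Stage 4: drop or flag ungrounded claims and enforce the aggregate
    # hallucination threshold.
    return assemble_verified_response(claims, grounding_results)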
Stage 1: Retrieval with Provenance
We use a hybrid retrieval strategy combining dense embeddings (via a fine-tuned BGE-large model) and sparse BM25 scoring, fused with Reciprocal Rank Fusion. But retrieval alone is not enough -- we attach full provenance metadata to every chunk:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RetrievedChunk:
    content: str
    document_id: str
    document_title: str
    page_number: int
    paragraph_index: int
    chunk_hash: str              # SHA-256 of content for integrity verification
    ingestion_timestamp: datetime
    relevance_score: float       # hybrid retrieval score
    source_classification: str   # "primary", "secondary", "derived"
    pii_redacted: bool
    redaction_log: list[str]     # which PII categories were redacted

Every chunk carries its complete lineage. When the system produces an answer, a user can trace any claim back to a specific paragraph on a specific page of a specific document, ingested at a specific time. This is not just a nice feature -- for several of our clients in regulated industries, it is a compliance requirement.
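For completeness, the Reciprocal Rank Fusion step mentioned above can be sketched as follows. The inputs are assumed to be ranked lists of chunk IDs from the dense and sparse retrievers, and k=60 is the conventional RRF constant rather than necessarily our production setting.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in, so
    chunks ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fused_ids = reciprocal_rank_fusion([dense_ranking, bm25_ranking])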
Stage 2: Constrained Generation
The generation prompt explicitly instructs the LLM to cite its sources using chunk identifiers. We format the prompt so that each retrieved chunk has a unique tag, and the model must reference these tags inline:
GENERATION_PROMPT = """
You are answering questions using ONLY the provided source documents.
RULES:
1. Every factual claim MUST include a citation tag [ChunkID].
2. If the sources do not contain information to answer the question, say "I cannot find this information in the provided documents."
3. Do NOT combine information from different chunks to infer new facts.
4. Do NOT add information beyond what is explicitly stated in the sources.
Sources:
{formatted_chunks}
Question: {user_question}
Answer (with citations):
"""This prompt engineering reduces hallucination rate significantly, but does not eliminate it. LLMs still occasionally fabricate citations or misattribute claims. That is why Stage 3 exists.
Stage 3: Grounding Verification
After the LLM generates an answer, a separate verification pipeline decomposes the answer into individual claims and scores each claim against the cited source chunk. We use a fine-tuned NLI (Natural Language Inference) model for this -- specifically, a DeBERTa-v3-large model fine-tuned on a combination of MNLI, FEVER, and our own domain-specific entailment dataset.
class GroundingVerifier:
    def __init__(self, nli_model, threshold=0.7):
        self.nli_model = nli_model
        self.threshold = threshold

    def verify_answer(self, answer: str, chunks: dict[str, RetrievedChunk]):
        # Split the answer into atomic claims, each carrying its citation tag.
        claims = self.decompose_claims(answer)
        results = []
        for claim in claims:
            cited_chunk = chunks.get(claim.citation_id)
            if cited_chunk is None:
                # The model cited a chunk ID that was never retrieved.
                results.append(GroundingResult(
                    claim=claim, score=0.0, status="FABRICATED_CITATION"
                ))
                continue
            # NLI: does the chunk entail the claim?
            score = self.nli_model.entailment_score(
                premise=cited_chunk.content,
                hypothesis=claim.text
            )
            status = "GROUNDED" if score >= self.threshold else "UNGROUNDED"
            results.append(GroundingResult(
                claim=claim, score=score, status=status
            ))
        return results

Each claim receives a grounding score between 0.0 and 1.0. Scores above 0.7 are considered grounded. Scores between 0.3 and 0.7 are flagged for review. Scores below 0.3 trigger automatic removal of the claim from the answer.
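For illustration, entailment_score could be implemented roughly as follows, assuming a HuggingFace cross-encoder NLI checkpoint. The model name stands in for the fine-tuned model described above, and the entailment label index varies by checkpoint, so treat both as placeholders.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class NLIScorer:
    def __init__(self, model_name: str = "microsoft/deberta-v3-large"):
        # Substitute your own fine-tuned NLI checkpoint here.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.model.eval()

    def entailment_score(self, premise: str, hypothesis: str) -> float:
        inputs = self.tokenizer(
            premise, hypothesis, return_tensors="pt",
            truncation=True, max_length=512,
        )
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0]
        # Label order differs between checkpoints; index 2 is assumed to be
        # "entailment" here (check model.config.id2label before relying on it).
        return probs[2].item()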
Stage 4: Hallucination Filtering and Response Assembly
The final stage assembles the verified answer. Ungrounded claims are either removed or replaced with a disclaimer. The overall answer receives an aggregate grounding score -- the average of its individual claim scores. If the aggregate score falls below our hallucination threshold of 0.3, the entire answer is rejected and the user receives a message explaining that the system could not produce a sufficiently reliable answer.
from statistics import mean

def assemble_verified_response(claims, grounding_results, threshold=0.3):
    verified_claims = []
    warnings = []
    for claim, result in zip(claims, grounding_results):
        if result.status == "GROUNDED":
            verified_claims.append(claim.text)
        elif result.status == "UNGROUNDED" and result.score >= 0.3:
            # Partially supported: keep the claim but flag it for the user.
            verified_claims.append(
                f"{claim.text} [Low confidence - verify against source]"
            )
            warnings.append(f"Claim partially supported: {claim.text[:80]}...")
        else:
            # Fabricated citations and low-scoring claims are dropped entirely.
            warnings.append(f"Removed unverified claim: {claim.text[:80]}...")
    aggregate_score = mean([r.score for r in grounding_results])
    if aggregate_score < threshold:
        return {
            "answer": None,
            "message": "Unable to produce a sufficiently reliable answer.",
            "grounding_score": aggregate_score,
            "warnings": warnings,
        }
    return {
        "answer": " ".join(verified_claims),
        "grounding_score": aggregate_score,
        "warnings": warnings,
        "provenance": [r.to_dict() for r in grounding_results],
    }

PII Redaction as a First-Class Concern
Enterprise documents contain personally identifiable information -- names, addresses, social security numbers, medical record numbers. Our ingestion pipeline runs PII detection before chunking, using a combination of Microsoft Presidio and custom regex patterns for domain-specific identifiers.
Redacted content is replaced with typed placeholders ([PERSON_NAME_1], [SSN_REDACTED]) that preserve semantic structure while removing sensitive data. The redaction log is attached to each chunk's provenance metadata, so compliance teams can audit exactly what was redacted and why.
Critically, PII redaction happens before embeddings are computed. This means the vector store never contains PII in any form -- neither in the stored text nor in the embedding vectors that could theoretically be inverted.
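A simplified sketch of the redaction step, assuming Presidio's AnalyzerEngine; the placeholder naming and the point where custom domain recognizers would be registered are illustrative rather than our exact configuration.

from presidio_analyzer import AnalyzerEngine

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace detected PII spans with typed, numbered placeholders and
    return the redacted text plus a log of redacted categories."""
    analyzer = AnalyzerEngine()  # custom domain recognizers would be added here
    findings = sorted(analyzer.analyze(text=text, language="en"),
                      key=lambda f: f.start)
    counters: dict[str, int] = {}
    placeholders = []
    for finding in findings:
        counters[finding.entity_type] = counters.get(finding.entity_type, 0) + 1
        placeholders.append(f"[{finding.entity_type}_{counters[finding.entity_type]}]")
    # Replace from the end of the string so earlier character offsets stay valid.
    for finding, placeholder in zip(reversed(findings), reversed(placeholders)):
        text = text[:finding.start] + placeholder + text[finding.end:]
    return text, sorted(counters)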
OWASP LLM Top 10 Compliance
We audited the Aya system against the OWASP Top 10 for LLM Applications:
- LLM01 (Prompt Injection): An input sanitization layer strips known injection patterns (a sketch follows this list). The generation prompt uses XML delimiters that are validated before LLM submission.
- LLM02 (Insecure Output Handling): All LLM outputs are treated as untrusted. HTML is escaped, and outputs are validated against expected schemas before rendering.
- LLM06 (Sensitive Information Disclosure): PII redaction pipeline described above, plus output scanning that catches any PII that bypasses ingestion-time redaction.
- LLM09 (Overreliance): The grounding score system explicitly communicates confidence levels to users, discouraging blind trust in AI outputs.
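As an example of the LLM01 mitigation, the sanitization layer can be as simple as a pattern filter plus delimiter escaping. The patterns below are illustrative, not our production deny-list.

import re

# Illustrative patterns only; a real deny-list is broader and maintained
# alongside red-team findings.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now (in )?developer mode", re.IGNORECASE),
]

def sanitize_user_input(text: str) -> str:
    """Reject inputs matching known injection patterns and neutralize any
    XML-style delimiters reserved for the generation prompt."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise ValueError("Input rejected: possible prompt injection")
    # Escape angle brackets so user text cannot open or close prompt sections.
    return text.replace("<", "&lt;").replace(">", "&gt;")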
Production Results
After three months in production with four enterprise clients, the system processes approximately 15,000 queries per day with the following metrics:
- Average grounding score: 0.82
- Full rejection rate (aggregate score below 0.3): 3.2% of queries
- User-reported inaccuracies: 0.4% of queries (down from 11% with standard RAG)
- Average response latency: 2.8 seconds (acceptable for the document analysis use case)
The latency cost of verification is real -- approximately 800ms is spent on claim decomposition and NLI scoring. But every client we have spoken to accepts this trade-off. In their words: "A slower correct answer is infinitely more valuable than a fast wrong one."
Lessons Learned
The NLI model is the linchpin. Off-the-shelf NLI models work reasonably well, but fine-tuning on domain-specific entailment pairs improved grounding accuracy by 15 percentage points. Invest in building a high-quality entailment dataset for your domain.
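For concreteness, a domain entailment pair is just a premise (chunk text), a hypothesis (a claim), and a label; the example below is invented for illustration.

entailment_example = {
    "premise": "Section 4.2: The data retention period for customer records is seven years.",
    "hypothesis": "Customer records must be retained for seven years.",
    "label": "entailment",  # other labels: "neutral", "contradiction"
}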
Chunk boundaries matter more than you think. A claim that spans two chunks will fail grounding verification even if both chunks support it. We implemented a chunk merging strategy for adjacent chunks from the same document section, which reduced false-negative grounding failures by 22%.
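The merging heuristic can be simple. The sketch below approximates "same section" by consecutive paragraph_index values within the same document, which is an assumption rather than our exact rule.

from dataclasses import replace

def merge_adjacent_chunks(chunks: list[RetrievedChunk]) -> list[RetrievedChunk]:
    """Merge retrieved chunks that are consecutive paragraphs of the same
    document, so a claim spanning a chunk boundary can still be verified
    against a single premise (chunk_hash is left stale in this sketch)."""
    ordered = sorted(chunks, key=lambda c: (c.document_id, c.paragraph_index))
    merged: list[RetrievedChunk] = []
    for chunk in ordered:
        previous = merged[-1] if merged else None
        if (previous is not None
                and chunk.document_id == previous.document_id
                and chunk.paragraph_index == previous.paragraph_index + 1):
            merged[-1] = replace(
                previous,
                content=previous.content + "\n" + chunk.content,
                paragraph_index=chunk.paragraph_index,
                relevance_score=max(previous.relevance_score, chunk.relevance_score),
            )
        else:
            merged.append(chunk)
    return merged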
Users trust the system more when they can see the scores. Exposing grounding scores and provenance links in the UI, rather than hiding them, dramatically increased user adoption. Transparency builds trust faster than accuracy alone.