
ADR-005: Evidence Pack Architecture (L0/L1/L2)

Status: Accepted
Date: February 2026

Context

AI-generated coaching insights face two fundamental trust problems:

  1. Users don't trust AI: "Why should I believe this about me?"
  2. Regulators require explainability: the EU AI Act and GDPR Article 22 call for explanations of automated decisions

Most AI coaching tools provide black-box outputs — a recommendation with no visible reasoning. This creates both a user trust gap and a compliance gap.

Decision

Implement a three-level Evidence Pack system (L0 → L1 → L2) that provides progressive explainability for every AI-generated insight.

Level | Name | Description | Example
L0 | Headline Insight | The top-level finding | "You tend to avoid delegation"
L1 | Reasoning | AI's chain of reasoning | "Throughout the session, you expressed anxiety about assigning work to your team. This pattern suggests delegation is a challenge area."
L2 | Direct Evidence | Source material with jump links | "At 23:10: 'I often end up doing it myself because I worry it won't be done right.' [Play Video]"

Every insight must have all three levels. If no L2 evidence can be verified, the insight is dropped; if only some of its claimed evidence can be verified, the insight is kept but flagged as low-confidence.
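A minimal sketch of how an Evidence Pack and the three-level rule might be represented. The names `Evidence`, `EvidencePack`, and `enforce_three_levels`, and the choice to store the jump link as seconds into the recording, are illustrative assumptions and not part of this ADR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """One L2 item: a transcript quote plus the jump-link target (hypothetical shape)."""
    quote: str
    timestamp_seconds: int  # e.g. 23:10 in the session -> 23 * 60 + 10 = 1390


@dataclass
class EvidencePack:
    headline: str                                           # L0: headline insight
    reasoning: str                                          # L1: curated reasoning summary
    evidence: List[Evidence] = field(default_factory=list)  # L2: direct evidence
    low_confidence: bool = False                            # set when some evidence was dropped


def enforce_three_levels(pack: EvidencePack) -> Optional[EvidencePack]:
    """Return the pack only if L0, L1, and L2 are all present; otherwise drop it (None)."""
    if not pack.headline.strip() or not pack.reasoning.strip():
        return None
    if not pack.evidence:
        return None  # no L2 evidence: the insight is dropped
    return pack


pack = EvidencePack(
    headline="You tend to avoid delegation",
    reasoning="Throughout the session, you expressed anxiety about assigning work to your team.",
    evidence=[Evidence("I often end up doing it myself because I worry it won't be done right.", 1390)],
)
```

Keeping `low_confidence` on the pack itself lets the UI render the flag alongside the insight without a separate lookup; whether it belongs there or in a separate review queue is a design choice this sketch does not settle.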

Alternatives Considered

Alternative 1: Simple Summaries (No Evidence)

  • Pro: Faster to build, lower LLM cost per session
  • Con: No trust mechanism, no explainability, regulators may object
  • Rejected because: Fails both user trust and compliance requirements

Alternative 2: Two Levels (Summary + Quote)

  • Pro: Simpler, covers basic evidence needs
  • Con: Missing the reasoning layer means users jump from insight to raw quote without understanding the connection
  • Rejected because: L1 reasoning is what makes the evidence pack genuinely educational

Alternative 3: Full Chain-of-Thought Exposure

  • Pro: Maximum transparency
  • Con: AI chain-of-thought is often messy, verbose, and not user-friendly; may include internal reasoning that's confusing or concerning
  • Rejected because: Too raw — L1 is a curated reasoning summary, not the full internal process

Consequences

Positive

  • Trust: Users can verify any insight by clicking through to source material
  • Compliance: Directly addresses the explainability expectations of the EU AI Act and GDPR Article 22
  • Coaching quality: Coaches can review evidence before sharing with clients, improving accuracy
  • Differentiation: No competitor offers this level of AI transparency in coaching
  • Bias detection: If an evidence pack reveals bias in the reasoning, it can be caught and corrected

Negative

  • Higher LLM cost: Generating L1 reasoning + L2 evidence requires additional API calls (estimated 2-3x more tokens per session)
  • Latency: Evidence pack generation takes longer (acceptable if it runs asynchronously after the session)
  • Hallucination risk: LLM may cite incorrect transcript locations — requires cross-validation
  • Storage: More data per insight (L1 text + multiple L2 references with metadata)

Technical Architecture

Cross-Validation Process

To prevent hallucinated evidence:

1. LLM generates insight with claimed evidence quotes
2. System searches actual transcript for each claimed quote
3. If exact match found → accept, store with precise offset
4. If fuzzy match found (>80% similarity) → accept, replacing the claimed quote with the transcript's actual wording
5. If no match found → drop that evidence, flag insight as low-confidence
6. If all evidence dropped → discard insight entirely
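The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production implementation: similarity is measured with Python's standard-library `difflib.SequenceMatcher` over a sliding character window, and the names `verify_quote`, `cross_validate`, `VerifiedQuote`, and `ValidationResult` are hypothetical. The ADR does not specify which similarity metric is used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

FUZZY_THRESHOLD = 0.80  # step 4: a fuzzy match must exceed 80% similarity


@dataclass
class VerifiedQuote:
    text: str        # wording as it appears in the transcript
    offset: int      # character offset into the transcript
    corrected: bool  # True when the claimed wording was replaced


@dataclass
class ValidationResult:
    quotes: List[VerifiedQuote]
    low_confidence: bool  # True when at least one claimed quote was dropped


def verify_quote(claimed: str, transcript: str) -> Optional[VerifiedQuote]:
    """Steps 2-5 for a single claimed quote; None means the evidence is dropped."""
    if not claimed.strip():
        return None
    # Step 3: exact match, stored with its precise offset.
    offset = transcript.find(claimed)
    if offset != -1:
        return VerifiedQuote(claimed, offset, corrected=False)
    # Step 4: fuzzy match. Slide a window the length of the claimed quote
    # across the transcript and keep the most similar span (assumed metric).
    window = len(claimed)
    best_ratio, best_offset = 0.0, -1
    for start in range(max(1, len(transcript) - window + 1)):
        candidate = transcript[start:start + window]
        ratio = SequenceMatcher(None, claimed, candidate, autojunk=False).ratio()
        if ratio > best_ratio:
            best_ratio, best_offset = ratio, start
    if best_ratio > FUZZY_THRESHOLD:
        # Accept with correction: store the transcript's wording, not the LLM's.
        actual = transcript[best_offset:best_offset + window]
        return VerifiedQuote(actual, best_offset, corrected=True)
    return None  # Step 5: no match, drop this piece of evidence


def cross_validate(claimed_quotes: List[str], transcript: str) -> Optional[ValidationResult]:
    """Return verified evidence, or None when the insight should be discarded."""
    verified = []
    for claimed in claimed_quotes:
        match = verify_quote(claimed, transcript)
        if match is not None:
            verified.append(match)
    if not verified:
        return None  # Step 6: all evidence dropped, discard the insight
    dropped_any = len(verified) < len(claimed_quotes)
    return ValidationResult(verified, low_confidence=dropped_any)
```

The sliding window is quadratic in the worst case and only approximates the corrected span's boundaries, since the window has the claimed quote's length and not necessarily the true quote's length. For long transcripts, a token-level index or a dedicated fuzzy-matching library would likely be a better fit.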
