ADR-005: Evidence Pack Architecture (L0/L1/L2)
Status: Accepted
Date: February 2026
Context
AI-generated coaching insights face two fundamental trust problems:
- Users don't trust AI: "Why should I believe this about me?"
- Regulators require explainability: the EU AI Act and GDPR Article 22 call for explanations of automated decisions
Most AI coaching tools provide black-box outputs — a recommendation with no visible reasoning. This creates both a user trust gap and a compliance gap.
Decision
Implement a three-level Evidence Pack system (L0 → L1 → L2) that provides progressive explainability for every AI-generated insight.
| Level | Name | Description | Example |
|---|---|---|---|
| L0 | Headline Insight | The top-level finding | "You tend to avoid delegation" |
| L1 | Reasoning | AI's chain of reasoning | "Throughout the session, you expressed anxiety about assigning work to your team. This pattern suggests delegation is a challenge area." |
| L2 | Direct Evidence | Source material with jump links | "At 23:10: 'I often end up doing it myself because I worry it won't be done right.' [Play Video]" |
Every insight must have all three levels. If some of its L2 evidence cannot be verified against the source material, the insight is flagged as low-confidence; if none can be verified, the insight is dropped.
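As an illustration of the three-level structure, here is a minimal sketch of how an Evidence Pack might be represented. The class and field names (`EvidencePack`, `Evidence`, `timestamp_seconds`, `low_confidence`) are assumptions made for this example and are not defined by this ADR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """L2: a verbatim quote plus its position in the session recording."""
    quote: str
    timestamp_seconds: int  # hypothetical jump-link target, e.g. 23:10 -> 1390


@dataclass
class EvidencePack:
    """One AI-generated insight carrying all three levels."""
    headline: str                                            # L0
    reasoning: str                                           # L1
    evidence: List[Evidence] = field(default_factory=list)   # L2
    low_confidence: bool = False

    def is_complete(self) -> bool:
        # All three levels must be present and non-empty.
        return bool(
            self.headline.strip() and self.reasoning.strip() and self.evidence
        )


def format_timestamp(seconds: int) -> str:
    """Render a jump-link label such as '23:10'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"
```

A pack with an empty `evidence` list fails `is_complete()`, which mirrors the rule that an insight without any L2 evidence is not shown.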
Alternatives Considered
Alternative 1: Simple Summaries (No Evidence)
- Pro: Faster to build, lower LLM cost per session
- Con: No trust mechanism, no explainability, regulators may object
- Rejected because: Fails both user trust and compliance requirements
Alternative 2: Two Levels (Summary + Quote)
- Pro: Simpler, covers basic evidence needs
- Con: Missing the reasoning layer means users jump from insight to raw quote without understanding the connection
- Rejected because: L1 reasoning is what makes the evidence pack genuinely educational
Alternative 3: Full Chain-of-Thought Exposure
- Pro: Maximum transparency
- Con: AI chain-of-thought is often messy, verbose, and not user-friendly; may include internal reasoning that's confusing or concerning
- Rejected because: Too raw — L1 is a curated reasoning summary, not the full internal process
Consequences
Positive
- Trust: Users can verify any insight by clicking through to source material
- Compliance: Directly addresses EU AI Act explainability and GDPR Article 22
- Coaching quality: Coaches can review evidence before sharing with clients, improving accuracy
- Differentiation: No competitor we are aware of offers this level of AI transparency in coaching
- Bias detection: If evidence pack reveals bias in reasoning, it can be caught and corrected
Negative
- Higher LLM cost: Generating L1 reasoning + L2 evidence requires additional API calls (estimated 2-3x more tokens per session)
- Latency: Evidence pack generation takes longer (acceptable if async post-session)
- Hallucination risk: LLM may cite incorrect transcript locations — requires cross-validation
- Storage: More data per insight (L1 text + multiple L2 references with metadata)
Technical Architecture
Cross-Validation Process
To prevent hallucinated evidence:
1. LLM generates insight with claimed evidence quotes
2. System searches actual transcript for each claimed quote
3. If exact match found → accept, store with precise offset
4. If fuzzy match found (>80% similarity) → accept, replacing the claimed quote with the matched transcript text
5. If no match found → drop that evidence, flag insight as low-confidence
6. If all evidence dropped → discard insight entirely
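The steps above can be sketched as follows. This is a minimal illustration under several assumptions: the transcript is available as a list of timestamped segments, similarity is measured with `difflib.SequenceMatcher` from the Python standard library (the ADR does not name a metric), and all names (`Segment`, `VerifiedEvidence`, `verify_quote`, `cross_validate`) are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

FUZZY_THRESHOLD = 0.80  # step 4 threshold; the metric used here is an assumption


@dataclass
class Segment:
    start_seconds: int
    text: str


@dataclass
class VerifiedEvidence:
    quote: str
    start_seconds: int
    corrected: bool  # True when the claimed quote was replaced by transcript text


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def verify_quote(quote: str, transcript: List[Segment]) -> Optional[VerifiedEvidence]:
    # Step 3: exact match, stored with the segment's offset.
    for seg in transcript:
        if quote in seg.text:
            return VerifiedEvidence(quote, seg.start_seconds, corrected=False)
    # Step 4: fuzzy match against the most similar segment.
    best, best_ratio = None, 0.0
    for seg in transcript:
        ratio = SequenceMatcher(
            None, _normalize(quote), _normalize(seg.text), autojunk=False
        ).ratio()
        if ratio > best_ratio:
            best, best_ratio = seg, ratio
    if best is not None and best_ratio > FUZZY_THRESHOLD:
        return VerifiedEvidence(best.text, best.start_seconds, corrected=True)
    # Step 5: no match, so this piece of evidence is dropped.
    return None


def cross_validate(
    claimed_quotes: List[str], transcript: List[Segment]
) -> Tuple[List[VerifiedEvidence], str]:
    results = (verify_quote(q, transcript) for q in claimed_quotes)
    verified = [v for v in results if v is not None]
    if not verified:
        return [], "discard"                # step 6
    if len(verified) < len(claimed_quotes):
        return verified, "low_confidence"   # step 5
    return verified, "accept"
```

Comparing against whole segments keeps the sketch short; a production version would likely need a sliding window so that short quotes inside long segments can still match fuzzily.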