ADR-005: Evidence Pack Architecture (L0/L1/L2)

Status: Accepted
Date: February 2026

Context

AI-generated coaching insights face two fundamental trust problems:

Users don't trust AI: "Why should I believe this about me?"
Regulators require explainability: the EU AI Act and GDPR Article 22 call for automated decisions to be explainable to the people they affect

Most AI coaching tools provide black-box outputs — a recommendation with no visible reasoning. This creates both a user trust gap and a compliance gap.

Decision

Implement a three-level Evidence Pack system (L0 → L1 → L2) that provides progressive explainability for every AI-generated insight.

Level	Name	Description	Example
L0	Headline Insight	The top-level finding	"You tend to avoid delegation"
L1	Reasoning	AI's chain of reasoning	"Throughout the session, you expressed anxiety about assigning work to your team. This pattern suggests delegation is a challenge area."
L2	Direct Evidence	Source material with jump links	"At 23:10: 'I often end up doing it myself because I worry it won't be done right.' [Play Video]"

Every insight must have all three levels. If none of an insight's L2 evidence can be verified against the transcript, the insight is dropped; if only some of it can be verified, the insight is kept and flagged as low-confidence.
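
As an illustration of the three levels, the following is a minimal sketch of how an Evidence Pack could be modeled. The class names, field names, and the `timestamp_seconds` offset are assumptions made for this example; the ADR does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """L2: one verbatim quote and its position in the session recording."""
    quote: str
    timestamp_seconds: int  # hypothetical field: offset used for the jump link

    def jump_label(self) -> str:
        """Format the offset as the 'At MM:SS' label shown next to the quote."""
        minutes, seconds = divmod(self.timestamp_seconds, 60)
        return f"At {minutes}:{seconds:02d}"


@dataclass
class EvidencePack:
    headline: str    # L0: the top-level finding
    reasoning: str   # L1: curated reasoning summary
    evidence: List[Evidence] = field(default_factory=list)  # L2: source quotes
    low_confidence: bool = False  # set when some claimed evidence was dropped

    def is_complete(self) -> bool:
        """True only when all three levels are present."""
        return bool(self.headline.strip() and self.reasoning.strip() and self.evidence)


# Usage, based on the example row in the table above.
pack = EvidencePack(
    headline="You tend to avoid delegation",
    reasoning="Throughout the session, you expressed anxiety about assigning work to your team.",
    evidence=[
        Evidence(
            quote="I often end up doing it myself because I worry it won't be done right.",
            timestamp_seconds=23 * 60 + 10,
        )
    ],
)
```

An insight whose `is_complete()` check fails would not be shown to the user under the rule above.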

Alternatives Considered

Alternative 1: Simple Summaries (No Evidence)

Pro: Faster to build, lower LLM cost per session
Con: No trust mechanism, no explainability, regulators may object
Rejected because: Fails both user trust and compliance requirements

Alternative 2: Two Levels (Summary + Quote)

Pro: Simpler, covers basic evidence needs
Con: Missing the reasoning layer means users jump from insight to raw quote without understanding the connection
Rejected because: L1 reasoning is what makes the evidence pack genuinely educational

Alternative 3: Full Chain-of-Thought Exposure

Pro: Maximum transparency
Con: AI chain-of-thought is often messy, verbose, and not user-friendly; may include internal reasoning that's confusing or concerning
Rejected because: Too raw — L1 is a curated reasoning summary, not the full internal process

Consequences

Positive

Trust: Users can verify any insight by clicking through to source material
Compliance: Directly addresses EU AI Act explainability and GDPR Article 22
Coaching quality: Coaches can review evidence before sharing with clients, improving accuracy
Differentiation: No competitor we know of offers this level of AI transparency in coaching
Bias detection: If evidence pack reveals bias in reasoning, it can be caught and corrected

Negative

Higher LLM cost: Generating L1 reasoning + L2 evidence requires additional API calls (estimated 2-3x more tokens per session)
Latency: Evidence pack generation takes longer (acceptable if async post-session)
Hallucination risk: LLM may cite incorrect transcript locations — requires cross-validation
Storage: More data per insight (L1 text + multiple L2 references with metadata)

Technical Architecture

Cross-Validation Process

To prevent hallucinated evidence:

LLM generates insight with claimed evidence quotes
System searches actual transcript for each claimed quote
If exact match found → accept, store with precise offset
If fuzzy match found (>80% similarity) → accept, replacing the claimed quote with the transcript's actual wording
If no match found → drop that evidence, flag insight as low-confidence
If all evidence dropped → discard insight entirely
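
The steps above can be sketched roughly as follows. This is a minimal sketch, not the production implementation: the function and type names are hypothetical, and it assumes similarity is measured with `difflib.SequenceMatcher` over word-aligned windows of the transcript. The ADR only specifies the 80% threshold, not the metric.

```python
import re
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional

FUZZY_THRESHOLD = 0.80  # the ">80% similarity" rule from the process above


class Match(NamedTuple):
    offset: int   # character offset of the quote in the transcript
    text: str     # the transcript's actual wording
    exact: bool   # False when the claimed quote had to be corrected


class Validation(NamedTuple):
    evidence: List[Match]
    low_confidence: bool


def find_quote(transcript: str, quote: str) -> Optional[Match]:
    """Locate a claimed quote: exact match first, then best fuzzy match."""
    quote = quote.strip()
    if not quote:
        return None
    offset = transcript.find(quote)
    if offset != -1:
        return Match(offset, quote, True)

    # Fuzzy pass (assumed approach): compare the quote against every
    # transcript window that has the same number of words.
    words = list(re.finditer(r"\S+", transcript))
    n = len(quote.split())
    best_ratio, best = 0.0, None
    for i in range(len(words) - n + 1):
        start, end = words[i].start(), words[i + n - 1].end()
        candidate = transcript[start:end]
        ratio = SequenceMatcher(None, quote.lower(), candidate.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best = ratio, Match(start, candidate, False)
    return best if best_ratio > FUZZY_THRESHOLD else None


def cross_validate(transcript: str, claimed_quotes: List[str]) -> Optional[Validation]:
    """Return verified evidence, or None when the insight must be discarded."""
    evidence: List[Match] = []
    dropped = 0
    for quote in claimed_quotes:
        match = find_quote(transcript, quote)
        if match is None:
            dropped += 1          # no match: drop this piece of evidence
        else:
            evidence.append(match)
    if not evidence:
        return None               # all evidence dropped: discard the insight
    return Validation(evidence, low_confidence=dropped > 0)
```

The word-window scan is quadratic in the worst case, which is likely acceptable only because evidence generation already runs asynchronously after the session; a real system might index the transcript by segment first.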
