Data Minimization & Retention
Data minimization is both a legal requirement and a business advantage. By collecting only the data necessary for coaching insights and enforcing strict retention policies, ReGenesis reduces its attack surface, simplifies compliance, and builds trust with enterprise clients who are increasingly skeptical of platforms that hoard data.
The approach is deliberate: the platform does not ingest entire email inboxes, full calendar data, or broad organizational datasets. ReGenesis collects what is needed for coaching — session transcripts, coaching notes, goals, feedback, and contextual data that the coachee or coach explicitly provides. Evidence Packs at the L2 (detail) level store excerpts and citations, not full transcripts, ensuring that Sasha's AI reasoning is explainable without retaining more data than necessary.
Retention is configurable per client organization within platform guardrails, with defaults (12 months after an engagement ends for coaching data) and automated enforcement. When data reaches end of life, it is permanently deleted rather than soft-deleted or archived, and a deletion certificate containing a cryptographic hash provides an auditable record. This is the level of rigor that regulated industries expect, and it differentiates ReGenesis from competitors who treat data governance as an afterthought.
What the Platform Collects (and What It Does Not)
Data Collected
| Data Type | Source | Purpose | Retention Default |
|---|---|---|---|
| Session transcripts | Audio/video transcription | Core input for coaching insights | Engagement + 12 months |
| Coaching notes | Coach and coachee entries | Track progress, guide sessions | Engagement + 12 months |
| Goals and action items | Coach/coachee/Sasha | Track development progress | Engagement + 12 months |
| Feedback and reflections | Coachee self-reports | Measure growth, inform Sasha | Engagement + 12 months |
| Sasha insights | AI-generated | Core platform value | Engagement + 12 months |
| Evidence Packs | AI-generated from source data | Explainability and trust | Engagement + 12 months |
| User profile data | Onboarding | Platform identity, personalization | Account lifetime + 30 days |
| Usage metadata | Platform interaction | Security, product improvement | 90 days rolling |
| Morning check-in responses | Coachee self-report (app) | Mood/readiness tracking, Sasha context | 90 days rolling (aggregated trends retained for engagement duration) |
| Technical logs | Infrastructure | Security monitoring, debugging | 30 days rolling |
Data NOT Collected
These are not policy decisions — they are architectural constraints:
| Data Type | Why Not |
|---|---|
| Full email inboxes | Not needed for coaching; massive privacy risk; no defined purpose |
| Full calendar data | The platform may ingest session scheduling data, but not the entire calendar |
| HR performance reviews | Coaching data and HR data must remain separate (see Employment Data) |
| Browser history or screen activity | ReGenesis is not a surveillance tool |
| Social media content | No defined coaching purpose |
| Financial or compensation data | Not relevant to coaching engagement |
| Biometric data | Not collected (future: voice sentiment analysis would require explicit consent and DPIA) |
The Evidence Pack Data Minimization Strategy
Evidence Packs are how Sasha explains its reasoning — they are the explainability backbone of the platform. They have three levels:
| Level | What's Stored | Data Minimization Approach |
|---|---|---|
| L0 — Signal | A brief insight or observation | Minimal: one-sentence summary, no source data |
| L1 — Summary | Supporting context and reasoning | Moderate: aggregated patterns, anonymized references |
| L2 — Detail | Source evidence with citations | Excerpts and citations only — not full transcripts |
The critical design decision: L2 Evidence Packs store the minimum excerpt needed to support the insight, not the entire session transcript or note. For example:
- What L2 stores: "In session on Jan 15, coachee said: 'I keep avoiding the conversation with my VP because I'm afraid of the confrontation' — this pattern appeared 3 times across sessions 4, 7, and 9."
- What L2 does NOT store: The full 45-minute transcript of sessions 4, 7, and 9.
This means if an Evidence Pack is accessed, the accessor sees only what is relevant — not an entire session's worth of personal content.
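A minimal sketch of how the three levels might be modelled, so that an accessor only ever receives the fields for the level requested. The class names, the `view` method, the cumulative behaviour of levels, and the 280-character excerpt cap are all illustrative assumptions; the platform's actual schema is not described here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple

# Hypothetical cap on excerpt length; the source does not specify one.
MAX_EXCERPT_CHARS = 280


class Level(IntEnum):
    L0_SIGNAL = 0
    L1_SUMMARY = 1
    L2_DETAIL = 2


@dataclass(frozen=True)
class Citation:
    session_number: int
    excerpt: str  # a short quote, never the full transcript

    def __post_init__(self) -> None:
        # Reject anything that looks like a transcript dump.
        if len(self.excerpt) > MAX_EXCERPT_CHARS:
            raise ValueError("excerpt is longer than the minimization cap")


@dataclass(frozen=True)
class EvidencePack:
    signal: str                              # L0: one-sentence insight
    summary: str = ""                        # L1: supporting context
    citations: Tuple[Citation, ...] = ()     # L2: excerpts with citations

    def view(self, level: Level) -> dict:
        """Return only the fields visible at the requested level.

        Assumption: each level also includes the levels below it.
        """
        visible = {"signal": self.signal}
        if level >= Level.L1_SUMMARY:
            visible["summary"] = self.summary
        if level >= Level.L2_DETAIL:
            visible["citations"] = [
                (c.session_number, c.excerpt) for c in self.citations
            ]
        return visible
```

Enforcing the cap in the data type, rather than in the code that builds the pack, means an oversized excerpt fails at construction time instead of being stored.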
Retention Policies
Configurable Per Client
Each enterprise client can configure its own retention periods within the platform's guardrails:
| Setting | Minimum | Default | Maximum | Notes |
|---|---|---|---|---|
| Active engagement data | Duration of engagement | Duration of engagement | Duration of engagement | Retention policy does not delete during active coaching; user deletion requests are handled separately |
| Post-engagement retention | 30 days | 12 months | 36 months | After coaching engagement ends |
| Usage metadata | 30 days | 90 days | 180 days | Rolling window |
| Technical/security logs | 30 days | 30 days | 90 days | Compliance minimum |
| Deletion request processing | Immediate | 30 days max | 30 days max | Within GDPR's one-month and CCPA's 45-day response deadlines |
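The guardrails in the table could be enforced with a small validation step when a client saves its settings. This is a sketch under stated assumptions: the setting keys and function name are hypothetical, and months are approximated as calendar-year fractions (12 months as 365 days, 36 months as 1095 days).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guardrail:
    minimum_days: int
    default_days: int
    maximum_days: int


# Values taken from the table above; month-to-day conversion is an assumption.
GUARDRAILS = {
    "post_engagement": Guardrail(30, 365, 1095),
    "usage_metadata": Guardrail(30, 90, 180),
    "technical_logs": Guardrail(30, 30, 90),
}


def resolve_retention(setting: str, requested_days: Optional[int] = None) -> int:
    """Return the retention period in days for a client setting.

    Falls back to the platform default when the client has not chosen a
    value, and rejects values outside the guardrails instead of silently
    clamping them.
    """
    rail = GUARDRAILS[setting]
    if requested_days is None:
        return rail.default_days
    if not rail.minimum_days <= requested_days <= rail.maximum_days:
        raise ValueError(
            f"{setting}: {requested_days} days is outside "
            f"{rail.minimum_days}-{rail.maximum_days} days"
        )
    return requested_days
```

Rejecting an out-of-range value, rather than clamping it, keeps the client's configured number and the enforced number identical, which is easier to audit.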
What Happens at End of Retention
- Automated retention scheduler scans for data past its retention date (runs weekly)
- Data is flagged for deletion and enters a 7-day grace period (allows for error correction)
- After grace period, data is permanently deleted from all stores:
  - Primary database
  - Backup systems (within next backup rotation cycle, max 30 days)
  - Search indices
  - Cache layers
  - LLM context windows (ephemeral, cleared after session)
- A deletion certificate is generated with:
  - What was deleted (data categories, record count)
  - When it was deleted
  - Who authorized the deletion (automated policy or manual request)
  - Cryptographic hash proving the deletion occurred
- Deletion certificates are retained for 7 years (audit trail requirement)
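As an illustration of the certificate fields listed above, the sketch below builds a certificate and attaches a SHA-256 digest of its canonical JSON form. The field names and functions are hypothetical. Note the limits of this approach: a digest lets an auditor detect that a stored certificate was altered, but by itself it does not demonstrate that the underlying data is gone, and a production system would likely also sign the digest or record it in an append-only log.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_certificate(
    deleted_counts: Dict[str, int],   # data category -> record count
    authorized_by: str,               # "automated_policy" or a request id
    deleted_at: datetime,
) -> dict:
    body = {
        "deleted": deleted_counts,
        "deleted_at": deleted_at.astimezone(timezone.utc).isoformat(),
        "authorized_by": authorized_by,
    }
    return {**body, "sha256": hashlib.sha256(_canonical(body)).hexdigest()}


def verify_certificate(cert: dict) -> bool:
    """Recompute the digest over every field except the digest itself."""
    body = {k: v for k, v in cert.items() if k != "sha256"}
    return hashlib.sha256(_canonical(body)).hexdigest() == cert.get("sha256")
```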
Backup systems present a challenge: selective deletion from a point-in-time backup is not feasible. The ReGenesis approach is to encrypt per-user data with per-user encryption keys. When a user's data is deleted, the platform destroys the encryption key, rendering that user's data in backups cryptographically inaccessible — a technique called crypto-shredding.
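The key-destruction idea can be sketched as follows. Everything here is illustrative: `UserKeyVault` and its methods are hypothetical names, keys are held in a plain dictionary rather than a key management service, and the cipher is a SHAKE-256 keystream used only so the example runs on the standard library. A real deployment would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import secrets


class KeyUnavailableError(Exception):
    """Raised when no key exists for a user, for example after shredding."""


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in cipher (assumption): XOR with a SHAKE-256 keystream.
    # It provides no integrity protection and is not for production use.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class UserKeyVault:
    def __init__(self) -> None:
        self._keys = {}  # user_id -> 32-byte key

    def create_key(self, user_id: str) -> None:
        self._keys[user_id] = secrets.token_bytes(32)

    def _key_for(self, user_id: str) -> bytes:
        try:
            return self._keys[user_id]
        except KeyError:
            raise KeyUnavailableError(user_id) from None

    def encrypt(self, user_id: str, plaintext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh nonce per record
        return nonce + _xor_keystream(self._key_for(user_id), nonce, plaintext)

    def decrypt(self, user_id: str, blob: bytes) -> bytes:
        nonce, ciphertext = blob[:16], blob[16:]
        return _xor_keystream(self._key_for(user_id), nonce, ciphertext)

    def shred(self, user_id: str) -> None:
        # Destroying the key is the deletion: every copy of the user's
        # ciphertext, including copies in backups, becomes unreadable.
        self._keys.pop(user_id, None)
```

Because only the key is destroyed, the backups themselves never need to be rewritten; the ciphertext for that user simply ages out with the normal rotation cycle.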
Deletion Flows
User-Requested Deletion (DSR)
A coachee or enterprise admin requests deletion of all personal data:
- Request received via privacy settings UI or formal DSR channel
- Identity verified (to prevent fraudulent deletion requests)
- Scope confirmed: what data, whose data, all or partial
- 30-day processing window begins (GDPR allows one month, extendable by up to two further months for complex or numerous requests)
- Data identified across all systems
- Deletion executed with crypto-shredding
- Deletion certificate generated and sent to requester + controller
- Confirmation notification sent
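The steps above can be read as an ordered state machine with a deadline. The sketch below is one hypothetical way to track a request; the stage names are invented for illustration, and the 30-day window is modelled as starting once scope is confirmed, following the order of the steps listed.

```python
from datetime import date, timedelta
from enum import IntEnum
from typing import Optional

PROCESSING_WINDOW = timedelta(days=30)


class Stage(IntEnum):
    RECEIVED = 1
    IDENTITY_VERIFIED = 2
    SCOPE_CONFIRMED = 3     # processing window starts here
    DATA_IDENTIFIED = 4
    DELETED = 5             # deletion executed with crypto-shredding
    CERTIFICATE_SENT = 6
    CONFIRMED = 7


class DeletionRequest:
    def __init__(self) -> None:
        self.stage = Stage.RECEIVED
        self.due_on: Optional[date] = None

    def advance(self, to: Stage, on: date) -> None:
        """Move to the next stage; skipping a stage is an error."""
        if to != self.stage + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        if to == Stage.SCOPE_CONFIRMED:
            self.due_on = on + PROCESSING_WINDOW
        self.stage = to

    def is_overdue(self, today: date) -> bool:
        return (
            self.due_on is not None
            and self.stage < Stage.CONFIRMED
            and today > self.due_on
        )
```

Refusing to skip stages means, for example, that deletion cannot run before identity has been verified.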
Automated Policy Deletion
Data reaches end of retention period:
- Retention scheduler identifies expired data
- 7-day grace period notification sent to system admin
- If no hold request, deletion proceeds automatically
- Deletion certificate generated and logged
- Monthly summary report sent to compliance team
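A single scheduler pass might look like the sketch below: records past their retention date are flagged, and records flagged at least seven days earlier are returned for deletion. The `Record` type and `run_scheduler` function are hypothetical, and notification, certificate generation, and reporting are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional

GRACE_PERIOD = timedelta(days=7)


@dataclass
class Record:
    record_id: str
    expires_on: date                    # retention deadline
    flagged_on: Optional[date] = None   # set when the grace period starts


def run_scheduler(records: Iterable[Record], today: date) -> List[str]:
    """Flag newly expired records and return ids ready for deletion."""
    ready = []
    for record in records:
        if record.flagged_on is None:
            if today > record.expires_on:
                # First pass after expiry: start the grace period.
                record.flagged_on = today
        elif today - record.flagged_on >= GRACE_PERIOD:
            # Grace period has elapsed without the flag being cleared.
            ready.append(record.record_id)
    return ready
```

With a weekly run and a seven-day grace period, a record flagged in one pass becomes eligible in the next. Clearing `flagged_on` during the grace period is how an admin's correction would take effect in this sketch.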
Legal Hold Override
Sometimes data under retention must be preserved for legal reasons:
- Legal hold request received from enterprise client or internal legal
- Hold applied to specific user or date range
- Retention scheduler skips held data
- Hold reviewed quarterly
- When hold is lifted, normal retention resumes
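The hold check can be sketched as a filter applied before deletion. The types and the `deletable` function are hypothetical, and a hold is modelled as covering either one user or an inclusive date range, matching the two scopes named above. Quarterly review is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LegalHold:
    user_id: Optional[str] = None   # hold on a specific user, or
    start: Optional[date] = None    # hold on an inclusive date range
    end: Optional[date] = None

    def covers(self, user_id: str, created_on: date) -> bool:
        if self.user_id is not None:
            return self.user_id == user_id
        if self.start is not None and self.end is not None:
            return self.start <= created_on <= self.end
        return False


@dataclass(frozen=True)
class ExpiredRecord:
    record_id: str
    user_id: str
    created_on: date


def deletable(
    records: Iterable[ExpiredRecord], holds: Iterable[LegalHold]
) -> List[str]:
    """Return ids of expired records that no active hold covers."""
    active = list(holds)
    return [
        r.record_id
        for r in records
        if not any(h.covers(r.user_id, r.created_on) for h in active)
    ]
```

Lifting a hold amounts to removing it from the active list; the affected records then pass the filter on the next scheduler run and normal retention resumes.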
Data Lifecycle Overview
Every byte of data in ReGenesis has a defined purpose, a classification tag, a retention deadline, and a deletion plan. There is no data that exists "just in case" — if the platform cannot articulate why data is needed, it is not collected.