Data Model & Storage Architecture
- Executive Summary
- Working Knowledge
- Technical Spec
The ReGenesis data model is the foundation of the entire platform. It defines how every piece of information — from user profiles to AI-generated insights to audit trails — is stored, protected, and accessed.
Key Numbers
| Metric | Value |
|---|---|
| Primary data stores | 12+ |
| Storage technologies | 5 (PostgreSQL, S3, Vector DB, Redis, Audit Store) |
| Visibility levels | 4 (client_visible, coach_only, admin_aggregate, system_internal) |
| Encryption standard | AES-256 at rest, TLS 1.3 in transit |
| Tenant isolation | Row-Level Security + per-tenant encryption keys |
Why This Matters to Stakeholders
- Data is the product: ReGenesis's value comes from transforming coaching data into actionable insights. The data model determines what insights are possible.
- Compliance starts here: GDPR data-subject rights such as access, deletion, and portability are implemented at the data layer. If the data model is wrong, compliance is impossible.
- Scalability depends on it: The schema design determines whether the platform can handle 10 or 10,000 concurrent coaching programs without re-architecture.
- Security is built in: Tenant isolation, field-level encryption, and visibility tagging are not bolted on — they are structural to the data model.
Understanding the 12 Data Stores
Every piece of data in ReGenesis lives in one of 12 logical stores. Each store has a specific purpose, technology choice, and security posture.
Store Map
| # | Store | What's In It | Who Accesses It | Technology |
|---|---|---|---|---|
| 1 | User Store | Profiles, auth, preferences, consent records | All roles (filtered) | PostgreSQL |
| 2 | Artifact Store | Session recordings, uploaded files, exports | Ingest Service, Coach, Coachee | S3 |
| 3 | Transcript Store | Parsed session transcripts with speaker/timestamp | NLP Pipeline, Sasha, Coach | PostgreSQL (+ full-text) |
| 4 | Derived Insight Store | AI-generated insights, patterns, themes | Sasha, Evidence Pack Builder, Coach | PostgreSQL |
| 5 | Evidence Pack Store | L0/L1/L2 evidence chains with provenance | Evidence Pack Builder, Coach, Coachee | PostgreSQL (JSONB) |
| 6 | Session Metadata Store | Session scheduling, attendance, duration, type | Integration Service, Coach, Admin | PostgreSQL |
| 7 | Feedback Store | Coach/coachee feedback on AI outputs, ratings | Feedback Service, Coach, Coachee | PostgreSQL |
| 8 | Coach QA/Rubric Store | Coaching quality rubrics, QA scores, standards | QA Service, Admin | PostgreSQL |
| 9 | Audit Log Store | Every system action, immutable, append-only | Compliance, System Admin | PostgreSQL (append-only) / S3 |
| 10 | Deletion Records Store | Compliance receipts for data deletions | Compliance, System Admin | PostgreSQL |
| 11 | Analytics/Aggregate Store | Anonymized program metrics, dashboards | Admin, Executive | PostgreSQL (materialized views) |
| 12 | Cache/Vector Store | Embeddings for semantic search, session cache | Sasha | Vector DB (Pinecone/Weaviate) for embeddings + Redis for cache |
The Tenant ID Rule
Every table in the database has a tenant_id column, so every record carries the ID of the tenant that owns it. This is non-negotiable. It enables:
- Row-Level Security (RLS): once RLS policies are enabled on a table, PostgreSQL filters every query against it to the current tenant
- Separate encryption keys: Each tenant can have their own KMS key (GA target)
- Data residency: Records can be routed to region-specific databases per tenant
- Clean deletion: When a tenant leaves, all of their data can be located and deleted by tenant_id
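The RLS bullet above can be sketched as a small helper that emits the PostgreSQL statements needed to tenant-scope a table. This is a minimal sketch, not the platform's migration code: the table names, the `tenant_isolation` policy name, the `app.current_tenant` session setting, and the choice of a UUID-typed tenant_id are all illustrative assumptions.

```python
"""Sketch: generate PostgreSQL Row-Level Security DDL for tenant-scoped tables.

Assumptions (not from the ReGenesis spec): the policy name, the
`app.current_tenant` session setting, and a UUID-typed tenant_id column.
"""
import re

# Hypothetical custom setting the application sets once per transaction.
TENANT_SETTING = "app.current_tenant"

# Only plain lowercase identifiers are accepted, since the name is
# interpolated into DDL and cannot be passed as a bound parameter.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_statements(table: str) -> list[str]:
    """Return the DDL that restricts `table` to rows of the current tenant."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    tenant_check = f"tenant_id = current_setting('{TENANT_SETTING}')::uuid"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # USING filters reads; WITH CHECK rejects writes for other tenants.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({tenant_check}) WITH CHECK ({tenant_check});",
    ]


# Statement the application would run at the start of each transaction,
# passing the tenant id as a bound parameter. The third argument (true)
# makes the setting local to the current transaction.
SET_TENANT_SQL = f"SELECT set_config('{TENANT_SETTING}', %s, true);"

if __name__ == "__main__":
    for table in ("transcripts", "derived_insights"):  # hypothetical table names
        print("\n".join(rls_statements(table)))
```

Note that PostgreSQL superusers and roles with the BYPASSRLS attribute are not subject to these policies, so the application should connect as an ordinary role.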
The Visibility Tag Rule
Every data record also has a visibility column set to one of four values: client_visible, coach_only, admin_aggregate, or system_internal. This is the attribute-based access control (ABAC) layer that determines who can see what. Combined with role-based access control (RBAC) checks, it forms the full permission model.
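One way to picture how the visibility tag and the role check combine is the sketch below. The four visibility values are the ones defined in this document; the role names and the role-to-visibility mapping are assumptions made purely for illustration and would need to be replaced by the platform's real permission matrix.

```python
"""Sketch: combine a role check (RBAC) with a visibility-tag check (ABAC).

The four visibility values come from the document; the role names and the
role-to-visibility mapping below are illustrative assumptions.
"""
from dataclasses import dataclass
from enum import Enum


class Visibility(str, Enum):
    CLIENT_VISIBLE = "client_visible"
    COACH_ONLY = "coach_only"
    ADMIN_AGGREGATE = "admin_aggregate"
    SYSTEM_INTERNAL = "system_internal"


# Hypothetical mapping: which visibility tags each role may read.
READABLE_BY_ROLE: dict[str, frozenset[Visibility]] = {
    "coachee": frozenset({Visibility.CLIENT_VISIBLE}),
    "coach": frozenset({Visibility.CLIENT_VISIBLE, Visibility.COACH_ONLY}),
    "admin": frozenset({Visibility.ADMIN_AGGREGATE}),
    "system": frozenset(Visibility),
}


@dataclass(frozen=True)
class Record:
    tenant_id: str
    visibility: Visibility


def can_read(role: str, user_tenant_id: str, record: Record) -> bool:
    """Allow access only if tenant, role, and visibility tag all agree."""
    if record.tenant_id != user_tenant_id:  # tenant isolation comes first
        return False
    allowed = READABLE_BY_ROLE.get(role, frozenset())  # unknown role: deny
    return record.visibility in allowed
```

The check denies by default: an unknown role or a tenant mismatch returns False before the visibility tag is even consulted.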
Schema Versioning
The platform uses database migration files (Prisma Migrate or Alembic) with semantic versioning. Every schema change:
- Gets a numbered migration file
- Is reviewed in a pull request
- Includes both `up` and `down` migrations
- Is tested against a staging database before production
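The requirement that every migration ships with both directions can be checked mechanically. The sketch below applies a migration's up step and then its down step and confirms the schema returns to its starting state. It is an illustration under stated assumptions: an in-memory SQLite database stands in for the staging PostgreSQL instance, and the `Migration` type, the `round_trips` helper, and the example migration are hypothetical rather than part of Prisma Migrate or Alembic.

```python
"""Sketch: check that a migration's `down` step exactly undoes its `up` step.

Uses an in-memory SQLite database as a stand-in for the staging database;
the migration content and names are illustrative assumptions.
"""
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    number: int
    name: str
    up: str
    down: str


def schema_snapshot(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (object name, DDL) pairs describing the current schema."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    )
    return rows.fetchall()


def round_trips(conn: sqlite3.Connection, migration: Migration) -> bool:
    """Apply `up` then `down` and report whether the schema is unchanged."""
    before = schema_snapshot(conn)
    conn.executescript(migration.up)
    changed = schema_snapshot(conn) != before  # `up` must actually do something
    conn.executescript(migration.down)
    return changed and schema_snapshot(conn) == before


# Hypothetical migration 0002: add a feedback table.
ADD_FEEDBACK = Migration(
    number=2,
    name="add_feedback",
    up="""
        CREATE TABLE feedback (
            id INTEGER PRIMARY KEY,
            tenant_id TEXT NOT NULL,
            visibility TEXT NOT NULL,
            rating INTEGER
        );
        CREATE INDEX feedback_tenant_idx ON feedback (tenant_id);
    """,
    down="""
        DROP INDEX feedback_tenant_idx;
        DROP TABLE feedback;
    """,
)
```

A check like this only compares schema objects. It does not prove that data written between the two steps survives, so it complements rather than replaces testing against a realistic staging dataset.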
Entity Relationship Diagram
PostgreSQL RDS (db.r6g.large) starts at approximately $250/month. With read replicas and automated backups, plan for $500-800/month at pilot scale. S3 costs are negligible at pilot volumes (under $10/month for transcripts). Pinecone starts at $70/month for the starter tier. See Tech Stack for detailed cost breakdown.
All schemas in this document are proposed. They require CTO review, security audit, and load testing before production deployment. Migration files should be generated from these definitions after approval.