Data Model & Storage Architecture

The ReGenesis data model is the foundation of the entire platform. It defines how every piece of information — from user profiles to AI-generated insights to audit trails — is stored, protected, and accessed.

Key Numbers

Metric	Value
Primary data stores	12+
Storage technologies	5 (PostgreSQL, S3, Vector DB, Redis, Audit Store)
Visibility levels	4 (`client_visible`, `coach_only`, `admin_aggregate`, `system_internal`)
Encryption standard	AES-256 at rest, TLS 1.3 in transit
Tenant isolation	Row-Level Security + per-tenant encryption keys

Why This Matters to Stakeholders

Data is the product: ReGenesis's value comes from transforming coaching data into actionable insights. The data model determines what insights are possible.
Compliance starts here: GDPR data-subject rights such as access, deletion, and portability are implemented at the data layer. If the data model is wrong, these rights cannot be honored.
Scalability depends on it: The schema design determines whether the platform can handle 10 or 10,000 concurrent coaching programs without re-architecture.
Security is built in: Tenant isolation, field-level encryption, and visibility tagging are not bolted on — they are structural to the data model.

Understanding the 12 Data Stores

Every piece of data in ReGenesis lives in one of 12 logical stores. Each store has a specific purpose, technology choice, and security posture.

Store Map

#	Store	What's In It	Who Accesses It	Technology
1	User Store	Profiles, auth, preferences, consent records	All roles (filtered)	PostgreSQL
2	Artifact Store	Session recordings, uploaded files, exports	Ingest Service, Coach, Coachee	S3
3	Transcript Store	Parsed session transcripts with speaker/timestamp	NLP Pipeline, Sasha, Coach	PostgreSQL (+ full-text)
4	Derived Insight Store	AI-generated insights, patterns, themes	Sasha, Evidence Pack Builder, Coach	PostgreSQL
5	Evidence Pack Store	L0/L1/L2 evidence chains with provenance	Evidence Pack Builder, Coach, Coachee	PostgreSQL (JSONB)
6	Session Metadata Store	Session scheduling, attendance, duration, type	Integration Service, Coach, Admin	PostgreSQL
7	Feedback Store	Coach/coachee feedback on AI outputs, ratings	Feedback Service, Coach, Coachee	PostgreSQL
8	Coach QA/Rubric Store	Coaching quality rubrics, QA scores, standards	QA Service, Admin	PostgreSQL
9	Audit Log Store	Every system action, immutable, append-only	Compliance, System Admin	PostgreSQL (append-only) / S3
10	Deletion Records Store	Compliance receipts for data deletions	Compliance, System Admin	PostgreSQL
11	Analytics/Aggregate Store	Anonymized program metrics, dashboards	Admin, Executive	PostgreSQL (materialized views)
12	Cache/Vector Store	Embeddings for semantic search, session cache	Sasha, Redis for cache	Vector DB (Pinecone/Weaviate) + Redis

The Tenant ID Rule

Every table in the database has a tenant_id column, so every record is tagged with the tenant that owns it. This is non-negotiable. It enables:

Row-Level Security (RLS): PostgreSQL RLS policies restrict every query to the current tenant's rows
Separate encryption keys: Each tenant can have its own KMS key (GA target)
Data residency: Records can be routed to region-specific databases per tenant
Clean deletion: When a tenant leaves, all of its data can be found and deleted by tenant_id
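
As a rough illustration of the RLS point, the sketch below generates the PostgreSQL statements that would restrict one table to the current tenant. It is not the platform's actual migration code. The session setting name `app.current_tenant`, the policy name `tenant_isolation`, the table name `transcripts`, and the assumption that tenant_id is a UUID are all hypothetical.

```python
import re

# Hypothetical session setting; the application would set it per transaction,
# e.g. SET LOCAL app.current_tenant = '<tenant uuid>'.
TENANT_SETTING = "app.current_tenant"

# Accept only plain lowercase identifiers so a table name cannot inject SQL.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def tenant_rls_statements(table: str) -> list[str]:
    """Return PostgreSQL statements that restrict `table` to the current tenant."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    # Assumes tenant_id is a uuid column (not confirmed by this document).
    predicate = f"tenant_id = current_setting('{TENANT_SETTING}')::uuid"
    return [
        # Turn RLS on; FORCE makes the policy apply to the table owner too.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # USING filters rows on read; WITH CHECK rejects writes for other tenants.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]


if __name__ == "__main__":
    for statement in tenant_rls_statements("transcripts"):
        print(statement)
```

With a policy like this in place, a query that omits a tenant filter still returns only the current tenant's rows, which is why the tenant_id column has to exist on every table.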

The Visibility Tag Rule

Every record also carries a visibility column set to one of four values: `client_visible`, `coach_only`, `admin_aggregate`, or `system_internal`. This tag is the ABAC (Attribute-Based Access Control) layer: it determines who can see a given record. Combined with RBAC role checks, it forms the full permission model.
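
A minimal sketch of how the tenant check, the role check, and the visibility tag could combine into one read decision follows. The four visibility values come from this document. The role names and the role-to-visibility grants in `ROLE_GRANTS` are assumptions made for illustration, not the platform's defined policy.

```python
from enum import Enum


class Visibility(str, Enum):
    CLIENT_VISIBLE = "client_visible"
    COACH_ONLY = "coach_only"
    ADMIN_AGGREGATE = "admin_aggregate"
    SYSTEM_INTERNAL = "system_internal"


# Hypothetical grants: which visibility tags each role may read.
ROLE_GRANTS = {
    "coachee": {Visibility.CLIENT_VISIBLE},
    "coach": {Visibility.CLIENT_VISIBLE, Visibility.COACH_ONLY},
    "admin": {Visibility.ADMIN_AGGREGATE},
    "system": set(Visibility),
}


def can_read(role: str, user_tenant: str, record: dict) -> bool:
    """Allow a read only if tenant, role, and visibility tag all line up."""
    # Tenant isolation comes first: never cross tenant boundaries.
    if record["tenant_id"] != user_tenant:
        return False
    # RBAC: look up what the role is granted; unknown roles get nothing.
    allowed = ROLE_GRANTS.get(role, set())
    # ABAC: compare the record's visibility tag against those grants.
    return Visibility(record["visibility"]) in allowed


if __name__ == "__main__":
    note = {"tenant_id": "t1", "visibility": "coach_only"}
    print(can_read("coach", "t1", note))
    print(can_read("coachee", "t1", note))
```

Keeping the grants in a single table-like structure makes the permission model easy to review alongside the visibility values themselves.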

Schema Versioning

The platform uses database migration files (Prisma Migrate or Alembic) with semantic versioning. Every schema change:

Gets a numbered migration file
Is reviewed in a pull request
Includes both up and down migrations
Is tested against a staging database before production
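
The first and third rules above lend themselves to an automated check in the pull request pipeline. The sketch below assumes Alembic-style Python migration files and a hypothetical naming convention of `NNNN_description.py`. The function name `check_migrations` and the convention itself are illustrative, not something this document specifies.

```python
import ast
import re
from pathlib import Path

# Hypothetical naming convention: 0001_description.py, 0002_description.py, ...
_NAME = re.compile(r"(\d{4})_[a-z0-9_]+\.py")


def check_migrations(directory: Path) -> list[str]:
    """Return the problems found in a migration directory (empty list means OK)."""
    problems = []
    numbers = []
    for path in sorted(directory.glob("*.py")):
        match = _NAME.fullmatch(path.name)
        if match is None:
            problems.append(f"{path.name}: name is not NNNN_description.py")
            continue
        numbers.append(int(match.group(1)))
        # Parse the file without importing it and collect top-level functions.
        tree = ast.parse(path.read_text())
        defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
        for required in ("upgrade", "downgrade"):
            if required not in defined:
                problems.append(f"{path.name}: missing {required}()")
    # Numbers must run 1, 2, 3, ... with no gaps or duplicates.
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append("migration numbers are not contiguous starting at 0001")
    return problems


if __name__ == "__main__":
    for problem in check_migrations(Path("migrations")):
        print(problem)
```

A check like this only confirms that a down migration exists. Whether it actually reverses the up migration still has to be verified against the staging database.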

Entity Relationship Diagram

Cost Considerations

PostgreSQL on RDS (db.r6g.large) starts at approximately $250/month. With read replicas and automated backups, plan for $500 to $800/month at pilot scale. S3 costs are negligible at pilot volumes (under $10/month for transcripts). Pinecone starts at $70/month for the starter tier. See Tech Stack for a detailed cost breakdown.
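
Adding up the figures quoted above gives a rough monthly range for the pilot. The sketch below treats "under $10" as a 0 to 10 range and uses Pinecone's starting price as a fixed value. It leaves out Redis and anything else this section does not price, so the result is a floor on the estimate rather than a budget.

```python
# Monthly figures in USD as (low, high) ranges, taken from the paragraph above.
PILOT_COSTS = {
    "postgres_rds_with_replicas_and_backups": (500, 800),
    "s3_transcripts": (0, 10),     # "under $10/month"
    "pinecone_starter": (70, 70),  # starting price; higher tiers cost more
}


def monthly_range(costs: dict) -> tuple:
    """Sum the low ends and the high ends of each cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


if __name__ == "__main__":
    low, high = monthly_range(PILOT_COSTS)
    print(f"Pilot estimate: ${low} to ${high} per month")
```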

Schema Status

All schemas in this document are proposed. They require CTO review, security audit, and load testing before production deployment. Migration files should be generated from these definitions after approval.
