Memory System

Open Genie has a persistent, retrieval-augmented knowledge system that scales from a handful of structured facts to an unbounded store of chat extracts, free-form notes, voice transcripts, and camera observations — pulling in only the slices relevant to the current conversation turn.

Architecture

The system uses three storage tiers, each with a different performance profile and retrieval mechanism.

Three-Tier Model

| Tier | Table | Size budget | When in prompt | Primary writer |
| --- | --- | --- | --- | --- |
| Facts | memory_entries | ≤ ~50 entries | Always (stable half, prompt-cached) | update_memory tool, promoted from chunks |
| Chunks | memory_chunks | Unbounded | Top-8 per turn, retrieved by semantic search (volatile half) | extractMemories, add_note tool, UI |
| People | households + persons | One household, ~1–10 persons | Member list in stable half; person names on chunks in volatile half | Setup UI, /api/household |

Why two tables instead of one

  • memory_entries is the "always-in-prompt" hot set. It must stay tiny so the cached stable prompt stays cheap to re-use across turns.
  • memory_chunks is the long tail — append-only growth, retrieved on demand, embedded as 768-dim vectors.
  • Two tables keep queries simple and let both evolve independently. The promotion job (Phase 5) gradually moves stable chunks into facts.

Preconditions

Before the RAG system is active, two runtime dependencies must be satisfied:

  1. PostgreSQL ≥ 13 with pgvector ≥ 0.5.0. Confirm after running migrations:

    SELECT extversion FROM pg_extension WHERE extname='vector';

    The HNSW index (fast approximate nearest-neighbour) requires ≥ 0.5.0.

  2. nomic-embed-text pulled in Ollama (~270 MB, 768-dim):

    ollama pull nomic-embed-text

    The preflight check (npm run preflight) warns if the model is missing.


Retrieval Flow

On every chat turn where GENIE_RAG_ENABLED=1 is set, this sequence runs before the system prompt is assembled:

1. getActiveHouseholdId() — single DB row, cached per request
2. inferSpeaker(deviceId, msg) — device hint → single-mode → first-person parser
3. embed(userMessage) — Ollama /api/embeddings, ~30–80 ms locally
4. Two parallel DB queries:
- Vector: cosine on memory_chunks.embedding (HNSW index), top-40
- Keyword: tsvector rank on memory_chunks.content_tsv (GIN index), top-40
5. Reciprocal Rank Fusion (k=60) — merge both lists by rank
6. Heuristic re-rank:
- pinned → +0.20
- matched person hint → +0.15
- recency decay: -0.10 × min(1, ageInDays / 90) [skipped for pinned]
7. Take top-8, format as ## Relevant Memory, inject into volatile prompt half
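
Steps 5–6 are compact enough to sketch. Here is a minimal TypeScript illustration of the fusion and re-rank math, assuming a hypothetical Candidate shape rather than the real row type (the actual implementation lives in lib/memory/retrieve.ts):

```ts
type Candidate = { id: string; pinned: boolean; personId?: string; ageInDays: number };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item.
function rrfScore(rank: number, k = 60): number {
  return 1 / (k + rank);
}

function fuseAndRerank(
  vectorHits: Candidate[],  // top-40 by cosine distance
  keywordHits: Candidate[], // top-40 by tsvector rank
  personHints: string[],    // person UUIDs from speaker inference
): Candidate[] {
  const scores = new Map<string, { c: Candidate; s: number }>();
  for (const list of [vectorHits, keywordHits]) {
    list.forEach((c, i) => {
      const entry = scores.get(c.id) ?? { c, s: 0 };
      entry.s += rrfScore(i + 1);
      scores.set(c.id, entry);
    });
  }
  for (const entry of scores.values()) {
    const { c } = entry;
    if (c.pinned) entry.s += 0.20;
    if (c.personId && personHints.includes(c.personId)) entry.s += 0.15;
    if (!c.pinned) entry.s -= 0.10 * Math.min(1, c.ageInDays / 90); // recency decay
  }
  return [...scores.values()]
    .sort((a, b) => b.s - a.s)
    .slice(0, 8)
    .map((e) => e.c);
}
```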

The stable half of the system prompt (soul + facts + tool definitions + household members) is unchanged per-turn and benefits from prompt caching. Only the volatile section — including the retrieved chunks — changes each turn.


Embedding Model

Ollama nomic-embed-text (768-dim) is the default.

| Env var | Default | Purpose |
| --- | --- | --- |
| GENIE_EMBED_MODEL | nomic-embed-text | Override the embedding model. Changing it requires re-running the backfill. |

The embedding_model column is stored per chunk so the corpus can be gradually re-indexed without downtime when the model is swapped.
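
As an illustration of how that column enables a rolling re-index, here is a hedged sketch of a sweep. The reindexBatch function and the db helper are hypothetical; only embed(), getEmbedModel(), toVectorLiteral(), and the embedding_model column come from this doc:

```ts
import { embed, getEmbedModel, toVectorLiteral } from "./embeddings"; // path assumed

type Db = { query: (sql: string, params?: unknown[]) => Promise<{ rows: any[] }> };

// Re-embed only chunks whose embedding_model differs from the active model.
async function reindexBatch(db: Db, batchSize = 50): Promise<number> {
  const model = getEmbedModel();
  const { rows } = await db.query(
    `SELECT id, title, content FROM memory_chunks
     WHERE embedding_model IS DISTINCT FROM $1
     LIMIT $2`,
    [model, batchSize],
  );
  for (const row of rows) {
    const vec = await embed(`${row.title ?? ""} ${row.content}`);
    await db.query(
      `UPDATE memory_chunks SET embedding = $1::vector, embedding_model = $2 WHERE id = $3`,
      [toVectorLiteral(vec), model, row.id],
    );
  }
  return rows.length; // 0 means the corpus is fully re-indexed
}
```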


Data Model

households

One row per installation (v1 is always single-household).

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| display_name | varchar | Shown in prompt and UI |
| mode | varchar | "single" \| "multi" |
| timezone | varchar | e.g. "America/New_York" |
| locale | varchar | e.g. "en-US" |
| created_at | timestamptz | |

persons

One row per household member. Aliases let "mom", "wife", "Sarah" all resolve to the same row.

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| household_id | uuid FK | Cascade delete |
| display_name | varchar | Canonical name |
| pronouns | varchar | Optional |
| role | varchar | "primary" \| "spouse" \| "child" \| "guest" |
| aliases | jsonb string[] | e.g. ["mom", "Sarah"] |
| birth_date | varchar | YYYY-MM-DD |
| notes | text | |

memory_chunks

The unbounded long-tail store.

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| household_id | uuid FK | Cascade delete |
| person_id | uuid FK | Null = household-scoped |
| source | varchar | "chat_extract" \| "note" \| "imported" \| "voice_transcript" \| "camera_observation" |
| source_ref | jsonb | { conversationId?, memoryEntryId?, … } |
| category | varchar | Optional bucket (same categories as facts) |
| title | varchar | Short headline for UI |
| content | text | 1–3 sentences typical |
| observed_at | timestamptz | When the fact was observed |
| expires_at | timestamptz | Null = evergreen |
| confidence | integer | 0–100 |
| pinned | boolean | Pinned chunks get +0.20 retrieval boost, skip recency decay |
| promoted | boolean | Set to true when the promotion job has created a memory_entries fact |
| embedding | vector(768) | Populated by addChunk() automatically |
| embedding_model | varchar | Model name at embed time |
| content_tsv | tsvector | Generated column — to_tsvector('english', title \|\| ' ' \|\| content) |

Indexes:

| Index | Type | Used for |
| --- | --- | --- |
| memory_chunks_embedding_idx | HNSW (cosine) | Fast approximate nearest-neighbour |
| memory_chunks_tsv_idx | GIN | Full-text keyword search |
| memory_chunks_household_idx | btree | All per-household queries |
| memory_chunks_person_idx | btree | Per-person filtering |
| memory_chunks_observed_idx | btree | Recency ordering |
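
To show how the first two indexes are exercised, here is a sketch of the two parallel queries from step 4 of the retrieval flow. The db helper and exact SQL are assumptions; the real queries live in lib/memory/retrieve.ts:

```ts
import { embed, toVectorLiteral } from "./embeddings"; // path assumed

type Db = { query: (sql: string, params?: unknown[]) => Promise<{ rows: any[] }> };

async function hybridSearch(db: Db, householdId: string, userMessage: string) {
  const vecLit = toVectorLiteral(await embed(userMessage));
  return Promise.all([
    // Vector arm: cosine distance, served by the HNSW index
    db.query(
      `SELECT id, 1 - (embedding <=> $1::vector) AS score
         FROM memory_chunks
        WHERE household_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT 40`,
      [vecLit, householdId],
    ),
    // Keyword arm: full-text rank, served by the GIN index
    db.query(
      `SELECT id, ts_rank(content_tsv, plainto_tsquery('english', $1)) AS score
         FROM memory_chunks
        WHERE household_id = $2
          AND content_tsv @@ plainto_tsquery('english', $1)
        ORDER BY score DESC
        LIMIT 40`,
      [userMessage, householdId],
    ),
  ]);
}
```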

Additive columns on existing tables

| Table | Column | Purpose |
| --- | --- | --- |
| memory_entries | household_id | Scope facts to a household |
| memory_entries | person_id | Attribute facts to a person |
| devices | primary_person_id | Speaker inference — which person typically uses this device |

Module API

All new code lives under apps/web/lib/memory/.

embeddings.ts

embed(text: string): Promise<number[]>
embedBatch(texts: string[], concurrency?: number): Promise<number[][]>
toVectorLiteral(v: number[]): string // "[0.1,0.2,...]" for raw SQL
getEmbedModel(): string
getEmbedDim(): number // 768
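
A typical call looks like this (import path assumed):

```ts
import { embed, getEmbedDim, toVectorLiteral } from "@/lib/memory/embeddings";

const vec = await embed("Oat milk latte, no sugar"); // one Ollama round-trip
console.log(vec.length, getEmbedDim());              // 768 768
const literal = toVectorLiteral(vec);                // "[0.12,-0.03,...]" for $1::vector params
```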

chunks.ts

addChunk(input: AddChunkInput): Promise<MemoryChunk>
// Always calls embed() internally — callers cannot forget to embed.

getChunk(id: string, householdId?: string): Promise<MemoryChunk | null>
listChunks(opts: { householdId, personId?, limit?, cursor? }): Promise<MemoryChunk[]>
updateChunk(id: string, patch): Promise<MemoryChunk>
// Re-embeds automatically when content or title changes.

deleteChunk(id: string): Promise<void>
pinChunk(id: string, pinned: boolean): Promise<void>
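
For example, saving a household-scoped note — the exact AddChunkInput fields are assumptions (check chunks.ts for the real shape), and householdId is assumed to be in scope:

```ts
import { addChunk } from "@/lib/memory/chunks"; // path assumed

const chunk = await addChunk({
  householdId,
  personId: null, // null = household-scoped
  source: "note",
  category: "plans",
  title: "Kitchen renovation",
  content: "User is planning a kitchen renovation. Budget approximately $40k.",
});
// chunk.embedding is already populated — addChunk() embeds internally.
```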

retrieve.ts

retrieveMemory(opts: RetrieveOptions): Promise<RankedChunk[]>
formatChunksForPrompt(chunks, personLabels): string
shouldRetrieve(message, history): boolean
// Returns false for pure greetings on the first turn.

RetrieveOptions:

{
  householdId: string;
  query: string;
  personHints?: string[]; // person UUIDs to boost in re-ranking
  k?: number;             // default 8
  recency?: "boost" | "neutral";
}
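
Putting it together, a turn-time call might look like the following. Import paths and the field name on SpeakerInference are assumptions; userMessage, history, householdId, and speaker are assumed to be in scope:

```ts
import { retrieveMemory, formatChunksForPrompt, shouldRetrieve } from "@/lib/memory/retrieve";
import { loadPersonLabels } from "@/lib/memory/persons";

if (shouldRetrieve(userMessage, history)) {
  const chunks = await retrieveMemory({
    householdId,
    query: userMessage,
    personHints: speaker.personIds, // field name assumed
    k: 8,
    recency: "boost",
  });
  const block = formatChunksForPrompt(chunks, await loadPersonLabels(householdId));
  // block becomes the "## Relevant Memory" section of the volatile half
}
```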

speaker.ts

inferSpeaker(input: InferSpeakerInput): Promise<SpeakerInference>
// Resolution order:
// 1. Voice session speaker ID (v2 TODO)
// 2. devices.primary_person_id (confidence: high)
// 3. Single-mode fast path: return the only person (confidence: high)
// 4. First-person parser: "my wife is cooking" → speaker is not the wife
// 5. Fallback: [] with confidence: low

resolvePersonRef(householdId, ref): Promise<{ id, displayName } | null>
// Matches displayName, aliases[], or role — returns null if ambiguous.
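
For instance (a hypothetical call; behavior follows the contract above):

```ts
// "mom", "Sarah", and "spouse" can all land on the same row via aliases/role.
const who = await resolvePersonRef(householdId, "mom");
if (who) {
  console.log(who.displayName); // canonical name, e.g. "Sarah"
} else {
  // no match — or the reference matched more than one member (ambiguous)
}
```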

persons.ts

getActiveHouseholdId(): Promise<string> // cached per process
clearHouseholdCache(): void
loadPersonLabels(householdId): Promise<Map<string, string>>
getPersons(householdId): Promise<Person[]>
createHousehold(input): Promise<Household>
createPerson(input): Promise<Person>
updateHousehold(id, patch): Promise<Household>
upsertPersons(householdId, members): Promise<Person[]>

promote.ts

runPromotion(): Promise<{ promoted: number }>
// Clusters chunks with cosine distance < 0.15 that appear ≥ 3 times,
// promotes each cluster to a memory_entries fact, marks chunks promoted=true.

Memory Sources

| source value | Created by | Notes |
| --- | --- | --- |
| imported | backfill-rag.ts | One-time import of existing memory_entries at migration time |
| chat_extract | extractMemories() in memory-extraction.ts | Both structured facts and free-form notes from each conversation |
| note | add_note AI tool or POST /api/memory/chunks | Model-authored or user-authored free-form notes |
| voice_transcript | (reserved) | Voice session transcripts — future |
| camera_observation | (reserved) | Ollama vision responses — future |

Automatic Extraction

The extraction LLM pass runs after chat responses (triggered every 5 messages, or when the user says "remember that") and now returns:

{
  "facts": [
    { "category": "preferences", "key": "coffee_order", "value": "Oat milk latte, no sugar" }
  ],
  "notes": [
    "User mentioned they are considering switching jobs and wants to evaluate options in Q2."
  ]
}

  • Each facts[] entry is upserted to memory_entries (existing behavior) and written as a chat_extract chunk.
  • Each notes[] entry is written as a chat_extract chunk only — it doesn't fit the structured key/value model.
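
A sketch of that fan-out, with upsertMemoryEntry as a hypothetical stand-in for the existing structured upsert (the real logic lives in memory-extraction.ts; extraction, householdId, and conversationId are assumed to be in scope):

```ts
for (const fact of extraction.facts) {
  await upsertMemoryEntry(fact); // existing structured path → memory_entries
  await addChunk({
    householdId,
    source: "chat_extract",
    category: fact.category,
    title: fact.key,
    content: fact.value,
    sourceRef: { conversationId },
  });
}
for (const note of extraction.notes) {
  // notes are chunk-only: they have no key/value shape to upsert
  await addChunk({ householdId, source: "chat_extract", content: note, sourceRef: { conversationId } });
}
```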

Multi-Person Households

Modes

| Mode | Behavior |
| --- | --- |
| single | Speaker inference always returns the one household member. person parameters are hidden from AI tool schemas. find_person tool is hidden. |
| multi | Speaker inference uses device hints and first-person parsing. AI tools expose the person parameter for attribution. |

Speaker Inference

Speaker inference runs on every turn (when RAG is enabled) and produces personHints[] — a list of person UUIDs used to boost retrieval for that person's chunks.

It does not attribute the speaker within the stored conversation itself; it only informs which chunks are most relevant. Attribution errors can be corrected after the fact via the memory UI.

First-Run Setup

Navigate to /setup to configure the household. The Household step:

  • Picks single vs. multi mode
  • Sets the household display name and timezone
  • For multi: adds household members with name, pronouns, role, and aliases

This can be edited any time at /setup or via PUT /api/household.

Mode-switch rules:

  • single → multi: existing person becomes role='primary'; add more members.
  • multi → single: only allowed when exactly one person remains. Remove others first.

Feature Flag

RAG retrieval is gated behind an environment variable during rollout:

| Env var | Value | Effect |
| --- | --- | --- |
| GENIE_RAG_ENABLED | 1 | Enable retrieval injection into the prompt |
| | (unset / 0) | Legacy behavior: only MEMORY.md in prompt |
| GENIE_RAG_DISABLED | 1 | Phase-5 kill switch — falls back to legacy even if GENIE_RAG_ENABLED=1 |

With GENIE_RAG_ENABLED=0 (default), chat behavior is byte-identical to pre-RAG behavior. The new tables exist but are not queried during chat.


Backfill & Migration

Running npm run db:migrate (or npm run db:generate && npm run db:migrate) applies the 0003_little_marrow.sql migration and then automatically runs the backfill:

  1. Seeds the households table from opengenie.json if empty
  2. Seeds persons from opengenie.json's household.members array (or creates a single default person "You")
  3. Imports all existing memory_entries rows as pinned memory_chunks with source='imported'
  4. Embeds each chunk via Ollama (nomic-embed-text)

The backfill is idempotent — safe to run multiple times. If Ollama is unreachable, the backfill logs a warning and the migrate command still exits 0. Re-run manually:

npm run backfill:rag

Note: opengenie.json's household block is now a seed-only value. Once a row exists in the households table, the database is the source of truth. Edit household settings via /setup or /api/household.


Promotion Job

lib/memory/promote.ts runs daily (register it with the scheduler) and clusters similar chunks:

  1. Self-joins memory_chunks on embedding <=> embedding < 0.15 (cosine distance)
  2. Finds clusters with ≥ 3 occurrences
  3. Upserts a canonical fact into memory_entries
  4. Sets promoted=true on all chunk members of the cluster
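
The core of steps 1–2 can be expressed as a single self-join. A sketch, assuming a generic db.query helper (the real query lives in lib/memory/promote.ts):

```ts
const { rows: clusters } = await db.query(`
  SELECT a.id AS seed_id, array_agg(b.id) AS member_ids
    FROM memory_chunks a
    JOIN memory_chunks b
      ON a.household_id = b.household_id
     AND a.id <> b.id
     AND (a.embedding <=> b.embedding) < 0.15  -- cosine distance threshold
   WHERE NOT a.promoted AND NOT b.promoted
   GROUP BY a.id
  HAVING count(*) >= 2                          -- seed + 2 neighbours = 3 occurrences
`);
```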

When GENIE_RAG_ENABLED=1, syncMemoryFile() only writes promoted facts to MEMORY.md, keeping the file small (≤ 20 entries) and the stable prompt cache tight.


AI Actions

| Action | Description |
| --- | --- |
| read_memory | Query by category/key (exact) or semantic=true (vector search) |
| update_memory | Create/update a structured fact in memory_entries |
| add_note | Save a free-form observation as a chunk |
| find_person | Resolve a household member by name, alias, or relationship |

When to use update_memory, add_note, and read_memory:

  • Use update_memory for atomic facts: coffee_order: oat milk latte.
  • Use add_note for narrative context: "User mentioned they are considering a kitchen renovation, scope TBD, budget ~$40k."
  • Use read_memory({ semantic: true, key: "kitchen" }) to recall notes by topic.

// Store a free-form note
{
  "name": "add_note",
  "arguments": {
    "content": "User is planning a kitchen renovation. Budget approximately $40k, timeline unclear.",
    "category": "plans"
  }
}

// Recall semantically
{
  "name": "read_memory",
  "arguments": {
    "key": "what home projects is the user planning?",
    "semantic": true
  }
}

REST Endpoints

Facts (structured memory)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/memory | List entries (?category= or ?q= search) |
| POST | /api/memory | Create/upsert entry { category, key, value } |
| DELETE | /api/memory/[id] | Delete an entry |
| GET | /api/memory/raw | Get SOUL.md and MEMORY.md contents |
| PUT | /api/memory/soul | Update SOUL.md |

Chunks (RAG long-tail)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/memory/chunks | List chunks (?q= for semantic search, ?personId= to filter) |
| POST | /api/memory/chunks | Create a chunk (embeds automatically) |
| GET | /api/memory/chunks/[id] | Get a single chunk |
| PATCH | /api/memory/chunks/[id] | Update chunk (re-embeds if content/title changes) |
| DELETE | /api/memory/chunks/[id] | Delete a chunk |
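
For example, from client code (request body fields assumed to mirror AddChunkInput):

```ts
// Create a note chunk over REST — the server embeds it before returning.
const res = await fetch("/api/memory/chunks", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    source: "note",
    category: "plans",
    title: "Kitchen renovation",
    content: "Planning a kitchen renovation, budget ~$40k.",
  }),
});
const chunk = await res.json();

// Semantic search over chunks
const hits = await fetch("/api/memory/chunks?q=home%20projects").then((r) => r.json());
```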

Household

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/household | Get current household + persons list |
| PUT | /api/household | Update household settings + replace persons (transactional) |

Prompt Cache Impact

The system prompt is split at <!-- OPENGENIE_CACHE_BOUNDARY -->:

  • Stable half (cached across turns): soul + facts + tool definitions + household member list
  • Volatile half (rebuilt per turn): inbound context + current datetime + ## Relevant Memory (retrieved chunks) + runtime info

Retrieved chunks land in the volatile half, so the expensive stable-half cache stays valid across all turns in a conversation. Only the volatile section — which changes anyway — includes the retrieval results.
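
A minimal sketch of the assembly, assuming a hypothetical buildSystemPrompt helper (only the boundary marker itself is real):

```ts
const BOUNDARY = "<!-- OPENGENIE_CACHE_BOUNDARY -->";

function buildSystemPrompt(stableHalf: string, relevantMemory: string, runtimeInfo: string): string {
  // Everything before the boundary is byte-identical across turns, so the
  // provider's prompt cache keeps hitting; only the tail is recomputed.
  return [stableHalf, BOUNDARY, relevantMemory, runtimeInfo].join("\n\n");
}
```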