# Memory System
Open Genie has a persistent, retrieval-augmented knowledge system that scales from a handful of structured facts to an unbounded store of chat extracts, free-form notes, voice transcripts, and camera observations — pulling in only the slices relevant to the current conversation turn.
## Architecture
The system uses three storage tiers, each with a different performance profile and retrieval mechanism.
### Three-Tier Model
| Tier | Table | Size budget | When in prompt | Primary writer |
|---|---|---|---|---|
| Facts | memory_entries | ≤ ~50 entries | Always (stable half, prompt-cached) | update_memory tool, promoted from chunks |
| Chunks | memory_chunks | Unbounded | Top-8 per turn, retrieved by semantic search (volatile half) | extractMemories, add_note tool, UI |
| People | households + persons | One household, ~1–10 persons | Member list in stable half; person names on chunks in volatile half | Setup UI, /api/household |
### Why two tables instead of one
- `memory_entries` is the "always-in-prompt" hot set. It must stay tiny so the cached stable prompt stays cheap to re-use across turns.
- `memory_chunks` is the long tail — append-only growth, retrieved on demand, embedded as 768-dim vectors.
- Two tables keep queries simple and let both evolve independently. The promotion job (Phase 5) gradually moves stable chunks into facts.
## Preconditions
Before the RAG system is active, two runtime dependencies must be satisfied:
- PostgreSQL ≥ 13 with pgvector ≥ 0.5.0. Confirm after running migrations:

  ```sql
  SELECT extversion FROM pg_extension WHERE extname='vector';
  ```

  The HNSW index (fast approximate nearest-neighbour) requires ≥ 0.5.0.
- `nomic-embed-text` pulled in Ollama (~270 MB, 768-dim):

  ```bash
  ollama pull nomic-embed-text
  ```

  The preflight check (`npm run preflight`) warns if the model is missing.
## Retrieval Flow
On every chat turn where `GENIE_RAG_ENABLED=1` is set, this sequence runs before the system prompt is assembled:
1. getActiveHouseholdId() — single DB row, cached per request
2. inferSpeaker(deviceId, msg) — device hint → single-mode → first-person parser
3. embed(userMessage) — Ollama /api/embeddings, ~30–80 ms locally
4. Two parallel DB queries:
- Vector: cosine on memory_chunks.embedding (HNSW index), top-40
- Keyword: tsvector rank on memory_chunks.content_tsv (GIN index), top-40
5. Reciprocal Rank Fusion (k=60) — merge both lists by rank
6. Heuristic re-rank:
- pinned → +0.20
- matched person hint → +0.15
- recency decay: -0.10 × min(1, ageInDays / 90) [skipped for pinned]
7. Take top-8, format as `## Relevant Memory`, inject into volatile prompt half
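For illustration, steps 5 and 6 amount to the following scoring (a sketch only; the `Candidate` shape and helper names are illustrative, not the actual retrieve.ts internals):

```ts
type Candidate = { id: string; pinned: boolean; personId?: string; observedAt: Date };

// Step 5: Reciprocal Rank Fusion. Each list contributes 1 / (k + rank) per chunk it contains.
function rrfMerge(vectorHits: Candidate[], keywordHits: Candidate[], k = 60): Map<string, number> {
  const fused = new Map<string, number>();
  for (const list of [vectorHits, keywordHits]) {
    list.forEach((c, i) => fused.set(c.id, (fused.get(c.id) ?? 0) + 1 / (k + i + 1)));
  }
  return fused;
}

// Step 6: heuristic re-rank applied on top of the fused score.
function heuristicScore(c: Candidate, fusedScore: number, personHints: string[], now = Date.now()): number {
  let score = fusedScore;
  if (c.pinned) score += 0.2;                                        // pinned boost
  if (c.personId && personHints.includes(c.personId)) score += 0.15; // matched person hint
  if (!c.pinned) {
    const ageInDays = (now - c.observedAt.getTime()) / 86_400_000;
    score -= 0.1 * Math.min(1, ageInDays / 90);                      // recency decay, skipped for pinned
  }
  return score;
}
```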
The stable half of the system prompt (soul + facts + tool definitions + household members) is unchanged per-turn and benefits from prompt caching. Only the volatile section — including the retrieved chunks — changes each turn.
## Embedding Model
Ollama `nomic-embed-text` (768-dim) is the default.
| Env var | Default | Purpose |
|---|---|---|
GENIE_EMBED_MODEL | nomic-embed-text | Override the embedding model. Changing it requires re-running the backfill. |
The embedding_model column is stored per chunk so the corpus can be gradually re-indexed without downtime when the model is swapped.
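As an illustration of that gradual re-index, a maintenance pass can repeatedly re-embed chunks whose stored model differs from the current one. This sketch uses the embeddings.ts helpers documented below; `findStaleChunks` and `saveEmbedding` are assumed stand-ins for the real DB access:

```ts
import { embedBatch, getEmbedModel } from "./embeddings"; // lib/memory/embeddings.ts

// Assumed helpers, stand-ins for whatever DB layer the app uses.
declare function findStaleChunks(currentModel: string, limit: number): Promise<{ id: string; content: string }[]>;
declare function saveEmbedding(id: string, embedding: number[], model: string): Promise<void>;

// Re-embed one batch of chunks whose embedding_model differs from the current model.
async function reindexBatch(limit = 100): Promise<number> {
  const model = getEmbedModel();
  const stale = await findStaleChunks(model, limit);
  if (stale.length === 0) return 0;

  const vectors = await embedBatch(stale.map((c) => c.content), 4);
  await Promise.all(stale.map((c, i) => saveEmbedding(c.id, vectors[i], model)));
  return stale.length; // call again until this returns 0
}
```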
## Data Model
### households
One row per installation (v1 is always single-household).
| Column | Type | Notes |
|---|---|---|
id | uuid | PK |
display_name | varchar | Shown in prompt and UI |
mode | varchar | "single" \| "multi"
timezone | varchar | e.g. "America/New_York" |
locale | varchar | e.g. "en-US" |
created_at | timestamptz |
### persons
One row per household member. Aliases let "mom", "wife", "Sarah" all resolve to the same row.
| Column | Type | Notes |
|---|---|---|
id | uuid | PK |
household_id | uuid FK | Cascade delete |
display_name | varchar | Canonical name |
pronouns | varchar | Optional |
role | varchar | "primary" \| "spouse" \| "child" \| "guest" …
aliases | jsonb string[] | e.g. ["mom", "Sarah"] |
birth_date | varchar | YYYY-MM-DD |
notes | text |
### memory_chunks
The unbounded long-tail store.
| Column | Type | Notes |
|---|---|---|
id | uuid | PK |
household_id | uuid FK | Cascade delete |
person_id | uuid FK | Null = household-scoped |
source | varchar | "chat_extract" \| "note" \| "imported" \| "voice_transcript" \| "camera_observation"
source_ref | jsonb | { conversationId?, memoryEntryId?, … } |
category | varchar | Optional bucket (same categories as facts) |
title | varchar | Short headline for UI |
content | text | 1–3 sentences typical |
observed_at | timestamptz | When the fact was observed |
expires_at | timestamptz | Null = evergreen |
confidence | integer | 0–100 |
pinned | boolean | Pinned chunks get +0.20 retrieval boost, skip recency decay |
promoted | boolean | Set to true when the promotion job has created a memory_entries fact |
embedding | vector(768) | Populated by addChunk() automatically |
embedding_model | varchar | Model name at embed time |
content_tsv | tsvector | Generated column — `to_tsvector('english', …)` over `title` and `content`
Indexes:
| Index | Type | Used for |
|---|---|---|
memory_chunks_embedding_idx | HNSW (cosine) | Fast approximate nearest-neighbour |
memory_chunks_tsv_idx | GIN | Full-text keyword search |
memory_chunks_household_idx | btree | All per-household queries |
memory_chunks_person_idx | btree | Per-person filtering |
memory_chunks_observed_idx | btree | Recency ordering |
### Additive columns on existing tables
| Table | Column | Purpose |
|---|---|---|
memory_entries | household_id | Scope facts to a household |
memory_entries | person_id | Attribute facts to a person |
devices | primary_person_id | Speaker inference — which person typically uses this device |
## Module API
All new code lives under `apps/web/lib/memory/`.
### embeddings.ts

```ts
embed(text: string): Promise<number[]>
embedBatch(texts: string[], concurrency?: number): Promise<number[][]>
toVectorLiteral(v: number[]): string // "[0.1,0.2,...]" for raw SQL
getEmbedModel(): string
getEmbedDim(): number // 768
```
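A small usage sketch (the SQL fragment in the comment only illustrates the pgvector cosine operator these literals feed; it is not the exact query in retrieve.ts):

```ts
import { embed, toVectorLiteral } from "./embeddings";

const vector = await embed("what home projects is the user planning?"); // 768 numbers
const literal = toVectorLiteral(vector);                                 // "[0.12,-0.03,...]"
// Spliced into a raw query, e.g.:
//   ORDER BY embedding <=> $1::vector LIMIT 40   -- with $1 = literal
```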
### chunks.ts

```ts
addChunk(input: AddChunkInput): Promise<MemoryChunk>
// Always calls embed() internally — callers cannot forget to embed.
getChunk(id: string, householdId?: string): Promise<MemoryChunk | null>
listChunks(opts: { householdId, personId?, limit?, cursor? }): Promise<MemoryChunk[]>
updateChunk(id: string, patch): Promise<MemoryChunk>
// Re-embeds automatically when content or title changes.
deleteChunk(id: string): Promise<void>
pinChunk(id: string, pinned: boolean): Promise<void>
```
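For example, saving and pinning a note might look like this (a sketch; treat the `AddChunkInput` field names as indicative, the authoritative shape lives in chunks.ts):

```ts
import { addChunk, pinChunk } from "./chunks";
import { getActiveHouseholdId } from "./persons";

const householdId = await getActiveHouseholdId();

// addChunk() embeds the content via Ollama before inserting; callers never handle vectors.
const chunk = await addChunk({
  householdId,
  source: "note",
  category: "plans",
  title: "Kitchen renovation",
  content: "Planning a kitchen renovation; budget around $40k, timeline unclear.",
});

// Pinned chunks get the +0.20 retrieval boost and skip recency decay.
await pinChunk(chunk.id, true);
```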
### retrieve.ts

```ts
retrieveMemory(opts: RetrieveOptions): Promise<RankedChunk[]>
formatChunksForPrompt(chunks, personLabels): string
shouldRetrieve(message, history): boolean
// Returns false for pure greetings on the first turn.
```

RetrieveOptions:

```ts
{
  householdId: string;
  query: string;
  personHints?: string[]; // person UUIDs to boost in re-ranking
  k?: number;             // default 8
  recency?: "boost" | "neutral";
}
```
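Roughly how a chat turn can use these together (a sketch, not the actual route code; the `history` parameter type is assumed):

```ts
import { retrieveMemory, formatChunksForPrompt, shouldRetrieve } from "./retrieve";
import { getActiveHouseholdId, loadPersonLabels } from "./persons";

async function buildRelevantMemory(message: string, history: unknown[], personHints: string[]) {
  if (!shouldRetrieve(message, history)) return ""; // e.g. a pure greeting on the first turn

  const householdId = await getActiveHouseholdId();
  const chunks = await retrieveMemory({
    householdId,
    query: message,
    personHints, // UUIDs from inferSpeaker(), boost those persons' chunks in re-ranking
    k: 8,
  });
  const labels = await loadPersonLabels(householdId);
  return formatChunksForPrompt(chunks, labels); // "## Relevant Memory" block for the volatile half
}
```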
### speaker.ts

```ts
inferSpeaker(input: InferSpeakerInput): Promise<SpeakerInference>
// Resolution order:
// 1. Voice session speaker ID (v2 TODO)
// 2. devices.primary_person_id (confidence: high)
// 3. Single-mode fast path: return the only person (confidence: high)
// 4. First-person parser: "my wife is cooking" → speaker is not the wife
// 5. Fallback: [] with confidence: low
resolvePersonRef(householdId, ref): Promise<{ id, displayName } | null>
// Matches displayName, aliases[], or role — returns null if ambiguous.
```
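For instance, resolving a relationship word to a person row (usage sketch):

```ts
import { resolvePersonRef } from "./speaker";
import { getActiveHouseholdId } from "./persons";

const householdId = await getActiveHouseholdId();

// "mom" is matched against displayName, aliases[], and role; null means no match or ambiguity.
const person = await resolvePersonRef(householdId, "mom");
if (person) console.log(`"mom" resolves to ${person.displayName} (${person.id})`);
```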
### persons.ts

```ts
getActiveHouseholdId(): Promise<string> // cached per process
clearHouseholdCache(): void
loadPersonLabels(householdId): Promise<Map<string, string>>
getPersons(householdId): Promise<Person[]>
createHousehold(input): Promise<Household>
createPerson(input): Promise<Person>
updateHousehold(id, patch): Promise<Household>
upsertPersons(householdId, members): Promise<Person[]>
```
### promote.ts

```ts
runPromotion(): Promise<{ promoted: number }>
// Clusters chunks with cosine distance < 0.15 that appear ≥ 3 times,
// promotes each cluster to a memory_entries fact, marks chunks promoted=true.
```
## Memory Sources
| source value | Created by | Notes |
|---|---|---|
imported | backfill-rag.ts | One-time import of existing memory_entries at migration time |
chat_extract | extractMemories() in memory-extraction.ts | Both structured facts and free-form notes from each conversation |
note | add_note AI tool or POST /api/memory/chunks | Model-authored or user-authored free-form notes |
voice_transcript | (reserved) | Voice session transcripts — future |
camera_observation | (reserved) | Ollama vision responses — future |
## Automatic Extraction
The extraction LLM pass runs after chat responses (every 5 messages, or when the user says "remember that") and now returns:
```json
{
  "facts": [
    { "category": "preferences", "key": "coffee_order", "value": "Oat milk latte, no sugar" }
  ],
  "notes": [
    "User mentioned they are considering switching jobs and wants to evaluate options in Q2."
  ]
}
```
- Each `facts[]` entry is upserted to `memory_entries` (existing behavior) and written as a `chat_extract` chunk.
- Each `notes[]` entry is written as a `chat_extract` chunk only — it doesn't fit the structured key/value model.
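A sketch of how such a result could be persisted (illustrative only; `upsertMemoryEntry` stands in for the existing structured-fact path and the `AddChunkInput` field names are assumed):

```ts
import { addChunk } from "./chunks";

type ExtractedFact = { category: string; key: string; value: string };
declare function upsertMemoryEntry(fact: ExtractedFact): Promise<void>; // existing memory_entries path

async function persistExtraction(
  result: { facts: ExtractedFact[]; notes: string[] },
  ctx: { householdId: string; conversationId: string; personId?: string }
) {
  for (const fact of result.facts) {
    await upsertMemoryEntry(fact); // existing structured-fact behavior
    await addChunk({
      householdId: ctx.householdId,
      personId: ctx.personId,
      source: "chat_extract",
      sourceRef: { conversationId: ctx.conversationId },
      category: fact.category,
      title: fact.key,
      content: fact.value,
    });
  }
  for (const note of result.notes) {
    // Notes don't fit the key/value model; they become chunks only.
    await addChunk({
      householdId: ctx.householdId,
      personId: ctx.personId,
      source: "chat_extract",
      sourceRef: { conversationId: ctx.conversationId },
      content: note,
    });
  }
}
```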
## Multi-Person Households
### Modes
| Mode | Behavior |
|---|---|
single | Speaker inference always returns the one household member. person parameters are hidden from AI tool schemas. find_person tool is hidden. |
multi | Speaker inference uses device hints and first-person parsing. AI tools expose the person parameter for attribution. |
### Speaker Inference
Speaker inference runs on every turn (when RAG is enabled) and produces `personHints[]`, a list of person UUIDs used to boost retrieval of that person's chunks.
It does not permanently attribute messages to a speaker; it only informs which chunks are most relevant. Attribution errors can be corrected after the fact via the memory UI.
### First-Run Setup
Navigate to /setup to configure the household. The Household step:
- Picks single vs. multi mode
- Sets the household display name and timezone
- For multi: adds household members with name, pronouns, role, and aliases
This can be edited any time at /setup or via PUT /api/household.
Mode-switch rules:
- single → multi: the existing person becomes `role='primary'`; add more members.
- multi → single: only allowed when exactly one person remains. Remove others first.
## Feature Flag
RAG retrieval is gated behind an environment variable during rollout:
| Env var | Value | Effect |
|---|---|---|
GENIE_RAG_ENABLED | 1 | Enable retrieval injection into the prompt |
(unset / 0) | — | Legacy behavior: only MEMORY.md in prompt |
GENIE_RAG_DISABLED | 1 | Phase-5 kill switch — falls back to legacy even if GENIE_RAG_ENABLED=1 |
With GENIE_RAG_ENABLED=0 (default), chat behavior is byte-identical to pre-RAG behavior. The new tables exist but are not queried during chat.
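The effective gate therefore amounts to (a sketch; the real check lives in the chat code path):

```ts
// RAG retrieval runs only when explicitly enabled and the kill switch is not set.
const ragEnabled =
  process.env.GENIE_RAG_ENABLED === "1" &&
  process.env.GENIE_RAG_DISABLED !== "1";
```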
## Backfill & Migration
Running `npm run db:migrate` (or `npm run db:generate && npm run db:migrate`) applies the `0003_little_marrow.sql` migration and then automatically runs the backfill:
- Seeds the `households` table from `opengenie.json` if empty
- Seeds `persons` from `opengenie.json`'s `household.members` array (or creates a single default person "You")
- Imports all existing `memory_entries` rows as pinned `memory_chunks` with `source='imported'`
- Embeds each chunk via Ollama (`nomic-embed-text`)
The backfill is idempotent — safe to run multiple times. If Ollama is unreachable, the backfill logs a warning and the migrate command still exits 0. Re-run manually:
```bash
npm run backfill:rag
```
Note: `opengenie.json`'s `household` block is now a seed-only value. Once a row exists in the `households` table, the database is the source of truth. Edit household settings via `/setup` or `/api/household`.
## Promotion Job
`lib/memory/promote.ts` runs daily (register it with the scheduler) and clusters similar chunks:
- Self-joins `memory_chunks` on `embedding <=> embedding < 0.15` (cosine distance)
- Finds clusters with ≥ 3 occurrences
- Upserts a canonical fact into `memory_entries`
- Sets `promoted=true` on all chunk members of the cluster
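The cluster-detection step can be pictured as a self-join like the one below (kept here as an illustrative SQL string; the actual query in promote.ts may be shaped differently):

```ts
// Illustrative only: count, for each unpromoted chunk, how many near-duplicates sit
// within cosine distance 0.15; chunks with a total of >= 3 seed a promotion cluster.
const clusterSeedsSql = `
  SELECT a.id, count(*) + 1 AS cluster_size
  FROM memory_chunks a
  JOIN memory_chunks b
    ON  a.household_id = b.household_id
   AND a.id <> b.id
   AND (a.embedding <=> b.embedding) < 0.15
  WHERE a.promoted = false AND b.promoted = false
  GROUP BY a.id
  HAVING count(*) + 1 >= 3
`;
```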
When GENIE_RAG_ENABLED=1, syncMemoryFile() only writes promoted facts to MEMORY.md, keeping the file small (≤ 20 entries) and the stable prompt cache tight.
## AI Actions
| Action | Description |
|---|---|
read_memory | Query by category/key (exact) or semantic=true (vector search) |
update_memory | Create/update a structured fact in memory_entries |
add_note | Save a free-form observation as a chunk |
find_person | Resolve a household member by name, alias, or relationship |
When to use `update_memory` vs `add_note`:
- Use `update_memory` for atomic facts: `coffee_order: oat milk latte`.
- Use `add_note` for narrative context: "User mentioned they are considering a kitchen renovation, scope TBD, budget ~$40k."
- Use `read_memory({ semantic: true, key: "kitchen" })` to recall notes by topic.
```jsonc
// Store a free-form note
{
  "name": "add_note",
  "arguments": {
    "content": "User is planning a kitchen renovation. Budget approximately $40k, timeline unclear.",
    "category": "plans"
  }
}

// Recall semantically
{
  "name": "read_memory",
  "arguments": {
    "key": "what home projects is the user planning?",
    "semantic": true
  }
}
```
## REST Endpoints
### Facts (structured memory)
| Method | Path | Description |
|---|---|---|
GET | /api/memory | List entries (?category= or ?q= search) |
POST | /api/memory | Create/upsert entry { category, key, value } |
DELETE | /api/memory/[id] | Delete an entry |
GET | /api/memory/raw | Get SOUL.md and MEMORY.md contents |
PUT | /api/memory/soul | Update SOUL.md |
### Chunks (RAG long-tail)
| Method | Path | Description |
|---|---|---|
GET | /api/memory/chunks | List chunks (?q= for semantic search, ?personId= to filter) |
POST | /api/memory/chunks | Create a chunk (embeds automatically) |
GET | /api/memory/chunks/[id] | Get a single chunk |
PATCH | /api/memory/chunks/[id] | Update chunk (re-embeds if content/title changes) |
DELETE | /api/memory/chunks/[id] | Delete a chunk |
### Household
| Method | Path | Description |
|---|---|---|
GET | /api/household | Get current household + persons list |
PUT | /api/household | Update household settings + replace persons (transactional) |
## Prompt Cache Impact
The system prompt is split at `<!-- OPENGENIE_CACHE_BOUNDARY -->`:
- Stable half (cached across turns): soul + facts + tool definitions + household member list
- Volatile half (rebuilt per turn): inbound context + current datetime + `## Relevant Memory` (retrieved chunks) + runtime info
Retrieved chunks land in the volatile half, so the expensive stable-half cache stays valid across all turns in a conversation. Only the volatile section — which changes anyway — includes the retrieval results.
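In other words, prompt assembly looks roughly like this (a sketch; the variable names are illustrative):

```ts
declare const stableHalf: string;   // soul + facts + tool definitions + household members
declare const volatileHalf: string; // inbound context + datetime + "## Relevant Memory" + runtime info

// The stable half is identical across turns, so it stays prompt-cache-friendly.
const systemPrompt = [stableHalf, "<!-- OPENGENIE_CACHE_BOUNDARY -->", volatileHalf].join("\n");
```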