Memory System

Open Genie has a persistent, retrieval-augmented knowledge system that scales from a handful of structured facts to an unbounded store of chat extracts, free-form notes, voice transcripts, and camera observations — pulling in only the slices relevant to the current conversation turn.

Architecture

The system uses three storage tiers, each with a different performance profile and retrieval mechanism.

Three-Tier Model

| Tier | Table | Size budget | When in prompt | Primary writer |
| --- | --- | --- | --- | --- |
| Facts | memory_entries | ≤ ~50 entries | Always (stable half, prompt-cached) | update_memory tool, promoted from chunks |
| Chunks | memory_chunks | Unbounded | Top-8 per turn, retrieved by semantic search (volatile half) | extractMemories, add_note tool, UI |
| People | households + persons | One household, ~1–10 persons | Member list in stable half; person names on chunks in volatile half | Setup UI, /api/household |

Why two tables instead of one

  • memory_entries is the "always-in-prompt" hot set. It must stay tiny so the cached stable prompt stays cheap to re-use across turns.
  • memory_chunks is the long tail — append-only growth, retrieved on demand, embedded as 768-dim vectors.
  • Two tables keep queries simple and let both evolve independently. The promotion job (Phase 5) gradually moves stable chunks into facts.

Preconditions

Before the RAG system is active, two runtime dependencies must be satisfied:

  1. PostgreSQL ≥ 13 with pgvector ≥ 0.5.0. Confirm after running migrations:

    SELECT extversion FROM pg_extension WHERE extname='vector';

    The HNSW index (fast approximate nearest-neighbour) requires ≥ 0.5.0.

  2. nomic-embed-text pulled in Ollama (~270 MB, 768-dim):

    ollama pull nomic-embed-text

    The preflight check (npm run preflight) warns if the model is missing.


Retrieval Flow

On every chat turn where GENIE_RAG_ENABLED=1 is set, this sequence runs before the system prompt is assembled:

1. getActiveHouseholdId() — single DB row, cached per request
2. inferSpeaker(deviceId, msg) — device hint → single-mode → first-person parser
3. embed(userMessage) — Ollama /api/embeddings, ~30–80 ms locally
4. Two parallel DB queries:
- Vector: cosine on memory_chunks.embedding (HNSW index), top-40
- Keyword: tsvector rank on memory_chunks.content_tsv (GIN index), top-40
5. Reciprocal Rank Fusion (k=60) — merge both lists by rank
6. Heuristic re-rank:
- pinned → +0.20
- matched person hint → +0.15
- recency decay: -0.10 × min(1, ageInDays / 90) [skipped for pinned]
7. Take top-8, format as ## Relevant Memory, inject into volatile prompt half
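
Steps 5–6 are compact enough to sketch. Here is a minimal TypeScript illustration of the fusion and re-rank math, assuming a hypothetical Candidate shape rather than the real row type (the actual implementation lives in lib/memory/retrieve.ts):

```ts
type Candidate = { id: string; pinned: boolean; personId?: string; ageInDays: number };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item.
function rrfScore(rank: number, k = 60): number {
  return 1 / (k + rank);
}

function fuseAndRerank(
  vectorHits: Candidate[],  // top-40 by cosine distance
  keywordHits: Candidate[], // top-40 by tsvector rank
  personHints: string[],    // person UUIDs from speaker inference
): Candidate[] {
  const scores = new Map<string, { c: Candidate; s: number }>();
  for (const list of [vectorHits, keywordHits]) {
    list.forEach((c, i) => {
      const entry = scores.get(c.id) ?? { c, s: 0 };
      entry.s += rrfScore(i + 1);
      scores.set(c.id, entry);
    });
  }
  for (const entry of scores.values()) {
    const { c } = entry;
    if (c.pinned) entry.s += 0.20;
    if (c.personId && personHints.includes(c.personId)) entry.s += 0.15;
    if (!c.pinned) entry.s -= 0.10 * Math.min(1, c.ageInDays / 90); // recency decay
  }
  return [...scores.values()]
    .sort((a, b) => b.s - a.s)
    .slice(0, 8)
    .map((e) => e.c);
}
```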

The stable half of the system prompt (soul + facts + tool definitions + household members) is unchanged per-turn and benefits from prompt caching. Only the volatile section — including the retrieved chunks — changes each turn.


Embedding Model

Ollama nomic-embed-text (768-dim) is the default.

| Env var | Default | Purpose |
| --- | --- | --- |
| GENIE_EMBED_MODEL | nomic-embed-text | Override the embedding model. Changing it requires re-running the backfill. |

The embedding_model column is stored per chunk so the corpus can be gradually re-indexed without downtime when the model is swapped.
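
As an illustration of how that column enables a rolling re-index, here is a hedged sketch of a sweep. The reindexBatch function and the db helper are hypothetical; only embed(), getEmbedModel(), toVectorLiteral(), and the embedding_model column come from this doc:

```ts
import { embed, getEmbedModel, toVectorLiteral } from "./embeddings"; // path assumed

type Db = { query: (sql: string, params?: unknown[]) => Promise<{ rows: any[] }> };

// Re-embed only chunks whose embedding_model differs from the active model.
async function reindexBatch(db: Db, batchSize = 50): Promise<number> {
  const model = getEmbedModel();
  const { rows } = await db.query(
    `SELECT id, title, content FROM memory_chunks
     WHERE embedding_model IS DISTINCT FROM $1
     LIMIT $2`,
    [model, batchSize],
  );
  for (const row of rows) {
    const vec = await embed(`${row.title ?? ""} ${row.content}`);
    await db.query(
      `UPDATE memory_chunks SET embedding = $1::vector, embedding_model = $2 WHERE id = $3`,
      [toVectorLiteral(vec), model, row.id],
    );
  }
  return rows.length; // 0 means the corpus is fully re-indexed
}
```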


Data Model

households

One row per installation (v1 is always single-household).

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| display_name | varchar | Shown in prompt and UI |
| mode | varchar | "single" \| "multi" |
| timezone | varchar | e.g. "America/New_York" |
| locale | varchar | e.g. "en-US" |
| created_at | timestamptz | |

persons

One row per household member. Aliases let "mom", "wife", "Sarah" all resolve to the same row.

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| household_id | uuid FK | Cascade delete |
| display_name | varchar | Canonical name |
| pronouns | varchar | Optional |
| role | varchar | "primary" \| "spouse" \| "child" \| "guest" |
| aliases | jsonb string[] | e.g. ["mom", "Sarah"] |
| birth_date | varchar | YYYY-MM-DD |
| notes | text | |

memory_chunks

The unbounded long-tail store.

| Column | Type | Notes |
| --- | --- | --- |
| id | uuid | PK |
| household_id | uuid FK | Cascade delete |
| person_id | uuid FK | Null = household-scoped |
| source | varchar | "chat_extract" \| "note" \| "imported" \| "voice_transcript" \| "camera_observation" |
| source_ref | jsonb | { conversationId?, memoryEntryId?, … } |
| category | varchar | Optional bucket (same categories as facts) |
| title | varchar | Short headline for UI |
| content | text | 1–3 sentences typical |
| observed_at | timestamptz | When the fact was observed |
| expires_at | timestamptz | Null = evergreen |
| confidence | integer | 0–100 |
| pinned | boolean | Pinned chunks get +0.20 retrieval boost, skip recency decay |
| promoted | boolean | Set to true when the promotion job has created a memory_entries fact |
| embedding | vector(768) | Populated by addChunk() automatically |
| embedding_model | varchar | Model name at embed time |
| content_tsv | tsvector | Generated column — to_tsvector('english', title \|\| ' ' \|\| content) |

Indexes:

| Index | Type | Used for |
| --- | --- | --- |
| memory_chunks_embedding_idx | HNSW (cosine) | Fast approximate nearest-neighbour |
| memory_chunks_tsv_idx | GIN | Full-text keyword search |
| memory_chunks_household_idx | btree | All per-household queries |
| memory_chunks_person_idx | btree | Per-person filtering |
| memory_chunks_observed_idx | btree | Recency ordering |
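
To show how the first two indexes are exercised, here is a sketch of the two parallel queries from step 4 of the retrieval flow. The db helper and exact SQL are assumptions; the real queries live in lib/memory/retrieve.ts:

```ts
import { embed, toVectorLiteral } from "./embeddings"; // path assumed

type Db = { query: (sql: string, params?: unknown[]) => Promise<{ rows: any[] }> };

async function hybridSearch(db: Db, householdId: string, userMessage: string) {
  const vecLit = toVectorLiteral(await embed(userMessage));
  return Promise.all([
    // Vector arm: cosine distance, served by the HNSW index
    db.query(
      `SELECT id, 1 - (embedding <=> $1::vector) AS score
         FROM memory_chunks
        WHERE household_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT 40`,
      [vecLit, householdId],
    ),
    // Keyword arm: full-text rank, served by the GIN index
    db.query(
      `SELECT id, ts_rank(content_tsv, plainto_tsquery('english', $1)) AS score
         FROM memory_chunks
        WHERE household_id = $2
          AND content_tsv @@ plainto_tsquery('english', $1)
        ORDER BY score DESC
        LIMIT 40`,
      [userMessage, householdId],
    ),
  ]);
}
```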

Additive columns on existing tables

| Table | Column | Purpose |
| --- | --- | --- |
| memory_entries | household_id | Scope facts to a household |
| memory_entries | person_id | Attribute facts to a person |
| devices | primary_person_id | Speaker inference — which person typically uses this device |

Module API

All new code lives under apps/web/lib/memory/.

embeddings.ts

embed(text: string): Promise<number[]>
embedBatch(texts: string[], concurrency?: number): Promise<number[][]>
toVectorLiteral(v: number[]): string // "[0.1,0.2,...]" for raw SQL
getEmbedModel(): string
getEmbedDim(): number // 768
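
A typical call looks like this (import path assumed):

```ts
import { embed, getEmbedDim, toVectorLiteral } from "@/lib/memory/embeddings";

const vec = await embed("Oat milk latte, no sugar"); // one Ollama round-trip
console.log(vec.length, getEmbedDim());              // 768 768
const literal = toVectorLiteral(vec);                // "[0.12,-0.03,...]" for $1::vector params
```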

chunks.ts

addChunk(input: AddChunkInput): Promise<MemoryChunk>
// Always calls embed() internally — callers cannot forget to embed.

getChunk(id: string, householdId?: string): Promise<MemoryChunk | null>
listChunks(opts: { householdId, personId?, limit?, cursor? }): Promise<MemoryChunk[]>
updateChunk(id: string, patch): Promise<MemoryChunk>
// Re-embeds automatically when content or title changes.

deleteChunk(id: string): Promise<void>
pinChunk(id: string, pinned: boolean): Promise<void>
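
For example, saving a household-scoped note — the exact AddChunkInput fields are assumptions (check chunks.ts for the real shape), and householdId is assumed to be in scope:

```ts
import { addChunk } from "@/lib/memory/chunks"; // path assumed

const chunk = await addChunk({
  householdId,
  personId: null, // null = household-scoped
  source: "note",
  category: "plans",
  title: "Kitchen renovation",
  content: "User is planning a kitchen renovation. Budget approximately $40k.",
});
// chunk.embedding is already populated — addChunk() embeds internally.
```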

retrieve.ts

retrieveMemory(opts: RetrieveOptions): Promise<RankedChunk[]>
formatChunksForPrompt(chunks, personLabels): string
shouldRetrieve(message, history): boolean
// Returns false for pure greetings on the first turn.

RetrieveOptions:

{
  householdId: string;
  query: string;
  personHints?: string[]; // person UUIDs to boost in re-ranking
  k?: number;             // default 8
  recency?: "boost" | "neutral";
}
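
Putting it together, a turn-time call might look like the following. Import paths and the field name on SpeakerInference are assumptions; userMessage, history, householdId, and speaker are assumed to be in scope:

```ts
import { retrieveMemory, formatChunksForPrompt, shouldRetrieve } from "@/lib/memory/retrieve";
import { loadPersonLabels } from "@/lib/memory/persons";

if (shouldRetrieve(userMessage, history)) {
  const chunks = await retrieveMemory({
    householdId,
    query: userMessage,
    personHints: speaker.personIds, // field name assumed
    k: 8,
    recency: "boost",
  });
  const block = formatChunksForPrompt(chunks, await loadPersonLabels(householdId));
  // block becomes the "## Relevant Memory" section of the volatile half
}
```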

speaker.ts

inferSpeaker(input: InferSpeakerInput): Promise<SpeakerInference>
// Resolution order:
// 1. Voice session speaker ID (v2 TODO)
// 2. devices.primary_person_id (confidence: high)
// 3. Single-mode fast path: return the only person (confidence: high)
// 4. First-person parser: "my wife is cooking" → speaker is not the wife
// 5. Fallback: [] with confidence: low

resolvePersonRef(householdId, ref): Promise<{ id, displayName } | null>
// Matches displayName, aliases[], or role — returns null if ambiguous.
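
For instance (a hypothetical call; behavior follows the contract above):

```ts
// "mom", "Sarah", and "spouse" can all land on the same row via aliases/role.
const who = await resolvePersonRef(householdId, "mom");
if (who) {
  console.log(who.displayName); // canonical name, e.g. "Sarah"
} else {
  // no match — or the reference matched more than one member (ambiguous)
}
```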

persons.ts

getActiveHouseholdId(): Promise<string> // cached per process
clearHouseholdCache(): void
loadPersonLabels(householdId): Promise<Map<string, string>>
getPersons(householdId): Promise<Person[]>
createHousehold(input): Promise<Household>
createPerson(input): Promise<Person>
updateHousehold(id, patch): Promise<Household>
upsertPersons(householdId, members): Promise<Person[]>

promote.ts

runPromotion(): Promise<{ promoted: number }>
// Clusters chunks with cosine distance < 0.15 that appear ≥ 3 times,
// promotes each cluster to a memory_entries fact, marks chunks promoted=true.

Memory Sources

| source value | Created by | Notes |
| --- | --- | --- |
| imported | backfill-rag.ts | One-time import of existing memory_entries at migration time |
| chat_extract | extractMemories() in memory-extraction.ts | Both structured facts and free-form notes from each conversation |
| note | add_note AI tool or POST /api/memory/chunks | Model-authored or user-authored free-form notes |
| voice_transcript | (reserved) | Voice session transcripts — future |
| camera_observation | (reserved) | Ollama vision responses — future |

Automatic Extraction

The extraction LLM pass runs after chat responses (triggered every 5 messages, or when the user says "remember that") and now returns:

{
  "facts": [
    { "category": "preferences", "key": "coffee_order", "value": "Oat milk latte, no sugar" }
  ],
  "notes": [
    "User mentioned they are considering switching jobs and wants to evaluate options in Q2."
  ]
}

  • Each facts[] entry is upserted to memory_entries (existing behavior) and written as a chat_extract chunk.
  • Each notes[] entry is written as a chat_extract chunk only — it doesn't fit the structured key/value model.
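
A sketch of that fan-out, with upsertMemoryEntry as a hypothetical stand-in for the existing structured upsert (the real logic lives in memory-extraction.ts; extraction, householdId, and conversationId are assumed to be in scope):

```ts
for (const fact of extraction.facts) {
  await upsertMemoryEntry(fact); // existing structured path → memory_entries
  await addChunk({
    householdId,
    source: "chat_extract",
    category: fact.category,
    title: fact.key,
    content: fact.value,
    sourceRef: { conversationId },
  });
}
for (const note of extraction.notes) {
  // notes are chunk-only: they have no key/value shape to upsert
  await addChunk({ householdId, source: "chat_extract", content: note, sourceRef: { conversationId } });
}
```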

Multi-Person Households

Modes

| Mode | Behavior |
| --- | --- |
| single | Speaker inference always returns the one household member. person parameters are hidden from AI tool schemas. find_person tool is hidden. |
| multi | Speaker inference uses device hints and first-person parsing. AI tools expose the person parameter for attribution. |

Speaker Inference

Speaker inference runs on every turn (when RAG is enabled) and produces personHints[] — a list of person UUIDs used to boost retrieval for that person's chunks.

It does not attribute the speaker within the stored conversation itself; it only informs which chunks are most relevant. Attribution errors can be corrected after the fact via the memory UI.

First-Run Setup

Navigate to /setup to configure the household. The Household step:

  • Picks single vs. multi mode
  • Sets the household display name and timezone
  • For multi: adds household members with name, pronouns, role, and aliases

This can be edited any time at /setup or via PUT /api/household.

Mode-switch rules:

  • single → multi: existing person becomes role='primary'; add more members.
  • multi → single: only allowed when exactly one person remains. Remove others first.

Feature Flag

RAG retrieval is gated behind an environment variable during rollout:

| Env var | Value | Effect |
| --- | --- | --- |
| GENIE_RAG_ENABLED | 1 | Enable retrieval injection into the prompt |
| | (unset / 0) | Legacy behavior: only MEMORY.md in prompt |
| GENIE_RAG_DISABLED | 1 | Phase-5 kill switch — falls back to legacy even if GENIE_RAG_ENABLED=1 |

With GENIE_RAG_ENABLED=0 (default), chat behavior is byte-identical to pre-RAG behavior. The new tables exist but are not queried during chat.


Backfill & Migration

Running npm run db:migrate (or npm run db:generate && npm run db:migrate) applies the 0003_little_marrow.sql migration and then automatically runs the backfill:

  1. Seeds the households table from opengenie.json if empty
  2. Seeds persons from opengenie.json's household.members array (or creates a single default person "You")
  3. Imports all existing memory_entries rows as pinned memory_chunks with source='imported'
  4. Embeds each chunk via Ollama (nomic-embed-text)

The backfill is idempotent — safe to run multiple times. If Ollama is unreachable, the backfill logs a warning and the migrate command still exits 0. Re-run manually:

npm run backfill:rag

Note: opengenie.json's household block is now a seed-only value. Once a row exists in the households table, the database is the source of truth. Edit household settings via /setup or /api/household.


Promotion Job

lib/memory/promote.ts runs daily (register it with the scheduler) and clusters similar chunks:

  1. Self-joins memory_chunks on embedding <=> embedding < 0.15 (cosine distance)
  2. Finds clusters with ≥ 3 occurrences
  3. Upserts a canonical fact into memory_entries
  4. Sets promoted=true on all chunk members of the cluster
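
The core of steps 1–2 can be expressed as a single self-join. A sketch, assuming a generic db.query helper (the real query lives in lib/memory/promote.ts):

```ts
const { rows: clusters } = await db.query(`
  SELECT a.id AS seed_id, array_agg(b.id) AS member_ids
    FROM memory_chunks a
    JOIN memory_chunks b
      ON a.household_id = b.household_id
     AND a.id <> b.id
     AND (a.embedding <=> b.embedding) < 0.15  -- cosine distance threshold
   WHERE NOT a.promoted AND NOT b.promoted
   GROUP BY a.id
  HAVING count(*) >= 2                          -- seed + 2 neighbours = 3 occurrences
`);
```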

When GENIE_RAG_ENABLED=1, syncMemoryFile() only writes promoted facts to MEMORY.md, keeping the file small (≤ 20 entries) and the stable prompt cache tight.


AI Actions

| Action | Description |
| --- | --- |
| read_memory | Query by category/key (exact) or semantic=true (vector search) |
| update_memory | Create/update a structured fact in memory_entries |
| add_note | Save a free-form observation as a chunk |
| find_person | Resolve a household member by name, alias, or relationship |

When to use update_memory, add_note, and read_memory:

  • Use update_memory for atomic facts: coffee_order: oat milk latte.
  • Use add_note for narrative context: "User mentioned they are considering a kitchen renovation, scope TBD, budget ~$40k."
  • Use read_memory({ semantic: true, key: "kitchen" }) to recall notes by topic.

// Store a free-form note
{
  "name": "add_note",
  "arguments": {
    "content": "User is planning a kitchen renovation. Budget approximately $40k, timeline unclear.",
    "category": "plans"
  }
}

// Recall semantically
{
  "name": "read_memory",
  "arguments": {
    "key": "what home projects is the user planning?",
    "semantic": true
  }
}

REST Endpoints

Facts (structured memory)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/memory | List entries (?category= or ?q= search) |
| POST | /api/memory | Create/upsert entry { category, key, value } |
| DELETE | /api/memory/[id] | Delete an entry |
| GET | /api/memory/raw | Get SOUL.md and MEMORY.md contents |
| PUT | /api/memory/soul | Update SOUL.md |

Chunks (RAG long-tail)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/memory/chunks | List chunks (?q= for semantic search, ?personId= to filter) |
| POST | /api/memory/chunks | Create a chunk (embeds automatically) |
| GET | /api/memory/chunks/[id] | Get a single chunk |
| PATCH | /api/memory/chunks/[id] | Update chunk (re-embeds if content/title changes) |
| DELETE | /api/memory/chunks/[id] | Delete a chunk |
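
For example, from client code (request body fields assumed to mirror AddChunkInput):

```ts
// Create a note chunk over REST — the server embeds it before returning.
const res = await fetch("/api/memory/chunks", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    source: "note",
    category: "plans",
    title: "Kitchen renovation",
    content: "Planning a kitchen renovation, budget ~$40k.",
  }),
});
const chunk = await res.json();

// Semantic search over chunks
const hits = await fetch("/api/memory/chunks?q=home%20projects").then((r) => r.json());
```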

Household

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/household | Get current household + persons list |
| PUT | /api/household | Update household settings + replace persons (transactional) |

Prompt Cache Impact

The system prompt is split at <!-- OPENGENIE_CACHE_BOUNDARY -->:

  • Stable half (cached across turns): soul + facts + tool definitions + household member list
  • Volatile half (rebuilt per turn): inbound context + current datetime + ## Relevant Memory (retrieved chunks) + runtime info

Retrieved chunks land in the volatile half, so the expensive stable-half cache stays valid across all turns in a conversation. Only the volatile section — which changes anyway — includes the retrieval results.
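
A minimal sketch of the assembly, assuming a hypothetical buildSystemPrompt helper (only the boundary marker itself is real):

```ts
const BOUNDARY = "<!-- OPENGENIE_CACHE_BOUNDARY -->";

function buildSystemPrompt(stableHalf: string, relevantMemory: string, runtimeInfo: string): string {
  // Everything before the boundary is byte-identical across turns, so the
  // provider's prompt cache keeps hitting; only the tail is recomputed.
  return [stableHalf, BOUNDARY, relevantMemory, runtimeInfo].join("\n\n");
}
```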