Building AI Memory with Supermemory

How I implemented persistent memory for RyBot using a hybrid architecture of session memory, user-specific RAG, and a shared knowledge base.

Memory is the hardest part of building voice agents. Users expect AI to remember things across conversations, but most implementations start fresh every session. When I built RyBot, I wanted it to feel like talking to someone who actually knows you.

Here's how I built a four-tier memory system using Supermemory: session memory, user profiles, episodic memories, and a shared knowledge base for personality emulation.


The Problem with AI Memory

Most AI assistants have two memory modes:

  1. No memory - Every conversation starts fresh
  2. Context window stuffing - Dump everything into the prompt until you hit token limits

Neither works well for production voice agents. No memory feels impersonal. Context stuffing is expensive, slow, and eventually overflows.

The fix is semantic retrieval (RAG). The AI remembers relevant things based on what the user is talking about, not everything it knows.


The Four-Tier Memory Architecture

RyBot uses four distinct memory layers:

Layer           | Scope                | Persistence            | Use Case
Session Memory  | Current conversation | In-memory (Map)        | Hot data, immediate context
User Profile    | Individual user      | Supermemory (Profiles) | Static facts: name, location, preferences
User Memory     | Individual user      | Supermemory RAG        | Episodic memories, dynamic context
RyBot Knowledge | All users            | Supermemory RAG        | Personality, facts about Ryan

The key distinction between Profile and User Memory: profiles hold static facts that rarely change (name, location, preferences), while user memory holds episodic, evolving context that gets retrieved via semantic search.

Hat tip to Dhravya Shah (Supermemory founder) for suggesting this architectural refinement.


Why Supermemory?

Supermemory is a managed RAG service that handles the hard parts of memory: it chunks content, generates embeddings, and runs fast similarity search, so there's no vector database to manage yourself.

The documentation is solid and the npm package is straightforward to integrate.
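
For reference, setup is a single install (the package name matches the import used below):

npm install supermemory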


Implementation: User-Specific Memory

Each user gets their own memory container, tagged by session ID. When a user mentions something worth remembering, Claude decides to save it:

import { Supermemory } from 'supermemory';

const supermemory = new Supermemory({
  apiKey: process.env.SUPERMEMORY_API_KEY,
});

// Store a memory for a specific user
async function storeLongTermMemory(content, userId = 'default') {
  await supermemory.memories.add({
    content: content,
    containerTags: [`user_${userId}`],
  });
}

// Retrieve relevant memories before responding
async function getLongTermMemories(query, userId = 'default') {
  const results = await supermemory.search.execute({
    q: query,
    containerTags: [`user_${userId}`],
    limit: 5,
    searchMode: 'hybrid', // Combines semantic + keyword matching
  });

  return results.results
    .filter(r => r.score > 0.7)
    .map(r => r.content);
}

Pro tip: Always set searchMode: 'hybrid' on your search queries. It combines semantic search with keyword matching for 10-15% better context retrieval. No migration needed - just add the parameter.

The trick: query Supermemory with the user's message before calling Claude. You get relevant context without stuffing the entire history into the prompt.
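
Wired together, the retrieval step looks roughly like this (a sketch using the official @anthropic-ai/sdk; buildSystemPrompt is a hypothetical helper that formats the retrieved memories, and the model name is illustrative):

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function respond(userMessage, userId) {
  // 1. Retrieve only the memories relevant to this message
  const memories = await getLongTermMemories(userMessage, userId);

  // 2. Fold them into the system prompt (hypothetical formatter)
  const system = buildSystemPrompt(memories);

  // 3. Call Claude with a lean, focused context
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5', // illustrative
    max_tokens: 1024,
    system,
    messages: [{ role: 'user', content: userMessage }],
  });

  return response.content[0].text;
}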

Memory Storage via Tool Use

I give Claude a save_memory tool using Anthropic's tool use API. When Claude detects something worth remembering, it calls the tool:

{
  "name": "save_memory",
  "description": "Save important information about the user",
  "input_schema": {
    "type": "object",
    "properties": {
      "key": { "type": "string" },
      "value": { "type": "string" }
    },
    "required": ["key", "value"]
  }
}

Claude might call this with {"key": "coffee_preference", "value": "Loves pour-over, uses a Chemex"} when the user mentions their coffee setup.
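
On the server side, handling that call means checking the stop reason and routing tool_use blocks to storage (a sketch; saveMemoryTool is the schema above, and a tool_result still has to be sent back so Claude can finish its turn):

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5', // illustrative
  max_tokens: 1024,
  tools: [saveMemoryTool],
  messages,
});

if (response.stop_reason === 'tool_use') {
  for (const block of response.content) {
    if (block.type === 'tool_use' && block.name === 'save_memory') {
      const { key, value } = block.input;
      // Fire and forget - never block the voice response on storage
      storeLongTermMemory(`${key}: ${value}`, userId)
        .catch(err => console.error('Memory save failed:', err.message));
    }
  }
}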


Implementation: Shared Knowledge Base

RyBot needs to emulate me authentically, which means it needs to know facts about Ryan. This is different from user memory - it's shared across all conversations.

I seeded the knowledge base with ~25 facts:

const RYAN_FACTS = [
  "Ryan is from Western NY, specifically the Rochester/Buffalo area.",
  "Ryan now lives in NYC/Jersey City. He loves the TriBeCa neighborhood.",
  "Ryan has two dogs - pitbull mixes he affectionately calls 'the puppies'.",
  "Ryan is a coffee nerd. He uses a Chemex and Kalita Wave pour-over.",
  "Ryan's favorite coffee roaster is Hydrangea Coffee.",
  "Ryan plays D&D - he's DM'd the Curse of Strahd campaign.",
  // ... more facts
];

async function seedKnowledge() {
  for (const fact of RYAN_FACTS) {
    await supermemory.memories.add({
      content: fact,
      containerTags: ['rybot_knowledge'],
    });
  }
}

When a user asks something, I query both the user's memories AND the shared knowledge base in parallel:

const [userMemories, rybotKnowledge] = await Promise.all([
  getLongTermMemories(userMessage, sessionId),
  getRybotKnowledge(userMessage),
]);

// Inject both into Claude's context
const systemPrompt = buildPromptWithContext({
  userMemories,
  rybotKnowledge,
});
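
getRybotKnowledge isn't shown above; a minimal version mirrors getLongTermMemories but targets the shared container tag:

async function getRybotKnowledge(query) {
  const results = await supermemory.search.execute({
    q: query,
    containerTags: ['rybot_knowledge'], // shared across all users
    limit: 5,
    searchMode: 'hybrid',
  });

  return results.results
    .filter(r => r.score > 0.7)
    .map(r => r.content);
}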

The Knowledge Graph Visualization

Supermemory provides a React component called @supermemory/memory-graph that visualizes the relationships between memories. I exposed this on the Knowledge Graph page.

The widget below shows RyBot's actual memory network. Drag to pan around. Open the full Knowledge Graph to click into individual memories and see their contents.
[Interactive widget: RyBot's knowledge graph]

The visualization uses the same data endpoint as the chat - it's fetching real memories from Supermemory and computing similarity relationships server-side.

Server-Side Graph Computation

Computing relationships between ~100 memories on every request would be slow. Instead, I pre-compute the graph in a background job and cache it:

// In-process cache for the computed graph (declared once at module scope)
const knowledgeGraphCache = { data: null, timestamp: 0 };

async function refreshKnowledgeGraphCache() {
  // List all documents from Supermemory
  const documents = await supermemory.documents.list();

  // Compute similarity relationships
  for (const doc of documents) {
    const similar = await supermemory.search.execute({
      q: doc.content,
      limit: 10,
    });

    doc.relations = similar.results
      .filter(r => r.id !== doc.id && r.score > 0.5)
      .map(r => ({ targetId: r.id, weight: r.score }));
  }

  // Cache for 15 minutes
  knowledgeGraphCache.data = documents;
  knowledgeGraphCache.timestamp = Date.now();
}

The cache persists to Supermemory itself, so it survives server restarts and Railway redeploys.


Memory Injection Into Context

Here's how memories flow into the system prompt:

let memoryContext = '';

// Session memories (hot data from current conversation)
if (Object.keys(sessionMemories).length > 0) {
  memoryContext += 'SESSION MEMORIES:\n';
  for (const [key, value] of Object.entries(sessionMemories)) {
    memoryContext += `- ${key}: ${value}\n`;
  }
}

// Long-term memories (from Supermemory)
if (longTermMemories?.length > 0) {
  memoryContext += '\nLONG-TERM MEMORIES:\n';
  longTermMemories.forEach(m => {
    memoryContext += `- ${m}\n`;
  });
}

// RyBot knowledge (shared personality facts)
if (rybotKnowledge?.length > 0) {
  memoryContext += '\nRYAN FACTS (for authentic emulation):\n';
  rybotKnowledge.forEach(k => {
    memoryContext += `- ${k}\n`;
  });
}

This gets prepended to Claude's system prompt. Semantic search means only relevant memories show up, not everything RyBot knows.


Performance Considerations

Voice agents are latency-sensitive. The first rule I settled on: a memory write should never block the response. Storage is fire-and-forget:

// Non-blocking storage (fire and forget)
storeLongTermMemory(`${key}: ${value}`, sessionId)
  .catch(err => console.error('Memory save failed:', err.message));

Zero-Latency Memory Saving with Webhooks

Even fire-and-forget API calls add ~50-200ms of server-side processing. For voice agents where every millisecond matters, I moved to a deferred batch approach using Hume's chat_ended webhook.

The idea: queue memories in-memory during the conversation (instant), then flush everything to Supermemory when the call ends.

// Queue structure per chat session
const pendingMemoriesQueue = new Map();
// Key: chatId, Value: { sessionId, profiles: [], memories: [] }

// During conversation - instant, no API call
function queuePendingMemory(chatId, sessionId, type, key, value) {
  if (!pendingMemoriesQueue.has(chatId)) {
    pendingMemoriesQueue.set(chatId, {
      sessionId,
      profiles: [],
      memories: [],
      createdAt: Date.now()
    });
  }
  const queue = pendingMemoriesQueue.get(chatId);
  if (type === 'profile') {
    queue.profiles.push({ key, value });
  } else {
    queue.memories.push({ key, value });
  }
}

// When call ends - Hume webhook triggers this
app.post('/webhook/hume', async (req, res) => {
  const { event_type, chat_id } = req.body;

  if (event_type === 'chat_ended' && chat_id) {
    await flushPendingMemories(chat_id);
  }

  res.json({ success: true });
});

Safety net: If the webhook fails (network issues, server restart), a background interval flushes stale queues after 2 hours. Better late than never.

Result: zero added latency during conversation. Users still get immediate benefit within the same call (session memory works instantly), and everything persists when they hang up.
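
flushPendingMemories isn't shown above; a minimal version drains the queue into Supermemory with the same write path as before (a sketch - profile entries are treated as plain memories here, and the sweep interval is an assumption):

async function flushPendingMemories(chatId) {
  const queue = pendingMemoriesQueue.get(chatId);
  if (!queue) return;

  // Batch-write everything that was queued during the call
  const entries = [...queue.profiles, ...queue.memories];
  await Promise.all(
    entries.map(({ key, value }) =>
      storeLongTermMemory(`${key}: ${value}`, queue.sessionId)
    )
  );

  pendingMemoriesQueue.delete(chatId);
}

// Safety net: flush any queue older than 2 hours in case a webhook was missed
setInterval(() => {
  const cutoff = Date.now() - 1000 * 60 * 60 * 2;
  for (const [chatId, queue] of pendingMemoriesQueue) {
    if (queue.createdAt < cutoff) {
      flushPendingMemories(chatId)
        .catch(err => console.error('Stale queue flush failed:', err.message));
    }
  }
}, 1000 * 60 * 10); // sweep every 10 minutes (illustrative)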

API Usage Optimization

After deploying memory features, I noticed my Supermemory usage spiking to 6,500+ API calls in a few days. Investigation revealed the culprits: the shared RyBot knowledge was re-fetched on every message, user profiles were re-queried every turn, trivial messages like "hi" triggered full semantic searches, and the knowledge graph refreshed every 30 minutes.

The fix was a multi-layer caching strategy:

// 1. Cache RyBot knowledge (shared across all users, rarely changes)
const rybotKnowledgeCache = { data: null, timestamp: 0 };
const RYBOT_CACHE_TTL = 1000 * 60 * 10; // 10 minutes

// 2. Cache user profiles per session (static facts)
const userProfileCache = new Map(); // sessionId -> { data, timestamp }
const PROFILE_CACHE_TTL = 1000 * 60 * 30; // 30 minutes

// 3. Skip semantic search for short messages (no value)
const MIN_QUERY_LENGTH = 10;

async function getMemoryContext(message, sessionId) {
  // Skip search for short messages like "hi", "ok", "thanks"
  if (message.length < MIN_QUERY_LENGTH) {
    return { memories: [], knowledge: await getCachedRybotKnowledge() };
  }

  // Parallel fetch with caching
  const [memories, knowledge] = await Promise.all([
    getLongTermMemories(message, sessionId),
    getCachedRybotKnowledge(message)
  ]);

  return { memories, knowledge };
}
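
getCachedRybotKnowledge wraps the earlier retrieval in a TTL check (a sketch; when called without a query, as in the short-message path, it just returns whatever is cached):

async function getCachedRybotKnowledge(query) {
  const fresh = Date.now() - rybotKnowledgeCache.timestamp < RYBOT_CACHE_TTL;
  if (fresh && rybotKnowledgeCache.data) {
    return rybotKnowledgeCache.data;
  }
  if (!query) {
    return rybotKnowledgeCache.data || []; // nothing cached yet, nothing to search
  }

  // Cache miss - hit Supermemory and refresh the cache
  const knowledge = await getRybotKnowledge(query);
  rybotKnowledgeCache.data = knowledge;
  rybotKnowledgeCache.timestamp = Date.now();
  return knowledge;
}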

Results: ~98% reduction in API calls. From ~1,584 calls/day down to ~33 calls/day. Knowledge graph refresh moved from 30 minutes to 24 hours (manually triggerable from my dashboard when I add new facts).
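
The manual trigger is just a protected route in front of the refresh job (a sketch; the route path and auth check are illustrative):

app.post('/admin/refresh-knowledge-graph', async (req, res) => {
  // Illustrative guard - gate this however your dashboard authenticates
  if (req.headers['x-admin-token'] !== process.env.ADMIN_TOKEN) {
    return res.status(401).json({ error: 'unauthorized' });
  }

  await refreshKnowledgeGraphCache();
  res.json({ success: true, refreshedAt: knowledgeGraphCache.timestamp });
});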


Advanced: LLM Filtering

Supermemory has an optional LLM Filtering feature in Advanced Settings that processes content before storage. This is useful if your memories tend to be verbose or contain conversational filler.

To enable it, go to console.supermemory.ai → Advanced Settings → Enable LLM Filtering, then add a filter prompt.

Here's the prompt I use:

Extract and store only factual, specific information. Remove:
- Conversational filler ("I think", "maybe", "kind of")
- Redundant context ("As I mentioned before")
- Temporary/time-sensitive info ("today", "right now")

Keep:
- Personal preferences and opinions with specifics
- Facts about people, places, experiences
- Skills, interests, and background info

Format as concise declarative statements.

This cleans up memories before they're stored - removing filler while keeping the useful facts. Combined with deduplication, it keeps your memory store lean and relevant.


What I'd Do Differently

  1. Memory deduplication - Users sometimes repeat things. Need to detect and merge similar memories (see the sketch below). (Feature added Dec 30, 2025 @ 11:45pm EST)
  2. Memory decay - Old memories should fade. Currently everything persists forever. (Feature added Dec 30, 2025 @ 11:45pm EST)
  3. User-controlled deletion - Let users see and delete their memories.* (Feature added Jan 3, 2026 @ 11:45pm EST)

*Users can't view stored memories directly, but can ask RyBot to delete all their data at any time.
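
The dedup check mirrors what the FAQ below describes - search before storing and skip near-matches (a sketch; the 0.9 threshold is an assumption):

async function storeMemoryDeduped(content, userId = 'default') {
  // Look for an existing memory that's nearly identical
  const existing = await supermemory.search.execute({
    q: content,
    containerTags: [`user_${userId}`],
    limit: 1,
  });

  const top = existing.results[0];
  if (top && top.score > 0.9) { // assumed near-duplicate threshold
    return; // already known - skip the write
  }

  await supermemory.memories.add({
    content,
    containerTags: [`user_${userId}`],
  });
}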



Ryan Haigh

Product Builder & AI Developer

Ryan builds AI voice experiences and writes about the intersection of voice technology, memory systems, and product development. RyBot is his AI voice clone powered by Hume EVI and Claude.


Frequently Asked Questions

What is Supermemory?

Supermemory is a managed RAG (Retrieval-Augmented Generation) service that handles memory storage, embedding, and semantic search for AI applications. It automatically chunks content, generates embeddings, and provides fast similarity search without requiring you to manage vector databases.

How does AI memory differ from context windows?

Context windows include everything in the current prompt, which has token limits and costs money per token. AI memory uses semantic search to pull only relevant info, keeping context small and focused. Way more scalable and cheaper in the long run.

Can I use Supermemory with any AI model?

Yes. Supermemory is model-agnostic - it stores and retrieves memories that you can inject into any LLM's context. I use it with Claude, but it works equally well with GPT-4, Llama, or any other model.

How do you prevent duplicate memories?

Currently, I rely on the similarity threshold during retrieval - if a memory is very similar to an existing one, only the most relevant gets returned. A better approach would be to check for duplicates before storing, using semantic similarity to detect near-matches.

What is the knowledge graph showing?

The knowledge graph visualizes relationships between RyBot's memories. Each node is a memory or fact, and edges connect semantically related memories. Clusters form around topics like "coffee preferences" or "work history" because those memories have high similarity scores.