How Noomachy Uses Vector Search to Find Relevant Memories
Saving memories is the easy part. Finding the right ones at the right moment is where the engineering happens.
The Problem
Your agent has 500 stored memories about you. The user asks: "What was that book my colleague recommended?"
A keyword search for "book" might match nothing: the original memory was "Sarah suggested The Lean Startup", which contains no literal "book". Semantic search needs to understand that "suggested" means "recommended" and that The Lean Startup is a book.
Vector embeddings solve this.
What Embeddings Actually Are
An embedding is a fixed-length list of numbers (typically 768 or 1536 dimensions) that captures the meaning of a piece of text. Two pieces of text with similar meaning produce vectors that are close together in this high-dimensional space.
So "Sarah suggested The Lean Startup" and "a book my colleague recommended" end up as nearby vectors, even though they share no actual words.
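A minimal sketch of how that "closeness" is measured: cosine similarity, the standard comparison for embedding vectors. The three-dimensional vectors below are toy stand-ins, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the first two point in similar directions (similar meaning),
# the third points somewhere else entirely (unrelated meaning).
recommendation = [0.9, 0.4, 0.1]
suggestion = [0.8, 0.5, 0.2]
weather = [0.1, 0.2, 0.9]

print(cosine_similarity(recommendation, suggestion))  # close to 1.0
print(cosine_similarity(recommendation, weather))     # much lower
```

The same arithmetic applies at 768 dimensions; the geometry just gets harder to picture.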
The Pipeline
Here's how Noomachy uses embeddings under the hood:
1. On Memory Creation
When a fact gets saved to semantic memory:
- The fact's text is sent to Vertex AI's textembedding-gecko@003 model
- The model returns a 768-dimensional vector
- The vector is stored alongside the fact
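The write path above can be sketched roughly like this; `embed` is a hypothetical stand-in for the real Vertex AI call, and `Memory` is an illustrative data shape, not Noomachy's actual schema:

```python
from dataclasses import dataclass, field

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for the textembedding-gecko@003 call; a real
    # client would return a 768-dimensional vector. This deterministic toy
    # version just lets the sketch run without an API key.
    return [float(ord(c) % 7) for c in text[:8].ljust(8)]

@dataclass
class Memory:
    text: str
    vector: list[float] = field(default_factory=list)

def save_memory(store: list[Memory], fact: str) -> Memory:
    """Embed the fact once at write time; store the vector alongside it."""
    memory = Memory(text=fact, vector=embed(fact))
    store.append(memory)
    return memory

store: list[Memory] = []
save_memory(store, "Sarah suggested The Lean Startup")
```

Embedding at write time means retrieval never has to re-embed stored facts, only the incoming query.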
2. On Memory Retrieval
When a new conversation starts (or the user asks something):
- The user's message is embedded
- Cosine similarity is computed between the query vector and every memory vector for this agent
- The top-K most similar memories are returned (default K = 10)
- They're injected into the system prompt
This happens in milliseconds for thousands of memories.
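The retrieval steps might look something like the following sketch, with toy three-dimensional vectors standing in for real embeddings and `top_k` as a hypothetical helper (a production system would use an indexed vector store rather than a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], memories: list[tuple[str, list[float]]], k: int = 10) -> list[str]:
    """Score every stored memory against the query and keep the k best.

    `memories` is a list of (text, vector) pairs; a real store would also
    carry metadata like tags and timestamps.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

memories = [
    ("Sarah suggested The Lean Startup", [0.9, 0.4, 0.1]),
    ("User prefers dark mode", [0.1, 0.2, 0.9]),
    ("User's colleague is Sarah", [0.7, 0.6, 0.2]),
]
# Toy embedding of "What was that book my colleague recommended?"
query = [0.8, 0.5, 0.2]
relevant = top_k(query, memories, k=2)
```

The two book/colleague memories score far above the dark-mode preference, so only they would be injected into the prompt.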
3. The Hybrid Approach
Pure vector search is great for semantic similarity but can miss exact matches. Noomachy uses a hybrid:
- Vector similarity for semantic relevance
- Tag filtering for explicit categorization
- Recency boost for recently accessed memories
- Confidence weighting for high-confidence facts
The combined score determines what gets injected.
Why Top-K and Not All Memories
You could load all of a user's memories into the prompt every time. It's simple and complete. It's also expensive and confusing — irrelevant memories crowd the context window and dilute the model's attention.
Top-K (typically 5-20) gives you the best of both worlds: enough relevant context, no noise.
The Validation Gate Connection
The validation gate (described in Sovereign Memory) uses the same vector search to detect duplicates. When a new fact arrives:
- Embed the new fact
- Search existing memories for the closest match
- If cosine similarity > 0.92 → duplicate, reject
- If similarity is between 0.7 and 0.92 → similar but not identical, queue for review
- Below 0.7 → genuinely new, auto-approve if confidence is high enough
This is how you prevent your memory store from filling up with slight rephrasings of the same fact.
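A rough sketch of that routing logic using the thresholds above; the `min_confidence` cutoff for auto-approval is an assumed value, since the post doesn't specify one:

```python
import math

DUPLICATE_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.7

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def validate(new_vec: list[float], existing_vecs: list[list[float]],
             confidence: float, min_confidence: float = 0.8) -> str:
    """Route a new fact based on its closest existing memory."""
    best = max((cosine(new_vec, v) for v in existing_vecs), default=0.0)
    if best > DUPLICATE_THRESHOLD:
        return "reject"            # near-identical rephrasing of a stored fact
    if best >= REVIEW_THRESHOLD:
        return "queue_for_review"  # similar but not identical
    if confidence >= min_confidence:
        return "auto_approve"      # genuinely new, high confidence
    return "queue_for_review"      # new but uncertain

existing = [[0.9, 0.4, 0.1]]
status = validate([0.8, 0.5, 0.2], existing, confidence=0.9)
# A vector nearly parallel to a stored one lands above 0.92 and is rejected.
```

Note that an empty store falls through to the confidence check, so the very first fact can still be auto-approved.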
Performance Notes
- Embedding generation: ~50ms per call
- Vector search over 10K memories: < 10ms with proper indexing
- Total memory hydration before each request: ~100-200ms
In production, the embedding step is the bottleneck. Caching embeddings aggressively (which Noomachy does) keeps it fast.
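One simple way to cache embeddings, sketched here with Python's `functools.lru_cache`; `cached_embed` is a hypothetical helper, and the toy embedding stands in for the real ~50ms API call:

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def cached_embed(text: str) -> tuple[float, ...]:
    # Stand-in for the real embedding API call; returns a tuple because
    # lru_cache requires hashable values if results are reused as dict keys.
    return tuple(float(ord(c) % 7) for c in text[:8].ljust(8))

v1 = cached_embed("What was that book?")
v2 = cached_embed("What was that book?")  # served from cache, no second call
```

Since identical text always embeds to the identical vector, caching by exact string is safe; repeated queries and re-validated facts then skip the ~50ms embedding step entirely.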
What This Means For You
You don't need to think about any of this. As a user, you just chat. Your agent quietly embeds, validates, stores, and retrieves — and the result is that it remembers things in a way that feels intelligent because it actually understands meaning, not just keywords.
Ready to try Noomachy?
Build AI agents with sovereign memory in minutes. Free tier, no credit card.