Why Memory?
Every AI has a problem: it forgets. Here's why memory matters and how Tensorheart solves it.
The Problem
LLMs have a context window—a limit on how much text they can process at once. When you're building an AI agent or chatbot, this creates real challenges:
Your Agent's Reality:

```
┌────────────────────────────────────────────┐
│ Context Window: 128K tokens                │
│  ┌──────────────────────────────────────┐  │
│  │ System prompt             2K         │  │
│  │ Conversation history      50K        │  │
│  │ Retrieved documents       70K        │  │
│  │ Available for response     6K        │  │  ← Not much room left
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
```
You could just stuff everything into the context, but:
- It's expensive — More tokens = more cost
- It's slow — Larger contexts take longer to process
- It hurts quality — Irrelevant information confuses the model
The Solution
Tensorheart Memory acts as your AI's intelligent memory system. Instead of dumping everything into context, it:
- Stores information as your application learns it
- Retrieves only what's relevant to each query
- Reduces context size automatically
```
Traditional Approach:
┌──────────────────────────────────┐
│ Send everything                  │
│ 50,000 tokens                    │
│ $0.15 per query                  │
│ Slow, noisy                      │
└──────────────────────────────────┘

With Memory:
┌──────────────────────────────────┐
│ Query: "What's the user's name?" │
│   ↓ Find relevant                │
│ Return: "User's name is Sarah"   │
│ 50 tokens, $0.02                 │
└──────────────────────────────────┘
```
How It Works
You store information, then query for what's relevant:
Query: "What programming language does the user prefer?"
Memories: Returned:
├─ "User prefers Python" ✓ (relevant)
├─ "User works at Netflix" ✗ (not relevant)
├─ "User likes dark mode" ✗ (not relevant)
└─ "User mentioned JavaScript once" ✓ (relevant)
Only the relevant memories are returned for your LLM to use.
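In code, the flow above is just `add()` followed by `query()`. Here's a minimal sketch using the same two calls shown in the Quick Example below; the exact wording of the returned answer is illustrative.

```python
from tensorheart import Memory

memory = Memory(api_key="mem_live_...")

# Store a mix of relevant and irrelevant facts
memory.add("User prefers Python")
memory.add("User works at Netflix")
memory.add("User likes dark mode")
memory.add("User mentioned JavaScript once")

# Only the language-related memories inform the answer;
# the Netflix and dark-mode facts are filtered out.
answer = memory.query("What programming language does the user prefer?")
print(answer)  # e.g. "The user prefers Python; they've also mentioned JavaScript."
```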
Why Tensorheart Memory?
Intelligent Retrieval
Memory returns what's actually relevant to your query—not just what's semantically similar.
| Query | What You Get |
|---|---|
| "user's email" | "User's email is john@acme.com" |
| "project deadline" | "Project due March 15" |
Cost-Effective
By sending only relevant context, you dramatically reduce token usage:
| Approach | Cost per Query |
|---|---|
| Full context (50K tokens) | ~$0.15 |
| Memory-filtered (2K tokens) | ~$0.02 |
| Savings | ~87% |
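The savings row is simple arithmetic on the two per-query costs above (both are rough figures; actual pricing depends on your model):

```python
full_context = 0.15      # ~$ per query with 50K tokens of context
memory_filtered = 0.02   # ~$ per query with 2K tokens of context

savings = (full_context - memory_filtered) / full_context
print(f"{savings:.0%}")  # 87%
```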
Works With Any LLM
Memory is provider-agnostic. Use it with OpenAI, Anthropic, local models, or any API:
```python
# Works with any LLM you choose
answer = memory.query(
    context="What does the user prefer?",
    model="gpt-4o",  # or claude-3, llama, etc.
)
```
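If you prefer to call your provider directly, you can also treat the query result as retrieved context and pass it to any client yourself. A minimal sketch, assuming `query()` returns a plain string as in the Quick Example below; the OpenAI client here is just one example of a downstream provider:

```python
from openai import OpenAI
from tensorheart import Memory

memory = Memory(api_key="mem_live_...")
client = OpenAI()  # could just as easily be Anthropic, a local server, etc.

# Pull only the relevant facts out of memory...
context = memory.query("What does the user prefer?")

# ...and hand them to whichever model you're actually using.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Known about the user: {context}"},
        {"role": "user", "content": "Recommend a library for data analysis."},
    ],
)
print(response.choices[0].message.content)
```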
Real-World Impact
Here's what Memory enables:
| Use Case | Without Memory | With Memory |
|---|---|---|
| Customer Support Bot | Forgets user history | Remembers past issues, preferences |
| Personal Assistant | Asks same questions | Knows your schedule, habits |
| Code Assistant | Searches entire codebase | Finds relevant functions instantly |
| Sales AI | Generic responses | Personalized based on CRM data |
Quick Example
```python
from tensorheart import Memory

# Initialize
memory = Memory(api_key="mem_live_...")

# Store some facts
memory.add("User's name is Sarah")
memory.add("User prefers Python for data analysis")
memory.add("User works at Netflix as a product manager")

# Later, query naturally
result = memory.query("What programming language should I suggest?")
# Returns: "Python — the user prefers it for data analysis"
```
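In a longer-running assistant, you would typically call `query()` before each reply and `add()` new facts as they surface. A rough sketch of that loop, using only the two calls above; `generate_reply` is a hypothetical stand-in for whatever LLM call your app makes:

```python
def chat_turn(memory, user_message, generate_reply):
    relevant = memory.query(user_message)           # recall only what's relevant
    reply = generate_reply(user_message, relevant)  # your LLM call, any provider
    memory.add(f"User said: {user_message}")        # remember this turn for later
    return reply
```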
That's it. Your AI now has memory.
Next Steps
Ready to add memory to your AI?
- Quickstart — Get running in 5 minutes
- Building Agents — Add memory to your agent
- Use Cases — See real-world examples