Why Memory?

Every AI has a problem: it forgets. Here's why memory matters and how Tensorheart solves it.

The Problem

LLMs have a context window—a limit on how much text they can process at once. When you're building an AI agent or chatbot, this creates real challenges:

Your Agent's Reality:
┌─────────────────────────────────────────┐
│ Context Window: 128K tokens             │
│ ┌─────────────────────────────────┐     │
│ │ System prompt              2K   │     │
│ │ Conversation history       50K  │     │
│ │ Retrieved documents        70K  │     │
│ │ Available for response     6K   │     │ ← Not much room left
│ └─────────────────────────────────┘     │
└─────────────────────────────────────────┘
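
To see how little room is left, just sum the numbers in the diagram (plain arithmetic, nothing Tensorheart-specific):

CONTEXT_WINDOW = 128_000  # model's context limit, in tokens

used = {
    "system_prompt": 2_000,
    "conversation_history": 50_000,
    "retrieved_documents": 70_000,
}

# 128K minus everything already in the prompt leaves ~6K tokens for the response
available_for_response = CONTEXT_WINDOW - sum(used.values())
print(available_for_response)  # 6000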

You could just stuff everything into the context, but:

  • It's expensive — More tokens = more cost
  • It's slow — Larger contexts take longer to process
  • It hurts quality — Irrelevant information confuses the model

The Solution

Tensorheart Memory acts as your AI's intelligent memory system. Instead of dumping everything into context, it:

  1. Stores information as your agent learns it
  2. Retrieves only what's relevant to each query
  3. Reduces context size automatically

Traditional Approach:             With Memory:
┌────────────────────┐           ┌────────────────────┐
│ Send everything    │           │ Query: "What's the │
│ 50,000 tokens      │           │ user's name?"      │
│ $0.15 per query    │           │         ↓          │
│ Slow, noisy        │           │ Find relevant      │
└────────────────────┘           │ memories           │
                                  │         ↓          │
                                  │ Return: "User's    │
                                  │ name is Sarah"     │
                                  │ 50 tokens, $0.02   │
                                  └────────────────────┘

How It Works

You store information, then query for what's relevant:

Query: "What programming language does the user prefer?"

Memories:                              Returned:
├─ "User prefers Python"               ✓ (relevant)
├─ "User works at Netflix"             ✗ (not relevant)
├─ "User likes dark mode"              ✗ (not relevant)
└─ "User mentioned JavaScript once"    ✓ (relevant)

Only the relevant memories are returned for your LLM to use.
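
In code, that flow is just the add() and query() calls from the Quick Example further down this page. The snippet below stores the four memories shown above and asks the same question; treat the exact wording of the result as illustrative rather than a guaranteed response format.

from tensorheart import Memory

memory = Memory(api_key="mem_live_...")

# Store the memories from the example above
memory.add("User prefers Python")
memory.add("User works at Netflix")
memory.add("User likes dark mode")
memory.add("User mentioned JavaScript once")

# Only the language-related memories inform the answer;
# the job and UI-preference facts stay out of the context.
result = memory.query("What programming language does the user prefer?")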

Why Tensorheart Memory?

Intelligent Retrieval

Memory returns what's actually relevant to your query—not just what's semantically similar.

Query                 What You Get
"user's email"        "User's email is john@acme.com"
"project deadline"    "Project due March 15"

Cost-Effective

By sending only relevant context, you dramatically reduce token usage:

Approach                       Cost per Query
Full context (50K tokens)      ~$0.15
Memory-filtered (2K tokens)    ~$0.02
Savings                        ~87%
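
The savings row is simple arithmetic on the two costs above; the dollar figures themselves depend on your model's pricing, so treat them as ballpark numbers:

# Ballpark per-query costs from the table above
full_context_cost = 0.15      # ~50K tokens sent on every query
memory_filtered_cost = 0.02   # ~2K tokens of relevant context

savings = 1 - (memory_filtered_cost / full_context_cost)
print(f"{savings:.0%}")  # ~87%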

Works With Any LLM

Memory is provider-agnostic. Use it with OpenAI, Anthropic, local models, or any API:

# Works with any LLM you choose
answer = memory.query(
    context="What does the user prefer?",
    model="gpt-4o",  # or claude-3, llama, etc.
)
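
One common pattern is to let Memory produce the relevant context and then hand it to whatever chat client you already use. The sketch below uses the OpenAI Python SDK purely as an example; the same idea works with Anthropic, a local model, or any other API, and how you splice the memory result into your prompt is up to you.

from openai import OpenAI
from tensorheart import Memory

memory = Memory(api_key="mem_live_...")
client = OpenAI()  # swap in any provider's client here

# 1. Ask Memory for only the relevant facts
relevant = memory.query("What does the user prefer?")

# 2. Pass those facts to your LLM of choice
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Known facts about the user: {relevant}"},
        {"role": "user", "content": "Suggest a library for data analysis."},
    ],
)
print(response.choices[0].message.content)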

Real-World Impact

Here's what Memory enables:

Use Case                Without Memory              With Memory
Customer Support Bot    Forgets user history        Remembers past issues, preferences
Personal Assistant      Asks same questions         Knows your schedule, habits
Code Assistant          Searches entire codebase    Finds relevant functions instantly
Sales AI                Generic responses           Personalized based on CRM data
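
As a sketch of the first row, a support bot can write facts from each conversation back into memory and recall them in a later session. Tagging stored facts with a user identifier in the text itself is just one illustrative convention here, not a built-in Tensorheart feature:

from tensorheart import Memory

memory = Memory(api_key="mem_live_...")
user_id = "cust_1234"  # hypothetical customer identifier

# Session 1: record what happened
memory.add(f"[{user_id}] reported login failures after a password reset")
memory.add(f"[{user_id}] prefers email follow-ups over phone calls")

# Session 2, days later: recall relevant history before replying
history = memory.query(f"What issues has {user_id} reported, and how do they prefer to be contacted?")
# Feed `history` into the bot's prompt so it picks up where it left off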

Quick Example

from tensorheart import Memory

# Initialize
memory = Memory(api_key="mem_live_...")

# Store some facts
memory.add("User's name is Sarah")
memory.add("User prefers Python for data analysis")
memory.add("User works at Netflix as a product manager")

# Later, query naturally
result = memory.query("What programming language should I suggest?")
# Returns: "Python — the user prefers it for data analysis"

That's it. Your AI now has memory.

Next Steps

Ready to add memory to your AI?