State of the Art Performance

Tensorheart Memory achieves state-of-the-art results on the HaluMem benchmark, the most comprehensive evaluation of AI memory systems available.

Benchmark Results

On the HaluMem-Long benchmark—testing memory extraction, updating, and question answering across 2,400+ dialogues with 107,000+ conversation turns—Tensorheart Memory outperforms competing systems:

Question Answering

System               QA Correctness
Tensorheart Memory   58.1%
Supermemory          53.77%
Zep                  50.19%
Memobase             33.60%
Mem0-Graph           32.44%
Mem0                 28.11%

Tensorheart Memory achieves an 8% relative improvement (about 4.3 percentage points) over the next best system, Supermemory, on question answering tasks.

Memory Updates

The ability to correctly update memories when new information arrives is critical for long-running AI agents:

System               Update Accuracy   Hallucination Rate
Tensorheart Memory   57.2%             0%
Zep                  37.35%            0.48%
Supermemory          17.01%            0.58%
Memobase             4.10%             0.36%
Mem0-Graph           1.47%             0.04%
Mem0                 1.45%             0.03%

Tensorheart Memory is 53% more accurate in relative terms than the next best system (Zep) at memory updates, a gap of roughly 20 percentage points, and records a 0% hallucination rate: the system never fabricated information during updates in this evaluation.
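
As a quick sanity check, both headline gains above are relative improvements over the runner-up, not percentage-point gaps. A few lines of Python confirm the arithmetic, with figures copied directly from the tables on this page:

```python
# Sanity check of the two relative-improvement claims above.
def relative_gain(ours: float, runner_up: float) -> float:
    """Relative improvement of `ours` over `runner_up`, in percent."""
    return (ours - runner_up) / runner_up * 100

# QA correctness: Tensorheart Memory (58.1%) vs. Supermemory (53.77%)
print(f"QA gain: {relative_gain(58.1, 53.77):.1f}%")      # -> QA gain: 8.1%

# Update accuracy: Tensorheart Memory (57.2%) vs. Zep (37.35%)
print(f"Update gain: {relative_gain(57.2, 37.35):.1f}%")  # -> Update gain: 53.1%
```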

Performance by Question Type

Tensorheart Memory is strongest on the scenarios that matter most for reliability, memory boundaries and conflicting information:

Question Type                                   Accuracy
Memory Boundary (knowing what you don't know)   87.8%
Memory Conflict (handling contradictions)       69.8%
Generalization & Application                    45.2%
Basic Fact Recall                               41.1%
Multi-hop Inference                             34.9%
Dynamic Update                                  20.8%

The 87.8% accuracy on Memory Boundary questions means Tensorheart Memory reliably knows the limits of its knowledge—critical for building trustworthy AI applications.
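
To make the metric concrete, here is a minimal sketch of how a boundary question could be scored: the system is asked about information that was never stored, and is credited only when it explicitly declines to answer. The abstention markers and function signature are illustrative assumptions, not HaluMem's actual grading logic.

```python
from typing import Callable

# Phrases treated as a correct refusal; illustrative, not HaluMem's rubric.
ABSTAIN_MARKERS = ("not in memory", "don't know", "no record")

def boundary_accuracy(unanswerable_questions: list[str],
                      answer_fn: Callable[[str], str]) -> float:
    """Fraction of unanswerable questions where the system abstains
    instead of hallucinating an answer."""
    correct = sum(
        any(marker in answer_fn(q).lower() for marker in ABSTAIN_MARKERS)
        for q in unanswerable_questions
    )
    return correct / len(unanswerable_questions)
```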

About the HaluMem Benchmark

HaluMem (November 2025) is the first benchmark to evaluate AI memory systems at the operation level—testing extraction, updating, and retrieval as separate components rather than just end-to-end performance. This methodology reveals where systems actually succeed and fail.
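
A minimal sketch of what operation-level scoring might look like, assuming exact-match grading and per-stage gold labels (both assumptions; the paper's actual harness is more elaborate):

```python
from dataclasses import dataclass

@dataclass
class OperationScores:
    extraction: float  # gold memories the system actually extracted
    update: float      # stale memories the system correctly revised
    retrieval: float   # questions answered from the right memories

def score_stage(predictions: list[str], gold: list[str]) -> float:
    """Exact-match accuracy for a single operation stage."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def evaluate(preds: dict[str, list[str]],
             golds: dict[str, list[str]]) -> OperationScores:
    # Score each memory operation independently, rather than only the
    # end-to-end QA result, so failures can be localized to a stage.
    return OperationScores(*(score_stage(preds[s], golds[s])
                             for s in ("extraction", "update", "retrieval")))
```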

The HaluMem-Long variant tests memory systems with:

  • 1M+ token contexts
  • Distractor content to stress-test retrieval
  • Multiple question types requiring different reasoning capabilities

Citation

HaluMem: Evaluating Hallucinations in Memory Systems of Agents. arXiv:2511.03506, November 2025. https://arxiv.org/abs/2511.03506

Competitor Results

All competitor results are from the HaluMem paper (arXiv:2511.03506). The benchmark evaluated Mem0, Mem0-Graph, Memobase, Supermemory, and Zep.

System               QA Correctness   Update Accuracy   Update Hallucination
Tensorheart Memory   58.1%            57.2%             0%
Supermemory          53.77%           17.01%            0.58%
Zep                  50.19%           37.35%            0.48%
Memobase             33.60%           4.10%             0.36%
Mem0-Graph           32.44%           1.47%             0.04%
Mem0                 28.11%           1.45%             0.03%

Source: HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Why This Matters

Zero Hallucination Updates

When your AI assistant updates its memory, you need confidence it won't fabricate information. Tensorheart Memory's 0% hallucination rate on memory updates means you can trust the system to accurately incorporate new information without creating false memories.

Reliable Retrieval

With 58.1% QA correctness—the highest reported on HaluMem-Long—Tensorheart Memory retrieves the right memories more often than any evaluated alternative.

Knowing What You Don't Know

The 87.8% accuracy on Memory Boundary questions is particularly significant. This measures the system's ability to correctly identify when information is not available rather than hallucinating an answer. For production AI applications, this self-awareness prevents confidently wrong responses.
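
In application code, that self-awareness reduces to a simple guard: answer only when retrieval clears a relevance bar, and abstain otherwise. A hypothetical sketch, where the `Hit` type and the 0.75 threshold are illustrative assumptions rather than Tensorheart Memory's documented API:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float  # retrieval similarity in [0, 1]; assumed shape, not a real API

def answer_with_boundary(hits: list[Hit], min_score: float = 0.75) -> str:
    """Abstain unless the best-matching memory clears a relevance bar.

    The 0.75 default is a placeholder; tune it against held-out
    boundary questions for your workload.
    """
    if not hits or hits[0].score < min_score:
        return "I don't have that information in memory."
    return f"Based on memory: {hits[0].text}"
```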

Next Steps

Ready to use state-of-the-art memory in your application?