State of the Art Performance
Tensorheart Memory achieves state-of-the-art results on the HaluMem benchmark, the most comprehensive evaluation of AI memory systems available.
Benchmark Results
On the HaluMem-Long benchmark—testing memory extraction, updating, and question answering across 2,400+ dialogues with 107,000+ conversation turns—Tensorheart Memory outperforms competing systems:
Question Answering
| System | QA Correctness |
|---|---|
| Tensorheart Memory | 58.1% |
| Supermemory | 53.77% |
| Zep | 50.19% |
| Memobase | 33.60% |
| Mem0-Graph | 32.44% |
| Mem0 | 28.11% |
Tensorheart Memory's 58.1% QA correctness is roughly 8% higher in relative terms (about 4.3 percentage points) than the next best system, Supermemory at 53.77%.
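The relative-improvement figures quoted in this page can be checked with a few lines of arithmetic:

```python
def relative_improvement(ours: float, theirs: float) -> float:
    """Relative gain of `ours` over `theirs`, expressed as a percentage."""
    return (ours / theirs - 1) * 100

# QA correctness: Tensorheart Memory (58.1%) vs. Supermemory (53.77%)
print(round(relative_improvement(58.1, 53.77), 1))  # 8.1

# Update accuracy: Tensorheart Memory (57.2%) vs. Zep (37.35%)
print(round(relative_improvement(57.2, 37.35), 1))  # 53.1
```

This is why an absolute gap of about 4.3 percentage points on QA reads as an 8% improvement: the gains are reported relative to the runner-up's score.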
Memory Updates
The ability to correctly update memories when new information arrives is critical for long-running AI agents:
| System | Update Accuracy | Hallucination Rate |
|---|---|---|
| Tensorheart Memory | 57.2% | 0% |
| Zep | 37.35% | 0.48% |
| Supermemory | 17.01% | 0.58% |
| Memobase | 4.10% | 0.36% |
| Mem0-Graph | 1.47% | 0.04% |
| Mem0 | 1.45% | 0.03% |
Tensorheart Memory's update accuracy is roughly 53% higher in relative terms than the next best system (Zep, at 37.35%), and it recorded a 0% hallucination rate: in this evaluation, the system never fabricated information during a memory update.
Performance by Question Type
Tensorheart Memory excels at challenging memory scenarios:
| Question Type | Accuracy |
|---|---|
| Memory Boundary (knowing what you don't know) | 87.8% |
| Memory Conflict (handling contradictions) | 69.8% |
| Generalization & Application | 45.2% |
| Basic Fact Recall | 41.1% |
| Multi-hop Inference | 34.9% |
| Dynamic Update | 20.8% |
The 87.8% accuracy on Memory Boundary questions means Tensorheart Memory reliably knows the limits of its knowledge—critical for building trustworthy AI applications.
About the HaluMem Benchmark
HaluMem (November 2025) is the first benchmark to evaluate AI memory systems at the operation level—testing extraction, updating, and retrieval as separate components rather than just end-to-end performance. This methodology reveals where systems actually succeed and fail.
The HaluMem-Long variant tests memory systems with:
- 1M+ token contexts
- Distractor content to stress-test retrieval
- Multiple question types requiring different reasoning capabilities
Citation
HaluMem: Evaluating Hallucinations in Memory Systems of Agents. arXiv:2511.03506, November 2025. https://arxiv.org/abs/2511.03506
Competitor Results
All competitor results are from the HaluMem paper (arXiv:2511.03506). The benchmark evaluated Mem0, Mem0-Graph, Memobase, Supermemory, and Zep.
| System | QA Correctness | Update Accuracy | Update Hallucination |
|---|---|---|---|
| Tensorheart Memory | 58.1% | 57.2% | 0% |
| Supermemory | 53.77% | 17.01% | 0.58% |
| Zep | 50.19% | 37.35% | 0.48% |
| Memobase | 33.60% | 4.10% | 0.36% |
| Mem0-Graph | 32.44% | 1.47% | 0.04% |
| Mem0 | 28.11% | 1.45% | 0.03% |
Source: HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Why This Matters
Zero Hallucination Updates
When your AI assistant updates its memory, you need confidence it won't fabricate information. Tensorheart Memory's 0% hallucination rate on memory updates means you can trust the system to accurately incorporate new information without creating false memories.
Reliable Retrieval
With 58.1% QA correctness—the highest reported on HaluMem-Long—Tensorheart Memory retrieves the right memories more often than any evaluated alternative.
Knowing What You Don't Know
The 87.8% accuracy on Memory Boundary questions is particularly significant. This measures the system's ability to correctly identify when information is not available rather than hallucinating an answer. For production AI applications, this self-awareness prevents confidently wrong responses.
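As an illustration of this abstention behavior (not Tensorheart Memory's actual API: the result type, field names, and threshold below are hypothetical), an application can surface a memory-boundary outcome as an explicit "unknown" instead of a fabricated answer:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryResult:
    """Hypothetical shape of a memory-lookup result."""
    answer: Optional[str]  # None when the store holds no grounded answer
    confidence: float      # retrieval confidence in [0, 1]

def answer_or_abstain(result: MemoryResult, threshold: float = 0.5) -> str:
    """Return the retrieved answer only when it is grounded and confident;
    otherwise abstain explicitly rather than guessing."""
    if result.answer is None or result.confidence < threshold:
        return "I don't have that information in memory."
    return result.answer

# A boundary case: nothing relevant was stored, so the agent abstains.
print(answer_or_abstain(MemoryResult(answer=None, confidence=0.0)))
```

The design point is that "no answer" is a first-class return value the application can branch on, which is what high Memory Boundary accuracy makes dependable.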
Next Steps
Ready to use state-of-the-art memory in your application?
- Quickstart — Get running in 5 minutes
- API Reference — Full endpoint documentation
- Building Agents — Add memory to your AI agents