State of the Art Performance
Tensorheart Memory achieves state-of-the-art results on the HaluMem benchmark, the most comprehensive evaluation of AI memory systems available.
Benchmark Results
On the HaluMem-Long benchmark—testing memory extraction, updating, and question answering across 2,400+ dialogues with 107,000+ conversation turns—Tensorheart Memory outperforms competing systems:
Question Answering
| System | QA Correctness |
|---|---|
| Tensorheart Memory | 58.1% |
| Supermemory | 53.77% |
Tensorheart answers questions correctly 58.1% of the time versus Supermemory's 53.77%, a gap of 4.3 percentage points, or roughly an 8% relative improvement.
Memory Updates
The ability to correctly update memories when new information arrives is critical for long-running AI agents:
| System | Update Accuracy | Hallucination Rate |
|---|---|---|
| Tensorheart Memory | 57.2% | 0% |
| Supermemory | 17.01% | 0.58% |
Tensorheart is 3.4x more accurate at memory updates (57.2% vs. 17.01%) and records a 0% hallucination rate: across every evaluated update scenario, the system did not fabricate information.
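To make the update operation concrete, here is a minimal sketch of the behavior HaluMem scores: when new information about a subject arrives, a correct update replaces the stale fact without inventing details. The `Memory` shape and `apply_update` helper are illustrative assumptions, not Tensorheart's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    subject: str  # e.g. "user.home_city"
    fact: str     # e.g. "Berlin"

def apply_update(store: list[Memory], new: Memory) -> list[Memory]:
    """Replace any existing memory about the same subject with the new fact.

    HaluMem penalizes two failure modes this function avoids:
    keeping the stale fact (a missed update) and writing details
    that were never stated (an update hallucination).
    """
    return [m for m in store if m.subject != new.subject] + [new]

# Example: the user mentions moving, so the old city must be superseded.
store = [Memory("user.home_city", "Berlin")]
store = apply_update(store, Memory("user.home_city", "Lisbon"))
assert [m.fact for m in store] == ["Lisbon"]
```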
Performance by Question Type
Tensorheart's strongest results come on the most challenging memory scenarios:
| Question Type | Accuracy |
|---|---|
| Memory Boundary (knowing what you don't know) | 87.8% |
| Memory Conflict (handling contradictions) | 69.8% |
| Generalization & Application | 45.2% |
| Basic Fact Recall | 41.1% |
| Multi-hop Inference | 34.9% |
| Dynamic Update | 20.8% |
The 87.8% accuracy on Memory Boundary questions indicates that Tensorheart reliably recognizes the limits of its own knowledge, a property that is critical for building trustworthy AI applications.
About the HaluMem Benchmark
HaluMem (November 2025) is the first benchmark to evaluate AI memory systems at the operation level—testing extraction, updating, and retrieval as separate components rather than just end-to-end performance. This methodology reveals where systems actually succeed and fail.
The HaluMem-Long variant tests memory systems with:
- 1M+ token contexts
- Distractor content to stress-test retrieval
- Multiple question types requiring different reasoning capabilities
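To illustrate the operation-level methodology described above, here is a hedged sketch of how update metrics of the kind reported earlier can be computed from labeled outcomes. The outcome labels are a simplification for illustration; the paper defines the exact judgment criteria.

```python
def update_metrics(outcomes: list[str]) -> dict[str, float]:
    """Score update operations separately from end-to-end QA.

    Each outcome is one of (labels are illustrative):
      'correct'      -- memory updated to the ground-truth value
      'missed'       -- stale memory left unchanged
      'hallucinated' -- update introduced information not in the dialogue
    """
    n = len(outcomes)
    return {
        "update_accuracy": outcomes.count("correct") / n,
        "hallucination_rate": outcomes.count("hallucinated") / n,
    }

# A toy run: 2 of 4 updates correct, 1 missed, 1 hallucinated.
print(update_metrics(["correct", "missed", "correct", "hallucinated"]))
# {'update_accuracy': 0.5, 'hallucination_rate': 0.25}
```

Scoring each operation in isolation is what lets the benchmark distinguish a system that extracts well but updates poorly from one that fails at retrieval.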
Citation
HaluMem: Hallucination in Memory Systems. arXiv:2511.03506, November 2025. https://arxiv.org/abs/2511.03506
Competitor Results
Supermemory's HaluMem results are documented in the benchmark paper and the company's public disclosures. Key metrics:
| Metric | Supermemory |
|---|---|
| QA Correctness | 53.77% |
| Memory Recall | 53.02% |
| Weighted Recall | 70.73% |
| Update Accuracy | 17.01% |
| Update Hallucination Rate | 0.58% |
Source: Supermemory HaluMem Evaluation (as reported in arXiv:2511.03506)
Why This Matters
Zero Hallucination Updates
When your AI assistant updates its memory, you need confidence that it won't fabricate information. Tensorheart's 0% hallucination rate across HaluMem's update scenarios shows the system incorporating new information without creating false memories.
Reliable Retrieval
With 58.1% QA correctness—the highest reported on HaluMem-Long—Tensorheart retrieves the right memories more often than any evaluated alternative.
Knowing What You Don't Know
The 87.8% accuracy on Memory Boundary questions is particularly significant. This measures the system's ability to correctly identify when information is not available rather than hallucinating an answer. For production AI applications, this self-awareness prevents confidently wrong responses.
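The sketch below shows what boundary-aware behavior looks like in practice: answer only when a retrieved memory clears a relevance bar, otherwise abstain explicitly. The function shape and the 0.75 threshold are illustrative assumptions, not Tensorheart's documented API.

```python
def answer_or_abstain(memories: list[tuple[str, float]],
                      min_relevance: float = 0.75) -> str:
    """Return the best-scoring memory only if it clears the relevance bar.

    `memories` pairs each candidate text with its relevance score.
    Abstaining below the bar is the behavior that Memory Boundary
    questions reward; the 0.75 cutoff is a placeholder value.
    """
    if memories:
        text, score = max(memories, key=lambda m: m[1])
        if score >= min_relevance:
            return text
    # Boundary-aware answer: admit the gap instead of guessing.
    return "I don't have a stored memory that answers this."

print(answer_or_abstain([("User's home city is Lisbon.", 0.91)]))
print(answer_or_abstain([("User likes hiking.", 0.31)]))  # abstains
```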
Evaluation Methodology
Our HaluMem evaluation used:
| Parameter | Value |
|---|---|
| Dialogues Evaluated | 2,417 |
| Total Conversation Turns | 107,032 |
| Ground Truth Memories | 14,948 |
| QA Pairs Tested | 3,467 |
| Update Scenarios | 3,788 |
| Total Runtime | 74 hours |
The full evaluation embedded and indexed more than 23,000 extracted memories, using relevance-threshold scoring to retrieve the most appropriate context for each query.
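As a rough sketch of what relevance-threshold scoring can mean here, the snippet below ranks memories by cosine similarity to the query embedding and keeps only those above a cutoff. The 0.6 threshold and this exact scheme are assumptions for illustration, not Tensorheart's internal retrieval code.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, memory_embs: np.ndarray,
             memories: list[str], threshold: float = 0.6) -> list[str]:
    """Return memories above a cosine-similarity cutoff, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    sims = m @ q                  # cosine similarity per memory
    order = np.argsort(-sims)     # highest similarity first
    return [memories[i] for i in order if sims[i] >= threshold]
```

Only memories clearing the threshold enter the model's context, which bounds how much distractor content reaches the answer step.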
Next Steps
Ready to use state-of-the-art memory in your application?
- Quickstart — Get running in 5 minutes
- API Reference — Full endpoint documentation
- Building Agents — Add memory to your AI agents