State of the Art Performance
Tensorheart Memory achieves state-of-the-art results on the HaluMem benchmark, the most comprehensive evaluation of AI memory systems available.
Benchmark Results
On the HaluMem-Long benchmark—testing memory extraction, updating, and question answering across 2,400+ dialogues with 107,000+ conversation turns—Tensorheart Memory outperforms competing systems:
Question Answering
| System | QA Correctness |
|---|---|
| Tensorheart Memory | 58.1% |
| Supermemory | 53.77% |
| Zep | 50.19% |
| Memobase | 33.60% |
| Mem0-Graph | 32.44% |
| Mem0 | 28.11% |
Tensorheart Memory's 58.1% QA correctness is roughly 8% higher in relative terms (about 4.3 percentage points) than the next best system, Supermemory at 53.77%.
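The relative-improvement figures quoted in this page can be checked with a few lines of arithmetic:

```python
def relative_improvement(ours: float, theirs: float) -> float:
    """Relative gain of `ours` over `theirs`, expressed as a percentage."""
    return (ours / theirs - 1) * 100

# QA correctness: Tensorheart Memory (58.1%) vs. Supermemory (53.77%)
print(round(relative_improvement(58.1, 53.77), 1))  # 8.1

# Update accuracy: Tensorheart Memory (57.2%) vs. Zep (37.35%)
print(round(relative_improvement(57.2, 37.35), 1))  # 53.1
```

This is why an absolute gap of about 4.3 percentage points on QA reads as an 8% improvement: the gains are reported relative to the runner-up's score.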
Memory Updates
The ability to correctly update memories when new information arrives is critical for long-running AI agents:
| System | Update Accuracy | Hallucination Rate |
|---|---|---|
| Tensorheart Memory | 57.2% | 0% |
| Zep | 37.35% | 0.48% |
| Supermemory | 17.01% | 0.58% |
| Memobase | 4.10% | 0.36% |
| Mem0-Graph | 1.47% | 0.04% |
| Mem0 | 1.45% | 0.03% |
Tensorheart Memory's update accuracy is roughly 53% higher in relative terms than the next best system (Zep, at 37.35%), and it recorded a 0% hallucination rate: in this evaluation, the system never fabricated information during a memory update.
Performance by Question Type
Tensorheart Memory excels at challenging memory scenarios:
| Question Type | Accuracy |
|---|---|
| Memory Boundary (knowing what you don't know) | 87.8% |
| Memory Conflict (handling contradictions) | 69.8% |
| Generalization & Application | 45.2% |
| Basic Fact Recall | 41.1% |
| Multi-hop Inference | 34.9% |
| Dynamic Update | 20.8% |
The 87.8% accuracy on Memory Boundary questions means Tensorheart Memory reliably knows the limits of its knowledge—critical for building trustworthy AI applications.
About the HaluMem Benchmark
HaluMem (November 2025) is the first benchmark to evaluate AI memory systems at the operation level—testing extraction, updating, and retrieval as separate components rather than just end-to-end performance. This methodology reveals where systems actually succeed and fail.
The HaluMem-Long variant tests memory systems with:
- 1M+ token contexts
- Distractor content to stress-test retrieval
- Multiple question types requiring different reasoning capabilities
Citation
HaluMem: Evaluating Hallucinations in Memory Systems of Agents. arXiv:2511.03506, November 2025. https://arxiv.org/abs/2511.03506
Competitor Results
All competitor results are from the HaluMem paper (arXiv:2511.03506). The benchmark evaluated Mem0, Mem0-Graph, Memobase, Supermemory, and Zep.
| System | QA Correctness | Update Accuracy | Update Hallucination |
|---|---|---|---|
| Tensorheart Memory | 58.1% | 57.2% | 0% |
| Supermemory | 53.77% | 17.01% | 0.58% |
| Zep | 50.19% | 37.35% | 0.48% |
| Memobase | 33.60% | 4.10% | 0.36% |
| Mem0-Graph | 32.44% | 1.47% | 0.04% |
| Mem0 | 28.11% | 1.45% | 0.03% |
Source: HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Why This Matters
Zero Hallucination Updates
When your AI assistant updates its memory, you need confidence it won't fabricate information. Tensorheart Memory's 0% hallucination rate on memory updates means you can trust the system to accurately incorporate new information without creating false memories.
Reliable Retrieval
With 58.1% QA correctness—the highest reported on HaluMem-Long—Tensorheart Memory retrieves the right memories more often than any evaluated alternative.
Knowing What You Don't Know
The 87.8% accuracy on Memory Boundary questions is particularly significant. This measures the system's ability to correctly identify when information is not available rather than hallucinating an answer. For production AI applications, this self-awareness prevents confidently wrong responses.
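As an illustration of this abstention behavior (not Tensorheart Memory's actual API: the result type, field names, and threshold below are hypothetical), an application can surface a memory-boundary outcome as an explicit "unknown" instead of a fabricated answer:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryResult:
    """Hypothetical shape of a memory-lookup result."""
    answer: Optional[str]  # None when the store holds no grounded answer
    confidence: float      # retrieval confidence in [0, 1]

def answer_or_abstain(result: MemoryResult, threshold: float = 0.5) -> str:
    """Return the retrieved answer only when it is grounded and confident;
    otherwise abstain explicitly rather than guessing."""
    if result.answer is None or result.confidence < threshold:
        return "I don't have that information in memory."
    return result.answer

# A boundary case: nothing relevant was stored, so the agent abstains.
print(answer_or_abstain(MemoryResult(answer=None, confidence=0.0)))
```

The design point is that "no answer" is a first-class return value the application can branch on, which is what high Memory Boundary accuracy makes dependable.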
Next Steps
Ready to use state-of-the-art memory in your application?
- Quickstart — Get running in 5 minutes
- API Reference — Full endpoint documentation
- Building Agents — Add memory to your AI agents