State-of-the-Art Performance

Tensorheart Memory achieves state-of-the-art results on the HaluMem benchmark, the most comprehensive evaluation of AI memory systems available.

Benchmark Results

On the HaluMem-Long benchmark—testing memory extraction, updating, and question answering across 2,400+ dialogues with 107,000+ conversation turns—Tensorheart Memory outperforms competing systems:

Question Answering

System               QA Correctness
Tensorheart Memory   58.1%
Supermemory          53.77%

Tensorheart's QA accuracy is roughly 8% higher in relative terms (4.3 percentage points) than Supermemory's.

Memory Updates

The ability to correctly update memories when new information arrives is critical for long-running AI agents:

System               Update Accuracy   Hallucination Rate
Tensorheart Memory   57.2%             0%
Supermemory          17.01%            0.58%

Tensorheart is roughly 3.4x as accurate at memory updates and achieves a 0% hallucination rate: in this evaluation, the system never fabricated information during updates.
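The comparisons above follow directly from the two tables on this page; a quick arithmetic check:

```python
# Scores from the QA and Memory Updates tables (percentages)
qa_tensorheart, qa_supermemory = 58.1, 53.77
update_tensorheart, update_supermemory = 57.2, 17.01

# Relative QA improvement: (58.1 - 53.77) / 53.77
qa_gain = (qa_tensorheart - qa_supermemory) / qa_supermemory
print(f"QA relative improvement: {qa_gain:.1%}")      # → 8.1%

# Update accuracy ratio: 57.2 / 17.01
update_ratio = update_tensorheart / update_supermemory
print(f"Update accuracy ratio: {update_ratio:.1f}x")  # → 3.4x
```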

Performance by Question Type

Tensorheart excels at challenging memory scenarios:

Question Type                                   Accuracy
Memory Boundary (knowing what you don't know)   87.8%
Memory Conflict (handling contradictions)       69.8%
Generalization & Application                    45.2%
Basic Fact Recall                               41.1%
Multi-hop Inference                             34.9%
Dynamic Update                                  20.8%

The 87.8% accuracy on Memory Boundary questions means Tensorheart reliably knows the limits of its knowledge—critical for building trustworthy AI applications.

About the HaluMem Benchmark

HaluMem (November 2025) is the first benchmark to evaluate AI memory systems at the operation level—testing extraction, updating, and retrieval as separate components rather than just end-to-end performance. This methodology reveals where systems actually succeed and fail.

The HaluMem-Long variant tests memory systems with:

  • 1M+ token contexts
  • Distractor content to stress-test retrieval
  • Multiple question types requiring different reasoning capabilities

Citation

HaluMem: Hallucination in Memory Systems. arXiv:2511.03506, November 2025. https://arxiv.org/abs/2511.03506

Competitor Results

Supermemory's HaluMem results are documented in the benchmark paper and the company's public disclosures. Key metrics:

Metric                      Supermemory
QA Correctness              53.77%
Memory Recall               53.02%
Weighted Recall             70.73%
Update Accuracy             17.01%
Update Hallucination Rate   0.58%

Source: Supermemory HaluMem Evaluation (as reported in arXiv:2511.03506)

Why This Matters

Zero Hallucination Updates

When your AI assistant updates its memory, you need confidence it won't fabricate information. Tensorheart's 0% hallucination rate on memory updates means you can trust the system to accurately incorporate new information without creating false memories.

Reliable Retrieval

With 58.1% QA correctness—the highest reported on HaluMem-Long—Tensorheart retrieves the right memories more often than any evaluated alternative.

Knowing What You Don't Know

The 87.8% accuracy on Memory Boundary questions is particularly significant. This measures the system's ability to correctly identify when information is not available rather than hallucinating an answer. For production AI applications, this self-awareness prevents confidently wrong responses.
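The abstention behavior measured by Memory Boundary questions can be illustrated with a minimal sketch. Everything here is hypothetical, not Tensorheart's actual API: the function name, the 0.75 threshold, and the toy relevance scores are assumptions chosen for the example.

```python
def answer_from_memory(scores: list[float], answers: list[str],
                       threshold: float = 0.75) -> str:
    """Return the best-supported answer, or abstain when no stored
    memory is relevant enough -- the 'memory boundary' behavior."""
    if not scores or max(scores) < threshold:
        return "I don't have that information."  # abstain instead of guessing
    best = max(range(len(scores)), key=lambda i: scores[i])
    return answers[best]

# A query whose best match is weak triggers abstention:
print(answer_from_memory([0.41, 0.32], ["Paris", "London"]))
# → I don't have that information.
```

The key design choice is that a low best score produces an explicit "not available" response rather than the highest-scoring (but still weak) candidate answer.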

Evaluation Methodology

Our HaluMem evaluation used:

Parameter                  Value
Dialogues Evaluated        2,417
Total Conversation Turns   107,032
Ground Truth Memories      14,948
QA Pairs Tested            3,467
Update Scenarios           3,788
Total Runtime              74 hours

The full evaluation processed over 23,000 memories with embeddings, using relevance threshold scoring to retrieve the most appropriate context for each query.
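Relevance-threshold retrieval of the kind described above can be sketched as follows. This is a simplified illustration using cosine similarity over toy two-dimensional vectors; the function names, the 0.6 threshold, and the example memories are assumptions, not the production pipeline.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float],
             memories: list[tuple[str, list[float]]],
             threshold: float = 0.6) -> list[tuple[float, str]]:
    """Keep only memories whose similarity to the query clears the
    relevance threshold, best matches first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    relevant = [(score, text) for score, text in scored if score >= threshold]
    return sorted(relevant, reverse=True)

memories = [("User lives in Berlin", [0.9, 0.1]),
            ("User likes tea",       [0.1, 0.9])]
# Only the first memory clears the threshold for this query direction:
print(retrieve([1.0, 0.0], memories))
```

Thresholding (rather than always returning the top-k) is what lets a retriever return nothing when no stored memory is actually relevant, which connects directly to the Memory Boundary behavior discussed earlier.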

Next Steps

Ready to use state-of-the-art memory in your application?