Context Extension API

Enable unlimited conversation length with automatic context management.

Overview

Context Extension automatically manages long conversations by:

  • Tracking all messages in a session
  • Compressing old context when approaching token limits
  • Retrieving relevant historical context for new messages

This allows conversations of any length while staying within LLM context limits.

Create Session

POST /v1/chat/sessions

Request Body

Field           Type    Default         Description
session_id      string  auto-generated  Custom session ID
model           string  "gpt-4o"        LLM model to use
context_budget  int     8000            Max tokens to maintain (1000-128000)
system_prompt   string  null            System prompt
space_id        string  null            Link to memory space

Example

curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "context_budget": 16000,
    "system_prompt": "You are a helpful assistant."
  }'

Response

{
  "success": true,
  "data": {
    "id": "sess_abc123def456",
    "model": "gpt-4o",
    "context_budget": 16000,
    "total_tokens": 0,
    "message_count": 0,
    "created_at": "2024-01-15T10:30:00Z",
    "last_message_at": null
  }
}
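The same request can be issued from Python. The sketch below uses only the standard library; the `build_session_payload` and `create_session` helper names are illustrative (not part of any official SDK), and the base URL is taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://memoryapi.tensorheart.com/v1"

def build_session_payload(model="gpt-4o", context_budget=8000,
                          session_id=None, system_prompt=None, space_id=None):
    """Assemble the request body, omitting optional fields left unset."""
    payload = {"model": model, "context_budget": context_budget}
    if session_id is not None:
        payload["session_id"] = session_id
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if space_id is not None:
        payload["space_id"] = space_id
    return payload

def create_session(api_key, **kwargs):
    """POST /v1/chat/sessions and return the created session object."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/sessions",
        data=json.dumps(build_session_payload(**kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```

Omitting unset optional fields lets the server apply its own defaults (auto-generated `session_id`, 8000-token budget) rather than receiving explicit nulls.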

Send Message

Send a message and get an AI response with automatic context management.

POST /v1/chat/sessions/{session_id}/messages

Request Body

Field    Type    Required  Description
content  string  Yes       Message content (1-100,000 chars)

Example

curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions/sess_abc123/messages \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Let me tell you about the project requirements..."
  }'

Response

{
  "success": true,
  "data": {
    "message": {
      "id": "msg_xyz789",
      "role": "assistant",
      "content": "I understand. Please share the project requirements...",
      "token_count": 45,
      "created_at": "2024-01-15T10:31:00Z"
    },
    "session": {
      "id": "sess_abc123",
      "total_tokens": 120,
      "message_count": 2
    },
    "context_stats": {
      "tokens_in_context": 120,
      "tokens_saved": 0,
      "chunks_retrieved": 0,
      "compression_ratio": 1.0
    }
  }
}

The system automatically compresses old context when approaching the token budget.
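A Python equivalent, again using only the standard library. The 1-100,000 character limit from the table above is checked client-side before the request is sent; the helper names are illustrative, not an official client.

```python
import json
import urllib.request

BASE_URL = "https://memoryapi.tensorheart.com/v1"

def validate_content(content):
    """Enforce the documented 1-100,000 character limit before sending."""
    if not 1 <= len(content) <= 100_000:
        raise ValueError("content must be 1-100,000 characters")
    return content

def send_message(api_key, session_id, content):
    """POST a user message; return the assistant reply plus context stats."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/sessions/{session_id}/messages",
        data=json.dumps({"content": validate_content(content)}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```

Validating locally avoids a round trip for input the server would reject anyway.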


Get Session Messages

Retrieve messages from a session.

GET /v1/chat/sessions/{session_id}/messages

Query Parameters

Parameter  Type  Default  Description
limit      int   50       Max messages to return
offset     int   0        Number of messages to skip
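To walk an entire session history, page through with `limit` and `offset`. In this sketch the HTTP call is abstracted behind a `fetch_page(limit, offset)` callable (an assumption for illustration) so the paging logic can be shown, and tested, on its own.

```python
def iter_messages(fetch_page, limit=50):
    """Yield every message in a session by paging with limit/offset.

    fetch_page(limit, offset) should perform
    GET /v1/chat/sessions/{session_id}/messages?limit=...&offset=...
    and return the list of messages from the response body.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # a short page means we reached the end
            return
        offset += limit
```

Stopping on a short page avoids issuing one extra empty request when the message count is an exact multiple of `limit` minus the remainder; a list-backed fake `fetch_page` is enough to exercise the logic locally.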

List Sessions

GET /v1/chat/sessions

Get Session

GET /v1/chat/sessions/{session_id}

Delete Session

DELETE /v1/chat/sessions/{session_id}
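The remaining endpoints differ only in HTTP method and path, so a small request builder covers all three. The function names are illustrative; pass the returned object to `urllib.request.urlopen` with your error handling of choice.

```python
import urllib.request

BASE_URL = "https://memoryapi.tensorheart.com/v1"

def _request(api_key, method, path):
    """Build an authenticated request for a chat-session endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_sessions(api_key):
    return _request(api_key, "GET", "/chat/sessions")

def get_session(api_key, session_id):
    return _request(api_key, "GET", f"/chat/sessions/{session_id}")

def delete_session(api_key, session_id):
    return _request(api_key, "DELETE", f"/chat/sessions/{session_id}")
```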

How Context Management Works

  1. Messages are added and their token counts are tracked.
  2. When the token budget is exceeded, old messages are compressed.
  3. Compressed context is stored as searchable chunks.
  4. Relevant chunks are retrieved for new queries.

This ensures you always have the most relevant context, regardless of conversation length.
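The budget check can be sketched locally. This toy model is not the service's actual algorithm: it treats messages as `(text, token_count)` pairs and compresses the oldest half of the conversation whenever the running total exceeds the budget.

```python
def manage_context(messages, budget, compress):
    """Illustrative sketch of budget-driven compression.

    messages : list of (text, token_count) pairs, oldest first
    budget   : maximum total tokens to keep in context
    compress : callable turning a list of old messages into one
               (summary_text, summary_token_count) pair
    """
    total = sum(tokens for _, tokens in messages)
    while total > budget and len(messages) > 1:
        # Compress the oldest half of the conversation into one chunk.
        half = len(messages) // 2
        summary = compress(messages[:half])
        messages = [summary] + messages[half:]
        total = sum(tokens for _, tokens in messages)
    return messages, total
```

Because the summary pair replaces many originals, each pass shrinks the total, so the loop terminates once the context fits the budget (or only one message remains).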