# Context Extension API

Enable unlimited conversation length with automatic context management.

## Overview
Context Extension automatically manages long conversations by:
- Tracking all messages in a session
- Compressing old context when approaching token limits
- Retrieving relevant historical context for new messages
This allows conversations of any length while staying within LLM context limits.
## Create Session

`POST /v1/chat/sessions`

### Request Body

| Field | Type | Default | Description |
|---|---|---|---|
| session_id | string | auto-generated | Custom session ID |
| model | string | "gpt-4o" | LLM model to use |
| context_budget | int | 8000 | Max tokens to maintain (1000-128000) |
| system_prompt | string | null | System prompt |
| space_id | string | null | Link to memory space |
### Example

```bash
curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "context_budget": 16000,
    "system_prompt": "You are a helpful assistant."
  }'
```
### Response

```json
{
  "success": true,
  "data": {
    "id": "sess_abc123def456",
    "model": "gpt-4o",
    "context_budget": 16000,
    "total_tokens": 0,
    "message_count": 0,
    "created_at": "2024-01-15T10:30:00Z",
    "last_message_at": null
  }
}
```
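The same call can be made from application code. Below is a minimal Python sketch using only the standard library; it assumes nothing beyond the endpoint and fields documented above, and the helper names (`build_session_payload`, `create_session`) are illustrative, not an official SDK.

```python
import json
import urllib.request

API_BASE = "https://memoryapi.tensorheart.com"

def build_session_payload(model="gpt-4o", context_budget=8000,
                          session_id=None, system_prompt=None, space_id=None):
    """Build the request body per the table above, omitting null-default fields."""
    payload = {"model": model, "context_budget": context_budget}
    if session_id is not None:
        payload["session_id"] = session_id
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if space_id is not None:
        payload["space_id"] = space_id
    return payload

def create_session(api_key, **kwargs):
    """POST /v1/chat/sessions and return the `data` object from the response."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/chat/sessions",
        data=json.dumps(build_session_payload(**kwargs)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```

Separating payload construction from the HTTP call keeps the defaulting logic easy to test in isolation.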
## Send Message

Send a message and get an AI response with automatic context management.

`POST /v1/chat/sessions/{session_id}/messages`

### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Message content (1-100,000 chars) |
### Example

```bash
curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions/sess_abc123/messages \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Let me tell you about the project requirements..."
  }'
```
### Response

```json
{
  "success": true,
  "data": {
    "message": {
      "id": "msg_xyz789",
      "role": "assistant",
      "content": "I understand. Please share the project requirements...",
      "token_count": 45,
      "created_at": "2024-01-15T10:31:00Z"
    },
    "session": {
      "id": "sess_abc123",
      "total_tokens": 120,
      "message_count": 2
    },
    "context_stats": {
      "tokens_in_context": 120,
      "tokens_saved": 0,
      "chunks_retrieved": 0,
      "compression_ratio": 1.0
    }
  }
}
```
The system automatically compresses old context when approaching the token budget.
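A client can watch the `context_stats` object to see when compression kicks in. The sketch below assumes the response shape shown above; `send_message` and `compression_active` are illustrative helpers, and treating `compression_ratio < 1.0` as "compression happened" is an assumption about the field's meaning.

```python
import json
import urllib.request

API_BASE = "https://memoryapi.tensorheart.com"

def send_message(api_key, session_id, content):
    """POST one user message; return (assistant_reply, context_stats)."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/chat/sessions/{session_id}/messages",
        data=json.dumps({"content": content}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    return data["message"]["content"], data["context_stats"]

def compression_active(stats):
    """True once the session has started compressing old context."""
    return stats.get("tokens_saved", 0) > 0 or stats.get("compression_ratio", 1.0) < 1.0
```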
## Get Session Messages

Retrieve messages from a session.

`GET /v1/chat/sessions/{session_id}/messages`

### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | int | 50 | Max messages to return |
| offset | int | 0 | Number of messages to skip |
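The `limit`/`offset` parameters support paging through long histories. A sketch of the paging loop, with the page fetcher injected so the logic is independent of the HTTP layer (`iter_all_messages` and `fetch_page` are illustrative names):

```python
def iter_all_messages(fetch_page, page_size=50):
    """Yield every message in a session by walking limit/offset pages.

    fetch_page(limit, offset) should return the list of messages for one
    page, e.g. by calling GET /v1/chat/sessions/{session_id}/messages.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:  # short page means no more messages
            break
        offset += page_size
```

Stopping on the first short page avoids a final empty request when the total is an exact multiple of the page size only at the cost of one extra call in that case.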
## List Sessions

`GET /v1/chat/sessions`

## Get Session

`GET /v1/chat/sessions/{session_id}`

## Delete Session

`DELETE /v1/chat/sessions/{session_id}`
## How Context Management Works

1. Messages are added and their token counts are tracked.
2. When the total approaches the context budget, old messages are compressed.
3. Compressed context is stored as searchable chunks.
4. Relevant chunks are retrieved for new queries.
This ensures you always have the most relevant context, regardless of conversation length.
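The flow above can be modeled as a toy in-process sketch. This is not the server's algorithm: real compression is presumably summarization and retrieval is semantic search, whereas here "compression" just archives the oldest message as a chunk and retrieval is naive keyword overlap, purely to illustrate the budget-driven flow.

```python
class ToyContextManager:
    """Illustrative model of budget-driven compression and retrieval."""

    def __init__(self, context_budget):
        self.context_budget = context_budget
        self.live = []     # (text, token_count) pairs still in the prompt
        self.chunks = []   # archived text, searchable later

    @staticmethod
    def count_tokens(text):
        return len(text.split())  # crude stand-in for a real tokenizer

    def add_message(self, text):
        self.live.append((text, self.count_tokens(text)))
        # Budget exceeded? Archive the oldest messages first.
        while sum(t for _, t in self.live) > self.context_budget and len(self.live) > 1:
            oldest, _ = self.live.pop(0)
            self.chunks.append(oldest)

    def retrieve(self, query, k=2):
        """Rank archived chunks by naive word overlap with the query."""
        qwords = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: len(qwords & set(c.lower().split())),
                        reverse=True)
        return ranked[:k]
```

Even this toy shows the key property: old content leaves the live prompt but stays reachable, so the prompt size is bounded while no information is discarded outright.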