# Context Extension API

Enable unlimited conversation length with automatic context management.

## Overview
Context Extension automatically manages long conversations by:
- Tracking all messages in a session
- Compressing old context when approaching token limits
- Retrieving relevant historical context for new messages
This allows conversations of any length while staying within LLM context limits.
## Create Session

`POST /v1/chat/sessions`

### Request Body

| Field | Type | Default | Description |
|---|---|---|---|
| session_id | string | auto-generated | Custom session ID |
| model | string | "gpt-4o" | LLM model to use |
| context_budget | int | 8000 | Max tokens to maintain (1000-128000) |
| system_prompt | string | null | System prompt |
| space_id | string | null | Link to memory space |
### Example

```bash
curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "context_budget": 16000,
    "system_prompt": "You are a helpful assistant."
  }'
```
### Response

```json
{
  "success": true,
  "data": {
    "id": "sess_abc123def456",
    "model": "gpt-4o",
    "context_budget": 16000,
    "total_tokens": 0,
    "message_count": 0,
    "created_at": "2024-01-15T10:30:00Z",
    "last_message_at": null
  }
}
```
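The same call can be made from application code. Below is a minimal Python sketch using only the standard library; it assumes nothing beyond the endpoint and fields documented above, and the helper names (`build_session_payload`, `create_session`) are illustrative, not an official SDK.

```python
import json
import urllib.request

API_BASE = "https://memoryapi.tensorheart.com"

def build_session_payload(model="gpt-4o", context_budget=8000,
                          session_id=None, system_prompt=None, space_id=None):
    """Build the request body per the table above, omitting null-default fields."""
    payload = {"model": model, "context_budget": context_budget}
    if session_id is not None:
        payload["session_id"] = session_id
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if space_id is not None:
        payload["space_id"] = space_id
    return payload

def create_session(api_key, **kwargs):
    """POST /v1/chat/sessions and return the `data` object from the response."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/chat/sessions",
        data=json.dumps(build_session_payload(**kwargs)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```

Separating payload construction from the HTTP call keeps the defaulting logic easy to test in isolation.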
## Send Message

Send a message and get an AI response with automatic context management.

`POST /v1/chat/sessions/{session_id}/messages`

### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Message content (1-100,000 chars) |
### Example

```bash
curl -X POST https://memoryapi.tensorheart.com/v1/chat/sessions/sess_abc123/messages \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Let me tell you about the project requirements..."
  }'
```
### Response

```json
{
  "success": true,
  "data": {
    "message": {
      "id": "msg_xyz789",
      "role": "assistant",
      "content": "I understand. Please share the project requirements...",
      "token_count": 45,
      "created_at": "2024-01-15T10:31:00Z"
    },
    "session": {
      "id": "sess_abc123",
      "total_tokens": 120,
      "message_count": 2
    },
    "context_stats": {
      "tokens_in_context": 120,
      "tokens_saved": 0,
      "chunks_retrieved": 0,
      "compression_ratio": 1.0
    }
  }
}
```
The system automatically compresses old context when approaching the token budget.
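A client can watch the `context_stats` object to see when compression kicks in. The sketch below assumes the response shape shown above; `send_message` and `compression_active` are illustrative helpers, and treating `compression_ratio < 1.0` as "compression happened" is an assumption about the field's meaning.

```python
import json
import urllib.request

API_BASE = "https://memoryapi.tensorheart.com"

def send_message(api_key, session_id, content):
    """POST one user message; return (assistant_reply, context_stats)."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/chat/sessions/{session_id}/messages",
        data=json.dumps({"content": content}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    return data["message"]["content"], data["context_stats"]

def compression_active(stats):
    """True once the session has started compressing old context."""
    return stats.get("tokens_saved", 0) > 0 or stats.get("compression_ratio", 1.0) < 1.0
```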
## Get Session Messages

Retrieve messages from a session.

`GET /v1/chat/sessions/{session_id}/messages`

### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | int | 50 | Max messages to return |
| offset | int | 0 | Number of messages to skip |
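The `limit`/`offset` parameters support paging through long histories. A sketch of the paging loop, with the page fetcher injected so the logic is independent of the HTTP layer (`iter_all_messages` and `fetch_page` are illustrative names):

```python
def iter_all_messages(fetch_page, page_size=50):
    """Yield every message in a session by walking limit/offset pages.

    fetch_page(limit, offset) should return the list of messages for one
    page, e.g. by calling GET /v1/chat/sessions/{session_id}/messages.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:  # short page means no more messages
            break
        offset += page_size
```

Stopping on the first short page avoids a final empty request when the total is an exact multiple of the page size only at the cost of one extra call in that case.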
## List Sessions

`GET /v1/chat/sessions`

## Get Session

`GET /v1/chat/sessions/{session_id}`

## Delete Session

`DELETE /v1/chat/sessions/{session_id}`
## How Context Management Works

1. Messages are added and their token counts are tracked.
2. When the total approaches the context budget, old messages are compressed.
3. Compressed context is stored as searchable chunks.
4. Relevant chunks are retrieved for new queries.
This ensures you always have the most relevant context, regardless of conversation length.
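The flow above can be modeled as a toy in-process sketch. This is not the server's algorithm: real compression is presumably summarization and retrieval is semantic search, whereas here "compression" just archives the oldest message as a chunk and retrieval is naive keyword overlap, purely to illustrate the budget-driven flow.

```python
class ToyContextManager:
    """Illustrative model of budget-driven compression and retrieval."""

    def __init__(self, context_budget):
        self.context_budget = context_budget
        self.live = []     # (text, token_count) pairs still in the prompt
        self.chunks = []   # archived text, searchable later

    @staticmethod
    def count_tokens(text):
        return len(text.split())  # crude stand-in for a real tokenizer

    def add_message(self, text):
        self.live.append((text, self.count_tokens(text)))
        # Budget exceeded? Archive the oldest messages first.
        while sum(t for _, t in self.live) > self.context_budget and len(self.live) > 1:
            oldest, _ = self.live.pop(0)
            self.chunks.append(oldest)

    def retrieve(self, query, k=2):
        """Rank archived chunks by naive word overlap with the query."""
        qwords = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: len(qwords & set(c.lower().split())),
                        reverse=True)
        return ranked[:k]
```

Even this toy shows the key property: old content leaves the live prompt but stays reachable, so the prompt size is bounded while no information is discarded outright.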