Memory Extraction
Automatically extract and store memories from text content, files, and URLs.
Overview
The Memory API provides three ways to turn existing content into memories:
- Text Extraction - Extract memories from text content (conversations, documents, notes)
- File Ingestion - Upload and process files directly (PDFs, images, audio, plain text)
- URL Ingestion - Pull content from web pages and PDF links
Text Extraction
Extract memories from text content using the extraction endpoint.
Basic Usage
```bash
curl -X POST https://memoryapi.tensorheart.com/v1/query/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User: Hi, I am Sarah and I work at Netflix as a product manager.",
    "content_type": "conversation"
  }'
```
Extracted memories:
- "User's name is Sarah"
- "User works at Netflix"
- "User is a product manager"
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The text content to extract memories from (max 500,000 characters) |
| content_type | string | No | Type of content: conversation, document, or notes (default: conversation) |
| metadata | object | No | Custom metadata to attach to all extracted memories |
| space_id | string | No | Memory space to store extracted memories in |
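For example, a longer document can be extracted into a dedicated memory space by combining content_type and space_id. The content snippet and space ID below are placeholders:

```bash
# Extract memories from a document into a dedicated memory space.
# space_proj_42 is a placeholder; substitute one of your own space IDs.
curl -X POST https://memoryapi.tensorheart.com/v1/query/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Acme Corp Q3 report: revenue grew 12% quarter over quarter...",
    "content_type": "document",
    "space_id": "space_proj_42"
  }'
```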
Content Types
| Type | Best For |
|---|---|
| conversation | Chat logs, transcripts, dialogue |
| document | Articles, reports, long-form text |
| notes | Meeting notes, summaries, bullet points |
Adding Metadata
Tag extracted memories with source information for better organization:
```json
{
  "content": "Meeting with client discussed Q4 goals...",
  "content_type": "notes",
  "metadata": {
    "source": "client_meeting",
    "date": "2024-01-15",
    "session_id": "abc123"
  }
}
```
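To send this as a complete request, the body can be saved to a file (extract_request.json is an example filename) and posted with curl's @file syntax:

```bash
# Post a JSON body stored in a file; extract_request.json holds the
# metadata example above.
curl -X POST https://memoryapi.tensorheart.com/v1/query/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d @extract_request.json
```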
File Ingestion
Upload files for processing into memories. The API extracts the text content, splits it into chunks, and stores the chunks as searchable memories.
Supported File Formats
| Format | Extensions | Description |
|---|---|---|
| PDF | .pdf | Text extraction with optional OCR for scanned pages |
| Images | .png, .jpg, .jpeg, .gif, .webp | Text and content extraction from images |
| Audio | .mp3, .wav, .m4a, .ogg | Speech-to-text transcription |
| Plain Text | .txt, .md, .csv | Direct text processing |
Upload a File
```bash
curl -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "chunk_size=500" \
  -F "overlap=50" \
  -F "ocr=true"
```
File Upload Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | file | Required | The file to upload and process |
| chunk_size | integer | 500 | Target tokens per chunk |
| overlap | integer | 50 | Overlap tokens between chunks |
| ocr | boolean | true | Enable OCR for scanned PDF pages |
| space_id | string | null | Memory space to store memories in |
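As a sketch of combining these parameters, the following uploads an audio recording into a specific memory space with larger chunks; the file name and space ID are placeholders:

```bash
# Upload an audio recording into a specific memory space with larger chunks.
# meeting.mp3 and space_team_notes are placeholders.
curl -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@meeting.mp3" \
  -F "chunk_size=800" \
  -F "overlap=80" \
  -F "space_id=space_team_notes"
```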
Response
```json
{
  "success": true,
  "data": {
    "document_id": "doc_abc123def456",
    "status": "completed",
    "memories_created": 12,
    "chunks_created": 12,
    "tokens_extracted": 5840
  }
}
```
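The document_id in the response is what the management endpoints below expect. One way to capture it is to pipe the upload response through jq:

```bash
# Upload a file and capture the returned document ID for follow-up calls.
doc_id=$(curl -s -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" | jq -r '.data.document_id')
echo "Ingested as $doc_id"
```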
URL Ingestion
Ingest content directly from a URL. Supports web pages and PDF links.
Ingest from URL
```bash
curl -X POST https://memoryapi.tensorheart.com/v1/ingest/url \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "chunk_size": 500,
    "overlap": 50
  }'
```
URL Support
| Type | Description |
|---|---|
| Web pages | Extracts main article/content from HTML pages |
| PDF URLs | Downloads and processes as PDF |
Managing Documents
Get Document Status
```bash
curl https://memoryapi.tensorheart.com/v1/ingest/{document_id} \
  -H "Authorization: Bearer $API_KEY"
```
List Documents
curl "https://memoryapi.tensorheart.com/v1/ingest?limit=50&offset=0" \
-H "Authorization: Bearer $API_KEY"
Get Document Chunks
View individual chunks from a processed document:
```bash
curl https://memoryapi.tensorheart.com/v1/ingest/{document_id}/chunks \
  -H "Authorization: Bearer $API_KEY"
```
Best Practices
- Choose the right method - Use text extraction for content you already have as text (conversations, notes, transcripts); use file or URL ingestion for PDFs, images, audio, and web pages
- Clean input - Remove irrelevant content before text extraction
- Use metadata - Tag memories with source information for better retrieval
- Tune chunk size - Larger chunks preserve more context, smaller chunks enable finer-grained retrieval (see the sketch after this list)
- Enable OCR - For scanned PDFs, keep OCR enabled to extract text from images
- Batch wisely - Extract from complete conversations or documents, not fragments
- Use spaces - Organize memories into spaces for different projects or contexts
- Review results - Periodically audit extracted memories for quality
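A minimal end-to-end sketch tying several of these practices together: ingest a long report into a project space with larger, overlapping chunks, then confirm how many memories were created. The file path and space ID are placeholders.

```bash
# Ingest a long report into a project space with larger, overlapping chunks,
# then report how many memories were created.
# q4_report.pdf and space_q4_planning are placeholders.
response=$(curl -s -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@q4_report.pdf" \
  -F "chunk_size=800" \
  -F "overlap=80" \
  -F "ocr=true" \
  -F "space_id=space_q4_planning")

echo "$response" | jq '{document_id: .data.document_id, memories: .data.memories_created}'
```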