# Document Ingestion API
Import documents and convert them to searchable memories.
## Overview
Document Ingestion processes various file formats and extracts content as memories:
- PDFs - Text extraction with optional OCR
- Images - Visual content analysis
- Audio - Automatic transcription
- URLs - Web page content extraction
## Ingest File

Upload and process a file.

```
POST /v1/ingest/file
```
### Form Data

| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | File to ingest |
| chunk_size | int | No | Tokens per chunk (default: 500) |
| overlap | int | No | Chunk overlap in tokens (default: 50) |
| space_id | string | No | Store in this space |
| ocr | bool | No | Enable OCR (default: true) |
### Supported Formats
- Documents: PDF
- Images: PNG, JPG, GIF, WebP
- Audio: MP3, WAV, M4A
- Text: TXT, MD, CSV
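As a convenience, a client can pre-check file extensions against this list before uploading. A minimal sketch in Python; the extension set simply mirrors the list above, and the API itself may accept more or fewer variants:

```python
# Client-side allowlist mirroring the Supported Formats list above.
# This is an illustrative pre-check, not part of any official client.
SUPPORTED_EXTENSIONS = {
    ".pdf",                           # documents
    ".png", ".jpg", ".gif", ".webp",  # images
    ".mp3", ".wav", ".m4a",           # audio
    ".txt", ".md", ".csv",            # text
}

def is_supported(filename: str) -> bool:
    """Return True if the filename ends in a supported extension."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in SUPPORTED_EXTENSIONS
```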
### Example

```bash
curl -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "space_id=documents" \
  -F "chunk_size=500"
```
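The same upload can be made from Python. A sketch using the third-party `requests` library (assumed installed); `build_form_fields` and `ingest_file` are illustrative helper names, not part of any official client:

```python
def build_form_fields(space_id=None, chunk_size=500, overlap=50, ocr=True):
    """Collect the optional form fields, stringified for multipart encoding.

    Defaults mirror the Form Data table: chunk_size 500, overlap 50, ocr true.
    """
    fields = {
        "chunk_size": str(chunk_size),
        "overlap": str(overlap),
        "ocr": "true" if ocr else "false",
    }
    if space_id:
        fields["space_id"] = space_id
    return fields

def ingest_file(path, api_key, **options):
    """Upload one file to POST /v1/ingest/file and return the data payload."""
    import requests  # pip install requests
    with open(path, "rb") as f:
        resp = requests.post(
            "https://memoryapi.tensorheart.com/v1/ingest/file",
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(**options),
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["data"]
```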
### Response

```json
{
  "success": true,
  "data": {
    "document_id": "doc_abc123",
    "status": "completed",
    "memories_created": 15,
    "chunks_created": 15,
    "tokens_extracted": 7500
  }
}
```
## Ingest URL

Extract content from a web page.

```
POST /v1/ingest/url
```
### Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to ingest |
| chunk_size | int | No | Tokens per chunk (50-5000, default: 500) |
| overlap | int | No | Overlap between chunks (0-1000, default: 50) |
| space_id | string | No | Store in this space |
### Security
Only public URLs are allowed. Private IPs, localhost, and cloud metadata endpoints are blocked.
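To illustrate what that filtering typically means, here is a sketch of the kind of check involved: resolve the URL's host and reject private, loopback, and link-local targets. This is an assumption about the general technique, not the service's actual implementation; `is_url_allowed` and `BLOCKED_HOSTS` are hypothetical names.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames rejected outright (illustrative, not the service's real list).
BLOCKED_HOSTS = {"localhost", "metadata.google.internal"}

def is_url_allowed(url: str, resolved_ip: str) -> bool:
    """Return False for private, loopback, or link-local targets.

    `resolved_ip` is the address the host resolves to; real code would
    perform DNS resolution itself and re-check after redirects.
    """
    host = (urlparse(url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return False
    ip = ipaddress.ip_address(resolved_ip)
    # 169.254.169.254 is link-local and covers common cloud metadata endpoints.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```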
### Example

```bash
curl -X POST "https://memoryapi.tensorheart.com/v1/ingest/url?url=https://example.com/article&space_id=research" \
  -H "Authorization: Bearer $API_KEY"
```
## Get Document Status

```
GET /v1/ingest/{document_id}
```
### Example

```bash
curl https://memoryapi.tensorheart.com/v1/ingest/doc_abc123 \
  -H "Authorization: Bearer $API_KEY"
```
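Since processing may not finish immediately, a client can poll this endpoint until the document reaches a terminal status. A sketch with an injected `fetch_status` callable (a hypothetical helper wrapping `GET /v1/ingest/{document_id}`); `"completed"` comes from the response example above, while `"failed"` as the other terminal status is an assumption:

```python
import time

def wait_for_document(fetch_status, document_id, interval=2.0, timeout=60.0):
    """Poll `fetch_status(document_id)` until a terminal status or timeout.

    `fetch_status` should return the document's data payload as a dict,
    e.g. {"status": "processing", ...}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        doc = fetch_status(document_id)
        if doc["status"] in ("completed", "failed"):
            return doc
        time.sleep(interval)
    raise TimeoutError(f"document {document_id} still processing")
```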
## List Documents

```
GET /v1/ingest
```
Lists all ingested documents with their processing status.
## Processing Pipeline
Upload → Extract Text → Chunk Content → Create Memories → Index for Search
Each chunk becomes a searchable memory, making your documents instantly queryable.
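The chunking step can be pictured as a sliding window over the extracted tokens, with `overlap` tokens shared between consecutive chunks. The sketch below only illustrates how `chunk_size` and `overlap` interact; the service's actual tokenizer and boundary handling are not documented here:

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token sequence into overlapping windows.

    Consecutive chunks share `overlap` tokens, so the window advances
    by (chunk_size - overlap) tokens each step.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size]
            for i in range(0, len(tokens), step)
            if tokens[i:i + chunk_size]]
```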