Document Ingestion API
Import documents and convert them to searchable memories.
Overview
Document Ingestion processes files and web pages and extracts their content as memories:
- PDFs - Text extraction with optional OCR
- Images - Visual content analysis
- Audio - Automatic transcription
- URLs - Web page content extraction
Ingest File
Upload and process a file.
POST /v1/ingest/file
Form Data
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | File to ingest |
| chunk_size | int | No | Tokens per chunk (default: 500) |
| overlap | int | No | Chunk overlap in tokens (default: 50) |
| space_id | string | No | Store in this space |
| ocr | bool | No | Enable OCR (default: true) |
Supported Formats
- Documents: PDF
- Images: PNG, JPG, GIF, WebP
- Audio: MP3, WAV, M4A
- Text: TXT, MD, CSV
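A client may want to check a file's extension before uploading it. The following is a minimal sketch, assuming the format is identified by file extension alone; the `SUPPORTED_FORMATS` mapping and the `format_category` helper are hypothetical names that are not part of the API, and the server may detect formats differently.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported Formats list above.
# The category keys are illustrative labels, not API values.
SUPPORTED_FORMATS = {
    "document": {".pdf"},
    "image": {".png", ".jpg", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".m4a"},
    "text": {".txt", ".md", ".csv"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if unsupported."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```

Calling `format_category("report.PDF")` returns `"document"`, while a file such as `notes.docx` returns `None` and can be rejected before any upload is attempted.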
Example
curl -X POST https://api.memory.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "space_id=documents" \
  -F "chunk_size=500"
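The same upload can be sketched in Python. This is an illustration rather than an official client: `build_form_fields` and `ingest_file` are hypothetical helpers, the client-side validation of `overlap` is an assumption, and sending booleans as the strings `"true"`/`"false"` is assumed from the form-data convention used in the curl example. The network call relies on the third-party `requests` package.

```python
from typing import Dict, Optional

INGEST_FILE_URL = "https://api.memory.tensorheart.com/v1/ingest/file"


def build_form_fields(
    chunk_size: int = 500,
    overlap: int = 50,
    space_id: Optional[str] = None,
    ocr: bool = True,
) -> Dict[str, str]:
    """Build the optional form fields from the Form Data table as strings."""
    # Assumed sanity checks; the API's own validation rules are not documented.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    fields = {
        "chunk_size": str(chunk_size),
        "overlap": str(overlap),
        "ocr": "true" if ocr else "false",  # assumed boolean encoding
    }
    if space_id is not None:
        fields["space_id"] = space_id
    return fields


def ingest_file(path: str, api_key: str, **options) -> dict:
    """Upload a file as multipart form data and return the decoded JSON body."""
    import requests  # third-party; imported lazily so the helper above has no dependency

    with open(path, "rb") as fh:
        response = requests.post(
            INGEST_FILE_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": fh},
            data=build_form_fields(**options),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

For example, `ingest_file("document.pdf", api_key, space_id="documents", chunk_size=500)` mirrors the curl command above.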
Response
{
  "success": true,
  "data": {
    "document_id": "doc_abc123",
    "status": "completed",
    "memories_created": 15,
    "chunks_created": 15,
    "tokens_extracted": 7500
  }
}
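A response of this shape can be read into a typed structure. The sketch below uses only the fields shown in the example response; `IngestResult` and `parse_ingest_response` are hypothetical names, and the handling of `"success": false` is an assumption because the error format is not documented here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestResult:
    document_id: str
    status: str
    memories_created: int
    chunks_created: int
    tokens_extracted: int


def parse_ingest_response(body: str) -> IngestResult:
    """Parse the JSON envelope shown above into an IngestResult."""
    payload = json.loads(body)
    # Assumption: a falsy "success" flag means the request failed.
    if not payload.get("success"):
        raise RuntimeError("ingestion request was not successful")
    data = payload["data"]
    return IngestResult(
        document_id=data["document_id"],
        status=data["status"],
        memories_created=data["memories_created"],
        chunks_created=data["chunks_created"],
        tokens_extracted=data["tokens_extracted"],
    )


# The example response from this section, used as sample input.
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "document_id": "doc_abc123",
    "status": "completed",
    "memories_created": 15,
    "chunks_created": 15,
    "tokens_extracted": 7500
  }
}
"""

result = parse_ingest_response(SAMPLE_RESPONSE)
```

The returned `document_id` is the value to pass to the status endpoint.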
Ingest URL
Extract content from a web page.
POST /v1/ingest/url
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to ingest |
| chunk_size | int | No | Tokens per chunk (default: 500) |
| space_id | string | No | Store in this space |
Example
curl -X POST https://api.memory.tensorheart.com/v1/ingest/url \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "space_id": "research"
  }'
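The equivalent request can be assembled with Python's standard library. This is a sketch under the assumption that omitted optional fields fall back to the server defaults; `build_ingest_url_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

INGEST_URL_ENDPOINT = "https://api.memory.tensorheart.com/v1/ingest/url"


def build_ingest_url_request(
    url: str,
    api_key: str,
    space_id: Optional[str] = None,
    chunk_size: Optional[int] = None,
) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Request Body table."""
    body = {"url": url}
    # Optional fields are left out so the server can apply its defaults.
    if space_id is not None:
        body["space_id"] = space_id
    if chunk_size is not None:
        body["chunk_size"] = chunk_size
    return urllib.request.Request(
        INGEST_URL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_ingest_url_request(
    "https://example.com/article", api_key="test-key", space_id="research"
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without network access.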
Get Document Status
Retrieve the processing status of an ingested document.
GET /v1/ingest/{document_id}
Example
curl https://api.memory.tensorheart.com/v1/ingest/doc_abc123 \
  -H "Authorization: Bearer $API_KEY"
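If ingestion does not finish within the upload request, a client could poll this endpoint until the document reaches a final state. The sketch below is hedged heavily: only the `"completed"` status appears in this document, so `"failed"` and any in-progress value such as `"processing"` are assumptions, and the response shape of the status endpoint is not documented here. For that reason the HTTP call is injected as a `fetch_status` callable, a hypothetical parameter, rather than implemented.

```python
import time
from typing import Callable

# "completed" appears in the example response; "failed" is an assumed value.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(
    fetch_status: Callable[[str], str],
    document_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(document_id) until a terminal status is returned."""
    for attempt in range(max_attempts):
        status = fetch_status(document_id)
        if status in TERMINAL_STATUSES:
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"{document_id} did not reach a terminal status after {max_attempts} attempts"
    )
```

In practice `fetch_status` would wrap a `GET /v1/ingest/{document_id}` call and return the status string from its response.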
List Documents
GET /v1/ingest
Lists all ingested documents with their processing status.
Processing Pipeline
Upload → Extract Text → Chunk Content → Create Memories → Index for Search
Each chunk becomes a searchable memory, so a document's content can be queried once processing completes.
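To illustrate how `chunk_size` and `overlap` could interact in the chunking step, here is a sliding-window sketch. It is not the service's actual implementation, which this document does not describe: the `chunk_tokens` function is hypothetical, and pre-split list items stand in for whatever tokenizer the service uses.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str], chunk_size: int = 500, overlap: int = 50
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# 1000 placeholder tokens with the default settings.
tokens = [str(i) for i in range(1000)]
chunks = chunk_tokens(tokens)
```

With the defaults, windows start at tokens 0, 450, and 900, so each chunk repeats the last 50 tokens of the one before it. Under this scheme, text that straddles a chunk boundary still appears whole in at least one chunk.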