Document Ingestion API

Import documents and convert them to searchable memories.

Overview

Document Ingestion extracts content from files and web pages and stores it as memories:

  • PDFs - Text extraction with optional OCR
  • Images - Visual content analysis
  • Audio - Automatic transcription
  • URLs - Web page content extraction

Ingest File

Upload and process a file.

POST /v1/ingest/file

Form Data

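| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | Yes | File to ingest |
| chunk_size | int | No | Tokens per chunk (default: 500) |
| overlap | int | No | Chunk overlap (default: 50) |
| space_id | string | No | Store in this space |
| ocr | bool | No | Enable OCR (default: true) |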
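The sketch below shows how `chunk_size` and `overlap` could interact. It is a minimal sliding window that assumes each chunk starts `chunk_size - overlap` tokens after the previous one and that tokens arrive as a pre-split list. The service's actual tokenizer and chunk boundaries are not documented here, so `chunk_tokens` is a hypothetical helper. For example, ten tokens with `chunk_size=4` and `overlap=1` produce three chunks covering tokens 0-3, 3-6 and 6-9.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int = 500,
                 overlap: int = 50) -> List[List[T]]:
    """Split tokens into windows of chunk_size; neighbours share `overlap` tokens.

    Hypothetical illustration only: assumes a plain sliding window, which may
    differ from how the ingestion service actually chunks content.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks: List[List[T]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the input
        start += step
    return chunks
```

Under this model, a larger overlap repeats more context across chunk boundaries at the cost of creating more chunks for the same text.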

Supported Formats

  • Documents: PDF
  • Images: PNG, JPG, GIF, WebP
  • Audio: MP3, WAV, M4A
  • Text: TXT, MD, CSV
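A client can skip uploads that are certain to be rejected by checking the file extension against the list above. `format_category` and `SUPPORTED_FORMATS` are hypothetical client-side helpers, not part of the API. They look only at the extension and treat anything unlisted (for example `.jpeg`) as unsupported. The server may apply its own checks, so this is a convenience filter, not a guarantee that an upload will succeed.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported Formats list; categories mirror its headings.
SUPPORTED_FORMATS = {
    "document": {".pdf"},
    "image": {".png", ".jpg", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".m4a"},
    "text": {".txt", ".md", ".csv"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "REPORT.PDF" -> ".pdf"
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```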

Example

curl -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
-H "Authorization: Bearer $API_KEY" \
-F "file=@document.pdf" \
-F "space_id=documents" \
-F "chunk_size=500"
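For clients without curl, the same upload can be built with the Python standard library. This sketch constructs a `multipart/form-data` body equivalent to the `-F` flags above. The helper name `build_ingest_file_request` is hypothetical, and sending booleans such as `ocr` as lowercase `"true"`/`"false"` is an assumption about how the server parses form fields.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the curl example; the helper itself is illustrative.
INGEST_FILE_URL = "https://memoryapi.tensorheart.com/v1/ingest/file"


def build_ingest_file_request(filename: str, content: bytes, api_key: str,
                              **fields) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to `curl -F`.

    `fields` carries optional form fields such as space_id or chunk_size.
    """
    boundary = uuid.uuid4().hex  # random separator between form parts
    parts = []
    for name, value in fields.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{text}\r\n").encode()
        )
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        INGEST_FILE_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send it, read the file with `Path("document.pdf").read_bytes()`, pass the key from `os.environ["API_KEY"]`, hand the request to `urllib.request.urlopen`, and `json.load` the response.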

Response

{
  "success": true,
  "data": {
    "document_id": "doc_abc123",
    "status": "completed",
    "memories_created": 15,
    "chunks_created": 15,
    "tokens_extracted": 7500
  }
}
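A client typically checks `success` and then reads the fields under `data`, such as `document_id` for later status lookups. The helper below is a hypothetical sketch built on the response shape shown above. The error response format is not documented on this page, so on failure it simply raises with the raw payload.

```python
import json
from typing import Any, Dict


def parse_ingest_response(body: str) -> Dict[str, Any]:
    """Return the `data` object from an ingest response; raise if success is false."""
    payload = json.loads(body)
    if not payload.get("success"):
        # Error shape is not documented here, so surface the whole payload.
        raise RuntimeError(f"Ingestion failed: {payload}")
    return payload["data"]


example = """{"success": true, "data": {"document_id": "doc_abc123",
"status": "completed", "memories_created": 15, "chunks_created": 15,
"tokens_extracted": 7500}}"""
data = parse_ingest_response(example)
print(data["document_id"], data["status"], data["memories_created"])
# doc_abc123 completed 15
```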

Ingest URL

Extract content from a web page.

POST /v1/ingest/url

Query Parameters

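| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | URL to ingest |
| chunk_size | int | No | Tokens per chunk (50-5000, default: 500) |
| overlap | int | No | Overlap between chunks (0-1000, default: 50) |
| space_id | string | No | Store in this space |

Security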

Only public URLs are allowed. Private IPs, localhost, and cloud metadata endpoints are blocked.
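A client can catch obviously blocked URLs before calling the API by resolving the host and inspecting the addresses. This is a hypothetical pre-check that assumes the blocked categories correspond to Python's `ipaddress` flags (private, loopback, link-local; the common cloud metadata address 169.254.169.254 is link-local). The server's exact blocklist is not documented here, and DNS answers can change between this check and the server's fetch, so the server remains the authority.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(ip: str) -> bool:
    """True for private, loopback, or link-local addresses (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def url_looks_public(url: str) -> bool:
    """Resolve the URL's host and require every resolved address to be public."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # no host to check, e.g. a malformed URL
    infos = socket.getaddrinfo(host, None)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return not any(is_blocked_address(info[4][0]) for info in infos)
```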

Example

curl -X POST "https://memoryapi.tensorheart.com/v1/ingest/url?url=https://example.com/article&space_id=research" \
-H "Authorization: Bearer $API_KEY"
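When the target URL has its own query string, it must be percent-encoded so that its `?` and `&` are not read as parameters of the ingest request. This hypothetical helper builds the request URL with `urllib.parse.urlencode` and enforces the ranges listed in the table above before anything is sent.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

INGEST_URL_ENDPOINT = "https://memoryapi.tensorheart.com/v1/ingest/url"


def build_ingest_url(url: str, chunk_size: int = 500, overlap: int = 50,
                     space_id: Optional[str] = None) -> str:
    """Return the full POST target, validating the documented parameter ranges."""
    if not 50 <= chunk_size <= 5000:
        raise ValueError("chunk_size must be between 50 and 5000")
    if not 0 <= overlap <= 1000:
        raise ValueError("overlap must be between 0 and 1000")
    params: Dict[str, object] = {"url": url, "chunk_size": chunk_size, "overlap": overlap}
    if space_id is not None:
        params["space_id"] = space_id
    # urlencode percent-encodes the target URL, including its own ? and &.
    return f"{INGEST_URL_ENDPOINT}?{urlencode(params)}"
```

For the example above, `build_ingest_url("https://example.com/article", space_id="research")` returns the endpoint followed by `?url=https%3A%2F%2Fexample.com%2Farticle&chunk_size=500&overlap=50&space_id=research`.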

Get Document Status

Retrieve the processing status of a document, using the document_id returned when it was ingested.

GET /v1/ingest/{document_id}

Example

curl https://memoryapi.tensorheart.com/v1/ingest/doc_abc123 \
-H "Authorization: Bearer $API_KEY"
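If ingestion is still running, a client can poll the status endpoint until it finishes. This sketch assumes the status response carries the same `data.status` field as the ingest response. Only `"completed"` appears on this page, so treating `"failed"` as terminal, and `"processing"` as the in-progress value used in testing, are assumptions. `fetch_status` is a hypothetical caller-supplied function that performs the GET above and returns the `data` object, which keeps the loop independent of any HTTP library.

```python
import time
from typing import Any, Callable, Dict

# "completed" is documented; "failed" is an assumed terminal state.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(fetch_status: Callable[[], Dict[str, Any]],
                      interval: float = 2.0, timeout: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll fetch_status until it reports a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data.get("status") in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {data.get('status')!r} after {timeout}s")
        sleep(interval)  # injectable so tests can skip real waiting
```

A fixed interval keeps the sketch simple. For long documents, an exponential backoff would reduce the number of requests.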

List Documents

GET /v1/ingest

Lists all ingested documents with their processing status.
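Both read endpoints on this page are authenticated GETs under `/v1/ingest`. This hypothetical helper builds either request with the standard library. The shape of the list response is not documented on this page, so the sketch stops at building the request. Pass the result to `urllib.request.urlopen` and `json.load` the response to inspect it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://memoryapi.tensorheart.com/v1/ingest"


def ingest_get_request(api_key: str, document_id: str = "") -> urllib.request.Request:
    """Build GET /v1/ingest (list) or GET /v1/ingest/{document_id} (status)."""
    url = BASE_URL
    if document_id:
        # Quote the ID so unexpected characters cannot alter the path.
        url += "/" + urllib.parse.quote(document_id, safe="")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```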

Processing Pipeline

Upload → Extract Text → Chunk Content → Create Memories → Index for Search

Each chunk becomes a searchable memory, so a document's content can be queried as soon as processing completes.