Document Ingestion API

Import documents and convert them to searchable memories.

Overview

Document Ingestion processes files and web pages and stores the extracted content as memories:

  • PDFs - Text extraction with optional OCR
  • Images - Visual content analysis
  • Audio - Automatic transcription
  • URLs - Web page content extraction

Ingest File

Upload and process a file.

POST /v1/ingest/file

Form Data

Field       Type    Required  Description
file        file    Yes       File to ingest
chunk_size  int     No        Tokens per chunk (default: 500)
overlap     int     No        Chunk overlap (default: 50)
space_id    string  No        Store in this space
ocr         bool    No        Enable OCR (default: true)

Supported Formats

  • Documents: PDF
  • Images: PNG, JPG, GIF, WebP
  • Audio: MP3, WAV, M4A
  • Text: TXT, MD, CSV

Example

curl -X POST https://api.memory.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "space_id=documents" \
  -F "chunk_size=500"
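The same upload can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper name build_ingest_file_request is hypothetical, the base URL and field names are taken from the curl example and the Form Data table, and the function only builds the request without sending it.

```python
import io
import urllib.request
import uuid

API_BASE = "https://api.memory.tensorheart.com"  # base URL from the curl example


def build_ingest_file_request(api_key, filename, file_bytes, fields=None):
    """Build (but do not send) a multipart/form-data POST for /v1/ingest/file.

    Hypothetical helper for illustration. Field names and filenames are
    written as-is, so they must not contain quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Optional form fields such as space_id, chunk_size, overlap, ocr.
    # Values are sent in their str() form, so pass "false" rather than False.
    for name, value in (fields or {}).items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())

    # The file part; the field name "file" comes from the Form Data table.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(file_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{API_BASE}/v1/ingest/file",
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the JSON body from the response.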

Response

{
  "success": true,
  "data": {
    "document_id": "doc_abc123",
    "status": "completed",
    "memories_created": 15,
    "chunks_created": 15,
    "tokens_extracted": 7500
  }
}
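A response like the one above could be unpacked as follows. This is a sketch under assumptions: IngestResult and parse_ingest_response are hypothetical names, the fields are exactly those shown in the example, and the shape of an error response is not documented here, so a failed call is simply raised with the whole payload.

```python
import json
from dataclasses import dataclass


@dataclass
class IngestResult:
    """Fields of the "data" object shown in the example response."""
    document_id: str
    status: str
    memories_created: int
    chunks_created: int
    tokens_extracted: int


def parse_ingest_response(raw: str) -> IngestResult:
    """Parse the JSON body of an ingest response (hypothetical helper)."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # The error payload shape is an unknown, so surface it unchanged.
        raise RuntimeError(f"ingest failed: {payload}")
    data = payload["data"]
    return IngestResult(
        document_id=data["document_id"],
        status=data["status"],
        memories_created=data["memories_created"],
        chunks_created=data["chunks_created"],
        tokens_extracted=data["tokens_extracted"],
    )
```

Keep the returned document_id: it is the value used in the path of GET /v1/ingest/{document_id}.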

Ingest URL

Extract content from a web page.

POST /v1/ingest/url

Request Body

Field       Type    Required  Description
url         string  Yes       URL to ingest
chunk_size  int     No        Tokens per chunk (default: 500)
space_id    string  No        Store in this space

Example

curl -X POST https://api.memory.tensorheart.com/v1/ingest/url \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "space_id": "research"
  }'
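A Python equivalent of this call might look like the sketch below. The helper name build_ingest_url_request is hypothetical; the endpoint, headers, and body fields are the ones documented above, and optional fields are left out of the body when not given so the server defaults apply. The function builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.memory.tensorheart.com"  # base URL from the curl example


def build_ingest_url_request(api_key, url, space_id=None, chunk_size=None):
    """Build (but do not send) a JSON POST for /v1/ingest/url."""
    body = {"url": url}
    # Only include optional fields that were explicitly provided.
    if space_id is not None:
        body["space_id"] = space_id
    if chunk_size is not None:
        body["chunk_size"] = chunk_size

    return urllib.request.Request(
        f"{API_BASE}/v1/ingest/url",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends the request.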

Get Document Status

Check the processing status of an ingested document.

GET /v1/ingest/{document_id}

Example

curl https://api.memory.tensorheart.com/v1/ingest/doc_abc123 \
  -H "Authorization: Bearer $API_KEY"
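For long-running ingests, a client may want to poll this endpoint. The sketch below rests on assumptions that this page does not confirm: that the status response uses the same "success"/"data" envelope as the ingest response, and that "completed" is the terminal success value (the only status shown here). The function names, the polling interval, and the attempt limit are all illustrative.

```python
import json
import time
import urllib.request

API_BASE = "https://api.memory.tensorheart.com"  # base URL from the curl example


def fetch_status(api_key, document_id):
    """GET /v1/ingest/{document_id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/ingest/{document_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_completed(fetch, document_id, interval=2.0, max_attempts=30,
                         sleep=time.sleep):
    """Call fetch(document_id) until its status is "completed".

    Assumes the payload looks like {"data": {"status": ...}}. Other status
    values are not documented, so anything else is treated as "not done yet".
    """
    for _ in range(max_attempts):
        payload = fetch(document_id)
        if payload["data"]["status"] == "completed":
            return payload["data"]
        sleep(interval)
    raise TimeoutError(f"{document_id} not completed after {max_attempts} attempts")
```

In real use, fetch could be lambda doc_id: fetch_status(api_key, doc_id). Injecting it as a parameter keeps the polling logic testable without network access.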

List Documents

List all ingested documents with their processing status.

GET /v1/ingest

Processing Pipeline

Upload → Extract Text → Chunk Content → Create Memories → Index for Search

Each chunk becomes one searchable memory, so a document can be queried as soon as processing completes.
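To make the chunk_size and overlap parameters concrete, here is a generic sliding-window chunker. It is not the service's implementation: the tokenizer and the exact way overlap is applied are not specified on this page, and chunk_tokens is a hypothetical name. With the defaults this sketch would produce 17 chunks from 7,500 tokens, whereas the example response reports 15, so do not use it to predict chunks_created.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share overlap tokens.

    Generic sliding-window sketch; the service's real chunking may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# 1,200 stand-in tokens with the default settings.
chunks = chunk_tokens(list(range(1200)))
print([len(c) for c in chunks])  # prints [500, 500, 300]
```

Each chunk begins with the last 50 tokens of the previous one, which keeps sentences that straddle a boundary retrievable from either side.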