Memory Extraction

Automatically extract and store memories from text content, files, and URLs.

Overview

The Memory API provides three approaches for extracting memories:

  1. Text Extraction - Extract memories from text content (conversations, documents, notes)
  2. File Ingestion - Upload and process files directly (PDFs, images, audio, etc.)
  3. URL Ingestion - Ingest content directly from web pages and PDF links

Text Extraction

Extract memories from text content using the extraction endpoint.

Basic Usage

curl -X POST https://memoryapi.tensorheart.com/v1/query/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User: Hi, I am Sarah and I work at Netflix as a product manager.",
    "content_type": "conversation"
  }'

Extracted memories:

  • "User's name is Sarah"
  • "User works at Netflix"
  • "User is a product manager"

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| content | string | Yes | The text content to extract memories from (max 500,000 characters) |
| content_type | string | No | Type of content: conversation, document, or notes (default: conversation) |
| metadata | object | No | Custom metadata to attach to all extracted memories |
| space_id | string | No | Memory space to store extracted memories in |

Content Types

| Type | Best For |
| --- | --- |
| conversation | Chat logs, transcripts, dialogue |
| document | Articles, reports, long-form text |
| notes | Meeting notes, summaries, bullet points |

Adding Metadata

Tag extracted memories with source information for better organization:

{
  "content": "Meeting with client discussed Q4 goals...",
  "content_type": "notes",
  "metadata": {
    "source": "client_meeting",
    "date": "2024-01-15",
    "session_id": "abc123"
  }
}
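The request body above can be assembled and validated client-side before sending. The following Python sketch shows one way to do that; `build_extract_payload` is a hypothetical helper (not part of the API), while the field names, content types, and the 500,000-character limit come from the tables above.

```python
# Hypothetical helper that builds and validates the JSON body for
# POST /v1/query/extract. Field names and limits come from the docs;
# the helper itself is illustrative, not part of the API.

MAX_CONTENT_CHARS = 500_000
CONTENT_TYPES = {"conversation", "document", "notes"}

def build_extract_payload(content, content_type="conversation",
                          metadata=None, space_id=None):
    """Return a request body for the extraction endpoint."""
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds the 500,000 character limit")
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type: {content_type}")
    payload = {"content": content, "content_type": content_type}
    if metadata is not None:
        payload["metadata"] = metadata  # attached to all extracted memories
    if space_id is not None:
        payload["space_id"] = space_id
    return payload

payload = build_extract_payload(
    "Meeting with client discussed Q4 goals...",
    content_type="notes",
    metadata={"source": "client_meeting", "date": "2024-01-15"},
)
```

Optional fields are omitted entirely when unset, so the server's defaults (such as `content_type: conversation`) apply only when you don't override them.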

File Ingestion

Upload files directly for processing into memories. The API automatically extracts text content and creates searchable memories.

Supported File Formats

| Format | Extensions | Description |
| --- | --- | --- |
| PDF | .pdf | Text extraction with optional OCR for scanned pages |
| Images | .png, .jpg, .jpeg, .gif, .webp | Text and content extraction from images |
| Audio | .mp3, .wav, .m4a, .ogg | Speech-to-text transcription |
| Plain Text | .txt, .md, .csv | Direct text processing |

Upload a File

curl -X POST https://memoryapi.tensorheart.com/v1/ingest/file \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "chunk_size=500" \
  -F "overlap=50" \
  -F "ocr=true"

File Upload Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file | file | Required | The file to upload and process |
| chunk_size | integer | 500 | Target tokens per chunk |
| overlap | integer | 50 | Overlap tokens between chunks |
| ocr | boolean | true | Enable OCR for scanned PDF pages |
| space_id | string | null | Memory space to store memories in |
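To see how `chunk_size` and `overlap` interact, here is a minimal sliding-window sketch. The real service tokenizes server-side with its own tokenizer; this approximates tokens with list items purely to show the windowing behavior, and `chunk_tokens` is an illustrative function, not an API call.

```python
# Illustrative sketch of chunk_size / overlap semantics. The actual
# server-side chunking may differ; this shows the sliding window only.

def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split tokens into windows of chunk_size that share
    `overlap` tokens with the previous window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end
    return chunks

tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=500, overlap=50)
# 1200 tokens with step 450 -> windows starting at 0, 450, 900
```

Larger `chunk_size` keeps more context in each memory; larger `overlap` reduces the chance of a fact being split across a chunk boundary, at the cost of some duplication.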

Response

{
  "success": true,
  "data": {
    "document_id": "doc_abc123def456",
    "status": "completed",
    "memories_created": 12,
    "chunks_created": 12,
    "tokens_extracted": 5840
  }
}
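For larger files, processing may not complete within the upload request, so a common client pattern is to poll the document status until it settles. The sketch below assumes `fetch_status` is a caller-supplied function wrapping GET /v1/ingest/{document_id}; injecting it keeps the control flow testable without a live API key, and `wait_for_document` itself is hypothetical, not part of the API.

```python
import time

# Hypothetical polling loop around the document-status endpoint.
# `fetch_status` stands in for an authenticated HTTP GET.

def wait_for_document(fetch_status, document_id,
                      poll_interval=0.0, max_polls=30):
    """Poll until the document reaches a terminal status."""
    for _ in range(max_polls):
        data = fetch_status(document_id)
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(poll_interval)
    raise TimeoutError(f"document {document_id} did not finish")

# Simulated responses standing in for the live endpoint:
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "memories_created": 12},
])
result = wait_for_document(lambda _id: next(responses), "doc_abc123def456")
```

In production, replace the lambda with a real HTTP call and use a non-zero `poll_interval` (with backoff) to avoid hammering the endpoint.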

URL Ingestion

Ingest content directly from a URL. Supports web pages and PDF links.

Ingest from URL

curl -X POST https://memoryapi.tensorheart.com/v1/ingest/url \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "chunk_size": 500,
    "overlap": 50
  }'

URL Support

| Type | Description |
| --- | --- |
| Web pages | Extracts main article/content from HTML pages |
| PDF URLs | Downloads and processes as PDF |

Managing Documents

Get Document Status

curl https://memoryapi.tensorheart.com/v1/ingest/{document_id} \
  -H "Authorization: Bearer $API_KEY"

List Documents

curl "https://memoryapi.tensorheart.com/v1/ingest?limit=50&offset=0" \
  -H "Authorization: Bearer $API_KEY"
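The `limit` and `offset` parameters page through results, and a client typically loops until it receives a short page. In the sketch below, `fetch_page` is a caller-supplied function standing in for the authenticated GET above, so the pagination loop can be shown (and tested) without network access; `list_all_documents` is an illustrative helper, not part of the API.

```python
# Sketch of paging through GET /v1/ingest with limit/offset.

def list_all_documents(fetch_page, limit=50):
    """Collect every document by advancing offset until a short page."""
    documents, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        documents.extend(page)
        if len(page) < limit:  # a short page means we reached the end
            break
        offset += limit
    return documents

# Simulated backend holding 120 documents:
backend = [{"document_id": f"doc_{i}"} for i in range(120)]
docs = list_all_documents(
    lambda limit, offset: backend[offset:offset + limit]
)
```

Note this stop condition assumes the endpoint returns fewer than `limit` items only on the final page; if the response also carries a total count, comparing against that is more robust.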

Get Document Chunks

View individual chunks from a processed document:

curl https://memoryapi.tensorheart.com/v1/ingest/{document_id}/chunks \
  -H "Authorization: Bearer $API_KEY"

Best Practices

  1. Choose the right method - Use text extraction for structured text, file ingestion for documents
  2. Clean input - Remove irrelevant content before text extraction
  3. Use metadata - Tag memories with source information for better retrieval
  4. Tune chunk size - Larger chunks preserve more context, smaller chunks enable finer retrieval
  5. Enable OCR - For scanned PDFs, keep OCR enabled to extract text from images
  6. Batch wisely - Extract from complete conversations or documents, not fragments
  7. Use spaces - Organize memories into spaces for different projects or contexts
  8. Review results - Periodically audit extracted memories for quality