What can you do with it?
The /ocr command enables you to extract text from images, PDFs, and scanned documents using optical character recognition. You can convert photos of text into editable content, digitize printed documents, extract text from PDF files, extract form data from scanned forms, and process images or PDFs containing text in various formats.
How to use it?
Basic Command Structure
Parameters
Required:
- file: The image, PDF, or scanned document to process (URL, uploaded file, or artifact)
Optional:
- languageHints: Array of language codes for better OCR accuracy (e.g., ["en", "es"])
- extractTextOnly: When true (default), returns only the extracted text. When false, includes detailed OCR data such as bounding boxes, word positions, page structure, and text annotations for precise layout analysis
- collectionId: Specific collection ID to store results. If not provided, the default MultimediaArtifact collection is used
- async: When true, processes files asynchronously in the background. Automatically enabled for large files
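The parameters above can be collected into a single mapping before they are passed to the command. The following is a minimal sketch in Python, not the command's actual client: the function name `build_ocr_params` and the idea of sending the parameters as a dictionary are assumptions made for illustration. Only the parameter names and the `extractTextOnly` default come from the list above.

```python
from __future__ import annotations

from typing import Any, Optional


def build_ocr_params(
    file: str,
    language_hints: Optional[list[str]] = None,
    extract_text_only: bool = True,
    collection_id: Optional[str] = None,
    run_async: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble a parameter mapping using the documented parameter names.

    Hypothetical helper: how the mapping is actually submitted is not
    specified in this document.
    """
    if not file:
        raise ValueError("file is required")

    params: dict[str, Any] = {
        "file": file,
        # Documented default: true returns only the extracted text.
        "extractTextOnly": extract_text_only,
    }
    # Optional values are omitted when unset so the service defaults apply.
    if language_hints:
        params["languageHints"] = list(language_hints)
    if collection_id is not None:
        params["collectionId"] = collection_id
    if run_async is not None:
        params["async"] = run_async
    return params
```

Leaving `async` out unless it is set explicitly reflects the note that large files are switched to asynchronous processing automatically.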
Response Format
Basic Response (extractTextOnly: true, synchronous):
When a file is processed asynchronously, signedUrl is available immediately and points to a placeholder file that is overwritten with the OCR results when processing completes. The status can be "queued", "processing", "completed", or "failed". Large files are automatically processed asynchronously regardless of the async parameter setting.
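Because an asynchronous result moves through the statuses listed above, a caller would typically wait until the status is "completed" or "failed" before reading the file behind signedUrl. The sketch below shows one way to do that in Python. How the status is actually retrieved is not described here, so `get_status` is a hypothetical callable supplied by the caller, and the function name, polling interval, and attempt limit are likewise assumptions.

```python
import time
from typing import Callable

# Status values as listed in this document.
TERMINAL_STATUSES = {"completed", "failed"}
KNOWN_STATUSES = {"queued", "processing"} | TERMINAL_STATUSES


def wait_for_ocr(
    get_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll until the job reaches a terminal status and return that status.

    get_status is a hypothetical callable that returns the current status
    string; this document does not specify how to obtain it.
    """
    for attempt in range(max_attempts):
        status = get_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError("OCR job did not finish within the polling budget")
```

The loop can be tried without a real job by feeding it a canned sequence, for example `statuses = iter(["queued", "processing", "completed"])` followed by `wait_for_ocr(lambda: next(statuses), interval_seconds=0)`.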
Examples
Basic Usage
Advanced Usage
Specific Use Case
Notes
Supported File Formats:
- Image files: JPEG/JPG (.jpg, .jpeg), PNG (.png), GIF (.gif), BMP (.bmp), TIFF (.tiff, .tif), WebP (.webp), SVG (.svg)
- PDF files: Single and multi-page PDF documents (.pdf)
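A file name can be checked against the supported formats before it is submitted. This is a small illustrative sketch in Python: the extension sets are copied from the list above, while the function name `classify_ocr_input` is an assumption and the check looks only at the extension of a plain file name, not at the file's contents or at URLs with query strings.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the supported formats listed in this document.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif", ".webp", ".svg",
}
PDF_EXTENSIONS = {".pdf"}


def classify_ocr_input(filename: str) -> Optional[str]:
    """Return "image", "pdf", or None based on the file extension alone."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    return None
```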