What can you do with it?

The /ocr command extracts text from images, PDFs, and scanned documents using optical character recognition (OCR). Use it to convert photos of text into editable content, digitize printed documents, pull form data from scanned forms, and extract text from single-page or multi-page PDFs.

How to use it?

Basic Command Structure

/ocr [file] [options]

Parameters

Required:
  • file - The image, PDF, or scanned document to process (URL, uploaded file, or artifact)
Optional:
  • languageHints - Array of language codes that improves OCR accuracy (e.g., ["en", "es"])
  • extractTextOnly - When true (default), returns only the extracted text. When false, also returns detailed OCR data such as bounding boxes, word positions, page structure, and text annotations for precise layout analysis
  • collectionId - ID of the collection in which to store results. If not provided, the default MultimediaArtifact collection is used
  • async - When true, processes files asynchronously in the background. Automatically enabled for large files
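
The parameter rules above can be sketched as a small helper that assembles a parameter set before the command is issued. This is only an illustration: build_ocr_params is a hypothetical name, not part of the /ocr command, and the sketch assumes that async defaults to false when omitted, which the parameter list does not state explicitly.

```python
from typing import Any, Optional, Sequence


def build_ocr_params(
    file: str,
    language_hints: Optional[Sequence[str]] = None,
    extract_text_only: bool = True,
    collection_id: Optional[str] = None,
    run_async: bool = False,  # assumed default; not stated in the parameter list
) -> dict[str, Any]:
    """Assemble /ocr parameters, omitting optional ones left at their defaults."""
    if not file:
        raise ValueError("file is required")
    params: dict[str, Any] = {"file": file}
    if language_hints:
        params["languageHints"] = list(language_hints)
    if not extract_text_only:
        params["extractTextOnly"] = False
    if collection_id is not None:
        params["collectionId"] = collection_id
    if run_async:
        params["async"] = True
    return params
```

Leaving out parameters that match their defaults keeps the request small and lets the service decide on its own whether a large file should be processed asynchronously.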

Response Format

Basic Response (extractTextOnly: true, synchronous):
{
  "success": true,
  "async": false,
  "collectionId": "collection-id-123_artifact",
  "capability": "ocr",
  "totalFiles": 1,
  "successfulFiles": 1,
  "failedFiles": 0,
  "totalOcrPages": 3,
  "results": [
    {
      "fileUrl": "https://example.com/document.pdf",
      "inputFileName": "document.pdf",
      "inputMimeType": "application/pdf",
      "outputFileName": "ocr_document.json",
      "outputMimeType": "application/json",
      "extractedText": "Sample extracted text from the document...",
      "pageResults": [
        {
          "page": 1,
          "extractedText": "Page 1 content here..."
        },
        {
          "page": 2,
          "extractedText": "Page 2 content here..."
        },
        {
          "page": 3,
          "extractedText": "Page 3 content here..."
        }
      ],
      "fileId": "fileId123",
      "signedUrl": "https://skills.pinkfish.ai/files/collection-id-123/ocr_document.json?signed-key=dummy-key",
      "ocrPages": 3,
      "totalPages": 3,
      "successRate": "3/3"
    }
  ]
}
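
A basic response like the one above could be consumed roughly as follows. This is a minimal sketch that assumes the response has already been parsed into a Python dict; summarize_ocr_response is a hypothetical helper name, and it relies only on the fields shown in the sample.

```python
from typing import Any


def summarize_ocr_response(response: dict[str, Any]) -> dict[str, Any]:
    """Collect per-file page text and flag files whose pages were not all read."""
    if not response.get("success"):
        raise RuntimeError("OCR request did not succeed")
    summary: dict[str, Any] = {}
    for result in response.get("results", []):
        # Sort defensively so page text comes back in reading order.
        pages = sorted(result.get("pageResults", []), key=lambda p: p["page"])
        summary[result["inputFileName"]] = {
            "pages": [p["extractedText"] for p in pages],
            # ocrPages == totalPages means every page produced OCR output.
            "complete": result.get("ocrPages") == result.get("totalPages"),
        }
    return summary
```

Comparing ocrPages with totalPages is one way to detect partially processed files without parsing the successRate string.
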
Detailed Response (extractTextOnly: false, synchronous):
{
  "success": true,
  "capability": "ocr",
  "totalFiles": 1,
  "successfulFiles": 1,
  "failedFiles": 0,
  "totalOcrPages": 1,
  "results": [
    {
      "fileUrl": "file_url_here",
      "inputFileName": "filename.ext",
      "inputMimeType": "file/type",
      "outputFileName": "processed_filename.txt",
      "outputMimeType": "text/plain",
      "extractedText": "text content here",
      "fileId": "storage_file_id",
      "signedUrl": "https://...",
      "ocrPages": 1,
      "totalPages": 1,
      "successRate": "1/1",
      "detailedResults": {
        "textAnnotations": [
          {
            "description": "text",
            "boundingPoly": { "vertices": [...] },
            "locale": "en"
          }
        ],
        "fullTextAnnotation": {
          "text": "full text",
          "pages": [
            {
              "width": 1024,
              "height": 768,
              "blocks": [...]
            }
          ]
        }
      }
    }
  ]
}
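
The detailed response can feed custom layout logic. The sketch below reduces each text annotation to an axis-aligned bounding box. The sample above elides the contents of vertices, so the shape used here (a list of objects with integer x and y keys, with a missing key treated as 0) is an assumption, and annotation_boxes is a hypothetical helper name.

```python
from typing import Any

Box = tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


def annotation_boxes(result: dict[str, Any]) -> list[tuple[str, Box]]:
    """Return (text, bounding box) pairs for one entry of the results array."""
    boxes: list[tuple[str, Box]] = []
    annotations = result.get("detailedResults", {}).get("textAnnotations", [])
    for annotation in annotations:
        # Assumed shape: vertices is a list of {"x": int, "y": int} dicts.
        vertices = annotation.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            continue  # skip annotations without position data
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        box = (min(xs), min(ys), max(xs), max(ys))
        boxes.append((annotation.get("description", ""), box))
    return boxes
```

Axis-aligned boxes are a simplification: rotated text keeps its four corner points in vertices, so code that needs exact outlines should use the vertices directly.
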
Async Response (async: true, or the file is large):
{
  "success": true,
  "capability": "ocr",
  "async": true,
  "collectionId": "collection-id-123_artifact",
  "responseId": "ocr-processing-queue-1757091370603-abc123",
  "status": "queued",
  "createdAt": "2025-01-06T16:56:10.604Z",
  "results": [
    {
      "inputFileName": "document.pdf",
      "outputFileName": "ocr_document.json",
      "fileId": "fileId123",
      "signedUrl": "https://skills.pinkfish.ai/files/collection-id-123/ocr_document.json?signed-key=dummy-key"
    }
  ],
  "totalPages": 3,
  "totalImages": 0,
  "forceAsync": false,
  "message": "OCR processing started in background. Results will be saved to file storage when complete."
}
Note: The signedUrl is available immediately and initially points to a placeholder file, which is overwritten with the OCR results when processing completes. The status is one of "queued", "processing", "completed", or "failed". Large files are always processed asynchronously, regardless of the async parameter setting.
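
One way to wait for an asynchronous job is to poll until a terminal status appears. The sketch below is deliberately transport-agnostic: the caller supplies a fetch function that returns the current job state as a dict. It assumes that state carries a status field with the values listed above; where the status can be read once the job is queued (for example, from the file behind signedUrl) is not specified here, so treat that as an assumption. wait_for_ocr is a hypothetical helper name.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_ocr(
    fetch: Callable[[], dict[str, Any]],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until the job reports a terminal status, then return it."""
    for attempt in range(max_attempts):
        payload = fetch()
        # Assumed: the payload exposes the job status under a "status" key.
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"OCR job still pending after {max_attempts} attempts")
```

Passing fetch and sleep as arguments keeps the loop independent of any HTTP client and makes it straightforward to exercise with fake job states.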

Examples

Basic Usage

/ocr
file: receipt.jpg

Extracts all text from an image of a receipt.

/ocr
file: document.pdf

Extracts all text from a PDF document.

Advanced Usage

/ocr
file: application-form.png
languageHints: ["en"]
extractTextOnly: false

Processes a scanned form with an English language hint and returns detailed annotations.

/ocr
file: contract.pdf
languageHints: ["en", "es"]

Processes a multi-page PDF contract with language hints for English and Spanish text.

/ocr
file: large-report.pdf
async: true
collectionId: "my-collection"

Processes a large PDF report asynchronously and stores the results in a specific collection.

Specific Use Case

/ocr
file: business-card.jpg
extractTextOnly: false

Extracts text from a business card image along with detailed bounding box information, word positions, and page structure, which is useful for building custom layout analysis or data extraction logic.

/ocr
file: invoice.pdf
languageHints: ["en"]

Extracts text from a PDF invoice with an English language hint.

Notes

Supported File Formats:
  • Image files: JPEG/JPG (.jpg, .jpeg), PNG (.png), GIF (.gif), BMP (.bmp), TIFF (.tiff, .tif), WebP (.webp), SVG (.svg)
  • PDF files: Single and multi-page PDF documents (.pdf)
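
The format list above can be turned into a quick pre-check before a file is submitted. This is a minimal sketch: is_supported is a hypothetical helper, it judges a file by its extension only, and it expects a plain file name or path rather than a URL with a query string.

```python
from pathlib import PurePosixPath

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = frozenset(
    {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif", ".webp", ".svg", ".pdf"}
)


def is_supported(file_name: str) -> bool:
    """Return True when the file extension appears in the supported list."""
    return PurePosixPath(file_name).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check only filters obvious mismatches; the file content may still be rejected during processing.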