Server path: /docprocess | Type: Embedded | PCID required: No

Document processing tools: convert between formats (CSV, XLSX, HTML, PDF, Markdown, Word, XML, JSON), extract text (PDF, DOCX, PPTX), OCR images and PDFs, fill PDF forms, fill Word templates, create Word documents, validate CSVs, extract invoice line items with AI, and process Word documents with AI.

Tools

| Tool | Description |
|---|---|
| docprocess_csv_to_xlsx | Convert CSV files to Excel |
| docprocess_xlsx_to_csv | Convert Excel files to CSV |
| docprocess_html_to_pdf | Convert HTML to PDF |
| docprocess_md_to_docx | Convert Markdown to Word |
| docprocess_md_to_pdf | Convert Markdown to PDF |
| docprocess_xml_to_json | Convert XML to JSON |
| docprocess_docx_to_txt | Extract text from Word documents |
| docprocess_pdf_to_txt | Extract text from PDF files |
| docprocess_pptx_to_txt | Extract text from PowerPoint files |
| docprocess_ocr | Extract text from images/PDFs with OCR |
| docprocess_ocr_poll | Poll OCR job status |
| docprocess_fill_pdf | Fill a PDF form with data |
| docprocess_fill_word_tpl | Fill a Word template with data |
| docprocess_create_word | Create a Word document from a JSON spec |
| docprocess_word_ai | Edit Word documents with AI |
| docprocess_word_ai_poll | Poll Word AI processing status |
| docprocess_validate_csv | Validate CSV structure and data quality |
| docprocess_invoice_extract | Extract line items from invoices (PDF/image) with AI |
| docprocess_invoice_extract_poll | Poll invoice extraction job status |

Conversion tools — common pattern

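The nine conversion and text-extraction tools below share the same base parameters and response structure; docprocess_md_to_pdf and docprocess_xml_to_json add a few tool-specific parameters, listed in their own sections. Common parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | | URLs of the source files to convert |
| file_links_expire_in_days | number | No | 7 | Days until output file links expire (1–30) |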
Common response fields:

| Field | Type | Description |
|---|---|---|
| message | string | Summary message |
| files | object[] | Array of conversion results |
| files[].file_url | string | URL of the original file |
| files[].file_size | number | Size of the original file in bytes |
| files[].mime_type | string | MIME type of the original file |
| files[].filename | string | Name of the original file |
| files[].converted.file_url | string | URL of the converted file |
| files[].converted.file_size | number | Size of the converted file in bytes |
| files[].converted.mime_type | string | MIME type of the converted file |
| files[].converted.filename | string | Name of the converted file |
| total | number | Total number of files processed |
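A minimal Python sketch of the common pattern: building the shared arguments (with the documented 1–30 day range checked client-side) and pulling converted file URLs out of a response. The helper names and the sample response are illustrative, not part of the tool API.

```python
from typing import Any


def build_conversion_args(file_urls: list[str], expire_days: int = 7) -> dict[str, Any]:
    """Build the common argument object shared by the conversion tools."""
    if not file_urls:
        raise ValueError("file_urls must contain at least one URL")
    # Documented range for file_links_expire_in_days is 1 to 30.
    if not 1 <= expire_days <= 30:
        raise ValueError("file_links_expire_in_days must be between 1 and 30")
    return {"file_urls": list(file_urls), "file_links_expire_in_days": expire_days}


def converted_urls(response: dict[str, Any]) -> list[tuple[str, str]]:
    """List (original filename, converted file URL) pairs from a response."""
    # A list rather than a dict, so repeated filenames are kept; how multi-sheet
    # output from docprocess_xlsx_to_csv is represented is not documented here.
    return [
        (entry["filename"], entry["converted"]["file_url"])
        for entry in response.get("files", [])
    ]


# Made-up response shaped like the common response fields.
example = {
    "message": "Converted 1 file",
    "total": 1,
    "files": [
        {
            "file_url": "https://example.com/report.csv",
            "filename": "report.csv",
            "converted": {
                "file_url": "https://example.com/out/report.xlsx",
                "filename": "report.xlsx",
            },
        }
    ],
}

for original, url in converted_urls(example):
    print(f"{original} -> {url}")
```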

docprocess_csv_to_xlsx

Convert one or more CSV files to Excel format. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_xlsx_to_csv

Convert Excel files to CSV. Multi-sheet workbooks produce a separate CSV per sheet. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_html_to_pdf

Convert HTML files to PDF. URLs must point directly to .html files. Uses the common parameters and common response fields described under the conversion tools common pattern above.
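Because docprocess_html_to_pdf expects direct .html links, a client-side pre-check can catch directory-style or page-route URLs before the call. This sketch assumes the rule is a plain suffix match on the URL path (query strings ignored); the service's own check may be stricter or looser, and .htm links are rejected here only to match the wording above.

```python
from urllib.parse import urlparse


def looks_like_html_url(url: str) -> bool:
    """Return True if the URL path ends in .html (query strings are ignored)."""
    path = urlparse(url).path
    return path.lower().endswith(".html")


urls = [
    "https://example.com/pages/invoice.html",
    "https://example.com/pages/",  # directory-style URL, not a .html file
]
bad = [u for u in urls if not looks_like_html_url(u)]
if bad:
    print("Not direct .html links:", bad)
```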

docprocess_md_to_docx

Convert Markdown files to Word documents. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_md_to_pdf

Convert Markdown files to PDF with configurable page size and orientation. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | | URLs of the Markdown files to convert |
| file_links_expire_in_days | number | No | 7 | Days until output file links expire (1–30) |
| pdf_format | string | No | "a4" | Page size: "a4" or "letter" |
| pdf_orientation | string | No | "portrait" | Page orientation: "portrait" or "landscape" |

Response fields: the common response fields (see the conversion tools common pattern above).
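A sketch of building docprocess_md_to_pdf arguments with the documented page sizes and orientations validated up front; the function name is hypothetical, and the lowercase enum values follow the table above.

```python
from typing import Any

PDF_FORMATS = {"a4", "letter"}
PDF_ORIENTATIONS = {"portrait", "landscape"}


def build_md_to_pdf_args(
    file_urls: list[str],
    pdf_format: str = "a4",
    pdf_orientation: str = "portrait",
    expire_days: int = 7,
) -> dict[str, Any]:
    """Build docprocess_md_to_pdf arguments, checking the documented enum values."""
    if pdf_format not in PDF_FORMATS:
        raise ValueError(f"pdf_format must be one of {sorted(PDF_FORMATS)}")
    if pdf_orientation not in PDF_ORIENTATIONS:
        raise ValueError(f"pdf_orientation must be one of {sorted(PDF_ORIENTATIONS)}")
    if not 1 <= expire_days <= 30:
        raise ValueError("file_links_expire_in_days must be between 1 and 30")
    return {
        "file_urls": list(file_urls),
        "file_links_expire_in_days": expire_days,
        "pdf_format": pdf_format,
        "pdf_orientation": pdf_orientation,
    }


args = build_md_to_pdf_args(
    ["https://example.com/notes.md"], pdf_format="letter", pdf_orientation="landscape"
)
```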

docprocess_xml_to_json

Convert XML files to JSON. Can return the data inline or store it as a file. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | | URLs of the XML files to convert |
| file_links_expire_in_days | number | No | 7 | Days until output file links expire (1–30) |
| store_xml_json | boolean | No | false | true to store the JSON as a file, false to return data inline |

Response fields: the common response fields (see the conversion tools common pattern above).

docprocess_docx_to_txt

Extract plain text from Word documents. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_pdf_to_txt

Extract plain text from PDF files. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_pptx_to_txt

Extract plain text from PowerPoint files. Uses the common parameters and common response fields described under the conversion tools common pattern above.

docprocess_ocr

Extract text from images or PDFs using OCR. Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG, and PDF. Can run synchronously or asynchronously. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fileUrls | string[] | Yes | | URLs of images or PDFs to OCR |
| languageHints | string[] | No | | Language hints for OCR (e.g. ["en", "es"]) |
| extractTextOnly | boolean | No | true | Extract plain text only |
| collectionId | string | No | | File storage collection ID for output files |
| async | boolean | No | false | Run asynchronously; poll with docprocess_ocr_poll |

Response fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| async | boolean | Whether the job is running asynchronously |
| collectionId | string | Collection ID (if provided) |
| capability | string | OCR capability used |
| totalFiles | number | Total files submitted |
| successfulFiles | number | Files processed successfully |
| failedFiles | number | Files that failed |
| totalOcrPages | number | Total pages processed |
| results | object[] | Array of per-file results |
| results[].fileUrl | string | URL of the input file |
| results[].inputFileName | string | Original file name |
| results[].outputFileName | string | Output file name |
| results[].outputMimeType | string | MIME type of the output |
| results[].extractedText | string | Extracted text content |
| results[].fileId | string | File ID in storage |
| results[].signedUrl | string | Signed URL for the output file |
| status | string | Job status |
| responseId | string | ID for polling async jobs |
| message | string | Status message |
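A sketch of preparing a docprocess_ocr call and reading a synchronous response. Note that this tool uses camelCase parameter names, unlike the snake_case conversion tools. The helper names and sample response are illustrative assumptions.

```python
from typing import Any, Optional


def build_ocr_args(
    file_urls: list[str],
    language_hints: Optional[list[str]] = None,
    run_async: bool = False,
) -> dict[str, Any]:
    """Build docprocess_ocr arguments (camelCase keys, as in the table above)."""
    args: dict[str, Any] = {"fileUrls": list(file_urls), "async": run_async}
    if language_hints:
        args["languageHints"] = list(language_hints)
    return args


def texts_by_file(response: dict[str, Any]) -> dict[str, str]:
    """Map input file names to extracted text from a finished response."""
    if response.get("async"):
        # Async calls return a responseId to pass to docprocess_ocr_poll instead.
        raise ValueError(f"asynchronous job; poll with responseId={response.get('responseId')}")
    return {
        item["inputFileName"]: item.get("extractedText", "")
        for item in response.get("results", [])
    }


# Made-up synchronous response.
sync_response = {
    "success": True,
    "async": False,
    "totalFiles": 1,
    "successfulFiles": 1,
    "failedFiles": 0,
    "results": [{"inputFileName": "scan.png", "extractedText": "Hello"}],
}
print(texts_by_file(sync_response))
```

Since docprocess_ocr_poll returns results in the same structure, texts_by_file can also read a completed poll response; it has no async field, so the check passes.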

docprocess_ocr_poll

Poll the status of an asynchronous OCR job. Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| responseId | string | Yes | Response ID from the original docprocess_ocr call |

Response fields:

| Field | Type | Description |
|---|---|---|
| status | string | Job status: "completed", "failed", "processing", or "queued" |
| responseId | string | The response ID |
| results | object[] | Per-file results (same structure as docprocess_ocr results) |
| error | string | Error message (when status is "failed") |
| message | string | Status message |
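For async OCR jobs, a polling loop stops on the two terminal statuses. call_tool is a hypothetical stand-in for whatever client invokes the tools, and the 5-second interval is an assumption; this page does not specify one for OCR.

```python
import time
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, dict], dict]


def poll_ocr(
    call_tool: ToolCaller,
    response_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll docprocess_ocr_poll until the job completes or fails."""
    for _ in range(max_attempts):
        result = call_tool("docprocess_ocr_poll", {"responseId": response_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error") or result.get("message") or "OCR job failed")
        sleep(interval_s)  # still "queued" or "processing"
    raise TimeoutError(f"OCR job {response_id} did not finish after {max_attempts} polls")
```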

docprocess_fill_pdf

Fill a PDF form with field values. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| pdf_url | string | Yes | | URL of the PDF form to fill |
| form_data | object | Yes | | Field-name-to-value pairs for form fields |
| output_filename | string | No | "filled_form.pdf" | Name for the output file |
| file_links_expire_in_days | number | No | 7 | Days until the output link expires |

Response fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the operation succeeded |
| status | string | Status message |
| file_url | string | URL of the filled PDF |
| filename | string | Output file name |
| size | number | File size in bytes |
| fields_filled | number | Number of fields filled |
| field_summary | string | Summary of filled fields |
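A sketch for docprocess_fill_pdf: build the arguments, then compare fields_filled with the number of supplied values to spot field names that did not match the form. That comparison assumes fields_filled counts matched fields; the form field names below are hypothetical.

```python
from typing import Any, Optional


def build_fill_pdf_args(
    pdf_url: str,
    form_data: dict[str, Any],
    output_filename: str = "filled_form.pdf",
    expire_days: int = 7,
) -> dict[str, Any]:
    """Build docprocess_fill_pdf arguments."""
    if not form_data:
        raise ValueError("form_data must contain at least one field")
    return {
        "pdf_url": pdf_url,
        "form_data": dict(form_data),
        "output_filename": output_filename,
        "file_links_expire_in_days": expire_days,
    }


def unfilled_warning(form_data: dict[str, Any], response: dict[str, Any]) -> Optional[str]:
    """Flag responses where fewer fields were filled than were supplied."""
    # Assumes fields_filled counts supplied names that matched a form field.
    filled = response.get("fields_filled", 0)
    if filled < len(form_data):
        return f"only {filled} of {len(form_data)} fields filled; check field names"
    return None


form_data = {"full_name": "Ada Lovelace", "date": "2024-01-31"}  # hypothetical fields
args = build_fill_pdf_args("https://example.com/forms/application.pdf", form_data)
print(unfilled_warning(form_data, {"success": True, "fields_filled": 1}))
```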

docprocess_fill_word_tpl

Fill a Word template with data. Supports simple placeholders ({name}), nested paths ({user.firstName}), loops ({#items}...{/items}), and conditionals ({#condition}...{/condition}). Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| template_url | string | Yes | | URL of the Word template |
| data | object | Yes | | Data object with values for template placeholders |
| output_filename | string | No | "filled_template.docx" | Name for the output file |
| file_links_expire_in_days | number | No | 7 | Days until the output link expires |

Response fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the operation succeeded |
| status | string | Status message |
| file_url | string | URL of the filled document |
| filename | string | Output file name |
| size | number | File size in bytes |
| placeholders_filled | number | Number of placeholders filled |
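An illustrative data object for docprocess_fill_word_tpl, keyed to a hypothetical template that uses each placeholder style listed above. How loops and conditionals resolve (fields read from each list entry, sections shown only when truthy) is assumed from the brace syntax, which looks like docxtemplater-style templating; the lookup helper only checks simple and nested paths.

```python
from typing import Any

# Data for a hypothetical template containing:
#   Dear {name},
#   Account owner: {user.firstName}
#   {#items}{description}: {amount}{/items}
#   {#isOverdue}This invoice is overdue.{/isOverdue}
data: dict[str, Any] = {
    "name": "Ada Lovelace",  # fills {name}
    "user": {"firstName": "Ada"},  # fills {user.firstName}
    "items": [  # {#items}...{/items} repeats once per entry (assumed)
        {"description": "Consulting", "amount": "1,200.00"},
        {"description": "Travel", "amount": "310.50"},
    ],
    "isOverdue": True,  # {#isOverdue}...{/isOverdue} renders only when truthy (assumed)
}


def lookup(obj: dict[str, Any], path: str) -> Any:
    """Resolve a dotted placeholder path such as "user.firstName"."""
    value: Any = obj
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"data has no value for {{{path}}}")
        value = value[part]
    return value


# Check the simple and nested placeholders before calling the tool.
for placeholder in ["name", "user.firstName"]:
    lookup(data, placeholder)

args = {
    "template_url": "https://example.com/templates/letter.docx",
    "data": data,
    "output_filename": "letter_ada.docx",
}
```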

docprocess_create_word

Create a Word document from a JSON specification. The spec defines sections with children containing heading, paragraph, bullet, and table elements. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| document_spec | object | Yes | | JSON specification with sections and child elements |
| output_filename | string | No | "created_document.docx" | Name for the output file |
| file_links_expire_in_days | number | No | 7 | Days until the output link expires |

Response fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the operation succeeded |
| status | string | Status message |
| file_url | string | URL of the created document |
| filename | string | Output file name |
| size | number | File size in bytes |
| element_count | number | Number of elements in the document |

docprocess_word_ai

Process a Word document with AI. Supports tasks like translation, summarization, and content editing. Runs asynchronously; poll with docprocess_word_ai_poll. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| documentUrl | string | Yes | | URL of the .docx file to process |
| task | string | Yes | | Task description (e.g. "translate to Spanish") |
| model | string | No | | AI model: claude-sonnet-4-5-20250929, gpt-4.1, gpt-4o, or gemini-2.5-flash |
| strategy | string | No | | Processing strategy: "SPARSE_CHANGES" or "DENSE_CHANGES" (auto-detected if omitted) |

Response fields:

| Field | Type | Description |
|---|---|---|
| status | string | Job status |
| responseId | string | ID for polling with docprocess_word_ai_poll |
| message | string | Status message |
| task | string | The task description |
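A sketch of docprocess_word_ai arguments that leaves model and strategy out unless set, so the service's defaults and strategy auto-detection apply. The allowed values mirror the table above; the .docx path check is an extra client-side assumption.

```python
from typing import Any, Optional
from urllib.parse import urlparse

MODELS = {"claude-sonnet-4-5-20250929", "gpt-4.1", "gpt-4o", "gemini-2.5-flash"}
STRATEGIES = {"SPARSE_CHANGES", "DENSE_CHANGES"}


def build_word_ai_args(
    document_url: str,
    task: str,
    model: Optional[str] = None,
    strategy: Optional[str] = None,
) -> dict[str, Any]:
    """Build docprocess_word_ai arguments, omitting optional fields when unset."""
    if not urlparse(document_url).path.lower().endswith(".docx"):
        raise ValueError("documentUrl should point to a .docx file")
    if not task.strip():
        raise ValueError("task must describe the edit to perform")
    args: dict[str, Any] = {"documentUrl": document_url, "task": task}
    if model is not None:
        if model not in MODELS:
            raise ValueError(f"unsupported model: {model}")
        args["model"] = model
    if strategy is not None:
        if strategy not in STRATEGIES:
            raise ValueError(f"unsupported strategy: {strategy}")
        args["strategy"] = strategy
    return args


args = build_word_ai_args("https://example.com/contract.docx", "translate to Spanish")
```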

docprocess_word_ai_poll

Poll the status of an asynchronous Word AI processing job. Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| responseId | string | Yes | Response ID from the original docprocess_word_ai call |

Response fields:

| Field | Type | Description |
|---|---|---|
| status | string | Job status: "completed", "failed", "queued", or "processing" |
| responseId | string | The response ID |
| resultFile | object | The processed document (when completed) |
| resultFile.url | string | URL of the processed document |
| resultFile.filename | string | Output file name |
| resultFile.size | number | File size in bytes |
| error | string | Error message (when status is "failed") |
| message | string | Status message |
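Submitting a Word AI job and polling it can be combined in one helper. As before, call_tool is a hypothetical client function, and the 10-second interval and 90-attempt cap are assumptions rather than documented limits.

```python
import time
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, dict], dict]


def run_word_ai(
    call_tool: ToolCaller,
    document_url: str,
    task: str,
    interval_s: float = 10.0,
    max_attempts: int = 90,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Submit a Word AI job, then poll until it finishes; returns resultFile."""
    started = call_tool("docprocess_word_ai", {"documentUrl": document_url, "task": task})
    response_id = started["responseId"]
    for _ in range(max_attempts):
        result = call_tool("docprocess_word_ai_poll", {"responseId": response_id})
        status = result.get("status")
        if status == "completed":
            return result["resultFile"]  # url, filename, size
        if status == "failed":
            raise RuntimeError(result.get("error") or "Word AI job failed")
        sleep(interval_s)  # "queued" or "processing"
    raise TimeoutError(f"job {response_id} still running after {max_attempts} polls")
```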

docprocess_validate_csv

Validate the structure and data quality of CSV files. Reports errors, warnings, and statistics including encoding, delimiter, column/row counts, and detected data types. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | | URLs of the CSV files to validate |
| file_links_expire_in_days | number | No | 7 | Days until output file links expire (1–30) |

Response fields:

| Field | Type | Description |
|---|---|---|
| message | string | Summary message |
| files | object[] | Array of validation results |
| files[].validation.isValid | boolean | Whether the CSV is valid |
| files[].validation.errors | string[] | Validation errors found |
| files[].validation.warnings | string[] | Validation warnings |
| files[].validation.statistics.encoding | string | Detected file encoding |
| files[].validation.statistics.delimiter | string | Detected delimiter character |
| files[].validation.statistics.hasHeaders | boolean | Whether headers were detected |
| files[].validation.statistics.columnCount | number | Number of columns |
| files[].validation.statistics.rowCount | number | Number of rows |
| files[].validation.statistics.emptyRowCount | number | Number of empty rows |
| files[].validation.statistics.totalCells | number | Total number of cells |
| files[].validation.statistics.emptyCells | number | Number of empty cells |
| files[].validation.dataTypes | object | Detected data types per column |
| total | number | Total number of files validated |
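A sketch that condenses docprocess_validate_csv output into one row per file, including the share of empty cells. Files are identified by position because this page lists no per-file name field for validation results; the sample response is made up.

```python
from typing import Any


def summarize_validation(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Condense docprocess_validate_csv results into one row per file."""
    rows = []
    for index, entry in enumerate(response.get("files", [])):
        validation = entry.get("validation", {})
        stats = validation.get("statistics", {})
        total = stats.get("totalCells", 0)
        empty = stats.get("emptyCells", 0)
        rows.append({
            "index": index,  # position in the files array
            "valid": validation.get("isValid", False),
            "errors": len(validation.get("errors", [])),
            "warnings": len(validation.get("warnings", [])),
            # Share of empty cells; guards against division by zero.
            "empty_ratio": empty / total if total else 0.0,
        })
    return rows


# Made-up response for one file.
example = {
    "message": "Validated 1 file",
    "total": 1,
    "files": [{
        "validation": {
            "isValid": False,
            "errors": ["Row 4 has 3 columns, expected 4"],
            "warnings": [],
            "statistics": {"totalCells": 40, "emptyCells": 6},
        }
    }],
}
for row in summarize_validation(example):
    print(row)
```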

docprocess_invoice_extract

Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document, so different invoice types (legal, product, etc.) produce different column names. Supports multi-page PDFs with automatic sharding and parallel processing. Runs asynchronously; poll with docprocess_invoice_extract_poll to retrieve results. Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fileUrl | string | Yes | | URL to the invoice file. Supported: PDF, JPEG, PNG |
| pagesPerShard | number | No | 3 | Pages per processing shard for PDFs (1–20). Smaller values may improve accuracy for dense invoices |

Response fields:

| Field | Type | Description |
|---|---|---|
| jobId | string | Job ID for polling with docprocess_invoice_extract_poll |
| status | string | Initial status: "queued" |
| createdAt | string | ISO timestamp when the job was created |
| message | string | Human-readable status message with polling instructions |
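A sketch of building docprocess_invoice_extract arguments with the documented 1–20 range for pagesPerShard, plus a rough shard estimate. The estimate assumes shards are consecutive blocks of pagesPerShard pages, which the description implies but does not spell out.

```python
import math
from typing import Any


def build_invoice_args(file_url: str, pages_per_shard: int = 3) -> dict[str, Any]:
    """Build docprocess_invoice_extract arguments."""
    if not 1 <= pages_per_shard <= 20:
        raise ValueError("pagesPerShard must be between 1 and 20")
    return {"fileUrl": file_url, "pagesPerShard": pages_per_shard}


def estimated_shards(page_count: int, pages_per_shard: int = 3) -> int:
    """Rough shard count, assuming consecutive blocks of pages_per_shard pages."""
    return math.ceil(page_count / pages_per_shard)


# A 10-page PDF at the default of 3 pages per shard gives an estimate of 4 shards;
# dropping to 2 pages per shard for a dense invoice gives 5.
print(estimated_shards(10), estimated_shards(10, 2))
```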

docprocess_invoice_extract_poll

Check the status of an invoice line-item extraction job. Call after docprocess_invoice_extract; poll every 10–15 seconds until status is "completed" or "failed". When completed, returns artifact URLs for the extracted CSV, JSON, and summary files. Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| jobId | string | Yes | The jobId returned by docprocess_invoice_extract |

Response fields:

| Field | Type | Description |
|---|---|---|
| jobId | string | Job ID |
| status | string | "queued", "in_progress", "completed", or "failed" |
| createdAt | string | ISO timestamp when the job was created |
| completedAt | string | ISO timestamp when the job completed (when status is "completed") |
| artifacts | object | Output artifact URLs (when status is "completed") |
| artifacts.csvUrl | string | URL to download the extracted line items as CSV |
| artifacts.jsonUrl | string | URL to download the extracted line items as JSON |
| artifacts.summaryUrl | string | URL to download the processing summary |
| error | string | Error message (when status is "failed") |
| message | string | Human-readable status message |
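A polling sketch for docprocess_invoice_extract_poll that waits a randomized 10 to 15 seconds between calls, matching the suggested interval, and returns the artifacts object on completion. call_tool and the attempt cap are assumptions; sleep and jitter are injectable so the loop can be tested without waiting.

```python
import random
import time
from typing import Any, Callable


def wait_for_invoice(
    call_tool: Callable[[str, dict], dict],  # hypothetical client function
    job_id: str,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> dict[str, Any]:
    """Poll docprocess_invoice_extract_poll until done; returns the artifacts object."""
    for _ in range(max_attempts):
        result = call_tool("docprocess_invoice_extract_poll", {"jobId": job_id})
        status = result.get("status")
        if status == "completed":
            return result["artifacts"]  # csvUrl, jsonUrl, summaryUrl
        if status == "failed":
            raise RuntimeError(result.get("error") or result.get("message") or "extraction failed")
        # "queued" or "in_progress": wait 10 to 15 seconds before the next poll.
        sleep(10 + 5 * jitter())
    raise TimeoutError(f"invoice job {job_id} not finished after {max_attempts} polls")
```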