The /form-processing command extracts form fields and their values using a machine learning model trained specifically for this purpose. Over the last year, LLMs such as Claude and OpenAI’s models have largely eclipsed custom models like this one. Occasionally, though, a custom-trained model is more accurate. So if you’re struggling to get an LLM to process scanned or PDF forms, give this a try.
Basic Usage
Process a form by either uploading a file or providing a URL.
Input Formats
Supported file types include:
- PDF documents
- Scanned images (PNG, JPEG)
- Digital forms
Output Format
The command returns a JSON object containing:
- Field Extractions:
  - Field names and their extracted values
  - Confidence scores for each extraction
  - Field types (e.g., checkbox status, text fields)
- Metadata:
  - Processing timestamp
  - File information
  - Model version
Understanding the Results
Field Values
- Text fields: Contain the extracted text
- Checkboxes: Reported as either “filled_checkbox” or “unfilled_checkbox”
- Empty fields: May be returned as “-” or omitted entirely
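The value conventions above can be folded into a small helper when post-processing results. This is a minimal sketch, not part of the command itself: the function name `normalize_field_value` is hypothetical, and it assumes each extracted value arrives as a string (or is missing, represented here as `None`).

```python
from typing import Optional, Union


def normalize_field_value(raw: Optional[str]) -> Union[bool, str, None]:
    """Convert a raw extracted value into a Python-friendly one.

    Hypothetical helper: assumes values are strings, and that an
    omitted field is passed in as None.
    """
    if raw is None:
        return None  # field was omitted from the output
    value = raw.strip()
    if value == "filled_checkbox":
        return True
    if value == "unfilled_checkbox":
        return False
    if value in ("", "-"):
        return None  # empty field reported as "-" or blank
    return value  # ordinary text field
```

With this in place, a checked box becomes `True`, an unchecked box becomes `False`, and both "-" and omitted fields collapse to `None`, so downstream code only has to handle one representation of "empty".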
Confidence Scores
- Range from 0 to 1 (0% to 100% confidence)
- Higher scores indicate greater confidence in the extraction
- Generally:
  - Greater than 0.9: Very high confidence
  - 0.7 to 0.9: Good confidence
  - Less than 0.7: Lower confidence, may need review
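One way to act on these thresholds is to sort extractions into bands and flag the low-confidence ones for manual review. The sketch below is illustrative only: the function names and the `name` and `confidence` keys are assumptions, since the exact JSON key names are not shown here, so adjust them to match the actual response.

```python
from typing import Dict, List


def confidence_band(score: float) -> str:
    """Map a confidence score in [0, 1] to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    if score > 0.9:
        return "very_high"
    if score >= 0.7:
        return "good"
    return "needs_review"


def fields_needing_review(extractions: List[Dict]) -> List[str]:
    """Return names of fields whose confidence falls below 0.7.

    Assumes each extraction is a dict with hypothetical keys
    "name" and "confidence".
    """
    return [
        e["name"]
        for e in extractions
        if confidence_band(e["confidence"]) == "needs_review"
    ]
```

A score of exactly 0.9 is treated as "good" and exactly 0.7 as "good" as well, which is one reasonable reading of the ranges; shift the comparisons if your review policy should be stricter.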