Why Use Dex?
Around 80-90% of enterprise data lives within unstructured formats such as PDFs and DOCX files. Dex solves the most common challenges of programmatic document processing:
- Format Diversity: Process any document type (business reports, financial documents, legal contracts, healthcare records, and more) with a single API.
- Unstructured Data: Convert complex layouts into structured JSON with semantic understanding, including text, tables, charts, and infographics.
- Quality Variations: Handle scanned, handwritten, and low-quality documents with high accuracy across multiple languages.
- Scalability: Process thousands of documents efficiently with built-in scalable infrastructure.
- Flexibility: Choose from multiple OCR engines and customize extraction with your own tools and workflows.
Core Primitives
Dex is designed as a capability rather than a standalone product, centered around composable primitives that can be used, extended, and combined:
File Management
Upload, retrieve, and securely store confidential documents with fine-grained access control. Supports persistent storage with metadata tracking, secure access patterns, and configurable data retention policies for automatic lifecycle management.
Parse
Convert documents into machine-readable formats using multiple OCR engines. Dex extracts:
- Plain text in multiple languages (English, Spanish, Arabic, German, and more)
- Tables, from small to large tabular data (500+ rows)
- Checkboxes for form processing
- Images and figures with bounding box information
- Charts for data visualization analysis
- Block-level: Low-level separation of layout blocks
- Section-level: Semantic separation by detecting titles and subtitles
- Page-level: Page-by-page analysis and processing
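As a rough illustration of how the three granularities relate, the sketch below rolls a flat list of layout blocks up into page-level and section-level views. The `Block` class, its field names, and the `"title"` kind are assumptions made for this example; they are not the Dex parse result schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical layout block; field names are illustrative only."""
    page: int
    kind: str  # e.g. "title", "text", "table"
    text: str


def group_by_page(blocks):
    """Page-level view: map each page number to its blocks, in order."""
    pages = {}
    for block in blocks:
        pages.setdefault(block.page, []).append(block)
    return pages


def group_by_section(blocks):
    """Section-level view: every title block starts a new section."""
    sections = []
    for block in blocks:
        if block.kind == "title":
            sections.append({"title": block.text, "blocks": []})
        else:
            if not sections:
                # Content that appears before the first title.
                sections.append({"title": None, "blocks": []})
            sections[-1]["blocks"].append(block)
    return sections


blocks = [
    Block(1, "title", "Overview"),
    Block(1, "text", "Dex parses documents."),
    Block(2, "table", "Q1 | Q2"),
    Block(2, "title", "Risks"),
    Block(2, "text", "None noted."),
]

pages = group_by_page(blocks)
sections = group_by_section(blocks)
```

Note that a section can span a page boundary (here "Overview" continues onto page 2), which is one reason a section-level view can be more useful than a page-level one for downstream extraction.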
Vector Stores
Vectorize and index parsed documents for semantic search and retrieval. Vector stores enable:
- Semantic search over document chunks with embedding-based similarity
- Context management for multi-file processing and large documents
- Regex search for pattern-based extraction (dates, IDs, emails, etc.)
- Document summarization for quick overview generation
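To make the two search styles concrete, here is a minimal standard-library sketch of embedding-based similarity search and regex search over document chunks. It does not use the Dex API: the chunk dictionaries, the toy 3-dimensional embeddings, and all function names are assumptions for illustration, and a real vector store would use model-generated embeddings and an index rather than a linear scan.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, chunks, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c["text"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Simple illustrative patterns; production patterns are usually stricter.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_search(pattern, chunks):
    """Collect every match of pattern across all chunk texts."""
    return [match for c in chunks for match in pattern.findall(c["text"])]


# Hypothetical chunks with hand-written toy embeddings.
chunks = [
    {"text": "Invoice issued 2024-03-15 to billing@example.com.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Patient intake form, page 2.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Payment due 2024-04-14.", "embedding": [0.8, 0.3, 0.1]},
]

results = semantic_search([1.0, 0.0, 0.0], chunks)
dates = regex_search(ISO_DATE, chunks)
emails = regex_search(EMAIL, chunks)
```

Semantic search ranks chunks by meaning even when the wording differs, while regex search is the better fit for values with a fixed shape such as dates, IDs, and email addresses.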
Extract
Extract structured data from parse results or document collections using:
- Custom schemas defined with Pydantic models
- Natural language prompts to guide extraction
- Citations that link extracted data to source locations
- Confidence scores for quality assessment
- RAG-enhanced extraction using vector store context
- Agentic extraction with custom MCP tools for advanced workflows
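The sketch below shows what a custom schema might look like as a Pydantic model, and how confidence scores and citations could be used to route low-confidence fields to human review. The `Invoice` model, the `raw_result` structure (`value`, `confidence`, `citation`), and the 0.8 threshold are all assumptions for illustration; they are not the Dex SDK's actual request or response format.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Invoice(BaseModel):
    """Example target schema: the fields to pull out of each document."""
    invoice_number: str = Field(description="Identifier printed on the invoice")
    total_amount: float = Field(description="Grand total in the invoice currency")
    due_date: Optional[str] = Field(default=None, description="Due date as YYYY-MM-DD")


# Hypothetical extraction output: one entry per field, each carrying a
# confidence score and a citation back to the source document.
raw_result = {
    "invoice_number": {
        "value": "INV-1042",
        "confidence": 0.98,
        "citation": {"page": 1, "snippet": "Invoice No. INV-1042"},
    },
    "total_amount": {
        "value": 1250.0,
        "confidence": 0.95,
        "citation": {"page": 1, "snippet": "Total: 1,250.00"},
    },
    "due_date": {
        "value": "2024-04-14",
        "confidence": 0.41,
        "citation": {"page": 1, "snippet": "Due 14/4"},
    },
}


def split_by_confidence(result, threshold=0.8):
    """Keep confident values; queue the rest, with citations, for review."""
    accepted, needs_review = {}, []
    for name, field in result.items():
        if field["confidence"] >= threshold:
            accepted[name] = field["value"]
        else:
            needs_review.append((name, field["citation"]))
    return accepted, needs_review


accepted, needs_review = split_by_confidence(raw_result)
invoice = Invoice(**accepted)  # due_date falls back to None pending review
```

Keeping the citation alongside each rejected field means a reviewer can jump straight to the page and snippet that produced the uncertain value.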
Ways to Interact with Dex
Dex provides multiple interfaces to support different use cases:
- REST API: OpenAPI-documented endpoints for direct integration
- Python SDK: High-level wrapper for rapid development with both sync and async support
- MCP Server: Model Context Protocol integration for agent-based workflows (coming soon)
Common Use Cases
- Financial Services: Automate invoice processing, tax document analysis, and financial report extraction.
- Healthcare: Extract patient information from medical records, insurance claims, and healthcare forms.
- Legal: Analyze contracts, process discovery documents, and extract key clauses and obligations.
- Business Operations: Process HR documents, supply chain orders, customer service tickets, and business reports.
Understanding Industry Document Challenges
Different industries face unique document processing challenges based on their document types and layouts. For a comprehensive overview of typical document formats and layout challenges across finance, healthcare, insurance, and legal sectors, see Industry Document Types and Layout Challenges. This guide covers:
- Finance: SEC filings, research reports, and financial statements with multi-column layouts, complex footnotes, and embedded visualizations
- Healthcare: Medical records and clinical documentation with handwritten elements, scanned materials, and variable form structures
- Insurance: Claims forms (CMS-1500, UB-04) combining typed prompts with handwritten responses on poor-quality scans
- Legal: Contracts and court filings requiring hierarchical structure preservation through complex sections and redlined annotations
Language Support
Dex supports multi-language document processing, with good support for Germanic languages. For non-Germanic languages, there are 35 supported languages, including but not limited to the ones listed here: Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish.
Key Features
Citations and Traceability
Every extracted field can be associated with its source location (page number, bounding box, text snippet), enabling auditability and human review.
Confidence Scoring
Assigns confidence scores to extracted fields based on model outputs, helping you filter and prioritize results for downstream review.
Flexible OCR Engine Support
Choose the best OCR engine for your use case: Reducto for English and Latin-script documents, Iris for non-English and non-Latin scripts (Arabic, Hebrew, CJK, etc.), or integrate your own custom engine.
Access Control
Fine-grained authorization with ReBAC (Relationship-Based Access Control) for projects, files, parse results, and vector stores.
Data Lifecycle Management
Configurable retention policies automatically manage the lifecycle of files and processing artifacts, helping you meet compliance requirements and optimize storage costs.
Getting Started
To begin using Dex, you'll need a Scale account with SGP access.
Quick Links:
- Getting Started Guide: Step-by-step tutorial for your first extraction
- Quick Reference: Cheat sheet for common patterns and imports
- Advanced Features: Vector stores, batch processing, and optimization
- API Reference: Complete SDK documentation

