Dex is Scale's document understanding service that transforms unstructured documents into actionable, structured data. It is a comprehensive platform that combines advanced OCR, natural language processing, and machine learning to extract meaningful information from PDFs, images, spreadsheets, and more.
Around 80-90% of enterprise data lives within unstructured formats such as PDFs and DOCX files. Dex solves the most common challenges of programmatic document processing:
Format Diversity: Process any document type with a single API, including business reports, financial documents, legal contracts, healthcare records, and more.
Unstructured Data: Convert complex layouts into structured JSON with semantic understanding, including text, tables, charts, and infographics.
Quality Variations: Handle scanned, handwritten, and low-quality documents with high accuracy across multiple languages.
Scalability: Process thousands of documents efficiently with built-in scalable infrastructure.
Flexibility: Choose from multiple OCR engines and customize extraction with your own tools and workflows.
Upload, retrieve, and securely store confidential documents with fine-grained access control. Supports persistent storage with metadata tracking, secure access patterns, and configurable data retention policies for automatic lifecycle management.
Choose the best OCR engine for your use case: Reducto for English and Latin-script documents, Iris for non-English and non-Latin scripts (Arabic, Hebrew, CJK, etc.), or integrate your own custom engine.
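The engine choice described above can be approximated on the client side before a document is submitted. The following is a minimal sketch, not part of the Dex API: the function name `pick_ocr_engine`, the engine identifiers `"reducto"` and `"iris"` as strings, and the 90% Latin threshold are all assumptions made for illustration. It inspects a text sample (for example, a filename or a known snippet) and checks how much of it is Latin script.

```python
import unicodedata


def pick_ocr_engine(sample_text: str, latin_threshold: float = 0.9) -> str:
    """Suggest an engine name from the script of a text sample.

    Hypothetical helper: the returned strings are illustrative labels,
    not confirmed Dex parameter values.
    """
    letters = [ch for ch in sample_text if ch.isalpha()]
    if not letters:
        # Assumption: fall back to the Latin-script engine when the
        # sample contains no alphabetic characters to judge from.
        return "reducto"
    # Unicode character names start with the script, e.g. "LATIN SMALL LETTER A".
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return "reducto" if latin / len(letters) >= latin_threshold else "iris"


if __name__ == "__main__":
    for sample in ["Quarterly revenue report", "مرحبا بالعالم", "请参阅附件合同"]:
        print(sample, "->", pick_ocr_engine(sample))
```

A heuristic like this only helps when some text is already known about the document. For scanned images with no metadata, the engine would have to be chosen from other context, such as the source system or the expected language.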
Configurable retention policies automatically manage the lifecycle of files and processing artifacts, helping you meet compliance requirements and optimize storage costs.
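To make the idea of a retention policy concrete, the sketch below models one as a pair of retention windows, one for uploaded files and one for processing artifacts, and checks whether an item has outlived its window. It is an illustration only: `RetentionPolicy`, its field names, and `is_expired` are hypothetical and do not reflect Dex's actual configuration schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: separate windows for files and artifacts."""
    files_days: int
    artifacts_days: int


def is_expired(created_at: datetime, kind: str, policy: RetentionPolicy,
               now: Optional[datetime] = None) -> bool:
    """Return True when an item is older than its retention window.

    `kind` is either "file" or "artifact" (assumed labels).
    """
    if kind not in ("file", "artifact"):
        raise ValueError(f"unknown kind: {kind!r}")
    now = now or datetime.now(timezone.utc)
    days = policy.files_days if kind == "file" else policy.artifacts_days
    return now - created_at > timedelta(days=days)


if __name__ == "__main__":
    policy = RetentionPolicy(files_days=30, artifacts_days=7)
    uploaded = datetime.now(timezone.utc) - timedelta(days=10)
    print("file expired:", is_expired(uploaded, "file", policy))
    print("artifact expired:", is_expired(uploaded, "artifact", policy))
```

Keeping a shorter window for artifacts than for source files is one way to reduce storage costs while still retaining the originals for the period a compliance rule requires.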