DEX is the SDK that runs document parsing, extraction, retrieval and research flows. There are currently two options for the underlying parsing engine.
- IRIS - Currently on V2, IRIS is Scale’s proprietary OCR and document extraction model, with the best performance across both Arabic and English text. It is highly customisable: different models are available for the layout and parsing steps, and later versions offer high parallelism across pages or sections.
- REDUCTO - Currently being deprecated. Reducto is a third-party provider that offers end-to-end parsing of a wide range of documents, without the performance and customisation available with IRIS V2.
Quick Decision Guide
Use Iris (Default engine)
Iris (V2) is the recommended default DEX engine for all production use cases:
- You can maximize accuracy through testing any OCR model uploaded to SGP
- You can minimize latency through specialized optimization
- You need greater transparency of the processes within Temporal
- You can use specialized models that are either open source or have been built specifically for unique document types
- You can **customise and control pipelines** for layout detection, OCR, and assembly
Use Reducto (Soon to be Deprecated)
Reducto is the legacy DEX OCR engine. It remains an option when:
- You can accept a third-party service for OCR and parsing integrated into your application
- You need proven stability and do not need future functionality or updates from the DEX team
- You’re doing batch processing where somewhat high latency is acceptable
- You want automatic scaling with your workload
- You’re processing English or Germanic language documents
- You need a low-complexity, managed solution
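The guide above can be read as a simple rule: default to Iris, and fall back to Reducto only for an existing workload that needs none of the Iris-specific capabilities. The sketch below encodes that rule in plain Python. `ParsingNeeds` and `choose_engine` are hypothetical names invented for illustration; they are not part of the DEX SDK, and the fields are one possible reading of the criteria listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsingNeeds:
    """Hypothetical description of a workload; not a DEX SDK type."""

    needs_custom_models: bool = False
    needs_pipeline_control: bool = False
    non_germanic_languages: bool = False
    latency_sensitive: bool = False
    existing_reducto_workload: bool = False


def choose_engine(needs: ParsingNeeds) -> str:
    """Return "iris" or "reducto" following the decision guide (illustrative only)."""
    wants_iris_features = (
        needs.needs_custom_models
        or needs.needs_pipeline_control
        or needs.non_germanic_languages
        or needs.latency_sensitive
    )
    # Reducto is only considered for an existing legacy workload that
    # needs none of the Iris-specific capabilities.
    if needs.existing_reducto_workload and not wants_iris_features:
        return "reducto"
    # Iris is the recommended default in every other case.
    return "iris"
```

Because Reducto is being deprecated, the helper never selects it for a new workload, even one that matches the Reducto criteria.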
Understanding the Relationship Between Iris and Dex
Iris and Dex are not mutually exclusive; they’re complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex. Think of it this way: Dex is the platform, Iris is one of the engines.
When you use Dex, you choose which OCR engine to use for parsing:
- Iris: Better accuracy and latency for all languages, including Arabic and English
- Custom engines: Integrate your own OCR solution
- Reducto: Legacy third-party engine (soon to be deprecated)
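The platform/engine split can be pictured as a registry: the platform owns the parse entry point, and engines are interchangeable components selected by name. The sketch below is a toy model of that relationship only. `DocumentPlatform`, `register_engine`, and `parse` are hypothetical names and do not reflect the actual DEX SDK interface; the registered engines are stand-in functions.

```python
from typing import Callable, Dict, Optional

# In this toy model, an engine is any callable that turns file bytes into text.
Engine = Callable[[bytes], str]


class DocumentPlatform:
    """Hypothetical stand-in for the platform layer; not the DEX SDK."""

    def __init__(self, default_engine: str = "iris") -> None:
        self._engines: Dict[str, Engine] = {}
        self.default_engine = default_engine

    def register_engine(self, name: str, engine: Engine) -> None:
        """Make an engine available under a name (e.g. a custom OCR solution)."""
        self._engines[name] = engine

    def parse(self, data: bytes, engine: Optional[str] = None) -> str:
        """Parse with the named engine, or the default when none is given."""
        name = engine or self.default_engine
        if name not in self._engines:
            raise KeyError(f"unknown engine: {name}")
        return self._engines[name](data)


# Stand-in engines for illustration; real engines would run OCR.
platform = DocumentPlatform()
platform.register_engine("iris", lambda data: "iris:" + data.decode("utf-8"))
platform.register_engine("custom", lambda data: "custom:" + data.decode("utf-8"))
```

The caller picks an engine per parse call, while everything else the platform provides stays the same regardless of that choice.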
What is Iris?
Iris is Scale’s OCR capability that provides a flexible, modular pipeline for extracting text from documents. It’s designed for teams with custom OCR needs who want complete control over the processing pipeline. Iris offers:
- Customisable OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and any other specialized models uploaded to SGP
- Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
- Inspection capabilities: Save and review intermediate results at each processing stage
- Extensibility: Add custom OCR models through SGP or layout detectors through simple adapters
- Optimization flexibility: Fine-tune for maximum accuracy or minimum latency based on your needs
- Custom pipeline experimentation: Test different combinations of layout detectors and OCR models
- Specialized model development: Build custom OCR for unique document types or languages
- Performance optimization: Tune for specific accuracy or latency requirements
- Non-Germanic language optimization: Experiment with models to find the best accuracy for other languages such as Arabic
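The three configurable stages described above (layout detection, OCR, assembly), per-page parallelism, and saved intermediate results can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `ModularPipeline` and its field names are hypothetical, the stages are plain Python callables operating on strings rather than real models, and nothing below reflects how Iris is actually implemented or configured.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signatures: pages and regions are plain strings here.
LayoutDetector = Callable[[str], List[str]]  # page -> regions
OcrModel = Callable[[str], str]              # region -> text
Assembler = Callable[[List[str]], str]       # region texts -> document


@dataclass
class ModularPipeline:
    """Toy three-stage pipeline with swappable stages; not the Iris API."""

    detect_layout: LayoutDetector
    run_ocr: OcrModel
    assemble: Assembler
    # Intermediate results are kept per stage so they can be inspected later.
    intermediates: Dict[str, list] = field(default_factory=dict)

    def _ocr_page(self, regions: List[str]) -> List[str]:
        return [self.run_ocr(region) for region in regions]

    def run(self, pages: List[str]) -> str:
        # Pages are processed in parallel; pool.map preserves page order.
        with ThreadPoolExecutor() as pool:
            regions_per_page = list(pool.map(self.detect_layout, pages))
            texts_per_page = list(pool.map(self._ocr_page, regions_per_page))
        self.intermediates["layout"] = regions_per_page
        self.intermediates["ocr"] = texts_per_page
        flat_texts = [text for page in texts_per_page for text in page]
        return self.assemble(flat_texts)


# Stand-in stages: split on "|" for layout, upper-case for OCR, join for assembly.
pipeline = ModularPipeline(
    detect_layout=lambda page: page.split("|"),
    run_ocr=str.upper,
    assemble="\n".join,
)
document = pipeline.run(["title|body", "footer"])
```

Swapping a stage means passing a different callable, which is the kind of experimentation with layout detectors and OCR models that the list above describes.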
What is Dex?
Dex is Scale’s document understanding platform: a production-ready service that transforms unstructured documents into actionable, structured data. It provides:
- File Management: Secure upload, storage, and retrieval with access control
- Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
- Vector Stores: Index and search parsed documents with semantic embeddings
- Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
- Project Management: Organize and isolate data with proper authorization
- Automatic scaling with your workload
- Multiple OCR engine options: Iris (Recommended default, V2) & Reducto (soon to be deprecated)
Feature Comparison
| Feature | Iris (✅ Recommended) | Reducto (⚠️ Soon to be Deprecated) |
|---|---|---|
| Engine | Scale proprietary OCR (V2) | Third-party end-to-end parsing |
| Best For | Custom OCR control, specialized requirements | Low-complexity managed solution, legacy production |
| Performance | Best accuracy and latency across Arabic and English; highly customizable | Wide document support without Iris V2 performance or customization |
| OCR Models | Customisable OCR models (e.g. Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and more) through the SGP model engine | Managed Reducto engine |
| Pipeline Control | Configure layout detection and OCR as needed | Managed end-to-end |
| Languages | All languages, including Arabic | English or Germanic documents |
| Latency | Minimizable through high parallelism (V2) | Somewhat high (acceptable for batch) |
| Transparency | Greater visibility into processes within Temporal | Third-party service integrated into your application |
| Future updates | Active development from the DEX team | No future functionality or updates from the DEX team |
| Typical use case | Maximize accuracy, minimize latency, specialized or custom models | Batch processing where higher latency is acceptable |
Recommended Workflow
For Standard Production Applications
Use Iris as the DEX default model engine:
- SOTA performance: Accuracy optimization, latency minimization, or specialized models
- Customisability: Test any OCR model hosted in SGP and pipeline configurations
- Pipeline control: Configure layout detection and OCR as separate steps
- Language coverage: Best accuracy and latency for Arabic, English, and other non-Germanic languages
- Transparency: Inspect and debug processing stages within Temporal
- Extensibility: Add custom OCR models or layout detectors through adapters
- Future-ready: Benefit from ongoing development and Iris V2 parallelism improvements from the DEX team
Use Reducto (soon to be deprecated) only for existing legacy workloads:
- Managed infrastructure: Rely on managed infrastructure for operational simplicity
- Language fit: Process English or Germanic-language documents where Iris customization is not required
- Low complexity: Use a third-party, end-to-end managed parsing service with minimal pipeline setup
Getting Started
Dex Documentation
Iris Documentation
Summary
- Dex is the platform: DEX runs parsing, extraction, retrieval, and research flows—you choose the underlying engine when you parse documents.
- Iris (V2) is the recommended default: Scale’s proprietary engine with the best Arabic and English performance, customisable OCR models, and full control over layout, OCR, and assembly.
- Choose Iris when you need customization: Optimize accuracy or latency, inspect pipelines in Temporal, use specialized or custom models, and support non-Germanic languages.
- Reducto is legacy and being deprecated: A third-party, managed end-to-end option with no future DEX updates—use only for existing low-complexity or English/Germanic batch workloads.
- Reducto still fits narrow cases: Proven stability, auto-scaling, and acceptable batch latency when you do not need Iris V2 performance or pipeline control.
For help, ask in #dex-help on Slack.
