DEX model engine choices

DEX is the SDK that runs document parsing, extraction, retrieval and research flows. There are currently two options for the underlying parsing engine.

IRIS - Currently on V2, IRIS is Scale’s proprietary OCR and document extraction model, with the best performance across both Arabic and English texts. It is highly customisable, with different models available for the layout and parsing steps, and high parallelism across pages or sections is available in later versions.
REDUCTO - Currently being deprecated. Reducto is a third-party provider that offers end-to-end parsing of a wide range of documents, without the performance and customisation available with IRIS V2.
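
The page-level parallelism mentioned for later IRIS versions can be pictured with a minimal Python sketch. This is not the IRIS implementation: `ocr_page` is a hypothetical stand-in for a per-page OCR call, and pages are modelled as plain strings. It only shows the general idea that independent pages can be processed concurrently and then reassembled in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number: int, page_content: str) -> str:
    """Hypothetical stand-in for a per-page OCR call (not a real IRIS API)."""
    return f"[page {page_number}] {page_content.strip()}"


def parse_document_in_parallel(pages: list[str], max_workers: int = 4) -> list[str]:
    """Run the per-page step concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, even if pages finish out of order.
        return list(executor.map(ocr_page, range(1, len(pages) + 1), pages))


if __name__ == "__main__":
    pages = ["  Invoice header ", "Line items", "Totals  "]
    for line in parse_document_in_parallel(pages):
        print(line)
```

Because `executor.map` preserves input order, reassembly stays trivial. Threads suit I/O-bound work such as calls to a hosted model; CPU-heavy local OCR in Python would usually call for `ProcessPoolExecutor` instead.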

Quick Decision Guide

Use Iris (Default engine)

Iris (V2) is the recommended default DEX engine for all production use cases:

You can maximize accuracy by testing any OCR model uploaded to SGP
You can minimize latency through specialized optimization, such as parallel processing across pages or sections
You need greater transparency into the processes running within Temporal
You can use specialized models that are either open source or built specifically for unique document types
You can **customise and control pipelines** for layout detection, OCR, and assembly

Use Reducto (Soon to be Deprecated)

Reducto is the legacy DEX OCR engine. Consider it only for existing production workloads where:

You can accept a third-party service for OCR and parsing integrated into your application
You need proven stability and do not need future functionality or updates from the DEX team
You’re doing batch processing where somewhat high latency is acceptable
You want automatic scaling with your workload
You’re processing English or Germanic language documents
You need a low-complexity, managed solution
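
The guide above can be condensed into a small decision helper. This is a hypothetical Python sketch, not part of the DEX SDK: `WorkloadProfile` and `choose_engine` are illustrative names, and the fields are a simplified reading of the criteria listed for each engine. Iris is the fallback, and Reducto is returned only when every legacy condition holds.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical workload summary; the field names are illustrative, not a DEX API."""

    germanic_language_only: bool        # e.g. English or German documents only
    needs_custom_models: bool           # specialized or SGP-hosted OCR models
    needs_pipeline_control: bool        # separate layout / OCR / assembly tuning
    latency_sensitive: bool             # interactive use rather than batch processing
    existing_reducto_workload: bool     # already running on Reducto in production


def choose_engine(profile: WorkloadProfile) -> str:
    """Return "iris" by default and "reducto" only for narrow legacy cases."""
    legacy_fit = (
        profile.existing_reducto_workload
        and profile.germanic_language_only
        and not profile.needs_custom_models
        and not profile.needs_pipeline_control
        and not profile.latency_sensitive
    )
    return "reducto" if legacy_fit else "iris"


if __name__ == "__main__":
    # An Arabic-language, latency-sensitive workload goes to Iris.
    arabic_app = WorkloadProfile(
        germanic_language_only=False,
        needs_custom_models=False,
        needs_pipeline_control=False,
        latency_sensitive=True,
        existing_reducto_workload=False,
    )
    print(choose_engine(arabic_app))
```

Keeping Iris as the default branch matches the guide's framing of Reducto as a narrow, soon-to-be-deprecated exception rather than an equal alternative.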

Understanding the Relationship Between Iris and Dex

Iris and Dex are not mutually exclusive—they’re complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex. Think of it this way: Dex is the platform, Iris is one of the engines. When you use Dex, you choose which OCR engine to use for parsing:

Iris: Better accuracy and latency than Reducto for all languages, including Arabic and English
Custom engines: Integrate your own OCR solution
Reducto: Legacy third-party engine (soon to be deprecated)

What is Iris?

Iris is Scale’s OCR capability that provides a flexible, modular pipeline for extracting text from documents. It’s designed for teams with custom OCR needs who want complete control over the processing pipeline. Iris offers:

Customisable OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and any other specialized models uploaded to SGP
Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
Inspection capabilities: Save and review intermediate results at each processing stage
Extensibility: Add custom OCR models through SGP or layout detectors through simple adapters
Optimization flexibility: Fine-tune for maximum accuracy or minimum latency based on your needs
Custom pipeline experimentation: Test different combinations of layout detectors and OCR models
Specialized model development: Build custom OCR for unique document types or languages
Performance optimization: Tune for specific accuracy or latency requirements
Non-Germanic language optimization: Experiment with models to find the best accuracy for languages such as Arabic
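
To make the separate layout, OCR, and assembly stages and the adapter idea concrete, here is a minimal Python sketch. Every name in it (`Region`, `LayoutDetector`, `OcrModel`, `BlankLineLayout`, `PassthroughOcr`, `Pipeline`) is hypothetical, and the toy adapters only manipulate plain text; the actual Iris interfaces may look quite different.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Region:
    """A detected block on a page (hypothetical structure, for illustration)."""

    kind: str  # e.g. "heading" or "paragraph"
    text: str  # raw content the OCR step will read


class LayoutDetector(Protocol):
    def detect(self, page: str) -> list[Region]: ...


class OcrModel(Protocol):
    def read(self, region: Region) -> str: ...


class BlankLineLayout:
    """Toy layout adapter: splits a page on blank lines; the first block is a heading."""

    def detect(self, page: str) -> list[Region]:
        blocks = [b.strip() for b in page.split("\n\n") if b.strip()]
        return [Region("heading" if i == 0 else "paragraph", b) for i, b in enumerate(blocks)]


class PassthroughOcr:
    """Toy OCR adapter: returns the region text with whitespace normalised."""

    def read(self, region: Region) -> str:
        return " ".join(region.text.split())


@dataclass
class Pipeline:
    layout: LayoutDetector
    ocr: OcrModel
    # Intermediate results per stage, kept so each step can be inspected later.
    stages: dict[str, list] = field(default_factory=dict)

    def run(self, page: str) -> str:
        regions = self.layout.detect(page)
        self.stages["layout"] = regions
        texts = [self.ocr.read(r) for r in regions]
        self.stages["ocr"] = texts
        # Assembly: join region outputs in reading order, marking headings.
        assembled = "\n".join(
            f"# {t}" if r.kind == "heading" else t for r, t in zip(regions, texts)
        )
        self.stages["assembly"] = [assembled]
        return assembled


if __name__ == "__main__":
    pipeline = Pipeline(layout=BlankLineLayout(), ocr=PassthroughOcr())
    print(pipeline.run("Invoice  42\n\nTotal due:   100 USD"))
    print(pipeline.stages["layout"])  # inspect the intermediate layout output
```

Swapping in a different OCR model or layout detector only requires another class with the same method, which is the adapter pattern in its simplest form; the `stages` dictionary stands in for saving intermediate results for review.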

What is Dex?

Dex is Scale’s document understanding platform—a production-ready service that transforms unstructured documents into actionable, structured data. It provides:

File Management: Secure upload, storage, and retrieval with access control
Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
Vector Stores: Index and search parsed documents with semantic embeddings
Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
Project Management: Organize and isolate data with proper authorization
Automatic Scaling: Scales automatically with your workload
Multiple OCR engine options: Iris (recommended default, V2) and Reducto (soon to be deprecated)

Feature Comparison

Feature	Iris (✅ Recommended)	Reducto (⚠️ Soon to be Deprecated)
Engine	Scale proprietary OCR (V2)	Third-party end-to-end parsing
Best For	Custom OCR control, specialized requirements	Low-complexity managed solution, legacy production
Performance	Best accuracy and latency across Arabic and English; highly customizable	Wide document support without Iris V2 performance or customization
OCR Models	Customisable OCR models (e.g. Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and more) through the SGP model engine	Managed Reducto engine
Pipeline Control	Configure layout detection and OCR as needed	Managed end-to-end
Languages	All languages, including Arabic	English or Germanic documents
Latency	Minimizable through high parallelism (V2)	Somewhat high (acceptable for batch)
Transparency	Greater visibility into processes within Temporal	Third-party service integrated into your application
Future updates	Active development from the DEX team	No future functionality or updates from the DEX team
Typical use case	Maximize accuracy, minimize latency, specialized or custom models	Batch processing where higher latency is acceptable

Recommended Workflow

For Standard Production Applications

Use Iris as the DEX default model engine:

SOTA performance: Accuracy optimization, latency minimization, or specialized models
Customisability: Test any OCR model hosted in SGP and any pipeline configuration
Pipeline control: Configure layout detection and OCR as separate steps
Language coverage: Best accuracy and latency for Arabic and English, plus room to optimize for other non-Germanic languages
Transparency: Inspect and debug processing stages within Temporal
Extensibility: Add custom OCR models or layout detectors through adapters
Future-ready: Benefit from ongoing development and Iris V2 parallelism improvements from the DEX team

Use Reducto within Dex only for existing legacy workloads that still need managed, low-complexity document processing:

Managed infrastructure: Rely on managed infrastructure for operational simplicity
Language fit: Process English or Germanic-language documents where Iris customization is not required
Low complexity: Use a third-party, end-to-end managed parsing service with minimal pipeline setup

Getting Started

Dex Documentation

Iris Documentation

Summary

Dex is the platform: DEX runs parsing, extraction, retrieval, and research flows—you choose the underlying engine when you parse documents.
Iris (V2) is the recommended default: Scale’s proprietary engine with the best Arabic and English performance, customisable OCR models, and full control over layout, OCR, and assembly.
Choose Iris when you need customization: Optimize accuracy or latency, inspect pipelines in Temporal, use specialized or custom models, and support non-Germanic languages.
Reducto is legacy and being deprecated: A third-party, managed end-to-end option with no future DEX updates—use only for existing low-complexity or English/Germanic batch workloads.
Reducto still fits narrow cases: Proven stability, auto-scaling, and acceptable batch latency when you do not need Iris V2 performance or pipeline control.

Need help deciding? Contact the Dex team at #dex-help on Slack.
