DEX is the SDK that runs document parsing, extraction, retrieval and research flows. There are currently two options for the underlying parsing engine.
  1. IRIS: Currently on V2, IRIS is Scale’s proprietary OCR and document extraction model, with the best performance across both Arabic and English texts. It is highly customisable: different models can be selected for the layout and parsing steps, and later versions add high parallelism across pages or sections.
  2. REDUCTO: Currently being deprecated. Reducto is a third-party provider of end-to-end parsing for a wide range of documents, without the performance and customisation available with IRIS V2.
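
To illustrate why processing pages or sections in parallel reduces latency, here is a minimal sketch using only the Python standard library. It is not the Iris V2 implementation: `ocr_page` is a hypothetical stand-in for a per-page OCR call, and the sleep simulates I/O-bound work such as a remote model request.

```python
"""Sketch of page-level parallelism with a hypothetical per-page OCR call."""
import time
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number: int) -> str:
    """Hypothetical stand-in for OCR on one page (simulated I/O wait)."""
    time.sleep(0.05)
    return f"text of page {page_number}"


def parse_sequential(page_count: int) -> list[str]:
    # One page after another: latency grows linearly with page count.
    return [ocr_page(n) for n in range(1, page_count + 1)]


def parse_parallel(page_count: int, max_workers: int = 8) -> list[str]:
    # Pages are independent, so they can be processed concurrently.
    # executor.map returns results in input order, so page order is kept.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(ocr_page, range(1, page_count + 1)))


if __name__ == "__main__":
    for parse in (parse_sequential, parse_parallel):
        start = time.perf_counter()
        pages = parse(8)
        elapsed = time.perf_counter() - start
        print(f"{parse.__name__}: {len(pages)} pages in {elapsed:.2f}s")
```

Both functions return the same ordered text; only the wall-clock time differs, because the parallel version overlaps the per-page waits.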

Quick Decision Guide

Use Iris (Default engine)

Iris (V2) is the recommended default DEX engine for all production use cases:
  • You can maximize accuracy through testing any OCR model uploaded to SGP
  • You can minimize latency through specialized optimization
  • You need greater transparency into the processes within Temporal
  • You can use specialized models that are either open source or built specifically for unique document types
  • You can customise and control pipelines for layout detection, OCR, and assembly

Use Reducto (Soon to be Deprecated)

Reducto is the legacy DEX OCR engine. Consider it only for existing production use cases where:
  • You can accept a third-party service for OCR and parsing integrated into your application
  • You need proven stability and do not need future functionality or updates from the DEX team
  • You’re doing batch processing where somewhat high latency is acceptable
  • You want automatic scaling with your workload
  • You’re processing English or other Germanic-language documents
  • You need a low-complexity, managed solution
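
The guide above can be condensed into a small decision helper. This is a hypothetical sketch: `Requirements`, `choose_engine`, and the language set are assumptions made for illustration, not part of the Dex SDK, and the Germanic-language list is deliberately short.

```python
"""Hypothetical helper encoding the Quick Decision Guide (not a Dex SDK API)."""
from dataclasses import dataclass, field

# Illustrative, non-exhaustive set of Germanic languages.
GERMANIC_LANGUAGES = {"english", "german", "dutch", "swedish", "danish", "norwegian", "icelandic"}


@dataclass
class Requirements:
    languages: set[str] = field(default_factory=lambda: {"english"})
    needs_custom_models: bool = False
    needs_pipeline_control: bool = False
    needs_temporal_visibility: bool = False
    existing_reducto_workload: bool = False


def choose_engine(req: Requirements) -> str:
    # Iris is the default; Reducto is only considered for existing workloads.
    if not req.existing_reducto_workload:
        return "iris"
    # Any Iris-only requirement rules Reducto out.
    if req.needs_custom_models or req.needs_pipeline_control or req.needs_temporal_visibility:
        return "iris"
    # Reducto is positioned for English or other Germanic-language documents.
    if not {lang.lower() for lang in req.languages} <= GERMANIC_LANGUAGES:
        return "iris"
    return "reducto"


if __name__ == "__main__":
    print(choose_engine(Requirements()))  # iris
    print(choose_engine(Requirements(existing_reducto_workload=True)))  # reducto
    print(choose_engine(Requirements(languages={"arabic"}, existing_reducto_workload=True)))  # iris
```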

Understanding the Relationship Between Iris and Dex

Iris and Dex are not mutually exclusive—they’re complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex. Think of it this way: Dex is the platform, Iris is one of the engines. When you use Dex, you choose which OCR engine to use for parsing:
  • Iris: Better accuracy and latency across all languages, including Arabic and English
  • Custom engines: Integrate your own OCR solution
  • Reducto: Legacy, soon to be deprecated
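
Because the OCR engine is a choice made at parse time, engines can be modeled as interchangeable back-ends behind one interface, which is also how a custom engine would slot in. The sketch below assumes a hypothetical `OcrEngine` protocol and placeholder engine classes; none of these names come from the Dex SDK, and a real integration would return structured output rather than a string.

```python
"""Sketch of pluggable OCR back-ends. All names are hypothetical."""
from typing import Callable, Dict, Protocol


class OcrEngine(Protocol):
    def parse(self, document: bytes) -> str:
        ...


class FakeIrisEngine:
    """Placeholder standing in for Iris."""

    def parse(self, document: bytes) -> str:
        return f"iris:{len(document)} bytes"


class MyCustomEngine:
    """Placeholder for a team's own OCR solution."""

    def parse(self, document: bytes) -> str:
        return f"custom:{len(document)} bytes"


# Registry mapping engine names to factories; adding an engine is one entry.
ENGINES: Dict[str, Callable[[], OcrEngine]] = {
    "iris": FakeIrisEngine,
    "custom": MyCustomEngine,
}


def parse_document(document: bytes, engine: str = "iris") -> str:
    # The caller picks the engine by name; the calling code stays the same.
    try:
        factory = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    return factory().parse(document)
```

Defaulting `engine` to `"iris"` mirrors the recommendation that Iris is the default engine.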

What is Iris?

Iris is Scale’s OCR capability that provides a flexible, modular pipeline for extracting text from documents. It’s designed for teams with custom OCR needs who want complete control over the processing pipeline. Iris offers:
  • Customisable OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and any other specialized models uploaded to SGP
  • Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
  • Inspection capabilities: Save and review intermediate results at each processing stage
  • Extensibility: Add custom OCR models through SGP or layout detectors through simple adapters
  • Optimization flexibility: Fine-tune for maximum accuracy or minimum latency based on your needs
  • Custom pipeline experimentation: Test different combinations of layout detectors and OCR models
  • Specialized model development: Build custom OCR for unique document types or languages
  • Performance optimization: Tune for specific accuracy or latency requirements
  • Non-Germanic language optimization: Experiment with models to find the best accuracy for non-Germanic languages such as Arabic
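
The staged design described above (separate layout detection, OCR, and assembly, with swappable models and saved intermediate results) can be pictured with a minimal sketch. Everything in it is hypothetical: `Region`, `Pipeline`, and the toy stages are illustrative stand-ins, not the Iris API, and real stages would operate on page images rather than plain text.

```python
"""Sketch of a swappable layout -> OCR -> assembly pipeline (hypothetical)."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Region:
    kind: str     # e.g. "title" or "paragraph"
    content: str  # toy payload; a real system would hold an image crop


# Each stage is a plain callable, so swapping a model means swapping a function.
LayoutDetector = Callable[[str], List[Region]]
OcrModel = Callable[[Region], str]
Assembler = Callable[[List[Region], List[str]], str]


def blank_line_layout(page: str) -> List[Region]:
    # Toy layout detector: first block is a title, the rest are paragraphs.
    blocks = [b.strip() for b in page.split("\n\n") if b.strip()]
    return [Region("title" if i == 0 else "paragraph", b) for i, b in enumerate(blocks)]


def identity_ocr(region: Region) -> str:
    # Toy "OCR": the input is already text, so return it unchanged.
    return region.content


def markdown_assembler(regions: List[Region], texts: List[str]) -> str:
    # Join recognized text in reading order, marking titles as headings.
    return "\n\n".join(f"# {t}" if r.kind == "title" else t for r, t in zip(regions, texts))


@dataclass
class Pipeline:
    layout: LayoutDetector
    ocr: OcrModel
    assemble: Assembler
    intermediates: Dict[str, object] = field(default_factory=dict)

    def run(self, page: str) -> str:
        regions = self.layout(page)
        self.intermediates["layout"] = regions  # saved for inspection
        texts = [self.ocr(r) for r in regions]
        self.intermediates["ocr"] = texts
        result = self.assemble(regions, texts)
        self.intermediates["assembly"] = result
        return result


if __name__ == "__main__":
    pipe = Pipeline(blank_line_layout, identity_ocr, markdown_assembler)
    print(pipe.run("Invoice\n\nTotal: 42"))
    print(pipe.intermediates["layout"])  # inspect the layout stage output
    # Swap only the OCR stage; layout and assembly are untouched.
    shouting = Pipeline(blank_line_layout, lambda r: r.content.upper(), markdown_assembler)
    print(shouting.run("Invoice\n\nTotal: 42"))
```

Keeping each stage behind a plain callable is what makes experimentation cheap here: comparing two OCR models means constructing two pipelines that differ in one argument, and the saved intermediates show where their outputs diverge.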

What is Dex?

Dex is Scale’s document understanding platform—a production-ready service that transforms unstructured documents into actionable, structured data. It provides:
  • File Management: Secure upload, storage, and retrieval with access control
  • Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
  • Vector Stores: Index and search parsed documents with semantic embeddings
  • Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
  • Project Management: Organize and isolate data with proper authorization
  • Automatic Scaling: Capacity scales with your workload
  • Multiple OCR engine options: Iris (Recommended default, V2) & Reducto (soon to be deprecated)

Feature Comparison

| Feature | Iris (✅ Recommended) | Reducto (⚠️ Soon to be Deprecated) |
|---|---|---|
| Engine | Scale proprietary OCR (V2) | Third-party end-to-end parsing |
| Best For | Custom OCR control, specialized requirements | Low-complexity managed solution, legacy production |
| Performance | Best accuracy and latency across Arabic and English; highly customizable | Wide document support without Iris V2 performance or customization |
| OCR Models | Customisable OCR models (e.g. Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and more) through the SGP model engine | Managed Reducto engine |
| Pipeline Control | Configure layout detection and OCR as needed | Managed end-to-end |
| Languages | All languages, including Arabic | English or other Germanic-language documents |
| Latency | Minimizable through high parallelism (V2) | Somewhat high (acceptable for batch) |
| Transparency | Greater visibility into processes within Temporal | Third-party service integrated into your application |
| Future updates | Active development from the DEX team | No future functionality or updates from the DEX team |
| Typical use case | Maximize accuracy, minimize latency, specialized or custom models | Batch processing where higher latency is acceptable |

For Standard Production Applications

Use Iris as the default DEX engine:
  1. SOTA performance: Accuracy optimization, latency minimization, or specialized models
  2. Customisability: Test any OCR model hosted in SGP and pipeline configurations
  3. Pipeline control: Configure layout detection and OCR as separate steps
  4. Language coverage: Best accuracy and latency for Arabic, English, and other languages, including non-Germanic ones
  5. Transparency: Inspect and debug processing stages within Temporal
  6. Extensibility: Add custom OCR models or layout detectors through adapters
  7. Future-ready: Benefit from ongoing development and Iris V2 parallelism improvements from the DEX team

For Legacy Workloads

Use Reducto within Dex only for existing production workloads that fit these criteria:
  1. Managed infrastructure: Rely on managed infrastructure for operational simplicity
  2. Language fit: Process English or Germanic-language documents where Iris customization is not required
  3. Low complexity: Use a third-party, end-to-end managed parsing service with minimal pipeline setup

Getting Started

Dex Documentation

Iris Documentation

Summary

  • Dex is the platform: DEX runs parsing, extraction, retrieval, and research flows—you choose the underlying engine when you parse documents.
  • Iris (V2) is the recommended default: Scale’s proprietary engine with the best Arabic and English performance, customisable OCR models, and full control over layout, OCR, and assembly.
  • Choose Iris when you need customization: Optimize accuracy or latency, inspect pipelines in Temporal, use specialized or custom models, and support non-Germanic languages.
  • Reducto is legacy and being deprecated: A third-party, managed end-to-end option with no future DEX updates—use only for existing low-complexity or English/Germanic batch workloads.
  • Reducto still fits narrow cases: Proven stability, auto-scaling, and acceptable batch latency when you do not need Iris V2 performance or pipeline control.
Need help deciding? Contact the Dex team at #dex-help on Slack.