This page tracks updates and additions to Scale’s Capabilities documentation.

Latest Updates

Dex Document Understanding: Version 0.3.2

Released: November 2025
Significant updates to the Dex document understanding service and SDK, including authentication changes, data retention policies, and comprehensive documentation improvements.

🔐 Authentication Changes

Breaking Change: The authentication method for Dex has been updated to improve security and simplify the API.
What Changed:
  • SGP credentials are now passed directly to DexClient instead of being stored in project configurations
  • Every API request is now authenticated using your SGP API key and account ID
  • The ProjectCredentials and SGPCredentials types are deprecated and will be removed in a future version
Migration Required:
Old way (deprecated):
import os
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

dex_client = DexClient(base_url="https://dex.sgp.scale.com")

project = await dex_client.create_project(
    name="My Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id=os.getenv("SGP_ACCOUNT_ID"),
            api_key=os.getenv("SGP_API_KEY"),
        ),
    ),
)
New way (version 0.3.2+):
import os
from dex_sdk import DexClient

# Pass credentials when creating the client
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project without credentials parameter
project = await dex_client.create_project(
    name="My Project",
)
Benefits:
  • Enhanced Security: SGP credentials are no longer stored in the Dex database
  • Simpler API: Credentials are set once at client initialization
  • Consistent Authentication: Every request is authenticated the same way
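Because `os.getenv` returns `None` when a variable is unset, a missing credential would otherwise be passed to the client silently. The sketch below is one way to fail fast before constructing the client. `load_dex_client_kwargs` and `REQUIRED_VARS` are hypothetical names, not part of dex_sdk; only the `base_url`, `api_key`, and `account_id` parameters come from the example above.

```python
import os
from typing import Mapping, Optional

# Hypothetical helper, not part of dex_sdk: collects the keyword arguments
# that the 0.3.2+ DexClient example takes, failing fast if one is unset.
REQUIRED_VARS = {"api_key": "SGP_API_KEY", "account_id": "SGP_ACCOUNT_ID"}


def load_dex_client_kwargs(
    env: Optional[Mapping[str, str]] = None,
    base_url: str = "https://dex.sgp.scale.com",
) -> dict:
    """Return DexClient keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way: both are missing.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    kwargs = {name: env[var] for name, var in REQUIRED_VARS.items()}
    kwargs["base_url"] = base_url
    return kwargs


# Usage (requires dex_sdk to be installed):
#   from dex_sdk import DexClient
#   dex_client = DexClient(**load_dex_client_kwargs())
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.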

🗄️ Data Retention Policies

New Feature: Dex now supports configurable data retention policies for automatic lifecycle management of files and processing artifacts.
What’s New:
  • Automatic cleanup: Set retention periods for files and result artifacts to automatically delete data after a specified time
  • Flexible configuration: Configure different retention periods for files vs. processing artifacts
  • Project-level control: Retention policies are configured per project and can be updated at any time
Usage: Configure retention when creating or updating a project:
from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

project = await dex_client.create_project(
    name="My Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),           # Files expire after 30 days
            result_artifacts=timedelta(days=7),  # Parse/extract results expire after 7 days
        )
    )
)
Use Cases:
  • Compliance: Meet regulatory requirements (GDPR, HIPAA) by enforcing data retention limits
  • Cost Management: Reduce storage costs by automatically cleaning up old files and artifacts
  • Security: Minimize data exposure by limiting how long sensitive documents are stored
See the Getting Started with Dex guide for detailed examples.
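To make the effect of the two retention periods concrete, the following sketch computes when an item would become eligible for deletion. It assumes the retention clock starts at the item's creation time, which these notes do not state; `RETENTION`, `expiry_time`, and `is_expired` are illustrative names and do not exist in the SDK.

```python
from datetime import datetime, timedelta, timezone

# Mirrors the retention values from the example above.
RETENTION = {
    "files": timedelta(days=30),
    "result_artifacts": timedelta(days=7),
}


def expiry_time(created_at: datetime, kind: str) -> datetime:
    """Return when an item of the given kind would expire under RETENTION.

    Assumption: retention is measured from creation time.
    """
    return created_at + RETENTION[kind]


def is_expired(created_at: datetime, kind: str, now: datetime) -> bool:
    """Return True once `now` has reached the computed expiry time."""
    return now >= expiry_time(created_at, kind)


uploaded = datetime(2025, 11, 1, tzinfo=timezone.utc)
print(expiry_time(uploaded, "files"))             # 2025-12-01 00:00:00+00:00
print(expiry_time(uploaded, "result_artifacts"))  # 2025-11-08 00:00:00+00:00
```

With these values, parse and extract results would disappear well before the source file does, so anything needed long term should be exported within the shorter window.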

📚 Documentation Improvements

Major Update: Comprehensive expansion of the Dex SDK documentation with detailed type information and practical examples.
API Reference Enhancements:
  • Common Types Section: Added comprehensive documentation for all core data models organized by category:
    • Project Types: ProjectEntity, ProjectStatus, ProjectConfiguration, RetentionPolicy
    • File Types: FileEntity, FileStatus, FileDownloadURL
    • Job Types: JobEntity, JobOperationType, JobStatus
    • Parse Types: ParseResultEntity, ParseResultMetadata, ParseChunk, ParseBlock, BoundingBox, ParseEngine, ReductoChunkingMethod
    • Extraction Types: ExtractionEntity, ExtractionResult, ExtractedField, ExtractionCitation, ExtractionParameters, UsageInfo
    • Vector Store Types: VectorStoreEntity, VectorStoreEngines, VectorStoreChunk, SearchConfig
  • Parse Job Parameters: New section documenting parsing configuration options
Enhanced Examples:
  • Working with Parse Results: How to access parse metadata, iterate through chunks and blocks, access bounding boxes and confidence scores
  • Working with Extraction Results: How to access extracted field values, citations, confidence scores, and token usage information
Updated Documentation:
  • Introduction: Updated File Management section to mention data retention policies
  • Getting Started Guide: Added “Configuring Data Retention” section with practical examples
  • API Reference: Enhanced type documentation and examples

🔧 Type System Updates

ParseResult Naming Changes: The ParseResult classes have been harmonized for consistency:
  • Request objects now end with *Request (e.g., CustomParseResultRequest)
  • Entity objects now end with *Entity (e.g., ParseResultEntity)
Migration:
# Old import (deprecated)
from dex_sdk.types import CustomParseResult

# New import
from dex_sdk.types import CustomParseResultRequest
Improved Type Safety:
  • Fixed typing inconsistencies for entities returned by the Dex API
  • Better type hints for all SDK methods
  • Enhanced autocomplete support in IDEs

How to Update

1. Install/Update SDK:
pip install --upgrade sdk/dex_core-xxx.whl sdk/dex_sdk-xxx.whl
2. Update Your Code:
  1. Add api_key and account_id parameters to DexClient() initialization
  2. Remove credentials parameter from create_project() calls
  3. Remove imports of ProjectCredentials and SGPCredentials
3. Test Your Integration: Run your scripts to ensure they work with the new authentication method.
MCP Integration Status: Authentication via Model Context Protocol (MCP) is still in progress. If you encounter issues, please reach out to the Dex team at #dex.
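The code changes in step 2, along with the CustomParseResult rename from the Type System Updates section, can be located with a small script before testing. This is a minimal sketch and not a tool shipped with the SDK; `find_deprecated` and `scan_tree` are hypothetical names. It only finds the deprecated type names, so `credentials=` arguments to `create_project()` still need a manual check.

```python
import re
from pathlib import Path

# Names this release deprecates, mapped to the action the notes describe.
DEPRECATED = {
    "ProjectCredentials": "remove; pass api_key/account_id to DexClient",
    "SGPCredentials": "remove; pass api_key/account_id to DexClient",
    "CustomParseResult": "rename to CustomParseResultRequest",
}
# \b keeps CustomParseResult from matching inside CustomParseResultRequest.
PATTERN = re.compile(r"\b(" + "|".join(DEPRECATED) + r")\b")


def find_deprecated(source: str):
    """Yield (line_number, name, advice) for each deprecated name in source."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            name = match.group(1)
            yield lineno, name, DEPRECATED[name]


def scan_tree(root: str = "."):
    """Print findings for every .py file under root."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        for lineno, name, advice in find_deprecated(text):
            print(f"{path}:{lineno}: {name} -> {advice}")
```

Calling `scan_tree("path/to/your/project")` prints one line per occurrence; an empty result suggests the imports have already been migrated.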

New Capability: IRIS OCR

Added: October 2024
IRIS is Scale’s OCR capability that transforms document images and PDFs into structured text through an intelligent multi-stage pipeline.

What’s New

Two new documentation pages:
  1. Introduction to IRIS
    • Overview of IRIS OCR capability
    • Three-stage pipeline architecture (layout detection, OCR processing, assembly)
    • 15+ supported OCR models including open-source and vision-language models
    • Multi-language support with specialized Arabic models
    • Common use cases and key advantages
  2. Getting Started with IRIS
    • Comprehensive guide to using IRIS through Dex SDK
    • Prerequisites and setup instructions
    • Parsing PDFs and images with complete examples
    • Configuration options for parse engine
    • Understanding parse results and chunk structure
    • File management and error handling
    • Batch processing examples
    • Multi-language support details
    • Best practices for production use
    • Performance considerations and optimization tips

Key Features

  • Layout-Aware Processing: Automatically detects text, tables, and images before OCR
  • Multiple OCR Engines: Choose from Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and more
  • Table-Specific Processing: Specialized models optimized for accurate table extraction
  • Multi-Language Support: Process documents in 35+ languages including Arabic
  • Dex Integration: Seamless integration with Dex’s document understanding platform
  • Async Processing: Non-blocking parse jobs with project-based organization

How to Access

IRIS is available through the Dex SDK as a parse engine option:
from dex_core.models.parse_job import IrisParseEngineOptions, IrisParseJobParams

parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=IrisParseEngineOptions())
)
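Since parse jobs are started asynchronously, several files can be submitted concurrently. The sketch below shows one possible way to cap the number of in-flight calls with a semaphore. `run_bounded` is a hypothetical, SDK-agnostic helper, and the `dex_files` list in the usage comment is assumed; only `start_parse_job`, `IrisParseJobParams`, and `IrisParseEngineOptions` come from the example above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 4,
) -> List[R]:
    """Run worker over items with at most `limit` calls in flight.

    Results are returned in the same order as the input items.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Usage with the SDK (assumes dex_files is a list of objects like dex_file above):
#   jobs = await run_bounded(
#       dex_files,
#       lambda f: f.start_parse_job(
#           IrisParseJobParams(options=IrisParseEngineOptions())
#       ),
#   )
```

A small limit is a conservative default; the appropriate value depends on service rate limits, which these notes do not specify.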

Configuration Updates

Updated: October 2024
  • Added explicit V5 (beta) version tags to all Capabilities navigation groups
  • Ensures proper scoping of Capabilities documentation to V5
  • Improved navigation organization for better user experience
Affected Sections:
  • Getting Started
  • Document Understanding
  • OCR
  • Workflows

Support and Feedback

Dex-Specific Questions

For questions or issues related to Dex, reach out to the Dex team at #dex.

General Capabilities Documentation

Have suggestions for improving our Capabilities documentation? Please contact the Scale AI team or submit feedback through your account dashboard.