Capabilities Changelog

This page tracks updates and additions to Scale’s Capabilities documentation.

Latest Updates

Dex Document Understanding: Version 0.3.2

Released: November 2025

Significant updates to the Dex document understanding service and SDK, including authentication changes, data retention policies, and expanded documentation.

🔐 Authentication Changes

Breaking Change: The authentication method for Dex has been updated to improve security and simplify the API.

What Changed:

SGP credentials are now passed directly to DexClient instead of being stored in project configurations
Every API request is now authenticated using your SGP API key and account ID
The ProjectCredentials and SGPCredentials types are deprecated and will be removed in a future version

Migration Required:

Old way (deprecated):

import os
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

dex_client = DexClient(base_url="https://dex.sgp.scale.com")

project = await dex_client.create_project(
    name="My Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id=os.getenv("SGP_ACCOUNT_ID"),
            api_key=os.getenv("SGP_API_KEY"),
        ),
    ),
)

New way (version 0.3.2+):

import os
from dex_sdk import DexClient

# Pass credentials when creating the client
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project without credentials parameter
project = await dex_client.create_project(
    name="My Project",
)

Benefits:

Enhanced Security: SGP credentials are no longer stored in the Dex database
Simpler API: Credentials are set once at client initialization
Consistent Authentication: Every request is authenticated the same way
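
Because `os.getenv` returns `None` when a variable is unset, a missing credential in the new-style initialization may not surface until a request fails. The sketch below shows one way to fail fast before constructing the client. The helper name `load_sgp_credentials` is hypothetical and not part of `dex_sdk`; only the environment variable names and the `api_key` / `account_id` parameter names come from the example above.

```python
import os
from typing import Mapping, Optional


def load_sgp_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read SGP credentials from the environment, failing fast if any are missing.

    Hypothetical helper for illustration; it is not part of dex_sdk.
    """
    env = os.environ if env is None else env
    # Maps DexClient keyword arguments to the environment variables used above
    required = {"api_key": "SGP_API_KEY", "account_id": "SGP_ACCOUNT_ID"}
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in required.items()}
```

The returned dictionary could then be unpacked into the client, for example `DexClient(base_url="https://dex.sgp.scale.com", **load_sgp_credentials())`.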

🗄️ Data Retention Policies

New Feature: Dex now supports configurable data retention policies for automatic lifecycle management of files and processing artifacts.

What’s New:

Automatic cleanup: Set retention periods for files and result artifacts to automatically delete data after a specified time
Flexible configuration: Configure different retention periods for files vs. processing artifacts
Project-level control: Retention policies are configured per project and can be updated at any time

Usage: Configure retention when creating or updating a project:

from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

project = await dex_client.create_project(
    name="My Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),           # Files expire after 30 days
            result_artifacts=timedelta(days=7),  # Parse/extract results expire after 7 days
        )
    )
)
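
To make the retention periods concrete, the following sketch computes when an item would expire under the 30-day and 7-day settings shown above. It uses only the Python standard library. The assumption that the retention clock starts at creation time is ours, and `expiry_time` is a hypothetical helper; the actual cleanup schedule is determined by the Dex service.

```python
from datetime import datetime, timedelta, timezone


def expiry_time(created_at: datetime, retention: timedelta) -> datetime:
    """Return the moment at which an item created at `created_at` would expire.

    Assumes retention is measured from creation time (an assumption, not
    confirmed Dex behavior).
    """
    return created_at + retention


uploaded = datetime(2025, 11, 1, tzinfo=timezone.utc)
file_expiry = expiry_time(uploaded, timedelta(days=30))
artifact_expiry = expiry_time(uploaded, timedelta(days=7))

print(file_expiry.isoformat())      # 2025-12-01T00:00:00+00:00
print(artifact_expiry.isoformat())  # 2025-11-08T00:00:00+00:00
```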

Use Cases:

Compliance: Help meet regulatory requirements (e.g., GDPR, HIPAA) by enforcing data retention limits
Cost Management: Reduce storage costs by automatically cleaning up old files and artifacts
Security: Minimize data exposure by limiting how long sensitive documents are stored

See the Getting Started with Dex guide for detailed examples.

📚 Documentation Improvements

Major Update: Comprehensive expansion of the Dex SDK documentation with detailed type information and practical examples.

API Reference Enhancements:

Common Types Section: Added comprehensive documentation for all core data models organized by category:
- Project Types: ProjectEntity, ProjectStatus, ProjectConfiguration, RetentionPolicy
- File Types: FileEntity, FileStatus, FileDownloadURL
- Job Types: JobEntity, JobOperationType, JobStatus
- Parse Types: ParseResultEntity, ParseResultMetadata, ParseChunk, ParseBlock, BoundingBox, ParseEngine, ReductoChunkingMethod
- Extraction Types: ExtractionEntity, ExtractionResult, ExtractedField, ExtractionCitation, ExtractionParameters, UsageInfo
- Vector Store Types: VectorStoreEntity, VectorStoreEngines, VectorStoreChunk, SearchConfig
Parse Job Parameters: New section documenting parsing configuration options

Enhanced Examples:

Working with Parse Results: How to access parse metadata, iterate through chunks and blocks, access bounding boxes and confidence scores
Working with Extraction Results: How to access extracted field values, citations, confidence scores, and token usage information

Updated Documentation:

Introduction: Updated File Management section to mention data retention policies
Getting Started Guide: Added “Configuring Data Retention” section with practical examples
API Reference: Enhanced type documentation and examples

🔧 Type System Updates

ParseResult Naming Changes: The ParseResult classes have been harmonized for consistency:

Request objects now end with *Request (e.g., CustomParseResultRequest)
Entity objects now end with *Entity (e.g., ParseResultEntity)

Migration:

# Old import (deprecated)
from dex_sdk.types import CustomParseResult

# New import
from dex_sdk.types import CustomParseResultRequest
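
Code that has to run against both older and newer SDK versions during the transition could resolve whichever name is available. The sketch below is a generic pattern built on `importlib`; `import_first_available` is a hypothetical helper, and it assumes the old and new classes are interchangeable for your usage, which this changelog does not state.

```python
import importlib
from typing import Any, Sequence


def import_first_available(module_name: str, candidates: Sequence[str]) -> Any:
    """Return the first attribute named in `candidates` that exists on the module."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"None of {list(candidates)} found in {module_name}")


# Intended use with the SDK (requires dex_sdk to be installed):
# CustomParseResultRequest = import_first_available(
#     "dex_sdk.types", ["CustomParseResultRequest", "CustomParseResult"]
# )
```

Listing the new name first means the deprecated class is used only when the new one is absent.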

Improved Type Safety:

Fixed typing inconsistencies for entities returned by the Dex API
Better type hints for all SDK methods
Enhanced autocomplete support in IDEs

How to Update

1. Install/Update SDK:

pip install --upgrade sdk/dex_core-xxx.whl sdk/dex_sdk-xxx.whl

2. Update Your Code:

Add api_key and account_id parameters to DexClient() initialization
Remove credentials parameter from create_project() calls
Remove imports of ProjectCredentials and SGPCredentials
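
For larger codebases, a quick scan can help locate the spots that need the changes listed above. This is a minimal sketch using simple regular expressions; `find_deprecated_usage` is a hypothetical helper, and the patterns are heuristics that may produce false positives (for example, an unrelated `credentials=` keyword argument).

```python
import re
from typing import List, Tuple

# Patterns derived from the migration steps above; deliberately simple heuristics
DEPRECATED_PATTERNS = {
    "ProjectCredentials": re.compile(r"\bProjectCredentials\b"),
    "SGPCredentials": re.compile(r"\bSGPCredentials\b"),
    "credentials= argument": re.compile(r"\bcredentials\s*="),
}


def find_deprecated_usage(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, label) pairs for lines matching a deprecated pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in DEPRECATED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

Calling `find_deprecated_usage(open("my_script.py").read())` on each of your scripts (the file name here is a placeholder) returns the line numbers to review.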

3. Test Your Integration: Run your scripts to confirm they work with the new authentication method.

MCP Integration Status: Authentication via Model Context Protocol (MCP) is still in progress. If you encounter issues, reach out to the Dex team in the #dex Slack channel.

New Capability: IRIS OCR

Added: October 2024

IRIS is Scale’s OCR capability that transforms document images and PDFs into structured text through an intelligent multi-stage pipeline.

What’s New

Two new documentation pages:

Introduction to IRIS
- Overview of IRIS OCR capability
- Three-stage pipeline architecture (layout detection, OCR processing, assembly)
- 15+ supported OCR models including open-source and vision-language models
- Multi-language support with specialized Arabic models
- Common use cases and key advantages
Getting Started with IRIS
- Comprehensive guide to using IRIS through the Dex SDK
- Prerequisites and setup instructions
- Parsing PDFs and images with complete examples
- Configuration options for the parse engine
- Understanding parse results and chunk structure
- File management and error handling
- Batch processing examples
- Multi-language support details
- Best practices for production use
- Performance considerations and optimization tips

Key Features

Layout-Aware Processing: Automatically detects text, tables, and images before OCR
Multiple OCR Engines: Choose from Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and more
Table-Specific Processing: Specialized models optimized for accurate table extraction
Multi-Language Support: Process documents in 35+ languages including Arabic
Dex Integration: Seamless integration with Dex’s document understanding platform
Async Processing: Non-blocking parse jobs with project-based organization

How to Access

IRIS is available through the Dex SDK as a parse engine option:

from dex_core.models.parse_job import IrisParseEngineOptions, IrisParseJobParams

parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=IrisParseEngineOptions())
)
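
Since `start_parse_job` is awaited, several files can be submitted concurrently with standard `asyncio` tools. The sketch below caps the number of in-flight calls with a semaphore. `start_parse_jobs`, `FakeDexFile`, and the `max_concurrency` default are hypothetical and exist only so the example runs without the SDK; in real use you would pass Dex file objects and `IrisParseJobParams(options=IrisParseEngineOptions())`. Whether the service applies its own rate limits is not covered here.

```python
import asyncio
from typing import Any, List, Sequence


async def start_parse_jobs(
    files: Sequence[Any], params: Any, max_concurrency: int = 4
) -> List[Any]:
    """Call start_parse_job(params) on each file, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def start_one(dex_file: Any) -> Any:
        async with semaphore:
            return await dex_file.start_parse_job(params)

    # asyncio.gather returns results in the same order as the inputs
    return await asyncio.gather(*(start_one(f) for f in files))


class FakeDexFile:
    """Stand-in for a Dex file object so the sketch runs without the SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def start_parse_job(self, params: Any) -> str:
        await asyncio.sleep(0)  # simulate a network round trip
        return f"job-for-{self.name}"


jobs = asyncio.run(
    start_parse_jobs([FakeDexFile("a.pdf"), FakeDexFile("b.pdf")], params=None)
)
print(jobs)  # ['job-for-a.pdf', 'job-for-b.pdf']
```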

Configuration Updates

Updated: October 2024

Navigation Structure Improvements

Added explicit V5 (beta) version tags to all Capabilities navigation groups
Ensures proper scoping of Capabilities documentation to V5
Improved navigation organization for better user experience

Affected Sections:

Getting Started
Document Understanding
OCR
Workflows

Support and Feedback

Dex-Specific Questions

For questions or issues related to Dex:

Slack: #dex
Documentation: Getting Started with Dex
API Reference: Dex SDK API Reference

General Capabilities Documentation

Have suggestions for improving our Capabilities documentation? Please contact the Scale AI team or submit feedback through your account dashboard.
