Getting Started with IRIS

This guide will help you get started with IRIS, Scale’s proprietary OCR capability integrated with Dex. You’ll learn how to parse documents with IRIS through the Dex SDK, work with parse results, and integrate IRIS into your document processing workflows.

Overview

IRIS is Scale’s proprietary OCR capability that provides complete pipeline control for custom document processing needs. When using IRIS through Dex, you get:

Use Any OCR Model: Experiment with different OCR engines to optimize for your specific documents. Dex can use any model hosted in SGP.
Complete Pipeline Control: Configure layout detection, OCR processing, and assembly independently
Unified Document Management: Upload and manage files through Dex’s file system
Async Processing: Non-blocking parse jobs that process documents in the background
Project Organization: Group files and parse results within Dex projects
Extensibility: Add custom OCR models or layout detectors for specialized needs

Prerequisites

Before using IRIS, ensure you have:

Dex SDK installed
SGP Account ID and API Key
Access to a Dex instance

Basic Usage

Initialize Dex Client

import os
from dex_sdk.client import DexClient

# Initialize the Dex client with your SGP credentials
dex_client = DexClient(
    base_url="your-dex-url",
    account_id=os.getenv("SGP_ACCOUNT_ID"),
    api_key=os.getenv("SGP_API_KEY"),
)

# Create a project. Top-level `await` works in notebooks; in a script,
# put these calls inside an `async def` function and run it with asyncio.run().
project = await dex_client.create_project(name="my-ocr-project")

Parse a Document with IRIS

Which IrisParseEngineOptions should I use? A common question when using IRIS is how to get the best OCR results without having to tune every parameter in IrisParseEngineOptions. This section gives quick-start configurations, chosen by requirement, that have performed consistently well across use cases.

Option 1: I only need text, I don’t need bounding boxes.

Suggestion: text_ocr="openai/gpt-5.4" + layout="whole_page". This bypasses the layout detection step and instead passes the full page to the OCR model. It has been shown to perform consistently well across benchmarks.

Open-source suggestion (if you require an open-source model): text_ocr="<path to Qwen/Qwen3.6-27B endpoint>" + layout="whole_page". This is the same configuration as the previous suggestion, but with the open-source Qwen/Qwen3.6-27B model, which has been shown to almost match gpt-5.4 in English and often exceed it in Arabic. Using the Qwen model with IRIS requires a deployed model endpoint available through SGP.

from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams, ParseEngine

# Upload document
dex_file = await project.upload_file("document.pdf")

# Start IRIS parse job
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            layout="whole_page",
            text_ocr="openai/gpt-5.4",  # or "<path to Qwen/Qwen3.6-27B endpoint>" for the open-source suggestion
        ),
    ),
)

# Get parse results
parse_result = await parse_job.get_result()

# Access parsed content
chunks = parse_result.data.content.chunks
for chunk in chunks:
    print(chunk.content)

Option 2: I need bounding boxes.

The default configuration values were designed for this case, so you only need to specify which OCR model to use. The suggestion and the open-source suggestion are the same as in Option 1, except that layout is not specified, so IRIS uses the default layout model, rt_detr_bce.

from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams, ParseEngine

# Upload document
dex_file = await project.upload_file("document.pdf")

# Start IRIS parse job
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            text_ocr="openai/gpt-5.4",  # or "<path to Qwen/Qwen3.6-27B endpoint>" for the open-source suggestion
        ),
    ),
)

# Get parse results
parse_result = await parse_job.get_result()
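The two quick-start configurations above can be captured in a small helper that builds the keyword arguments for IrisParseEngineOptions. This is a minimal sketch: the function name quick_start_iris_options is hypothetical, and it only uses the two parameters and values shown in this guide (layout and text_ocr).

```python
from typing import Optional


def quick_start_iris_options(
    need_bounding_boxes: bool,
    qwen_endpoint: Optional[str] = None,
) -> dict:
    """Build keyword arguments for IrisParseEngineOptions (hypothetical helper).

    need_bounding_boxes: False -> Option 1 (whole_page layout, text only);
                         True  -> Option 2 (default layout model, rt_detr_bce).
    qwen_endpoint: path to your deployed Qwen/Qwen3.6-27B endpoint in SGP,
                   or None to use openai/gpt-5.4.
    """
    options = {"text_ocr": qwen_endpoint or "openai/gpt-5.4"}
    if not need_bounding_boxes:
        # Skip layout detection and send the full page to the OCR model.
        options["layout"] = "whole_page"
    # With bounding boxes, leave layout unset so IRIS uses its default model.
    return options
```

Assuming IrisParseEngineOptions accepts these as keyword arguments (as in the examples above), you would pass the result as IrisParseEngineOptions(**quick_start_iris_options(need_bounding_boxes=True)).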

Custom evaluation

To get the highest OCR accuracy on your documents, you can run a custom evaluation workflow on your own data in SGP-Workflows. This covers both evaluating the OCR models available in IRIS and comparing their performance to Reducto. You can evaluate OCR results in two ways:

Against ground truth: measure OCR performance against the ground truth using Dex and eval cards in SGP Compass.
On downstream tasks: measure the impact of the OCR results on your own downstream tasks by plugging those tasks and their evaluation workflow into our SGP-Workflows template in Compass.
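When comparing OCR output against ground truth, a common text-level metric is character error rate (CER): the Levenshtein edit distance between the OCR text and the reference, divided by the reference length. The sketch below is a standalone illustration of that metric, not the scoring used by SGP Compass eval cards; levenshtein and character_error_rate are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """CER = edit distance / reference length; 0.0 means an exact match.

    Convention for an empty reference: 0.0 if the OCR text is also empty, else 1.0.
    """
    if not reference:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, reference) / len(reference)
```

Note that CER can exceed 1.0 when the OCR output is much longer than the reference, for example when a model hallucinates extra text.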

Supported Document Types

IRIS supports various document formats:

PDF documents (.pdf)
Images (.png, .jpg, .jpeg, .tiff)
Scanned documents with printed or handwritten text
Multi-page documents
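Before uploading a directory of mixed files, you may want to skip anything outside the formats listed above. A minimal sketch, assuming a file-extension check is good enough for your inputs: SUPPORTED_EXTENSIONS and split_supported are hypothetical names, and the set mirrors only the extensions listed in this guide.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions listed in this guide; adjust if your Dex version accepts more.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, skipped) by case-insensitive extension."""
    supported: List[str] = []
    skipped: List[str] = []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

You would then upload only the first list and log the second, rather than letting unsupported files fail later in a parse job.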

Parsing PDFs

Basic PDF Parsing

from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams

# Upload PDF file
dex_file = await project.upload_file("path/to/document.pdf")

# Start IRIS parse job
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(
        options=IrisParseEngineOptions(),
    )
)

# Wait for and retrieve parse results
parse_result = await parse_job.get_result()

# Access parsed content
chunks = parse_result.data.content.chunks
for chunk in chunks:
    print(chunk.content)

Complete PDF Example

import asyncio
import os

from dex_core.models.files import FileEntity
from dex_sdk.client import DexClient
from dex_sdk.file import DexFile
from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams


async def parse_pdf_example():
    # Initialize client
    dex_client = DexClient(
        base_url="your-dex-url",
        account_id=os.getenv("SGP_ACCOUNT_ID"),
        api_key=os.getenv("SGP_API_KEY"),
    )

    # Create project
    project = await dex_client.create_project(name="ocr-project")

    # Upload file
    dex_file = await project.upload_file("document.pdf")

    # Verify file upload
    assert isinstance(dex_file, DexFile)
    assert isinstance(dex_file.data, FileEntity)

    # Verify file is in project
    files = await project.list_files()
    assert len(files) > 0, "Project should contain at least one file"

    # Get download URL if needed
    download_url = await dex_file.get_download_url()
    assert download_url.startswith("http")

    # Start IRIS parse job
    parse_job = await dex_file.start_parse_job(
        IrisParseJobParams(
            options=IrisParseEngineOptions(),
        )
    )

    # Get parse results (waits for the job to finish)
    parse_result = await parse_job.get_result()

    # Access parsed content: total characters across all chunks
    chunks = parse_result.data.content.chunks
    content_length = sum(len(chunk.content) for chunk in chunks)
    print(f"Parsed {len(chunks)} chunks, {content_length} characters in total")


if __name__ == "__main__":
    asyncio.run(parse_pdf_example())

Parsing Images

IRIS supports various image formats including PNG, JPG, and TIFF.

Basic Image Parsing

from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams

# Upload image file
dex_file = await project.upload_file("path/to/image.png")

# Start IRIS parse job
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(
        options=IrisParseEngineOptions(),
    )
)

# Get parse results
parse_result = await parse_job.get_result()

# Access chunks
chunks = parse_result.data.content.chunks
print(f"Number of chunks: {len(chunks)}")

Complete Image Example

import asyncio
import os

from dex_core.models.files import FileEntity
from dex_sdk.client import DexClient
from dex_sdk.file import DexFile
from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams


async def parse_image_example():
    dex_client = DexClient(
        base_url="your-dex-url",
        account_id=os.getenv("SGP_ACCOUNT_ID"),
        api_key=os.getenv("SGP_API_KEY"),
    )

    project = await dex_client.create_project(name="ocr-project")

    # Upload PNG file
    dex_file = await project.upload_file("image.png")

    assert isinstance(dex_file, DexFile)
    assert isinstance(dex_file.data, FileEntity)

    # Verify file in project
    assert len(await project.list_files()) > 0

    # Get download URL
    download_url = await dex_file.get_download_url()
    assert download_url.startswith("http")

    # Start parse job
    parse_job = await dex_file.start_parse_job(
        IrisParseJobParams(
            options=IrisParseEngineOptions(),
        )
    )

    # Get results
    parse_result = await parse_job.get_result()

    # Access chunks
    print(f"Number of chunks: {len(parse_result.data.content.chunks)}")


if __name__ == "__main__":
    asyncio.run(parse_image_example())

Uploading Files from Memory

You can also upload files directly from memory streams:

import io

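# Create a stream from bytes you already hold in memory (for example, an
# HTTP response body). Placeholder: these must be the real bytes of a PDF.
file_bytes = b"your file content here"
file_stream = io.BytesIO(file_bytes)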

# Upload from stream
dex_file = await project.upload_file(file_stream, filename="document.pdf")

# Parse as usual
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(
        options=IrisParseEngineOptions(),
    )
)
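When uploading from memory, the filename passed alongside the stream should agree with the actual content. The sketch below infers an extension from well-known file signatures: PDF files begin with %PDF-, PNG with the 8-byte PNG signature, JPEG with FF D8 FF, and TIFF with II*\0 or MM\0*. sniff_extension is a hypothetical helper, and whether Dex relies on the filename extension to pick a parser is an assumption, not something this guide states.

```python
from typing import Optional

# Leading "magic bytes" for the formats listed in Supported Document Types.
_SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"II*\x00", ".tiff"),  # little-endian TIFF
    (b"MM\x00*", ".tiff"),  # big-endian TIFF
]


def sniff_extension(data: bytes) -> Optional[str]:
    """Return a file extension guessed from the leading bytes, or None if unknown."""
    for signature, extension in _SIGNATURES:
        if data.startswith(signature):
            return extension
    return None
```

For example, you could reject bytes for which sniff_extension returns None and otherwise pass filename=f"document{ext}" to upload_file.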

Configuration Options

IrisParseEngineOptions

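Customize IRIS behavior using parse engine options. This guide uses two of them: layout (for example "whole_page" to skip layout detection, or left unset to use the default rt_detr_bce layout model) and text_ocr (the OCR model, such as "openai/gpt-5.4"). The full set of available options depends on your Dex version.

from dex_sdk.types import IrisParseEngineOptions, IrisParseJobParams

# Configure IRIS options
options = IrisParseEngineOptions(
    layout="whole_page",        # omit to use the default layout model (rt_detr_bce)
    text_ocr="openai/gpt-5.4",  # OCR model used for text recognition
)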

# Use custom options
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=options)
)

Understanding Parse Results

Result Structure

IRIS returns parsed content organized into chunks, where each chunk represents a section of the document:

parse_result = await parse_job.get_result()

# Access all chunks
chunks = parse_result.data.content.chunks

# A ParseResult contains a list of ParseChunks. Each chunk has the
# aggregated text plus the underlying ParseBlocks (one per detected
# region) with bounding boxes, page numbers, and region types.

for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk.content[:100]}...")
    for block in chunk.blocks:
        print(f"  [{block.type}] page {block.page_number} "
              f"bbox={block.bbox} conf={block.confidence:.2f}")
        print(f"    {block.content[:80]}...")
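Because every ParseBlock carries a page number, you can rebuild per-page text from the blocks. The sketch below uses a stand-in Block dataclass with only the page_number and content fields referenced above, so it runs without Dex; text_by_page is a hypothetical helper, and it assumes blocks arrive in reading order within each page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Block:
    """Stand-in for a Dex ParseBlock (only the fields this sketch uses)."""
    page_number: int
    content: str


def text_by_page(blocks: Iterable[Block]) -> Dict[int, str]:
    """Join block text per page, preserving the order blocks are given in."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in blocks:
        pages[block.page_number].append(block.content)
    # Sort pages numerically; separate blocks within a page with blank lines.
    return {page: "\n\n".join(parts) for page, parts in sorted(pages.items())}
```

With real results, you would pass the blocks from every chunk, for example text_by_page(b for chunk in parse_result.data.content.chunks for b in chunk.blocks).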

Working with Different Content Types

IRIS detects and processes different kinds of content regions. Each ParseBlock.type is one of the layout labels IRIS detects; the current label set is:

Text-family: text, title, caption, list-item, section-header, page-header, page-footer, formula, footnote
Tables: table
Images / figures: picture

TEXT_LABELS = {
    "text", "title", "caption", "list-item", "section-header",
    "page-header", "page-footer", "formula", "footnote",
}

text_blocks = [
    block
    for chunk in parse_result.data.content.chunks
    for block in chunk.blocks
    if block.type in TEXT_LABELS
]

table_blocks = [
    block
    for chunk in parse_result.data.content.chunks
    for block in chunk.blocks
    if block.type == "table"
]

image_blocks = [
    block
    for chunk in parse_result.data.content.chunks
    for block in chunk.blocks
    if block.type == "picture"
]
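The three comprehensions above each walk every chunk. If you need all three groups, a single pass that maps each label to its family does the same work once. This sketch reuses the label set listed above; LABEL_FAMILY and group_by_family are hypothetical names, and any object with a .type attribute works as a block, which keeps the example runnable without Dex. Labels outside the listed set land in an "other" bucket in case newer Dex versions add labels.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List

TEXT_LABELS = {
    "text", "title", "caption", "list-item", "section-header",
    "page-header", "page-footer", "formula", "footnote",
}

# Map each layout label to one of the three families described above.
LABEL_FAMILY = {label: "text" for label in TEXT_LABELS}
LABEL_FAMILY.update({"table": "table", "picture": "image"})


def group_by_family(blocks: Iterable[Any]) -> Dict[str, List[Any]]:
    """Bucket blocks by label family in one pass; unknown labels go to "other"."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for block in blocks:
        groups[LABEL_FAMILY.get(block.type, "other")].append(block)
    return dict(groups)


# Stand-in blocks; "chart" is a made-up label to show the fallback bucket.
sample = [SimpleNamespace(type=t) for t in ["title", "table", "picture", "text", "chart"]]
groups = group_by_family(sample)
```

With real results, pass the blocks from all chunks in place of sample.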

File Management

List Files in Project

# List all files
files = await project.list_files()

for file in files:
    print(f"File: {file.data.name}")
    print(f"  ID: {file.data.id}")
    print(f"  Size: {file.data.size_bytes} bytes")

Get File Download URL

# Get pre-signed S3 URL for downloading
download_url = await dex_file.get_download_url()

# Use the URL with requests or another HTTP client
import requests

response = requests.get(download_url, timeout=60)
response.raise_for_status()  # fail on 4xx/5xx instead of keeping an error page
file_content = response.content

Check File Details

# Access file metadata
print(f"File name: {dex_file.data.name}")
print(f"File ID: {dex_file.data.id}")
print(f"File size: {dex_file.data.size_bytes} bytes")

# Check if file is a FileEntity
from dex_core.models.files import FileEntity
assert isinstance(dex_file.data, FileEntity)

Error Handling

Handle Upload Errors

try:
    dex_file = await project.upload_file("path/to/file.pdf")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Upload failed: {e}")

Handle Parse Errors

try:
    parse_job = await dex_file.start_parse_job(
        IrisParseJobParams(
            options=IrisParseEngineOptions(),
        )
    )
    parse_result = await parse_job.get_result()
except Exception as e:
    print(f"Parse failed: {e}")
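If some parse failures in your environment are transient (for example, network timeouts), you can retry with exponential backoff instead of failing on the first exception. This is a generic sketch: retry_async is a hypothetical helper, which exception types are worth retrying is an assumption you should narrow for your setup, and retrying start_parse_job itself may create duplicate jobs.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await make_call(), retrying after delays of base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        try:
            # make_call must build a fresh awaitable each time; a coroutine
            # object cannot be awaited twice.
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

For example, parse_result = await retry_async(parse_job.get_result) retries only the wait for results, since calling the bound method creates a new coroutine on each attempt.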

Processing Multiple Documents

Batch Processing

import asyncio

async def process_documents(project, file_paths):
    # Upload all files
    dex_files = []
    for path in file_paths:
        dex_file = await project.upload_file(path)
        dex_files.append(dex_file)

    # Start all parse jobs (they process in the background once started)
    parse_jobs = []
    for dex_file in dex_files:
        job = await dex_file.start_parse_job(
            IrisParseJobParams(options=IrisParseEngineOptions())
        )
        parse_jobs.append(job)

    # Wait for all jobs to complete; results keep the order of file_paths
    return await asyncio.gather(*(job.get_result() for job in parse_jobs))

# Usage
file_paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = await process_documents(project, file_paths)
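The loop above uploads files and starts jobs one at a time. To overlap that work while capping how many requests are in flight, you can wrap each per-file step in an asyncio.Semaphore and run the steps with asyncio.gather. A minimal sketch: gather_bounded and parse_one are hypothetical names, and the limit of 4 is arbitrary; choose a value that respects your Dex instance's rate limits.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


async def gather_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[U]],
    limit: int = 4,
) -> List[U]:
    """Run worker(item) for every item, at most `limit` at a time.

    Results are returned in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> U:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


# Usage with Dex (not executed here):
# async def parse_one(path):
#     dex_file = await project.upload_file(path)
#     job = await dex_file.start_parse_job(
#         IrisParseJobParams(options=IrisParseEngineOptions()))
#     return await job.get_result()
# results = await gather_bounded(file_paths, parse_one, limit=4)
```

A semaphore keeps the code simple: every file still gets its own coroutine, but only `limit` of them run their Dex calls at once.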

Multi-Language Support

IRIS supports OCR in multiple languages, including:

Latin-based languages: English, Spanish, French, German, Italian, Portuguese, etc.
Arabic: With specialized model support
Asian languages: Chinese, Japanese, Korean
Other languages: Russian, Hebrew, Hindi, Thai, and more

The appropriate language models are selected automatically based on the document content.

Best Practices

Use Environment Variables for Credentials

import os

from dex_sdk.types import SGPCredentials

# Load credentials from environment
account_id = os.getenv("SGP_ACCOUNT_ID")
api_key = os.getenv("SGP_API_KEY")

if not account_id or not api_key:
    raise ValueError("SGP credentials not found in environment")

credentials = SGPCredentials(
    account_id=account_id,
    api_key=api_key
)

Organize Files by Project

from dex_sdk.client import DexClient

# Create client using the credentials loaded above
dex_client = DexClient(
    base_url="your-dex-url",
    api_key=credentials.api_key,
    account_id=credentials.account_id,
)

# Create separate projects for different document types
invoices_project = await dex_client.create_project(
    name="invoices"
)

contracts_project = await dex_client.create_project(
    name="contracts"
)

Wait for Parse Completion

# Parse jobs are async - ensure completion before accessing results
parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=IrisParseEngineOptions())
)

# This waits for completion
parse_result = await parse_job.get_result()

# Now safe to access results
chunks = parse_result.data.content.chunks
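get_result() waits until the job finishes. If you prefer to bound how long your code waits, asyncio.wait_for can wrap the call. A sketch, with wait_with_timeout as a hypothetical helper: on timeout, wait_for cancels only the local wait, and whether the parse job keeps running on the Dex side is not covered by this guide.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def wait_with_timeout(awaitable: Awaitable[T], timeout_s: float) -> Optional[T]:
    """Return the awaited value, or None if it does not finish within timeout_s."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the local awaitable at this point.
        return None
```

For example, parse_result = await wait_with_timeout(parse_job.get_result(), timeout_s=600), followed by a check for None before touching parse_result.data.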

Common Use Cases

Extract All Text from Document

def extract_all_text(parse_result):
    """Extract all text content from parse result."""
    all_text = []
    for chunk in parse_result.data.content.chunks:
        all_text.append(chunk.content)
    return "\n\n".join(all_text)

# Usage
parse_result = await parse_job.get_result()
full_text = extract_all_text(parse_result)
print(full_text)

Process Forms and Structured Documents

# IRIS automatically detects and extracts structured content
parse_result = await parse_job.get_result()

# Route each detected region by its layout label (ParseBlock.type).
# process_table and process_text are placeholders for your own handlers.
for chunk in parse_result.data.content.chunks:
    for block in chunk.blocks:
        if block.type == "table":
            # Handle table data
            process_table(block.content)
        elif block.type == "text":
            # Handle text data
            process_text(block.content)

Performance Considerations

Processing Time

Parse job duration depends on:

Document length (number of pages)
Image resolution and quality
Content complexity (tables, mixed layouts)
Selected OCR models

Optimization Tips

Batch processing: Process multiple documents concurrently when possible
Pre-processing: Ensure documents are properly oriented and of good quality
Project organization: Group related documents in the same project for better management

Next Steps

Now that you understand how to use IRIS:

Review the Introduction to IRIS to learn about the underlying OCR pipeline
Explore the Dex documentation for additional capabilities
Integrate IRIS into your document processing workflows
Use parsed results with Dex’s extraction and vector store features

For questions or support, please contact the Scale AI team.
