This reference documents the Python SDK methods for Scale’s Dex document understanding capability.

DexClient

The main client for interacting with the Dex service.

Project Management

  • create_project(name, configuration) - Create a new project with optional configuration
  • list_projects() - List all accessible projects
  • get_project(project_id) - Retrieve a specific project
  • update_project(project_id, updates) - Update project name, configuration, or status
Example:
import os
from datetime import timedelta
from dex_sdk import DexClient
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

# Initialize client with SGP credentials
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project (credentials are passed via client initialization)
project = await dex_client.create_project(
    name="My Project",
)

# Create project with data retention policy
project = await dex_client.create_project(
    name="My Compliant Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),
            result_artifacts=timedelta(days=7),
        )
    )
)

# Update project configuration
await dex_client.update_project(
    project_id=project.id,
    updates={
        "configuration": ProjectConfiguration(
            retention=RetentionPolicy(
                files=timedelta(days=90),
                result_artifacts=timedelta(days=30),
            )
        )
    }
)
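
The remaining project management methods follow the same pattern; a minimal sketch reusing the client from the example above:
# List all accessible projects
projects = await dex_client.list_projects()

# Retrieve a specific project by ID
project = await dex_client.get_project(project.id)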

Project

Represents a Dex project with isolated data and credentials.

File Operations

  • upload_file(file_path) - Upload a document to the project
  • list_files() - List all uploaded files
  • get_file(file_id) - Get file metadata
  • download_file(file_id) - Download file content
Example:
# Upload a file
dex_file = await project.upload_file("path/to/document.pdf")

# List all files
files = await project.list_files()
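
get_file() and download_file() round out the file operations; a brief sketch (this reference does not specify the return type of download_file, so treating it as raw bytes is an assumption):
# Get file metadata
file_info = await project.get_file(dex_file.id)

# Download file content (assumed here to be raw bytes)
content = await project.download_file(dex_file.id)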

Vector Store Operations

  • create_vector_store(name, engine, embedding_model) - Create a vector store with the SGP Knowledge Base engine
  • list_vector_stores() - List all vector stores
  • get_vector_store(vector_store_id) - Get vector store details
  • delete_vector_store(vector_store_id) - Delete a vector store
Example:
from dex_sdk.types import VectorStoreEngines

vector_store = await project.create_vector_store(
    name="My Vector Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)
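
The remaining vector store methods operate on the store's ID; a minimal sketch:
# List, retrieve, and delete vector stores
stores = await project.list_vector_stores()
store = await project.get_vector_store(vector_store.id)
await project.delete_vector_store(vector_store.id)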

DexFile

Represents an uploaded file in Dex.

Parsing

  • parse(params) - Parse document to structured format
Example:
from dex_sdk.types import (
    ParseEngine,
    ReductoParseJobParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

parse_result = await dex_file.parse(
    ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)

Working with Parse Results

After parsing, you can access the structured content, including chunks and blocks. Example:
# Parse the file
parse_result = await dex_file.parse(parse_params)

# Access metadata
metadata = parse_result.parse_metadata
print(f"Source: {metadata.filename} ({metadata.pages_processed} pages, engine: {parse_result.engine})")

# Access content chunks
for i, chunk in enumerate(parse_result.content.chunks):
    print(f"\nChunk {i}: {chunk.content[:100]}... ({len(chunk.blocks)} blocks)")

    for block in chunk.blocks:
        print(f"  [{block.type}] Page {block.page_number}, "
              f"confidence: {block.confidence:.2f}, "
              f"pos: ({block.bbox.left:.2f}, {block.bbox.top:.2f})")

ParseResult

Represents the result of a document parsing operation.

Extraction

  • extract(extraction_schema, user_prompt, model, generate_citations, generate_confidence) - Extract structured data from the parsed document
Parameters:
  • extraction_schema (BaseModel): Pydantic model class for extraction (pass the class directly, not model_json_schema())
  • user_prompt (str): Natural language instructions for extraction
  • model (str): LLM model to use (e.g., "openai/gpt-4o")
  • generate_citations (bool): Include source citations in results
  • generate_confidence (bool): Include confidence scores in results
Example:
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="Total amount in dollars")
    date: str = Field(description="Invoice date")

extract_result = await parse_result.extract(
    extraction_schema=InvoiceData,
    user_prompt="Extract invoice details from this document.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

Working with Extraction Results

After extraction, you can access the structured data, citations, and confidence scores. Example:
# Extract data
extract_result = await parse_result.extract(
    extraction_schema=InvoiceData,
    user_prompt="Extract invoice details from this document.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

# Access the extraction result
result = extract_result.result

# Access structured data
for field_name, field in result.data.items():
    print(f"{field_name}: {field.value} (confidence: {field.confidence:.2f})")

    if field.citations:
        for cite in field.citations:
            loc = f", pos: ({cite.bbox.left:.2f}, {cite.bbox.top:.2f})" if cite.bbox else ""
            print(f"  → Page {cite.page}: {cite.content[:50]}...{loc}")

# Access usage information
if result.usage_info:
    usage = result.usage_info
    print(f"\nTokens: {usage.total_tokens} total ({usage.prompt_tokens} prompt + {usage.completion_tokens} completion)")

VectorStore

Represents a vector store for semantic search and RAG-enhanced extraction.

Indexing

  • add_parse_results(parse_result_ids) - Add parsed documents to the vector store by parse result ID
  • remove_files(file_ids) - Remove files from the index
Example:
# Add parsed documents to vector store
await vector_store.add_parse_results([parse_result.id])

# Remove files from vector store
await vector_store.remove_files([file_id])

Search

  • search(query, top_k, filters) - Semantic search across all documents in the vector store
  • search_in_file(file_id, query, top_k, filters) - Search within a specific file with optional filters
Example:
# Search across all documents
results = await vector_store.search(
    query="What is the total revenue?",
    top_k=5,
)

# Search within a specific file
file_results = await vector_store.search_in_file(
    file_id=dex_file.id,
    query="What is the total revenue?",
    top_k=5,
    filters=None,
)
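
Each hit is a VectorStoreSearchResult (see Common Types). This reference does not enumerate its fields, so the names used below (score, content) are illustrative assumptions only:
# Iterate over search hits (field names are assumed, not confirmed)
for hit in results:
    print(f"{hit.score:.2f}: {hit.content[:80]}...")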

Extraction

  • extract(extraction_schema, user_prompt, model, generate_citations, generate_confidence) - Extract structured data from the entire vector store using RAG context
Example:
# Extract from vector store with RAG context
extract_result = await vector_store.extract(
    extraction_schema=FinancialData,
    user_prompt="Extract financial summary from all documents.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

Parse Job Parameters

When parsing documents, you can specify different engines and options to customize the parsing behavior.

Reducto Parse Parameters

ReductoParseJobParams - Parameters for the Reducto OCR engine. Best for: English and Latin-script documents (Spanish, French, German, Italian, Portuguese, etc.) with tables, figures, and complex layouts.

Fields:
  • engine (ParseEngine): Set to ParseEngine.REDUCTO
  • options (ReductoParseEngineOptions): Parsing options
  • advanced_options (dict): Advanced options for fine-tuning
  • experimental_options (dict): Experimental features
  • priority (bool): Whether to prioritize this job (default: False)
ReductoParseEngineOptions:
  • chunking (ReductoChunkingOptions | None): Chunking configuration
ReductoChunkingOptions:
  • chunk_mode (ReductoChunkingMethod): Chunking method (default: VARIABLE)
    • DISABLED: No chunking
    • BLOCK: Block-level chunks
    • PAGE: Page-level chunks
    • PAGE_SECTIONS: Page sections
    • SECTION: Section-level chunks
    • VARIABLE: Variable-size chunks based on content
  • chunk_size (int | None): Custom chunk size
Example:
from dex_sdk.types import (
    ParseEngine,
    ReductoParseJobParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

parse_params = ReductoParseJobParams(
    engine=ParseEngine.REDUCTO,
    options=ReductoParseEngineOptions(
        chunking=ReductoChunkingOptions(
            chunk_mode=ReductoChunkingMethod.VARIABLE,
            chunk_size=None,
        )
    ),
    priority=False,
)

parse_result = await dex_file.parse(parse_params)

Iris Parse Parameters

IrisParseJobParams - Parameters for the Iris OCR engine. Best for: Non-English, non-Latin scripts including Arabic, Hebrew, Chinese (CJK), Japanese, Korean, Thai, Hindi, and other Indic languages.

Fields:
  • engine (ParseEngine): Set to ParseEngine.IRIS
  • options (IrisParseEngineOptions): Parsing options
IrisParseEngineOptions:
  • layout (str | None): Layout detection model to use
  • text_ocr (str | None): Text OCR model to use
  • table_ocr (str | None): Table OCR model to use
  • text_prompt (str | None): Custom prompt for text extraction (VLMs only)
  • table_prompt (str | None): Custom prompt for table extraction (VLMs only)
  • left_to_right (bool | None): Sort regions left-to-right instead of right-to-left (default: False)
  • confidence_threshold (float | None): Minimum confidence threshold for layout detection
  • containment_threshold (float | None): Containment threshold for filtering overlapping boxes
Example:
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

parse_params = IrisParseJobParams(
    engine=ParseEngine.IRIS,
    options=IrisParseEngineOptions(
        layout="layout_model_v1",
        text_ocr="text_ocr_v1",
        confidence_threshold=0.5,
    )
)

parse_result = await dex_file.parse(parse_params)

Common Types

This section documents the core data models and types used throughout the Dex SDK.

Type Categories

Importable Types - Types you can import from dex_sdk.types to configure your requests:
  • Configuration types (ProjectConfiguration, RetentionPolicy)
  • Parse parameter types (ReductoParseJobParams, IrisParseJobParams, etc.)
  • Enum types (ParseEngine, ReductoChunkingMethod, VectorStoreEngines)
Response Types - Types returned by the SDK, accessible via the .data attribute on wrapper objects:
  • When you call SDK methods, you get wrapper objects (DexProject, DexFile, DexParseResult, etc.)
  • Access the underlying data via .data: project.data.id, file.data.filename
  • These entities are automatically validated but don’t need to be imported

Configuration Types

ProjectConfiguration

Configuration options for a Dex project.

Import: from dex_sdk.types import ProjectConfiguration

Fields:
  • retention (RetentionPolicy | None): Data retention policy for the project
Example:
from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

config = ProjectConfiguration(
    retention=RetentionPolicy(
        files=timedelta(days=30),
        result_artifacts=timedelta(days=7),
    )
)

RetentionPolicy

Defines data retention periods for automatic cleanup of files and processing artifacts.

Import: from dex_sdk.types import RetentionPolicy

Fields:
  • files (timedelta | None): Retention period for uploaded files. Files older than this period are automatically deleted. If None, files are retained indefinitely.
  • result_artifacts (timedelta | None): Retention period for parse results, extraction results, and job artifacts. If None, artifacts are retained indefinitely.
Example:
from datetime import timedelta
from dex_sdk.types import RetentionPolicy

# 30-day file retention, 7-day artifact retention
policy = RetentionPolicy(
    files=timedelta(days=30),
    result_artifacts=timedelta(days=7),
)

# Keep files indefinitely, but clean up artifacts after 14 days
policy = RetentionPolicy(
    files=None,
    result_artifacts=timedelta(days=14),
)
Use Cases:
  • Compliance: Meet regulatory requirements (GDPR, HIPAA, etc.)
  • Cost Management: Automatically clean up old data to reduce storage costs
  • Security: Limit exposure of sensitive documents by enforcing retention limits
Note: The retention period is calculated from the creation time of the file or artifact. Retention policies can be updated at any time using update_project().

ExtractionParameters

Parameters for extraction operations.

Import: from dex_sdk.types import ExtractionParameters

Fields:
  • model (str): LLM model to use (e.g., "openai/gpt-4o")
  • model_kwargs (dict | None): Additional kwargs for the LLM model
  • extraction_schema (dict): JSON schema defining the desired output structure
  • system_prompt (str | None): High-level instructions for the extraction model
  • user_prompt (str | None): Specific hints about the current document
  • generate_citations (bool): Whether to return bounding boxes for extracted values (default: True)
  • generate_confidence (bool): Whether to return confidence scores (default: True)
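Example (a minimal sketch reusing the InvoiceData model defined earlier; note that this type expects a dict schema, unlike extract(), which takes the model class directly):
from dex_sdk.types import ExtractionParameters

params = ExtractionParameters(
    model="openai/gpt-4o",
    extraction_schema=InvoiceData.model_json_schema(),  # JSON schema as a dict
    user_prompt="Extract invoice details from this document.",
    generate_citations=True,
    generate_confidence=True,
)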

Parse Configuration Types

ParseEngine

Enum of available OCR engines.

Import: from dex_sdk.types import ParseEngine

Values:
  • REDUCTO = "reducto"
  • IRIS = "iris"
  • CUSTOM = "custom"

ReductoParseJobParams

Parameters for the Reducto OCR engine.

Import: from dex_sdk.types import ReductoParseJobParams

See the Parse Job Parameters section for detailed usage.

IrisParseJobParams

Parameters for the Iris OCR engine.

Import: from dex_sdk.types import IrisParseJobParams

See the Parse Job Parameters section for detailed usage.

ReductoChunkingMethod

Enum of chunking methods for the Reducto parser.

Import: from dex_sdk.types import ReductoChunkingMethod

Values:
  • DISABLED = "disabled"
  • BLOCK = "block"
  • PAGE = "page"
  • PAGE_SECTIONS = "page_sections"
  • SECTION = "section"
  • VARIABLE = "variable"

ReductoChunkingOptions

Chunking configuration for the Reducto parser.

Import: from dex_sdk.types import ReductoChunkingOptions

Fields:
  • chunk_mode (ReductoChunkingMethod): Chunking method
  • chunk_size (int | None): Custom chunk size

ReductoParseEngineOptions

Options for the Reducto parser.

Import: from dex_sdk.types import ReductoParseEngineOptions

Fields:
  • chunking (ReductoChunkingOptions | None): Chunking configuration

IrisParseEngineOptions

Options for the Iris parser.

Import: from dex_sdk.types import IrisParseEngineOptions

Fields:
  • layout (str | None): Layout detection model
  • text_ocr (str | None): Text OCR model
  • table_ocr (str | None): Table OCR model
  • text_prompt (str | None): Custom prompt for text extraction
  • table_prompt (str | None): Custom prompt for table extraction
  • left_to_right (bool | None): Sort regions left-to-right
  • confidence_threshold (float | None): Minimum confidence threshold
  • containment_threshold (float | None): Containment threshold for filtering

Vector Store Types

VectorStoreEngines

Enum of available vector store engines.

Import: from dex_sdk.types import VectorStoreEngines

Values:
  • SGP_KNOWLEDGE_BASE = "sgp_knowledge_base"

VectorStoreSearchResult

Result from vector store search operations.

Import: from dex_sdk.types import VectorStoreSearchResult

Response Entity Types

These types are returned by SDK methods and accessed via the .data attribute on wrapper objects. You typically don’t need to import these directly.

Working with Response Data

When you call SDK methods, you receive wrapper objects with a .data attribute:
# Create a project
project = await dex_client.create_project(name="My Project")
print(f"Project: {project.data.name} ({project.data.id}) created at {project.data.created_at}")

# Upload a file
dex_file = await project.upload_file("document.pdf")
print(f"File: {dex_file.data.filename} ({dex_file.data.size_bytes} bytes) → {dex_file.data.id}")

# Parse a file
parse_result = await dex_file.parse(parse_params)
metadata = parse_result.data.parse_metadata
print(f"Parsed: {metadata.pages_processed} pages with {parse_result.data.engine}{parse_result.data.id}")

Common Response Entity Fields

ProjectEntity (accessed via project.data):
  • id (str): Project ID with proj_ prefix
  • name (str): Human-readable project name
  • status (str): Project status ("active" or "archived")
  • configuration (ProjectConfiguration | None): Project configuration
  • created_at (datetime): When the project was created
  • archived_at (datetime | None): When the project was archived
FileEntity (accessed via dex_file.data):
  • id (str): File ID with file_ prefix
  • project_id (str): Project ID that the file belongs to
  • filename (str): Original filename
  • size_bytes (int): File size in bytes
  • mime_type (str): MIME type of the file
  • status (str): Current file status
  • created_at (datetime): When the file was uploaded
ParseResultEntity (accessed via parse_result.data):
  • id (str): Parse result ID with pres_ prefix
  • project_id (str): Project ID
  • source_document_id (str): Source document ID that was parsed
  • engine (str): Engine used for parsing
  • parse_metadata (object): Metadata including filename, pages_processed
  • content (object): Parsed content with chunks
  • created_at (datetime): When the parse result was created
ExtractionEntity (accessed via extract_result or in extraction results):
  • id (str): Extraction result ID
  • source_id (str): Source ID that was extracted from
  • result (object): The extraction result with data and usage_info
  • parameters (ExtractionParameters): Parameters used for extraction
  • created_at (datetime): When the extraction was completed
  • processing_time_ms (int | None): Processing time in milliseconds
VectorStoreEntity (accessed via vector_store.data):
  • id (str): Vector store ID with vs_ prefix
  • project_id (str): Project ID
  • name (str): Name of the vector store
  • engine (str): Engine used for vector store
  • created_at (datetime): When the vector store was created
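Example (a short sketch reading entity fields via .data, consistent with the pattern above):
# Inspect vector store metadata
data = vector_store.data
print(f"Vector store: {data.name} (engine: {data.engine}, id: {data.id})")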

Deprecated Types

The following types are deprecated as of version 0.3.2 and should no longer be used:
  • ProjectCredentials - No longer used; credentials are passed to DexClient constructor
  • SGPCredentials - No longer used; credentials are passed to DexClient constructor
See the Changelog for migration instructions.

Error Handling

The SDK raises exceptions for various error conditions. For detailed troubleshooting guidance, see the Troubleshooting Guide.
from dex_sdk.exceptions import DexException

try:
    parse_result = await dex_file.parse(...)
except DexException as e:
    print(f"Error: {e}")

Async/Await Pattern

The Dex SDK is fully async. Use await with all SDK methods:
import asyncio
import os

from dex_sdk import DexClient

async def main():
    # Initialize client with credentials
    dex_client = DexClient(
        base_url="https://dex.sgp.scale.com",
        api_key=os.getenv("SGP_API_KEY"),
        account_id=os.getenv("SGP_ACCOUNT_ID"),
    )

    project = await dex_client.create_project(name="My Project")
    dex_file = await project.upload_file("document.pdf")
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)

# In Jupyter/IPython, where an event loop is already running, await directly:
# await main()

# In a regular Python script:
asyncio.run(main())

See Also