Convert documents to a structured format for extraction. Dex supports multiple parse engines and async job monitoring.

Parse Document (Default)

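from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

# Parse with the Iris engine and wait for the result
parse_result = await dex_file.parse(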
    IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            layout="rt_detr_bce",
            text_ocr="openai/gpt-5.4",
            table_ocr="openai/gpt-5.4",
            confidence_threshold=0.5,
            img_method="description",
        ),
    )
)

Which OCR Engine?

See "When to choose Iris?" for custom OCR needs.

Async Job Monitoring

New in v0.4.0: Use start_parse_job for better control over async operations and access to SGP traces.

Monitor Parse Job

import asyncio
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
    JobStatus,  # assumption: exported from dex_sdk.types; adjust if it lives elsewhere
)

# Start a parse job (returns immediately)
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            layout="rt_detr_bce"
        )
    )
)

# Monitor job progress
while parse_job.data.status not in [JobStatus.SUCCEEDED, JobStatus.FAILED]:
    await asyncio.sleep(1)
    await parse_job.refresh()
    print(f"Job status: {parse_job.data.status}")

# Get result
if parse_job.data.status == JobStatus.SUCCEEDED:
    parse_result = await parse_job.get_result()
    print("Parse completed successfully")
else:
    print(f"Parse failed: {parse_job.data.error_message}")
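The loop above polls once per second and never gives up, so a stuck job would block the caller indefinitely. A minimal sketch of a bounded wait with capped exponential backoff follows. The helper name `wait_for_job` and its parameters are hypothetical and not part of the Dex SDK; it only assumes that the job object exposes `refresh()` and `data.status` as in the snippet above, and that status values are either strings or enum members named `SUCCEEDED` and `FAILED`.

```python
import asyncio
import time

# Assumption: terminal statuses are named SUCCEEDED and FAILED, as in the loop above.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def _status_name(status) -> str:
    """Return an enum member's name, or the value itself if it is a plain string."""
    return getattr(status, "name", str(status))


async def wait_for_job(job, timeout_s: float = 300.0,
                       initial_delay_s: float = 1.0, max_delay_s: float = 15.0):
    """Poll job.refresh() until the job is terminal, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while _status_name(job.data.status) not in TERMINAL_STATUSES:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"Job still {_status_name(job.data.status)} after {timeout_s}s"
            )
        await asyncio.sleep(min(delay, remaining))
        await job.refresh()
        delay = min(delay * 2, max_delay_s)  # double the wait, up to the cap
    return job
```

With this helper, the polling loop could be replaced by `await wait_for_job(parse_job, timeout_s=600)`, followed by the same status check before calling `get_result()`.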

Retrieving SGP Traces for Debugging

import os

from scale_gp_beta import SGPClient

sgp_client = SGPClient(
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Search for job traces
spans = list(sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": parse_job.data.id},
    parents_only=True,
))

if spans:
    trace_id = spans[0].trace_id
    all_spans = list(sgp_client.spans.search(trace_ids=[trace_id]))
    for span in all_spans:
        print(f"Span: {span.name}, Duration: {span.duration_ms}ms")
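When a trace contains many spans, printing them one by one makes the slow step hard to spot. The sketch below aggregates spans by name and sorts them by total duration. `summarize_spans` is a hypothetical helper, not part of `scale_gp_beta`; it relies only on the `name` and `duration_ms` attributes used in the snippet above and treats a missing duration as zero.

```python
from collections import defaultdict


def summarize_spans(spans):
    """Group spans by name; return (name, count, total_ms) rows, slowest first."""
    totals = defaultdict(lambda: [0, 0.0])  # name -> [count, total duration in ms]
    for span in spans:
        # Assumption: unfinished spans may report no duration; count them as 0 ms.
        duration = getattr(span, "duration_ms", None) or 0.0
        entry = totals[span.name]
        entry[0] += 1
        entry[1] += duration
    rows = [(name, count, total) for name, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling `summarize_spans(all_spans)` and printing the first few rows gives a quick view of which stage dominates the job's runtime.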

Process Multiple Files

import asyncio
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

# Parse all files in parallel
parse_tasks = [
    dex_file.parse(
        IrisParseJobParams(
            engine=ParseEngine.IRIS,
            options=IrisParseEngineOptions(
                layout="rt_detr_bce",  # native region-level chunks (IRIS v2 default)
            ),
        )
    )
    for dex_file in dex_files
]
parse_results = await asyncio.gather(*parse_tasks)
print(f"Parsed {len(parse_results)} documents")
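The `asyncio.gather` call above starts every parse at once and raises on the first failure. For larger batches it may be preferable to cap concurrency and collect failures per file. The sketch below shows one way to do that with `asyncio.Semaphore` and `return_exceptions=True`. `parse_all`, `parse_one`, and `max_concurrency` are hypothetical names, not part of the Dex SDK, and the default limit of 5 is an arbitrary placeholder rather than a documented service limit.

```python
import asyncio


async def parse_all(items, parse_one, max_concurrency: int = 5):
    """Run parse_one(item) for every item, at most max_concurrency at a time.

    Returns (successes, failures): lists of (item, result) and (item, exception).
    """
    items = list(items)  # iterated twice below, so materialize it once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:  # waits while max_concurrency parses are running
            return await parse_one(item)

    outcomes = await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )
    successes, failures = [], []
    for item, outcome in zip(items, outcomes):
        if isinstance(outcome, Exception):
            failures.append((item, outcome))
        else:
            successes.append((item, outcome))
    return successes, failures
```

Here `parse_one` would be a small coroutine function that wraps the `dex_file.parse(...)` call shown above, for example `lambda f: f.parse(params)` with `params` built once and reused for every file.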

Multi-Language Support

Dex supports 35+ languages with automatic language detection.

Supported Languages

Germanic languages have excellent support. The 35+ supported languages include:
  • European: Spanish, French, German, Italian, Portuguese, Dutch, Swedish, Norwegian, Danish, Polish, Russian, Ukrainian
  • Asian: Chinese, Japanese, Korean, Thai, Vietnamese, Khmer, Lao
  • Middle Eastern: Arabic, Hebrew, Persian, Turkish
  • Indian: Hindi, Bengali, Tamil, Telugu, Malayalam, Kannada, Gujarati, Marathi, Punjabi
  • And many more…
See the Introduction guide for the complete list.

Next Steps

  • Chunking: Choose chunking strategies for your documents
  • Extract: Extract structured data from parse results
  • Vector Stores: Use vector stores for large documents