Convert documents into a structured format for extraction. Dex supports multiple parse engines and asynchronous job monitoring.
Parse Document (Default)
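from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

# Parse with the Iris engine and wait for the result
parse_result = await dex_file.parse(
    IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            layout="rt_detr_bce",
            text_ocr="openai/gpt-5.4",
            table_ocr="openai/gpt-5.4",
            confidence_threshold=0.5,
            img_method="description",
        ),
    )
)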
Which OCR Engine?
See "When to choose Iris?" for custom OCR needs.
Async Job Monitoring
New in v0.4.0: Use start_parse_job for better control over async operations and access to SGP traces.
Monitor Parse Job
import asyncio
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
    JobStatus,  # assumed to be exported from dex_sdk.types; adjust if your SDK version differs
)

# Start a parse job (returns immediately)
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=IrisParseJobParams(
        engine=ParseEngine.IRIS,
        options=IrisParseEngineOptions(
            layout="rt_detr_bce"
        )
    )
)

# Monitor job progress until it reaches a terminal status
while parse_job.data.status not in [JobStatus.SUCCEEDED, JobStatus.FAILED]:
    await asyncio.sleep(1)
    await parse_job.refresh()
    print(f"Job status: {parse_job.data.status}")

# Get result
if parse_job.data.status == JobStatus.SUCCEEDED:
    parse_result = await parse_job.get_result()
    print("Parse completed successfully")
else:
    print(f"Parse failed: {parse_job.data.error_message}")
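The loop above polls once per second with no upper bound, so a job that never reaches a terminal status would block forever. The sketch below shows one way to add a timeout and capped exponential backoff. `wait_for_job`, its parameters, and `DemoJob` are hypothetical names used for illustration and are not part of the Dex SDK; the helper only assumes the job object has an async `refresh()` method, as `parse_job` does above.

```python
import asyncio
import time


async def wait_for_job(job, is_done, poll_interval=1.0, max_interval=10.0, timeout=300.0):
    """Poll `job` until `is_done(job)` is true or `timeout` seconds elapse.

    Hypothetical helper, not part of the Dex SDK. `job` only needs an
    async `refresh()` method.
    """
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while not is_done(job):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        await asyncio.sleep(interval)
        await job.refresh()
        # Double the wait after each poll, up to max_interval
        interval = min(interval * 2, max_interval)
    return job


class DemoJob:
    """Stand-in for a parse job: reports SUCCEEDED after a few refreshes."""

    def __init__(self, refreshes_needed):
        self.status = "RUNNING"
        self.refresh_count = 0
        self._refreshes_needed = refreshes_needed

    async def refresh(self):
        self.refresh_count += 1
        if self.refresh_count >= self._refreshes_needed:
            self.status = "SUCCEEDED"


if __name__ == "__main__":
    demo = DemoJob(refreshes_needed=3)
    asyncio.run(wait_for_job(demo, lambda j: j.status == "SUCCEEDED", poll_interval=0.01))
    print(f"Finished after {demo.refresh_count} refreshes")
```

With the job from the example above, the call might look like `await wait_for_job(parse_job, lambda j: j.data.status in (JobStatus.SUCCEEDED, JobStatus.FAILED))`.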
Retrieving SGP Traces for Debugging
import os

from scale_gp_beta import SGPClient

sgp_client = SGPClient(
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Search for job traces, newest first
spans = list(sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": parse_job.data.id},
    parents_only=True,
))

if spans:
    # Fetch every span in the most recent trace for this job
    trace_id = spans[0].trace_id
    all_spans = list(sgp_client.spans.search(trace_ids=[trace_id]))
    for span in all_spans:
        print(f"Span: {span.name}, Duration: {span.duration_ms}ms")
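When a trace contains many spans, sorting them by duration can make the slow stage easier to spot. The following is a minimal sketch that relies only on the `name` and `duration_ms` attributes printed above. `slowest_spans` is a hypothetical helper, and the span names and durations in `demo_spans` are made up for illustration.

```python
from types import SimpleNamespace


def slowest_spans(spans, top_n=5):
    """Return up to `top_n` spans sorted by duration, longest first.

    Spans whose duration is missing (None) are skipped.
    """
    timed = [s for s in spans if getattr(s, "duration_ms", None) is not None]
    return sorted(timed, key=lambda s: s.duration_ms, reverse=True)[:top_n]


# Stand-in spans with invented values; real spans would come from
# sgp_client.spans.search(...)
demo_spans = [
    SimpleNamespace(name="layout", duration_ms=420),
    SimpleNamespace(name="text_ocr", duration_ms=1830),
    SimpleNamespace(name="queued", duration_ms=None),
    SimpleNamespace(name="table_ocr", duration_ms=950),
]

for span in slowest_spans(demo_spans, top_n=2):
    print(f"{span.name}: {span.duration_ms}ms")
```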
Process Multiple Files
import asyncio
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

# Parse all files in parallel
parse_tasks = [
    dex_file.parse(
        IrisParseJobParams(
            engine=ParseEngine.IRIS,
            options=IrisParseEngineOptions(
                layout="rt_detr_bce",  # native region-level chunks (IRIS v2 default)
            ),
        )
    )
    for dex_file in dex_files
]
parse_results = await asyncio.gather(*parse_tasks)
print(f"Parsed {len(parse_results)} documents")
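`asyncio.gather` as used above starts every parse at once and raises on the first failure. For large batches it may be preferable to cap the number of in-flight parses and collect failures per file. The sketch below shows that pattern with `asyncio.Semaphore` and `return_exceptions=True`. `gather_bounded` and `fake_parse` are hypothetical names, not part of the Dex SDK, and `fake_parse` merely stands in for `dex_file.parse(...)`.

```python
import asyncio


async def gather_bounded(factories, limit=5):
    """Run coroutine factories with at most `limit` in flight; keep input order.

    Each result is either the coroutine's return value or the exception it raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            # The coroutine is only created once a slot is free
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories), return_exceptions=True)


async def fake_parse(name):
    """Stand-in for dex_file.parse(...); fails for one input to show error handling."""
    await asyncio.sleep(0.01)
    if name == "broken.pdf":
        raise ValueError(f"could not parse {name}")
    return f"parsed:{name}"


if __name__ == "__main__":
    names = ["a.pdf", "broken.pdf", "c.pdf"]
    results = asyncio.run(
        gather_bounded([lambda n=n: fake_parse(n) for n in names], limit=2)
    )
    for name, result in zip(names, results):
        status = "failed" if isinstance(result, Exception) else "ok"
        print(f"{name}: {status}")
```

The helper takes zero-argument factories rather than coroutine objects so that no parse starts before it acquires a slot. The `n=n` default argument binds each file to its own lambda; without it, every lambda would see the last loop value.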
Multi-Language Support
Dex supports 35+ languages with automatic language detection.
Supported Languages
English has excellent support. More than 35 additional languages are supported, including:
- European: Spanish, French, German, Italian, Portuguese, Dutch, Swedish, Norwegian, Danish, Polish, Russian, Ukrainian
- Asian: Chinese, Japanese, Korean, Thai, Vietnamese, Khmer, Lao
- Middle Eastern: Arabic, Hebrew, Persian, Turkish
- Indian: Hindi, Bengali, Tamil, Telugu, Malayalam, Kannada, Gujarati, Marathi, Punjabi
- And many more…
See the Introduction guide for the complete list.
Next Steps
- Chunking: Choose chunking strategies for your documents
- Extract: Extract structured data from parse results
- Vector Stores: Use vector stores for large documents