Overview
Dex is Scale’s document understanding capability that provides composable primitives for:
- File Management - Secure file upload, storage, and retrieval with fine-grained access control
- Document Parsing - Convert any document (PDFs, DOCX, images, etc.) into structured JSON format with multiple OCR engines
- Vector Stores - Index and search parsed documents with semantic embeddings
- Data Extraction - Extract specific information using custom schemas, prompts, and RAG-enhanced context
- Project Management - Organize and isolate data with proper credential management and authorization
Prerequisites
Before using Dex, ensure you have:
- ✅ A valid Scale account with SGP (Scale GenAI Platform) access
- ✅ Your SGP account ID and API key set as environment variables
- ✅ Python 3.8+ installed
- ✅ Dex SDK installed (see Installation section)
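A quick way to confirm that the credentials are visible to Python before initializing the client is sketched below. The variable names `SGP_ACCOUNT_ID` and `SGP_API_KEY` and the helper `missing_credentials` are assumptions made for illustration; substitute whichever names your SGP setup actually uses.

```python
import os

# Hypothetical variable names; replace with the ones your SGP setup documents.
REQUIRED_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
print(missing_credentials({"SGP_ACCOUNT_ID": "abc123"}))  # ['SGP_API_KEY']
```

Calling `missing_credentials()` with no argument checks the real process environment; an empty list means both values are set.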
Installation
Install Dex SDK from CodeArtifact
With access to Scale CodeArtifact, install the Dex SDK (version 0.3.2 or higher).
Note: Version 0.3.2 introduces a new authentication method. Ensure you update to this version or higher. See the Changelog for details.
Quick Start
1. Initialize Dex Client
2. Create a Project
Projects isolate your data and credentials for tracing, billing, and SGP model calls. Every operation is tied to a project.
Tip: Keep one project per use case or group of related files for clean traceability.
3. Upload a Document
Upload your document to the project. Dex supports PDFs, images, spreadsheets, and more.
4. Parse the Document
Parsing converts your document into a structured format with text, tables, and layout information.
Note: Parsing is asynchronous. The SDK automatically polls for completion.
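For intuition about what polling an asynchronous job involves, here is a generic sketch. It is not the SDK's implementation: `wait_for_completion`, the `get_status` callable, and the status strings are all hypothetical stand-ins, and the job is simulated rather than sent to Dex.

```python
import time


def wait_for_completion(get_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call get_status() until it reports a terminal state or the timeout elapses.

    get_status is any zero-argument callable returning a status string.
    The status names used here are illustrative, not Dex's actual values.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job still '%s' after %.0f seconds" % (status, timeout))
        sleep(interval)


# Simulated job that finishes on the third status check.
statuses = iter(["pending", "running", "completed"])
result = wait_for_completion(lambda: next(statuses), interval=0.0)
print(result)  # completed
```

Because the SDK handles this loop for you, a sketch like this is mainly useful for reasoning about timeouts and failure states in your own code.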
5. Extract Structured Data
Define a schema and extract specific information from your document.
Complete Example
Here’s a complete working example you can copy and run:
Next Steps
Now that you’ve completed the basics, explore these topics:
Learn Advanced Features
- Advanced Features Guide: Vector stores, chunking strategies, batch processing, and more
- Quick Reference: Cheat sheet for common patterns and imports
Deep Dive into the API
- API Reference: Complete SDK documentation with all methods and types
- Troubleshooting Guide: Common issues and solutions
- Changelog: Latest updates and breaking changes
Additional Resources
- REST API: For non-Python integrations, see the REST API Reference
- Support: Questions? Ask in the #dex Slack channel
- Examples: More examples in the Introduction guide
Common Questions
Q: How do I process multiple documents?
A: Upload multiple files to the same project, parse each one, then optionally use vector stores for cross-document search. See Advanced Features.
Q: Can I use a synchronous client?
A: Yes! Use DexSyncClient from dex_sdk for synchronous operations. See Advanced Features.
Q: How do I configure data retention policies?
A: Set retention policies when creating a project. See Advanced Features.
Q: What OCR engines are available?
A: Reducto (for English and other Latin-script documents) and Iris (for non-Latin scripts such as Arabic, Hebrew, CJK, and Indic languages). See API Reference for details.
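The engine split described in this answer can be expressed as a small lookup from a document's declared language to an engine. This is a sketch only: the helper `choose_ocr_engine`, the lowercase engine identifiers, and the deliberately incomplete set of language codes are assumptions, and the SDK may expose engine selection differently.

```python
# ISO 639-1 codes for some languages written in non-Latin scripts.
# The set is illustrative and deliberately incomplete.
NON_LATIN_LANGUAGES = {
    "ar",  # Arabic
    "he",  # Hebrew
    "zh",  # Chinese
    "ja",  # Japanese
    "ko",  # Korean
    "hi",  # Hindi
    "bn",  # Bengali
    "ta",  # Tamil
}


def choose_ocr_engine(language_code):
    """Return a hypothetical engine identifier for a language code."""
    code = language_code.strip().lower().split("-")[0]  # "zh-CN" -> "zh"
    return "iris" if code in NON_LATIN_LANGUAGES else "reducto"


for lang in ("en", "fr", "ar", "zh-CN"):
    print(lang, "->", choose_ocr_engine(lang))
```

Keying the choice on a declared language keeps the decision explicit and easy to audit; documents with mixed or unknown languages would need a different rule.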
