This guide walks you through creating your first evaluation dashboard from scratch, adding widgets, and organizing your layout.

Using Sample Data

To follow along with this tutorial, you can use our sample evaluation dataset containing 40 agent evaluation items with realistic scores and metadata. Upload the sample data to your account and create an evaluation with it. Download Sample Data:

Download sample-evaluation.csv

Sample dataset with 40 evaluation items across 4 agents (GPT-4, Claude-3, Gemini-Pro, Llama-3)
Data Structure: The sample data contains evaluation items with this structure:
{
  "id": "eval_001",
  "agent_name": "GPT-4-Turbo-Agent",
  "agent_version": "1.0",
  "judged_evaluation": {
    "overall_score": 87,
    "accuracy_score": 92,
    "relevance_score": 85,
    "coherence_score": 89,
    "helpfulness_score": 84,
    "fluency_score": 91
  },
  "timestamp": "2026-01-15T10:30:00Z",
  "task_type": "question_answering",
  "prompt_category": "technical",
  "response_length": 256,
  "model_temperature": 0.7
}
Key Fields:
  • agent_name: Model being evaluated (GPT-4-Turbo-Agent, Claude-3-Sonnet-Agent, etc.)
  • judged_evaluation.*: Nested scores (overall_score, accuracy_score, relevance_score, coherence_score, helpfulness_score, fluency_score)
  • task_type: Type of task (question_answering, summarization, code_generation, analysis, translation, creative_writing)
  • prompt_category: Category (technical, general, business, language, creative)
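Because the scores are nested under judged_evaluation, it can help to flatten an item before inspecting it locally. The following is a minimal sketch, not part of the SGP SDK: flatten_item is a hypothetical helper, and the dotted key names it produces are only an illustration (the widget queries later in this guide refer to score columns by their bare names, such as overall_score).

```python
from typing import Any


def flatten_item(item: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts, e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in item.items():
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys.
            for sub_key, sub_value in flatten_item(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Trimmed copy of the sample item shown in Data Structure above.
sample_item = {
    "id": "eval_001",
    "agent_name": "GPT-4-Turbo-Agent",
    "judged_evaluation": {"overall_score": 87, "accuracy_score": 92},
    "task_type": "question_answering",
}

flat_item = flatten_item(sample_item)
print(sorted(flat_item))
```

Top-level fields such as agent_name keep their names; only the nested score fields gain the judged_evaluation prefix.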
How to Use:
  1. Download the sample data file linked above
  2. Create a new evaluation via the API or SDK:
from scale_gp_beta import SGPClient
import json

# Using api.dev-gp.scale.com
client = SGPClient(
    api_key="your-api-key",
    account_id="your-account-id",
    environment="development"
)

# Load sample data
with open('sample-evaluation.csv', 'r') as f:
    sample_items = json.load(f)

# Create evaluation with sample data
evaluation = client.evaluations.create(
    name="Agent Performance Comparison",
    data=sample_items
)

print(f"Created evaluation: {evaluation.id}")
The client uses the environment parameter to connect to different Scale GP deployments. Available options: "production", "production-multitenant", "development", "staging", "local". For custom endpoints, use base_url instead.
  3. Follow the rest of this guide to create dashboards and widgets using this evaluation
The examples throughout this guide reference fields from this sample dataset. If using your own data, adjust the column names accordingly.
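The choice between the environment and base_url parameters described above can be sketched as a small helper that assembles the client arguments. This is an illustration under assumptions: build_client_kwargs, the SGP_API_KEY and SGP_ACCOUNT_ID variable names, and the rule that exactly one of environment or base_url is passed are all hypothetical and not taken from the SDK.

```python
from typing import Mapping, Optional

# Environment names listed in this guide.
KNOWN_ENVIRONMENTS = (
    "production", "production-multitenant", "development", "staging", "local",
)


def build_client_kwargs(
    env: Mapping[str, str],
    environment: Optional[str] = None,
    base_url: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for SGPClient (hypothetical helper)."""
    # Assumption: a named environment and a custom endpoint are alternatives.
    if (environment is None) == (base_url is None):
        raise ValueError("pass exactly one of environment or base_url")
    if environment is not None and environment not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")

    # Hypothetical variable names; use whatever your deployment defines.
    kwargs = {
        "api_key": env["SGP_API_KEY"],
        "account_id": env["SGP_ACCOUNT_ID"],
    }
    if environment is not None:
        kwargs["environment"] = environment
    else:
        kwargs["base_url"] = base_url
    return kwargs


demo_env = {"SGP_API_KEY": "your-api-key", "SGP_ACCOUNT_ID": "your-account-id"}
kwargs = build_client_kwargs(demo_env, environment="development")
print(kwargs)
# With the SDK installed: client = SGPClient(**kwargs)
```

In real use you could pass os.environ in place of demo_env so that credentials stay out of source files.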

Prerequisites

Before creating a dashboard, you need either:
  • An existing evaluation with completed results, OR
  • An evaluation group containing multiple evaluations (Coming soon)
If you don’t have an evaluation yet, see Next Gen Evaluation Getting Started to create one.

Step 1: Create a New Dashboard

Via the UI

  1. Navigate to your version of SGP (Dev SGP)
  2. Make sure the evaluation-dashboards-enabled feature flag is enabled for your account
    1. (Instructions to enable the feature flag)
  3. Click the “Dashboards” tab
  4. Click the “New Dashboard” button
[Screenshot: New Dashboard Button]
  5. Fill in the dashboard details:
    • Name: Give your dashboard a descriptive name (e.g., “Model Performance Overview”)
    • Description: Optional description explaining the dashboard’s purpose
    • Tags: Optional tags for organization and filtering
    • Evaluation: Select the evaluation you want to create a dashboard for
  6. Click “Create” to save your dashboard
[Screenshot: New Dashboard Form]

Via the SDK

from scale_gp_beta import SGPClient

client = SGPClient(
    api_key="your-api-key",
    account_id="your-account-id",
    environment="development"
)

# Create dashboard for a single evaluation
dashboard = client.evaluation_dashboards.create(
    name="Demo Dashboard",
    description="A demo dashboard for the demo evaluation",
    evaluation_id="eval-123",
    tags=["demo", "documentation"]
)

Step 2: Add Your First Widget (Metric)

Let’s add a metric widget to display the average score across all evaluation items.

Via the UI

  1. From your dashboard page, click “Add Widget”
  2. Select “Query Value” as the widget type
  3. Configure the widget:
    • Title: “Average Score”
    • Query: Select the average of the “overall_score” column
  4. Click “Add”
[Screenshot: Average Score Widget Form]
[Screenshot: Average Score Widget Result]
Widget results are automatically computed when you create or update a widget. The response includes both the widget configuration and the computed result.

Via the SDK

# Add a metric widget showing average score
widget = client.evaluation_dashboards.widgets.create(
    dashboard_id=dashboard.id,
    title="Average Score",
    type="metric",
    query={
        "select": [
            {
                "expression": {
                    "type": "AGGREGATION",
                    "function": "AVG",
                    "column": "overall_score",
                    "source": "data"
                }
            }
        ]
    }
)

print(f"Computed result: {widget.result.computed_result}")
# Example output (the value depends on your data): {'type': 'metric', 'data': 0.873}
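To sanity-check a metric widget, you can compute the same average locally from the items you uploaded. The sketch below assumes items shaped like the sample data, with scores nested under judged_evaluation. Here average_score is a hypothetical helper, only the first item's score (87) comes from the sample shown earlier, and the other two values are made up for illustration.

```python
from statistics import mean


def average_score(items: list[dict], field: str = "overall_score"):
    """Average one judged_evaluation score across evaluation items."""
    scores = [
        item["judged_evaluation"][field]
        for item in items
        if field in item.get("judged_evaluation", {})
    ]
    if not scores:
        raise ValueError(f"no items contain the score field {field!r}")
    return mean(scores)


# First item mirrors eval_001; the other two scores are illustrative only.
items = [
    {"agent_name": "GPT-4-Turbo-Agent", "judged_evaluation": {"overall_score": 87}},
    {"agent_name": "Claude-3-Sonnet-Agent", "judged_evaluation": {"overall_score": 91}},
    {"agent_name": "GPT-4-Turbo-Agent", "judged_evaluation": {"overall_score": 80}},
]

local_avg = average_score(items)
print(f"Local average overall_score: {local_avg}")
```

If the local figure and the widget result disagree, check first whether both are using the same column and the same set of items.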

Step 3: Add a Chart Widget (Bar Chart)

Now let’s add a bar chart to show the average score for each agent.

Via the UI

  1. Click “Add Widget” again
  2. Select “Bar Chart” as the widget type
  3. Configure the widget:
    • Title: “Score by Agent”
    • Group By: Select “agent_name”
    • Under the Advanced Options
      • Add an aggregation, select “Average” on “overall_score”
  4. Click “Add”
[Screenshot: Score by Agent Bar Chart Form]
[Screenshot: Score by Agent Bar Chart Result]

Via the SDK

# Add a bar chart widget showing average score by agent
widget = client.evaluation_dashboards.widgets.create(
    dashboard_id=dashboard.id,
    title="Score by Agent",
    type="bar",
    query={
        "select": [
            {
                "expression": {
                    "type": "COLUMN",
                    "column": "agent_name",
                    "source": "data"
                }
            },
            {
                "expression": {
                    "type": "AGGREGATION",
                    "function": "AVG",
                    "column": "overall_score",
                    "source": "data"
                }
            }
        ],
        "groupBy": ["agent_name"]
    },
    config={
        "x_column": "agent_name"
    }
)
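If you plan to create several grouped charts, the query dictionary above can be produced by a small builder. This is a sketch: avg_by_query is a hypothetical helper that only reproduces the query shape used in this guide, and whether other columns (for example task_type) are accepted in the same way is an assumption to verify against your deployment.

```python
def avg_by_query(group_column: str, value_column: str, source: str = "data") -> dict:
    """Build a group-by query that averages value_column per group_column."""
    return {
        "select": [
            # The grouping column is selected as-is.
            {"expression": {"type": "COLUMN", "column": group_column, "source": source}},
            # The value column is averaged within each group.
            {
                "expression": {
                    "type": "AGGREGATION",
                    "function": "AVG",
                    "column": value_column,
                    "source": source,
                }
            },
        ],
        "groupBy": [group_column],
    }


# Same query as the "Score by Agent" example above.
agent_query = avg_by_query("agent_name", "overall_score")

# Assumed variation: average accuracy per task type.
task_query = avg_by_query("task_type", "accuracy_score")
print(task_query["groupBy"])
```

The result can be passed as the query argument of client.evaluation_dashboards.widgets.create, with config={"x_column": ...} set to the same grouping column.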

Step 4: Add Section Headers

Use heading widgets to organize your dashboard into logical sections.

Via the UI

  1. Click “Add Widget”
  2. Select “Heading” as the widget type
  3. Configure the widget:
    • Title: “Graphs”
  4. Click “Add”
[Screenshot: Header Widget Form]
[Screenshot: Header Widget Result]

Via the SDK

# Add a heading widget
heading = client.evaluation_dashboards.widgets.create(
    dashboard_id=dashboard.id,
    title="Graphs",
    type="heading"
)

Step 5: Organize and Configure Layout

Reorder Widgets

Arrange widgets in your preferred order by dragging and dropping in the UI, or update the widget order via the API:
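# Reorder widgets - the first ID in the list appears at the top.
# widget1 and widget2 stand for the metric and bar chart widgets from Steps 2 and 3;
# keep each in its own variable, since both examples above assign to `widget`.
client.evaluation_dashboards.update(
    dashboard_id=dashboard.id,
    widget_order=[heading.id, widget1.id, widget2.id]
)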
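When a dashboard has several headings, it may be easier to describe the layout as sections and derive the flat widget_order list from them. The following is a minimal sketch: build_widget_order and all widget IDs shown are hypothetical, and the duplicate check is a local safeguard rather than documented API behavior.

```python
def build_widget_order(sections: list[tuple[str, list[str]]]) -> list[str]:
    """Flatten (heading_id, [widget_ids]) sections into one ordered ID list."""
    order: list[str] = []
    for heading_id, widget_ids in sections:
        order.append(heading_id)   # heading comes first in its section
        order.extend(widget_ids)   # followed by the section's widgets
    duplicates = sorted({wid for wid in order if order.count(wid) > 1})
    if duplicates:
        raise ValueError(f"duplicate widget IDs: {duplicates}")
    return order


# Hypothetical IDs; in practice use heading.id, widget.id, and so on.
widget_order = build_widget_order([
    ("heading-overview", ["widget-average-score"]),
    ("heading-graphs", ["widget-score-by-agent"]),
])
print(widget_order)
```

The resulting list can be passed as widget_order to client.evaluation_dashboards.update.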

Next Steps