Skip to main content
The QueryAST (Query Abstract Syntax Tree) is a SQL-like query language for aggregating and filtering evaluation data in dashboard widgets. This guide covers the complete query syntax and advanced features.

Query Basics

When Are Queries Required?

Queries are required for data-driven widgets:
  • METRIC: Requires a MetricQuery (single aggregation)
  • TABLE: Requires a SeriesQuery (multiple rows/columns)
  • BAR: Requires a SeriesQuery (grouped data)
  • HISTOGRAM: Requires a SeriesQuery (distribution data)
Queries are NOT needed for:
  • MARKDOWN: Static content only
  • HEADING: Text display only

Data Sources

QueryAST can access two data sources for each evaluation item: data.* - User-defined data
  • Custom fields you provided when creating the evaluation
  • Examples: data.input, data.expected_output, data.category, data.score
task_result_cache.* - Task execution results
  • Results from evaluation tasks (chat completions, agent runs, etc.)
  • Examples: task_result_cache.llm_score, task_result_cache.latency_ms, task_result_cache.metadata.confidence
Both data sources support nested fields using dot notation. For example, data.metadata.model_version or task_result_cache.output.reasoning.

Query Structure Overview

A query is a JSON object defining what data to select, how to aggregate it, and what filters to apply:
{
  "select": [...],      // Columns and aggregations to return
  "filter": {...},      // Optional WHERE clause conditions
  "groupBy": [...],     // Optional GROUP BY columns
  "orderBy": [...],     // Optional sorting
  "limit": 100          // Optional row limit
}

Query Types

MetricQuery

Returns a single scalar value. Used exclusively for METRIC widgets. Structure:
{
  "select": [
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "AVG|SUM|COUNT|...",
        "column": "column_name",
        "source": "data|task_result_cache"
      },
      "alias": "optional_alias"
    }
  ],
  "filter": { /* optional */ }
}
Constraints:
  • Select clause must have exactly one item
  • That item must be an AGGREGATION (not a raw COLUMN)
  • No GROUP BY, ORDER BY, or LIMIT allowed

Example: Average Score

Example:
{
  "select": [
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "AVG",
        "column": "score",
        "source": "data"
      },
      "alias": "avg_score"
    }
  ],
  "filter": {
    "conditions": [
      {
        "column": "category",
        "source": "data",
        "operator": "=",
        "value": "test"
      }
    ]
  }
}
SQL Equivalent:
SELECT AVG(data->>'$.score') AS avg_score
FROM evaluation_items
WHERE data->>'$.category' = 'test'

SeriesQuery

Returns multiple rows and columns. Used for TABLE, BAR, and HISTOGRAM widgets. Structure:
{
  "select": [
    // Mix of columns and aggregations
  ],
  "filter": { /* optional */ },
  "groupBy": [ /* optional column names */ ],
  "orderBy": [ /* optional sort specs */ ],
  "limit": 100 // optional
}
Constraints:
  • Select clause must have at least one item
  • Can mix COLUMN and AGGREGATION expressions
  • When using GROUP BY, non-aggregated columns must be in the groupBy list

Example: Performance by Category

Example:
{
  "select": [
    {
      "expression": {
        "type": "COLUMN",
        "column": "category",
        "source": "data"
      }
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "AVG",
        "column": "score",
        "source": "data"
      },
      "alias": "avg_score"
    }
  ],
  "groupBy": ["category"],
  "orderBy": [
    {
      "column": "avg_score",
      "direction": "DESC"
    }
  ],
  "limit": 10
}
SQL Equivalent:
SELECT 
  data->>'$.category' AS category,
  AVG(data->>'$.score') AS avg_score
FROM evaluation_items
GROUP BY data->>'$.category'
ORDER BY avg_score DESC
LIMIT 10

SELECT Clause

The SELECT clause defines what columns or aggregations to return.

Column Expression

Reference a raw column from the data without aggregation.
{
  "expression": {
    "type": "COLUMN",
    "column": "category",
    "source": "data"
  },
  "alias": "category_name"  // Optional alias
}
Fields:
  • type: Always “COLUMN”
  • column: Name of the column to select
  • source: “data” or “task_result_cache” (optional, defaults to “data”)
  • alias: Optional name for the result column

Aggregation Expression

Apply an aggregation function to compute a derived value.
{
  "expression": {
    "type": "AGGREGATION",
    "function": "AVG",
    "column": "score",
    "source": "data",
    "params": {}  // Optional parameters for certain functions
  },
  "alias": "average_score"
}
Fields:
  • type: Always “AGGREGATION”
  • function: One of the supported aggregation functions (see below)
  • column: Column to aggregate, or "*" for COUNT(*)
  • source: “data” or “task_result_cache” (optional)
  • params: Function-specific parameters (for PERCENTILE and PERCENTAGE)
  • alias: Optional name for the result

Aggregation Functions

COUNT

Count the number of rows.
{
  "type": "AGGREGATION",
  "function": "COUNT",
  "column": "*"
}
Use Cases:
  • Total number of items
  • Items per category (with GROUP BY)
  • Items matching filter criteria

SUM

Sum all values in a column.
{
  "type": "AGGREGATION",
  "function": "SUM",
  "column": "cost",
  "source": "task_result_cache"
}
Use Cases:
  • Total cost across all items
  • Total tokens used
  • Cumulative metrics

AVG

Calculate the mean value.
{
  "type": "AGGREGATION",
  "function": "AVG",
  "column": "score",
  "source": "data"
}
Use Cases:
  • Average score
  • Mean latency
  • Average confidence

MIN / MAX

Find the minimum or maximum value.
{
  "type": "AGGREGATION",
  "function": "MAX",
  "column": "latency_ms",
  "source": "task_result_cache"
}
Use Cases:
  • Worst-case latency
  • Best score achieved
  • Range calculations

STDDEV / VARIANCE

Calculate standard deviation or variance.
{
  "type": "AGGREGATION",
  "function": "STDDEV",
  "column": "score",
  "source": "data"
}
Use Cases:
  • Measure score consistency
  • Identify high-variance categories
  • Quality control metrics

PERCENTILE

Calculate a specific percentile value.
{
  "type": "AGGREGATION",
  "function": "PERCENTILE",
  "column": "latency_ms",
  "source": "task_result_cache",
  "params": {
    "percentile": 95
  }
}
Parameters:
  • percentile: Number between 0 and 100 (required)
Use Cases:
  • P95/P99 latency tracking
  • Tail performance analysis
  • SLA monitoring

COUNT_DISTINCT

Count unique values in a column.
{
  "type": "AGGREGATION",
  "function": "COUNT_DISTINCT",
  "column": "user_id",
  "source": "data"
}
Use Cases:
  • Number of unique users
  • Distinct categories present
  • Cardinality checks

PERCENTAGE

Calculate percentage of rows matching filter criteria.
{
  "type": "AGGREGATION",
  "function": "PERCENTAGE",
  "column": "*",
  "params": {
    "percentage_filters": {
      "conditions": [
        {
          "column": "score",
          "source": "data",
          "operator": ">=",
          "value": 0.7
        }
      ]
    }
  }
}
Parameters:
  • percentage_filters: Filter object defining which rows to count (required)
Use Cases:
  • Pass/fail rates
  • Percentage above threshold
  • Compliance metrics

Filter Clause (WHERE)

Filter clauses narrow down which rows to include in computations.

Basic Filter

Single condition:
{
  "filter": {
    "conditions": [
      {
        "column": "category",
        "source": "data",
        "operator": "=",
        "value": "quality"
      }
    ]
  }
}

Multiple Conditions

Combine conditions with logical operators:
{
  "filter": {
    "conditions": [
      {
        "column": "score",
        "source": "data",
        "operator": ">=",
        "value": 0.5
      },
      {
        "column": "category",
        "source": "data",
        "operator": "=",
        "value": "production"
      }
    ],
    "logicalOperators": ["AND"]
  }
}
The number of logical operators must be exactly len(conditions) - 1. For 3 conditions, you need 2 operators.

Comparison Operators

OperatorDescriptionExample Value
=Equals"test" or 42
!=Not equals"draft"
>Greater than0.5
<Less than100
>=Greater or equal0.7
<=Less or equal1.0
INIn list["cat1", "cat2"]
NOT INNot in list["archived", "deleted"]
LIKESubstring match"%error%"
NOT LIKENot substring match"%deprecated%"

Complex Filter Example

Combine multiple conditions with AND/OR logic:
{
  "filter": {
    "conditions": [
      {
        "column": "score",
        "source": "data",
        "operator": ">=",
        "value": 0.7
      },
      {
        "column": "category",
        "source": "data",
        "operator": "IN",
        "value": ["quality", "accuracy"]
      },
      {
        "column": "status",
        "source": "data",
        "operator": "!=",
        "value": "archived"
      }
    ],
    "logicalOperators": ["AND", "AND"]
  }
}
This evaluates as: (score >= 0.7) AND (category IN ['quality', 'accuracy']) AND (status != 'archived')
Logical operators are evaluated left-to-right without precedence. The expression A AND B OR C evaluates as ((A AND B) OR C).

GROUP BY Clause

Group rows by one or more columns before applying aggregations.

Single Column Grouping

{
  "select": [
    {
      "expression": {"type": "COLUMN", "column": "category", "source": "data"}
    },
    {
      "expression": {"type": "AGGREGATION", "function": "COUNT", "column": "*"},
      "alias": "count"
    }
  ],
  "groupBy": ["category"]
}
SQL Equivalent:
SELECT 
  data->>'$.category' AS category,
  COUNT(*) AS count
FROM evaluation_items
GROUP BY data->>'$.category'

Multi-Column Grouping

Group by multiple dimensions:
{
  "select": [
    {
      "expression": {"type": "COLUMN", "column": "model", "source": "data"}
    },
    {
      "expression": {"type": "COLUMN", "column": "category", "source": "data"}
    },
    {
      "expression": {"type": "AGGREGATION", "function": "AVG", "column": "score", "source": "data"},
      "alias": "avg_score"
    }
  ],
  "groupBy": ["model", "category"]
}
SQL Equivalent:
SELECT 
  data->>'$.model' AS model,
  data->>'$.category' AS category,
  AVG(data->>'$.score') AS avg_score
FROM evaluation_items
GROUP BY data->>'$.model', data->>'$.category'
When using GROUP BY, all non-aggregated columns in SELECT must appear in the groupBy array.

ORDER BY Clause

Sort results by one or more columns.

Basic Sorting

{
  "orderBy": [
    {
      "column": "avg_score",
      "direction": "DESC"
    }
  ]
}
Directions:
  • ASC: Ascending (smallest to largest)
  • DESC: Descending (largest to smallest)

Multi-Column Sorting

Sort by multiple columns with different directions:
{
  "orderBy": [
    {
      "column": "category",
      "direction": "ASC"
    },
    {
      "column": "avg_score",
      "direction": "DESC"
    }
  ]
}
This sorts by category alphabetically, then by avg_score descending within each category.

LIMIT Clause

Restrict the number of rows returned.
{
  "limit": 20
}
Use Cases:
  • Top-N queries (e.g., “Top 10 Items”)
  • Prevent overwhelming table displays
  • Performance optimization for large datasets
Example: Top 5 Categories
{
  "select": [
    {"expression": {"type": "COLUMN", "column": "category", "source": "data"}},
    {"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score", "source": "data"}, "alias": "avg_score"}
  ],
  "groupBy": ["category"],
  "orderBy": [{"column": "avg_score", "direction": "DESC"}],
  "limit": 5
}

Advanced Examples

Example 1: Multi-Dimensional Performance Table

Break down performance by model and category with multiple statistics.
{
  "select": [
    {
      "expression": {
        "type": "COLUMN",
        "column": "model",
        "source": "data"
      }
    },
    {
      "expression": {
        "type": "COLUMN",
        "column": "category",
        "source": "data"
      }
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "COUNT",
        "column": "*"
      },
      "alias": "count"
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "AVG",
        "column": "score",
        "source": "data"
      },
      "alias": "avg_score"
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "STDDEV",
        "column": "score",
        "source": "data"
      },
      "alias": "stddev_score"
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "MIN",
        "column": "score",
        "source": "data"
      },
      "alias": "min_score"
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "MAX",
        "column": "score",
        "source": "data"
      },
      "alias": "max_score"
    }
  ],
  "groupBy": ["model", "category"],
  "orderBy": [
    {
      "column": "model",
      "direction": "ASC"
    },
    {
      "column": "avg_score",
      "direction": "DESC"
    }
  ]
}

Example 2: Filtered Percentile Calculation

Calculate P95 score for only production items.
{
  "select": [
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "PERCENTILE",
        "column": "latency_ms",
        "source": "task_result_cache",
        "params": {
          "percentile": 95
        }
      },
      "alias": "p95_latency"
    }
  ],
  "filter": {
    "conditions": [
      {
        "column": "environment",
        "source": "data",
        "operator": "=",
        "value": "production"
      },
      {
        "column": "latency_ms",
        "source": "task_result_cache",
        "operator": ">",
        "value": 0
      }
    ],
    "logicalOperators": ["AND"]
  }
}

Example 3: Complex Filter with IN and LIKE

Find items in specific categories containing certain text.
{
  "select": [
    {
      "expression": {"type": "COLUMN", "column": "input", "source": "data"}
    },
    {
      "expression": {"type": "COLUMN", "column": "category", "source": "data"}
    },
    {
      "expression": {"type": "COLUMN", "column": "score", "source": "data"}
    }
  ],
  "filter": {
    "conditions": [
      {
        "column": "category",
        "source": "data",
        "operator": "IN",
        "value": ["quality", "accuracy", "helpfulness"]
      },
      {
        "column": "input",
        "source": "data",
        "operator": "LIKE",
        "value": "%customer support%"
      }
    ],
    "logicalOperators": ["AND"]
  },
  "orderBy": [
    {
      "column": "score",
      "direction": "DESC"
    }
  ],
  "limit": 20
}

Example 4: Pass/Fail Rate by Model

Calculate success rates grouped by model.
{
  "select": [
    {
      "expression": {"type": "COLUMN", "column": "model", "source": "data"}
    },
    {
      "expression": {"type": "AGGREGATION", "function": "COUNT", "column": "*"},
      "alias": "total"
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "PERCENTAGE",
        "column": "*",
        "params": {
          "percentage_filters": {
            "conditions": [
              {
                "column": "score",
                "source": "data",
                "operator": ">=",
                "value": 0.7
              }
            ]
          }
        }
      },
      "alias": "pass_rate"
    }
  ],
  "groupBy": ["model"],
  "orderBy": [
    {
      "column": "pass_rate",
      "direction": "DESC"
    }
  ]
}

Example 5: Nested Data Access

Query nested fields using dot notation.
{
  "select": [
    {
      "expression": {
        "type": "COLUMN",
        "column": "metadata.model_version",
        "source": "data"
      }
    },
    {
      "expression": {
        "type": "AGGREGATION",
        "function": "AVG",
        "column": "output.confidence",
        "source": "task_result_cache"
      },
      "alias": "avg_confidence"
    }
  ],
  "groupBy": ["metadata.model_version"]
}
This accesses:
  • data.metadata.model_version - nested field in user data
  • task_result_cache.output.confidence - nested field in task results

Debugging Failed Queries

If your widget shows computation_status: "failed", check:
  1. Column names: Ensure columns exist in your evaluation data
  2. Source specification: Verify correct source (“data” vs “task_result_cache”)
  3. GROUP BY alignment: All non-aggregated SELECT columns must be in groupBy
  4. Operator/value types: Match operator to value type (IN requires list, LIKE requires string)
  5. Required params: PERCENTILE needs percentile param, PERCENTAGE needs percentage_filters
Check the error_message field in the widget result for specific error details.

Next Steps