Standard SGP evaluations assume that your AI application has one input and one output, optionally with some retrieval chunks attached. This kind of evaluation fits chatbots and retrieval-augmented generation (RAG) applications well. However, as our engineers evaluated agents, we found that complex AI applications often have more than one input or output, or have multiple intermediate steps that need to be evaluated independently. We built Flexible Evaluations from what we learned, to provide a customizable way of evaluating agents and other complex AI applications. Flexible Evaluations enable you to:
- Evaluate applications with multiple inputs and outputs.
- Evaluate applications that have multiple steps, assessing each step independently.
- Surface exactly the right data to human evaluators, so you can increase evaluation velocity.

