Available Metrics

When creating an evaluation run, these five metrics are available out of the box:
  • Bleu - Measures the quality of a translation
  • Rouge - Measures the quality of a summary or translation
  • Meteor - Measures translation quality using semantic matching
  • Cosine Similarity - Assesses similarity between two texts by measuring the distance between their embeddings in vector space
  • F1 Score - Measures token-level precision and recall
[Screenshot: evaluation metrics selection interface]

The fields available for comparison are defined by the dataset's schema. For example, summarization datasets offer document, summary, and expected_summary as comparison choices.

Bleu

Library: nltk.translate.bleu_score.sentence_bleu
Non-configurable parameters:
  • weights - 0.25 for all n-grams
  • tokenizer - nltk.tokenize.word_tokenize
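
A minimal sketch of what this configuration corresponds to in NLTK; the example strings are illustrative, the uniform 1- to 4-gram weights are an interpretation of "0.25 for all n-grams", and the exact call made by the platform may differ:

```python
from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu

# word_tokenize needs NLTK's punkt tokenizer data: nltk.download("punkt")
reference = "the quick brown fox jumps over the lazy dog"   # illustrative ground truth
candidate = "the quick brown fox jumped over the lazy dog"  # illustrative model output

score = sentence_bleu(
    [word_tokenize(reference)],         # list of tokenized references
    word_tokenize(candidate),           # tokenized candidate
    weights=(0.25, 0.25, 0.25, 0.25),   # equal weight for 1- to 4-grams
)
print(score)
```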

Rouge

Library: rouge_score.rouge_scorer
Configurable parameters:
  • score_types: List[str] - defines which ROUGE variants are returned. Defaults to ["rouge1", "rouge2", "rougeL"]
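
A rough sketch of how score_types maps onto the underlying library; the example texts are illustrative and the stemming behavior shown is the library default, not something documented above:

```python
from rouge_score import rouge_scorer

score_types = ["rouge1", "rouge2", "rougeL"]    # default score_types
scorer = rouge_scorer.RougeScorer(score_types)  # use_stemmer defaults to False

# score(target, prediction) returns one Score(precision, recall, fmeasure)
# tuple per requested ROUGE variant.
scores = scorer.score(
    "the cat sat on the mat",   # illustrative reference summary
    "the cat is on the mat",    # illustrative generated summary
)
print(scores["rougeL"].fmeasure)
```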

Meteor

Library: nltk.translate.meteor_score
Non-configurable parameters:
  • stemmer - PorterStemmer
  • wordnet - nltk.corpus.wordnet
  • alpha = 0.9, beta = 3.0, gamma = 0.5
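
A minimal sketch of the equivalent NLTK call; the example texts are illustrative, alpha/beta/gamma are the fixed values listed above, and PorterStemmer plus nltk.corpus.wordnet are the library's own defaults:

```python
from nltk.tokenize import word_tokenize
from nltk.translate.meteor_score import meteor_score

# Requires nltk.download("punkt") and nltk.download("wordnet").
reference = word_tokenize("the cat sat on the mat")   # illustrative reference
hypothesis = word_tokenize("the cat is on the mat")   # illustrative output

# stemmer=PorterStemmer() and wordnet=nltk.corpus.wordnet are the defaults.
score = meteor_score([reference], hypothesis, alpha=0.9, beta=3.0, gamma=0.5)
print(score)
```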

Cosine Similarity

Library: sklearn.metrics.pairwise.cosine_similarity
Non-configurable parameters:
  • embedding model - sentence-transformers/all-MiniLM-L12-v2
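
A sketch of how the two pieces fit together, assuming the texts are embedded with the model above and then compared with scikit-learn; the example strings are illustrative:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Embed both texts with the fixed embedding model, then compare the vectors.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
embeddings = model.encode([
    "the cat sat on the mat",   # illustrative expected answer
    "the cat is on the mat",    # illustrative generated answer
])

score = cosine_similarity([embeddings[0]], [embeddings[1]])[0][0]
print(score)
```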

F1 Score

Matching algorithm: the ground truth and predicted answer are lowercased and tokenized, then tokens are matched exactly, ignoring token order.
Non-configurable parameters:
  • tokenizer - nltk.tokenize.word_tokenize
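
An illustrative sketch of this token-level F1, not the platform's exact implementation; the helper name and example strings are hypothetical:

```python
from collections import Counter
from nltk.tokenize import word_tokenize

# word_tokenize needs NLTK's punkt tokenizer data: nltk.download("punkt")
def token_f1(prediction: str, ground_truth: str) -> float:
    # Lowercase and tokenize both sides, then count overlapping tokens
    # without regard to order.
    pred_tokens = word_tokenize(prediction.lower())
    truth_tokens = word_tokenize(ground_truth.lower())
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat is on the mat", "the cat sat on the mat"))
```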