
What is the BLEU metric?

The BLEU (Bilingual Evaluation Understudy) metric is an algorithm for automatically evaluating the quality of machine-translated text. It measures the similarity between the machine-generated translation and one or more high-quality human reference translations.

BLEU computes modified (clipped) n-gram precisions between the machine output and the reference(s), typically for n-grams up to length 4, and combines them as a geometric mean. It then multiplies by a brevity penalty that penalizes overly short translations, which would otherwise score well on precision while omitting content present in the references. The score ranges from 0 to 1 (often reported scaled to 0-100), where higher scores indicate closer resemblance to the reference translations. Its reliability improves with multiple, high-quality references.

BLEU is widely used in machine translation research and development to rapidly evaluate and compare the performance of different models or systems during training and experimentation. It provides an efficient, automated benchmark, enabling iterative improvement. However, while useful for system-level comparison, it correlates imperfectly with human judgments of fluency and adequacy and is best used alongside human evaluation.
