Data suited to zero-shot learning comes with descriptive attributes or semantic representations for every category, including the new, unseen categories that are absent from the training data. Crucially, these unseen categories must be describable in the same semantic space or attribute vocabulary as the training categories.

The core requirement is a well-defined semantic space (built from text descriptions, attribute lists, or knowledge graph relations, for example) that connects seen and unseen classes. Key considerations include how rich and discriminative the descriptions are, whether the semantic space is consistent across all categories, and whether raw data (images, text, etc.) can feasibly be mapped into it. The unseen categories should be distinct from the training classes yet relatable to them through the shared semantics. The underlying model must generalize from patterns in the seen classes to unseen ones by way of this semantic bridge.
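The consistency and discriminativeness considerations can be checked mechanically before any training. The following is a minimal sketch, assuming binary attribute descriptions stored as dictionaries; the class names, attribute names, and the `check_semantic_space` helper are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical attribute descriptions (1 = attribute present, 0 = absent).
SEEN = {
    "tiger":   {"striped": 1, "hooved": 0, "aquatic": 0},
    "horse":   {"striped": 0, "hooved": 1, "aquatic": 0},
    "dolphin": {"striped": 0, "hooved": 0, "aquatic": 1},
}
UNSEEN = {
    "zebra": {"striped": 1, "hooved": 1, "aquatic": 0},
}


def check_semantic_space(seen, unseen):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    all_classes = {**seen, **unseen}

    # Consistency: every class must use the same attribute vocabulary
    # as the seen classes (taken here from the first seen class).
    vocab = set(next(iter(seen.values())))
    for name, attrs in all_classes.items():
        if set(attrs) != vocab:
            problems.append(f"{name}: attribute vocabulary differs from the seen classes")

    # Discriminativeness: no two classes may share an identical description,
    # otherwise the semantics alone cannot tell them apart.
    for (a, attrs_a), (b, attrs_b) in combinations(all_classes.items(), 2):
        if attrs_a == attrs_b:
            problems.append(f"{a} and {b}: identical descriptions")

    return problems


problems = check_semantic_space(SEEN, UNSEEN)
```

A check like this only catches exact duplicates and vocabulary mismatches. Descriptions that are nearly identical can still be hard to separate in practice.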

This approach is valuable for rapidly deploying models that must recognize novel items or concepts for which labeled examples are impractical to obtain. Implementation involves three steps: defining a robust semantic system, training a model that maps seen-class inputs to their semantics, and then inferring unseen classes from their semantic descriptions alone. Typical applications include classifying new retail products from text descriptions and identifying rare species in biodiversity studies.
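The three implementation steps can be sketched on toy data. This is a simplified illustration rather than a production method: it assumes binary attributes, hand-made 3-D feature vectors, and a per-attribute nearest-centroid predictor as a stand-in for a real learned mapping. All class names, data values, and function names are hypothetical.

```python
# Step 1: define the semantic system (one binary signature per class).
ATTRIBUTES = ("striped", "hooved", "aquatic")
SEEN_SIGNATURES = {"tiger": (1, 0, 0), "horse": (0, 1, 0), "dolphin": (0, 0, 1)}
UNSEEN_SIGNATURES = {"zebra": (1, 1, 0), "sea_snake": (1, 0, 1)}

# Toy labeled features, available for seen classes only.
TRAIN = [
    ((0.9, 0.1, 0.1), "tiger"),   ((0.8, 0.2, 0.0), "tiger"),
    ((0.1, 0.9, 0.1), "horse"),   ((0.2, 0.8, 0.0), "horse"),
    ((0.1, 0.1, 0.9), "dolphin"), ((0.0, 0.2, 0.8), "dolphin"),
]


def mean(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_attribute_centroids(train, signatures):
    """Step 2: for each attribute, store (absent, present) feature centroids.

    Assumes every attribute is present in at least one seen class and
    absent in at least one other; otherwise a centroid would be empty.
    """
    centroids = []
    for i in range(len(ATTRIBUTES)):
        present = [x for x, label in train if signatures[label][i] == 1]
        absent = [x for x, label in train if signatures[label][i] == 0]
        centroids.append((mean(absent), mean(present)))
    return centroids


def predict_attributes(x, centroids):
    """Predict each attribute as present if x is closer to its 'present' centroid."""
    return tuple(int(sq_dist(x, pres) < sq_dist(x, absn)) for absn, pres in centroids)


def classify_unseen(x, centroids, unseen_signatures):
    """Step 3: pick the unseen class whose signature best matches the prediction."""
    predicted = predict_attributes(x, centroids)
    return min(
        unseen_signatures,
        key=lambda name: sum(p != s for p, s in zip(predicted, unseen_signatures[name])),
    )


centroids = fit_attribute_centroids(TRAIN, SEEN_SIGNATURES)
label = classify_unseen((0.9, 0.9, 0.1), centroids, UNSEEN_SIGNATURES)  # striped and hooved
```

No zebra or sea snake example is used during fitting. The unseen classes enter only through their attribute signatures at inference time, which is what makes the setup zero-shot.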

What kind of data is suitable for zero-shot learning?
