
Is zero-shot learning highly dependent on large models?

Zero-shot learning does not strictly require large models, but their emergence has significantly boosted its effectiveness. Traditional methods recognize unseen classes without training examples by exploiting attribute-label relationships or knowledge transfer. Foundation models (LLMs, VLMs), however, substantially outperform these approaches by leveraging vast pre-trained knowledge and strong generalization.
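The traditional attribute-based route can be sketched as follows. This is a minimal toy illustration, not a real system: the class names, attribute vectors, and the stub `predict_attributes` function are all hypothetical, standing in for an attribute predictor trained on seen classes only.

```python
# Each class -- including unseen ones -- is described by a human-defined
# attribute vector (here: striped, four-legged, flies). Illustrative only.
CLASS_ATTRIBUTES = {
    "zebra": [1, 1, 0],
    "horse": [0, 1, 0],
    "eagle": [0, 0, 1],
}

def predict_attributes(image_features):
    """Stand-in for a model trained on *seen* classes that scores each
    attribute independently (identity here, for illustration)."""
    return image_features

def classify_zero_shot(image_features):
    """Map predicted attribute scores to the best-matching class,
    even if that class had no training images."""
    scores = predict_attributes(image_features)
    def match(attrs):
        return sum(s * a for s, a in zip(scores, attrs))
    return max(CLASS_ATTRIBUTES, key=lambda c: match(CLASS_ATTRIBUTES[c]))

# Attribute scores saying "striped, four-legged, does not fly"
# resolve to zebra with zero zebra training examples.
print(classify_zero_shot([0.9, 0.8, 0.1]))  # → zebra
```

The key idea is that the attribute layer acts as a shared intermediate representation, so knowledge learned on seen classes transfers to unseen ones.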

Large models substantially enhance the capacity for meaningful zero-shot inference thanks to their broad stored knowledge and pattern-recognition abilities. Smaller models struggle to generalize without task-specific tuning; zero-shot use is not impossible for them, but performance is typically inferior. Results correlate strongly with model scale and pre-training quality, and remain sensitive to prompt design.

Large models enable practical, high-performing ZSL applications across NLP (text classification, QA) and vision (object recognition) by working directly from prompts or embeddings. They unlock scalability for tasks with massive or unpredictable label sets where collecting labeled data is infeasible. Achieving optimal results often still involves targeted prompt engineering or lightweight adapter tuning.
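The embedding-based route mentioned above can be sketched like this. The example is a deliberately simplified stand-in: the bag-of-words `embed` function and the label descriptions are hypothetical, replacing what would be a pretrained text encoder (e.g. a sentence-embedding or CLIP-style model) in practice. The structure of the method is the same: score each candidate label by similarity to the input, with no per-label training data.

```python
import math
from collections import Counter

# Hypothetical label descriptions; a real system would embed label
# prompts with a pretrained encoder instead of keyword lists.
LABEL_DESCRIPTIONS = {
    "sports": "game team player score match",
    "finance": "stock market earnings shares economy",
    "cooking": "recipe oven ingredients bake flavor",
}

def embed(text):
    """Toy bag-of-words embedding standing in for a pretrained encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_classify(text, label_descriptions):
    """Pick the label whose description embedding is closest to the
    input embedding -- labels can be added or changed at query time."""
    text_vec = embed(text)
    return max(label_descriptions,
               key=lambda lbl: cosine(text_vec, embed(label_descriptions[lbl])))

print(zero_shot_classify("the stock market rallied on earnings",
                         LABEL_DESCRIPTIONS))  # → finance
```

Because the label set is just data passed at inference time, this pattern scales naturally to the massive or shifting label sets where collecting supervised examples per label is infeasible.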
