Meta: Llama 3.2 11B Vision Instruct
llama-3.2-11b-vision-instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Modalities
Input
text, image
Output
text
Pricing
Cost per 1 million tokens
Input
$0.414
Output
$0.414
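As a rough illustration of how the per-million-token rates above translate into a per-request cost, here is a minimal sketch. The helper name `estimate_cost` and the example token counts are hypothetical, not part of any provider API. The sketch also assumes that billing is strictly linear in token count and that image inputs have already been converted to a token count, which this page does not describe.

```python
from decimal import Decimal

# Rates copied from the pricing table above, in USD per 1 million tokens.
INPUT_RATE_PER_MILLION = Decimal("0.414")
OUTPUT_RATE_PER_MILLION = Decimal("0.414")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimate the USD cost of one request.

    Assumes simple linear per-token billing with no minimum charge,
    rounding, or separate image pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_RATE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_RATE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example request: 10,000 input tokens and 500 output tokens.
    print(estimate_cost(10_000, 500))
```

Decimal arithmetic is used here rather than floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.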
Model Specs
Context Window
131,072
Max Output
16,384
Release Date
2024-09-25
Knowledge Cutoff
2023-12-31
Capabilities
Reasoning
Tool Calling
Vision
Last Updated: 2024-09-25
Provider: