Meta: Llama 3.2 11B Vision Instruct

llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Modalities

Input

text, image

Output

text
Pricing
Cost per 1 million tokens
Input: $0.414
Output: $0.414
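
As a rough illustration of how the per-million-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` is hypothetical, and the token counts are assumed to come from the caller; this page does not say how image inputs are converted to tokens, so the sketch takes whatever counts the provider reports.

```python
from decimal import Decimal

# Prices copied from the pricing section above (dollars per 1 million tokens).
INPUT_PRICE_PER_MILLION = Decimal("0.414")
OUTPUT_PRICE_PER_MILLION = Decimal("0.414")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the dollar cost of one request (hypothetical helper).

    Assumes billing is strictly linear in token count, with no minimum
    charge or rounding, which this page does not confirm.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token reply.
    print(estimate_cost(2_000, 500))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift when summed over many requests.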
Model Specs
Context Window: 131,072 tokens
Max Output: 16,384 tokens
Release Date: 2024-09-25
Knowledge Cutoff: 2023-12-31
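
The two limits above can be combined into a simple pre-flight check before sending a request. The sketch below assumes that prompt tokens and generated tokens share the same 131,072-token context window, which is common but not stated on this page; `max_output_budget` is a hypothetical helper name.

```python
# Limits copied from the model specs above.
CONTEXT_WINDOW = 131_072
MAX_OUTPUT = 16_384


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output-token limit that still fits.

    Assumption: prompt and output tokens count against the same
    context window. Returns 0 when the prompt alone fills the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Cap at the model's maximum output, and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    # A 120,000-token prompt leaves less room than the 16,384-token cap.
    print(max_output_budget(120_000))
```

A caller could use the returned value as the output-length limit for a request, or reject the request early when the result is 0.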
Capabilities
Reasoning
Tool Calling
Vision

Last Updated: 2024-09-25

Provider: