ui-tars-1.5-7b
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...
Modalities
Input
image, text
Output
text
Pricing
Cost per 1 million tokens
Input
$0.12
Output
$0.24
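As a rough illustration of how the per-million-token prices above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` is hypothetical and not part of any provider API. The sketch assumes that billing is linear in token count and that image inputs are already counted as input tokens; the listing does not say how images are tokenized.

```python
from decimal import Decimal

# Prices taken from the listing above, in USD per 1 million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.12")
OUTPUT_PRICE_PER_MILLION = Decimal("0.24")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of a single request.

    Assumes simple linear pricing with no minimum charge or rounding.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_MILLION * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_PRICE_PER_MILLION * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # 10,000 input tokens and 500 output tokens.
    print(estimate_cost(10_000, 500))
```

Under these assumptions, a request with 10,000 input tokens and 500 output tokens would cost about $0.00132. Decimal is used instead of float so that small per-request amounts add up without binary rounding error.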
Model Specs
Context Window
128,000
Max Output
2,048
Release Date
2025-07-23
Knowledge Cutoff
Capabilities
Reasoning
Tool Calling
Vision
Last Updated: 2026-03-15
Provider: