ui-tars-1.5-7b

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...

Modalities
Input: image, text
Output: text

Pricing (cost per 1 million tokens)
Input: $0.12
Output: $0.24
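
The per-million-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not an official billing formula: the helper name `estimate_cost` is hypothetical, and it assumes that image inputs are already counted as input tokens, which this listing does not specify.

```python
# Prices taken from the listing above, in USD per 1 million tokens.
INPUT_PRICE_PER_MILLION = 0.12
OUTPUT_PRICE_PER_MILLION = 0.24


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes billing is linear in token counts; how images map to
    input tokens is not stated in the listing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Example: 10,000 input tokens and a full 2,048-token response.
print(f"${estimate_cost(10_000, 2_048):.6f}")  # prints $0.001692
```

At these rates, a request that fills the 2,048-token output limit costs well under a cent unless the input is very large.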
Model Specs
Context Window: 128,000 tokens
Max Output: 2,048 tokens
Release Date: 2025-07-23
Knowledge Cutoff:

Capabilities: Reasoning, Tool Calling, Vision

Last Updated: 2026-03-15

Provider: