Multimodal · 2024-05-13

GPT-4o
GPT-4o ('omni') is OpenAI's versatile multimodal model for text, audio, and vision.
Overview
GPT-4o ('omni') is OpenAI's flagship multimodal model designed for real-time interaction. It integrates text, vision, and audio natively into a single transformer, allowing for low-latency responses and expressive vocal interactions.
Unique Factor
Native multimodality across all three modes (text, audio, vision) with extremely low latency.
Key Capabilities
● Real-time audio
● Native multimodality
● High speed
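The capabilities above surface through OpenAI's standard Chat Completions API, where a single message can mix text and image content parts. A minimal sketch of such a request payload, assuming a placeholder image URL (sending it would additionally require the official `openai` client and an API key, e.g. `client.chat.completions.create(**payload)`):

```python
# Sketch of a multimodal GPT-4o request payload in the OpenAI
# Chat Completions format. The image URL is a placeholder.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                # Text and image parts travel in the same message,
                # reflecting the model's native multimodality.
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}
print(payload["model"])
```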
Benchmarks
MMLU Score
88.7%
HumanEval (Coding)
90.2%
GPQA Diamond
53.6%
MATH Benchmark
76.6%
Top Use Cases
Real-time voice assistant
Interactive voice conversations with emotional nuance.
Example: “Talk to me about the stars in a soothing voice.”
Learn to Master This Model
Take our free structured GPT-4o course — from basics to advanced techniques.
Technical Specs
Context: 128,000 tokens
Params: unknown
License: Proprietary
Arch: Transformer
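When budgeting prompts against the 128,000-token context window, a quick pre-check can avoid oversized requests. A minimal sketch, assuming a crude ~4-characters-per-token heuristic rather than the model's actual tokenizer, and a hypothetical output reservation of 4,096 tokens:

```python
# Rough token-budget check against GPT-4o's 128,000-token context window.
# The chars/4 estimate is a heuristic for English text, not the real
# tokenizer; for exact counts, use a proper tokenizer library instead.
CONTEXT_WINDOW = 128_000

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    est_tokens = len(text) // 4  # ~4 characters per token on average
    return est_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("hello " * 1_000))   # short prompt easily fits
print(fits_in_context("x" * 600_000))      # ~150k estimated tokens: too big
```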
Prompt Library
Browse Coding Prompts →
Previous Version
GPT-4 Turbo →