OpenAI · Multimodal · Released 2024-05-13

GPT-4o

GPT-4o ('omni') is OpenAI's versatile multimodal model for text, audio, and vision.

#flagship #multimodal #fast

Overview

GPT-4o ('omni') is OpenAI's flagship multimodal model designed for real-time interaction. It integrates text, vision, and audio natively into a single transformer, allowing for low-latency responses and expressive vocal interactions.

Unique Factor

Native multimodality across all three modes (text, audio, vision) with extremely low latency.

Key Capabilities

Real-time audio
Native multimodal
High speed
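
These capabilities are exposed through OpenAI's Chat Completions API, which accepts mixed text-and-image content in a single user message. A minimal sketch of assembling such a request (the function name and URL are illustrative; actually sending the payload requires the official `openai` client and an API key, not shown here):

```python
def build_vision_request(prompt: str, image_url: str) -> dict:
    """Assemble a Chat Completions payload mixing text and an image.

    Follows the content-parts shape the Chat Completions API expects:
    a list of typed parts ("text" and "image_url") inside one message.
    """
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_vision_request(
    "What is in this picture?",
    "https://example.com/photo.png",
)
```

Audio input and output go through a separate real-time interface rather than this payload shape.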

Benchmarks

MMLU: 88.7%
HumanEval (coding): 90.2%
GPQA Diamond: 82%
MATH: 85.1%

Top Use Cases

Real-time voice assistant

Interactive voice conversations with emotional nuance.

Example: “Talk to me about the stars in a soothing voice.”


Technical Specs

Context: 128,000 tokens
Parameters: undisclosed
License: Proprietary
Architecture: Transformer
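
The 128,000-token window covers the prompt plus the reserved output. A rough budgeting sketch (token counts are assumed inputs here; in practice a tokenizer such as `tiktoken` produces them):

```python
CONTEXT_WINDOW = 128_000  # GPT-4o context size, per the specs above

def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Check that prompt plus reserved output stays within the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

fits_in_context(120_000, 4_096)   # True: 124,096 tokens fit
fits_in_context(127_000, 2_000)   # False: 129,000 tokens overflow
```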

API Pricing

Input: $2.50 per 1M tokens

Output: $10.00 per 1M tokens
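
At these rates, estimating a per-request cost is simple arithmetic (the function name is illustrative; the rates are copied from the listing above):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost at the listed GPT-4o rates (per 1M tokens)."""
    input_rate = 2.50    # USD per 1M input tokens
    output_rate = 10.00  # USD per 1M output tokens
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

estimate_cost(10_000, 1_000)  # → 0.035 (3.5 cents)
```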

✓ Free tier available

Developer

OpenAI is the creator of ChatGPT, GPT-4o, and o3, and operates one of the world's most widely used AI platforms.

Prompt Library

Browse Coding Prompts


Previous Version

GPT-4 Turbo