First Reasoning Diffusion AI Model

Mercury 2

The Diffusion Revolution That's Rewriting AI Speed Rules

1,000+ Tokens/Second
128K Context Window
10x Faster

The Problem: Why AI Has Been Slow

Traditional AI models think one word at a time, like a meticulous typist pecking at a vintage keyboard.

Autoregressive Models

Sequential Token Generation

T1
T2
T3
T4
T5
~150 tokens/sec
  • Each token waits for the previous one
  • Hard ceiling on speed
  • Error propagation risk
VS

Diffusion Models

Parallel Token Generation

T1
T2
T3
T4
T5
~1000 tokens/sec
  • All tokens generated simultaneously
  • Iterative refinement
  • Global context awareness
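The difference between the two panels above can be sketched as a toy count of model passes. This is an illustration, not the real models, and the 8 refinement steps are an assumption for the example (Inception has not published Mercury 2's step count):

```python
# Toy comparison (not the real models): how many forward passes each
# decoding strategy needs to produce an n-token response.

def autoregressive_passes(n_tokens: int) -> int:
    # One pass per token: each token waits for the previous one,
    # so passes grow linearly with response length.
    return n_tokens

def diffusion_passes(n_tokens: int, refinement_steps: int = 8) -> int:
    # All tokens are drafted at once, then refined in a fixed number
    # of parallel passes, independent of response length.
    return refinement_steps

print(autoregressive_passes(100))  # 100
print(diffusion_passes(100))       # 8
```

The takeaway: sequential cost scales with output length, while parallel refinement cost is roughly constant in it.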

How Diffusion Text Generation Works

From noise to coherent response, refined in parallel passes

Step 0: Noise
0% coherent
Step 1: First Pass
The quick brown fox jumps
~30% coherent
Step 2: Refinement
The quick brown fox jumps over the
~60% coherent
Step 3: Final
The quick brown fox jumps over the lazy dog.
100% coherent
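The steps above can be sketched as a mask-and-fill loop. This is a deliberately simplified picture: real diffusion language models refine probability distributions over every position in parallel, not literal masked strings.

```python
# Illustrative sketch of iterative parallel refinement: start from
# fully masked "noise" and reveal more of the sentence each pass.

TARGET = "The quick brown fox jumps over the lazy dog.".split()

def denoise(total_steps: int = 3) -> list[str]:
    tokens = ["[MASK]"] * len(TARGET)  # step 0: pure noise, 0% coherent
    for step in range(1, total_steps + 1):
        reveal = round(len(TARGET) * step / total_steps)
        for i in range(reveal):        # every pass touches the whole sequence
            tokens[i] = TARGET[i]
        coherence = round(100 * reveal / len(TARGET))
        print(f"Step {step}: {' '.join(tokens)}  (~{coherence}% coherent)")
    return tokens

result = denoise()
```

After the final pass the sequence matches the target exactly, mirroring the 0% → 100% coherence progression shown above.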

Speed Comparison

Mercury 2 delivers 5-10x speedup for equivalent quality

Tokens Per Second
~150 (autoregressive) vs ~1,000 (Mercury 2)

Response Time (100 tokens)
~0.67s (autoregressive) vs ~0.1s (Mercury 2)

Pricing Comparison (per million tokens)

GPT-4o
Input: $2.50
Output: $10.00
Claude 3.5
Input: $3.00
Output: $15.00
Mercury 2
Input: $0.25
Output: $0.75
90%+ Savings
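The "90%+ Savings" figure follows directly from the listed rates. A quick check, assuming a workload of 1M input and 1M output tokens:

```python
# Cost comparison at the per-million-token rates listed above.
rates = {
    "GPT-4o":     {"input": 2.50, "output": 10.00},
    "Claude 3.5": {"input": 3.00, "output": 15.00},
    "Mercury 2":  {"input": 0.25, "output": 0.75},
}

def cost(model: str, m_in: float = 1.0, m_out: float = 1.0) -> float:
    # Total cost for m_in million input tokens + m_out million output tokens.
    r = rates[model]
    return r["input"] * m_in + r["output"] * m_out

gpt = cost("GPT-4o")         # $12.50
mercury = cost("Mercury 2")  # $1.00
print(f"{1 - mercury / gpt:.0%} savings vs GPT-4o")  # 92% savings vs GPT-4o
```

Against Claude 3.5's $18.00 for the same workload, the savings are closer to 94%.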

Why Speed Matters

Speed enables entirely new possibilities with AI

🎤

Real-Time Voice AI

Sub-500ms responses enable natural conversation flow. No more awkward pauses.

Before
3-5s
Mercury 2
0.3s
🤖

AI Agents

Multi-step agent workflows that feel instant instead of sluggish.

Understand
Plan
Execute
Report
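The four stages above chain sequentially, so per-call latency compounds; that is why generation speed matters so much for agents. A minimal sketch of the loop, where each placeholder function stands in for what would be one model call (these helpers are hypothetical, not a Mercury 2 API):

```python
# Hypothetical four-stage agent loop: Understand -> Plan -> Execute -> Report.
# Each stage below is a stub standing in for one model call.

def understand(task: str) -> str:
    return f"goal: {task}"

def plan(goal: str) -> list[str]:
    return [f"step 1 for {goal}", f"step 2 for {goal}"]

def execute(steps: list[str]) -> list[str]:
    return [f"done: {s}" for s in steps]

def report(results: list[str]) -> str:
    return "; ".join(results)

def run_agent(task: str) -> str:
    # Four sequential stages: total latency is the sum of all four calls.
    return report(execute(plan(understand(task))))

print(run_agent("summarize inbox"))
```

With four chained calls, cutting per-call latency from seconds to fractions of a second is the difference between a sluggish agent and an instant-feeling one.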
📝

Content at Scale

Generate a month's worth of content in minutes, not hours.

18 min Traditional
2 min Mercury 2
📊

High-Throughput Processing

Process 10,000 personalized emails in 4 hours instead of 28.
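The arithmetic behind that claim, as a quick check:

```python
# Throughput check: 10,000 emails in 4 hours vs. 28 hours.
emails = 10_000

fast_rate = emails / 4    # 2,500 emails/hour with Mercury 2
slow_rate = emails / 28   # ~357 emails/hour traditionally

print(f"{fast_rate / slow_rate:.1f}x throughput")  # 7.0x throughput
```

A 7x throughput gain sits squarely inside the 5-10x speedup range quoted earlier.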

Key Features

01

128K Context Window

Hold entire documents, codebases, or books in a single prompt. No complex chunking needed.
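A quick way to sanity-check whether a document fits in that window, using the common (approximate) 4-characters-per-token heuristic; for exact counts you would use a real tokenizer:

```python
# Rough fit check against a 128K-token context window, using the
# ~4 chars/token rule of thumb (an approximation, not a tokenizer).
CONTEXT_TOKENS = 128_000

def fits_in_context(text: str) -> bool:
    estimated_tokens = len(text) / 4
    return estimated_tokens <= CONTEXT_TOKENS

doc = "word " * 50_000  # 250,000 chars, roughly 62,500 tokens
print(fits_in_context(doc))  # True
```

At that ratio, 128K tokens is on the order of 500,000 characters of text in a single prompt.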

02

Native Tool Use

Define functions that Mercury 2 can invoke during reasoning. Build real AI agents.
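Since Mercury 2 is described as OpenAI-compatible, a tool definition would follow the OpenAI `tools` format. A sketch of the request payload (the `get_weather` function is a made-up example, and the request is shown but not sent):

```python
# Sketch of a tool definition in the OpenAI-compatible "tools" format.
# get_weather is an illustrative, hypothetical function.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "mercury-2",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [get_weather_tool],
}

print(request["tools"][0]["function"]["name"])  # get_weather
```

When the model decides to call the tool, the response carries the function name and JSON arguments for your code to execute, close the loop, and continue the conversation.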

03

Structured Output

Request responses in predefined JSON schemas for seamless integration.
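In the OpenAI-compatible request shape, that means passing a JSON schema via `response_format`. A sketch, with an illustrative schema and a hypothetical reply (not real API output):

```python
import json

# Sketch of a structured-output request in the OpenAI-compatible
# "response_format" shape. The schema and reply are illustrative.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string"},
        "score": {"type": "number"},
    },
    "required": ["sentiment", "score"],
}

request = {
    "model": "mercury-2",
    "messages": [{"role": "user", "content": "Rate: 'Great product!'"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "review_rating", "schema": schema},
    },
}

# A reply conforming to the schema parses straight into a dict:
reply = '{"sentiment": "positive", "score": 0.97}'
parsed = json.loads(reply)
print(parsed["sentiment"])  # positive
```

Because the response is guaranteed-shape JSON, it feeds directly into downstream code with no regex scraping or retry-on-malformed-output logic.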

04

OpenAI Compatible

Drop-in replacement. Just change the base URL and model name.

Get Started in Seconds

Fully compatible with OpenAI API specification

Before (OpenAI)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
After (Mercury 2)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_INCEPTION_API_KEY",
    base_url="https://api.inceptionlabs.ai/v1"
)
response = client.chat.completions.create(
    model="mercury-2",
    messages=[{"role": "user", "content": "Hello!"}]
)

The Future of AI Isn't Just Smarter—It's Faster

Start building at diffusion speed today.

Get Started