First Reasoning Diffusion AI Model

Mercury 2

The Diffusion Revolution That's Rewriting AI Speed Rules

1,000+ Tokens/Second
128K Context Window
10x Faster

The Problem: Why AI Has Been Slow

Traditional AI models think one word at a time, like a meticulous typist pecking at a vintage keyboard.

Autoregressive Models

Sequential Token Generation

T1
T2
T3
T4
T5
~150 tokens/sec
  • Each token waits for the previous one
  • Hard ceiling on speed
  • Error propagation risk
VS

Diffusion Models

Parallel Token Generation

T1
T2
T3
T4
T5
~1000 tokens/sec
  • All tokens generated simultaneously
  • Iterative refinement
  • Global context awareness
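The difference between the two panels above can be sketched as a toy count of model passes. This is an illustration, not the real models, and the 8 refinement steps are an assumption for the example (Inception has not published Mercury 2's step count):

```python
# Toy comparison (not the real models): how many forward passes each
# decoding strategy needs to produce an n-token response.

def autoregressive_passes(n_tokens: int) -> int:
    # One pass per token: each token waits for the previous one,
    # so passes grow linearly with response length.
    return n_tokens

def diffusion_passes(n_tokens: int, refinement_steps: int = 8) -> int:
    # All tokens are drafted at once, then refined in a fixed number
    # of parallel passes, independent of response length.
    return refinement_steps

print(autoregressive_passes(100))  # 100
print(diffusion_passes(100))       # 8
```

The takeaway: sequential cost scales with output length, while parallel refinement cost is roughly constant in it.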

How Diffusion Text Generation Works

From noise to coherent response, refined in parallel passes

Step 0: Noise
0% coherent
Step 1: First Pass
The quick brown fox jumps
~30% coherent
Step 2: Refinement
The quick brown fox jumps over the
~60% coherent
Step 3: Final
The quick brown fox jumps over the lazy dog.
100% coherent
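The steps above can be sketched as a mask-and-fill loop. This is a deliberately simplified picture: real diffusion language models refine probability distributions over every position in parallel, not literal masked strings.

```python
# Illustrative sketch of iterative parallel refinement: start from
# fully masked "noise" and reveal more of the sentence each pass.

TARGET = "The quick brown fox jumps over the lazy dog.".split()

def denoise(total_steps: int = 3) -> list[str]:
    tokens = ["[MASK]"] * len(TARGET)  # step 0: pure noise, 0% coherent
    for step in range(1, total_steps + 1):
        reveal = round(len(TARGET) * step / total_steps)
        for i in range(reveal):        # every pass touches the whole sequence
            tokens[i] = TARGET[i]
        coherence = round(100 * reveal / len(TARGET))
        print(f"Step {step}: {' '.join(tokens)}  (~{coherence}% coherent)")
    return tokens

result = denoise()
```

After the final pass the sequence matches the target exactly, mirroring the 0% → 100% coherence progression shown above.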

Speed Comparison

Mercury 2 delivers 5-10x speedup for equivalent quality

Tokens Per Second
~150 (autoregressive) vs ~1,000 (Mercury 2)

Response Time (100 tokens)
~0.67s (autoregressive) vs ~0.1s (Mercury 2)

Pricing Comparison (per million tokens)

GPT-4o
Input: $2.50
Output: $10.00
Claude 3.5
Input: $3.00
Output: $15.00
Mercury 2
Input: $0.25
Output: $0.75
90%+ Savings
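The "90%+ Savings" figure follows directly from the listed rates. A quick check, assuming a workload of 1M input and 1M output tokens:

```python
# Cost comparison at the per-million-token rates listed above.
rates = {
    "GPT-4o":     {"input": 2.50, "output": 10.00},
    "Claude 3.5": {"input": 3.00, "output": 15.00},
    "Mercury 2":  {"input": 0.25, "output": 0.75},
}

def cost(model: str, m_in: float = 1.0, m_out: float = 1.0) -> float:
    # Total cost for m_in million input tokens + m_out million output tokens.
    r = rates[model]
    return r["input"] * m_in + r["output"] * m_out

gpt = cost("GPT-4o")         # $12.50
mercury = cost("Mercury 2")  # $1.00
print(f"{1 - mercury / gpt:.0%} savings vs GPT-4o")  # 92% savings vs GPT-4o
```

Against Claude 3.5's $18.00 for the same workload, the savings are closer to 94%.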

Why Speed Matters

Speed enables entirely new possibilities with AI

🎤

Real-Time Voice AI

Sub-500ms responses enable natural conversation flow. No more awkward pauses.

Before
3-5s
Mercury 2
0.3s
🤖

AI Agents

Multi-step agent workflows that feel instant instead of sluggish.

Understand
Plan
Execute
Report
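The four stages above chain sequentially, so per-call latency compounds; that is why generation speed matters so much for agents. A minimal sketch of the loop, where each placeholder function stands in for what would be one model call (these helpers are hypothetical, not a Mercury 2 API):

```python
# Hypothetical four-stage agent loop: Understand -> Plan -> Execute -> Report.
# Each stage below is a stub standing in for one model call.

def understand(task: str) -> str:
    return f"goal: {task}"

def plan(goal: str) -> list[str]:
    return [f"step 1 for {goal}", f"step 2 for {goal}"]

def execute(steps: list[str]) -> list[str]:
    return [f"done: {s}" for s in steps]

def report(results: list[str]) -> str:
    return "; ".join(results)

def run_agent(task: str) -> str:
    # Four sequential stages: total latency is the sum of all four calls.
    return report(execute(plan(understand(task))))

print(run_agent("summarize inbox"))
```

With four chained calls, cutting per-call latency from seconds to fractions of a second is the difference between a sluggish agent and an instant-feeling one.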
📝

Content at Scale

Generate a month's worth of content in minutes, not hours.

18 min Traditional
2 min Mercury 2
📊

High-Throughput Processing

Process 10,000 personalized emails in 4 hours instead of 28.
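The arithmetic behind that claim, as a quick check:

```python
# Throughput check: 10,000 emails in 4 hours vs. 28 hours.
emails = 10_000

fast_rate = emails / 4    # 2,500 emails/hour with Mercury 2
slow_rate = emails / 28   # ~357 emails/hour traditionally

print(f"{fast_rate / slow_rate:.1f}x throughput")  # 7.0x throughput
```

A 7x throughput gain sits squarely inside the 5-10x speedup range quoted earlier.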

Key Features

01

128K Context Window

Hold entire documents, codebases, or books in a single prompt. No complex chunking needed.
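A quick way to sanity-check whether a document fits in that window, using the common (approximate) 4-characters-per-token heuristic; for exact counts you would use a real tokenizer:

```python
# Rough fit check against a 128K-token context window, using the
# ~4 chars/token rule of thumb (an approximation, not a tokenizer).
CONTEXT_TOKENS = 128_000

def fits_in_context(text: str) -> bool:
    estimated_tokens = len(text) / 4
    return estimated_tokens <= CONTEXT_TOKENS

doc = "word " * 50_000  # 250,000 chars, roughly 62,500 tokens
print(fits_in_context(doc))  # True
```

At that ratio, 128K tokens is on the order of 500,000 characters of text in a single prompt.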

02

Native Tool Use

Define functions that Mercury 2 can invoke during reasoning. Build real AI agents.
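Since Mercury 2 is described as OpenAI-compatible, a tool definition would follow the OpenAI `tools` format. A sketch of the request payload (the `get_weather` function is a made-up example, and the request is shown but not sent):

```python
# Sketch of a tool definition in the OpenAI-compatible "tools" format.
# get_weather is an illustrative, hypothetical function.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "mercury-2",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [get_weather_tool],
}

print(request["tools"][0]["function"]["name"])  # get_weather
```

When the model decides to call the tool, the response carries the function name and JSON arguments for your code to execute, close the loop, and continue the conversation.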

03

Structured Output

Request responses in predefined JSON schemas for seamless integration.
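In the OpenAI-compatible request shape, that means passing a JSON schema via `response_format`. A sketch, with an illustrative schema and a hypothetical reply (not real API output):

```python
import json

# Sketch of a structured-output request in the OpenAI-compatible
# "response_format" shape. The schema and reply are illustrative.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string"},
        "score": {"type": "number"},
    },
    "required": ["sentiment", "score"],
}

request = {
    "model": "mercury-2",
    "messages": [{"role": "user", "content": "Rate: 'Great product!'"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "review_rating", "schema": schema},
    },
}

# A reply conforming to the schema parses straight into a dict:
reply = '{"sentiment": "positive", "score": 0.97}'
parsed = json.loads(reply)
print(parsed["sentiment"])  # positive
```

Because the response is guaranteed-shape JSON, it feeds directly into downstream code with no regex scraping or retry-on-malformed-output logic.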

04

OpenAI Compatible

Drop-in replacement. Just change the base URL and model name.

Get Started in Seconds

Fully compatible with OpenAI API specification

Before (OpenAI)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
After (Mercury 2)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_INCEPTION_API_KEY",
    base_url="https://api.inceptionlabs.ai/v1"
)
response = client.chat.completions.create(
    model="mercury-2",
    messages=[{"role": "user", "content": "Hello!"}]
)

The Future of AI Isn't Just Smarter—It's Faster

Start building at diffusion speed today.

Get Started