The AI Inference platform
Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.
Serverless pricing
Rich model catalog
Widely compatible
Scale up, and down
Inference is hard to predict and spiky in nature, unlike training. GPU utilization is, on average, only 20-40% — with one-third of organizations utilizing less than 15%. Workers AI allows customers to save by only paying for usage. No guessing or committing to hardware that goes unused.
[Chart: what you pay for on a hyperscaler vs. what you pay for on Cloudflare]
AI models easily accessible via code, OpenAI SDK or API
Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.
Kimi K2.6
Powerful vision and agentic tool calling model
GLM 4.7 Flash
Rapid multilingual agent with expert tool calling
GPT-OSS-120B
Specialized for coding and debugging
Llama 4 Scout
Balanced generalist for everyday tasks
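For access over plain HTTP or through an OpenAI-style client, a request can be prepared along the following lines. This is a sketch under stated assumptions, not a definitive integration: the URL pattern is the OpenAI-compatible chat completions route as we understand it from Cloudflare's documentation, and `prepareChatRequest`, `chatCompletionsUrl`, `PreparedRequest`, and `Message` are hypothetical names introduced here for illustration. The account ID and API token are placeholders you would supply yourself.

```typescript
// Message shape used by OpenAI-style chat APIs.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Everything needed to send the request with fetch() or a similar HTTP client.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumed OpenAI-compatible chat completions URL for a given account ID.
function chatCompletionsUrl(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`;
}

// Build the request without sending it, so it can be inspected or logged first.
function prepareChatRequest(
  accountId: string,
  apiToken: string,
  model: string,
  messages: Message[],
): PreparedRequest {
  return {
    url: chatCompletionsUrl(accountId),
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The prepared object can then be passed to an HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping construction separate from sending makes the request easy to check in tests.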
Run any AI model with one API call
Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.

const response = await env.AI.run("@cf/moonshotai/kimi-k2.6", {
  messages: [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: "What is the origin of the phrase Hello, World" },
  ],
});
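The call above can be wrapped in a small helper so the prompt construction is testable on its own. The following is a minimal sketch: `AiBinding`, `ChatMessage`, `buildMessages`, and `askModel` are hypothetical names introduced for illustration, and the binding interface is a simplified assumption rather than Cloudflare's actual type. The shape of the returned value depends on the model, so it is left as `unknown` here.

```typescript
// Minimal structural types for this sketch (assumptions, not Cloudflare's
// official definitions; a real Worker would use the generated binding types).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<unknown>;
}

// Model identifier taken from the snippet above.
const MODEL = "@cf/moonshotai/kimi-k2.6";

// Build the message list: a fixed system prompt followed by the user's question.
function buildMessages(question: string): ChatMessage[] {
  return [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: question },
  ];
}

// Send the question through the binding and return the raw, model-specific result.
async function askModel(ai: AiBinding, question: string): Promise<unknown> {
  return ai.run(MODEL, { messages: buildMessages(question) });
}
```

Inside a Worker's fetch handler this would be called as `await askModel(env.AI, "...")`, assuming the binding is named `AI` as in the snippet above. Because `askModel` only depends on the `run` method, a stub object can stand in for the binding during local tests.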
Practical AI at the Edge
Run real-world AI workloads directly on Cloudflare's global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.
Workers AI: Explore a Rich Catalog of 50+ Ready-to-Use Models
Real-world examples in action
Image generation
Speech-to-text, in real time
Embeddings
LLMs
Workers AI Pricing
50+ models running at the edge. View AI pricing details
Component | Free | Paid
Neurons | - | $0.011 / thousand neurons
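To make the paid rate concrete, the arithmetic can be sketched as a small function. This is an illustration only: `estimateCostUsd` and `USD_PER_THOUSAND_NEURONS` are hypothetical names, and the calculation ignores any free allocation and does not model how a given request translates into neurons, since neither is specified here. At the listed rate, one million neurons would come to roughly $11.

```typescript
// Paid rate from the pricing table: $0.011 per thousand neurons.
const USD_PER_THOUSAND_NEURONS = 0.011;

// Estimate the paid cost in US dollars for a given number of neurons.
// Rejects negative or non-finite inputs rather than returning a misleading value.
function estimateCostUsd(neurons: number): number {
  if (!Number.isFinite(neurons) || neurons < 0) {
    throw new RangeError("neurons must be a non-negative finite number");
  }
  return (neurons / 1000) * USD_PER_THOUSAND_NEURONS;
}
```

Results are floating-point values, so round them (for example with `toFixed(2)`) before displaying them as currency.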
Shopify
"For Shopify, the real challenge is not about how many different pieces of complex technology we can use but the opposite. Cloudflare helps us find a simple way to achieve something very complex that we can scale and maintain."
Duncan Davidson, VP of Developer Productivity
Powerful primitives, seamlessly integrated
Built on systems powering 20% of the Internet, Workers AI runs on the same infrastructure Cloudflare uses to build Cloudflare. Enterprise-grade reliability, security, and performance are standard.