TopDev
Technical · Beginner

What is an AI API? How to use LLM APIs

How developers call AI models from code — how the Claude API, OpenAI API, and Gemini API work.

Updated: May 5, 2026 · 3 min read

API (Application Programming Interface), in the AI context, is how developers call AI models (Claude, GPT, Gemini…) directly from code instead of through a chat UI. It’s the way you build AI features into your own app or website.
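Under the hood, each of these calls is an HTTPS POST carrying a JSON body; the SDKs just assemble and send it for you. A sketch of what such a payload looks like (illustrative only, nothing is sent here):

```python
import json

# What an SDK builds for you: a JSON body naming the model and the conversation.
payload = {
    "model": "claude-sonnet-4-7",   # which model to run
    "max_tokens": 1024,             # cap on the reply length
    "messages": [
        {"role": "user", "content": "Explain RAG to a beginner"}
    ],
}
body = json.dumps(payload)
print(body)
```

The SDK adds authentication headers (your API key) and parses the JSON that comes back into the response objects you see in the examples below.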

Why use the API instead of the web app?

Web (claude.ai, chatgpt.com)   | API
-------------------------------+--------------------------------
For end users                  | For developers
Pay per subscription plan      | Pay per token used
One user at a time             | Thousands of parallel requests
Can't be embedded into an app  | Easy to integrate

If you’re building a chatbot, automation, or analysis tool, you’ll need the API.

API call examples (Python)

Claude

from anthropic import Anthropic
client = Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-sonnet-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG to a beginner"}]
)
print(response.content[0].text)

OpenAI

from openai import OpenAI
client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Explain RAG to a beginner"}]
)
print(response.choices[0].message.content)

Gemini

from google import genai
client = genai.Client(api_key="...")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain RAG to a beginner"
)
print(response.text)

The syntax differs but the concept is the same: send a message, receive a response.

API pricing for major providers (2026)

Provider   | Model             | Input / 1M tokens | Output / 1M tokens
-----------+-------------------+-------------------+-------------------
Anthropic  | Claude Sonnet 4.7 | $3.00             | $15.00
Anthropic  | Claude Haiku 4.5  | $0.80             | $4.00
Anthropic  | Claude Opus 4.5   | $15.00            | $75.00
OpenAI     | GPT-5             | $2.50             | $10.00
OpenAI     | GPT-5 mini        | $0.15             | $0.60
Google     | Gemini 2.5 Pro    | $1.25             | $5.00
Google     | Gemini 2.5 Flash  | $0.10             | $0.40

The “small/flash” tier is 10-30× cheaper than the flagship tier, so the golden rule is: use the smallest model that’s good enough.
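Cost per request is simple arithmetic from this table. A small helper (prices hard-coded from the table above; always check the provider's current pricing page before relying on them):

```python
# Prices in USD per 1M tokens as (input, output), taken from the table above.
PRICES = {
    "claude-sonnet-4.7": (3.00, 15.00),
    "gpt-5-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimate one request's cost in USD; batch=True applies the 50% Batch API discount."""
    price_in, price_out = PRICES[model]
    cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return cost / 2 if batch else cost

# 2,000 input tokens + 500 output tokens on Claude Sonnet 4.7:
print(f"${estimate_cost('claude-sonnet-4.7', 2000, 500):.4f}")  # $0.0135
```

Running the numbers like this before launch is how you decide whether the flagship or the cheap tier fits your budget.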

Advanced API features

  • Streaming: receive tokens piece by piece (great for chatbot UX)
  • Function calling / Tool use: let the LLM call your functions — see Function Calling
  • Structured output: force the LLM to return JSON matching a schema
  • Vision: send images alongside text
  • Caching: cache fixed prompts to cut costs by up to 90%
  • Batch API: send 1000 requests at once for a 50% discount

Practical notes for international developers

  • Payment: most providers require an international Visa/Mastercard. In some countries (for example, Vietnam) you may need a virtual card from your bank.
  • Rate limits: new accounts start with low limits. Verify your phone number and add credit to move up tiers.
  • Latency: an API call from Asia to us-east takes ~150-200ms. Anthropic and OpenAI offer Asia endpoints (Singapore, Tokyo) for higher tiers.
  • Compliance: if you handle sensitive customer data, read the provider’s data policy carefully.
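One standard way to cope with the rate limits mentioned above is retrying with exponential backoff. A generic sketch (RuntimeError stands in for your SDK's actual rate-limit exception):

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` with exponential backoff plus jitter.

    RuntimeError is a placeholder: substitute your SDK's rate-limit
    error class (e.g. anthropic.RateLimitError, openai.RateLimitError).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            # Wait 1x, 2x, 4x, ... the base delay, capped at 30 s, plus jitter.
            time.sleep(min(base * 2 ** attempt + random.random() * base, 30.0))
    return call()  # final attempt: let any error propagate to the caller
```

Wrap the SDK call in a lambda, e.g. `with_backoff(lambda: client.messages.create(...))`.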

Wrappers / SDKs worth using

  • LangChain — generalist framework supporting all providers (overkill for simple tasks)
  • LlamaIndex — great for RAG
  • Vercel AI SDK — best fit for TypeScript web apps
  • LiteLLM — proxy across many providers behind one interface

Tags
#api #developer #production