TopDev
Technical · Beginner

What is an AI API? How to use LLM APIs

How developers call AI models from code — how the Claude API, OpenAI API, and Gemini API work.

Updated: May 5, 2026 · 3 min read

API (Application Programming Interface), in the AI context, is how developers call AI models (Claude, GPT, Gemini…) directly from code instead of through a chat UI. It’s the way you build AI features into your own app or website.
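Under the hood, each of these calls is an HTTPS POST carrying a JSON body; the SDKs just assemble and send it for you. A sketch of what such a payload looks like (illustrative only, nothing is sent here):

```python
import json

# What an SDK builds for you: a JSON body naming the model and the conversation.
payload = {
    "model": "claude-sonnet-4-7",   # which model to run
    "max_tokens": 1024,             # cap on the reply length
    "messages": [
        {"role": "user", "content": "Explain RAG to a beginner"}
    ],
}
body = json.dumps(payload)
print(body)
```

The SDK adds authentication headers (your API key) and parses the JSON that comes back into the response objects you see in the examples below.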

Why use the API instead of the web app?

Web (claude.ai, chatgpt.com)   | API
-------------------------------+--------------------------------
For end users                  | For developers
Pay per subscription plan      | Pay per token used
One user at a time             | Thousands of parallel requests
Can't be embedded into an app  | Easy to integrate

If you’re building a chatbot, automation, or analysis tool, you’ll need the API.

API call examples (Python)

Claude

from anthropic import Anthropic
client = Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-sonnet-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG to a beginner"}]
)
print(response.content[0].text)

OpenAI

from openai import OpenAI
client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Explain RAG to a beginner"}]
)
print(response.choices[0].message.content)

Gemini

from google import genai
client = genai.Client(api_key="...")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain RAG to a beginner"
)
print(response.text)

The syntax differs but the concept is the same: send a message, receive a response.

API pricing for major providers (2026)

Provider   | Model             | Input / 1M tokens | Output / 1M tokens
-----------+-------------------+-------------------+-------------------
Anthropic  | Claude Sonnet 4.7 | $3.00             | $15.00
Anthropic  | Claude Haiku 4.5  | $0.80             | $4.00
Anthropic  | Claude Opus 4.5   | $15.00            | $75.00
OpenAI     | GPT-5             | $2.50             | $10.00
OpenAI     | GPT-5 mini        | $0.15             | $0.60
Google     | Gemini 2.5 Pro    | $1.25             | $5.00
Google     | Gemini 2.5 Flash  | $0.10             | $0.40

The “small/flash” tier is 10-30× cheaper than the flagship tier, so the golden rule is: use the smallest model that’s good enough.
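Cost per request is simple arithmetic from this table. A small helper (prices hard-coded from the table above; always check the provider's current pricing page before relying on them):

```python
# Prices in USD per 1M tokens as (input, output), taken from the table above.
PRICES = {
    "claude-sonnet-4.7": (3.00, 15.00),
    "gpt-5-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimate one request's cost in USD; batch=True applies the 50% Batch API discount."""
    price_in, price_out = PRICES[model]
    cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return cost / 2 if batch else cost

# 2,000 input tokens + 500 output tokens on Claude Sonnet 4.7:
print(f"${estimate_cost('claude-sonnet-4.7', 2000, 500):.4f}")  # $0.0135
```

Running the numbers like this before launch is how you decide whether the flagship or the cheap tier fits your budget.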

Advanced API features

  • Streaming: receive tokens piece by piece (great for chatbot UX)
  • Function calling / Tool use: let the LLM call your functions — see Function Calling
  • Structured output: force the LLM to return JSON matching a schema
  • Vision: send images alongside text
  • Caching: cache fixed prompts to cut costs by up to 90%
  • Batch API: send 1000 requests at once for a 50% discount

Practical notes for international developers

  • Payment: most providers require an international Visa/Mastercard. In some countries (for example, Vietnam) you may need a virtual card from your bank.
  • Rate limits: new accounts start with low limits. Verify your phone number and add credit to move up tiers.
  • Latency: an API call from Asia to us-east takes ~150-200ms. Anthropic and OpenAI offer Asia endpoints (Singapore, Tokyo) for higher tiers.
  • Compliance: if you handle sensitive customer data, read the provider’s data policy carefully.
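One standard way to cope with the rate limits mentioned above is retrying with exponential backoff. A generic sketch (RuntimeError stands in for your SDK's actual rate-limit exception):

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` with exponential backoff plus jitter.

    RuntimeError is a placeholder: substitute your SDK's rate-limit
    error class (e.g. anthropic.RateLimitError, openai.RateLimitError).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            # Wait 1x, 2x, 4x, ... the base delay, capped at 30 s, plus jitter.
            time.sleep(min(base * 2 ** attempt + random.random() * base, 30.0))
    return call()  # final attempt: let any error propagate to the caller
```

Wrap the SDK call in a lambda, e.g. `with_backoff(lambda: client.messages.create(...))`.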

Wrappers / SDKs worth using

  • LangChain — generalist framework supporting all providers (overkill for simple tasks)
  • LlamaIndex — great for RAG
  • Vercel AI SDK — best fit for TypeScript web apps
  • LiteLLM — proxy across many providers behind one interface

Tags
#api #developer #production