Quick Start
Get started with the Hontoni API in under 2 minutes. Hontoni provides a unified API gateway to Claude Sonnet 4.x, Opus 4.x, GPT-5.x, and Gemini Pro models through OpenAI-compatible and Anthropic-compatible endpoints.
1. Get your API key
Register an account and create an API key from the dashboard, or use the API directly:
# Register
curl -X POST https://api.hontoni.vn/api/auth/register \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com", "password": "your-password", "name": "Your Name"}'
# Create an API key
curl -X POST https://api.hontoni.vn/api/keys \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json" \
-d '{"name": "my-key"}'
2. Make your first request
curl https://api.hontoni.vn/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
You're now using Claude Sonnet 4 through the OpenAI-compatible API. Switch models by changing the model field.
Authentication
All API requests require authentication via an API key. You can pass it in two ways:
# Option 1: Authorization header (recommended)
Authorization: Bearer sk-your-api-key
# Option 2: X-API-Key header
X-API-Key: sk-your-api-key
API keys are prefixed with sk-gw- followed by a unique identifier; the examples in this document use sk-your-api-key as a placeholder. Keep your keys secure and never expose them in client-side code.
Base URL
All AI API endpoints are relative to your deployment base URL:
https://api.hontoni.vn/v1
| Endpoint | Description |
|---|---|
/v1/chat/completions | OpenAI Chat Completions API |
/v1/messages | Anthropic Messages API |
/v1/responses | OpenAI Responses API |
/v1/models | List available models |
/api/keys | Manage API keys |
/api/billing | Balance & billing |
/api/subscription | Subscription plans |
/api/addons | Rate limit add-ons |
Tool Integrations
Hontoni works as a drop-in replacement for OpenAI and Anthropic APIs. Configure your favorite AI coding tool to use Hontoni as the backend.
Claude Code
Configure Claude Code to use Hontoni as the API provider:
# Set environment variables. Claude Code appends the /v1/messages path
# itself, so the base URL should not include /v1.
export ANTHROPIC_BASE_URL=https://api.hontoni.vn
export ANTHROPIC_API_KEY=sk-your-api-key
# Or configure in ~/.claude/config.json
{
"apiBaseUrl": "https://api.hontoni.vn",
"apiKey": "sk-your-api-key"
}
Cursor
In Cursor Settings > Models > OpenAI API Key:
API Key: sk-your-api-key
Base URL: https://api.hontoni.vn/v1
Model: claude-sonnet-4
Enable "Override OpenAI Base URL" in settings, then enter the Hontoni base URL. All OpenAI-compatible models will work automatically.
Windsurf
Configure in Windsurf settings:
{
"ai.provider": "openai",
"ai.openai.baseUrl": "https://api.hontoni.vn/v1",
"ai.openai.apiKey": "sk-your-api-key",
"ai.openai.model": "claude-sonnet-4"
}
Continue
Add to your ~/.continue/config.json:
{
"models": [
{
"title": "Hontoni - Claude Sonnet 4",
"provider": "openai",
"model": "claude-sonnet-4",
"apiBase": "https://api.hontoni.vn/v1",
"apiKey": "sk-your-api-key"
}
]
}
Cline
In VS Code, open Cline settings and configure the API provider:
Provider: OpenAI Compatible
Base URL: https://api.hontoni.vn/v1
API Key: sk-your-api-key
Model: claude-sonnet-4
Aider
Set environment variables or use command-line flags:
# Environment variables
export OPENAI_API_BASE=https://api.hontoni.vn/v1
export OPENAI_API_KEY=sk-your-api-key
# Or use flags
aider --openai-api-base https://api.hontoni.vn/v1 \
--openai-api-key sk-your-api-key \
--model claude-sonnet-4
OpenCode
Configure in opencode.json:
{
"provider": {
"openai": {
"apiKey": "sk-your-api-key",
"baseURL": "https://api.hontoni.vn/v1"
}
},
"model": {
"default": "claude-sonnet-4"
}
}
Chat Completions API
OpenAI-compatible Chat Completions endpoint. Drop-in replacement for POST /v1/chat/completions.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g. claude-sonnet-4) |
messages | array | Yes | Array of message objects |
stream | boolean | No | Enable SSE streaming (default: false) |
temperature | number | No | Sampling temperature (0-2) |
max_tokens | integer | No | Maximum tokens to generate |
top_p | number | No | Nucleus sampling parameter |
tools | array | No | Tool/function definitions |
tool_choice | string \| object | No | Tool selection behavior
Message Object
| Field | Type | Description |
|---|---|---|
role | string | system, user, assistant, or tool |
content | string \| null | Message content
name | string | Optional sender name |
tool_calls | array | Tool calls (assistant messages) |
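The tools and tool_choice parameters follow the OpenAI function-calling format. As a sketch (the get_weather tool, its schema, and the extractToolCalls helper below are hypothetical, not part of the Hontoni API), a request body with one tool and a helper that pulls tool calls out of an assistant message could look like:

```javascript
// Hypothetical tool definition in the OpenAI function-calling format
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

// Request body letting the model decide when to call the tool
const body = {
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'What is the weather in Hanoi?' }],
  tools,
  tool_choice: 'auto',
};

// Extract and parse tool calls from an assistant message, if any.
// The "arguments" field is a JSON string in the OpenAI format.
function extractToolCalls(message) {
  return (message.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments),
  }));
}
```

When the response contains tool_calls, execute the tool and send the result back as a role: "tool" message referencing the call id.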
Example: Non-streaming
curl -X POST https://api.hontoni.vn/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in 3 sentences."}
],
"max_tokens": 200
}'
Example: Streaming
curl -X POST https://api.hontoni.vn/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Write a haiku about coding"}
],
"stream": true
}'
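With stream: true the endpoint emits chat.completion.chunk objects over SSE, each carrying a partial choices[0].delta. A minimal accumulator for those chunks (assuming the standard OpenAI chunk shape) might look like:

```javascript
// Collect the assistant text and finish reason from a sequence of
// chat.completion.chunk objects received over the stream
function accumulateChunks(chunks) {
  let text = '';
  let finishReason = null;
  for (const chunk of chunks) {
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    if (choice.delta?.content) text += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}
```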
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "claude-sonnet-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 50,
"total_tokens": 75
}
}
Messages API
Anthropic-compatible Messages endpoint. Drop-in replacement for POST /v1/messages.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g. claude-sonnet-4) |
messages | array | Yes | Conversation messages |
max_tokens | integer | Yes | Maximum tokens to generate |
system | string | No | System prompt |
stream | boolean | No | Enable SSE streaming |
temperature | number | No | Sampling temperature |
tools | array | No | Tool definitions |
thinking | object | No | Extended thinking config |
Thinking (Extended Reasoning)
For models that support reasoning (Claude Sonnet 4.x and Opus 4.x):
{
"model": "claude-sonnet-4",
"messages": [{"role": "user", "content": "Solve: x^2 + 5x + 6 = 0"}],
"max_tokens": 16000,
"thinking": {
"type": "enabled",
"budget_tokens": 10000
}
}
Example Request
curl -X POST https://api.hontoni.vn/v1/messages \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4",
"max_tokens": 1024,
"system": "You are a helpful coding assistant.",
"messages": [
{"role": "user", "content": "Write a Python fibonacci function"}
]
}'
Response
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"model": "claude-sonnet-4",
"content": [
{
"type": "text",
"text": "Here's a Python fibonacci function..."
}
],
"stop_reason": "end_turn",
"usage": {
"input_tokens": 30,
"output_tokens": 150
}
}
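Messages API responses carry an array of content blocks rather than a single string. Joining the text blocks, and skipping other block types such as thinking or tool_use, is a common first step; a small helper (hypothetical, not part of any SDK) could be:

```javascript
// Join the text blocks of an Anthropic-style message,
// ignoring non-text block types (thinking, tool_use, ...)
function messageText(message) {
  return message.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
}
```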
Responses API
OpenAI Responses API (newer format). Supports reasoning effort control and simplified input format.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID |
input | string \| array | Yes | Simple string or structured input items
instructions | string | No | System instructions |
stream | boolean | No | Enable SSE streaming |
max_output_tokens | integer | No | Maximum output tokens |
temperature | number | No | Sampling temperature |
reasoning | object | No | Reasoning configuration |
Reasoning Effort
Control how much reasoning the model uses:
{
"model": "gpt-5.2",
"input": "What is the meaning of life?",
"reasoning": {
"effort": "high",
"summary": "auto"
}
}
Effort levels: none, minimal, low, medium, high, xhigh
Streaming Events
When stream: true, the API sends Server-Sent Events:
| Event | Description |
|---|---|
response.created | Response object created |
response.in_progress | Generation started |
response.output_text.delta | Text content chunk |
response.reasoning_text.delta | Reasoning content chunk |
response.completed | Generation finished |
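Each event arrives as an event: line followed by a data: line carrying a JSON payload. A minimal parser for one SSE buffer, assuming that two-line framing and that text deltas carry their content in a delta field, might be:

```javascript
// Parse an SSE buffer of "event:"/"data:" lines into { event, data } objects
function parseSSE(buffer) {
  const events = [];
  let currentEvent = null;
  for (const line of buffer.split('\n')) {
    if (line.startsWith('event:')) {
      currentEvent = line.slice(6).trim();
    } else if (line.startsWith('data:')) {
      events.push({ event: currentEvent, data: JSON.parse(line.slice(5).trim()) });
      currentEvent = null;
    }
  }
  return events;
}

// Concatenate the text deltas from a parsed event list
function outputText(events) {
  return events
    .filter((e) => e.event === 'response.output_text.delta')
    .map((e) => e.data.delta)
    .join('');
}
```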
Models & Pricing
All available models with capabilities and per-token pricing (per 1M tokens in USD).
Anthropic Models
| Model | Context | Input | Output | Cache Read | Cache Write | Reasoning | Capabilities |
|---|---|---|---|---|---|---|---|
claude-sonnet-4 | 216K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
claude-sonnet-4.5 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
claude-sonnet-4.6 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
claude-opus-4.5 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
claude-opus-4.6 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
claude-haiku-4.5 | 200K | $1.00 | $5.00 | $0.10 | $1.25 | $5.00 | reasoning, tools, vision, code, chat |
OpenAI Models
| Model | Context | Input | Output | Reasoning | Capabilities |
|---|---|---|---|---|---|
gpt-4o | 128K | $2.50 | $10.00 | - | tools, vision, code, chat |
gpt-4o-mini | 128K | $0.15 | $0.60 | - | tools, vision, code, chat |
gpt-4.1 | 128K | $2.00 | $8.00 | - | tools, vision, code, chat |
gpt-5.1 | 264K | $5.00 | $15.00 | $15.00 | reasoning, tools, vision, code, chat |
gpt-5.2 | 264K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
gpt-5.2-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
gpt-5.3-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
gpt-5.4 | 400K | $3.00 | $12.00 | $12.00 | reasoning, tools, vision, code, chat |
gpt-5.4-mini | 400K | $0.40 | $1.60 | $1.60 | reasoning, tools, vision, code, chat |
gpt-5-mini | 264K | $1.00 | $4.00 | $4.00 | reasoning, tools, vision, code, chat |
Google Models
| Model | Context | Input | Output | Reasoning | Capabilities |
|---|---|---|---|---|---|
gemini-2.5-pro | 128K | $1.25 | $10.00 | $10.00 | reasoning, tools, vision, code, chat |
gemini-3-flash-preview | 128K | $0.15 | $0.60 | $0.60 | reasoning, tools, vision, code, chat |
gemini-3.1-pro-preview | 128K | $1.25 | $5.00 | $5.00 | reasoning, tools, vision, code, chat |
For the most up-to-date model list and capabilities, use the GET /v1/models endpoint.
Model Variants
Use model variant suffixes to control reasoning effort and context size without extra parameters.
Reasoning Effort Suffixes
Append :high or :max to any thinking-capable model to set reasoning effort automatically:
| Suffix | Budget Models (Claude/Gemini) | Effort Models (o-series) |
|---|---|---|
:high | thinking_budget = 32,768 tokens | reasoning_effort = "high" |
:max | thinking_budget = 65,536 tokens | reasoning_effort = "xhigh" |
Extended Context Suffix
Append -1m to request an extended 1M token context window:
| Variant | Effect |
|---|---|
claude-sonnet-4-1m | Extended context via pay-as-you-go |
gpt-4o-1m | Extended context via pay-as-you-go |
Combining Suffixes
Reasoning and context suffixes can be combined:
curl -X POST https://api.hontoni.vn/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{"model": "claude-sonnet-4:high-1m", "messages": [{"role": "user", "content": "Analyze this large codebase..."}]}'
You can still pass explicit thinking_budget or reasoning_effort parameters; explicit parameters take priority over variant suffixes.
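The suffix rules above can be sketched as a client-side resolver. The numeric budgets come from the table; the helper itself is hypothetical, and the caller would pick thinkingBudget or reasoningEffort depending on whether the base model is a budget model or an effort model:

```javascript
// Resolve a model variant string into a base model name plus reasoning
// settings, following the suffix table above. Both settings are filled in;
// the caller uses whichever one applies to the target model family.
function resolveVariant(model) {
  const out = { model, thinkingBudget: null, reasoningEffort: null };
  const match = model.match(/:(high|max)/);
  if (match) {
    out.model = model.replace(match[0], ''); // strip the :high / :max suffix
    out.thinkingBudget = match[1] === 'high' ? 32768 : 65536;
    out.reasoningEffort = match[1] === 'high' ? 'high' : 'xhigh';
  }
  return out;
}
```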
Rate Limits
Rate limits protect the API from abuse and ensure fair usage. Limits apply per API key.
Plan-Based Limits
| Plan | Price | RPM | Per 5h | Daily | Weekly | Monthly | Concurrent | Thinking Budget |
|---|---|---|---|---|---|---|---|---|
| Basic | $6/mo | 8 | 20 | 100 | 120 | 480 | 1 | 8,000 |
| Standard | $13/mo | 12 | 45 | 250 | 275 | 1,100 | 2 | 16,000 |
| Premium | $39/mo | 20 | 200 | 1,000 | 1,250 | 5,000 | 2 | 32,000 |
| Ultimate | $79/mo | 40 | 800 | 3,000 | 5,000 | 20,000 | 4 | 64,000 |
Rate Limit Headers
Every API response includes rate limit headers:
X-RateLimit-Limit-RPM: 12
X-RateLimit-Remaining-RPM: 11
X-RateLimit-Limit-Daily: 250
X-RateLimit-Remaining-Daily: 248
X-RateLimit-Limit-5h: 45
X-RateLimit-Remaining-5h: 43
X-RateLimit-Limit-Weekly: 275
X-RateLimit-Remaining-Weekly: 270
X-RateLimit-Limit-Monthly: 1100
X-RateLimit-Remaining-Monthly: 1095
X-RateLimit-Concurrent-Limit: 2
X-RateLimit-Concurrent-Active: 1
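Reading these headers lets a client throttle itself before it ever receives a 429. A sketch (the function takes a plain lowercase header map, as returned by most HTTP clients; the decision rule is an arbitrary example):

```javascript
// Decide whether to pause before the next request, based on the
// rate limit headers from the previous response
function shouldThrottle(headers) {
  const remainingRPM = parseInt(headers['x-ratelimit-remaining-rpm'] ?? '1', 10);
  const active = parseInt(headers['x-ratelimit-concurrent-active'] ?? '0', 10);
  const limit = parseInt(headers['x-ratelimit-concurrent-limit'] ?? '1', 10);
  // Back off when the per-minute budget is exhausted or all
  // concurrent slots are in use
  return remainingRPM <= 0 || active >= limit;
}
```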
Rate Limit Add-ons
Boost your rate limits with add-ons:
| Add-on | Price | Effect |
|---|---|---|
| Rate Limit 2x | $4.99/mo | Double all rate limits |
| Rate Limit 5x | $9.99/mo | 5x all rate limits |
| Rate Limit 10x | $19.99/mo | 10x all rate limits |
Billing
Hontoni uses a prepaid balance system. Top up your account and pay per token used.
Cost Calculation
Cost is calculated per request based on token usage and model pricing:
cost = (input_tokens × input_price / 1,000,000)
+ (output_tokens × output_price / 1,000,000)
+ (reasoning_tokens × reasoning_price / 1,000,000)
Example
Using claude-sonnet-4 with 1,000 input tokens and 500 output tokens:
Input cost: 1,000 × $3.00 / 1,000,000 = $0.003
Output cost: 500 × $15.00 / 1,000,000 = $0.0075
Total cost: $0.0105
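The formula above can be checked with a small helper (hypothetical; prices are per 1M tokens, as in the pricing tables):

```javascript
// Compute request cost in USD from token counts and per-1M-token prices.
// reasoning tokens default to zero for models without a reasoning price.
function requestCost(tokens, prices) {
  const M = 1_000_000;
  return (
    (tokens.input * prices.input) / M +
    (tokens.output * prices.output) / M +
    ((tokens.reasoning ?? 0) * (prices.reasoning ?? 0)) / M
  );
}
```

Running it on the claude-sonnet-4 example (1,000 input tokens at $3.00, 500 output tokens at $15.00) reproduces the $0.0105 total.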
Billing Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/billing/balance | Check current balance |
| POST | /api/billing/topup | Add funds (direct credit) |
| POST | /api/billing/topup/checkout | Create Stripe Checkout session |
| POST | /api/billing/topup/intent | Create Stripe PaymentIntent |
| GET | /api/billing/transactions | Transaction history |
| GET | /api/billing/invoices | List Stripe invoices |
| GET | /api/billing/invoices/:id/pdf | Get invoice PDF URL |
| GET | /api/billing/referral | Referral info & stats |
| POST | /api/billing/referral | Apply referral code |
| POST | /api/billing/promo-code | Validate promo code |
Requests will be rejected with a 402 Payment Required error when your balance is too low. Top up your account to continue using the API.
Error Handling
The API uses standard HTTP status codes and returns structured error responses.
Error Response Format
{
"error": {
"message": "Invalid API key provided",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
Status Codes
| Code | Description | Common Cause |
|---|---|---|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient balance |
| 404 | Not Found | Invalid model or endpoint |
| 409 | Conflict | Duplicate resource |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Upstream provider error |
Handling Rate Limits
async function callWithRetry(url, options, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
const response = await fetch(url, options);
if (response.status === 429) {
// Honor Retry-After when present, otherwise back off exponentially
const retryAfter = response.headers.get('retry-after');
const delay = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
await new Promise(r => setTimeout(r, delay));
continue;
}
return response;
}
throw new Error('Max retries exceeded');
}
FAQ
What models are supported?
Hontoni supports Claude (Sonnet 4, Sonnet 4.5, Sonnet 4.6, Opus 4.5, Opus 4.6, Haiku 4.5), GPT (4o, 4o-mini, 4.1, 5.1, 5.2, 5.4, 5-mini, codex variants), and Gemini (2.5 Pro, 3 Flash, 3.1 Pro). See the Models & Pricing section for the full list.
Is the API compatible with OpenAI SDKs?
Yes. Hontoni provides an OpenAI-compatible API at /v1/chat/completions. You can use the official OpenAI SDK by setting the base URL:
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-your-api-key',
baseURL: 'https://api.hontoni.vn/v1',
});
const response = await client.chat.completions.create({
model: 'claude-sonnet-4',
messages: [{ role: 'user', content: 'Hello!' }],
});
Is the API compatible with Anthropic SDKs?
Yes. Hontoni provides an Anthropic-compatible API at /v1/messages. Use the official Anthropic SDK:
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: 'sk-your-api-key',
baseURL: 'https://api.hontoni.vn', // the SDK appends /v1/messages itself
});
const message = await client.messages.create({
model: 'claude-sonnet-4',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hello!' }],
});
How does billing work?
Hontoni uses a prepaid balance system. You top up your account with funds, and each API request deducts costs based on token usage and the model's pricing. See the Billing section for the cost calculation formula.
What happens when I exceed rate limits?
You'll receive a 429 Too Many Requests response with rate limit headers indicating when limits reset. Implement exponential backoff in your client. Consider upgrading your plan or adding rate limit add-ons for higher limits.
Can I use multiple models in the same project?
Absolutely. Simply change the model parameter in each request. Use cheaper models like gpt-4o-mini or claude-haiku-4.5 for simple tasks and powerful models like claude-opus-4.6 for complex reasoning.