API Reference
Overview of the Casola API. For the full interactive reference with request/response schemas, see the API reference (served from /reference on the API).
Base URL
```text
https://api.casola.ai
```

Region-pinned API hosts send every request they receive directly to the specified macro region, so you don't need the X-Region header:
| Host | Macro region |
|---|---|
| api.us.casola.ai | US only |
| api.eu.casola.ai | EU only |
| api.ap.casola.ai | AP only |
Use region-pinned hosts when you need hard data-residency guarantees and want to enforce the region at the network layer rather than per-request.
Authentication
All API requests require a Bearer token in the Authorization header:

```text
Authorization: Bearer csk_your_token_here
```

Tokens are created in the dashboard under Settings → API Tokens, or via the API itself. Each token has scopes that control what it can access.
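As a minimal standard-library sketch of the two sections above, the helper below picks a region-pinned host and attaches the Bearer token. The `build_request` name and the `us`/`eu`/`ap` region keys are hypothetical conveniences, not part of any Casola SDK; the token is assumed to come from the caller (the curl examples on this page read it from `CASOLA_API_KEY`).

```python
import json
import urllib.request

# Hosts from the Base URL section; None selects the default (non-pinned) host.
CASOLA_HOSTS = {
    None: "https://api.casola.ai",
    "us": "https://api.us.casola.ai",
    "eu": "https://api.eu.casola.ai",
    "ap": "https://api.ap.casola.ai",
}


def build_request(path, token, body=None, region=None):
    """Build (but do not send) an authenticated Casola API request.

    Hypothetical helper: sends a JSON body via POST when one is given, else GET.
    """
    url = CASOLA_HOSTS[region] + path
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST" if data else "GET")
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

Pass the result to `urllib.request.urlopen` to send it. Choosing `region="eu"` routes every call through api.eu.casola.ai, so residency is enforced by the host rather than per request.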
OpenAI-Compatible Endpoints
These endpoints follow the OpenAI API format. Most OpenAI client libraries work out of the box: point the client's base URL at https://api.casola.ai/openai/v1.
| Method | Path | Description |
|---|---|---|
| POST | /openai/v1/chat/completions | Chat completion (streaming supported) |
| POST | /openai/v1/responses | Responses API with hosted web_search and function tool support |
| POST | /openai/v1/completions | Text completions (native on supported models; external fallback otherwise) |
| POST | /openai/v1/embeddings | Text embeddings |
| POST | /openai/v1/audio/speech | Text-to-speech |
| POST | /openai/v1/audio/transcriptions | Speech-to-text (file upload or URL) |
| POST | /openai/v1/audio/translations | Speech translation to English |
| POST | /openai/v1/images/generations | Image generation |
| POST | /openai/v1/images/edits | Image editing / inpainting |
| POST | /openai/v1/rerank | Document reranking |
| POST | /openai/v1/score | Text pair scoring |
| POST | /openai/v1/pii | PII detection and redaction |
| GET | /openai/v1/models | List available models |
| GET | /openai/v1/models/:model | Get a single model |
| POST | /openai/v1/files | Upload a file |
| GET | /openai/v1/files | List files |
| GET | /openai/v1/files/:file_id | Get file metadata |
| DELETE | /openai/v1/files/:file_id | Delete a file |
| GET | /openai/v1/files/:file_id/content | Download file content |
| POST | /openai/v1/batches | Create a batch job |
| GET | /openai/v1/batches | List batches |
| GET | /openai/v1/batches/:batch_id | Get batch status |
| POST | /openai/v1/batches/:batch_id/cancel | Cancel a batch |
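The chat completions row notes that streaming is supported. Assuming the stream uses OpenAI's server-sent-events framing (each event is a `data:` line holding a `chat.completion.chunk` JSON object, and the stream ends with `data: [DONE]`), a minimal parser for a request sent with `"stream": true` could look like this; `iter_content_deltas` is a hypothetical name.

```python
import json


def iter_content_deltas(lines):
    """Yield text deltas from an OpenAI-style SSE chat completion stream.

    `lines` may contain str or bytes, e.g. an HTTP response iterated line by line.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

Iterating a response object returned by `urllib.request.urlopen` yields raw byte lines, so `for piece in iter_content_deltas(resp): print(piece, end="", flush=True)` prints the reply as it arrives.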
Example: Chat Completion
```bash
curl https://api.casola.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-4B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
```

Responses Hosted Tools
POST /openai/v1/responses supports the OpenAI Responses API request shape. The hosted web_search tool and client-managed function tools can be mixed freely in the same request.
- A `{ "type": "web_search" }` entry in `tools` is executed server-side before the model responds.
- `{ "type": "function", "function": { ... } }` entries in `tools` are forwarded to the model as client-managed tools.
- Hosted and function tools can appear together in the same `tools` array.
- `stream: true` is supported.
- `async: true` is not supported when hosted tools are present.
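The rules above can be checked before a request leaves the client. This sketch assumes the body uses the OpenAI Responses `input` field and that `async` is a top-level body flag, as the list's `async: true` suggests; the helper name and the `get_weather` function tool are hypothetical, and the fields inside `"function"` follow OpenAI's chat-style function definitions as an assumption.

```python
# Tool types the service executes itself (per the list above); others are client-managed.
HOSTED_TOOL_TYPES = {"web_search"}


def build_responses_payload(model, user_input, tools, stream=False, use_async=False):
    """Assemble a /openai/v1/responses body, rejecting async + hosted tools early."""
    has_hosted = any(tool.get("type") in HOSTED_TOOL_TYPES for tool in tools)
    if use_async and has_hosted:
        raise ValueError("async: true is not supported when hosted tools are present")
    payload = {"model": model, "input": user_input, "tools": list(tools)}
    if stream:
        payload["stream"] = True
    if use_async:
        payload["async"] = True
    return payload


# Hypothetical client-managed tool; your code must execute it when the model calls it.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```

For example, `build_responses_payload("Qwen/Qwen3.5-4B", "Weather in Paris?", [{"type": "web_search"}, weather_tool], stream=True)` mixes both tool kinds in one array; whether a given model supports this endpoint is itself an assumption to verify against the catalog.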
Example: Text-to-Speech
```bash
curl https://api.casola.ai/openai/v1/audio/speech \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS",
    "input": "Hello, welcome to Casola!",
    "voice": "Vivian",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

The response body is the binary audio file itself, which `--output` writes to speech.mp3.
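Because the speech endpoint returns raw audio rather than JSON, a client only needs to copy the body to disk. A minimal sketch, assuming `response` is any readable binary file-like object such as the one returned by `urllib.request.urlopen`; `save_audio` is a hypothetical name.

```python
import shutil


def save_audio(response, path, chunk_size=64 * 1024):
    """Copy a binary response body to a file in fixed-size chunks.

    Nothing is parsed: the bytes are written exactly as received.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return path
```

Copying in chunks keeps memory flat for long clips instead of reading the whole body with `response.read()`.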
Example: Speech-to-Text
```bash
curl https://api.casola.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -F model="whisper-large-v3-turbo" \
  -F file=@recording.mp3 \
  -F response_format="verbose_json"
```

Response:
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello, this is a sample recording...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
  ]
}
```

Example: Embeddings
```bash
curl https://api.casola.ai/openai/v1/embeddings \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sglang-qwen3-0.6b-fp8-embed",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
  ],
  "model": "sglang-qwen3-0.6b-fp8-embed",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
```

Example: Image Generation
```bash
curl https://api.casola.ai/openai/v1/images/generations \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "a neon-lit alley in the rain",
    "size": "1024x1024"
  }'
```

Response:
```json
{
  "created": 1711234567,
  "data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}
```

Example: Document Reranking
```bash
curl https://api.casola.ai/openai/v1/rerank \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-0.6b-fp8-score",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny.",
      "Deep learning uses neural networks with many layers."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
    {"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
  ]
}
```

Example: List Models
```bash
curl https://api.casola.ai/openai/v1/models \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
    {"id": "black-forest-labs/FLUX.1-schnell", "object": "model", "created": 0, "owned_by": "casola"}
  ]
}
```

Fal-Compatible Endpoints
Slug-based endpoints for image, video, and audio generation. These follow the Fal.ai request format.
| Method | Path | Description |
|---|---|---|
| POST | /fal/{slug} | Submit a job (sync or async) |
| GET | /fal/requests/{requestId} | Poll job status / get result |
| POST | /fal/requests/batch | Batch status query (max 50 IDs) |
Fal slugs are model-specific — check GET /api/model-status for the slug mapping, or see the Models reference.
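The batch status row in the table above caps each query at 50 IDs, so a client tracking more requests has to split them. The request body field that carries the IDs is not documented on this page, so this sketch only does the chunking; `batch_status_chunks` is a hypothetical name, and each yielded group would become one POST /fal/requests/batch call.

```python
from typing import Iterable, Iterator, List

MAX_BATCH_IDS = 50  # limit stated for POST /fal/requests/batch


def batch_status_chunks(request_ids: Iterable[str], size: int = MAX_BATCH_IDS) -> Iterator[List[str]]:
    """Split request IDs into groups no larger than the batch endpoint allows."""
    chunk: List[str] = []
    for rid in request_ids:
        chunk.append(rid)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # trailing partial group
        yield chunk
```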
Sync vs Async
Sync mode (sync_mode: true): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.
```bash
curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat in space", "sync_mode": true}'
```

Async mode (default): Returns immediately with a request_id. Poll for the result.
```bash
# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'

# Poll, using the request_id returned by the submit call
curl "https://api.casola.ai/fal/requests/$REQUEST_ID" \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Async Job Flow
For long-running tasks (video generation, batches), use the async job API:
```text
POST /api/jobs              → { id, queue_id }          # 1. Create job
GET  /api/jobs/{id}         → { status, result, ... }   # 2. Poll until the status is terminal
POST /api/jobs/{id}/cancel                              # 3. Cancel if needed
```

Job Statuses
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a worker |
| running | Worker is executing the job |
| completed | Result is ready |
| failed | Job failed (check error field) |
| cancelled | Cancelled by the user |
| dead_letter | Exhausted retries; permanently failed |
Fal-compatible endpoints (/fal/requests/{id}) report these statuses in Fal's uppercase vocabulary: pending → IN_QUEUE, running → IN_PROGRESS, completed → COMPLETED, failed → FAILED.
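A polling loop for the flow above needs to stop on every terminal status, not just `completed`. This sketch assumes both GET /api/jobs/{id} and GET /fal/requests/{id} expose the state under a `status` key; `fetch_job` is a hypothetical callable you supply that performs the GET and returns decoded JSON. Fal spellings for `cancelled` and `dead_letter` are not documented here, so only COMPLETED and FAILED are treated as terminal on that side.

```python
import time

# Terminal statuses from the table above, plus the documented Fal-compatible spellings.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "dead_letter", "COMPLETED", "FAILED"}


def wait_for_job(fetch_job, job_id, timeout=600.0, interval=1.0, max_interval=15.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) with exponential backoff until a terminal status."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job  # caller still checks whether it completed or failed
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # 1s, 2s, 4s, ... capped
```

Doubling the interval keeps long video jobs from burning through the per-minute API request budget described under Rate Limiting; the starting interval, cap, and timeout are arbitrary defaults.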
Core API Endpoints
| Method | Path | Description | Scope |
|---|---|---|---|
| GET | /api/catalog | Public model catalog | user:read |
| GET | /api/pricing | Public pricing information | none |
| GET | /api/model-status | Model availability and status | user:read |
| GET | /api/voice/models | List voice models with available voices | user:read |
| POST | /api/render | Render endpoint | user:write |
| GET | /api/search | Search models and content | user:read |
| POST | /api/agents | Create an agent | user:write |
| POST | /api/agents/{id}/run | Run an agent | user:write |
| POST | /api/prompt-rewrite | AI-assisted prompt enhancement | user:write |
| GET | /api/organizations/{orgId}/usage | Usage aggregates | admin:read |
| GET | /api/organizations/{orgId}/usage/by-token | Usage breakdown by token | admin:read |
| GET | /api/organizations/{orgId}/billing/rates | Billing rates for the organization | admin:read |
| POST | /api/organizations/{orgId}/tokens | Create an API token | admin:write |
| GET | /api/organizations/{orgId}/content-filter-policy | Get content filter policy | admin:read |
| POST | /api/organizations/{orgId}/content-filter-policy | Create content filter policy (enterprise only) | admin:write |
| PATCH | /api/organizations/{orgId}/content-filter-policy | Update content filter policy (enterprise only) | admin:write |
| DELETE | /api/organizations/{orgId}/content-filter-policy | Delete content filter policy | admin:write |
| POST | /api/library/items/{itemId}/share | Create a share link for a library item | user:write |
| GET | /api/library/items/{itemId}/share | Get share link for a library item | user:read |
| DELETE | /api/library/items/{itemId}/share | Delete a share link | user:write |
| GET | /api/shares/{token} | Fetch a shared item by token | none |

Content filter policy actions:

- `block`: reject on violation
- `flag`: allow and tag
- `log`: allow and audit
- `off`: skip classification entirely (enterprise only; no GPU cost, no audit rows)

For enterprise organizations, platform/provider minimum floors may be bypassed regardless of action.
Example: Model Status
```bash
curl https://api.casola.ai/api/model-status \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "models": [
    {
      "model_id": "Qwen/Qwen3.5-4B",
      "spec_id": "spec_abc",
      "enabled": true,
      "tasks": ["openai/chat-completion"]
    }
  ]
}
```

Example: Create an Agent
```bash
curl -X POST https://api.casola.ai/api/agents \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Image Generator",
    "dag": {
      "nodes": {
        "gen": {
          "model_id": "black-forest-labs/FLUX.1-schnell",
          "task": "fal/text-to-image",
          "inputs": {"prompt": "${input.prompt}"},
          "outputs": ["images[0].url"]
        }
      },
      "edges": []
    }
  }'
```

Example: Run an Agent
```bash
curl -X POST https://api.casola.ai/api/agents/agent_abc123/run \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_params": {"prompt": "a sunset over the ocean"}}'
```

Response:
```json
{
  "run": {
    "id": "run_def456",
    "agent_id": "agent_abc123",
    "status": "pending",
    "input_params": {"prompt": "a sunset over the ocean"},
    "created_at": 1711234567
  }
}
```

Example: Create an Agent with a Map Node
Map nodes fan out a task across each element of an array. The items field is a template expression that resolves to an array at runtime. Each sub-job receives one element as item, referenced via ${item} or ${item.field} in the inner node’s payload.
```bash
curl -X POST https://api.casola.ai/api/agents \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Batch Image Generator",
    "dag": {
      "nodes": {
        "generate": {
          "type": "map",
          "items": "${input.prompts}",
          "node": {
            "model_id": "black-forest-labs/FLUX.1-schnell",
            "task": "fal/text-to-image",
            "payload": {"prompt": "${item}"}
          },
          "tolerance": 0
        }
      },
      "edges": []
    }
  }'
```

The MapNodeSchema shape:
```json
{
  "type": "map",
  "items": "<template expression resolving to an array>",
  "node": {
    "model_id": "<model ID>",
    "task": "<task type>",
    "payload": { "...": "supports ${item} and ${item.field} interpolation" }
  },
  "max_concurrency": 4,
  "tolerance": 0
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| type | "map" | yes | Identifies this as a map node |
| items | string | yes | Template expression resolving to an array (e.g. ${input.prompts}) |
| node.model_id | string | yes | Model to use for each sub-job |
| node.task | string | yes | Task type for each sub-job |
| node.payload | object | yes | Payload template; ${item} references the current element |
| max_concurrency | number | no | Max parallel sub-jobs |
| tolerance | number | no | Fraction of sub-jobs allowed to fail (0–1, default 0) |
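The map-node fields can be simulated locally, which is handy for checking a payload template before creating the agent. This Python sketch is hypothetical throughout: `render` and `run_map` are not Casola APIs, `run_job` stands in for whatever executes one sub-job, and two behaviours are assumptions the table does not pin down: interpolated values are stringified, and the node fails when the fraction of failed sub-jobs exceeds `tolerance` (failed slots come back as `None`).

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches ${item} or ${item.field}; group 1 is the optional field name.
ITEM_REF = re.compile(r"\$\{item(?:\.(\w+))?\}")


def render(template, item):
    """Recursively replace ${item} / ${item.field} inside strings, lists, and dicts."""
    if isinstance(template, str):
        return ITEM_REF.sub(lambda m: str(item[m.group(1)] if m.group(1) else item), template)
    if isinstance(template, list):
        return [render(value, item) for value in template]
    if isinstance(template, dict):
        return {key: render(value, item) for key, value in template.items()}
    return template  # numbers, booleans, None pass through unchanged


def run_map(items, payload, run_job, max_concurrency=4, tolerance=0.0):
    """Run run_job on one rendered payload per item, at most max_concurrency at once."""
    payloads = [render(payload, item) for item in items]
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = [pool.submit(run_job, p) for p in payloads]
    results, failures = [], 0
    for future in futures:  # futures are in input order, so results are too
        try:
            results.append(future.result())
        except Exception:
            results.append(None)
            failures += 1
    if items and failures / len(items) > tolerance:
        raise RuntimeError(f"{failures}/{len(items)} sub-jobs failed (tolerance {tolerance})")
    return results
```

With `tolerance=0`, a single failed sub-job fails the whole node; `tolerance=0.5` lets up to half of them fail.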
Run the agent by passing an array in input_params:
```bash
curl -X POST https://api.casola.ai/api/agents/agent_batch123/run \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_params": {
      "prompts": [
        "a neon cityscape at night",
        "a watercolor forest in autumn",
        "a pencil sketch of a mountain lake"
      ]
    }
  }'
```

The $each interpolation syntax can also be used inside any node payload to expand an array inline:
```json
{
  "type": "task",
  "model_id": "Qwen/Qwen3.5-4B",
  "task": "openai/chat-completion",
  "payload": {
    "messages": [
      {"role": "system", "content": "Summarize the following items."},
      {
        "$each": "${nodes.generate.result.items}",
        "template": {"role": "user", "content": "Item: ${item.url}"}
      }
    ]
  }
}
```

Example: Create an API Token
```bash
curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"]
  }'
```

Response:
```json
{
  "token": {
    "id": "tok_abc123",
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"],
    "status": "active"
  },
  "secret": "csk_a3b4f8c2..."
}
```

The secret is returned only once, so store it securely.
See the interactive API reference for the full endpoint list.
Rate Limiting
Requests are rate-limited per organization based on the plan:
| Plan | API Requests | Job Submissions |
|---|---|---|
| Free | 60/min | 10/min |
| Pro | 600/min | 100/min |
| Enterprise | 6,000/min | 1,000/min |
Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or agent run endpoints. Everything else counts as an API request.
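The classification rule above can be encoded for client-side budgeting. This sketch reads the rule literally and makes one assumption the text leaves open: among /api/jobs routes only the exact POST /api/jobs (job creation) counts as a submission, so POST /api/jobs/{id}/cancel falls in the API-request bucket. Paths are assumed to carry no query string; `PLAN_LIMITS` copies the table, and the function names are hypothetical.

```python
import re

# Per-minute limits from the table above: (API requests, job submissions).
PLAN_LIMITS = {
    "free": (60, 10),
    "pro": (600, 100),
    "enterprise": (6000, 1000),
}

_AGENT_RUN = re.compile(r"/api/agents/[^/]+/run")


def rate_limit_bucket(method, path):
    """Return "job" if a request counts as a job submission, else "api"."""
    if method.upper() in {"POST", "PUT"}:
        if path.startswith(("/openai/v1/", "/fal/")):
            return "job"
        if path == "/api/jobs" or _AGENT_RUN.fullmatch(path):
            return "job"
    return "api"


def per_minute_limit(plan, method, path):
    """Look up the per-minute limit that applies to this request on a plan."""
    api_limit, job_limit = PLAN_LIMITS[plan.lower()]
    return job_limit if rate_limit_bucket(method, path) == "job" else api_limit
```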
When rate-limited, the API returns 429 Too Many Requests:
```json
{
  "error": {
    "code": "rate_limit",
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
```

Error Format
All errors use a consistent envelope:
```json
{
  "error": {
    "code": "not_found",
    "message": "Job not found",
    "type": "not_found_error"
  }
}
```

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
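OpenAI's error format also nests its details under a top-level `error` object, so one parser covers both envelopes. A minimal standard-library sketch that decodes the envelope and retries 429 responses with exponential backoff; `CasolaError`, `parse_error`, and `call_with_retries` are hypothetical names, `send` is any callable that performs one request (e.g. `lambda: urllib.request.urlopen(req)`), and because this page does not document a Retry-After header, the sketch uses fixed backoff instead of reading one.

```python
import json
import time
import urllib.error


class CasolaError(Exception):
    """Decoded error envelope: {"error": {"code", "message", "type"}}."""

    def __init__(self, status, code, message, error_type):
        super().__init__(f"{status} {code}: {message}")
        self.status, self.code, self.message, self.error_type = status, code, message, error_type


def parse_error(status, body):
    """Turn an error response body (bytes) into a CasolaError."""
    try:
        err = json.loads(body)["error"]
        return CasolaError(status, err.get("code"), err.get("message"), err.get("type"))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented envelope (e.g. a proxy error page): keep the raw text.
        return CasolaError(status, None, body.decode("utf-8", "replace"), None)


def call_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying rate-limited (429) responses with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            error = parse_error(exc.code, exc.read())
            if exc.code != 429 or attempt == max_attempts - 1:
                raise error from exc  # non-retryable, or retries exhausted
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... between attempts
```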