
API Reference

This page is an overview of the Casola API. For the full interactive reference with request/response schemas, see the API reference (served from /reference on the API).

Base URL: https://api.casola.ai

Region-pinned API hosts route every request they receive directly to the corresponding macro region, so the X-Region header is not needed:

| Host | Macro region |
| --- | --- |
| api.us.casola.ai | US only |
| api.eu.casola.ai | EU only |
| api.ap.casola.ai | AP only |

Use region-pinned hosts when you need hard data-residency guarantees and want to enforce the region at the network layer rather than per-request.
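
As a rough illustration of enforcing the region at the network layer, the sketch below picks a base URL from the host table. The helper name `base_url` and the short region keys are hypothetical, and it assumes the regional hosts are served over HTTPS like the default host.

```python
# Hostnames are copied from the table above; everything else is illustrative.
REGION_HOSTS = {
    "us": "api.us.casola.ai",
    "eu": "api.eu.casola.ai",
    "ap": "api.ap.casola.ai",
}
DEFAULT_HOST = "api.casola.ai"


def base_url(region=None):
    """Return the base URL for a macro region, or the default host if None."""
    if region is None:
        return f"https://{DEFAULT_HOST}"
    try:
        host = REGION_HOSTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown macro region: {region!r}") from None
    return f"https://{host}"


print(base_url("eu"))  # https://api.eu.casola.ai
```

Resolving the host once at start-up, rather than per request, keeps a misconfigured caller from silently falling back to the unpinned host.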

All API requests require a Bearer token in the Authorization header:

Authorization: Bearer csk_your_token_here

Tokens are created in the dashboard under Settings → API Tokens, or via the API itself. Each token has scopes that control what it can access.
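
A minimal sketch of attaching the token from Python using only the standard library. The helper `authed_request` is hypothetical; it reads the same `CASOLA_API_KEY` environment variable that the curl examples on this page use, and it only builds the request without sending it.

```python
import json
import os
import urllib.request


def authed_request(path, payload=None, token=None, base="https://api.casola.ai"):
    """Build (but do not send) a request that carries the Bearer token."""
    token = token or os.environ.get("CASOLA_API_KEY", "")
    if not token:
        raise ValueError("no API token supplied")
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # A JSON body turns the request into a POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(base + path, data=data, headers=headers)


req = authed_request("/openai/v1/models", token="csk_your_token_here")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would perform the call.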

The endpoints below follow the OpenAI API format. Most OpenAI client libraries work out of the box once you change the base URL.

| Method | Path | Description |
| --- | --- | --- |
| POST | /openai/v1/chat/completions | Chat completion (streaming supported) |
| POST | /openai/v1/responses | Responses API with hosted web_search and function tool support |
| POST | /openai/v1/completions | Text completions (native on supported models; external fallback otherwise) |
| POST | /openai/v1/embeddings | Text embeddings |
| POST | /openai/v1/audio/speech | Text-to-speech |
| POST | /openai/v1/audio/transcriptions | Speech-to-text (file upload or URL) |
| POST | /openai/v1/audio/translations | Speech translation to English |
| POST | /openai/v1/images/generations | Image generation |
| POST | /openai/v1/images/edits | Image editing / inpainting |
| POST | /openai/v1/rerank | Document reranking |
| POST | /openai/v1/score | Text pair scoring |
| POST | /openai/v1/pii | PII detection and redaction |
| GET | /openai/v1/models | List available models |
| GET | /openai/v1/models/:model | Get a single model |
| POST | /openai/v1/files | Upload a file |
| GET | /openai/v1/files | List files |
| GET | /openai/v1/files/:file_id | Get file metadata |
| DELETE | /openai/v1/files/:file_id | Delete a file |
| GET | /openai/v1/files/:file_id/content | Download file content |
| POST | /openai/v1/batches | Create a batch job |
| GET | /openai/v1/batches | List batches |
| GET | /openai/v1/batches/:batch_id | Get batch status |
| POST | /openai/v1/batches/:batch_id/cancel | Cancel a batch |

curl https://api.casola.ai/openai/v1/chat/completions \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3.5-4B",
"messages": [{"role": "user", "content": "Hello!"}]
}'

Response:

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1711234567,
"model": "Qwen/Qwen3.5-4B",
"choices": [
{
"index": 0,
"message": {"role": "assistant", "content": "Hello! How can I help you today?"},
"finish_reason": "stop"
}
],
"usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
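
To show how a client might read this response, here is a small sketch that pulls the assistant text and finish reason out of the first choice. The function name `first_reply` is hypothetical; the sample body is the response shown above.

```python
import json

# Sample body copied from the response above.
RAW = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
"""


def first_reply(response):
    """Return (assistant text, finish_reason) from the first choice."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


text, reason = first_reply(json.loads(RAW))
print(reason, "->", text)
```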

POST /openai/v1/responses supports the OpenAI Responses API request shape. The hosted web_search tool and client-managed function tools can be mixed freely in the same request.

  • tools: [{ "type": "web_search" }] is executed server-side before the model responds.
  • tools: [{ "type": "function", "function": { ... } }] are forwarded to the model as client-managed tools.
  • Hosted and function tools can appear together in the same tools array.
  • stream: true is supported.
  • async: true is not supported when hosted tools are present.
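
The rules above can be sketched as a small payload builder that mixes a hosted tool with a client-managed one and rejects the unsupported async combination before the request is sent. This is an illustration only: `build_responses_payload` and the `get_weather` tool are hypothetical, the `input` field follows the OpenAI Responses request shape, and whether a given model supports this endpoint is an assumption.

```python
HOSTED_TOOL_TYPES = {"web_search"}  # the only hosted tool this section names


def build_responses_payload(model, user_input, tools=(), stream=False, run_async=False):
    """Assemble a JSON-serialisable body for POST /openai/v1/responses."""
    tools = list(tools)
    has_hosted = any(tool.get("type") in HOSTED_TOOL_TYPES for tool in tools)
    if run_async and has_hosted:
        raise ValueError("async: true is not supported when hosted tools are present")
    payload = {"model": model, "input": user_input, "tools": tools}
    if stream:
        payload["stream"] = True
    if run_async:
        payload["async"] = True
    return payload


# Hypothetical client-managed tool; its name and parameters are made up.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

payload = build_responses_payload(
    "Qwen/Qwen3.5-4B",
    "What is the weather in Lisbon right now?",
    tools=[{"type": "web_search"}, get_weather],
    stream=True,
)
print([tool["type"] for tool in payload["tools"]])  # ['web_search', 'function']
```

Checking the async restriction client-side is optional; it simply surfaces the error before a round trip.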
curl https://api.casola.ai/openai/v1/audio/speech \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3-TTS",
"input": "Hello, welcome to Casola!",
"voice": "Vivian",
"response_format": "mp3"
}' \
--output speech.mp3

The response is the binary audio file directly.

curl https://api.casola.ai/openai/v1/audio/transcriptions \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-F model="whisper-large-v3-turbo" \
-F file=@recording.mp3 \
-F response_format="verbose_json"

Response:

{
"task": "transcribe",
"language": "en",
"duration": 45.2,
"text": "Hello, this is a sample recording...",
"segments": [
{"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
]
}

curl https://api.casola.ai/openai/v1/embeddings \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sglang-qwen3-0.6b-fp8-embed",
"input": "The quick brown fox jumps over the lazy dog"
}'

Response:

{
"object": "list",
"data": [
{"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
],
"model": "sglang-qwen3-0.6b-fp8-embed",
"usage": {"prompt_tokens": 10, "total_tokens": 10}
}

curl https://api.casola.ai/openai/v1/images/generations \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "a neon-lit alley in the rain",
"size": "1024x1024"
}'

Response:

{
"created": 1711234567,
"data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}

curl https://api.casola.ai/openai/v1/rerank \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-0.6b-fp8-score",
"query": "What is machine learning?",
"documents": [
"Machine learning is a subset of artificial intelligence.",
"The weather today is sunny.",
"Deep learning uses neural networks with many layers."
],
"top_n": 2,
"return_documents": true
}'

Response:

{
"object": "list",
"data": [
{"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
{"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
]
}

curl https://api.casola.ai/openai/v1/models \
-H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
"object": "list",
"data": [
{"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
{"id": "black-forest-labs/FLUX.1-schnell", "object": "model", "created": 0, "owned_by": "casola"}
]
}

The /fal endpoints are slug-based and cover image, video, and audio generation. They follow the Fal.ai request format.

| Method | Path | Description |
| --- | --- | --- |
| POST | /fal/{slug} | Submit a job (sync or async) |
| GET | /fal/requests/{requestId} | Poll job status / get result |
| POST | /fal/requests/batch | Batch status query (max 50 IDs) |

Fal slugs are model-specific — check GET /api/model-status for the slug mapping, or see the Models reference.

Sync mode (sync_mode: true): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.

curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "A cat in space", "sync_mode": true}'

Async mode (default): Returns immediately with a request_id. Poll for the result.

# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "A sunset over the ocean"}'
# Poll (set REQUEST_ID to the request_id returned by the submit call)
curl "https://api.casola.ai/fal/requests/$REQUEST_ID" \
-H "Authorization: Bearer $CASOLA_API_KEY"

For long-running tasks (video generation, batches), use the async job API:

POST /api/jobs              → { id, queue_id }           # 1. Create job
GET  /api/jobs/{id}         → { status, result, ... }    # 2. Poll until completed
POST /api/jobs/{id}/cancel                               # 3. Cancel if needed

| Status | Meaning |
| --- | --- |
| pending | Queued, waiting for a worker |
| running | Worker is executing the job |
| completed | Result is ready |
| failed | Job failed (check error field) |
| cancelled | Cancelled by the user |
| dead_letter | Exhausted retries; permanently failed |

Fal-compatible endpoints (/fal/requests/{id}) map these to uppercase: IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED.
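
The create/poll flow can be sketched as a loop that stops on any terminal status from the table. This is a minimal illustration, not a prescribed client: `wait_for_job`, the polling interval, and the timeout are assumptions, and `fetch_job` stands in for whatever function performs GET /api/jobs/{id} and returns the parsed JSON.

```python
import time

# Statuses from the table after which polling can stop.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "dead_letter"}


def wait_for_job(job_id, fetch_job, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


# Demo with a fake fetcher so the sketch runs without network access.
_statuses = iter(["pending", "running", "completed"])
job = wait_for_job(
    "job_123",
    lambda job_id: {"id": job_id, "status": next(_statuses)},
    sleep=lambda seconds: None,
)
print(job["status"])  # completed
```

Treating `failed`, `cancelled`, and `dead_letter` as terminal keeps the loop from polling forever on a job that will never complete.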

| Method | Path | Description | Scope |
| --- | --- | --- | --- |
| GET | /api/catalog | Public model catalog | user:read |
| GET | /api/pricing | Public pricing information | none |
| GET | /api/model-status | Model availability and status | user:read |
| GET | /api/voice/models | List voice models with available voices | user:read |
| POST | /api/render | Render endpoint | user:write |
| GET | /api/search | Search models and content | user:read |
| POST | /api/agents | Create an agent | user:write |
| POST | /api/agents/{id}/run | Run an agent | user:write |
| POST | /api/prompt-rewrite | AI-assisted prompt enhancement | user:write |
| GET | /api/organizations/{orgId}/usage | Usage aggregates | admin:read |
| GET | /api/organizations/{orgId}/usage/by-token | Usage breakdown by token | admin:read |
| GET | /api/organizations/{orgId}/billing/rates | Billing rates for the organization | admin:read |
| POST | /api/organizations/{orgId}/tokens | Create an API token | admin:write |
| GET | /api/organizations/{orgId}/content-filter-policy | Get content filter policy | admin:read |
| POST | /api/organizations/{orgId}/content-filter-policy | Create content filter policy (enterprise only) | admin:write |
| PATCH | /api/organizations/{orgId}/content-filter-policy | Update content filter policy (enterprise only) | admin:write |
| DELETE | /api/organizations/{orgId}/content-filter-policy | Delete content filter policy | admin:write |
| POST | /api/library/items/{itemId}/share | Create a share link for a library item | user:write |
| GET | /api/library/items/{itemId}/share | Get share link for a library item | user:read |
| DELETE | /api/library/items/{itemId}/share | Delete a share link | user:write |
| GET | /api/shares/{token} | Fetch a shared item by token | none |

Content filter policy actions: block (reject on violation), flag (allow + tag), log (allow + audit), off (skip classification entirely; enterprise only, no GPU cost, no audit rows). For enterprise organizations, platform/provider minimum floors may be bypassed regardless of action.

curl https://api.casola.ai/api/model-status \
-H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
"models": [
{
"model_id": "Qwen/Qwen3.5-4B",
"spec_id": "spec_abc",
"enabled": true,
"tasks": ["openai/chat-completion"]
}
]
}

curl -X POST https://api.casola.ai/api/agents \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Image Generator",
"dag": {
"nodes": {
"gen": {
"model_id": "black-forest-labs/FLUX.1-schnell",
"task": "fal/text-to-image",
"inputs": {"prompt": "${input.prompt}"},
"outputs": ["images[0].url"]
}
},
"edges": []
}
}'

curl -X POST https://api.casola.ai/api/agents/agent_abc123/run \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input_params": {"prompt": "a sunset over the ocean"}}'

Response:

{
"run": {
"id": "run_def456",
"agent_id": "agent_abc123",
"status": "pending",
"input_params": {"prompt": "a sunset over the ocean"},
"created_at": 1711234567
}
}

Map nodes fan out a task across each element of an array. The items field is a template expression that resolves to an array at runtime. Each sub-job receives one element as item, referenced via ${item} or ${item.field} in the inner node’s payload.

curl -X POST https://api.casola.ai/api/agents \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Batch Image Generator",
"dag": {
"nodes": {
"generate": {
"type": "map",
"items": "${input.prompts}",
"node": {
"model_id": "black-forest-labs/FLUX.1-schnell",
"task": "fal/text-to-image",
"payload": {
"prompt": "${item}"
}
},
"tolerance": 0
}
},
"edges": []
}
}'

The MapNodeSchema shape:

{
"type": "map",
"items": "<template expression resolving to an array>",
"node": {
"model_id": "<model ID>",
"task": "<task type>",
"payload": { "...": "supports ${item} and ${item.field} interpolation" }
},
"max_concurrency": 4,
"tolerance": 0
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "map" | yes | Identifies this as a map node |
| items | string | yes | Template expression resolving to an array (e.g. ${input.prompts}) |
| node.model_id | string | yes | Model to use for each sub-job |
| node.task | string | yes | Task type for each sub-job |
| node.payload | object | yes | Payload template; ${item} references the current element |
| max_concurrency | number | no | Max parallel sub-jobs |
| tolerance | number | no | Fraction of sub-jobs allowed to fail (0–1, default 0) |
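
As a sketch of how a client might assemble a map node that matches the fields above, the helper below builds the dictionary and checks the documented 0–1 range for tolerance. The name `make_map_node` is hypothetical, and the check that max_concurrency is at least 1 is an assumption rather than a documented rule.

```python
def make_map_node(items, model_id, task, payload, max_concurrency=None, tolerance=0.0):
    """Build a map-node dict with the required and optional fields."""
    if not 0 <= tolerance <= 1:
        raise ValueError("tolerance must be a fraction between 0 and 1")
    if max_concurrency is not None and max_concurrency < 1:
        # Assumed lower bound; the field table does not state one.
        raise ValueError("max_concurrency must be at least 1")
    node = {
        "type": "map",
        "items": items,
        "node": {"model_id": model_id, "task": task, "payload": payload},
        "tolerance": tolerance,
    }
    if max_concurrency is not None:
        node["max_concurrency"] = max_concurrency
    return node


dag = {
    "nodes": {
        "generate": make_map_node(
            "${input.prompts}",
            "black-forest-labs/FLUX.1-schnell",
            "fal/text-to-image",
            {"prompt": "${item}"},
            max_concurrency=4,
        )
    },
    "edges": [],
}
print(sorted(dag["nodes"]["generate"]))
```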

Run the agent by passing an array in input_params:

curl -X POST https://api.casola.ai/api/agents/agent_batch123/run \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_params": {
"prompts": [
"a neon cityscape at night",
"a watercolor forest in autumn",
"a pencil sketch of a mountain lake"
]
}
}'

The $each interpolation syntax can also be used inside any node payload to expand an array inline:

{
"type": "task",
"model_id": "Qwen/Qwen3.5-4B",
"task": "openai/chat-completion",
"payload": {
"messages": [
{"role": "system", "content": "Summarize the following items."},
{
"$each": "${nodes.generate.result.items}",
"template": {
"role": "user",
"content": "Item: ${item.url}"
}
}
]
}
}
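
To make the inline expansion concrete, here is a local illustration of one plausible reading of $each: each entry containing $each is replaced by one rendered copy of its template per array element. This is not the server's implementation; `expand_each`, `render`, and the `resolve` callback are hypothetical, and only ${item} and ${item.field} are handled.

```python
import re

_ITEM = re.compile(r"\$\{item((?:\.\w+)*)\}")


def render(value, item):
    """Substitute ${item} / ${item.field} inside strings, dicts, and lists."""
    if isinstance(value, str):
        def substitute(match):
            current = item
            for key in filter(None, match.group(1).split(".")):
                current = current[key]
            return str(current)
        return _ITEM.sub(substitute, value)
    if isinstance(value, dict):
        return {key: render(inner, item) for key, inner in value.items()}
    if isinstance(value, list):
        return [render(inner, item) for inner in value]
    return value


def expand_each(entries, resolve):
    """Splice rendered templates in place of every {"$each": ...} entry."""
    expanded = []
    for entry in entries:
        if isinstance(entry, dict) and "$each" in entry:
            for item in resolve(entry["$each"]):
                expanded.append(render(entry["template"], item))
        else:
            expanded.append(entry)
    return expanded


messages = [
    {"role": "system", "content": "Summarize the following items."},
    {"$each": "${nodes.generate.result.items}",
     "template": {"role": "user", "content": "Item: ${item.url}"}},
]
# Stand-in for the upstream node's result; the URLs are made up.
fake_items = [{"url": "https://example.com/a.png"}, {"url": "https://example.com/b.png"}]
for message in expand_each(messages, lambda expression: fake_items):
    print(message)
```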
curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "CI Pipeline Token",
"scopes": ["user:read", "user:write"]
}'

Response:

{
"token": {
"id": "tok_abc123",
"name": "CI Pipeline Token",
"scopes": ["user:read", "user:write"],
"status": "active"
},
"secret": "csk_a3b4f8c2..."
}

The secret is only returned once — store it securely.

See the interactive API reference for the full endpoint list.

Requests are rate-limited per organization according to its plan:

| Plan | API Requests | Job Submissions |
| --- | --- | --- |
| Free | 60/min | 10/min |
| Pro | 600/min | 100/min |
| Enterprise | 6,000/min | 1,000/min |

Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or agent run endpoints. Everything else counts as an API request.
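
The classification rule above can be sketched as a small function, which may help when budgeting calls against the two limits. The function name and return labels are hypothetical. Reading "/api/jobs" as an exact path match and "agent run endpoints" as /api/agents/{id}/run is an assumption.

```python
import re

_JOB_PREFIXES = ("/openai/v1/", "/fal/")
_AGENT_RUN = re.compile(r"^/api/agents/[^/]+/run$")


def rate_limit_bucket(method, path):
    """Return which limit a request counts against, per the rule above."""
    if method.upper() in {"POST", "PUT"}:
        if (path.startswith(_JOB_PREFIXES)
                or path == "/api/jobs"
                or _AGENT_RUN.match(path)):
            return "job_submission"
    return "api_request"


for method, path in [
    ("POST", "/openai/v1/chat/completions"),
    ("GET", "/openai/v1/models"),
    ("POST", "/api/agents/agent_abc123/run"),
    ("POST", "/api/agents"),
]:
    print(method, path, "->", rate_limit_bucket(method, path))
```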

When rate-limited, the API returns 429 Too Many Requests:

{
"error": {
"code": "rate_limit",
"message": "Rate limit exceeded",
"type": "rate_limit_error"
}
}
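
One common way to handle a 429 is to retry with exponential backoff, sketched below. This is a generic client-side pattern, not something the API prescribes: the helper name, the delays, and the attempt count are all assumptions, and `send` stands in for any callable that performs the request and returns `(status_code, parsed_body)`.

```python
import random
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the delay each time and add a little jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))
    return status, body


# Demo with canned responses so the sketch runs without network access.
_limited = {"error": {"code": "rate_limit", "message": "Rate limit exceeded",
                      "type": "rate_limit_error"}}
_responses = iter([(429, _limited), (429, _limited), (200, {"ok": True})])
status, body = send_with_backoff(lambda: next(_responses), sleep=lambda seconds: None)
print(status, body)
```

If every attempt is rate-limited, the last 429 response is returned so the caller can decide what to do next.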

All errors use a consistent envelope:

{
"error": {
"code": "not_found",
"message": "Job not found",
"type": "not_found_error"
}
}

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
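
A minimal sketch of turning the error envelope into a Python exception. The class `CasolaAPIError` and the helper `parse_error` are hypothetical names, and the fallback for bodies that do not match the envelope is an assumption about how a client might want to behave.

```python
import json


class CasolaAPIError(Exception):
    """Carries the fields of the error envelope plus the HTTP status."""

    def __init__(self, status, code, message, error_type):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.error_type = error_type


def parse_error(status, body_text):
    """Build a CasolaAPIError from an HTTP status and a raw response body."""
    try:
        error = json.loads(body_text)["error"]
        return CasolaAPIError(status, error.get("code"), error.get("message"),
                              error.get("type"))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Body was not JSON or did not contain the expected envelope.
        return CasolaAPIError(status, "unknown", body_text, "unknown")


err = parse_error(
    404,
    '{"error": {"code": "not_found", "message": "Job not found", "type": "not_found_error"}}',
)
print(err)  # 404 not_found: Job not found
```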