API Reference

Overview of the Casola API. For the full interactive reference with request/response schemas, see the API reference served at /reference on the API host (https://api.casola.ai/reference).

Base URL

https://api.casola.ai

Region-pinned API hosts send every request they receive to the corresponding macro region, so no X-Region header is needed:

Host	Macro region
`api.us.casola.ai`	US only
`api.eu.casola.ai`	EU only
`api.ap.casola.ai`	AP only

Use region-pinned hosts when you need hard data-residency guarantees and want to enforce the region at the network layer rather than per-request.
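A minimal Python sketch of choosing the host in client code. The host names come from the table above. Using lowercase region codes ("us", "eu", "ap") as keys is an assumption of this sketch, and base_url is a hypothetical helper, not part of any Casola SDK.

```python
from typing import Optional

# Default (unpinned) host and the region-pinned hosts listed above.
DEFAULT_HOST = "https://api.casola.ai"
PINNED_HOSTS = {
    "us": "https://api.us.casola.ai",
    "eu": "https://api.eu.casola.ai",
    "ap": "https://api.ap.casola.ai",
}


def base_url(region: Optional[str] = None) -> str:
    """Return the pinned host for a macro region, or the default host when region is None."""
    if region is None:
        return DEFAULT_HOST
    try:
        return PINNED_HOSTS[region.lower()]
    except KeyError:
        # Fail loudly instead of silently falling back to an unpinned host.
        raise ValueError(f"unknown macro region: {region!r}") from None


print(base_url("EU"))  # https://api.eu.casola.ai
```

Raising on an unknown region, rather than falling back to the default host, keeps a typo from quietly routing data outside the intended region.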

Authentication

All API requests require a Bearer token in the Authorization header:

Authorization: Bearer csk_your_token_here

Tokens are created in the dashboard under Settings → API Tokens, or through the API with POST /api/organizations/{orgId}/tokens (requires the admin:write scope). Each token carries scopes that control which endpoints it can call.
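A minimal standard-library sketch of attaching the token to every request, assuming the token lives in the CASOLA_API_KEY environment variable as in the curl examples on this page. casola_request is a hypothetical helper name.

```python
import json
import os
import urllib.request
from typing import Any, Optional


def casola_request(path: str, body: Optional[dict[str, Any]] = None,
                   method: Optional[str] = None,
                   base: str = "https://api.casola.ai") -> urllib.request.Request:
    """Build (without sending) an authenticated request to the Casola API."""
    token = os.environ.get("CASOLA_API_KEY")
    if not token:
        raise RuntimeError("set CASOLA_API_KEY to a csk_... token")
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib uses GET when there is no body and POST when there is one,
    # unless an explicit method is given.
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)
```

Send the result with urllib.request.urlopen(casola_request("/openai/v1/models")). Reading the token from the environment keeps csk_ secrets out of source control.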

OpenAI-Compatible Endpoints

These endpoints follow the OpenAI API format. Most OpenAI client libraries work unchanged once their base URL is set to https://api.casola.ai/openai/v1.

Method	Path	Description
`POST`	`/openai/v1/chat/completions`	Chat completion (streaming supported)
`POST`	`/openai/v1/responses`	Responses API with hosted `web_search` and function tool support
`POST`	`/openai/v1/completions`	Text completions (native on supported models; external fallback otherwise)
`POST`	`/openai/v1/embeddings`	Text embeddings
`POST`	`/openai/v1/audio/speech`	Text-to-speech
`POST`	`/openai/v1/audio/transcriptions`	Speech-to-text (file upload or URL)
`POST`	`/openai/v1/audio/translations`	Speech translation to English
`POST`	`/openai/v1/images/generations`	Image generation
`POST`	`/openai/v1/images/edits`	Image editing / inpainting
`POST`	`/openai/v1/rerank`	Document reranking
`POST`	`/openai/v1/score`	Text pair scoring
`POST`	`/openai/v1/pii`	PII detection and redaction
`GET`	`/openai/v1/models`	List available models
`GET`	`/openai/v1/models/:model`	Get a single model
`POST`	`/openai/v1/files`	Upload a file
`GET`	`/openai/v1/files`	List files
`GET`	`/openai/v1/files/:file_id`	Get file metadata
`DELETE`	`/openai/v1/files/:file_id`	Delete a file
`GET`	`/openai/v1/files/:file_id/content`	Download file content
`POST`	`/openai/v1/batches`	Create a batch job
`GET`	`/openai/v1/batches`	List batches
`GET`	`/openai/v1/batches/:batch_id`	Get batch status
`POST`	`/openai/v1/batches/:batch_id/cancel`	Cancel a batch

Example: Chat Completion

curl https://api.casola.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-4B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
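The same call from Python using only the standard library, as a sketch: chat_request builds the POST from the curl example and reply_text extracts the assistant message from a chat.completion response. Both names are illustrative; an OpenAI-compatible client library pointed at the /openai/v1 base URL is an alternative.

```python
import json
import os
import urllib.request

CHAT_URL = "https://api.casola.ai/openai/v1/chat/completions"


def chat_request(prompt: str, model: str = "Qwen/Qwen3.5-4B") -> urllib.request.Request:
    """Build (without sending) a single-turn chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('CASOLA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )


def reply_text(completion: dict) -> str:
    """Return the assistant message content from the first choice."""
    return completion["choices"][0]["message"]["content"]


# Usage (performs a real network call):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(reply_text(json.load(resp)))
```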

Responses Hosted Tools

POST /openai/v1/responses accepts the OpenAI Responses API request shape. The hosted web_search tool and client-managed function tools can be mixed freely in the same tools array.

tools: [{ "type": "web_search" }] is executed server-side before the model responds.
tools: [{ "type": "function", "function": { ... } }] is forwarded to the model as a client-managed tool.
stream: true is supported.
async: true is not supported when a hosted tool is present.
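A sketch of assembling a Responses request body that mixes the hosted web_search tool with a client-managed function tool, rejecting the unsupported async combination before anything is sent. The input field follows the OpenAI Responses API; the get_weather tool, the model choice, and the responses_payload helper are hypothetical.

```python
from typing import Any

# Tool types executed server-side (per the list above).
HOSTED_TOOL_TYPES = {"web_search"}


def responses_payload(model: str, user_input: str, tools: list[dict[str, Any]],
                      stream: bool = False, run_async: bool = False) -> dict[str, Any]:
    """Build a /openai/v1/responses body; async with a hosted tool is rejected early."""
    has_hosted = any(tool.get("type") in HOSTED_TOOL_TYPES for tool in tools)
    if run_async and has_hosted:
        raise ValueError("async: true is not supported when hosted tools are present")
    payload: dict[str, Any] = {"model": model, "input": user_input, "tools": tools, "stream": stream}
    if run_async:
        payload["async"] = True
    return payload


tools = [
    {"type": "web_search"},  # hosted: runs server-side before the model responds
    {
        "type": "function",  # client-managed: forwarded to the model
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]
body = responses_payload("Qwen/Qwen3.5-4B", "What's the weather in Paris today?", tools, stream=True)
```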

Example: Text-to-Speech

curl https://api.casola.ai/openai/v1/audio/speech \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS",
    "input": "Hello, welcome to Casola!",
    "voice": "Vivian",
    "response_format": "mp3"
  }' \
  --output speech.mp3

The response body is the raw audio in the requested response_format (MP3 here), which --output writes to speech.mp3.
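Because the body is raw audio rather than JSON, a client should copy it straight to disk instead of decoding it. A standard-library sketch; speech_request and save_audio are hypothetical helpers.

```python
import json
import os
import shutil
import urllib.request
from typing import BinaryIO

SPEECH_URL = "https://api.casola.ai/openai/v1/audio/speech"


def speech_request(text: str, voice: str = "Vivian", fmt: str = "mp3",
                   model: str = "Qwen3-TTS") -> urllib.request.Request:
    """Build (without sending) a text-to-speech request."""
    body = {"model": model, "input": text, "voice": voice, "response_format": fmt}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('CASOLA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )


def save_audio(response: BinaryIO, path: str) -> int:
    """Copy a binary response body to `path` in chunks; return the bytes written."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()


# Usage (performs a real network call):
# with urllib.request.urlopen(speech_request("Hello, welcome to Casola!")) as resp:
#     save_audio(resp, "speech.mp3")
```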

Example: Speech-to-Text

curl https://api.casola.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -F model="whisper-large-v3-turbo" \
  -F file=@recording.mp3 \
  -F response_format="verbose_json"

Response:

{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello, this is a sample recording...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
  ]
}
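curl's -F flags send a multipart/form-data body. Without a third-party HTTP library, Python has to encode that body by hand; a minimal sketch follows. The helper names are hypothetical, and labelling the file part audio/mpeg is an assumption for MP3 input.

```python
import os
import uuid
import urllib.request

TRANSCRIBE_URL = "https://api.casola.ai/openai/v1/audio/transcriptions"


def multipart_body(fields: dict[str, str], file_field: str, filename: str,
                   data: bytes, content_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode()
    )
    chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcription_request(path: str, model: str = "whisper-large-v3-turbo") -> urllib.request.Request:
    """Build (without sending) the equivalent of the curl -F example above."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = multipart_body(
        {"model": model, "response_format": "verbose_json"},
        "file", os.path.basename(path), audio, "audio/mpeg",
    )
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('CASOLA_API_KEY', '')}",
            "Content-Type": content_type,
        },
    )
```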

Example: Embeddings

curl https://api.casola.ai/openai/v1/embeddings \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sglang-qwen3-0.6b-fp8-embed",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Response:

{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
  ],
  "model": "sglang-qwen3-0.6b-fp8-embed",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
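Embedding vectors are usually compared with cosine similarity. A small sketch that reads vectors from the response shape above, ordered by their index field, and compares two of them; the helper names are illustrative.

```python
import math
from typing import Sequence


def vectors(response: dict) -> list[list[float]]:
    """Return embeddings from an embeddings response, ordered by their index field."""
    return [item["embedding"] for item in sorted(response["data"], key=lambda item: item["index"])]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```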

Example: Image Generation

curl https://api.casola.ai/openai/v1/images/generations \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "a neon-lit alley in the rain",
    "size": "1024x1024"
  }'

Response:

{
  "created": 1711234567,
  "data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}

Example: Document Reranking

curl https://api.casola.ai/openai/v1/rerank \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-0.6b-fp8-score",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny.",
      "Deep learning uses neural networks with many layers."
    ],
    "top_n": 2,
    "return_documents": true
  }'

Response:

{
  "object": "list",
  "data": [
    {"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
    {"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
  ]
}
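Each result's index points back into the documents array that was sent, so the original text can be recovered even with return_documents set to false. A minimal sketch; ranked_documents is a hypothetical helper, and it re-sorts defensively rather than relying on the response order.

```python
def ranked_documents(documents: list[str], response: dict) -> list[tuple[str, float]]:
    """Pair each rerank result with its source document, highest relevance first."""
    results = sorted(response["data"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


docs = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny.",
    "Deep learning uses neural networks with many layers.",
]
# Same shape as the response above, as it would look with return_documents: false.
response = {"object": "list", "data": [
    {"index": 0, "relevance_score": 0.95},
    {"index": 2, "relevance_score": 0.82},
]}
for text, score in ranked_documents(docs, response):
    print(f"{score:.2f}  {text}")
```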

Example: List Models

curl https://api.casola.ai/openai/v1/models \
  -H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
    {"id": "black-forest-labs/FLUX.1-schnell", "object": "model", "created": 0, "owned_by": "casola"}
  ]
}

Fal-Compatible Endpoints

Slug-based endpoints for image, video, and audio generation. These follow the Fal.ai request format.

Method	Path	Description
`POST`	`/fal/{slug}`	Submit a job (sync or async)
`GET`	`/fal/requests/{requestId}`	Poll job status / get result
`POST`	`/fal/requests/batch`	Batch status query (max 50 IDs)

Fal slugs are model-specific — check GET /api/model-status for the slug mapping, or see the Models reference.
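Because the batch status endpoint accepts at most 50 IDs per call, larger sets of request IDs have to be split first. A sketch; this page does not show the batch request body, so the request_ids key below is a hypothetical placeholder (the interactive reference has the real schema).

```python
from typing import Iterator

MAX_BATCH_IDS = 50  # documented limit for POST /fal/requests/batch


def id_batches(request_ids: list[str], size: int = MAX_BATCH_IDS) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` request IDs."""
    if not 1 <= size <= MAX_BATCH_IDS:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_IDS}")
    for start in range(0, len(request_ids), size):
        yield request_ids[start:start + size]


def batch_bodies(request_ids: list[str]) -> list[dict]:
    """One request body per slice. HYPOTHETICAL body shape: "request_ids" is a placeholder key."""
    return [{"request_ids": chunk} for chunk in id_batches(request_ids)]
```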

Sync vs Async

Sync mode (sync_mode: true): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.

curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat in space", "sync_mode": true}'

Async mode (default): Returns immediately with a request_id. Poll GET /fal/requests/{requestId} for the result.

# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'

# Poll
curl https://api.casola.ai/fal/requests/{requestId} \
  -H "Authorization: Bearer $CASOLA_API_KEY"

Async Job Flow

For long-running tasks (video generation, batches), use the async job API:

POST /api/jobs          →  { id, queue_id }           # 1. Create job
GET  /api/jobs/{id}     →  { status, result, ... }    # 2. Poll until completed
POST /api/jobs/{id}/cancel                             # 3. Cancel if needed

Job Statuses

Status	Meaning
`pending`	Queued, waiting for a worker
`running`	Worker is executing the job
`completed`	Result is ready
`failed`	Job failed (check `error` field)
`cancelled`	Cancelled by the user
`dead_letter`	Exhausted retries; permanently failed

Fal-compatible endpoints (/fal/requests/{id}) report these statuses in uppercase: pending → IN_QUEUE, running → IN_PROGRESS, completed → COMPLETED, failed → FAILED.
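A polling loop that works for either flow, sketched with an injectable fetch function so it stays transport-agnostic. It assumes both GET /api/jobs/{id} and GET /fal/requests/{requestId} return a JSON object with a status field, and it maps the uppercase Fal values back to the native names; lowercasing any status without a listed Fal equivalent is an assumption. wait_for_job is hypothetical.

```python
import time
from typing import Any, Callable

# Fal-style statuses mapped back to the native job statuses.
FAL_TO_NATIVE = {
    "IN_QUEUE": "pending",
    "IN_PROGRESS": "running",
    "COMPLETED": "completed",
    "FAILED": "failed",
}
TERMINAL = {"completed", "failed", "cancelled", "dead_letter"}


def normalize_status(status: str) -> str:
    """Map a Fal-style status to its native name; other values are lowercased."""
    return FAL_TO_NATIVE.get(status, status.lower())


def wait_for_job(fetch: Callable[[], dict[str, Any]], interval: float = 1.0,
                 max_interval: float = 10.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call fetch() until the job reaches a terminal status, backing off exponentially."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        status = normalize_status(job["status"])
        if status in TERMINAL:
            return {**job, "status": status}
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

In practice fetch would GET the job URL with the Bearer header and json.load the body. A failed or dead_letter job is returned rather than raised, so the caller can inspect its error field.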

Core API Endpoints

Method	Path	Description	Scope
`GET`	`/api/catalog`	Public model catalog	`user:read`
`GET`	`/api/pricing`	Public pricing information	none
`GET`	`/api/model-status`	Model availability and status	`user:read`
`GET`	`/api/voice/models`	List voice models with available voices	`user:read`
`POST`	`/api/render`	Render endpoint	`user:write`
`GET`	`/api/search`	Search models and content	`user:read`
`POST`	`/api/agents`	Create an agent	`user:write`
`POST`	`/api/agents/{id}/run`	Run an agent	`user:write`
`POST`	`/api/prompt-rewrite`	AI-assisted prompt enhancement	`user:write`
`GET`	`/api/organizations/{orgId}/usage`	Usage aggregates	`admin:read`
`GET`	`/api/organizations/{orgId}/usage/by-token`	Usage breakdown by token	`admin:read`
`GET`	`/api/organizations/{orgId}/billing/rates`	Billing rates for the organization	`admin:read`
`POST`	`/api/organizations/{orgId}/tokens`	Create an API token	`admin:write`
`GET`	`/api/organizations/{orgId}/content-filter-policy`	Get content filter policy	`admin:read`
`POST`	`/api/organizations/{orgId}/content-filter-policy`	Create content filter policy (enterprise only)	`admin:write`
`PATCH`	`/api/organizations/{orgId}/content-filter-policy`	Update content filter policy (enterprise only)	`admin:write`
`DELETE`	`/api/organizations/{orgId}/content-filter-policy`	Delete content filter policy	`admin:write`
`POST`	`/api/library/items/{itemId}/share`	Create a share link for a library item	`user:write`
`GET`	`/api/library/items/{itemId}/share`	Get the share link for a library item	`user:read`
`DELETE`	`/api/library/items/{itemId}/share`	Delete a share link	`user:write`
`GET`	`/api/shares/{token}`	Fetch a shared item by token	none

Content filter policy actions: block (reject on violation), flag (allow and tag), log (allow and audit), and off (skip classification entirely; enterprise only, with no GPU cost and no audit rows). For enterprise organizations, platform and provider minimum floors may be bypassed regardless of the action.

Example: Model Status

curl https://api.casola.ai/api/model-status \
  -H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
  "models": [
    {
      "model_id": "Qwen/Qwen3.5-4B",
      "spec_id": "spec_abc",
      "enabled": true,
      "tasks": ["openai/chat-completion"]
    }
  ]
}
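A sketch of picking usable models from that response: keep the enabled entries whose tasks list includes the task you need. models_for_task is a hypothetical helper.

```python
def models_for_task(model_status: dict, task: str) -> list[str]:
    """Return IDs of enabled models that list `task` among their tasks."""
    return [
        m["model_id"]
        for m in model_status.get("models", [])
        if m.get("enabled") and task in m.get("tasks", [])
    ]


status = {
    "models": [
        {"model_id": "Qwen/Qwen3.5-4B", "spec_id": "spec_abc", "enabled": True,
         "tasks": ["openai/chat-completion"]},
    ]
}
print(models_for_task(status, "openai/chat-completion"))  # ['Qwen/Qwen3.5-4B']
```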

Example: Create an Agent

curl -X POST https://api.casola.ai/api/agents \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Image Generator",
    "dag": {
      "nodes": {
        "gen": {
          "model_id": "black-forest-labs/FLUX.1-schnell",
          "task": "fal/text-to-image",
          "inputs": {"prompt": "${input.prompt}"},
          "outputs": ["images[0].url"]
        }
      },
      "edges": []
    }
  }'

Example: Run an Agent

curl -X POST https://api.casola.ai/api/agents/agent_abc123/run \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_params": {"prompt": "a sunset over the ocean"}}'

Response:

{
  "run": {
    "id": "run_def456",
    "agent_id": "agent_abc123",
    "status": "pending",
    "input_params": {"prompt": "a sunset over the ocean"},
    "created_at": 1711234567
  }
}

Example: Create an Agent with a Map Node

Map nodes fan out a task across each element of an array. The items field is a template expression that resolves to an array at runtime. Each sub-job receives one element as item, referenced via ${item} or ${item.field} in the inner node’s payload.

curl -X POST https://api.casola.ai/api/agents \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Batch Image Generator",
    "dag": {
      "nodes": {
        "generate": {
          "type": "map",
          "items": "${input.prompts}",
          "node": {
            "model_id": "black-forest-labs/FLUX.1-schnell",
            "task": "fal/text-to-image",
            "payload": {
              "prompt": "${item}"
            }
          },
          "tolerance": 0
        }
      },
      "edges": []
    }
  }'

The MapNodeSchema shape:

{
  "type": "map",
  "items": "<template expression resolving to an array>",
  "node": {
    "model_id": "<model ID>",
    "task": "<task type>",
    "payload": { "...": "supports ${item} and ${item.field} interpolation" }
  },
  "max_concurrency": 4,
  "tolerance": 0
}

Field	Type	Required	Description
`type`	`"map"`	yes	Identifies this as a map node
`items`	`string`	yes	Template expression resolving to an array (e.g. `${input.prompts}`)
`node.model_id`	`string`	yes	Model to use for each sub-job
`node.task`	`string`	yes	Task type for each sub-job
`node.payload`	`object`	yes	Payload template — `${item}` references the current element
`max_concurrency`	`number`	no	Max parallel sub-jobs
`tolerance`	`number`	no	Fraction of sub-jobs allowed to fail (0–1, default 0)
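To make the fan-out semantics concrete, here is a local simulation of what the table describes: one payload per element with ${item} and ${item.field} substituted, at most max_concurrency sub-jobs in flight, and the failed fraction checked against tolerance. This is a hypothetical illustration under those assumptions, not Casola's server implementation; run_job stands in for submitting one sub-job.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Matches ${item} and ${item.field} (dotted paths allowed, e.g. ${item.meta.id}).
ITEM_REF = re.compile(r"\$\{item(?:\.([A-Za-z_][\w.]*))?\}")


def render(template: Any, item: Any) -> Any:
    """Recursively substitute item references inside every string of a payload template."""
    if isinstance(template, dict):
        return {key: render(value, item) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, item) for value in template]
    if isinstance(template, str):
        def lookup(match: re.Match) -> str:
            value = item
            if match.group(1):
                for key in match.group(1).split("."):
                    value = value[key]
            return str(value)
        return ITEM_REF.sub(lookup, template)
    return template


def _attempt(run_job: Callable[[Any], Any], payload: Any) -> tuple[bool, Any]:
    try:
        return True, run_job(payload)
    except Exception as exc:  # record the failure instead of aborting the whole map
        return False, exc


def run_map(items: list[Any], payload: Any, run_job: Callable[[Any], Any],
            max_concurrency: int = 4, tolerance: float = 0.0) -> list[Any]:
    """Fan `payload` out over `items`; fail if the failed fraction exceeds `tolerance`."""
    payloads = [render(payload, item) for item in items]
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        outcomes = list(pool.map(lambda p: _attempt(run_job, p), payloads))
    failed = sum(1 for ok, _ in outcomes if not ok)
    if items and failed / len(items) > tolerance:
        raise RuntimeError(f"{failed}/{len(items)} sub-jobs failed (tolerance {tolerance})")
    # Results keep input order; failed slots hold the exception that was raised.
    return [value for _, value in outcomes]
```

With the default tolerance of 0, a single failed sub-job fails the whole map; a tolerance of 0.5 would let one of three prompts fail.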

Run the agent by passing an array in input_params:

curl -X POST https://api.casola.ai/api/agents/agent_batch123/run \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_params": {
      "prompts": [
        "a neon cityscape at night",
        "a watercolor forest in autumn",
        "a pencil sketch of a mountain lake"
      ]
    }
  }'

The $each interpolation syntax can also be used inside any node payload to expand an array inline:

{
  "type": "task",
  "model_id": "Qwen/Qwen3.5-4B",
  "task": "openai/chat-completion",
  "payload": {
    "messages": [
      {"role": "system", "content": "Summarize the following items."},
      {
        "$each": "${nodes.generate.result.items}",
        "template": {
          "role": "user",
          "content": "Item: ${item.url}"
        }
      }
    ]
  }
}
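A local sketch of that expansion, assuming each {"$each": ..., "template": ...} entry inside a list is replaced by one filled copy of template per array element, spliced into the surrounding list, and that ${item.field} reads a key of the element. resolve stands in for the server's evaluation of the ${nodes...} expression; the whole block is a hypothetical illustration, not the platform's implementation.

```python
import re
from typing import Any, Callable

ITEM_REF = re.compile(r"\$\{item(?:\.(\w+))?\}")


def fill(template: Any, item: Any) -> Any:
    """Substitute ${item} / ${item.field} in the strings of one template."""
    if isinstance(template, dict):
        return {k: fill(v, item) for k, v in template.items()}
    if isinstance(template, list):
        return [fill(v, item) for v in template]
    if isinstance(template, str):
        return ITEM_REF.sub(lambda m: str(item[m.group(1)] if m.group(1) else item), template)
    return template


def expand_each(value: Any, resolve: Callable[[str], list[Any]]) -> Any:
    """Splice every $each entry found in a list into one filled template per element."""
    if isinstance(value, list):
        out: list[Any] = []
        for entry in value:
            if isinstance(entry, dict) and "$each" in entry:
                out.extend(fill(entry["template"], item) for item in resolve(entry["$each"]))
            else:
                out.append(expand_each(entry, resolve))
        return out
    if isinstance(value, dict):
        return {k: expand_each(v, resolve) for k, v in value.items()}
    return value
```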

Example: Create an API Token

curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"]
  }'

Response:

{
  "token": {
    "id": "tok_abc123",
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"],
    "status": "active"
  },
  "secret": "csk_a3b4f8c2..."
}

The secret is only returned once — store it securely.

See the interactive API reference for the full endpoint list.

Rate Limiting

Requests are rate-limited per organization based on the plan:

Plan	API Requests	Job Submissions
Free	60/min	10/min
Pro	600/min	100/min
Enterprise	6,000/min	1,000/min

Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or the agent run endpoint (/api/agents/{id}/run). Everything else counts as an API request.
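A sketch of that classification, useful for budgeting client-side against the two limits. Treating only the exact path /api/jobs (not /api/jobs/{id}/cancel) as a submission, and ignoring query strings, are assumptions; rate_bucket is hypothetical.

```python
import re

AGENT_RUN = re.compile(r"^/api/agents/[^/]+/run$")


def rate_bucket(method: str, path: str) -> str:
    """Return "job" if the request counts as a job submission, else "api"."""
    path = path.split("?", 1)[0]  # ignore any query string
    if method.upper() not in {"POST", "PUT"}:
        return "api"
    if (path.startswith("/openai/v1/") or path.startswith("/fal/")
            or path == "/api/jobs" or AGENT_RUN.match(path)):
        return "job"
    return "api"
```

On the Free plan, for example, 10 job submissions per minute means spacing steady submissions at least 6 seconds apart.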

When rate-limited, the API returns 429 Too Many Requests:

{
  "error": {
    "code": "rate_limit",
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
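A common client response to 429 is to retry with capped exponential backoff and jitter. A standard-library sketch; honoring a Retry-After header is an assumption, since this page does not say whether the API sends one, and with_retries is hypothetical.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(send: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 60.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying only on HTTP 429 with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise  # not rate-limited, or out of attempts
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-suggested wait in seconds, if sent
            else:
                delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise AssertionError("unreachable")
```

urllib.request.urlopen raises HTTPError for non-2xx responses, so a call such as with_retries(lambda: urllib.request.urlopen(req).read()) retries only the rate-limited attempts.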

Error Format

All errors use a consistent envelope:

{
  "error": {
    "code": "not_found",
    "message": "Job not found",
    "type": "not_found_error"
  }
}

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
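A sketch of turning that envelope into a typed exception on the client. CasolaAPIError and error_from_body are hypothetical names; falling back to the raw body when it is not the documented envelope (for example an HTML error page from a proxy) is a defensive assumption.

```python
import json


class CasolaAPIError(Exception):
    """Client-side exception carrying the fields of the error envelope."""

    def __init__(self, status: int, code: str, message: str, type_: str):
        super().__init__(f"{status} {code}: {message}")
        self.status, self.code, self.message, self.type = status, code, message, type_


def error_from_body(status: int, body: bytes) -> CasolaAPIError:
    """Parse {"error": {"code", "message", "type"}}; fall back to the raw body otherwise."""
    try:
        err = json.loads(body)["error"]
        return CasolaAPIError(
            status,
            str(err.get("code") or "unknown"),
            str(err.get("message") or ""),
            str(err.get("type") or "unknown"),
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        return CasolaAPIError(status, "unknown", body.decode("utf-8", "replace"), "unknown")
```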