Task Types

Task types define the input/output contract for jobs and agent nodes. Each model supports one or more task types — see the Models reference for which models support which tasks.

| Task Type | Category | Sync | Description |
| --- | --- | --- | --- |
| openai/chat-completion | Text | Yes | Chat completion (LLM) |
| openai/chat-completion/vision | Text | Yes | Chat completion with image input |
| openai/chat-completion/ocr | OCR | Yes | OCR via vision model |
| openai/embeddings | Text | Yes | Text embeddings |
| openai/rerank | Text | Yes | Document reranking |
| openai/score | Text | Yes | Text pair similarity scoring |
| openai/audio-speech | Audio | Yes | Text-to-speech |
| openai/audio-transcription | Audio | Yes | Speech-to-text |
| openai/image-generation | Image | Yes | Image generation (OpenAI format) |
| fal/text-to-image | Image | Yes | Image generation (Fal format) |
| fal/image-edit | Image | Yes | Image editing / inpainting |
| fal/text-to-video | Video | No | Text-to-video generation |
| fal/image-to-video | Video | No | Image-to-video animation |
| fal/speech-to-video | Video | No | Speech-driven video (talking head) |
| fal/audio-transcription | Audio | Yes | Speech-to-text (Fal format) |
| fal/video-interpolate | Video | No | Frame interpolation (slow motion), coming soon |
| fal/video-upscale | Video | No | Video super-resolution, coming soon |
| fal/ocr | OCR | Yes | Document OCR (Fal format) |
| openai/chat-completion/moderation | Moderation | Yes | Content moderation via chat model |

Sync: "Yes" means the task can be dispatched synchronously, with the result returned inline. Asynchronous dispatch returns a request_id for polling; tasks marked "No" (all video tasks) support only this mode.
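For async tasks, the client has to poll with the returned request_id until the job finishes. The loop below is a minimal sketch of that pattern, not the platform's client: `fetch_status` is a hypothetical callable standing in for whatever status lookup you use, and the `status` field with its `completed` and `failed` values is an assumption rather than a documented contract.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    request_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state or the timeout elapses.

    fetch_status is hypothetical; the "status" field and its
    "completed"/"failed" values are assumed, not taken from the docs.
    """
    waited = 0.0
    while True:
        job = fetch_status(request_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {request_id} failed: {job.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {request_id} not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays.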

openai/chat-completion

Standard chat completion following the OpenAI API format.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | Array of {role, content} message objects |
| model | string | No | Model ID (set automatically when using a specific endpoint) |
| temperature | number | No | Sampling temperature (0-2) |
| top_p | number | No | Nucleus sampling |
| max_tokens | number | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| stop | string/array | No | Stop sequences |
| tools | array | No | Tool/function definitions |

Output: choices[0].message.content (text)

Models: qwen3-0.6b-fp8, qwen3.5-4b, qwen3.5-9b, gpt-oss-20b, deepseek-ocr-v1, deepseek-ocr-v2
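As an illustration of the openai/chat-completion contract, the sketch below builds a request payload and reads the text from a response. The helper names are hypothetical, the response shape is assumed from the documented output path `choices[0].message.content`, and no HTTP call is made because the endpoint is not described here.

```python
from typing import Any, Dict, List, Optional


def build_chat_payload(
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an openai/chat-completion payload, omitting unset optional fields."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload: Dict[str, Any] = {"messages": messages}
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def extract_chat_text(response: Dict[str, Any]) -> str:
    """Read choices[0].message.content from a chat completion response."""
    return response["choices"][0]["message"]["content"]
```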

openai/chat-completion/vision

Chat completion with image input. Same parameters as openai/chat-completion, but messages can include image content:

{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://..."}}
    ]
  }]
}

Models: qwen3.5-4b, qwen3.5-9b
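The mixed text-and-image message shown above can be assembled with a small helper. This is a sketch only; `vision_message` is a hypothetical name, and it covers the single-image case from the example.

```python
from typing import Any, Dict


def vision_message(text: str, image_url: str) -> Dict[str, Any]:
    """Build one user message with a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Example payload for openai/chat-completion/vision (the URL is a placeholder).
payload = {
    "messages": [vision_message("What's in this image?", "https://example.com/cat.png")]
}
```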

openai/chat-completion/ocr

OCR via a vision-capable model. Uses the same message format as openai/chat-completion/vision, optimized for text extraction from images.

Models: deepseek-ocr-v1, deepseek-ocr-v2

openai/embeddings

Generate vector embeddings for text.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| input | string/array | Yes | Text or array of texts to embed |
| encoding_format | string | No | float or base64 |
| dimensions | number | No | Output dimensions |

Output: data[0].embedding (float array)

Models: sglang-qwen3-0.6b-fp8-embed, sglang-gpt-oss-20b-embed
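A common use of the returned vectors is comparing two texts by cosine similarity. The sketch below assumes the response holds one `data[i].embedding` entry per input, as in the OpenAI embeddings format, and that `encoding_format` is `float`; both helper names are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Collect data[i].embedding for every item in the response."""
    return [item["embedding"] for item in response["data"]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```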

openai/rerank

Rerank documents by relevance to a query.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID |
| query | string | Yes | Search query |
| documents | array | Yes | Array of strings or {text} objects |
| top_n | number | No | Number of top results to return |
| return_documents | boolean | No | Include document text in results |

Output: Ranked documents with relevance scores

Models: qwen3-0.6b-fp8-score, gpt-oss-20b-score
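Because `documents` accepts either plain strings or `{text}` objects, a client-side check can catch malformed entries before dispatch. The following is a sketch under that reading of the field table; `build_rerank_payload` is a hypothetical helper, and the `top_n >= 1` check is an assumption.

```python
from typing import Any, Dict, List, Optional, Union

Document = Union[str, Dict[str, str]]


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[Document],
    top_n: Optional[int] = None,
    return_documents: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build an openai/rerank payload after checking each document's shape."""
    if not documents:
        raise ValueError("documents must not be empty")
    for doc in documents:
        is_text_object = isinstance(doc, dict) and isinstance(doc.get("text"), str)
        if not (isinstance(doc, str) or is_text_object):
            raise TypeError("each document must be a string or a {text} object")
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be at least 1")  # assumed lower bound
    payload: Dict[str, Any] = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    if return_documents is not None:
        payload["return_documents"] = return_documents
    return payload
```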

openai/score

Compute similarity scores between text pairs.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID |
| text_1 | string/array | Yes | First text(s) |
| text_2 | string/array | Yes | Second text(s) |

Output: Similarity scores

Models: qwen3-0.6b-fp8-score, gpt-oss-20b-score

openai/audio-speech

Convert text to spoken audio.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| input | string | Yes | Text to speak |
| voice | string | No | Voice ID (model-specific) |
| response_format | string | No | mp3, opus, aac, flac, wav, pcm |
| speed | number | No | Playback speed multiplier |

Output: audio_url (audio file URL)

Models: qwen3-tts (9 voices), fox-tts (150+ voices)
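The `response_format` values listed above lend themselves to a simple client-side check. This sketch uses hypothetical helper names and assumes the response exposes `audio_url` at the top level, as the output line suggests.

```python
from typing import Any, Dict, Optional

SPEECH_FORMATS = frozenset({"mp3", "opus", "aac", "flac", "wav", "pcm"})


def build_speech_payload(
    text: str,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an openai/audio-speech payload, rejecting unknown formats."""
    if not text:
        raise ValueError("input text must not be empty")
    if response_format is not None and response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    payload: Dict[str, Any] = {"input": text}
    if voice is not None:
        payload["voice"] = voice  # voice IDs are model-specific
    if response_format is not None:
        payload["response_format"] = response_format
    return payload


def extract_audio_url(response: Dict[str, Any]) -> str:
    """Read audio_url from a text-to-speech response (assumed top-level field)."""
    return response["audio_url"]
```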

openai/audio-transcription

Transcribe audio to text.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| audio_url | string | Yes | URL to the audio file |
| language | string | No | Language code (ISO 639-1) |
| task | string | No | transcribe or translate |

Also supports multipart file upload with a file field.

Output: text (transcribed text)

Models: whisper-large-v3-turbo

fal/audio-transcription

Same as openai/audio-transcription but using the Fal request format.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| audio_url | string | Yes | URL to the audio file |
| language | string | No | Language code |
| task | string | No | transcribe or translate |

Output: text

Models: whisper-large-v3-turbo

openai/image-generation

Generate images using the OpenAI-compatible format.

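| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Image description |
| n | number | No | Number of images |
| size | string | No | Image dimensions |
| quality | string | No | standard or hd |
| response_format | string | No | url or b64_json |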

Output: data[0].url (image URL)

Models: qwen-image-2512
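Depending on `response_format`, the first result is either a URL or base64-encoded image data. The sketch below handles both cases; it assumes the base64 variant arrives under `data[0].b64_json`, mirroring the OpenAI images format, which this page does not state explicitly.

```python
import base64
from typing import Any, Dict, Union


def extract_image(response: Dict[str, Any]) -> Union[str, bytes]:
    """Return the first image as a URL string or as decoded bytes.

    The b64_json field name is an assumption based on the OpenAI format.
    """
    first = response["data"][0]
    if first.get("url"):
        return first["url"]
    return base64.b64decode(first["b64_json"])
```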

fal/text-to-image

Generate images using the Fal format, which exposes more parameters than the OpenAI format.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Image description |
| negative_prompt | string | No | What to avoid |
| image_size | string | No | square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9 |
| num_inference_steps | number | No | Denoising steps |
| guidance_scale | number | No | Prompt adherence strength |
| num_images | number | No | Number of images (max 10) |
| seed | number | No | Reproducibility seed |
| loras | array | No | LoRA configs [{path, scale}] (scale 0-4) |
| enable_safety_checker | boolean | No | Content safety filter |
| output_format | string | No | Output image format |

Output: images[0].url (image URL)

Models: nunchaku-flux1-schnell, sglang-diffusion-flux2-klein-4b, qwen-image-2512, sglang-diffusion-qwen-image-2512-fp8
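The documented limits for fal/text-to-image (the `image_size` presets, at most 10 images, LoRA scale between 0 and 4) can be enforced before dispatch. This is a minimal sketch with a hypothetical helper name; it assumes each LoRA entry needs a `path` and that `scale` is optional, and the `num_images >= 1` lower bound is also an assumption.

```python
from typing import Any, Dict, List, Optional

IMAGE_SIZES = frozenset({
    "square_hd", "square", "portrait_4_3",
    "portrait_16_9", "landscape_4_3", "landscape_16_9",
})
MAX_IMAGES = 10


def build_text_to_image_payload(
    prompt: str,
    image_size: Optional[str] = None,
    num_images: Optional[int] = None,
    seed: Optional[int] = None,
    loras: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a fal/text-to-image payload, checking the documented limits."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if image_size is not None and image_size not in IMAGE_SIZES:
        raise ValueError(f"unknown image_size: {image_size!r}")
    if num_images is not None and not 1 <= num_images <= MAX_IMAGES:
        raise ValueError(f"num_images must be between 1 and {MAX_IMAGES}")
    for lora in loras or []:
        if "path" not in lora:
            raise ValueError("each LoRA config needs a path")  # assumed required
        scale = lora.get("scale")
        if scale is not None and not 0 <= scale <= 4:
            raise ValueError("LoRA scale must be between 0 and 4")
    payload: Dict[str, Any] = {"prompt": prompt}
    optional = {"image_size": image_size, "num_images": num_images,
                "seed": seed, "loras": loras}
    # Only send fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```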

fal/image-edit

Edit or inpaint images.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Edit instruction |
| image_url | string | Yes | Source image URL |
| mask_url | string | No | Mask for inpainting |
| strength | number | No | Edit strength (0-1) |
| num_inference_steps | number | No | Denoising steps |
| guidance_scale | number | No | Prompt adherence |
| num_images | number | No | Number of results (max 10) |
| seed | number | No | Reproducibility seed |
| loras | array | No | LoRA configs |

Output: images[0].url (image URL)

Models: nunchaku-flux1-schnell, sglang-diffusion-flux2-klein-4b, qwen-image-edit-2511, sglang-diffusion-qwen-image-edit-2511-fp8

All video tasks are async only — they return a request_id for polling.

fal/text-to-video

Generate video from a text prompt.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Video description |
| resolution | string | No | 480p, 720p |
| aspect_ratio | string | No | 16:9, 9:16, 1:1 |
| num_inference_steps | number | No | Denoising steps |
| guidance_scale | number | No | Prompt adherence |
| num_frames | number | No | Number of frames |
| fps | number | No | Frames per second |
| seed | number | No | Reproducibility seed |
| output_format | string | No | Video format |

Output: video.url (video URL)

Models: wan22-ti2v, sglang-diffusion-wan22-t2v-a14b-fp8, ltx2-distilled
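Clip length follows from `num_frames` and `fps`: duration in seconds is `num_frames / fps`. The sketch below computes that and builds a fal/text-to-video payload restricted to the listed `resolution` and `aspect_ratio` values. The helper names are hypothetical, and individual models may impose frame-count constraints that are not modeled here.

```python
from typing import Any, Dict, Optional

RESOLUTIONS = frozenset({"480p", "720p"})
ASPECT_RATIOS = frozenset({"16:9", "9:16", "1:1"})


def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip in seconds: frames divided by frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def build_text_to_video_payload(
    prompt: str,
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    num_frames: Optional[int] = None,
    fps: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a fal/text-to-video payload using only the documented choices."""
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unknown aspect_ratio: {aspect_ratio!r}")
    payload: Dict[str, Any] = {"prompt": prompt}
    optional = {"resolution": resolution, "aspect_ratio": aspect_ratio,
                "num_frames": num_frames, "fps": fps}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, 120 frames at 24 fps gives a 5-second clip.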

fal/image-to-video

Animate a still image into video.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Motion description |
| image_url | string | Yes | Source image URL |
| resolution | string | No | 480p, 720p |
| aspect_ratio | string | No | 16:9, 9:16, 1:1 |
| num_inference_steps | number | No | Denoising steps |
| guidance_scale | number | No | Prompt adherence |
| num_frames | number | No | Number of frames |
| fps | number | No | Frames per second |
| seed | number | No | Reproducibility seed |

Output: video.url (video URL)

Models: wan22-ti2v, ltx2-distilled

fal/speech-to-video

Generate talking-head video driven by audio.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| audio_url | string | Yes | Audio file URL |
| image_url | string | Yes | Face/character image URL |
| prompt | string | No | Additional scene description |
| num_inference_steps | number | No | Denoising steps |
| guidance_scale | number | No | Prompt adherence |
| seed | number | No | Reproducibility seed |

Output: video.url (video URL)

Models: wan22-s2v

fal/video-interpolate

Increase video frame rate (slow motion effect). No model is currently enabled for this task.

fal/video-upscale

Upscale video resolution. This task is currently disabled while the model is being stabilized.

Task types are used as the task field in agent DAG nodes. Each node specifies a task type and a model, and can wire outputs from upstream nodes into its inputs.

{
  "nodes": [
    {
      "id": "generate",
      "task": "fal/text-to-image",
      "model": "nunchaku-flux1-schnell",
      "payload": {
        "prompt": "{{input.prompt}}"
      }
    }
  ]
}
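Before submitting a DAG like the one above, a client can check that every node names a task type from the table on this page. The sketch below is a hypothetical client-side check, not the platform's own validation; the unique-ID rule is an assumption, and input wiring between nodes is not inspected.

```python
from typing import Any, Dict, List

KNOWN_TASK_TYPES = frozenset({
    "openai/chat-completion", "openai/chat-completion/vision",
    "openai/chat-completion/ocr", "openai/chat-completion/moderation",
    "openai/embeddings", "openai/rerank", "openai/score",
    "openai/audio-speech", "openai/audio-transcription",
    "openai/image-generation", "fal/text-to-image", "fal/image-edit",
    "fal/text-to-video", "fal/image-to-video", "fal/speech-to-video",
    "fal/audio-transcription", "fal/video-interpolate",
    "fal/video-upscale", "fal/ocr",
})


def validate_nodes(dag: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    seen_ids = set()
    for index, node in enumerate(dag.get("nodes", [])):
        node_id = node.get("id")
        if not node_id:
            problems.append(f"node {index}: missing id")
        elif node_id in seen_ids:
            problems.append(f"node {index}: duplicate id {node_id!r}")  # assumed rule
        else:
            seen_ids.add(node_id)
        task = node.get("task")
        if task not in KNOWN_TASK_TYPES:
            problems.append(f"node {index}: unknown task type {task!r}")
        if not node.get("model"):
            problems.append(f"node {index}: missing model")
    return problems
```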

See the Agents guide for details on building multi-step pipelines.