PII Detection

The PII detection endpoint identifies personally identifiable information (PII) in text and returns structured spans together with a ready-to-use redacted string. It is built on OpenAI Privacy Filter, a 1.5B-parameter sparse mixture-of-experts (MoE) token classifier.

Detected categories: account_number, private_address, private_email, private_person, private_phone, private_url, private_date, secret

curl https://api.casola.ai/openai/v1/pii \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/privacy-filter",
    "input": "My name is Harry Potter and my email is harry.potter@hogwarts.edu"
  }'

Response:

{
  "object": "pii.detection",
  "model": "openai/privacy-filter",
  "results": [
    {
      "spans": [
        { "label": "private_person", "start": 11, "end": 23, "score": 0.9987, "text": "Harry Potter" },
        { "label": "private_email", "start": 40, "end": 65, "score": 0.9994, "text": "harry.potter@hogwarts.edu" }
      ],
      "redacted_text": "My name is [PRIVATE_PERSON] and my email is [PRIVATE_EMAIL]"
    }
  ]
}
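
The same request can be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_pii_request` and `detect_pii` are hypothetical; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

PII_URL = "https://api.casola.ai/openai/v1/pii"  # endpoint from the curl example


def build_pii_request(text, api_key, model="openai/privacy-filter"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        PII_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def detect_pii(text, api_key):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_pii_request(text, api_key)) as resp:
        return json.load(resp)


# Example call (needs a valid key, so it is left commented out):
# response = detect_pii("My name is Harry Potter", os.environ["CASOLA_API_KEY"])
# print(response["results"][0]["redacted_text"])
```

Keeping request construction separate from sending makes it possible to inspect the payload without touching the network.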

Pass an array of strings to process multiple texts in a single request:

{
  "model": "openai/privacy-filter",
  "input": [
    "Call me at 555-867-5309.",
    "Ship to 1 Infinite Loop, Cupertino, CA 95014."
  ]
}

The results array matches the order of the input array.
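
Because the order is preserved, each input can be paired with its result by position. The sketch below assumes a hypothetical helper `pair_results`; the `mock_response` is hand-written in the documented response shape (scores omitted) and is not real API output.

```python
def pair_results(inputs, response):
    """Pair each input string with its result, relying on matching order."""
    results = response["results"]
    if len(results) != len(inputs):
        raise ValueError(f"expected {len(inputs)} results, got {len(results)}")
    return list(zip(inputs, results))


inputs = [
    "Call me at 555-867-5309.",
    "Ship to 1 Infinite Loop, Cupertino, CA 95014.",
]

# Hand-written mock in the documented shape; NOT actual model output.
mock_response = {
    "results": [
        {
            "spans": [{"label": "private_phone", "start": 11, "end": 23,
                       "text": "555-867-5309"}],
            "redacted_text": "Call me at [PRIVATE_PHONE].",
        },
        {
            "spans": [{"label": "private_address", "start": 8, "end": 44,
                       "text": "1 Infinite Loop, Cupertino, CA 95014"}],
            "redacted_text": "Ship to [PRIVATE_ADDRESS].",
        },
    ]
}

for original, result in pair_results(inputs, mock_response):
    print(original, "->", result["redacted_text"])
```

The length check is a defensive choice: it fails loudly if a response ever comes back with a different number of results than inputs.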

| Field | Type | Description |
| --- | --- | --- |
| results[].spans | array | Detected PII spans (see below) |
| results[].redacted_text | string | Input text with each span replaced by [LABEL] |

Each span has:

| Field | Type | Description |
| --- | --- | --- |
| label | string | PII category (e.g. private_email) |
| start | integer | Character start offset (inclusive) |
| end | integer | Character end offset (exclusive) |
| score | float | Average token-level confidence for this span |
| text | string | The matched substring |
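
Since `start` is inclusive and `end` is exclusive, a span maps directly onto a Python slice. The sketch below uses that to rebuild a redacted string client-side, optionally skipping low-confidence spans. The `redact` helper and its `min_score` parameter are hypothetical. It assumes spans do not overlap and that offsets count characters the same way Python string indices do.

```python
def redact(text, spans, min_score=0.0):
    """Replace each span scoring at least min_score with a [LABEL] placeholder."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["score"] < min_score:
            continue  # leave low-confidence spans untouched
        # Half-open offsets: text[start:end] should equal the reported substring.
        if text[span["start"]:span["end"]] != span["text"]:
            raise ValueError(f"offsets do not match text for span {span}")
        out.append(text[cursor:span["start"]])
        out.append(f"[{span['label'].upper()}]")
        cursor = span["end"]
    out.append(text[cursor:])
    return "".join(out)


# Input and spans copied from the sample response above.
sample_text = "My name is Harry Potter and my email is harry.potter@hogwarts.edu"
sample_spans = [
    {"label": "private_person", "start": 11, "end": 23, "score": 0.9987,
     "text": "Harry Potter"},
    {"label": "private_email", "start": 40, "end": 65, "score": 0.9994,
     "text": "harry.potter@hogwarts.edu"},
]

print(redact(sample_text, sample_spans))
# My name is [PRIVATE_PERSON] and my email is [PRIVATE_EMAIL]
print(redact(sample_text, sample_spans, min_score=0.999))
# My name is Harry Potter and my email is [PRIVATE_EMAIL]
```

With the default threshold this reproduces the `redacted_text` from the sample response. A stricter threshold trades recall for fewer false redactions.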

For large inputs or batch workloads, add "async": true to get a job ID you can poll:

curl https://api.casola.ai/openai/v1/pii \
-H "Authorization: Bearer $CASOLA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"openai/privacy-filter","input":"...","async":true}'

Poll the returned job ID at GET /api/jobs/{id}.
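
The shape of the job payload is not described here, so the polling loop below is deliberately generic: the caller supplies a function that fetches the job and a predicate that decides when it is finished. The name `poll_until`, the interval and timeout defaults, and the `status` field used in the demo are all assumptions for illustration.

```python
import time


def poll_until(fetch, is_done, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(job) is true or the timeout passes."""
    deadline = clock() + timeout
    while True:
        job = fetch()
        if is_done(job):
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Demo with a fake fetcher; a real one would issue GET /api/jobs/{id}.
# The "status" field and its values are hypothetical.
fake_states = iter([{"status": "queued"}, {"status": "running"}, {"status": "done"}])
job = poll_until(
    fetch=lambda: next(fake_states),
    is_done=lambda j: j["status"] == "done",
    interval=0,
)
print(job)
```

Injecting `sleep` and `clock` keeps the loop easy to test and makes it simple to swap in a backoff strategy later.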

  • Context limit: the underlying banded-attention model supports up to 512 tokens per call; longer inputs are silently truncated. Split very long documents into paragraphs before processing.
  • Language support: optimized for English. Cross-lingual detection is partially supported for proper names and email/URL/phone patterns.
  • Not a generative model: Privacy Filter runs as a pure token classifier. It does not generate text and has no system prompt.
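
One way to follow the paragraph-splitting advice while keeping span offsets meaningful for the whole document is to record where each paragraph starts and shift the returned spans by that amount. The helpers `split_paragraphs` and `shift_spans` below are hypothetical, and the sketch does not count tokens, so a very long paragraph may still exceed the limit.

```python
import re


def split_paragraphs(text):
    """Return (offset, paragraph) pairs for blank-line-separated paragraphs."""
    chunks = []
    pos = 0
    # Separators are runs containing at least one blank line; None marks the end.
    for sep in list(re.finditer(r"\n\s*\n", text)) + [None]:
        end = sep.start() if sep else len(text)
        raw = text[pos:end]
        stripped = raw.strip()
        if stripped:
            # Account for leading whitespace removed by strip().
            offset = pos + (len(raw) - len(raw.lstrip()))
            chunks.append((offset, stripped))
        pos = sep.end() if sep else end
    return chunks


def shift_spans(spans, offset):
    """Convert paragraph-relative span offsets to document-relative offsets."""
    return [
        {**span, "start": span["start"] + offset, "end": span["end"] + offset}
        for span in spans
    ]


document = "Call me at 555-867-5309.\n\nShip to 1 Infinite Loop.\n"
for offset, paragraph in split_paragraphs(document):
    print(offset, repr(paragraph))
```

The paragraphs can then be sent together as an array `input`, and each result's spans shifted by the matching offset before merging.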