POST /v1/pii/detect

PII Detect API - Find and Mask Emails, SSNs, Cards in Text

Detects six PII categories in free-form text: emails, phone numbers (with length validation), US SSNs (filtering known-invalid prefixes), credit cards (13-19 digits, Luhn-validated), IPv4 addresses, and dates of birth near DOB keywords. Returns each finding with type, raw value, start/end character offsets, and a masked version.

Parameters

text (string, required)

Text to scan for PII.

Code examples

curl -X POST https://api.botoi.com/v1/pii/detect \
  -H "Content-Type: application/json" \
  -d '{"text":"Reach me at alice@example.com or 555-123-4567. Card on file: 4111 1111 1111 1111."}'

When to use this API

Redact PII before sending to LLMs

Before a customer support transcript reaches an LLM for summarization, run it through this endpoint and replace each finding with its masked version. This keeps emails, phone numbers, and card data out of third-party training data and logs.

Audit stored content for leaked secrets

Periodically scan notes, support tickets, or chat logs stored in your DB. Alert when SSN or credit-card matches appear in places they shouldn't live.
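
As a rough sketch of the alerting step, the filter below picks out stored records whose findings include an SSN or credit-card match. The `type` field name, the `ScannedRecord` wrapper, and `recordsToAlert` are assumptions made for illustration; only the type strings `ssn` and `credit_card` come from this page.

```typescript
// Assumed shapes: the page says each finding carries a type, but the exact
// field name ("type") and this record wrapper are hypothetical.
interface TypedFinding {
  type: string;
}

interface ScannedRecord {
  recordId: string;         // hypothetical identifier from your own database
  findings: TypedFinding[]; // findings returned for that record's text
}

// Types that should trigger an alert when found in stored content.
const ALERT_TYPES = ["ssn", "credit_card"];

// Return the IDs of records that contain at least one alert-worthy finding.
function recordsToAlert(records: ScannedRecord[]): string[] {
  return records
    .filter((r) => r.findings.some((f) => ALERT_TYPES.indexOf(f.type) !== -1))
    .map((r) => r.recordId);
}
```

Keeping the alert list separate from the scan makes it easy to tighten or relax the policy without re-scanning stored text.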

Compliance checks for user-uploaded content

Run the endpoint on every form field or document upload. Block or flag submissions containing unmasked PII to stay inside your GDPR and CCPA data-minimization commitments.

Frequently asked questions

How are false positives controlled?
Credit cards must be 13-19 digits long and pass the Luhn check. SSNs with known-invalid prefixes (000, 666, 9xx) or an all-zero group or serial number are rejected. Phone numbers must contain 10 or 11 digits after punctuation is stripped. These filters reduce noise, but some false positives can still get through.
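
To make these filters concrete, here is a minimal sketch of what checks like them could look like. It is not the service's actual implementation: the function names are hypothetical, and reading "group/serial zeros" as group `00` and serial `0000` is an assumption.

```typescript
// Luhn check over 13-19 digits, ignoring spaces and hyphens.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) {
    return false;
  }
  let sum = 0;
  let doubleNext = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// SSN plausibility: reject prefixes 000, 666, 9xx and all-zero group/serial.
function plausibleSsn(ssn: string): boolean {
  const m = /^(\d{3})-?(\d{2})-?(\d{4})$/.exec(ssn);
  if (m === null) {
    return false;
  }
  const area = m[1];
  const group = m[2];
  const serial = m[3];
  if (area === "000" || area === "666" || area.charAt(0) === "9") {
    return false;
  }
  return group !== "00" && serial !== "0000";
}

// Phone plausibility: 10 or 11 digits once every non-digit is stripped.
function plausiblePhone(phone: string): boolean {
  const digitCount = phone.replace(/\D/g, "").length;
  return digitCount === 10 || digitCount === 11;
}
```

For example, `passesLuhn("4111 1111 1111 1111")` accepts the test card number from the curl example, while changing its last digit makes the check fail.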
Which PII types are detected?
email, phone, ssn (US format), credit_card (Luhn-validated), ip_address (IPv4), and date_of_birth (dates within 3 words of a DOB keyword). International IDs, IBANs, and passport numbers are not detected; use /v1/validate/iban separately for IBAN validation.
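
The "within 3 words of a DOB keyword" rule can be pictured with a small proximity check. Everything in this sketch is an assumption for illustration: the keyword list, the date pattern, and the choice to look on both sides of the date are hypothetical and may differ from what the service does.

```typescript
// Hypothetical keyword list and date pattern; the real ones are not documented here.
const DOB_KEYWORDS = ["dob", "born", "birthday", "birthdate"];
const DATE_PATTERN = /^\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4}$/;

// Return date-like tokens that have a DOB keyword within maxDistance words.
function findDobCandidates(text: string, maxDistance: number = 3): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const hits: string[] = [];
  words.forEach((word, i) => {
    // Drop trailing punctuation such as the comma in "04/12/1988,".
    const token = word.replace(/[.,;:!?]+$/, "");
    if (!DATE_PATTERN.test(token)) {
      return;
    }
    const from = Math.max(0, i - maxDistance);
    const to = Math.min(words.length - 1, i + maxDistance);
    for (let j = from; j <= to; j++) {
      if (j === i) continue;
      // Normalise "DOB:" to "dob" before comparing.
      const neighbour = words[j].toLowerCase().replace(/[^a-z]/g, "");
      if (DOB_KEYWORDS.indexOf(neighbour) !== -1) {
        hits.push(token);
        return;
      }
    }
  });
  return hits;
}
```

Requiring a nearby keyword is what keeps ordinary dates, such as invoice dates, from being reported as dates of birth.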
Are overlapping matches reported?
No. Once a character range is claimed by one finding, overlapping matches from other patterns are skipped. This ensures each character is reported at most once.
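
The claiming behaviour can be sketched as a first-come, first-kept pass over candidate matches. This is an illustration only: the `Span` shape and `dropOverlaps` are hypothetical, and the order in which the service tries its patterns is not documented here, so the sketch assumes candidates arrive in priority order.

```typescript
// Hypothetical candidate match: half-open range [start, end).
interface Span {
  type: string;
  start: number;
  end: number;
}

// Keep a candidate only if it does not overlap any span already kept.
function dropOverlaps(candidates: Span[]): Span[] {
  const kept: Span[] = [];
  for (const c of candidates) {
    // Two half-open ranges overlap when each starts before the other ends.
    const clashes = kept.some((k) => c.start < k.end && k.start < c.end);
    if (!clashes) {
      kept.push(c);
    }
  }
  // Return findings in document order.
  return kept.sort((a, b) => a.start - b.start);
}
```

With this approach a digit run already claimed as a credit card would not be reported a second time as a phone number.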
Is the text stored after scanning?
No. The text is processed in memory and discarded after the response is sent. Nothing is written to disk or any persistent store.
How do I redact instead of just detect?
Walk the findings array in reverse order (so offsets stay valid), and for each finding replace text.slice(start, end) with finding.masked. The result is the redacted string ready for downstream use.
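
The steps above can be sketched as a small helper. The `Finding` interface lists only the fields named in this answer, and `redact` is a hypothetical name. The sketch also assumes the offsets are JavaScript string indices with an exclusive `end`, which matches the `text.slice(start, end)` usage but is not otherwise confirmed here.

```typescript
// Only the fields used for redaction; the real response may include more.
interface Finding {
  start: number;  // offset of the first character of the match
  end: number;    // offset just past the last character (exclusive)
  masked: string; // masked replacement for the matched text
}

// Replace each finding with its masked version, working from the end of
// the text backwards so earlier offsets stay valid.
function redact(text: string, findings: Finding[]): string {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = findings.slice().sort((a, b) => b.start - a.start);
  let result = text;
  for (const f of ordered) {
    result = result.slice(0, f.start) + f.masked + result.slice(f.end);
  }
  return result;
}
```

Sorting by descending `start` inside the helper means callers do not need to reverse the array themselves, and it still works if the findings arrive unordered.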

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.