
Redact PII from AI agent logs before it hits your database


Your AI agent logs the prompt. The tool call input. The tool result. The final response. That used to be a gold mine of debugging data. Today it is a GDPR disclosure waiting for a support ticket to carry an SSN past your validation.

You cannot stop users from pasting sensitive data. You can stop the raw text from reaching your log store, your observability vendor, and your weekly eval export. A small middleware does it in one hop.

The leak most agent setups already have

// before: every raw prompt, tool call, and tool result lands in the log row
logger.info({
  event: 'agent.turn',
  prompt: userInput,
  tool_calls: toolCalls,
  tool_results: toolResults,
});

// one support ticket later: "My SSN is 123-45-6789 and card 4111 1111 1111 1111"
// sits in the logs, the observability vendor, and the weekly eval export.

Every row now carries a card number. The log store keeps it for 30 days. The observability SDK ships it to a third party. The eval export picks up the same string two days later. Several copies of one card, all of them outside the reach of your encryption-at-rest policy.

Detect PII with one call

The Botoi /v1/pii/detect endpoint scans text for emails, phone numbers, SSNs, credit cards (Luhn-validated), IP addresses, and dates of birth. It returns each finding with a start offset, an end offset, and a masked value you can drop in place.

Request

curl -X POST https://api.botoi.com/v1/pii/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "Reach me at alice@example.com or 555-123-4567. Card: 4111 1111 1111 1111."}'

Response

{
  "found": true,
  "count": 3,
  "findings": [
    { "type": "email",       "value": "alice@example.com",      "start": 12, "end": 29, "masked": "al***@example.com" },
    { "type": "phone",       "value": "555-123-4567",           "start": 33, "end": 45, "masked": "***-***-4567" },
    { "type": "credit_card", "value": "4111 1111 1111 1111",    "start": 53, "end": 72, "masked": "************1111" }
  ]
}

Three matches, three masked replacements, positions you can splice cleanly. No regex library to maintain, no SSN prefix table to keep current, no Luhn pass to write yourself.
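For intuition about what that Luhn pass buys you, here is the textbook algorithm as a minimal sketch. The API's actual implementation is not published; this is only what any Luhn validator must do, and `luhnValid` is a name invented for this example.

```typescript
// Textbook Luhn check: walk the digits from the right, double every second
// one, subtract 9 from any double above 9, and require the sum % 10 === 0.
function luhnValid(card: string): boolean {
  const digits = card.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// luhnValid('4111 1111 1111 1111') → true; flip any digit and it fails.
```

This is why a bare `\d{4} \d{4} \d{4} \d{4}` regex over-reports: most random 16-digit strings fail the checksum and are not card numbers at all.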

A log middleware that redacts before write

The right place for this is the last hop before the row leaves your process. Every upstream component still sees the raw text it needs; the persisted copy is sanitized.

// log-redact.ts
import type { LogRecord } from './types';

const PII_FIELDS = ['prompt', 'tool_calls', 'tool_results', 'output'] as const;

export async function redactPii(record: LogRecord): Promise<LogRecord> {
  const clone = structuredClone(record);
  for (const field of PII_FIELDS) {
    const value = clone[field];
    if (!value) continue;
    clone[field] = await scrub(JSON.stringify(value));
  }
  return clone;
}

async function scrub(text: string): Promise<string> {
  const res = await fetch('https://api.botoi.com/v1/pii/detect', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.BOTOI_API_KEY}`,
    },
    body: JSON.stringify({ text }),
  });
  const data = await res.json();
  if (!data.found) return text;

  // Replace from the end of the string so offsets stay valid.
  const sorted = [...data.findings].sort((a, b) => b.start - a.start);
  let scrubbed = text;
  for (const f of sorted) {
    scrubbed = scrubbed.slice(0, f.start) + f.masked + scrubbed.slice(f.end);
  }
  return scrubbed;
}

Three details matter in that snippet. First, it walks known PII-heavy fields rather than the whole record; you do not need to scrub the request ID. Second, it serializes each field to a single string before sending to the API, so one call covers an entire tool result. Third, it splices replacements from the end of the string so offsets do not shift under it.
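To make the third detail concrete, here is the sample response from earlier applied by hand, splicing from the end so the earlier offsets stay valid against the original string:

```typescript
// Offsets and masked values copied from the sample /v1/pii/detect response.
const text =
  'Reach me at alice@example.com or 555-123-4567. Card: 4111 1111 1111 1111.';

const findings = [
  { start: 12, end: 29, masked: 'al***@example.com' },
  { start: 33, end: 45, masked: '***-***-4567' },
  { start: 53, end: 72, masked: '************1111' },
];

// Sort descending by start, then splice; each replacement only touches text
// to the right of the findings that remain, so their offsets never shift.
let masked = text;
for (const f of [...findings].sort((a, b) => b.start - a.start)) {
  masked = masked.slice(0, f.start) + f.masked + masked.slice(f.end);
}
// masked === 'Reach me at al***@example.com or ***-***-4567. Card: ************1111.'
```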

Wire it into the logger

// logger.ts
import { redactPii } from './log-redact';
import type { LogRecord } from './types';
import { logStore } from './log-store'; // or wherever your sink lives

export async function logTurn(raw: LogRecord) {
  const safe = await redactPii(raw);
  await logStore.write(safe);
}

// anywhere in your agent loop:
await logTurn({
  event: 'agent.turn',
  prompt: userInput,
  tool_calls: toolCalls,
  tool_results: toolResults,
});

Call logTurn in place of your existing logger.info at the turn boundary. Everything upstream stays the same.

Fail closed, not fail silent

The detect endpoint usually answers in under 20ms. When it times out, you still have a choice: log the row raw (leak risk) or drop the sensitive fields and log a marker. Dropping is the safer default for compliance-sensitive workloads.

async function redactPiiSafe(record: LogRecord): Promise<LogRecord> {
  try {
    return await Promise.race([
      redactPii(record),
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error('pii-detect timeout')), 250)
      ),
    ]);
  } catch (err) {
    // Fail closed: drop the sensitive fields rather than logging them raw.
    return { ...record, prompt: '[REDACT_FAILED]', tool_calls: [], tool_results: [] };
  }
}

Set the timeout to something small. 250ms is enough to absorb a regional slowdown without blocking a healthy request path.
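One caveat: Promise.race abandons the slow call but does not cancel the underlying HTTP request, which keeps running until the server answers. If you want the request itself torn down at the deadline, a sketch using AbortSignal.timeout (Node 17.3+) looks like this; scrubWithTimeout is a hypothetical variant of the scrub function above, not a separate Botoi feature.

```typescript
// Hypothetical variant of scrub() that aborts the HTTP request itself when
// the deadline passes, instead of only abandoning the promise.
async function scrubWithTimeout(text: string, ms = 250): Promise<string> {
  const res = await fetch('https://api.botoi.com/v1/pii/detect', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.BOTOI_API_KEY}`,
    },
    body: JSON.stringify({ text }),
    // Rejects the fetch with a TimeoutError DOMException after `ms`.
    signal: AbortSignal.timeout(ms),
  });
  const data = await res.json();
  if (!data.found) return text;

  let out = text;
  for (const f of [...data.findings].sort((a, b) => b.start - a.start)) {
    out = out.slice(0, f.start) + f.masked + out.slice(f.end);
  }
  return out;
}
```

The try/catch in redactPiiSafe works unchanged with this variant; the timeout simply surfaces as a rejection instead of a dangling request.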

Python version

# log_redact.py
import os, json, httpx

PII_FIELDS = ('prompt', 'tool_calls', 'tool_results', 'output')
API = 'https://api.botoi.com/v1/pii/detect'

async def scrub(text: str) -> str:
    async with httpx.AsyncClient(timeout=0.25) as client:
        r = await client.post(
            API,
            headers={'Authorization': f"Bearer {os.environ['BOTOI_API_KEY']}"},
            json={'text': text},
        )
    data = r.json()
    if not data.get('found'):
        return text
    out = text
    for f in sorted(data['findings'], key=lambda x: x['start'], reverse=True):
        out = out[:f['start']] + f['masked'] + out[f['end']:]
    return out

Drop scrub into your agent framework's log hook. FastAPI middleware, LangChain callbacks, and OpenInference span exporters all accept async functions.

What this middleware does not do

  • It does not catch names, addresses, or account numbers that do not look like any supported type. Those need a named-entity model and a policy decision (mask? drop? redact the whole turn?).
  • It does not protect your model vendor from seeing the raw prompt. For that, run the same detect call on the client before you send to the model.
  • It does not replace a data-retention policy. Shorten the log TTL anyway.

Two places this belongs

Layer                  Protects                                    Call timing
Before model request   Model vendor, training data, eval leaks     Blocking, user-visible latency
Before log write       Log store, observability vendor, exports    Out-of-band, invisible to the user

Ship the log-write middleware first. It runs outside the hot path and blocks the most common leak pattern. Add the pre-model version once the log side is covered.

Get an API key and start

Anonymous access gives you 5 requests per minute, enough to try the endpoint against a sample log. For production middleware, grab a free key at botoi.com/api/signup. The free tier covers 1,000 scrub calls per day with no credit card.

See the full endpoint reference at the PII Detect API page or browse api.botoi.com/docs for the other 149 endpoints.

Frequently asked questions

Why do AI agent logs leak more PII than normal server logs?
Agents log the entire prompt, every tool call input, and every tool output. A support transcript that once lived behind a "do not log" flag now appears in five places: the orchestrator, the tool server, the observability vendor, the model provider, and the training eval set.
Where should the redaction step run?
Run it at the log-writer boundary, right before the row is sent to your log store. That way every upstream component (orchestrator, tool, observability SDK) sees the raw text it needs, and only the persisted copy is sanitized.
Does a regex redactor catch everything?
No. Roll-your-own regex misses credit card numbers with unusual spacing, SSNs that look like other 9-digit numbers, and names of people. An API like /v1/pii/detect runs Luhn on cards, filters SSN prefixes, and returns positions so you can drop only the match, not the whole line.
What latency does the Botoi PII Detect API add?
The endpoint runs at the edge and returns in under 20ms for a 500-token payload. You can call it synchronously in a log middleware without affecting user-visible response times; logging happens after the response is sent.
Can I redact on the client before sending to the model?
Yes, and it is a good second layer. Redacting in the server middleware protects your log store; redacting in the client protects your model vendor from seeing raw PII. Both together are the GDPR-friendly setup.

Try this API

PII Detect API — interactive playground and code examples
