Scan user input for PII before storing it with one API call
A customer submits a support ticket and pastes their credit card number in the description. A user fills out a feedback form and includes their Social Security number. An internal tool logs the full request body, and now your log aggregator stores email addresses and phone numbers you never asked for.
This is one of the most common ways companies end up holding PII they don't need. It's not a feature request gone wrong; it's free-text input doing what free-text input does. And under GDPR Article 5(1)(c), storing personal data you don't need violates the data minimization principle.
The fix: scan text for PII *before* it reaches your database. One API call catches emails, phone numbers, SSNs, credit card numbers, IP addresses, and dates of birth.
One API call to detect PII
Send any text to the /v1/pii/detect endpoint. The API scans it and returns every PII match with its type, position, and a masked version.
curl -X POST https://api.botoi.com/v1/pii/detect \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My name is John Smith, call me at 555-123-4567 or email john@example.com"
  }'

Response:
{
  "success": true,
  "data": {
    "found": true,
    "count": 2,
    "findings": [
      {
        "type": "email",
        "value": "john@example.com",
        "start": 56,
        "end": 72,
        "masked": "j***@e******.com"
      },
      {
        "type": "phone",
        "value": "555-123-4567",
        "start": 34,
        "end": 46,
        "masked": "***-***-4567"
      }
    ]
  }
}
The API found two PII items in the input: an email address and a phone number. Each finding includes the character positions (start and end) so you can replace, redact, or flag the exact substring.
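To make the offsets concrete, here is a minimal sketch that uses them to flag a match in place. It assumes start is inclusive, end is exclusive, and both are JavaScript string indices, which is consistent with the email finding in the sample response. The flagFinding helper is hypothetical and not part of the API.

```javascript
// Hypothetical helper: wrap one finding in [[...]] so the flagged substring is visible.
// Assumes start is inclusive and end is exclusive (JavaScript string indices).
function flagFinding(text, finding) {
  const matched = text.slice(finding.start, finding.end);
  // Guard against offsets that do not line up with the reported value
  if (matched !== finding.value) {
    throw new Error(
      `Offsets ${finding.start}-${finding.end} do not match "${finding.value}"`
    );
  }
  return (
    text.slice(0, finding.start) + "[[" + matched + "]]" + text.slice(finding.end)
  );
}

const text =
  "My name is John Smith, call me at 555-123-4567 or email john@example.com";
const emailFinding = {
  type: "email",
  value: "john@example.com",
  start: 56,
  end: 72,
};

const flagged = flagFinding(text, emailFinding);
console.log(flagged);
// "My name is John Smith, call me at 555-123-4567 or email [[john@example.com]]"
```

The guard is a cheap sanity check: if the offsets ever disagree with the reported value, it is safer to fail than to rewrite the wrong part of the string.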
Supported PII types
Type Example match Masked output
───────────── ───────────────────────── ─────────────────────
email john@example.com j***@e******.com
phone 555-123-4567 ***-***-4567
ssn 123-45-6789 ***-**-6789
credit_card 4111111111111111 ************1111
ip_address 192.168.1.42 ***.***.***.42
date_of_birth 1990-05-15 ****-**-15
Every type returns a masked version that preserves enough context to identify the data category without exposing the full value.
Build a pre-storage scanner
The highest-value integration point is right before you write user input to your database. This Node.js example scans support ticket fields and rejects submissions that contain PII.

import express from "express";

const app = express();
app.use(express.json());

// Send text to the detection endpoint and return the parsed response
async function detectPII(text) {
  const res = await fetch("https://api.botoi.com/v1/pii/detect", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  // Treat rate limits and server errors as failures instead of parsing them
  if (!res.ok) {
    throw new Error(`PII detection failed with status ${res.status}`);
  }
  return res.json();
}

app.post("/support-tickets", async (req, res) => {
  const { subject, body } = req.body;

  let subjectScan;
  let bodyScan;
  try {
    // Scan both fields before saving
    [subjectScan, bodyScan] = await Promise.all([
      detectPII(subject),
      detectPII(body),
    ]);
  } catch (err) {
    // Fail closed: never store text that could not be scanned
    return res.status(503).json({ error: "PII scan unavailable, try again" });
  }

  if (subjectScan.data.found || bodyScan.data.found) {
    const allFindings = [
      ...(subjectScan.data.findings || []),
      ...(bodyScan.data.findings || []),
    ];
    return res.status(422).json({
      error: "PII detected in submission",
      findings: allFindings.map((f) => ({
        type: f.type,
        masked: f.masked,
      })),
    });
  }

  // Safe to store; no PII found (saveTicket is your own persistence function)
  await saveTicket({ subject, body });
  res.status(201).json({ created: true });
});

When a user submits a ticket containing PII, they get a 422 response listing what was found (using masked values, not the raw data). They can remove the sensitive information and resubmit. If the scan itself fails, the handler returns a 503 instead of storing unscanned text. Your database never sees the PII.
This approach works for any form: contact forms, feedback surveys, comment systems, internal notes. Anywhere users type free text, PII can appear.
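On the client side, the 422 payload can be turned into a message that tells the user what to remove. The sketch below is one way to do that; describeFindings and the label wording are hypothetical, and the type names are the six listed in the supported types table.

```javascript
// Hypothetical client-side helper: turn the findings from a 422 response
// into one readable sentence for the user.
const LABELS = new Map([
  ["email", "email address"],
  ["phone", "phone number"],
  ["ssn", "Social Security number"],
  ["credit_card", "credit card number"],
  ["ip_address", "IP address"],
  ["date_of_birth", "date of birth"],
]);

function describeFindings(findings) {
  if (!findings || findings.length === 0) return "";
  // Fall back to the raw type name for any type without a label
  const parts = findings.map(
    (f) => `${LABELS.get(f.type) ?? f.type} (${f.masked})`
  );
  return `Please remove the following before resubmitting: ${parts.join(", ")}.`;
}

const message = describeFindings([
  { type: "email", masked: "j***@e******.com" },
  { type: "phone", masked: "***-***-4567" },
]);
console.log(message);
// "Please remove the following before resubmitting: email address (j***@e******.com), phone number (***-***-4567)."
```

Showing the masked value helps the user find the offending text without echoing the raw data back to the screen.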
Redact before logging
Rejecting PII works for user-facing forms. But for logs, error messages, and audit trails, you want to keep the text while stripping the sensitive parts. This function replaces each PII match with its masked version.

async function redactPII(text) {
  const res = await fetch("https://api.botoi.com/v1/pii/detect", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  // Do not parse error responses as scan results
  if (!res.ok) {
    throw new Error(`PII detection failed with status ${res.status}`);
  }
  const { data } = await res.json();
  if (!data.found) return text;

  // Process findings from end to start so positions stay valid
  const sorted = [...data.findings].sort((a, b) => b.start - a.start);

  // Replace each finding with its masked version
  let redacted = text;
  for (const finding of sorted) {
    redacted =
      redacted.slice(0, finding.start) +
      finding.masked +
      redacted.slice(finding.end);
  }
  return redacted;
}

// Usage
const logMessage = "User john@example.com reported issue from 192.168.1.42";
const safe = await redactPII(logMessage);
console.log(safe);
// "User j***@e******.com reported issue from ***.***.***.42"

The function processes findings from the end of the string backward. This keeps character positions valid as the string length changes during replacement. The result is a log-safe string where the meaning is preserved but the personal data is masked.
Drop this into your logging pipeline, error reporting middleware, or any system that captures user-generated text.
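One way to wire redaction into a logger is a thin wrapper that redacts first and writes second. This is a sketch under assumptions: createSafeLogger is a hypothetical name, redact stands for any async function with the same shape as redactPII, and failing closed (withholding the message when redaction fails) is a design choice you may want to change.

```javascript
// Hypothetical wrapper: redact a message before handing it to the real logger.
// `redact` is any async (text) => string function, such as redactPII above.
// `sink` is the underlying log function; console.log is only a default.
function createSafeLogger(redact, sink = console.log) {
  return async function logSafely(message) {
    let safe;
    try {
      safe = await redact(String(message));
    } catch (err) {
      // Fail closed: if redaction is unavailable, do not log the raw text
      safe = "[message withheld: PII redaction failed]";
    }
    sink(safe);
    return safe;
  };
}

// Usage (inside an async function or an ES module):
// const log = createSafeLogger(redactPII);
// await log("User john@example.com reported issue from 192.168.1.42");
```

Because every log line costs one API call, a wrapper like this fits best on paths that are known to carry user-generated text, not on every debug statement.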
GDPR compliance: scan free-text fields automatically
GDPR's data minimization principle (Article 5(1)(c)) requires that you collect only the personal data you need for a specific purpose. Free-text fields are a common gap in compliance strategies because you can't predict what users will type.
This Express middleware scans configurable fields across multiple routes:

async function gdprScanMiddleware(req, res, next) {
  const fieldsToScan = ["message", "notes", "description", "comment"];
  const findings = [];

  try {
    for (const field of fieldsToScan) {
      const value = req.body?.[field];
      // Only scan fields that are present and contain text
      if (typeof value === "string" && value !== "") {
        const scan = await detectPII(value);
        if (scan.data.found) {
          findings.push(
            ...scan.data.findings.map((f) => ({
              field,
              type: f.type,
              masked: f.masked,
            }))
          );
        }
      }
    }
  } catch (err) {
    // Pass scan failures to the Express error handler instead of hanging
    return next(err);
  }

  if (findings.length > 0) {
    return res.status(422).json({
      error: "Personal data detected. Remove PII before submitting.",
      findings,
    });
  }
  next();
}

// Apply to routes that accept free-text input
app.post("/feedback", gdprScanMiddleware, feedbackHandler);
app.post("/comments", gdprScanMiddleware, commentHandler);
app.post("/contact", gdprScanMiddleware, contactHandler);

Attach the middleware to any route that accepts free-text input. It scans the fields you specify, and if PII is found, the request is rejected with a clear error message before any data is stored.
This gives you an auditable control you can point to during a GDPR review: "Free-text inputs are scanned for PII at the API layer. Submissions containing personal data are rejected before storage."
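If different routes need different field lists, the same logic can be wrapped in a factory. The sketch below is an assumption-labeled variant, not the original middleware: createPiiScanMiddleware is a hypothetical name, and detect is injected so the middleware can be exercised in tests without calling the API.

```javascript
// Hypothetical factory: build a scan middleware from a field list and a detector.
// `detect` is any async (text) => API response function, such as detectPII above.
function createPiiScanMiddleware({ fields, detect }) {
  return async function piiScanMiddleware(req, res, next) {
    const findings = [];
    try {
      for (const field of fields) {
        const value = req.body?.[field];
        // Skip absent or non-text fields
        if (typeof value !== "string" || value === "") continue;
        const scan = await detect(value);
        for (const f of scan.data.findings || []) {
          findings.push({ field, type: f.type, masked: f.masked });
        }
      }
    } catch (err) {
      // Hand scan failures to the app's error handler
      return next(err);
    }
    if (findings.length > 0) {
      return res.status(422).json({
        error: "Personal data detected. Remove PII before submitting.",
        findings,
      });
    }
    next();
  };
}

// Usage (route path and handler are placeholders):
// app.post(
//   "/feedback",
//   createPiiScanMiddleware({ fields: ["message"], detect: detectPII }),
//   feedbackHandler
// );
```

Keeping the field list next to the route makes the control easier to audit, since each route states exactly which inputs are scanned.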
Where to add PII scanning in your stack
- API middleware. Scan request bodies before they reach your business logic. Catches PII at the entry point of your system.
- Form validation. Call the API client-side or server-side before form submission. Give users a chance to remove PII themselves.
- Log pipeline. Redact PII in log messages before they reach your log aggregator. Prevents sensitive data from spreading across your infrastructure.
- Data export. Scan CSV or JSON exports before sending them to third parties. One more checkpoint before data leaves your system.
- Chat and messaging. Scan messages in internal tools or customer-facing chat before they're stored in your message history.
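For the data export case, the strings first have to be pulled out of a nested structure before they can be scanned. The sketch below shows one way to do that for JSON-like data; collectStrings and the path format are hypothetical, and each collected string would still need its own scan.

```javascript
// Hypothetical helper: walk a JSON-like value and list every non-empty string
// with a path, so each one can be sent to a PII scan before export.
function collectStrings(value, path = "$", out = []) {
  if (typeof value === "string") {
    if (value.trim() !== "") out.push({ path, text: value });
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => collectStrings(item, `${path}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      collectStrings(child, `${path}.${key}`, out);
    }
  }
  return out;
}

const exportRow = {
  id: 42,
  customer: { note: "Reach me at john@example.com" },
  tags: ["vip", ""],
};

console.log(collectStrings(exportRow));
// [
//   { path: "$.customer.note", text: "Reach me at john@example.com" },
//   { path: "$.tags[0]", text: "vip" }
// ]
```

Scanning each string separately costs one request per string, so large exports will run into rate limits quickly; the path lets you report which field a finding came from.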
The API processes text in memory on Cloudflare's edge network and discards it after responding. No data is stored or logged on botoi's side. The API documentation lists the endpoint's privacy guarantees.
Frequently asked questions
- What PII types does the API detect?
- The API detects six types: email addresses, phone numbers, Social Security numbers (SSN), credit card numbers, IP addresses, and dates of birth. Each finding includes the type, raw value, character position, and a masked version.
- Is the PII detection API free?
- Yes. Anonymous access is available at 5 requests per minute with IP-based rate limiting. No API key, no account, no credit card required. Paid plans offer higher rate limits.
- Does the API store or log the text I send?
- No. The API runs on Cloudflare Workers at the edge. Your text is processed in memory and discarded after the response is returned. Nothing is written to disk or logged.
- Can I use this for GDPR compliance?
- The API helps you identify PII before storage, which supports data minimization under GDPR Article 5(1)(c). It is a technical tool, not legal advice. Pair it with your organization's data protection policies and consult a legal professional for compliance questions.
- How accurate is the detection?
- The API uses pattern matching tuned for common formats (US phone numbers, standard email addresses, Luhn-valid credit card numbers, etc.). It catches the most common PII patterns. For domain-specific formats or non-US identifiers, test with your own data to confirm coverage.
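Testing with your own data can be as simple as running labeled samples through the detector and listing the ones it misses. The harness below is a sketch: checkCoverage is a hypothetical name, detect stands for any function shaped like detectPII, and the samples should be synthetic values in your formats, not real personal data.

```javascript
// Hypothetical harness: run labeled samples through a detector and list misses.
// `detect` is any async (text) => API response function, such as detectPII.
async function checkCoverage(samples, detect) {
  const misses = [];
  for (const { text, expectedType } of samples) {
    const scan = await detect(text);
    const types = (scan.data.findings || []).map((f) => f.type);
    // Record the sample when the expected type is not among the findings
    if (!types.includes(expectedType)) {
      misses.push({ text, expectedType, detectedTypes: types });
    }
  }
  return misses;
}

// Usage (inside an async function or an ES module):
// const misses = await checkCoverage(
//   [
//     { text: "Call +44 20 7946 0958", expectedType: "phone" },
//     { text: "Born 15/05/1990", expectedType: "date_of_birth" },
//   ],
//   detectPII
// );
// console.table(misses);
```

An empty result means every sample was detected as expected; anything in the list is a format you would need to handle separately.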