
PII Redaction

PII redaction automatically detects and masks sensitive personal data in voice AI transcripts and logs before it gets stored.

Definition

PII redaction is the automatic detection and masking of sensitive personal information in voice AI transcripts, logs and recordings before they are written to storage. Examples include full card numbers, CVVs, Aadhaar numbers, PAN (India's Permanent Account Number) and passwords. Masked values are typically replaced with tokens like [CARD] or [AADHAAR], so operations teams can still audit calls without ever seeing the raw sensitive data.

What PII redaction does

Redaction is the last line of defence between a live voice conversation and the long-term log of that conversation. When the caller speaks an account number or reads out a card's CVV, the redaction layer identifies the sensitive span and replaces it in the stored transcript with a placeholder such as [CARD] or [OTP]. The same span can also be muted or bleeped in the audio recording.
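The sketch below shows both outputs for one detected span: a placeholder in the stored transcript and a silenced stretch of audio. It is a minimal illustration, not ThinnestAI's implementation. It assumes the ASR returns word-level start and end times, that detection has already produced non-overlapping spans over word indices, and that audio is mono PCM held as a list of floats. The `Word` type, the `redact_words` and `mute` helpers, and the 50 ms safety pad are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording (assumed ASR word timing)
    end: float


# A detected span: (first word index, index one past the last word, label).
Span = Tuple[int, int, str]


def redact_words(words: List[Word], spans: List[Span]) -> Tuple[str, List[Tuple[float, float]]]:
    """Replace each span with a [LABEL] placeholder and collect its audio interval.

    Assumes the spans do not overlap.
    """
    span_at: Dict[int, Tuple[int, str]] = {s: (e, label) for s, e, label in spans}
    out: List[str] = []
    intervals: List[Tuple[float, float]] = []
    i = 0
    while i < len(words):
        if i in span_at:
            end, label = span_at[i]
            out.append(f"[{label}]")
            # The audio interval runs from the first word's start to the last word's end.
            intervals.append((words[i].start, words[end - 1].end))
            i = end  # skip every word covered by the span
        else:
            out.append(words[i].text)
            i += 1
    return " ".join(out), intervals


def mute(samples: List[float], sample_rate: int,
         intervals: List[Tuple[float, float]], pad: float = 0.05) -> List[float]:
    """Zero out each interval, widened by `pad` seconds on both sides."""
    out = list(samples)
    for start, end in intervals:
        lo = max(0, math.floor((start - pad) * sample_rate))
        hi = min(len(out), math.ceil((end + pad) * sample_rate))
        for k in range(lo, hi):
            out[k] = 0.0
    return out


# Example: a caller reading out a card number, one ASR word per token.
words = [Word("my", 0.0, 0.2), Word("card", 0.2, 0.5), Word("is", 0.5, 0.6),
         Word("4111", 0.6, 1.0), Word("1111", 1.0, 1.4), Word("1111", 1.4, 1.8),
         Word("1111", 1.8, 2.2), Word("thanks", 2.2, 2.6)]
text, intervals = redact_words(words, [(3, 7, "CARD")])
print(text)  # my card is [CARD] thanks
# Toy 10 Hz "audio" so the sample indices are easy to read; real calls use 8 or 16 kHz.
audio = mute([1.0] * 30, sample_rate=10, intervals=intervals)
```

The pad widens each muted region slightly because word timestamps are approximate. For the "bleep" variant, write a tone into the same sample range instead of zeros.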

Why it matters

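For any business handling payments, collections or KYC, redaction is what makes voice AI legally usable. It supports the data-minimisation principle of India's Digital Personal Data Protection (DPDP) Act. PCI DSS forbids retaining card verification codes (CVVs) after authorisation, even in encrypted form. It also requires full card numbers (primary account numbers, not to be confused with India's PAN) to be rendered unreadable wherever they are stored. Redacted logs also make audits and QA review safe: supervisors can listen to coaching calls without being exposed to customers' sensitive data.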

How ThinnestAI does it

ThinnestAI uses a hybrid approach. Deterministic regular expressions catch well-structured fields with near-zero false negatives when the transcript is accurate. These fields include 12-digit Aadhaar numbers, 10-character PANs, 16-digit card numbers and OTPs. A lightweight LLM classifier catches free-form sensitive spans that regex misses, such as a spoken password. Detection runs on both the ASR output and the TTS input, so the agent cannot accidentally speak a full card number back to the caller. Redaction rules are configurable per agent workload.
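A minimal sketch of the hybrid pipeline follows. None of these names come from ThinnestAI, and the patterns are simplified assumptions:

- Aadhaar: 12 digits, optionally grouped in fours.
- PAN: five letters, four digits, then one letter.
- Cards: 16 digits, optionally grouped.
- OTPs: only redacted after an "OTP" or "code" cue.

`keyword_classifier` is a hypothetical stand-in for the LLM classifier, passed in as a plain callable so a real model could be plugged in. The same `redact` function is applied to ASR output and to text headed for TTS.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start char offset, end char offset, label)

# Deterministic rules, applied in order. Cards run first so the 12-digit
# Aadhaar rule cannot claim the first twelve digits of a spaced card number.
PATTERNS = [
    ("CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("AADHAAR", re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")),
    ("PAN", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),
]
# OTPs are only redacted after a cue word, so ordinary 4-6 digit numbers
# (amounts, years) survive. Groups 1 and 2 keep the cue and the gap text.
OTP_PATTERN = re.compile(r"(?i)\b(otp|code)\b(\D{0,20}?)\d{4,6}\b")


def keyword_classifier(text: str) -> List[Span]:
    """Hypothetical stand-in for the LLM classifier: flags the token after 'password is'."""
    return [(m.start(1), m.end(1), "PASSWORD")
            for m in re.finditer(r"(?i)\bpassword\s+is\s+(\S+)", text)]


def apply_spans(text: str, spans: List[Span]) -> str:
    """Replace non-overlapping character spans, right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def redact(text: str, classifier: Callable[[str], List[Span]] = keyword_classifier) -> str:
    # Stage 1: structured fields via regex.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    text = OTP_PATTERN.sub(r"\1\2[OTP]", text)
    # Stage 2: free-form spans. The classifier sees the regex-redacted text,
    # so its offsets refer to that text.
    return apply_spans(text, classifier(text))


# The same function guards both directions of the conversation.
asr_text = "Card 4111 1111 1111 1111, Aadhaar 1234 5678 9012, PAN ABCDE1234F, OTP is 482913"
print(redact(asr_text))  # Card [CARD], Aadhaar [AADHAAR], PAN [PAN], OTP is [OTP]
tts_text = "Your password is hunter2 and your card ends 4242"
print(redact(tts_text))  # Your password is [PASSWORD] and your card ends 4242
```

The sketch deliberately skips a Luhn checksum on card numbers. A checksum would cut false positives, but Luhn rejects every single-digit error. A card number with one mis-heard digit would then fail the check and be left in the log.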

Limits and tradeoffs

Redaction is never perfect. Aggressive rules can over-redact and destroy audit value, while permissive rules risk leakage. Voice adds extra difficulty because ASR can mis-hear or drop a digit, and a 16-digit card number transcribed as 15 digits no longer matches a 16-digit pattern. Teams should tune redaction per use case and regularly review a sample of redacted transcripts for both leaks and over-redaction.
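One way to make that sampling concrete is to have a reviewer mark which words in each sampled transcript should have been redacted, then compare those marks with what the system actually redacted. Low precision points to over-redaction, and low recall points to leakage. This is a sketch under stated assumptions. The function names, the word-index representation and the fixed random seed are all hypothetical choices for illustration.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def sample_for_review(call_ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k calls uniformly at random for human review (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(call_ids), min(k, len(call_ids)))


def redaction_quality(reviews: List[Tuple[Set[int], Set[int]]]) -> Dict[str, float]:
    """Score redaction against reviewer labels, pooled over all reviewed transcripts.

    Each review is (predicted, gold). `predicted` holds the word indices the system
    redacted, and `gold` holds the word indices a reviewer marked as sensitive.
    """
    tp = fp = fn = 0
    for predicted, gold in reviews:
        tp += len(predicted & gold)
        fp += len(predicted - gold)  # over-redaction: audit value lost
        fn += len(gold - predicted)  # leakage: sensitive words left in the log
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"precision": precision, "recall": recall,
            "over_redacted_words": fp, "leaked_words": fn}


# Two reviewed transcripts. The first was redacted exactly right. In the second,
# the system masked word 9 but missed word 2.
reviews = [({3, 4, 5, 6}, {3, 4, 5, 6}), ({9}, {2})]
print(redaction_quality(reviews))
# {'precision': 0.8, 'recall': 0.8, 'over_redacted_words': 1, 'leaked_words': 1}
```

Counting at the word level keeps the metric simple. A team could equally score whole spans, which would treat a half-masked card number as a full miss.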