Skip to main content

PII Detection Functions

Detect and mask Personally Identifiable Information (PII) in text data.

Quick Reference

FunctionDescriptionSQL Signature
anofox_tab_pii_detectDetect all PII(text) -> VARCHAR (JSON)
anofox_tab_pii_maskMask PII in text(text, strategy) -> VARCHAR
anofox_tab_pii_containsCheck for any PII(text) -> BOOLEAN
anofox_tab_pii_countCount PII matches(text) -> BIGINT
anofox_tab_pii_scan_tableScan table for PII(table, columns) -> TABLE
anofox_tab_pii_audit_tableRow-level audit(table, columns) -> TABLE

Supported PII Types

The module detects 17 types of PII:

Pattern-Based Detection (13 types)

TypeDescriptionExample
EMAILEmail addressesuser@example.com
PHONEPhone numbers+1-555-123-4567
CREDIT_CARDCredit card numbers4111-1111-1111-1111
US_SSNUS Social Security Numbers123-45-6789
IBANInternational Bank Account NumbersDE89370400440532013000
IP_ADDRESSIPv4 addresses192.168.1.100
URLHTTP/HTTPS URLshttps://example.com
DE_TAX_IDGerman Tax ID12345678901
MAC_ADDRESSNetwork MAC addresses00:1A:2B:3C:4D:5E
UK_NINOUK National Insurance NumberAB123456C
US_PASSPORTUS Passport numbersA12345678
API_KEYAPI keys (AWS, GitHub)AKIAIOSFODNN7EXAMPLE
CRYPTO_ADDRESSBitcoin/Ethereum addresses1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa

NER-Based Detection (4 types)

These require OpenVINO and are available on Linux x64 and macOS only:

TypeDescriptionExample
NAMEPerson namesJohn Smith
ORGANIZATIONCompany/org namesMicrosoft, Google Inc
LOCATIONGeographic locationsParis, New York
MISCMiscellaneous entitiesFrench, Nobel Prize

Masking Strategies

StrategyDescriptionExample Output
redactReplace with type label (default)[EMAIL], [US_SSN]
partialShow partial valuete**@example.com, ***-**-6789
asteriskReplace with asterisks****************
hashReplace with SHA-256 hasha1b2c3d4e5f6...

Core Detection Functions (4)

anofox_tab_pii_detect

Detect all PII in text and return matches as JSON.

Parameters

ParameterTypeRequiredDefaultDescription
textVARCHARYes-Text to scan for PII

Returns: JSON array of detected PII with type, text, position, and confidence

Example

SELECT anofox_tab_pii_detect('Contact: john.doe@example.com, SSN: 123-45-6789');
-- Returns: [
-- {"type":"EMAIL","text":"john.doe@example.com","start":9,"end":29,"confidence":1.00},
-- {"type":"US_SSN","text":"123-45-6789","start":36,"end":47,"confidence":1.00}
-- ]

anofox_tab_pii_mask

Mask all detected PII using specified strategy.

Parameters

ParameterTypeRequiredDefaultDescription
textVARCHARYes-Text containing PII
strategyVARCHARNo'redact'Masking strategy

Returns: Text with PII masked

Example

-- Redact (default)
SELECT anofox_tab_pii_mask('Email: test@example.com');
-- Output: 'Email: [EMAIL]'

-- Partial masking
SELECT anofox_tab_pii_mask('SSN: 123-45-6789', 'partial');
-- Output: 'SSN: ***-**-6789'

-- Asterisk masking
SELECT anofox_tab_pii_mask('Card: 4111111111111111', 'asterisk');
-- Output: 'Card: ****************'

anofox_tab_pii_contains

Check if text contains any PII.

SELECT anofox_tab_pii_contains('Contact us at support@company.com');
-- Output: true

SELECT anofox_tab_pii_contains('Hello, world!');
-- Output: false

anofox_tab_pii_count

Count PII occurrences in text.

SELECT anofox_tab_pii_count('Email: a@b.com, SSN: 123-45-6789, Card: 4111111111111111');
-- Output: 3

Table Scanning Functions (3)

anofox_tab_pii_scan_table

Scan entire table for PII across VARCHAR columns.

Parameters

ParameterTypeRequiredDefaultDescription
table_nameVARCHARYes-Table to scan
columnsVARCHARNoNULLComma-separated columns (all VARCHAR if omitted)

Returns: Table with column_name, pii_type, match_count, sample_values, confidence

Example

-- Scan all columns
SELECT * FROM anofox_tab_pii_scan_table('users');

-- Scan specific columns
SELECT * FROM anofox_tab_pii_scan_table('users', 'email,phone');

-- Find high-risk columns
SELECT column_name, COUNT(DISTINCT pii_type) as pii_types
FROM anofox_tab_pii_scan_table('customer_data')
GROUP BY column_name
HAVING COUNT(DISTINCT pii_type) >= 2;

anofox_tab_pii_audit_table

Row-level PII audit with masking for detailed inspection.

Parameters

ParameterTypeRequiredDefaultDescription
table_nameVARCHARYes-Table to audit
columnsVARCHARNoNULLComma-separated columns

Returns: Table with row_id, column_name, pii_type, original_value, masked_value, positions, confidence

Example

SELECT * FROM anofox_tab_pii_audit_table('customer_data')
WHERE pii_type = 'US_SSN';

anofox_tab_pii_status

List all registered PII recognizers.

SELECT * FROM anofox_tab_pii_status() ORDER BY pii_type;
-- Shows all 17 PII types with their recognizer info

Type-Specific Detection Functions (6)

For targeted scans when you only need specific PII types:

FunctionReturns
anofox_tab_pii_detect_emails(text)List of EMAIL matches
anofox_tab_pii_detect_phones(text)List of PHONE matches
anofox_tab_pii_detect_credit_cards(text)List of CREDIT_CARD matches
anofox_tab_pii_detect_ssns(text)List of US_SSN matches
anofox_tab_pii_detect_names(text)List of NAME matches (NER-based)
anofox_tab_pii_detect_ibans(text)List of IBAN matches

Example

SELECT anofox_tab_pii_detect_emails('Contact: john@example.com and jane@example.com');
-- Returns only EMAIL matches

Validation Functions (6)

Validate specific PII formats with checksum verification:

FunctionDescription
anofox_tab_pii_is_valid_ssn(text)Validate US SSN format
anofox_tab_pii_is_valid_iban(text)Validate IBAN with MOD-97 checksum
anofox_tab_pii_is_valid_credit_card(text)Validate with Luhn algorithm
anofox_tab_pii_is_valid_nino(text)Validate UK National Insurance
anofox_tab_pii_is_valid_de_tax_id(text)Validate German Tax ID
anofox_tab_pii_is_valid_crypto_address(text)Validate Bitcoin/Ethereum

Example

SELECT anofox_tab_pii_is_valid_credit_card('4111111111111111');  -- true (Luhn valid)
SELECT anofox_tab_pii_is_valid_iban('DE89370400440532013000'); -- true (MOD-97 valid)

Advanced Masking Functions (2)

anofox_tab_pii_mask_column

Type-specific masking for a single PII type.

SELECT anofox_tab_pii_mask_column('123-45-6789', 'US_SSN', 'partial');
-- Returns: ***-**-6789

anofox_tab_pii_redact_column

Mask all PII in text with optional strategy.

SELECT anofox_tab_pii_redact_column('Email: test@example.com');
-- Returns: Email: [EMAIL]

Practical Patterns

Data Privacy Audit

-- Find all tables with PII exposure
WITH pii_summary AS (
SELECT
'customers' as table_name,
column_name,
pii_type,
match_count
FROM anofox_tab_pii_scan_table('customers')
UNION ALL
SELECT
'orders' as table_name,
column_name,
pii_type,
match_count
FROM anofox_tab_pii_scan_table('orders')
)
SELECT
table_name,
COUNT(DISTINCT pii_type) as pii_types,
SUM(match_count) as total_matches
FROM pii_summary
GROUP BY table_name
ORDER BY pii_types DESC;

GDPR Compliance Check

-- Identify columns needing anonymization
SELECT
column_name,
pii_type,
match_count,
CASE
WHEN pii_type IN ('US_SSN', 'CREDIT_CARD', 'IBAN') THEN 'HIGH'
WHEN pii_type IN ('EMAIL', 'PHONE', 'NAME') THEN 'MEDIUM'
ELSE 'LOW'
END as risk_level
FROM anofox_tab_pii_scan_table('customer_data')
ORDER BY
CASE risk_level
WHEN 'HIGH' THEN 1
WHEN 'MEDIUM' THEN 2
ELSE 3
END;

Pre-Migration PII Discovery

-- Before migrating data, identify all PII
SELECT
column_name,
pii_type,
sample_values[1] as example,
match_count
FROM anofox_tab_pii_scan_table('legacy_data')
WHERE match_count > 0
ORDER BY match_count DESC;

Configuration Options

OptionTypeDefaultDescription
anofox_pii_min_confidenceDOUBLE0.5Minimum confidence threshold
anofox_pii_default_mask_strategyVARCHAR'redact'Default masking strategy
anofox_pii_enabled_typesVARCHAR''Comma-separated enabled types (empty = all)
anofox_pii_deep_validationBOOLEANfalseEnable deep email/phone validation
SET anofox_pii_min_confidence = 0.7;
SET anofox_pii_default_mask_strategy = 'partial';

Platform Availability

PlatformPattern-based (13 types)NER-based (4 types)
Linux x64 (glibc)YesYes
macOS x64YesYes
macOS ARM64YesYes
Windows x64YesNo (dictionary fallback for NAME)
Linux ARM64YesNo
Linux musl (Alpine)YesNo

Performance Characteristics

FunctionSpeedNotes
Pattern detection~1ms/KBFast regex matching
NER detection~10ms/KBML model inference
Table scan~100ms/1K rowsDepends on text length
Validation<1msChecksum calculations

🍪 Cookie Settings