PII Detection Functions
Detect and mask Personally Identifiable Information (PII) in text data.
Quick Reference
| Function | Description | SQL Signature |
|---|---|---|
anofox_tab_pii_detect | Detect all PII | (text) -> VARCHAR (JSON) |
anofox_tab_pii_mask | Mask PII in text | (text, strategy) -> VARCHAR |
anofox_tab_pii_contains | Check for any PII | (text) -> BOOLEAN |
anofox_tab_pii_count | Count PII matches | (text) -> BIGINT |
anofox_tab_pii_scan_table | Scan table for PII | (table, columns) -> TABLE |
anofox_tab_pii_audit_table | Row-level audit | (table, columns) -> TABLE |
Supported PII Types
The module detects 17 types of PII:
Pattern-Based Detection (13 types)
| Type | Description | Example |
|---|---|---|
EMAIL | Email addresses | user@example.com |
PHONE | Phone numbers | +1-555-123-4567 |
CREDIT_CARD | Credit card numbers | 4111-1111-1111-1111 |
US_SSN | US Social Security Numbers | 123-45-6789 |
IBAN | International Bank Account Numbers | DE89370400440532013000 |
IP_ADDRESS | IPv4 addresses | 192.168.1.100 |
URL | HTTP/HTTPS URLs | https://example.com |
DE_TAX_ID | German Tax ID | 12345678901 |
MAC_ADDRESS | Network MAC addresses | 00:1A:2B:3C:4D:5E |
UK_NINO | UK National Insurance Number | AB123456C |
US_PASSPORT | US Passport numbers | A12345678 |
API_KEY | API keys (AWS, GitHub) | AKIAIOSFODNN7EXAMPLE |
CRYPTO_ADDRESS | Bitcoin/Ethereum addresses | 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa |
NER-Based Detection (4 types)
These require OpenVINO and are available on Linux x64 and macOS only:
| Type | Description | Example |
|---|---|---|
NAME | Person names | John Smith |
ORGANIZATION | Company/org names | Microsoft, Google Inc |
LOCATION | Geographic locations | Paris, New York |
MISC | Miscellaneous entities | French, Nobel Prize |
Masking Strategies
| Strategy | Description | Example Output |
|---|---|---|
redact | Replace with type label (default) | [EMAIL], [US_SSN] |
partial | Show partial value | te**@example.com, ***-**-6789 |
asterisk | Replace with asterisks | **************** |
hash | Replace with SHA-256 hash | a1b2c3d4e5f6... |
Core Detection Functions (4)
anofox_tab_pii_detect
Detect all PII in text and return matches as JSON.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
text | VARCHAR | Yes | - | Text to scan for PII |
Returns: JSON array of detected PII with type, text, position, and confidence
Example
SELECT anofox_tab_pii_detect('Contact: john.doe@example.com, SSN: 123-45-6789');
-- Returns: [
-- {"type":"EMAIL","text":"john.doe@example.com","start":9,"end":29,"confidence":1.00},
-- {"type":"US_SSN","text":"123-45-6789","start":36,"end":47,"confidence":1.00}
-- ]
anofox_tab_pii_mask
Mask all detected PII using specified strategy.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
text | VARCHAR | Yes | - | Text containing PII |
strategy | VARCHAR | No | 'redact' | Masking strategy |
Returns: Text with PII masked
Example
-- Redact (default)
SELECT anofox_tab_pii_mask('Email: test@example.com');
-- Output: 'Email: [EMAIL]'
-- Partial masking
SELECT anofox_tab_pii_mask('SSN: 123-45-6789', 'partial');
-- Output: 'SSN: ***-**-6789'
-- Asterisk masking
SELECT anofox_tab_pii_mask('Card: 4111111111111111', 'asterisk');
-- Output: 'Card: ****************'
anofox_tab_pii_contains
Check if text contains any PII.
SELECT anofox_tab_pii_contains('Contact us at support@company.com');
-- Output: true
SELECT anofox_tab_pii_contains('Hello, world!');
-- Output: false
anofox_tab_pii_count
Count PII occurrences in text.
SELECT anofox_tab_pii_count('Email: a@b.com, SSN: 123-45-6789, Card: 4111111111111111');
-- Output: 3
Table Scanning Functions (3)
anofox_tab_pii_scan_table
Scan entire table for PII across VARCHAR columns.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
table_name | VARCHAR | Yes | - | Table to scan |
columns | VARCHAR | No | NULL | Comma-separated columns (all VARCHAR if omitted) |
Returns: Table with column_name, pii_type, match_count, sample_values, confidence
Example
-- Scan all columns
SELECT * FROM anofox_tab_pii_scan_table('users');
-- Scan specific columns
SELECT * FROM anofox_tab_pii_scan_table('users', 'email,phone');
-- Find high-risk columns
SELECT column_name, COUNT(DISTINCT pii_type) as pii_types
FROM anofox_tab_pii_scan_table('customer_data')
GROUP BY column_name
HAVING COUNT(DISTINCT pii_type) >= 2;
anofox_tab_pii_audit_table
Row-level PII audit with masking for detailed inspection.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
table_name | VARCHAR | Yes | - | Table to audit |
columns | VARCHAR | No | NULL | Comma-separated columns |
Returns: Table with row_id, column_name, pii_type, original_value, masked_value, positions, confidence
Example
SELECT * FROM anofox_tab_pii_audit_table('customer_data')
WHERE pii_type = 'US_SSN';
anofox_tab_pii_status
List all registered PII recognizers.
SELECT * FROM anofox_tab_pii_status() ORDER BY pii_type;
-- Shows all 17 PII types with their recognizer info
Type-Specific Detection Functions (6)
For targeted scans when you only need specific PII types:
| Function | Returns |
|---|---|
anofox_tab_pii_detect_emails(text) | List of EMAIL matches |
anofox_tab_pii_detect_phones(text) | List of PHONE matches |
anofox_tab_pii_detect_credit_cards(text) | List of CREDIT_CARD matches |
anofox_tab_pii_detect_ssns(text) | List of US_SSN matches |
anofox_tab_pii_detect_names(text) | List of NAME matches (NER-based) |
anofox_tab_pii_detect_ibans(text) | List of IBAN matches |
Example
SELECT anofox_tab_pii_detect_emails('Contact: john@example.com and jane@example.com');
-- Returns only EMAIL matches
Validation Functions (6)
Validate specific PII formats with checksum verification:
| Function | Description |
|---|---|
anofox_tab_pii_is_valid_ssn(text) | Validate US SSN format |
anofox_tab_pii_is_valid_iban(text) | Validate IBAN with MOD-97 checksum |
anofox_tab_pii_is_valid_credit_card(text) | Validate with Luhn algorithm |
anofox_tab_pii_is_valid_nino(text) | Validate UK National Insurance |
anofox_tab_pii_is_valid_de_tax_id(text) | Validate German Tax ID |
anofox_tab_pii_is_valid_crypto_address(text) | Validate Bitcoin/Ethereum |
Example
SELECT anofox_tab_pii_is_valid_credit_card('4111111111111111'); -- true (Luhn valid)
SELECT anofox_tab_pii_is_valid_iban('DE89370400440532013000'); -- true (MOD-97 valid)
Advanced Masking Functions (2)
anofox_tab_pii_mask_column
Type-specific masking for a single PII type.
SELECT anofox_tab_pii_mask_column('123-45-6789', 'US_SSN', 'partial');
-- Returns: ***-**-6789
anofox_tab_pii_redact_column
Mask all PII in text with optional strategy.
SELECT anofox_tab_pii_redact_column('Email: test@example.com');
-- Returns: Email: [EMAIL]
Practical Patterns
Data Privacy Audit
-- Find all tables with PII exposure
WITH pii_summary AS (
SELECT
'customers' as table_name,
column_name,
pii_type,
match_count
FROM anofox_tab_pii_scan_table('customers')
UNION ALL
SELECT
'orders' as table_name,
column_name,
pii_type,
match_count
FROM anofox_tab_pii_scan_table('orders')
)
SELECT
table_name,
COUNT(DISTINCT pii_type) as pii_types,
SUM(match_count) as total_matches
FROM pii_summary
GROUP BY table_name
ORDER BY pii_types DESC;
GDPR Compliance Check
-- Identify columns needing anonymization
SELECT
column_name,
pii_type,
match_count,
CASE
WHEN pii_type IN ('US_SSN', 'CREDIT_CARD', 'IBAN') THEN 'HIGH'
WHEN pii_type IN ('EMAIL', 'PHONE', 'NAME') THEN 'MEDIUM'
ELSE 'LOW'
END as risk_level
FROM anofox_tab_pii_scan_table('customer_data')
ORDER BY
CASE risk_level
WHEN 'HIGH' THEN 1
WHEN 'MEDIUM' THEN 2
ELSE 3
END;
Pre-Migration PII Discovery
-- Before migrating data, identify all PII
SELECT
column_name,
pii_type,
sample_values[1] as example,
match_count
FROM anofox_tab_pii_scan_table('legacy_data')
WHERE match_count > 0
ORDER BY match_count DESC;
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
anofox_pii_min_confidence | DOUBLE | 0.5 | Minimum confidence threshold |
anofox_pii_default_mask_strategy | VARCHAR | 'redact' | Default masking strategy |
anofox_pii_enabled_types | VARCHAR | '' | Comma-separated enabled types (empty = all) |
anofox_pii_deep_validation | BOOLEAN | false | Enable deep email/phone validation |
SET anofox_pii_min_confidence = 0.7;
SET anofox_pii_default_mask_strategy = 'partial';
Platform Availability
| Platform | Pattern-based (13 types) | NER-based (4 types) |
|---|---|---|
| Linux x64 (glibc) | Yes | Yes |
| macOS x64 | Yes | Yes |
| macOS ARM64 | Yes | Yes |
| Windows x64 | Yes | No (dictionary fallback for NAME) |
| Linux ARM64 | Yes | No |
| Linux musl (Alpine) | Yes | No |
Performance Characteristics
| Function | Speed | Notes |
|---|---|---|
| Pattern detection | ~1ms/KB | Fast regex matching |
| NER detection | ~10ms/KB | ML model inference |
| Table scan | ~100ms/1K rows | Depends on text length |
| Validation | <1ms | Checksum calculations |