Quality Metrics
Functions for profiling and measuring data quality.
Quick Reference
| Function | Description | SQL Signature |
|---|---|---|
anofox_metric_nullness | % NULL values | (column) -> DECIMAL |
anofox_metric_distinctness | % unique values | (column) -> DECIMAL |
anofox_metric_freshness | Days since update | (date_column) -> DECIMAL |
anofox_metric_volume | Row count | (table_name) -> INTEGER |
anofox_metric_consistency | Format consistency | (column) -> DECIMAL |
anofox_metric_schema_match | Schema compatible | (table1, table2) -> BOOLEAN |
Core Metrics (7 functions)
anofox_metric_nullness
Measure percentage of NULL values in a column.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
column | ANY | Yes | - | Column to analyze |
Returns: Percentage of NULLs (0.0-100.0)
Interpretation:
- 0%: No missing values (complete)
- 1-5%: Mostly complete
- 5-20%: Significant gaps
-
20%: Unusable column
Example
SELECT anofox_metric_nullness(email) as null_percentage
FROM customers;
-- Output: 2.5 (2.5% of emails are NULL)
anofox_metric_distinctness
Measure percentage of unique values.
Interpretation:
- 100%: All unique (key column)
- 50-100%: High diversity
- 10-50%: Moderate categorization
- < 10%: Heavily concentrated
Example
SELECT
anofox_metric_distinctness(customer_id) as id_uniqueness,
anofox_metric_distinctness(country) as country_uniqueness
FROM customers;
-- id_uniqueness: 100.0
-- country_uniqueness: 5.2
anofox_metric_freshness
Measure days since most recent update.
Interpretation:
- 0-1 days: Real-time (fresh)
- 1-7 days: Current (acceptable)
- 7-30 days: Stale (aging)
-
30 days: Very stale
Example
SELECT anofox_metric_freshness(updated_at) as days_since_update
FROM customers;
-- Output: 3
anofox_metric_volume
Get row count of table.
SELECT anofox_metric_volume('customers') as customer_count;
-- Output: 125000
anofox_metric_consistency
Measure format consistency across column.
Interpretation:
-
95%: Highly consistent
- 80-95%: Mostly consistent
- 50-80%: Inconsistent (may need normalization)
- < 50%: Very inconsistent
Example
SELECT anofox_metric_consistency(phone) as phone_format_consistency
FROM customers;
-- Output: 75.5
anofox_metric_schema_match
Check if two tables have compatible schemas.
SELECT anofox_metric_schema_match('customers', 'customers_backup');
-- Output: true
Data Quality Profile
Create a comprehensive health check:
SELECT
'customers' as table_name,
anofox_metric_row_count('customers') as row_count,
anofox_metric_nullness(email) as email_nullness_pct,
anofox_metric_nullness(phone) as phone_nullness_pct,
anofox_metric_distinctness(customer_id) as id_uniqueness_pct,
anofox_metric_freshness(updated_at) as days_since_update,
anofox_metric_consistency(phone) as phone_format_consistency_pct
FROM customers;
Quality Scoring
Create a composite quality score:
SELECT
'customers' as table_name,
ROUND(
(
CASE WHEN anofox_metric_nullness(email) < 5 THEN 25 ELSE 0 END +
CASE WHEN anofox_metric_distinctness(customer_id) > 95 THEN 25 ELSE 0 END +
CASE WHEN anofox_metric_freshness(updated_at) < 7 THEN 25 ELSE 0 END +
CASE WHEN anofox_metric_consistency(phone) > 80 THEN 25 ELSE 0 END
),
1
) as quality_score
FROM customers;
Quality Thresholds
| Metric | Green | Yellow | Red |
|---|---|---|---|
| Nullness | < 5% | 5-10% | > 10% |
| Uniqueness (ID) | > 99% | 95-99% | < 95% |
| Freshness | < 1 day | 1-7 days | > 7 days |
| Consistency | > 90% | 70-90% | < 70% |
Alerting Rules
SELECT
CASE
WHEN anofox_metric_nullness(email) > 10 THEN 'Alert: Email nullness > 10%'
ELSE 'OK'
END as email_alert,
CASE
WHEN anofox_metric_freshness(updated_at) > 30 THEN 'Alert: Data is > 30 days old'
ELSE 'OK'
END as freshness_alert
FROM customers;