Skip to main content

Anomaly Detection

Statistical and machine learning methods for outlier detection.

Quick Reference

FunctionDescriptionSQL Signature
anofox_zscore_anomalyZ-score outlier(value, threshold) OVER () -> BOOLEAN
anofox_iqr_anomalyIQR outlier(value, multiplier) OVER () -> BOOLEAN
anofox_isolation_forest_anomalyMultivariate ML(features[], contamination) OVER () -> BOOLEAN
anofox_dbscan_anomalyDensity-based(features[], eps, min_pts) OVER () -> BOOLEAN

Statistical Methods (2)

anofox_zscore_anomaly

Detect outliers using Z-score method.

Parameters

ParameterTypeRequiredDefaultDescription
valueDOUBLEYes-Column to analyze
thresholdDOUBLEYes-Z-score threshold (2.0, 2.5, or 3.0)

Method: z = (value - mean) / std_dev, anomaly = |z| > threshold

Thresholds:

Threshold% in DistributionOutliers
2.095%5% of data
2.598.8%1.2% of data
3.099.7%0.3% of data

Example

SELECT
transaction_id,
amount,
anofox_zscore_anomaly(amount, 3.0) OVER () as is_outlier
FROM transactions
WHERE anofox_zscore_anomaly(amount, 3.0) OVER () = TRUE;

anofox_iqr_anomaly

Detect outliers using Interquartile Range.

Parameters

ParameterTypeRequiredDefaultDescription
valueDOUBLEYes-Column to analyze
multiplierDOUBLEYes-IQR multiplier (1.5, 2.0, or 3.0)

Method:

  • IQR = Q3 - Q1
  • lower = Q1 - multiplier * IQR
  • upper = Q3 + multiplier * IQR
  • anomaly = value < lower OR value > upper

Example

SELECT
transaction_id,
amount,
anofox_iqr_anomaly(amount, 1.5) OVER () as is_outlier
FROM transactions;

Pros over Z-score:

  • Robust to extreme outliers
  • Works with any distribution
  • Doesn't require normality assumption

Machine Learning Methods (2)

anofox_isolation_forest_anomaly

Detect outliers using Isolation Forest algorithm.

Parameters

ParameterTypeRequiredDefaultDescription
featuresDOUBLE[]Yes-Array of numeric columns
contaminationDOUBLEYes-Expected proportion (0.0-1.0)

Contamination guidance:

ValueExpectationUse Case
0.011% anomaliesHigh quality data
0.055% anomaliesNormal data
0.1010% anomaliesMessy data

Example

SELECT
customer_id,
amount,
frequency,
anofox_isolation_forest_anomaly(
[amount, frequency, days_since_purchase],
0.05
) OVER () as is_anomaly
FROM customers
WHERE anofox_isolation_forest_anomaly(
[amount, frequency, days_since_purchase],
0.05
) OVER () = TRUE;

anofox_dbscan_anomaly

Detect outliers using Density-Based Clustering.

Parameters

ParameterTypeRequiredDefaultDescription
featuresDOUBLE[]Yes-Array of numeric columns
epsDOUBLEYes-Distance threshold (0.1-2.0)
min_ptsINTEGERYes-Min points for cluster (5-20)

Example

SELECT
customer_id,
amount,
frequency,
anofox_dbscan_anomaly(
[amount, frequency],
0.5, -- eps
10 -- min_pts
) OVER () as is_anomaly
FROM customers;

Method Comparison

MethodTypeSpeedInterpretationUse Case
Z-ScoreStatisticalFastEasySingle column, normal dist
IQRStatisticalFastEasySingle column, robust
Isolation ForestMLMediumHardMultiple columns
DBSCANMLSlowMediumDensity-based clusters

Decision Tree

How many features?
├─ ONE column
│ ├─ Normal distribution? → Z-Score
│ └─ Any distribution? → IQR

└─ MULTIPLE columns
├─ Need interpretability? → DBSCAN
└─ Maximize detection? → Isolation Forest

Window Function Usage

-- Global detection
anofox_zscore_anomaly(amount, 3.0) OVER ()

-- Partition by group
anofox_zscore_anomaly(amount, 3.0) OVER (PARTITION BY region)

-- Rolling window
anofox_zscore_anomaly(amount, 3.0) OVER (
PARTITION BY customer_id
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
)

Multi-Stage Detection

SELECT
transaction_id,
amount,
CASE
WHEN anofox_zscore_anomaly(amount, 3.0) OVER () THEN 'z_score'
WHEN anofox_iqr_anomaly(amount, 1.5) OVER () THEN 'iqr'
ELSE NULL
END as anomaly_method
FROM transactions;

🍪 Cookie Settings