Exploratory Data Analysis
Profile your time series data before forecasting with comprehensive statistical metrics.
| Function | Description |
|---|---|
ts_stats | Compute per-series statistical metrics |
ts_quality_report | Generate quality assessment from stats |
ts_stats_summary | Aggregate statistics across all series |
Showing 3 of 3
anofox_fcst_ts_stats
Computes per-series statistical metrics including length, date ranges, central tendencies, dispersion, value distributions, and quality indicators for exploratory analysis and data profiling.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
table_name | VARCHAR | Yes | Source table |
group_col | VARCHAR | Yes | Grouping column name |
date_col | VARCHAR | Yes | Date/timestamp column |
value_col | VARCHAR | Yes | Value column |
frequency | VARCHAR/INTEGER | Yes | Time frequency ('1d', '1w', '1mo') or integer step |
Output
| Column | Type | Description |
|---|---|---|
series_id | BIGINT | Series identifier (matches group_col) |
length | BIGINT | Number of observations |
start_date | DATE/TIMESTAMP | First date in series |
end_date | DATE/TIMESTAMP | Last date in series |
expected_length | BIGINT | Expected length based on frequency |
mean | DOUBLE | Average value |
std | DOUBLE | Standard deviation |
min | DOUBLE | Minimum value |
max | DOUBLE | Maximum value |
median | DOUBLE | Median value |
n_null | BIGINT | Count of NULL values |
n_zeros | BIGINT | Count of zero values |
n_unique_values | BIGINT | Count of distinct values |
is_constant | BOOLEAN | True if series has no variation |
plateau_size | BIGINT | Longest consecutive constant segment |
plateau_size_non_zero | BIGINT | Longest non-zero constant segment |
n_zeros_start | BIGINT | Leading zeros count |
n_zeros_end | BIGINT | Trailing zeros count |
n_duplicate_timestamps | HUGEINT | Duplicate date count |
Example
-- Compute statistics for each product
CREATE TABLE sales_stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales_raw',
'product_id',
'date',
'amount',
'1d'
);
-- View series with potential issues
SELECT series_id, length, n_null, is_constant
FROM sales_stats
WHERE n_null > 0 OR is_constant = true;
Use when:
- Profiling a new dataset before forecasting
- Identifying series with data quality issues
- Understanding the characteristics of your time series
anofox_fcst_ts_quality_report
Generates quality assessment reports by evaluating series against configurable thresholds for gaps, missing values, constant series, short series, and temporal alignment.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
stats_table | VARCHAR | Yes | Table produced by ts_stats |
min_length | INTEGER | Yes | Minimum acceptable series length |
Quality Checks Performed
- Gap analysis - Missing timestamps within series
- Missing values - NULL counts
- Constant series - No variation in values
- Short series - Below minimum length threshold
- End date alignment - Series ending on different dates
Example
-- First generate statistics
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales',
'product_id',
'date',
'amount',
'1d'
);
-- Then generate quality report with 30-day minimum
CREATE TABLE quality AS
SELECT * FROM anofox_fcst_ts_quality_report('stats', 30);
Use when:
- Identifying series that need data preparation
- Creating data quality dashboards
- Filtering out problematic series before forecasting
anofox_fcst_ts_stats_summary
Aggregates statistical metrics across all series from ts_stats output, computing dataset-level summaries for high-level dataset characterization.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
stats_table | VARCHAR | Yes | Table produced by ts_stats |
Output
| Column | Type | Description |
|---|---|---|
total_series | INTEGER | Count of distinct series |
total_observations | BIGINT | Sum of all observations |
avg_series_length | DOUBLE | Mean series length |
date_span | INTEGER | Number of distinct date intervals |
Example
-- Generate statistics first
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales',
'product_id',
'date',
'amount',
'1d'
);
-- Get dataset-level summary
SELECT * FROM anofox_fcst_ts_stats_summary('stats');
-- Example output:
-- total_series | total_observations | avg_series_length | date_span
-- 150 | 45000 | 300.0 | 365
Use when:
- Getting a quick overview of your dataset
- Reporting on data characteristics
- Comparing datasets
Complete EDA Workflow
-- 1. Compute per-series statistics
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales_raw',
'product_id',
'date',
'amount',
'1d'
);
-- 2. Get dataset overview
SELECT * FROM anofox_fcst_ts_stats_summary('stats');
-- 3. Generate quality report (minimum 60 days)
CREATE TABLE quality AS
SELECT * FROM anofox_fcst_ts_quality_report('stats', 60);
-- 4. Identify problematic series
SELECT series_id
FROM stats
WHERE is_constant = true
OR length < 60
OR n_null > length * 0.1;