Skip to main content

Exploratory Data Analysis

Profile your time series data before forecasting with comprehensive statistical metrics.

FunctionDescription
ts_statsCompute per-series statistical metrics
ts_quality_reportGenerate quality assessment from stats
ts_stats_summaryAggregate statistics across all series
Showing 3 of 3

anofox_fcst_ts_stats

Computes per-series statistical metrics including length, date ranges, central tendencies, dispersion, value distributions, and quality indicators for exploratory analysis and data profiling.

Parameters

ParameterTypeRequiredDescription
table_nameVARCHARYesSource table
group_colVARCHARYesGrouping column name
date_colVARCHARYesDate/timestamp column
value_colVARCHARYesValue column
frequencyVARCHAR/INTEGERYesTime frequency ('1d', '1w', '1mo') or integer step

Output

ColumnTypeDescription
series_idBIGINTSeries identifier (matches group_col)
lengthBIGINTNumber of observations
start_dateDATE/TIMESTAMPFirst date in series
end_dateDATE/TIMESTAMPLast date in series
expected_lengthBIGINTExpected length based on frequency
meanDOUBLEAverage value
stdDOUBLEStandard deviation
minDOUBLEMinimum value
maxDOUBLEMaximum value
medianDOUBLEMedian value
n_nullBIGINTCount of NULL values
n_zerosBIGINTCount of zero values
n_unique_valuesBIGINTCount of distinct values
is_constantBOOLEANTrue if series has no variation
plateau_sizeBIGINTLongest consecutive constant segment
plateau_size_non_zeroBIGINTLongest non-zero constant segment
n_zeros_startBIGINTLeading zeros count
n_zeros_endBIGINTTrailing zeros count
n_duplicate_timestampsHUGEINTDuplicate date count

Example

-- Compute statistics for each product
CREATE TABLE sales_stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales_raw',
'product_id',
'date',
'amount',
'1d'
);

-- View series with potential issues
SELECT series_id, length, n_null, is_constant
FROM sales_stats
WHERE n_null > 0 OR is_constant = true;

Use when:

  • Profiling a new dataset before forecasting
  • Identifying series with data quality issues
  • Understanding the characteristics of your time series

anofox_fcst_ts_quality_report

Generates quality assessment reports by evaluating series against configurable thresholds for gaps, missing values, constant series, short series, and temporal alignment.

Parameters

ParameterTypeRequiredDescription
stats_tableVARCHARYesTable produced by ts_stats
min_lengthINTEGERYesMinimum acceptable series length

Quality Checks Performed

  • Gap analysis - Missing timestamps within series
  • Missing values - NULL counts
  • Constant series - No variation in values
  • Short series - Below minimum length threshold
  • End date alignment - Series ending on different dates

Example

-- First generate statistics
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales',
'product_id',
'date',
'amount',
'1d'
);

-- Then generate quality report with 30-day minimum
CREATE TABLE quality AS
SELECT * FROM anofox_fcst_ts_quality_report('stats', 30);

Use when:

  • Identifying series that need data preparation
  • Creating data quality dashboards
  • Filtering out problematic series before forecasting

anofox_fcst_ts_stats_summary

Aggregates statistical metrics across all series from ts_stats output, computing dataset-level summaries for high-level dataset characterization.

Parameters

ParameterTypeRequiredDescription
stats_tableVARCHARYesTable produced by ts_stats

Output

ColumnTypeDescription
total_seriesINTEGERCount of distinct series
total_observationsBIGINTSum of all observations
avg_series_lengthDOUBLEMean series length
date_spanINTEGERNumber of distinct date intervals

Example

-- Generate statistics first
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales',
'product_id',
'date',
'amount',
'1d'
);

-- Get dataset-level summary
SELECT * FROM anofox_fcst_ts_stats_summary('stats');

-- Example output:
-- total_series | total_observations | avg_series_length | date_span
-- 150 | 45000 | 300.0 | 365

Use when:

  • Getting a quick overview of your dataset
  • Reporting on data characteristics
  • Comparing datasets

Complete EDA Workflow

-- 1. Compute per-series statistics
CREATE TABLE stats AS
SELECT * FROM anofox_fcst_ts_stats(
'sales_raw',
'product_id',
'date',
'amount',
'1d'
);

-- 2. Get dataset overview
SELECT * FROM anofox_fcst_ts_stats_summary('stats');

-- 3. Generate quality report (minimum 60 days)
CREATE TABLE quality AS
SELECT * FROM anofox_fcst_ts_quality_report('stats', 60);

-- 4. Identify problematic series
SELECT series_id
FROM stats
WHERE is_constant = true
OR length < 60
OR n_null > length * 0.1;

🍪 Cookie Settings