Skip to main content

Data Operations

Dataset comparison and diffing functions.

Quick Reference

FunctionDescriptionSQL Signature
anofox_table_diff_hashHash-based diff(table1, table2) -> TABLE
anofox_table_diff_joinJoin-based diff(table1, table2, key_columns[]) -> TABLE

Diffing Functions (2)

anofox_table_diff_hash

Hash-based table diff for change detection.

Parameters

ParameterTypeRequiredDefaultDescription
table1VARCHARYes-Source table name
table2VARCHARYes-Comparison table name

Speed: O(n) per table, hash-based comparison

Output

ColumnTypeDescription
table_nameVARCHARWhich table
row_numberINTEGERRow identifier
change_typeVARCHARADDED, MODIFIED, DELETED
reasonVARCHARExplanation

Example

SELECT * FROM anofox_table_diff_hash('customers', 'customers_backup');

Pros:

  • Fast (hash-based)
  • Works with any column types
  • Good for "what changed?" queries

Cons:

  • Doesn't show which columns changed
  • Can't see detailed differences

anofox_table_diff_join

Join-based table diff with detailed column-level differences.

Parameters

ParameterTypeRequiredDefaultDescription
table1VARCHARYes-Source table
table2VARCHARYes-Comparison table
key_columnsVARCHAR[]Yes-Columns to join on

Speed: O(n log n) join-based

Output

ColumnTypeDescription
key_columnANYKey value
*_matchBOOLEANTRUE if column matches
statusVARCHARSAME, DIFFERENT, ONLY_IN_TABLE1, ONLY_IN_TABLE2

Example

SELECT * FROM anofox_table_diff_join(
'customers',
'customers_backup',
ARRAY['customer_id']
);

Pros:

  • Column-level detail
  • See exactly which columns differ
  • Better for data validation

Practical Patterns

Validate Before ETL

SELECT COUNT(*) as mismatch_count
FROM anofox_table_diff_join(
'raw_import',
'customers',
ARRAY['customer_id']
)
WHERE status != 'SAME';

Change Tracking

SELECT
customer_id,
'EMAIL_CHANGED' as change_type
FROM anofox_table_diff_join(
'customers_today',
'customers_yesterday',
ARRAY['customer_id']
)
WHERE email_match = FALSE;

Data Sync Verification

SELECT
COUNT(*) as total_rows,
COUNT(*) FILTER (WHERE status = 'SAME') as matching_rows,
COUNT(*) FILTER (WHERE status = 'DIFFERENT') as different_rows,
ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'SAME') / COUNT(*), 2) as sync_percentage
FROM anofox_table_diff_join(
'customers_primary',
'customers_replica',
ARRAY['customer_id']
);

Migration Validation

SELECT
COUNT(*) as total_differences,
COUNT(*) FILTER (WHERE status = 'ONLY_IN_TABLE1') as lost_in_migration,
COUNT(*) FILTER (WHERE status = 'ONLY_IN_TABLE2') as added_in_migration,
COUNT(*) FILTER (WHERE status = 'DIFFERENT') as data_corruption
FROM anofox_table_diff_join(
'customers_legacy_system',
'customers_new_system',
ARRAY['customer_id']
)
WHERE status != 'SAME';

Combining with Validation

SELECT
diff.*,
anofox_vat_is_valid(new.vat_id) as vat_valid
FROM anofox_table_diff_join(
'customers_old',
'customers_new',
ARRAY['customer_id']
) diff
LEFT JOIN customers_new new USING (customer_id)
WHERE diff.status = 'DIFFERENT'
AND NOT anofox_vat_is_valid(new.vat_id);

Performance Considerations

Hash-Based (Faster)

-- Use when you just need "are they different?"
SELECT * FROM anofox_table_diff_hash('t1', 't2');

Best for: Large tables (10M+ rows), any column types

Join-Based (More Detail)

-- Use when you need to know what changed
SELECT * FROM anofox_table_diff_join('t1', 't2', ARRAY['id']);

Best for: Medium tables (< 10M rows), detailed reports


🍪 Cookie Settings