Lookback

Lookback lets you specify a historical period of past DQ scans to include in a DQ Job. For example, a lookback period of 5 includes DQ checks over the last 5 days of available data. By using the date column of the table, file, or view to scan the history of your dataset, lookback can help you identify potential outliers, patterns, and behavioral anomalies.

Important: When a job that includes a lookback period runs, a temporary job appears in the Jobs queue. The temporary job does not resolve until all other lookback jobs are complete.

Lookback options

There are two lookback options to consider.

| Option | Description |
| --- | --- |
| Union Lookback | Loads a specified number of historical runs of files on certain dates under the same dataset name and includes them in the DQ scans. |
| Full File Lookback | Includes the historical context of a single file in the outlier and/or patterns scans. To string together historical files that contain a single timeslice and include a larger historical window, use Union Lookback instead. |

Union Lookback (-fllb)

Union Lookback, also known as File Lookback (-fllb), is used with deep learning and pattern matching. The example below uses it with deep learning.

File Lookback checks the DQ check history of previous files.

-fllb

The -fllb flag is often used with files, together with -adddc, when the date column is not in an ideal format or the dataset has no date column.

Despite the name, this can be used with file or database storage formats.

Note: File Lookback (-fllb) should only be used when a SQL layer is not available. It is intended for advanced use cases and may not be suitable for all file types and folder structures. Best practice is to expose a date signature somewhere in the file or directory naming convention.
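As an illustration of the date-signature practice above, the following is a minimal sketch, not part of the product. It assumes a hypothetical naming convention in which every file or directory path embeds a YYYY-MM-DD date; the helper name extract_run_date and the example path are made up for this sketch.

```shell
# Hypothetical helper: print the first YYYY-MM-DD date signature found in a path.
# Returns a non-zero status when the path carries no date signature.
extract_run_date() {
  # $1: file or directory path, e.g. /data/trades/2017-07-29/trades.csv
  sig=$(printf '%s\n' "$1" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1)
  [ -n "$sig" ] || return 1
  printf '%s\n' "$sig"
}

# Example: derive the value to pass as the run date (-rd) from the path.
run_date=$(extract_run_date "/data/trades/2017-07-29/trades.csv")
echo "-rd \"$run_date\""
```

With a signature like this in place, the run date can be read from the path rather than inferred from file contents.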

Example

-ds "demo_lookback" \
-rd "2017-07-29" \
-lib "/opt/owl/drivers/mysql" \
-cxn "mysql" \
-q "select * from lake.dateseries where DATE_COL = '2017-07-29'" \
-dc DATE_COL \
-dl \
-dlkey sym \
-dllb 4 \
-fllb

Note: This lookback loads your past 4 runs as the historical training set.
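For the lookback to have history to load, earlier runs presumably need to exist under the same dataset name. The sketch below is an assumption-laden illustration, not product behavior: it assumes one run per calendar day and GNU date (the -d option), and it only prints the flags for each earlier run date because the launcher command is not shown on this page. The function name previous_run_dates is hypothetical.

```shell
# Hypothetical helper: list the run dates that precede a given run date, oldest first.
# Assumes GNU date; BSD/macOS date uses different options.
previous_run_dates() {
  prd_run_date="$1"   # e.g. 2017-07-29
  prd_i="$2"          # number of past runs to list, e.g. 4
  while [ "$prd_i" -ge 1 ]; do
    date -u -d "$prd_run_date $prd_i days ago" +%Y-%m-%d
    prd_i=$((prd_i - 1))
  done
}

# Print (not execute) the per-date flags for the 4 runs before 2017-07-29.
for d in $(previous_run_dates "2017-07-29" 4); do
  printf '%s\n' "-ds \"demo_lookback\" -rd \"$d\" -q \"select * from lake.dateseries where DATE_COL = '$d'\""
done
```

If the data does not arrive daily, the list of dates would need to come from the dates actually available rather than from calendar arithmetic.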

Full File Lookback (-fullfile)

Like Union Lookback, Full File Lookback (-fullfile) is used with deep learning and pattern matching.

Full File Lookback uses the entire file for lookbacks instead of only the file query.

Understanding lookback command line flags

When you look at the DQ Job command line with lookback enabled, you may see several different lookback flags. The following table describes what each lookback flag means and to which DQ layers they apply.

| Flag | Description | Layer |
| --- | --- | --- |
| -fllb | Union lookback loads the DQ check history of files or database tables on certain dates under the same dataset name and includes them in the DQ Job. | Outliers and Patterns |
| -fllbminrow | The minimum number of rows in a dataset before it is included in a file lookback. This is automatically applied when Union Lookback is selected. | Outliers and Patterns |
| -dllb | Deep learning lookback loads DQ checks over a specified period of days and includes them in the scan for outliers. | Outliers |
| -dlminhist | An automatically generated flag based on the outlier lookback setting -dllb that defines the minimum number of days before DQ flags data as potential outliers. -dlminhist ensures that the number of days in the algorithm is relative to the total scope of the lookback period. | Outliers |
| -fullfile | Includes the historical context of a single file in the outlier and/or patterns scans. | Outliers and Patterns |
| -bhlb | Behavior lookback loads a specified period of past DQ checks to include in a DQ Job. This controls the baseline profiling of a dataset. | Behavior |
| -fpglb | Pattern lookback loads DQ checks over a specified period of days and includes them in the scan for patterns. | Patterns |