Working with link IDs

Link ID is an out-of-the-box feature that allows you to link the findings of a DQ Job back to the source record, or key, for remediation outside the application. You can specify the link ID when selecting columns during the job creation process. Depending on whether you run jobs in Pullup or Pushdown, the link ID requirements may differ, as shown in the following table.

Processing method	Link ID required for break record archival?
Pushdown	No Important Optional for rules; required for dupes, outliers, and shapes.
Pullup	Yes

Processing method

Link ID required for break record archival?

Pushdown

Important Optional for rules; required for dupes, outliers, and shapes.

Pullup

Yes

Importance of uniqueness

The link ID should be unique and should not contain any null values. It is most commonly the primary key. Depending on the dataset, you may need to select multiple columns to create a composite primary key. Otherwise, if the link ID is not unique, the DQ job may not be able to isolate the specific break records.

Link ID example using sales data

Suppose the dataset contains the following sales data:

NAME	TRDATE	SALES
John	2022-05-01	1000
John	2022-05-02	2000
John	2022-05-03	1000
John	2022-05-04	2000
John	2022-05-05	0
Steve	2022-05-01	300
Steve	2022-05-02	400
Steve	2022-05-03	200
Steve	2022-05-04	500
Steve	2022-05-05	10000

In this example, the rule specifies that SALES = 0 is a break record. Notice that this condition exists for "John" on "2022-05-05".

Suppose the link ID is set to only one column, NAME, which is not a unique value. In this case, the rule correctly identifies a break record, however, the DQ Job cannot identify the row that constitutes the break record. It can only determine the NAME of the sales associate with the break record "John".

If the link ID is set to both NAME and TRDATE, which together constitutes a unique key, the job can correctly identify the row that constitutes the break record.

Link ID example using NYSE data

In this example, a NYSE dataset has the columns CLOSE and SYMBOL selected as link ID columns. The following rule queries allow you to view rule break records:

# both link ID columns are included in the SELECT statement

SELECT CLOSE, SYMBOL FROM ...

# SELECT * statement includes all columns, including link IDs

SELECT * FROM ...

To view the rule break records, you can specify the link ID columns in your SELECT query. Alternatively, SELECT * FROM also allows you to view rule break records because it includes all dataset columns in your query.

Conversely, when CLOSE and SYMBOL are the only specified link ID columns in the same NYSE dataset, the following query will not display rule break records in the Rule Breaks dialog box on the Findings page:

# only one of the link ID columns included in the SELECT statement

SELECT close FROM ...

Data Quality & Observability Classic supports one or many primary key columns in your dataset for record linkage to your original table, file, or data frame. If your primary key column contains many columns, use a comma to separate them.

Link ID and the Metastore

No personal data is stored in the Metastore when using link ID (unless, of course, the column used as the link ID contains PII). The Metastore only stores:

The rule applied to your DQ Job.
The dataset used for your DQ Job.
The column of reference.
The link ID.

For more on sensitive information, go to the Data Class and Sensitivity Labels documentation.

Set up link IDs

Create a DQ job as described in Create a DQ Job. Be sure to assign link IDs when selecting your columns.
Before running the job, open the settings to configure the desired break record archive behavior. Depending on the type of job, follow the steps provided based on the type of job:
- Archiving break records from Pushdown jobs
- Archiving break records from Pullup jobs

View break records

You can view rule break records using one of two methods:

View break records from the Rules tab of the Findings page.
Export a CSV using the Actions button on the Rules tab.

Rules with break records have associated link IDs that link back to the original dataset.

View break records from the Findings page

On the Findings page, open the Rules tab.
Click the Actions button and select Rule Breaks.

The Rule Breaks dialog box opens.

Note You can also use the Rule Discovery feature to apply sensitivity labels to data classes and trigger breaks for all the records that do not match your link ID. The Rule Discovery feature is described in more detail on the Rules tab of the Rule Exploring data quality findings page.

Export a CSV with link IDs

On the Findings page, open the Rules tab.
Click the Actions button and select Rule Breaks.