DQ Job Link IDs
Link ID is an out-of-the-box feature that lets you link the findings of a DQ Job back to the source record, or key, for remediation outside the application. The Link ID should be unique and is most commonly the primary key. Composite primary keys are also supported.
For more information on Link IDs, go to Working with Link IDs.
Notebook
To define the Link ID in a notebook, use the following command:
val opt = new OwlOptions()
opt.runId = "2018-02-24"
opt.dataset = "orders"
opt.linkId = Array("transaction_id", "trans_time")
Command Line
To define the Link ID using the command line, use the following command:
./owlcheck -ds orders \
-rd "2018-02-24" \
-linkid transaction_id,trans_time
Note: For rules to use a Link ID, the Link ID columns must be present in the select statement (either select * or a select that names the specific columns). All Simple rules are eligible for Link IDs. Freeform rules must contain the Link ID columns in the projection part of the SQL statement.
Notebook API Example
+------------+----------+-------+-------+-----+-----------------+---------------+
| dataset| runId|fieldNm| format|count| percent| transaction_id|
+------------+----------+-------+-------+-----+-----------------+---------------+
| order |2018-02-24| fname|xxxx'x.| 1|7.142857142857142|t-1232 |
+------------+----------+-------+-------+-----+-----------------+---------------+
owl.getShapesDF
REST API Example
When you supply a Link ID, Collibra DQ excludes that field from most activities. A unique ID or primary key column cannot contain duplicates by definition, so it is not evaluated for duplicates. The same is true for Outliers and Shapes, where a large sequence number or other variations might trigger a false positive on a column whose only purpose is to link uniquely back to the source. If you want to evaluate this column and also use it as a Link ID, create a derived column with a different name, and Collibra DQ handles both cases.
owl.getShapes
owl.getDupes
owl.getOutliers
owl.getRuleBreaks
owl.getSourceBreaks
getRules()
----Rules----
+-----------------+----------+--------------------+------------------+------+
| dataset| runId| ruleNm| ruleValue|linkId|
+-----------------+----------+--------------------+------------------+------+
|dataset_outlier_3|2018-02-24| fname_like_Kirk|fname like 'Kirk' | c-41|
|dataset_outlier_3|2018-02-24| fname_like_Kirk|fname like 'Kirk' | c-42|
|dataset_outlier_3|2018-02-24| fname_like_Kirk|fname like 'Kirk' | c-43|
|dataset_outlier_3|2018-02-24| fname_like_Kirk|fname like 'Kirk' | c-44|
|dataset_outlier_3|2018-02-24| fname_like_Kirk|fname like 'Kirk' | c-45|
|dataset_outlier_3|2018-02-24|if_email_is_valid...| email| c-31|
|dataset_outlier_3|2018-02-24|if_email_is_valid...| email| c-33|
|dataset_outlier_3|2018-02-24|if_zip_is_valid_Z...| zip| c-40|
+-----------------+----------+--------------------+------------------+------+
getDupes()
To parse the linkId column, first split the value on ~~. Then, if you have a multi-part (composite) key, split each resulting value on ~|.
----Dupes----
+-----------------+----------+-----+--------------------+----------+
| dataset| runId|score| key| linkId|
+-----------------+----------+-----+--------------------+----------+
|dataset_outlier_3|2018-02-24| 100|9ec828d5194fa397b...|c-45~~c-36|
|dataset_outlier_3|2018-02-24| 100|1f96274d1d10c9f77...|c-45~~c-35|
|dataset_outlier_3|2018-02-24| 100|051532044be286f99...|c-45~~c-44|
|dataset_outlier_3|2018-02-24| 100|af2e96921ae53674a...|c-45~~c-43|
|dataset_outlier_3|2018-02-24| 100|ad6f04bf98b38117a...|c-45~~c-42|
|dataset_outlier_3|2018-02-24| 100|1ff7d50a7a9d07d02...|c-45~~c-41|
|dataset_outlier_3|2018-02-24| 100|6ed858ed1f4178bb0...|c-45~~c-40|
|dataset_outlier_3|2018-02-24| 100|d2903703b348fb4cb...|c-45~~c-39|
|dataset_outlier_3|2018-02-24| 100|24bf54412de1e720d...|c-45~~c-38|
|dataset_outlier_3|2018-02-24| 100|7a7ce0beb41b39564...|c-45~~c-37|
+-----------------+----------+-----+--------------------+----------+
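The two-step split described for the linkId column can be sketched in plain Scala. This is a minimal illustration, not part of the Collibra DQ API: the LinkIdParser object and its parse method are hypothetical names, and the composite-key sample value in the usage note is an assumed format based only on the ~~ and ~| separators named above.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of the Collibra DQ API.
object LinkIdParser {
  // Separators as described for the getDupes linkId column.
  private val RecordSeparator = Pattern.quote("~~")
  private val KeyPartSeparator = Pattern.quote("~|")

  // Splits a dupes linkId into one entry per record,
  // where each entry is the list of key parts for that record.
  // Pattern.quote is used because String.split expects a regex
  // and "|" is a regex metacharacter.
  def parse(linkId: String): List[List[String]] =
    linkId
      .split(RecordSeparator)
      .toList
      .map(_.split(KeyPartSeparator).toList)
}
```

For a single-column key such as c-45~~c-36, parse returns one single-element list per record. For an assumed composite value such as t-1~|10:00~~t-2~|10:05, it returns two records with two key parts each.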
getRuleBreaks()
The getRuleBreaks endpoint retrieves all break records within your dataset. This API has no size limit.
----Rule-Breaks----
+-----------------+----------+--------------------+------+
| dataset| runId| ruleNm|linkId|
+-----------------+----------+--------------------+------+
|dataset_outlier_3|2018-02-24| fname_like_Kirk| c-41|
|dataset_outlier_3|2018-02-24| fname_like_Kirk| c-42|
|dataset_outlier_3|2018-02-24| fname_like_Kirk| c-43|
|dataset_outlier_3|2018-02-24| fname_like_Kirk| c-44|
|dataset_outlier_3|2018-02-24| fname_like_Kirk| c-45|
|dataset_outlier_3|2018-02-24|if_email_is_valid...| c-31|
|dataset_outlier_3|2018-02-24|if_email_is_valid...| c-33|
|dataset_outlier_3|2018-02-24|if_zip_is_valid_Z...| c-40|
+-----------------+----------+--------------------+------+