Add or edit a custom SQL rule

This section shows you how to create a custom SQL rule from the rule workbench to monitor data quality according to your specific needs of your organization.

Prerequisites

  • You have an existing Data Quality Job with at least one run.
  • The Data Quality Pushdown Processing Edge capability is enabled for your data source.
  • You have the Data Quality Editor or Data Quality Manager resource role with the following resource permissions, depending on the actions you intend to perform:
  • Action Role and permission requirement
    Create a custom rule monitor You have the Data Quality Editor or Data Quality Manager resource role with the Data Quality Job > Create Monitor resource permission.
    Edit, suppress, or activate a monitor You have the Data Quality Editor or Data Quality Manager resource role with the Data Quality Job > Edit Monitor resource permission.
    Delete a rule monitor You have the Data Quality Editor or Data Quality Manager resource role with the Data Quality Job > Delete Monitor resource permission.
    To preview rule break records You have the Data Quality Editor or Data Quality Manager resource role with the Data Quality Job > Preview Rule Break Records resource permission.

Adding a SQL rule

  1. Select the job to which you want to add a rule from the Job drop-down list, or enter its name in the input field.
  2. Enter a unique name in the Rule name input field.
  3. Note You cannot use a name that is already in use by another rule in the job. Rule names can contain periods (.), underscores (_), and hyphens (-). All other special characters are not allowed.

  4. Optionally, select a primary column from the Primary column drop-down list. This allows Collibra to associate your rule with a column asset based on the selected primary column.
  5. Important When the Require primary column setting is enabled, this field is required.
  6. From the SQL editor in the SQL tab, enter your SQL query. Alternatively, click AI icon SQL AI to allow Collibra to suggest a SQL query based on your plain-text input.
  7. Note 
    • The SQL editor automatically provides typeahead suggestions when you type characters. When a typeahead suggestion appears, click it or press Enter on your keyboard to add it to the SQL editor, or continue typing to ignore it.
    • Join statements are supported when the tables are from the same connection.
  8. Optionally, click Align left icon Format to automatically format the SQL for improved readability.
  9. Optionally, click Validate SQL to validate the syntax of your query.
  10. Optionally, click the Settings tab.
    The settings tab opens, where you can associate data quality dimensions with your rule, set a rule tolerance percentage, and add notifications and a description.
  11. Click Save.
    The rule is added to your job.

A filter query allows you to narrow the scope of your rule by adding a second query definition to it, limiting the total number of rows included in the rule score calculation.

  1. Select the job to which you want to add a rule from the Job drop-down list, or enter its name in the input field.
  2. Enter a unique name in the Rule name input field.
  3. Note You cannot use a name that is already in use by another rule in the job. Rule names can contain periods (.), underscores (_), and hyphens (-). All other special characters are not allowed.

  4. Optionally, select a primary column from the Primary column drop-down list. This allows Collibra to associate your rule with a column asset based on the selected primary column.
  5. Important When the Require primary column setting is enabled, this field is required.
  6. From the SQL editor in the SQL tab, enter your SQL query. Alternatively, click AI icon SQL AI to allow Collibra to suggest a SQL query based on your plain-text input.
  7. Note 
    • The SQL editor automatically provides typeahead suggestions when you type characters. When a typeahead suggestion appears, click it or press Enter on your keyboard to add it to the SQL editor, or continue typing to ignore it.
    • Join statements are supported when the tables are from the same connection.
  8. Click Filter icon to add a filter query.
    The filter query input field expands.
  9. Enter a filter query using a simple query format, such as SYMBOL IS NOT NULL or LOW > 25.
  10. Tip 

    The filter query must include only conditions and predicates that appear after the WHERE clause in a SQL statement. For example, a correct filter query for results in the "low" column of a NYSE table would be LOW > 25. Full SQL statements, such as SELECT * FROM PUBLIC.NYSE WHERE LOW > 25, are not allowed.

  11. Optionally, click Align left icon Format to automatically format the SQL for improved readability.
  12. Optionally, click Validate SQL to validate the syntax of your query.
  13. Optionally, click the Settings tab.
    The settings tab opens, where you can associate data quality dimensions with your rule, set a rule tolerance percentage, and add notifications and a description.
  14. Click Save.
    The rule is added to your job.

Schema details

The schema details tab provides an overview of the columns of your table. It serves as a helpful reference point when adding your SQL rule and helps you determine the type of rules to create. It shows the:

  • Row number.
  • Name of the column.
  • Related Collibra Platform assets, when applicable.
  • Defined data type of a column.
  • Number of unique values in a column
  • Percentage of null cells in a column.
  • Percentage of empty values in a column.

Results preview

After you enter a valid SQL statement in the SQL editor and specify the number of sample rows in the run preview button, you can preview the results of your rule on the rule preview tab. This helps ensure the rule will return the expected output before your job runs.

The run preview button allows you to preview the results of your rule, based on the number sample rows you select from the row sample drop-down list. The available sample row options are:

  • 25
  • 50
  • 100 (default)
  • 250
  • 500

What's next

  • About rule-level settings
  • Go to the Job Details page to:
    • Manually re-run your job and view the output of the rule monitor in the Monitors tab.
    • View the output of the rule monitor in the Monitors tab after your scheduled job runs automatically.