DQ Connector

Important 
This capability has been archived. For the latest option to integrate your Collibra DQ metadata with Collibra Platform, check out our enhanced integration documentation.

The Native DQ Connector brings intelligence from Collibra Data Quality & Observability into Collibra Platform. After you set up this integration, you can bring your Data Quality user-defined rules, metrics, and dimensions into Collibra Data Catalog.

Note Only data sources ingested by both Collibra Data Catalog and Collibra Data Quality & Observability can synchronize data quality assets.

Prerequisites

  • Collibra Edge site: DQ Connector is a capability of Edge.
  • Collibra Data Intelligence Cloud: 2021.07 release (or newer).
  • Collibra Data Quality: 2.15 (or newer).
  • Databases and drivers: Proper access and credentials (username and password).

Because the DQ Connector is an Edge capability, you must be able to ingest data via Edge. For information about enabling and configuring Edge, see the Edge Configuration guide.

Create a Collibra Data Quality & Observability Edge site

Create an Edge site with the following properties:

  • Name (mandatory): The name of the Edge site, for example Collibra-DQ-Edge. Do not use spaces or special characters. The name must be globally unique.
  • Description (mandatory): The description of the Edge site. We recommend including at least basic information about the location of the Edge site.

Install the Collibra Data Quality & Observability Edge site

Follow the instructions for your environment to Install an Edge site.

Note This process automatically creates an Edge user, which you use later in the setup process.

Connect to your Collibra Data Quality & Observability source

Create a connection for each Collibra Data Quality & Observability data source you want to synchronize. The following table shows the available properties and their descriptions:

Connection settings
  • Name: The same name as the Collibra Data Quality & Observability connection name. Ensure that your connection name does not contain any whitespace, because whitespace is not supported in Collibra DQ.
    Warning The connection name in Collibra Platform must exactly match the connection name used in Collibra DQ.
  • Description: The description of the JDBC connection. This field is also visible when you register content.
  • Connection provider: The connection provider, which determines the available connection parameters. Use the same provider as in Collibra Data Quality & Observability.

Connection parameters (example for a username/password JDBC driver)
  • Username: The same username as the Collibra DQ connection username.
  • Password: The same password as the Collibra DQ connection password.
  • Driver class name: In most cases, this is the same driver class name as the Collibra DQ connection driver class name. If you select a different driver in Collibra Platform, the driver class name can differ from the Collibra DQ driver class name.
  • Driver JAR: In most cases, this is the same driver JAR file as in Collibra DQ. If you select a different driver in Collibra Platform, the driver JAR can differ from the Collibra DQ driver JAR. Ensure that the driver is supported in both Collibra Platform and Collibra DQ.
    Note Some CData drivers that are supported in Collibra Platform are not supported in Collibra DQ. It is best practice to use a CData driver in Collibra Platform, but you can use a different driver in Collibra DQ.
  • Connection string: In most cases, this is the same URL as the Collibra DQ connection URL. If you select a different driver in Collibra Platform, the connection URL can differ from the Collibra DQ connection URL.
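Because a mismatched or whitespace-containing connection name is an easy way to break synchronization, you may want a quick pre-flight check before you save the connection. The following Python sketch encodes only the two naming rules stated above (no whitespace, exact match). `check_connection_name` is a hypothetical helper, not part of any Collibra API, and it assumes you already have both names as plain strings.

```python
def check_connection_name(dq_name: str, platform_name: str) -> list[str]:
    """Return a list of problems; an empty list means the names look usable.

    Hypothetical helper that only encodes the naming rules in this guide:
    (1) no whitespace in the Collibra DQ connection name, and
    (2) an exact, character-for-character match with the Collibra Platform name.
    """
    problems = []
    # Collibra DQ does not support whitespace in connection names.
    if any(ch.isspace() for ch in dq_name):
        problems.append(f"DQ connection name {dq_name!r} contains whitespace")
    # Exact match means plain string equality, so case differences count as a mismatch.
    if dq_name != platform_name:
        problems.append(
            f"Platform name {platform_name!r} does not exactly match DQ name {dq_name!r}"
        )
    return problems


if __name__ == "__main__":
    print(check_connection_name("postgres-gcp", "postgres-gcp"))  # []
    print(check_connection_name("postgres-gcp", "Postgres-GCP"))
```

The sketch deliberately uses strict equality rather than a case-insensitive comparison, because the guide asks for an exact match.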

Add ingestion capabilities to your Collibra Data Quality & Observability connection

For each connection that you created, add an Edge capability based on the Catalog JDBC ingestion template so that Edge can extract and process data from your data source.

Capability
This section contains general information about the capability.
  • Name (required): The name of the Edge capability.
  • Description (optional): The description of the Edge capability.
  • Capability template (required): The capability template. The value that you select in this field determines which sections appear on the page. Select the following Edge capability: Catalog JDBC ingestion.

Connection
This section contains information to connect to the data source.
  • JDBC connection (required): The connection to the data source.
  • JDBC data source type (Deprecated) (required): Deprecated field that was used to indicate the type of the data source. You no longer need to change this field, because the required value is automatically identified.
    Note The automatically identified value is not shown on this page.
  • Supports schemas (optional): A text field where you must enter True to enable database registration of data sources that have no schema. If the data source has schemas, you can ignore this field.
    Tip If the data source does not have a schema, Data Catalog creates a Schema asset with the same name as the full name of the database.

Other Settings
  • Others (optional): This section can contain additional capability properties. Click Add property to add a property.
    Note No validation is performed on the values you add.

General
This section contains general information about logging.
  • Debug (optional): An option to automatically send Edge infrastructure log files to Collibra Platform. By default, this option is set to false.
    Note We highly recommend sending Edge infrastructure log files to Collibra Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours. For more information, see the Edge logging documentation.
  • Log level (optional): An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

Configure destinations for Collibra Data Quality & Observability assets

Collibra Data Quality & Observability rules, metrics, and dimensions require their own domains in Data Catalog. If you don't have existing data quality domains, or if you want to use new domains for quality extraction, create a domain for each type of data quality asset:

  • Rules: Rulebook Domain
  • Metrics: Rulebook Domain
  • Dimensions: Governance Asset Domain

Assign permissions for Collibra Data Quality & Observability domains

Edge must have the correct resource permissions to manage assets inside the dedicated Collibra Data Quality & Observability domains. For each dedicated domain, assign the Technical Steward role to the Edge user.

Note The Edge user is automatically created when you install the Edge site.

Add Collibra Data Quality & Observability characteristics to assets

To show Collibra Data Quality & Observability statistics for your data source, assign the following characteristic types to the Table and Column asset types:

  • Table asset type: governed by Governance Asset
  • Column asset type: is governed by Data Quality Rule

Add a DQ Connector capability

The DQ Connector facilitates the communication with Collibra Data Quality & Observability. Add a DQ Connector capability to your Collibra Data Quality & Observability Edge site:

Capability
This section contains general information about the capability.
  • Name (required): The name of the Edge capability.
  • Description (optional): The description of the Edge capability.
  • Capability template (required): The capability template. The value that you select in this field determines which sections appear on the page. To ingest Collibra Data Quality & Observability user-defined rules, metrics, and dimensions into Collibra Data Catalog, select the following capability template: DQ Connector.

DQ
This section contains information about the Collibra Data Quality & Observability connection.
  • Base URL (required): Your Collibra Data Quality & Observability URL.
  • Username (required): The Collibra Data Quality & Observability username for this connection.
  • Password (required): The Collibra Data Quality & Observability password for this connection.
  • Encryption options: The type of encryption to use. Default: To be encrypted by Edge management server.
  • Issuer of the JWT (optional): If you selected Encrypted with public key, enter your JWT issuer.

Collibra metadata model
This section contains information about where to ingest Collibra Data Quality & Observability assets.
  • DQ Rules domain id (required): The UUID of the Rulebook Domain for the ingested Collibra Data Quality & Observability rules.
  • DQ Metrics domain id (required): The UUID of the Rulebook Domain for the ingested Collibra Data Quality & Observability metrics.
  • DQ Dimensions domain id (required): The UUID of the Governance Asset Domain for the ingested Collibra Data Quality & Observability dimensions.
  • Default DQ Dimension name (required): The default Data Quality Dimension, for example Accuracy, Completeness, or Consistency. Default: Completeness.
  • DQ Metric classified by DQ Dimension relation type id (optional): The UUID of the Data Quality Metric classified by / classifies Data Quality Dimension relation type. If you leave this field empty, this relation is not added.
  • Assets are imported in batches of this size (required): The batch size of the ingestion. Default: 5000.
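If you prepare these values ahead of time, a local check can catch typos such as a malformed domain UUID. The following Python sketch mirrors the DQ Connector fields above in a dataclass and also shows what importing assets in fixed-size batches means. `DqConnectorSettings`, `validate`, and `batches` are hypothetical names; the sketch does not call Collibra and makes no claim about how the connector batches assets internally. The example URL is the one used in the FAQ on this page.

```python
import uuid
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, Optional
from urllib.parse import urlparse


@dataclass
class DqConnectorSettings:
    """Hypothetical local copy of the DQ Connector capability fields."""
    base_url: str
    rules_domain_id: str
    metrics_domain_id: str
    dimensions_domain_id: str
    default_dimension: str = "Completeness"                  # documented default
    metric_dimension_relation_type_id: Optional[str] = None  # optional field
    batch_size: int = 5000                                   # documented default

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the values look usable."""
        errors = []
        parsed = urlparse(self.base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"base_url {self.base_url!r} is not an http(s) URL")
        uuid_fields = {
            "rules_domain_id": self.rules_domain_id,
            "metrics_domain_id": self.metrics_domain_id,
            "dimensions_domain_id": self.dimensions_domain_id,
        }
        # The relation type ID is optional, so only check it when it is set.
        if self.metric_dimension_relation_type_id is not None:
            uuid_fields["metric_dimension_relation_type_id"] = (
                self.metric_dimension_relation_type_id
            )
        for name, value in uuid_fields.items():
            try:
                uuid.UUID(value)
            except ValueError:
                errors.append(f"{name} {value!r} is not a valid UUID")
        if not self.default_dimension:
            errors.append("default_dimension is required")
        if self.batch_size <= 0:
            errors.append("batch_size must be a positive integer")
        return errors


def batches(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


if __name__ == "__main__":
    settings = DqConnectorSettings(
        base_url="http://cdq.customer.com:9000/",
        rules_domain_id=str(uuid.uuid4()),  # placeholder IDs for illustration
        metrics_domain_id=str(uuid.uuid4()),
        dimensions_domain_id=str(uuid.uuid4()),
    )
    print(settings.validate())  # []
    print([len(b) for b in batches(range(12_000), settings.batch_size)])  # [5000, 5000, 2000]
```

In your environment, replace the placeholder UUIDs with the IDs of the domains you created for rules, metrics, and dimensions.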


Register the Collibra Data Quality & Observability data source in Data Catalog

To make the Collibra Data Quality & Observability metadata available in Collibra Data Catalog, you must register the data source for each Collibra Data Quality & Observability data source you want to synchronize.

Create a Data Catalog System Asset

As a prerequisite to registering a data source in Data Catalog, you must create a System asset for each connected data source with the following properties:

  • Type: System
  • Domain: The domain to which the new asset belongs. You can only create an asset of a given type in a domain whose domain type is assigned to that asset type.
  • Name: The same name as the Collibra Data Quality & Observability connection name.

Warning The connection name must be an exact match in both Collibra DQ and Collibra Platform. For example, if your connection name is postgres-gcp in Collibra DQ, it must also be postgres-gcp in Collibra Platform.

Register the Collibra Data Quality & Observability data source in Data Catalog

Register each Collibra Data Quality & Observability source in Data Catalog.

Extract Data Quality metadata

After you complete the DQ Connector configuration, you can start ingesting Collibra Data Quality & Observability metadata.


Steps

  1. Open a Database asset page.
  2. In the tab pane, click Configuration.
  3. In the Quality extraction section, do one of the following:
    • To select schemas for data quality synchronization:
      1. Click Edit.
        The Data quality column becomes editable.
      2. Select whether to synchronize the available schemas.

      3. Click Save.

    • To synchronize the selected schemas:
      1. Select the schema name to see its configuration.
      2. Click Synchronize.

        The synchronization job is started for the selected schemas.

Known Limitations

  • Only 1 source tenant from Collibra DQ can be specified.
  • Ingestion is on demand only; scheduled ingestion is not supported.
  • Can only specify 1 domain destination for each of Rules, Metrics, and Dimensions.
  • Only JDBC sources supported (no file sources).
  • When you integrate a dataset based on joins from multiple tables, the columns derived from the secondary dataset in Collibra DQ do not appear on the table ingested by Collibra Platform, so the DQ Connector fails because of a missing asset. Because table and column names must match directly between Collibra DQ and Collibra Platform, the DQ Connector does not support aliases.
  • REST API calls from Collibra DGC to Collibra DQ can fail with a 403 Forbidden error. Collibra DGC passes a cookie header, and because of recent updates to CSRF token requirements, Collibra DQ rejects the request. This limits Collibra DQ's ability to connect to Collibra Platform.
    • A possible workaround until a fix is available is to set export CSRF_TOKEN_ENABLED=false in the owl-env.sh file for Standalone deployments, or to set dq.security.csrf.token.enabled=${CSRF_TOKEN_ENABLED:false} in the Web ConfigMap for Kubernetes deployments.
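To see why joined datasets and aliases fail, consider a minimal Python sketch of exact-name matching, assuming table and column names are compared as plain strings. `unmatched_columns` is a hypothetical helper that illustrates the limitation; it is not the connector's actual implementation.

```python
from typing import Iterable

Column = tuple[str, str]  # (table name, column name)


def unmatched_columns(dq_columns: Iterable[Column],
                      catalog_columns: Iterable[Column]) -> list[Column]:
    """Return DQ-referenced columns that have no exact-name Column asset in the catalog.

    Hypothetical helper: names are compared with plain string equality, so an
    alias or a column pulled in from a secondary (joined) dataset never matches.
    """
    catalog = set(catalog_columns)
    return sorted(col for col in set(dq_columns) if col not in catalog)


if __name__ == "__main__":
    # The DQ dataset joins "orders" with "customers": "region" comes from customers,
    # and "total" is an alias for orders.amount.
    dq = [("orders", "id"), ("orders", "total"), ("orders", "region")]
    catalog = [("orders", "id"), ("orders", "amount")]
    print(unmatched_columns(dq, catalog))  # [('orders', 'region'), ('orders', 'total')]
```

Any column returned by such a check corresponds to a missing asset on the Collibra Platform side.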

FAQ

Q: DQ Dashboard In DGC: I can verify the DQ Connector is synchronizing Data Quality Rules and Data Quality Metrics, but why don't Data Quality Dashboard Charts display?

A: Ensure that the Aggregation Paths and Global Assignments for the Table and Column asset types are correct, or create them if none exist.

(Figure: Global Assignments for Data Quality Rules)

Q: DQ Dashboard In DGC: Why won't my DQ Dimension charts display in my Dashboard?

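A: Do the following: 1) add a new custom relation type, Data Quality Metric classified by Data Quality Dimension; 2) configure the Global Assignment for the Data Quality Metric asset type; 3) enter the UUID of the new relation type in the DQ Metric classified by DQ Dimension relation type id field of the DQ Connector capability.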

Q: I've connected and configured data sources correctly, why aren't DQ Rules and DQ Metrics being synchronized?

A: Please ensure Connection / System Names between Collibra DQ, Collibra, and Edge exactly match.

A: Please ensure Edge user has admin permissions to write the assets into Data Catalog.

A: Ensure that the correct Base URL is specified in the DQ Connector capability, for example http://cdq.customer.com:9000/.

Q: Is DQ Connector unidirectional?

A: Yes, from Collibra DQ to Data Catalog in Collibra Platform.

Q: How many DQ Connectors can I run simultaneously?

A: Currently, one.

Q: Does the DQ Connector work with On-Prem Collibra DGC?

A: No. Any integration with on-premises Collibra DGC requires custom API development by Collibra Professional Services or a partner SI.

Q: If I delete a rule from Collibra DQ that I have already synchronized into Data Catalog, will it be deleted from Catalog in the next synchronization?

A: No, the DQ Connector only upserts into Data Catalog. If a rule is deleted from Collibra DQ, it will not be automatically deleted in Data Catalog.
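The upsert-only behavior can be illustrated with a small Python sketch, assuming rules are keyed by name: incoming rules are inserted or updated, but nothing that is absent from the incoming set is removed. `upsert` is a hypothetical function for illustration, not the connector's code.

```python
def upsert(catalog: dict[str, dict], incoming: dict[str, dict]) -> dict[str, dict]:
    """Insert or update incoming rules; never delete rules missing from `incoming`."""
    result = dict(catalog)  # shallow copy so the caller's dict is not modified
    for name, attrs in incoming.items():
        # Merge attributes: new values overwrite old ones, other attributes are kept.
        result[name] = {**result.get(name, {}), **attrs}
    return result


if __name__ == "__main__":
    catalog = {"rule_a": {"score": 90}, "rule_b": {"score": 80}}
    # rule_b was deleted in Collibra DQ, so only rule_a arrives in the next sync.
    after_sync = upsert(catalog, {"rule_a": {"score": 95}})
    print(after_sync)  # {'rule_a': {'score': 95}, 'rule_b': {'score': 80}}
```

Because rule_b never appears in the incoming set, it stays in the catalog until someone removes it there.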

Q: Why are my scores different in Collibra DQ and Data Catalog?

A: Currently, the DQ Connector pulls in only the most recent user-defined rules from Collibra DQ. Other components that affect the score, such as Behaviors, Outliers, Patterns, Dupes, and Source, are not yet included.

Q: Why do I get errors when I try to delete both the domain that Edge created for the database and the connection?

A: Delete the domain that Edge created by using the API.

Q: I've hit the synchronize button, how can I tell if my job is complete?

A: Check the Activities icon (the button at the top right of the menu) for the status of your DQ synchronization.

Q: Why did the rule with joins between two views that I created in Collibra DQ fail to import into Collibra Platform?

A: Because the column from the secondary view is flagged as primary, the rule maps the secondary column to the primary view. The rule then imports incorrectly, because that column does not exist in the primary view. A known workaround is to not select a primary column for this rule and instead write the rule expression, including the columns required from both the primary and secondary views.