Add the Databricks Unity Catalog capability

You can add the Databricks Unity Catalog capability to an Edge or Collibra Cloud site to enable AI metadata synchronization between Databricks and Collibra.

Prerequisites

In your Collibra environment:

Steps

  1. Open a site.
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the table, click the name of the site whose status is Healthy.
      The site page opens.
  2. In the Capabilities section, click Add capability.
    The Add capability page appears.
  3. Select the Databricks Unity Catalog synchronization capability template.
  4. Enter the required information.
    Field | Description | Required

    Name

    The name of the capability.

    Yes

    Description

    The description of the capability.

    No

    Databricks Connection
    The Databricks connection to be used.

    Yes

    Save input metadata
    If you select this option, the metadata extracted from the data source will be saved in a file that can be used for troubleshooting. Select this option only on request of Collibra Support.

    No

    Exclude Schemas (will be removed soon, use domain mapping instead)

    Comma-separated list of the schemas that you don't want to integrate.

    No

    (deprecated) Filters and Domain Mapping

    Important: This field is deprecated in the latest UI. You can now define the mappings in the integration configuration.
    If you have existing mappings here, they will continue to work. However, we advise you to move them to the integration configuration.

    Text in JSON format to include or exclude databases and schemas, and to configure domain mappings.

    • The text must be in JSON format and can contain an include and an exclude block. You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use.
    • In the include block, you can specify the domain in which specific catalogs or schemas must be ingested. The format is: "Catalog/Database > schema": "domain ID". For example, "HR > address-schema": "30000000-0000-0000-0000-000000000000".
    • In the exclude block, you can specify the catalogs or schemas that you don't want to ingest. For example, "* > test".
    • The exclude block has priority over the include block.
    • If the include block is not present, we ingest all assets into new domains.
    • If there is no explicit domain mapping for a schema, we use the domain specified for the database.
    • You can use the keyword default as a domain ID. In that case, the catalog or schema will be ingested in a new domain.
    • A match with a database has priority over a match with a schema.
    • The integration fails before the synchronization starts if one or more domain IDs specified in the include block don't exist.
    • The integration fails before the synchronization starts if a domain ID is left empty in the include block.
    • You can use the ? and * wildcards in the catalog and schema names. If a catalog or schema matches multiple lines, the most detailed match is taken into account.

    No

    Default Asset Status

    Define the status that assets need to receive during the integration synchronization.

    • No Status (default): With the first synchronization, assets receive the first status listed in the Operating Model statuses. During a resynchronization, the status is not updated. For example, if you change an asset status from "Candidate" to "Review" before resynchronization, the status remains "Review."
    • Implemented: All assets get the "Implemented" status.

    Note: This field can currently be overruled by the Default Asset Status (Deprecated) field in the synchronization configuration.

    Yes

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important:
    • Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.
    • Only use Log level if your data source is a commercial JDBC offering. For more information, go to the Collibra Marketplace.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    Only complete this field on request of or together with Collibra Support.

    No

  5. Click Add.
    The capability is added to the Edge or Collibra Cloud site.
    The fields become read-only.
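
If you still maintain text for the (deprecated) Filters and Domain Mapping field in step 4, it can help to build and check that text locally instead of pasting it into an online JSON validator. The following Python sketch is only an illustration under several assumptions: the top-level keys are assumed to be "include" (an object that maps patterns to domain IDs) and "exclude" (a list of patterns), domain IDs are assumed to be UUIDs, the "Sales > *" entry is hypothetical, and check_filters is a helper invented for this example, not part of any Collibra tool. It cannot tell whether a domain ID actually exists in your environment.

```python
import json
import uuid

# Hypothetical configuration. The "include" object / "exclude" list shape
# is an assumption; verify it against your own working configuration.
filters = {
    "include": {
        # Pattern and domain ID taken from the field description above.
        "HR > address-schema": "30000000-0000-0000-0000-000000000000",
        # Hypothetical entry: "default" requests ingestion into a new domain.
        "Sales > *": "default",
    },
    "exclude": ["* > test"],
}


def check_filters(config):
    """Return a list of problems found in the include block (empty if none)."""
    problems = []
    for pattern, domain_id in config.get("include", {}).items():
        if not isinstance(domain_id, str) or not domain_id.strip():
            # An empty domain ID makes the integration fail before it starts.
            problems.append(f"{pattern!r}: domain ID is empty")
        elif domain_id != "default":
            try:
                uuid.UUID(domain_id)  # assumption: domain IDs are UUIDs
            except ValueError:
                problems.append(f"{pattern!r}: {domain_id!r} is not a UUID")
    return problems


problems = check_filters(filters)

# Serialize, then parse again to confirm the text is well-formed JSON.
filters_text = json.dumps(filters, indent=2)
assert json.loads(filters_text) == filters

print("\n".join(problems) if problems else filters_text)
```

When no problems are reported, the printed text is well-formed JSON that you can review before copying it into the field. The check covers only the syntax and the empty-ID rule. The integration itself still verifies that each domain ID exists.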

What's next

You can now synchronize Databricks Unity Catalog for AI.