Synchronize Databricks Unity Catalog

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.


Synchronizing Databricks Unity Catalog is the process of integrating metadata from the databases connected to Databricks Unity Catalog and making this metadata available in Collibra Data Intelligence Platform. You can also use the synchronization to add Databricks AI Model assets to Collibra.
You can synchronize manually, or you can automate it by adding a synchronization schedule.

Before you begin

  • You have created a connection to Databricks in your Edge site.
  • You have added the Databricks Unity Catalog capability for the connection.
  • For metadata synchronization, you know to which System asset you want to add the Databricks Unity Catalog assets.
    • If you have previously registered Databricks databases via the JDBC driver, use the same System asset.
    • If you have never registered Databricks databases before, manually create a new System asset and use that one.
  • For AI model synchronization, you know to which domain you want to add the Databricks AI Model assets.

Requirements and permissions

Steps

Synchronize manually

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Data Catalog Home opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. On the main toolbar, click the Create icon.
    The Create dialog box appears.
  5. In the Register with Edge section of the Create dialog box, click Register a data source.
    The Register content tab opens.
  6. Locate the Databricks connection that you used when you added the Databricks Unity Catalog capability and click the link in the Data sources/Capabilities column.
    The synchronization configuration page opens.
  7. In the Synchronization Configuration section, click Add Configuration.
  8. In Ingestion Type, select whether you want to integrate metadata, AI models, or both.

    Depending on your selection, extra fields appear. Your selection also determines which Databricks Unity Catalog data is integrated.

  9. Complete the fields as needed.
    Field | Available if you integrate | Action
    System | Metadata

    In System, select the System asset to which you want to link the Databricks assets.

    Default Asset Status | Metadata, AI models

    In Default Asset Status, select how you want to set the status of the synchronized assets. The possible values are:

    • Implemented: All assets receive the Implemented status.
    • No Status: Newly created assets receive the first status listed in your operating model statuses, and existing assets keep their assigned status.
      Example 

      In these examples, you register a data source with assets A, B, and C.

    Domain Include Mappings | Metadata

    Optionally, in Domain Include Mappings, specify which databases and schemas you want to integrate and the Collibra domains to which they are added.

    Important 
    • If you don't define include mappings, the integration automatically creates a new domain for each Database and Schema asset, in the same community as the System asset.
    • A match with a database has priority over a match with a schema.

    Domain Exclude Mappings | Metadata

    Optionally, in Domain Exclude Mappings, specify the paths to the databases and schemas in Databricks Unity Catalog that you don't want to integrate.

    Note: The exclude mapping has priority over the include mapping.

    Extensible Properties Mappings | Metadata

    Databricks Unity Catalog allows you to add additional properties to Catalog, Schema, and Table objects.

    Optionally, in Extensible Properties Mappings, specify which additional default system properties or custom properties you want to integrate from Databricks Unity Catalog into Collibra.
    You can integrate most values from the Details page of Catalog, Schema, Table, and View objects into specific attributes of Collibra assets. To do this, map the fields of the Databricks Unity Catalog objects to the Collibra attributes.

    Important 
    • If you use this feature, make sure to add all required characteristics to the asset type assignments.
    • The name of the property starts with the object type, for example catalogs.systemAttributes.metastore_id.
      catalogs refers to Database assets, schemas to Schema assets, table to Table assets, and views to Database View assets.
    • The following system properties are supported:  
      • Catalogs: "browse_only", "catalog_type", "connection_name", "created_at", "created_by", "isolation_mode", "metastore_id", "provider_name", "provisioning_info", "securable_kind", "securable_type", "share_name", "storage_location", "storage_root", "updated_at", and "updated_by".
      • Schemas: "catalog_type", "created_at", "created_by", "metastore_id", "securable_type", "securable_kind", "storage_location", "storage_root", "updated_at", and "updated_by".
      • Tables: "access_point", "catalog_name", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "schema_name", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".
      • Views: "access_point", "catalog_name", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "schema_name", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".

    Domain | AI models

    In Domain, select the domain to which you want to add the Databricks AI Model assets.

    Custom AI Metrics Mappings | AI models

    Optionally, in Custom AI Metrics Mappings, define which custom Databricks AI Model metrics you want to integrate. You do this by adding the mapping between the custom metric and the Collibra attribute.
    For an overview of the out-of-the-box metrics we integrate by default, go to Integrated Databricks Unity Catalog data.

    Important 

    If you use this feature, make sure to add all required characteristics to the Databricks AI Model asset type assignment.

    Exclude system AI Models | AI models

    Optionally, in Exclude system AI models, indicate whether you want to exclude the pretrained Databricks AI models from the integration.
    By default, No is selected, and all accessible AI models are integrated.
    If you select Yes, the AI models in the "system" Databricks catalog are excluded from the integration. For more information about these pretrained Databricks AI models, go to the Databricks documentation.

    Important 

    When you integrate a data source without applying Domain Include Mappings or Domain Exclude Mappings, and later exclude a registered asset with an include or exclude mapping during resynchronization, the related assets receive the Missing from Source status.

  10. Click Save Configuration.
  11. Click Synchronize.
    A notification indicates that the synchronization has started.
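The priority rules for Domain Include Mappings and Domain Exclude Mappings in step 9 can be illustrated with a small model of how a database and schema pair is routed to a Collibra domain. This is a minimal sketch, not Collibra's implementation: the IncludeMapping and ExcludeMapping classes, the resolve_domain function, the glob-style patterns, and the rule that unmatched schemas are skipped once include mappings exist are all hypothetical assumptions. It only encodes the documented priorities: an exclude mapping wins over an include mapping, a database match wins over a schema match, and without include mappings the integration creates domains automatically.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence

# Hypothetical marker for the default behavior: without include mappings,
# the integration creates domains automatically.
AUTO_DOMAIN = "<auto-created domain>"


@dataclass(frozen=True)
class IncludeMapping:
    database: str          # database name or glob pattern (assumption)
    schema: Optional[str]  # None = the mapping targets the whole database
    domain: str            # Collibra domain that receives the assets


@dataclass(frozen=True)
class ExcludeMapping:
    database: str
    schema: Optional[str] = None  # None = exclude every schema in the database


def _matches(db_pattern: str, schema_pattern: Optional[str],
             database: str, schema: str) -> bool:
    """True if the database matches and the schema pattern is absent or matches."""
    if not fnmatchcase(database, db_pattern):
        return False
    return schema_pattern is None or fnmatchcase(schema, schema_pattern)


def resolve_domain(database: str, schema: str,
                   includes: Sequence[IncludeMapping],
                   excludes: Sequence[ExcludeMapping]) -> Optional[str]:
    """Return the target domain for a schema, or None if it is not integrated."""
    # Exclude mappings have priority over include mappings.
    if any(_matches(e.database, e.schema, database, schema) for e in excludes):
        return None
    # No include mappings: domains are created automatically.
    if not includes:
        return AUTO_DOMAIN
    # A database-level match has priority over a schema-level match.
    for inc in includes:
        if inc.schema is None and fnmatchcase(database, inc.database):
            return inc.domain
    for inc in includes:
        if inc.schema is not None and _matches(inc.database, inc.schema, database, schema):
            return inc.domain
    # Assumption: with include mappings defined, unmatched schemas are skipped.
    return None


includes = [
    IncludeMapping("sales", None, "Sales Data"),
    IncludeMapping("sales", "raw", "Raw Data"),
    IncludeMapping("hr", "payroll", "HR Data"),
]
excludes = [ExcludeMapping("sales", "tmp*")]

print(resolve_domain("sales", "raw", includes, excludes))       # Sales Data
print(resolve_domain("sales", "tmp_2024", includes, excludes))  # None
print(resolve_domain("hr", "benefits", includes, excludes))     # None
print(resolve_domain("finance", "core", [], []))                # <auto-created domain>
```

In this model, the "sales.raw" schema lands in Sales Data rather than Raw Data because the database-level mapping takes priority, and "sales.tmp_2024" is dropped because the exclude mapping is checked first.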

Add a synchronization schedule

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Data Catalog Home opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. On the main toolbar, click the Create icon.
    The Create dialog box appears.
  5. In the Register with Edge section of the Create dialog box, click Register a data source.
    The Register content tab opens.
  6. Locate the Databricks connection that you used when you added the Databricks Unity Catalog capability and click the link in the Data sources/Capabilities column.
    The synchronization configuration page opens.
  7. In the Synchronization Configuration section, click Add Configuration.
  8. In Ingestion Type, select whether you want to integrate metadata, AI models, or both.

    Depending on your selection, extra fields appear. Your selection also determines which Databricks Unity Catalog data is integrated.

  9. Complete the fields as needed.
    Field | Available if you integrate | Action
    System | Metadata

    In System, select the System asset to which you want to link the Databricks assets.

    Default Asset Status | Metadata, AI models

    In Default Asset Status, select how you want to set the status of the synchronized assets. The possible values are:

    • Implemented: All assets receive the Implemented status.
    • No Status: Newly created assets receive the first status listed in your operating model statuses, and existing assets keep their assigned status.
      Example 

      In these examples, you register a data source with assets A, B, and C.

    Domain Include Mappings | Metadata

    Optionally, in Domain Include Mappings, specify which databases and schemas you want to integrate and the Collibra domains to which they are added.

    Important 
    • If you don't define include mappings, the integration automatically creates a new domain for each Database and Schema asset, in the same community as the System asset.
    • A match with a database has priority over a match with a schema.

    Domain Exclude Mappings | Metadata

    Optionally, in Domain Exclude Mappings, specify the paths to the databases and schemas in Databricks Unity Catalog that you don't want to integrate.

    Note: The exclude mapping has priority over the include mapping.

    Extensible Properties Mappings | Metadata

    Databricks Unity Catalog allows you to add additional properties to Catalog, Schema, and Table objects.

    Optionally, in Extensible Properties Mappings, specify which additional default system properties or custom properties you want to integrate from Databricks Unity Catalog into Collibra.
    You can integrate most values from the Details page of Catalog, Schema, Table, and View objects into specific attributes of Collibra assets. To do this, map the fields of the Databricks Unity Catalog objects to the Collibra attributes.

    Important 
    • If you use this feature, make sure to add all required characteristics to the asset type assignments.
    • The name of the property starts with the object type, for example catalogs.systemAttributes.metastore_id.
      catalogs refers to Database assets, schemas to Schema assets, table to Table assets, and views to Database View assets.
    • The following system properties are supported:  
      • Catalogs: "browse_only", "catalog_type", "connection_name", "created_at", "created_by", "isolation_mode", "metastore_id", "provider_name", "provisioning_info", "securable_kind", "securable_type", "share_name", "storage_location", "storage_root", "updated_at", and "updated_by".
      • Schemas: "catalog_type", "created_at", "created_by", "metastore_id", "securable_type", "securable_kind", "storage_location", "storage_root", "updated_at", and "updated_by".
      • Tables: "access_point", "catalog_name", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "schema_name", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".
      • Views: "access_point", "catalog_name", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "schema_name", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".

    Domain | AI models

    In Domain, select the domain to which you want to add the Databricks AI Model assets.

    Custom AI Metrics Mappings | AI models

    Optionally, in Custom AI Metrics Mappings, define which custom Databricks AI Model metrics you want to integrate. You do this by adding the mapping between the custom metric and the Collibra attribute.
    For an overview of the out-of-the-box metrics we integrate by default, go to Integrated Databricks Unity Catalog data.

    Important 

    If you use this feature, make sure to add all required characteristics to the Databricks AI Model asset type assignment.

    Exclude system AI Models | AI models

    Optionally, in Exclude system AI models, indicate whether you want to exclude the pretrained Databricks AI models from the integration.
    By default, No is selected, and all accessible AI models are integrated.
    If you select Yes, the AI models in the "system" Databricks catalog are excluded from the integration. For more information about these pretrained Databricks AI models, go to the Databricks documentation.

    Important 

    When you integrate a data source without applying Domain Include Mappings or Domain Exclude Mappings, and later exclude a registered asset with an include or exclude mapping during resynchronization, the related assets receive the Missing from Source status.

  10. Click Save Configuration.
  11. In the Synchronization Schedule section, click Add Schedule.
  12. Enter the information.
    Field | Description
    Repeat | The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
    Cron | The Quartz Cron expression that determines when the synchronization takes place.

    This field is only visible if you select Cron expression in the Repeat field.

    Every | The day of the week on which you want to synchronize, for example, Sunday.

    This field is only visible if you select Weekly in the Repeat field.

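    Every first | The day of the week on which you want to synchronize each month. The synchronization takes place on the first occurrence of that day in the month, for example, the first Tuesday.

    This field is only visible if you select Monthly in the Repeat field.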

    At | The time at which you want to synchronize automatically, for example, 14:00.

    • You can only schedule on the hour. For example, you can schedule a synchronization at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
    Time zone | The time zone for the schedule.
  13. Click Save.
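If you select Cron expression in Repeat, you write the Quartz Cron expression yourself. Quartz expressions start with a seconds field (seconds, minutes, hours, day of month, month, day of week), and one of the two day fields must be "?". The sketch below is a hypothetical helper, not part of Collibra, that produces Quartz expressions equivalent to the Daily, Weekly, and Monthly options, which can serve as a starting point for a custom expression. It assumes Quartz's day-of-week numbering (1 = Sunday through 7 = Saturday) and the "#" syntax for "the first <day> of the month". The time zone is set separately in the Time zone field and is not part of the expression.

```python
from typing import Optional

# Quartz day-of-week numbering: 1 = Sunday ... 7 = Saturday.
QUARTZ_DAYS = {
    "sunday": 1, "monday": 2, "tuesday": 3, "wednesday": 4,
    "thursday": 5, "friday": 6, "saturday": 7,
}


def quartz_cron(repeat: str, hour: int, day: Optional[str] = None) -> str:
    """Build a Quartz cron expression (sec min hour day-of-month month day-of-week).

    Hypothetical helper mirroring the Daily, Weekly, and Monthly options.
    Those options only allow on-the-hour schedules, so seconds and minutes are 0.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    key = repeat.lower()
    if key == "daily":
        # Every day: day-of-month is '*', so day-of-week must be '?'.
        return f"0 0 {hour} * * ?"
    if key not in ("weekly", "monthly"):
        raise ValueError(f"unsupported repeat value: {repeat}")
    if day is None or day.lower() not in QUARTZ_DAYS:
        raise ValueError("weekly and monthly schedules need a weekday name")
    dow = QUARTZ_DAYS[day.lower()]
    if key == "weekly":
        # Every <day>: day-of-week is set, so day-of-month must be '?'.
        return f"0 0 {hour} ? * {dow}"
    # Monthly: '#1' selects the first occurrence of <day> in each month.
    return f"0 0 {hour} ? * {dow}#1"


print(quartz_cron("Daily", 14))              # 0 0 14 * * ?
print(quartz_cron("Weekly", 14, "Sunday"))   # 0 0 14 ? * 1
print(quartz_cron("Monthly", 8, "Tuesday"))  # 0 0 8 ? * 3#1
```

Unlike the Daily, Weekly, and Monthly options, a hand-written expression in the Cron field is not limited by this helper's on-the-hour assumption; check Collibra's own constraints before relying on a non-zero minutes field.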

What's next?

Depending on your selection, the synchronization job integrates the AI models, the metadata of the databases, schemas, tables, and columns, or both.
After the synchronization:
