Synchronize Databricks Unity Catalog for AI

After Edge is ready to integrate Databricks Unity Catalog, you can start the synchronization process.

Synchronizing Databricks Unity Catalog is the process of integrating AI metadata from the databases connected to Databricks Unity Catalog and adding the resulting Databricks AI Model assets to Collibra.

You can either synchronize manually or automate the process by adding a synchronization schedule.

Tip You can also integrate the metadata of Databricks databases, schemas, tables, database views, and columns using the Databricks Unity Catalog metadata integration. For more information, go to the Databricks Unity Catalog documentation.

Prerequisites

In your Collibra environment:

Steps

To synchronize manually:

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Locate the Databricks connection that you used when you added the Databricks Unity Catalog capability and click the link in the Capabilities column.
    The synchronization configuration page opens.
  5. In Ingestion Type, select AI ingestion to integrate only AI model metadata from Databricks Unity Catalog.
    To integrate Databricks Unity Catalog metadata, go to the Databricks Unity Catalog documentation instead.
  6. In the Synchronization Configuration section, click the Edit icon.
  7. In Domain, select the Domain asset in which you want to add the Databricks AI Model assets.

    Important Ensure that you select a domain of the type Technology Asset.

  8. Optionally, in Custom AI Metrics Mappings, define the custom Databricks AI Model metrics that you want to integrate. You can do this by adding the mapping between the custom metric and the Collibra attribute.

    For an overview of the out-of-the-box metrics we integrate by default, go to Integrated Databricks Unity Catalog data.

    Important If you use this feature, make sure to add any custom attributes or characteristics, as needed, to the asset type assignment.

    To add a custom AI metric mapping:

    1. Click Add Custom AI Metrics Mappings.
    2. In Metric, select the custom metric from the list of available Databricks AI metrics.
    3. In Attribute, select the attribute in which you want to see the value.
      Make sure to select an attribute that is included in the Databricks AI Model asset type assignment.
  9. Optionally, in Exclude system AI models, indicate whether you want to exclude the pretrained Databricks AI models from the integration.
    • By default, No is selected, and all accessible AI models are integrated.
    • If you select Yes, the AI models in the "system" Databricks catalog are excluded from the integration.

    Important When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude a registered asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from Source status.

    For more information about these pretrained Databricks AI models, go to the Databricks documentation.

  10. Optionally, select the Ingest input datasets of AI Models checkbox if you want to ingest Catalog Database, table, column or file assets.
  11. Click Save.
  12. Click Synchronize.
    A notification indicates that the synchronization has started.

To add a synchronization schedule:

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Locate the Databricks connection that you used when you added the Databricks Unity Catalog capability and click the link in the Capabilities column.
    The synchronization configuration page opens.
  5. In Ingestion Type, select AI ingestion to integrate only AI model metadata from Databricks Unity Catalog.
    To integrate Databricks Unity Catalog metadata, go to the Databricks Unity Catalog documentation instead.
  6. In the Synchronization Configuration section, click the Edit icon.
  7. In Domain, select the Domain asset in which you want to add the Databricks AI Model assets.

    Important Ensure that you select a domain of the type Technology Asset.

  8. Optionally, in Custom AI Metrics Mappings, define the custom Databricks AI Model metrics that you want to integrate. You can do this by adding the mapping between the custom metric and the Collibra attribute.

    For an overview of the out-of-the-box metrics we integrate by default, go to Integrated Databricks Unity Catalog data.

    Important If you use this feature, make sure to add any custom attributes or characteristics, as needed, to the asset type assignment.

    To add a custom AI metric mapping:

    1. Click Add Custom AI Metrics Mappings.
    2. In Metric, select the custom metric from the list of available Databricks AI metrics.
    3. In Attribute, select the attribute in which you want to see the value.
      Make sure to select an attribute that is included in the Databricks AI Model asset type assignment.
  9. Optionally, in Exclude system AI models, indicate whether you want to exclude the pretrained Databricks AI models from the integration.
    • By default, No is selected, and all accessible AI models are integrated.
    • If you select Yes, the AI models in the "system" Databricks catalog are excluded from the integration.

    Important When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude a registered asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from Source status.

    For more information about these pretrained Databricks AI models, go to the Databricks documentation.

  10. Optionally, select the Ingest input datasets of AI Models checkbox if you want to ingest Catalog Database, table, column or file assets.
  11. Click Save.
  12. In the Synchronization Schedule section, click the Add synchronization schedule icon.
  13. Enter the information.
    Field: Description

    Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

    Cron: The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.

    Every: The day on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.

    Every first: The day of the month on which you want to synchronize, for example, Tuesday. This field is only visible if you select Monthly in the Repeat field.

    At: The time at which you want to synchronize automatically, for example, 14:00.
    • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.

    Time zone: The time zone for the schedule.
  14. Click Save.
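
If you select Cron expression in the Repeat field, it can help to check how a Quartz Cron expression breaks down before you enter it. The following is a minimal sketch, not part of Collibra: the function name describe_quartz and its checks are hypothetical and only cover the basic Quartz layout (six fields, an optional seventh for the year, and "?" in exactly one of the two day fields). It does not validate the full Quartz syntax.

```python
# Hypothetical helper for illustration only; it is not a Collibra or Quartz API.
# Quartz Cron fields, in order. The seventh field (year) is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_quartz(expression: str) -> dict:
    """Split a Quartz Cron expression into named fields with basic checks."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("A Quartz Cron expression has 6 or 7 fields")

    fields = dict(zip(QUARTZ_FIELDS, parts))

    # Quartz expects '?' in exactly one of day_of_month and day_of_week.
    dom_unset = fields["day_of_month"] == "?"
    dow_unset = fields["day_of_week"] == "?"
    if dom_unset == dow_unset:
        raise ValueError("Use '?' in exactly one of day-of-month and day-of-week")

    return fields


# 14:00 every Sunday
weekly = describe_quartz("0 0 14 ? * SUN")
print(weekly["hours"], weekly["day_of_week"])

# 08:00 every day
daily = describe_quartz("0 0 8 * * ?")
print(daily["hours"], daily["day_of_month"])
```

Both sample expressions set seconds and minutes to 0, so they fire on the hour, in line with the schedules you can create with the At field.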

What's next

Depending on your selection, the synchronization job integrates the AI models, the metadata of the databases, schemas, tables, and columns, or both.
After the synchronization: