Synchronize Databricks Unity Catalog for AI
After Edge is ready to integrate Databricks Unity Catalog, you can start the synchronization process.
Synchronizing Databricks Unity Catalog is the process of integrating AI metadata from the databases connected to Databricks Unity Catalog and adding the resulting Databricks AI Model assets in Collibra.
You can either synchronize manually or automate the process by adding a synchronization schedule.
Prerequisites
In your Collibra environment:
- You have added the Databricks Unity Catalog capability for the connection.
- For metadata synchronization, you know the System asset to use to add the Databricks Unity Catalog assets.
  - If you have registered Databricks databases via the Databricks JDBC driver before, use the same System asset.
  - If you have never registered Databricks databases before, create a new System asset manually and use that one.
  - If you will be creating multiple Databricks Unity Catalog integration capabilities, use a unique System asset for each one.
- For AI model synchronization, you know in which domain you want to add the Databricks AI Model assets.
- You have a resource role with the Configure external system resource permission, for example, Owner.
- You have a global role with the Catalog global permission, for example, Catalog Author.
- You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.
Steps
- On the main toolbar, click → Catalog.
The Catalog homepage opens.
- In the tab bar, click Integrations.
The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the Databricks connection that you used when you added the Databricks Unity Catalog capability and click the link in the Capabilities column.
The synchronization configuration page opens.
- In Ingestion Type, select AI ingestion to integrate only AI model metadata from Databricks Unity Catalog.
To integrate Databricks Unity Catalog metadata, go to the Databricks Unity Catalog documentation instead.
- In the Synchronization Configuration section, click the Edit icon.
- In Domain, select the Domain asset in which you want to add the Databricks AI Model assets.
Important Ensure that you select a domain of the type Technology Asset.
- Optionally, in Custom AI Metrics Mappings, define the custom Databricks AI Model metrics that you want to integrate. You can do this by adding the mapping between the custom metric and the Collibra attribute.
Available custom metrics:
- accuracy_score
- exact_match
- example_count
- f1_score
- f1_score_micro
- f1_score_macro
- false_negatives
- false_positives
- log_loss
- max_error
- mean_absolute_error
- mean_absolute_percentage_error
- mean_on_target
- mean_squared_error
- precision
- precision_recall_auc
- r2_score
- recall
- roc_auc
- root_mean_squared_error
- sum_on_target
- token_count
- true_negatives
- true_positives
For an overview of the out-of-the-box metrics we integrate by default, go to Integrated Databricks Unity Catalog data.
Important If you use this feature, make sure to add any custom attributes or other characteristics, as needed, to the asset type assignment.
To add a custom AI metric mapping:
- Click Add Custom AI Metrics Mappings.
- In Metric, select the custom metric from the list of available Databricks AI metrics.
- In Attribute, select the attribute in which you want to see the value.
Make sure to select an attribute that is included in the Databricks AI Model asset type assignment.
- Optionally, in Exclude system AI models, indicate that you don't want to integrate the pretrained Databricks AI models.
  - By default, No is selected, and all accessible AI models are integrated.
  - If you select Yes, the AI models in the "system" Databricks catalog are excluded from the integration.
For more information about these pretrained Databricks AI models, go to the Databricks documentation.
Important When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude a registered asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from Source status.
- Optionally, select the Ingest input datasets of AI Models checkbox if you want to ingest Catalog Database, table, column or file assets.
- Click Save.
- Click Synchronize.
A notification indicates the synchronization has started.
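The configuration choices in the preceding steps can be pictured as a small data structure. The following Python sketch is illustrative only: it does not call any Collibra or Databricks API, and the names `SyncConfig`, `validate_mappings`, `select_models`, the domain name, the attribute names, and the model names are all hypothetical. It checks that each mapped metric appears in the list of available custom metrics from this topic, and it shows the effect of the Exclude system AI models option, assuming that models are identified by three-level `catalog.schema.model` names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Custom metrics listed in this topic.
AVAILABLE_CUSTOM_METRICS = {
    "accuracy_score", "exact_match", "example_count", "f1_score",
    "f1_score_micro", "f1_score_macro", "false_negatives", "false_positives",
    "log_loss", "max_error", "mean_absolute_error",
    "mean_absolute_percentage_error", "mean_on_target", "mean_squared_error",
    "precision", "precision_recall_auc", "r2_score", "recall", "roc_auc",
    "root_mean_squared_error", "sum_on_target", "token_count",
    "true_negatives", "true_positives",
}


@dataclass
class SyncConfig:
    """Hypothetical stand-in for the Synchronization Configuration form."""
    domain: str
    # Maps a custom metric name to the (hypothetical) Collibra attribute name.
    custom_metric_mappings: Dict[str, str] = field(default_factory=dict)
    # Mirrors "Exclude system AI models"; No (False) is the default.
    exclude_system_models: bool = False


def validate_mappings(config: SyncConfig) -> List[str]:
    """Return mapped metric names that are not in the available list."""
    return sorted(
        metric for metric in config.custom_metric_mappings
        if metric not in AVAILABLE_CUSTOM_METRICS
    )


def select_models(config: SyncConfig, model_names: List[str]) -> List[str]:
    """Drop models in the "system" catalog when the option is set to Yes."""
    if not config.exclude_system_models:
        return list(model_names)
    # Assumption: the first segment of the name is the catalog.
    return [name for name in model_names if name.split(".", 1)[0] != "system"]


if __name__ == "__main__":
    config = SyncConfig(
        domain="AI Models",  # hypothetical domain name
        custom_metric_mappings={
            "f1_score": "Model F1 Score",  # hypothetical attribute
            "bleu": "Model BLEU",          # not an available custom metric
        },
        exclude_system_models=True,
    )
    print(validate_mappings(config))
    print(select_models(config, ["system.ai.example_model", "main.ml.churn"]))
```

With the example configuration, `validate_mappings` flags `bleu` because it is not in the list of available custom metrics, and `select_models` keeps only the model outside the "system" catalog.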
To automate the synchronization instead of synchronizing manually, complete the preceding steps up to and including Click Save, and then add a synchronization schedule:
- In the Synchronization Schedule section, click the Add synchronization schedule icon.
- Enter the information.
  - Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
  - Cron: The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
  - Every: The day on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.
  - Every first: The day of the month on which you want to synchronize, for example, Tuesday. This field is only visible if you select Monthly in the Repeat field.
  - At: The time at which you want to synchronize automatically, for example, 14:00.
    - You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    - This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
  - Time zone: The time zone for the schedule.
- Click Save.
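If you select Cron expression in the Repeat field, you have to write the Quartz Cron expression yourself. The following Python sketch builds expressions that roughly mirror the Daily, Weekly, and Monthly options. It is a minimal illustration under two assumptions: that the built-in options behave like these expressions, which this topic does not confirm, and that Every first Tuesday means the first Tuesday of the month. The function name `quartz_cron` is hypothetical. A Quartz Cron expression has the fields seconds, minutes, hours, day-of-month, month, and day-of-week, and either day-of-month or day-of-week must be `?`.

```python
DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def quartz_cron(repeat: str, hour: int, day: str = "") -> str:
    """Build a Quartz Cron expression for an on-the-hour schedule.

    Field order: seconds minutes hours day-of-month month day-of-week.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")

    repeat = repeat.lower()
    if repeat not in ("daily", "weekly", "monthly"):
        raise ValueError("repeat must be Daily, Weekly, or Monthly")

    if repeat == "daily":
        # Every day at hour:00:00; day-of-week is left unspecified with '?'.
        return f"0 0 {hour} * * ?"

    # Weekly and Monthly need a weekday, for example "Sunday" or "TUE".
    abbreviation = day.upper()[:3]
    if abbreviation not in DAYS:
        raise ValueError(f"unknown day: {day!r}")

    if repeat == "weekly":
        return f"0 0 {hour} ? * {abbreviation}"

    # Monthly: '#1' selects the first occurrence of that weekday in the month.
    return f"0 0 {hour} ? * {abbreviation}#1"


if __name__ == "__main__":
    print(quartz_cron("Daily", 14))
    print(quartz_cron("Weekly", 14, "Sunday"))
    print(quartz_cron("Monthly", 8, "Tuesday"))
```

The seconds and minutes fields are fixed at 0 to match the on-the-hour restriction of the At field. A handwritten Cron expression can use other minute values if your environment accepts them.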
Depending on your selection, the synchronization job integrates the AI models, the metadata of the databases, schemas, tables, and columns, or both.
After the synchronization:
- You can view a summary of the results from the Activities list.
- For metadata synchronization, the resulting assets get a relation to the System asset that you selected.
- For information on the integrated data, go to Integrated Databricks Unity Catalog data.