Integrated Databricks Unity Catalog data
Databricks Unity Catalog objects are mapped to Collibra assets during synchronization. Once synchronization is completed, you can explore asset attributes or metadata synchronized per asset type.
After the synchronization, Database, Schema, Table, Database View, and Column assets become available in Collibra.
Asset location
- If you don't define any domain include mappings, the integration automatically creates new domains for each Database and Schema asset, using the following naming conventions:
  - Database domain name: System asset domain name > Database name
  - Schema domain name: Database domain name > Schema name
- If you define domain include mappings, only the referenced assets are integrated in their specified location.
- If you move the resulting Table and Column assets to another domain and you run the integration again, the Table and Column assets will be returned to their initial domain. However, if you move the resulting Database or Schema asset to another domain, the Database asset will remain in the new domain.

  To move all resulting assets to another location permanently, select another System asset in the current synchronization configuration or create a new capability with a synchronization configuration that integrates the data in the new location.

  Example: You created System asset A in Domain A and synchronized Databricks. As a result, Table A and Column A have been added to Domain A. Then, you manually moved Table A and Column A to Domain B. When you synchronize Databricks again, Table A and Column A will move back to Domain A.
- The Databricks Unity Catalog integration uses different naming conventions for assets compared to the Edge JDBC naming conventions. The applied naming conventions are:

  | Asset type | Naming convention | Example |
  |---|---|---|
  | Database | domainName>catalogName | ay-tech-domain-4>oleg-test |
  | Schema | databaseFullName>schemaName | ay-tech-domain-4>oleg-test>demo |
  | Table | schemaFullName>tableName | ay-tech-domain-4>oleg-test>demo>dinner |
  | Database View | schemaFullName>viewName | ay-tech-domain-4>oleg-test>demo>dinner_view |
  | Column | tableFullName>columnName(column) | ay-tech-domain-4>oleg-test>demo>dinner>recipe(column) |
Note: If you selected V0 in the Version field and added a domain include mapping for the database but not for a related schema, the automatically created domain for the schema is added to the same community as the domain of the database.
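The full-name conventions listed above can be illustrated with a minimal Python sketch. The function name `build_full_names` and its parameters are hypothetical and are not part of the integration. The sketch only reproduces the string patterns and the example values shown in this section. A Database View follows the same pattern as a Table (schema full name, then the view name).

```python
def build_full_names(domain_name, catalog_name, schema_name, table_name, column_name):
    """Compose asset full names following the documented naming conventions.

    Hypothetical helper for illustration only; the integration builds these
    names itself during synchronization.
    """
    database = f"{domain_name}>{catalog_name}"      # domainName>catalogName
    schema = f"{database}>{schema_name}"            # databaseFullName>schemaName
    table = f"{schema}>{table_name}"                # schemaFullName>tableName
    column = f"{table}>{column_name}(column)"       # tableFullName>columnName(column)
    return {"Database": database, "Schema": schema, "Table": table, "Column": column}


# Values taken from the examples in this section.
names = build_full_names("ay-tech-domain-4", "oleg-test", "demo", "dinner", "recipe")
for asset_type, full_name in names.items():
    print(f"{asset_type}: {full_name}")
```

A sketch like this can help you predict the full name of an asset before you search for it in Collibra.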
Asset status
The status of the assets depends on the selected value in the Default Asset Status field during synchronization.
- If Implemented is selected, all assets receive the "Implemented" status.
- If No Status is selected, newly created assets receive the first status listed in your operating model statuses and existing assets keep their assigned status.
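The two options above can be summarized in a short Python sketch. This is an illustration of the described behavior, not the integration's actual code. The function name `resolve_status` and the example status list are assumptions, and the sketch covers only the two values described here.

```python
def resolve_status(default_asset_status, existing_status, operating_model_statuses):
    """Return the status an asset would receive after synchronization.

    existing_status is None for a newly created asset (an assumption of this sketch).
    """
    if default_asset_status == "Implemented":
        # All assets receive the "Implemented" status.
        return "Implemented"
    if default_asset_status == "No Status":
        if existing_status is None:
            # New assets get the first status listed in the operating model.
            return operating_model_statuses[0]
        # Existing assets keep their assigned status.
        return existing_status
    raise ValueError(f"Unsupported Default Asset Status: {default_asset_status!r}")


# Hypothetical operating model status list, for illustration only.
statuses = ["Candidate", "Accepted", "Implemented"]
new_asset_status = resolve_status("No Status", None, statuses)
existing_asset_status = resolve_status("No Status", "Accepted", statuses)
forced_status = resolve_status("Implemented", "Accepted", statuses)
```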
When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude an integrated asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from Source status.
Note: If a temporary communication issue results in a partial synchronization, the status of the assets that were not synchronized becomes Missing from source. If the assets are identified in the source system during the next fully successful synchronization, the previous statuses are restored.
Integrated assets and their metadata
Database, Schema, Table, Database View, and Column assets are added.
You can enable a multi-path hierarchy to show the assets in a tree structure. For the best results, use the following relations in the multi-path hierarchy:
- Technology Asset groups Technology Asset
- Database contains Table
- Technology Asset has Schema
- Schema contains Table
- Table contains Column
The following image shows the resulting hierarchical table.
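To show how the relations listed above chain into a tree, here is a minimal Python sketch. It is not how Collibra renders the hierarchy. The edge list, the `render` function, and the asset name "System A" are hypothetical; the other names reuse the examples from the naming convention table.

```python
# (parent, relation, child) edges; "System A" is a hypothetical System asset.
edges = [
    ("System A", "Technology Asset groups Technology Asset", "oleg-test"),
    ("oleg-test", "Technology Asset has Schema", "demo"),
    ("demo", "Schema contains Table", "dinner"),
    ("dinner", "Table contains Column", "recipe"),
]

# Index the children of each parent so the tree can be walked top-down.
children = {}
for parent, _relation, child in edges:
    children.setdefault(parent, []).append(child)


def render(node, depth=0):
    """Return the subtree under node as indented lines, one asset per line."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(child, depth + 1))
    return lines


tree = render("System A")
print("\n".join(tree))
```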
Synchronized metadata per asset type
This table shows the metadata for each Databricks asset type.
| Asset type | Synchronized metadata | Public ID |
|---|---|---|
| Database | Description from source system | DescriptionFromSourceSystem |
| | Owner in source | OwnerInSource |
| | Source Tags, if the HTTP path has been defined in the capability and the Databricks access token or OAuth Client has the required permissions. The tag naming convention is We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API. | SourceTags |
| | Data Source Type. The value is automatically set to Databricks Unity Catalog. | DataSourceType |
| | Any extensible properties defined in the configuration. | |
| | Technology Asset groups / is grouped by Technology Asset | TechnologyAssetHasSchema |
| Schema | Description from source system | DescriptionFromSourceSystem |
| | Owner in source | OwnerInSource |
| | Source Tags, if the HTTP path has been defined in the capability and the Databricks access token or OAuth Client has the required permissions. The tag naming convention is We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API. | SourceTags |
| | Data Source Type. The value is automatically set to Databricks Unity Catalog. | DataSourceType |
| | Any extensible properties defined in the configuration. | |
| | Technology Asset has / belongs to Schema | TechnologyAssetHasSchema |
| Table | Description from source system | DescriptionFromSourceSystem |
| | Owner in source | OwnerInSource |
| | Source Tags, if the HTTP path has been defined in the capability and the Databricks access token or OAuth Client has the required permissions. The tag naming convention is We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API. | SourceTags |
| | Any extensible properties defined in the configuration. | |
| | Schema contains / is part of Table | SchemaContainsTable |
| Database View (includes Databricks metric views) | Description from source system | DescriptionFromSourceSystem |
| | Owner in source | OwnerInSource |
| | Schema contains / is part of Table | SchemaContainsTable |
| Column | Description from source system | DescriptionFromSourceSystem |
| | Source Tags, if the HTTP path has been defined in the capability and the Databricks access token or OAuth Client has the required permissions. The tag naming convention is We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API. | SourceTags |
| | Column Position | ColumnPosition |
| | Is Nullable | IsNullable |
| | Is Primary Key | IsPrimaryKey |
| | Primary Key Name (if the column is the primary key) | PrimaryKeyName |
| | Original Name | OriginalName |
| | Technical Data Type. Tip: You see the technical data type in the Technical Data Type field in the At a glance sidebar of the Column asset. If the At a glance sidebar is hidden, click | TechnicalDataType |
| | Column is part of / contains Table | ColumnIsPartOfTable |
| | Foreign Key Mapping (if the column is part of a foreign key) | ForeignKeyMapping |
| Foreign Key | The full name of the Foreign Key asset has the following pattern: table_full_name > foreign_key_name (foreign_key) | |
| | Foreign Key Mapping | ForeignKeyMapping |
After the synchronization, Databricks AI Model assets become available in Collibra.
Asset location
The Databricks AI Model assets are available in the domain defined in the configuration.
Asset status
The status of the assets depends on the selected value in the Default Asset Status field during synchronization.
- If Implemented is selected, all assets receive the "Implemented" status.
- If No Status is selected, newly created assets receive the first status listed in your operating model statuses and existing assets keep their assigned status.
When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude an integrated asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from Source status.
Do not move the assets to another domain. Doing so may lead to errors during future synchronizations.
The following image shows the result.
Synchronized metadata
Important considerations:
- We always integrate the latest version of a model.
- The integrated attributes depend on the training run type.
- If you don't see the listed synchronized metadata, you can add characteristics to the layout on the asset type page.
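As an illustration of the first consideration, the following Python sketch picks the latest version from a list of model versions. It assumes that "latest" means the highest version number and that version numbers are integers stored as strings; both are assumptions of this sketch, and the `latest_version` function and the sample records are hypothetical.

```python
def latest_version(versions):
    """Return the entry with the highest version number.

    Versions are compared as integers so that "10" ranks above "2".
    """
    return max(versions, key=lambda v: int(v["version"]))


# Hypothetical model versions, for illustration only.
model_versions = [
    {"version": "1", "description": "baseline"},
    {"version": "10", "description": "retrained"},
    {"version": "2", "description": "tuned"},
]

selected = latest_version(model_versions)
print(selected["version"])
```

Comparing the numbers as integers avoids the string-ordering pitfall in which "2" would sort after "10".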
| Asset type | Synchronized metadata | Public ID |
|---|---|---|
| Databricks AI Model Version | Description from source system | DescriptionFromSourceSystem |
| | Model Accuracy | ModelAccuracy |
| | Model Precision | ModelPrecision |
| | Mean Squared Error | MeanSquaredError |
| | Mean Absolute Error | MeanAbsoluteError |
| | Feature Importance | Feature Importance |
| | Version | Version |
| | Any custom metrics defined via the configuration. If you use custom metrics, ensure that you add them to the assignment and layout on the asset type page. | |
| AI Base Model | Description | Description |
| | Model Name | ModelName |
| | Version | Version |
| | Model Lifecycle Status | ModelLifecycleStatus |
| | Creation Date in Source | CreationDateInSource |
| | Retirement Date in Source | RetirementDateInSource |
| | Content Filter | ContentFilter |
| | AI System Provider | AISystemProvider |
| AI Monitor | Data Drift Detection Enabled | DataDriftDetection |
| | Prediction Drift Detection Enabled | PredictionDriftDetection |
| | Schedule | Schedule |
| | Alert Configuration | AlertConfiguration |
| AI Model Deployment | Description from Source | DescriptionFromSourceSystem |
| | Initiating User in Source | InitiatingUserInSource |
| | Creation Date in Source | CreationDateInSource |
| | Modification Date in Source | ModificationDateInSource |
| | Retirement Date in Source | RetirementDateInSource |
| | Implemented Content Filtering | ImplementedContentFiltering |
| | Compute Configuration | ComputeConfiguration |
| AI Endpoint | Access Method | AccessMethod |
| | Access Instructions | AccessInstructions |
| | Traffic Split | TrafficSplit |
Depending on your needs and setup, you can profile and classify the data for the integrated assets. For more information, go to Steps: Integrate Databricks Unity Catalog via Edge.