Integrated Databricks Unity Catalog data

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

This page describes the integration results for both the latest UI and the previous, classic UI.

After the synchronization, the Database, Schema, Table, and Column assets are available in Collibra.

  • If you didn't define domain include mappings:
    • The integration automatically created new domains for each Database and Schema asset, using the following naming conventions:
      • Database domain name: System asset domain name > Database name
      • Schema domain name: Database domain name > Schema name
    • These new domains are placed in the same domain as the provided System asset.
  • If you defined domain include mappings, the assets are added to the specified location.
Important 
  • If you move the resulting Table and Column assets to another domain and you run the integration again, the Table and Column assets will be moved back to their initial domain. However, if you move the resulting Database or Schema asset to another domain, the Database asset will remain in the new domain.
    To move all resulting assets to another location permanently, select another System asset in the current synchronization configuration or create a new capability with a synchronization configuration that integrates the data in the new location.

    Example 

    You created System asset A in Domain A and synchronized Databricks. As a result, Table A and Column A have been added to Domain A. Then, you manually moved Table A and Column A to Domain B.
    When you synchronize Databricks again, Table A and Column A will move back to Domain A.

  • The Databricks Unity Catalog integration uses different naming conventions compared to the Edge JDBC naming conventions. The applied naming conventions are:

    Asset type  Naming convention                 Example
    Database    domainName>catalogName            ay-tech-domain-4>oleg-test
    Schema      databaseFullName>schemaName       ay-tech-domain-4>oleg-test>demo
    Table       schemaFullName>tableName          ay-tech-domain-4>oleg-test>demo>dinner
    Column      tableFullName>columnName(column)  ay-tech-domain-4>oleg-test>demo>dinner>recipe(column)
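
The naming conventions in the table above chain together: each full name is the parent's full name plus a ">" separator and the child's name. The following Python sketch illustrates this. The function name build_full_names is hypothetical and only reproduces the documented pattern; it is not part of Collibra or Databricks.

```python
def build_full_names(domain_name, catalog, schema, table, column):
    """Build the full name for each asset level using the documented pattern."""
    database_full = f"{domain_name}>{catalog}"
    schema_full = f"{database_full}>{schema}"
    table_full = f"{schema_full}>{table}"
    # Column names get a literal "(column)" suffix.
    column_full = f"{table_full}>{column}(column)"
    return {
        "Database": database_full,
        "Schema": schema_full,
        "Table": table_full,
        "Column": column_full,
    }


# Values taken from the Example column of the table.
names = build_full_names("ay-tech-domain-4", "oleg-test", "demo", "dinner", "recipe")
for asset_type, full_name in names.items():
    print(f"{asset_type}: {full_name}")
```

Run with the example values from the table, the sketch prints the same four full names that appear in the Example column.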

By default, the assets get the Implemented status.

Note If a synchronization completes only partially because of a temporary communication issue, the assets that cannot be synchronized receive the Missing from source status. Their previous status is restored if they are found in the source system during the next fully successful synchronization.

You can enable a multi-path hierarchy to show the integrated assets in a tree structure. For the best result, use the following relations in the multi-path hierarchy:

  1. Technology Asset groups Technology Asset
  2. Database contains Table
  3. Technology Asset has Schema
  4. Schema contains Table
  5. Table contains Column

The following image shows the resulting hierarchical table.

Synchronized metadata per asset type

This table shows the metadata for each Databricks asset type.

Asset type    Synchronized metadata    Resource ID
Database

Description from source system 00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

To have this field available on the asset page, an admin can navigate to the Database asset template and add Source Tags as a field.

00000000-0000-0000-0011-000500000019

Data Source Type

The value is automatically set to Databricks Unity Catalog.

00000000-0000-0000-0001-000500000018

Any extensible properties defined via the capability.

 
Technology Asset groups / is grouped by Technology Asset 00000000-0000-0000-0000-000000007054
Schema

Description from source system

To have this field available on the asset page, an admin can navigate to the Schema asset template and add Description from source system as a field.

00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019

Data Source Type

The value is automatically set to Databricks Unity Catalog.

00000000-0000-0000-0001-000500000018
Any extensible properties defined via the configuration.  
Technology Asset has / belongs to Schema 00000000-0000-0000-0000-000000007024
Table

Description from source system 00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019

Any extensible properties defined via the capability.

 
Schema contains / is part of Table 00000000-0000-0000-0000-000000007043
Column

Description from source system 00000000-0000-0000-0001-000500000074

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019
Column Position 00000000-0000-0000-0001-000500000020
Is Nullable 00000000-0000-0000-0001-000500000011
Is Primary Key 00000000-0000-0000-0001-000500000015
Primary Key Name (if the column is the primary key) 00000000-0000-0000-0001-000500000016
Original Name 00000000-0000-0000-0001-000500000032

Technical Data Type

Tip 

The technical data type is shown in the Technical Data Type field in the At a Glance sidebar of the Column asset. For columns that have a structured technical data type (Array or Struct), click the button or the data type name in the Column asset to see the structure of the data in a dialog box.

00000000-0000-0000-0000-000000000219
Column is part of / contains Table 00000000-0000-0000-0000-000000007042
Foreign Key Mapping (if the column is part of a foreign key) 00000000-0000-0000-0000-000000007504
Foreign Key

Tip The full name of the Foreign Key asset has the following pattern: table_full_name > foreign_key_name (foreign_key)

 
Foreign Key Mapping 00000000-0000-0000-0000-000000007504

After the synchronization, Database, Schema, Table, and Column assets become available in Collibra.

Where are the integrated assets located?

  • If you didn't define domain include mappings, the integration automatically creates new domains for each Database and Schema asset, using the following naming conventions:
    • Database domain name: System asset domain name > Database name
    • Schema domain name: Database domain name > Schema name
    These new domains are placed in the same domain as the provided System asset.
  • If you defined domain include mappings, the assets are added to the specified location.

Note If a domain include mapping was defined for the database, but not for the schema, the automatically created domain for the schema is added to the specified database domain.

Important 
  • If you move the resulting Table and Column assets to another domain and you run the integration again, the Table and Column assets will be moved back to their initial domain. However, if you move the resulting Database or Schema asset to another domain, the Database asset will remain in the new domain.
    To move all resulting assets to another location permanently, select another System asset in the current synchronization configuration or create a new capability with a synchronization configuration that integrates the data in the new location.

    Example 

    You created System asset A in Domain A and synchronized Databricks. As a result, Table A and Column A have been added to Domain A. Then, you manually moved Table A and Column A to Domain B.
    When you synchronize Databricks again, Table A and Column A will move back to Domain A.

  • The Databricks Unity Catalog integration uses different naming conventions for assets compared to the Edge JDBC naming conventions. The applied naming conventions are:

    Asset type  Naming convention                 Example
    Database    domainName>catalogName            ay-tech-domain-4>oleg-test
    Schema      databaseFullName>schemaName       ay-tech-domain-4>oleg-test>demo
    Table       schemaFullName>tableName          ay-tech-domain-4>oleg-test>demo>dinner
    Column      tableFullName>columnName(column)  ay-tech-domain-4>oleg-test>demo>dinner>recipe(column)
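
The move-back behavior described in the first bullet of the Important note above can be pictured with a small toy model. This Python sketch is only an illustration under stated assumptions: the Asset class and the resynchronize function are hypothetical, and they imitate the documented outcome rather than how the integration is implemented.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    asset_type: str      # "Database", "Table", or "Column" in this sketch
    initial_domain: str  # domain where the integration first placed the asset
    domain: str          # domain where the asset currently lives


def resynchronize(assets):
    """Imitate the documented outcome of running the integration again."""
    for asset in assets:
        # Table and Column assets are moved back to their initial domain.
        if asset.asset_type in ("Table", "Column"):
            asset.domain = asset.initial_domain
        # A manually moved Database asset stays in its new domain.
    return assets


# All three assets were manually moved from Domain A to Domain B.
assets = [
    Asset("Database A", "Database", "Domain A", "Domain B"),
    Asset("Table A", "Table", "Domain A", "Domain B"),
    Asset("Column A", "Column", "Domain A", "Domain B"),
]
resynchronize(assets)
for asset in assets:
    print(f"{asset.name}: {asset.domain}")
```

In this model, Table A and Column A end up in Domain A again, while Database A stays in Domain B.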

What is the status of the assets?

The status of the assets depends on the selected value in the Default Asset Status field during synchronization.

  • If Implemented is selected, all assets receive the Implemented status.
  • If No Status is selected, newly created assets receive the first status listed in your operating model statuses and existing assets keep their assigned status.
Important 

When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude an integrated asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from source status.

Note If a synchronization completes only partially because of a temporary communication issue, the assets that cannot be synchronized receive the Missing from source status. Their previous status is restored if they are found in the source system during the next fully successful synchronization.
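
The Default Asset Status rules above can be summarized as a small decision function. The following Python sketch is an illustration only: the function name status_after_sync is hypothetical, and the status names "Candidate" and "Accepted" are placeholder values for an operating model, not values confirmed by this page.

```python
def status_after_sync(default_asset_status, is_new_asset, current_status,
                      first_operating_model_status):
    """Return the status an asset gets, following the two documented rules."""
    if default_asset_status == "Implemented":
        # All assets receive the Implemented status.
        return "Implemented"
    if default_asset_status == "No Status":
        # New assets get the first status listed in the operating model;
        # existing assets keep the status they already have.
        return first_operating_model_status if is_new_asset else current_status
    raise ValueError(f"Unexpected Default Asset Status: {default_asset_status!r}")


# "Candidate" and "Accepted" are hypothetical status names.
print(status_after_sync("Implemented", False, "Accepted", "Candidate"))  # Implemented
print(status_after_sync("No Status", True, None, "Candidate"))           # Candidate
print(status_after_sync("No Status", False, "Accepted", "Candidate"))    # Accepted
```

The sketch does not cover the Missing from source cases, which depend on mapping changes and on whether a synchronization completed fully.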

Which assets and data are added?

Database, Schema, Table, and Column assets are added.

Tip 

You can enable a multi-path hierarchy to show the integrated assets in a tree structure. For the best result, use the following relations in the multi-path hierarchy:

  1. Technology Asset groups Technology Asset
  2. Database contains Table
  3. Technology Asset has Schema
  4. Schema contains Table
  5. Table contains Column

The following image shows the resulting hierarchical table.

Synchronized metadata per asset type

This table shows the metadata for each Databricks asset type.

Asset type    Synchronized metadata    Resource ID
Database

Description from source system 00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

To have this field available in the asset page, an admin can navigate to the Database asset template and add Source Tags as a field.

00000000-0000-0000-0011-000500000019

Data Source Type

The value is automatically set to Databricks Unity Catalog.

00000000-0000-0000-0001-000500000018

Any extensible properties defined in the configuration.

 
Technology Asset groups / is grouped by Technology Asset 00000000-0000-0000-0000-000000007054
Schema

Description from source system

To have this field available in the asset page, an admin can navigate to the Schema asset template and add Description from source system as a field.

00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019

Data Source Type

The value is automatically set to Databricks Unity Catalog.

00000000-0000-0000-0001-000500000018
Any extensible properties defined in the configuration.  
Technology Asset has / belongs to Schema 00000000-0000-0000-0000-000000007024
Table

Description from source system 00000000-0000-0000-0001-000500000074
Owner in source 00000000-0000-0000-0000-200000000001

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019

Any extensible properties defined in the configuration.

 
Schema contains / is part of Table 00000000-0000-0000-0000-000000007043
Column

Description from source system 00000000-0000-0000-0001-000500000074

Source Tags, if the HTTP path has been defined in the capability.

The tag naming convention is <source_tag_name>:<source_tag_value>.
If the value is empty, we show <source_tag_name>:.

Note We fetch source tags from the Databricks Unity Catalog information schema using SQL; everything else is fetched by REST API.

00000000-0000-0000-0011-000500000019
Column Position 00000000-0000-0000-0001-000500000020
Is Nullable 00000000-0000-0000-0001-000500000011
Is Primary Key 00000000-0000-0000-0001-000500000015
Primary Key Name (if the column is the primary key) 00000000-0000-0000-0001-000500000016
Original Name 00000000-0000-0000-0001-000500000032

Technical Data Type

Tip 

The technical data type is shown in the Technical Data Type field in the At a Glance sidebar of the Column asset. For columns that have a structured technical data type (Array or Struct), click the hyperlink in that field to see the structure of the data in a dialog box. In other locations, for example in Table assets, click the View Array or View Struct button to open the dialog box.

00000000-0000-0000-0000-000000000219
Column is part of / contains Table 00000000-0000-0000-0000-000000007042
Foreign Key Mapping (if the column is part of a foreign key) 00000000-0000-0000-0000-000000007504
Foreign Key

Tip The full name of the Foreign Key asset has the following pattern: table_full_name > foreign_key_name (foreign_key)

 
Foreign Key Mapping 00000000-0000-0000-0000-000000007504
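
The table above mentions two string patterns: the Source Tags convention (<source_tag_name>:<source_tag_value>, with an empty value shown as <source_tag_name>:) and the Foreign Key full name (table_full_name > foreign_key_name (foreign_key)). The following Python sketch shows both. The function names and the tag and key names are hypothetical; the table full name is taken from the naming convention example on this page.

```python
def format_source_tag(tag_name, tag_value=None):
    """Render a source tag as <source_tag_name>:<source_tag_value>."""
    # An empty or missing value is shown as "<source_tag_name>:".
    return f"{tag_name}:{tag_value or ''}"


def foreign_key_full_name(table_full_name, foreign_key_name):
    """Render the Foreign Key asset full name using the documented pattern."""
    return f"{table_full_name} > {foreign_key_name} (foreign_key)"


# "sensitivity", "pii", and "fk_recipe" are made-up names for illustration.
print(format_source_tag("sensitivity", "high"))  # sensitivity:high
print(format_source_tag("pii"))                  # pii:
print(foreign_key_full_name("ay-tech-domain-4>oleg-test>demo>dinner", "fk_recipe"))
```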

After the synchronization, Databricks AI Model assets become available in Collibra.

Where are the integrated assets located?

The Databricks AI Model assets are available in the domain defined in the configuration.

What is the status of the assets?

The status of the assets depends on the selected value in the Default Asset Status field during synchronization.

  • If Implemented is selected, all assets receive the Implemented status.
  • If No Status is selected, newly created assets receive the first status listed in your operating model statuses and existing assets keep their assigned status.
Important 

When you integrate a data source without applying Include or Exclude Mappings rules, and then later exclude an integrated asset using an Include or Exclude Mapping during resynchronization, the related assets receive the Missing from source status.

Important 

Do not move the assets to another domain. Doing so may lead to errors during future synchronizations.

The following image shows the result.

Synchronized metadata

Note 
  • We always integrate the latest version of a model.
  • The integrated attributes depend on the training run type.
  • If you do not see the listed synchronized metadata, you can add characteristics to the layout on the asset type page.

Asset type    Synchronized metadata    Resource ID
Databricks AI Model

Description from source system 00000000-0000-0000-0001-000500000074
Model Accuracy 00000000-0000-0000-0000-000000000328
Model Precision 00000000-0000-0000-0000-000000000329
Mean Squared Error 00000000-0000-0000-0000-000000000330
Mean Absolute Error 00000000-0000-0000-0000-000000000331
Model Type 00000000-0000-0000-0000-000000000334
Retrain Cycle 00000000-0000-0000-0000-000000000335
Feature Importance 00000000-0000-0000-0000-000000000333
Version 00000000-0000-0000-0000-000000000263
Repository 00000000-0000-0000-0000-000000003120

Any custom metrics defined via the configuration. If you use custom metrics, ensure that you add them to the assignment and layout on the asset type page.

 
AI Use Case uses / used in AI Model 00000000-0000-0000-0000-000000007098
AI Model trained by / trains Asset 00000000-0000-0000-0000-000000007102
AI Model infers from / used to infer Asset 00000000-0000-0000-0000-000000007103
AI Model has output / is output Asset 00000000-0000-0000-0000-000000007104
AI Model uses / is used by AI Model 00000000-0000-0000-0000-000000007106
AI Model is provided by / provides Vendor 00000000-0000-0000-0000-000000007105
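
If you script against these characteristics, it can help to keep the resource IDs from the table above in one lookup structure. The following Python sketch is only a convenience example: the names AI_MODEL_RESOURCE_IDS and resource_id are hypothetical, the IDs are copied from the table, and how you pass them to a Collibra API is outside the scope of this sketch.

```python
import uuid

# Attribute resource IDs copied from the Databricks AI Model table above.
AI_MODEL_RESOURCE_IDS = {
    "Description from source system": "00000000-0000-0000-0001-000500000074",
    "Model Accuracy": "00000000-0000-0000-0000-000000000328",
    "Model Precision": "00000000-0000-0000-0000-000000000329",
    "Mean Squared Error": "00000000-0000-0000-0000-000000000330",
    "Mean Absolute Error": "00000000-0000-0000-0000-000000000331",
    "Model Type": "00000000-0000-0000-0000-000000000334",
    "Retrain Cycle": "00000000-0000-0000-0000-000000000335",
    "Feature Importance": "00000000-0000-0000-0000-000000000333",
    "Version": "00000000-0000-0000-0000-000000000263",
    "Repository": "00000000-0000-0000-0000-000000003120",
}


def resource_id(characteristic_name):
    """Look up a resource ID and check that it is a well-formed UUID string."""
    # uuid.UUID raises ValueError if the string is not a valid UUID.
    return uuid.UUID(AI_MODEL_RESOURCE_IDS[characteristic_name])


print(resource_id("Model Accuracy"))
```

Parsing each value with uuid.UUID catches copy-and-paste mistakes early, before an ID is used anywhere else.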