Integrated Azure Data Lake Storage data

Important In Collibra 2024.05, we launched a new user interface (UI) for Collibra Platform. You can learn more about this latest UI in the UI overview.

After you have synchronized the data, the integration of the Azure Data Lake Storage file system is complete, and the resulting assets are available in the domain that was specified in the crawler. The status of the assets depends on the value selected in the Default Asset Status field in the capability:

  • If No Status is selected, newly created assets receive the first status listed in your Operating Model statuses, and existing assets keep their assigned status.
  • If Implemented is selected, all assets receive the Implemented status.
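The status rules above can be read as a small decision function. The following Python sketch is illustrative only and is not Collibra code: the function name `resolve_status`, its parameters, and the sample status names are hypothetical. It also assumes that any selected value other than No Status is applied to all assets in the same way as Implemented, which the text only confirms for Implemented.

```python
from typing import Optional, Sequence

NO_STATUS = "No Status"  # value of the Default Asset Status field


def resolve_status(
    default_asset_status: str,
    existing_status: Optional[str],
    operating_model_statuses: Sequence[str],
) -> str:
    """Return the status an asset would have after synchronization.

    existing_status is None for a newly created asset.
    """
    if default_asset_status == NO_STATUS:
        if existing_status is not None:
            # Existing assets keep their assigned status.
            return existing_status
        if not operating_model_statuses:
            raise ValueError("the Operating Model defines no statuses")
        # New assets receive the first status listed in the Operating Model.
        return operating_model_statuses[0]
    # Assumption: a selected status (for example Implemented) is applied
    # to every asset, new or existing.
    return default_asset_status


# Sample status list (hypothetical values for illustration).
statuses = ["Candidate", "Accepted", "Implemented"]
new_asset_status = resolve_status(NO_STATUS, None, statuses)
kept_status = resolve_status(NO_STATUS, "Accepted", statuses)
forced_status = resolve_status("Implemented", "Accepted", statuses)
```

In this sketch, a new asset ends up with the first listed status, an existing asset keeps its status when No Status is selected, and every asset becomes Implemented when Implemented is selected.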

Warning Do not move the assets to another domain. Doing so may lead to errors during future synchronizations.

Tip ADLS synchronization relies on UUIDs.

Note If a temporary communication issue results in a partial synchronization, the status of the assets that were not synchronized becomes Missing from source. If the assets are identified in the source system during the next fully successful synchronization, the previous statuses are restored.
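To make the note above concrete, here is a minimal Python sketch of the described behavior. It is a model for illustration, not the integration's actual implementation: the `Asset` class, the `apply_sync` function, and the `previous_status` field are hypothetical names, and the sketch deliberately does not model what happens to assets that are absent from a fully successful synchronization, because the text does not describe that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

MISSING = "Missing from source"


@dataclass
class Asset:
    name: str
    status: str
    previous_status: Optional[str] = None  # remembered while the asset is missing


def apply_sync(assets: Dict[str, Asset], synchronized: Set[str],
               fully_successful: bool) -> None:
    """Update asset statuses after one synchronization run.

    synchronized holds the keys of the assets identified in the source.
    """
    for key, asset in assets.items():
        if not fully_successful:
            # Partial run: assets that were not synchronized become missing.
            if key not in synchronized and asset.status != MISSING:
                asset.previous_status = asset.status
                asset.status = MISSING
        elif key in synchronized and asset.status == MISSING:
            # Fully successful run: restore the status the asset had before.
            if asset.previous_status is not None:
                asset.status = asset.previous_status
                asset.previous_status = None


# Hypothetical sample data.
assets = {
    "a": Asset("sales.csv", "Implemented"),
    "b": Asset("orders.csv", "Candidate"),
}
apply_sync(assets, {"a"}, fully_successful=False)      # "b" was not reached
status_after_partial = assets["b"].status
apply_sync(assets, {"a", "b"}, fully_successful=True)  # "b" is found again
status_after_full = assets["b"].status
```

After the partial run, the asset that was not reached has the Missing from source status. After the next fully successful run, it has its earlier status again.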

By default, the assets are shown in a plain list, but you can enable a multi-path hierarchy to show them in a tree structure. The resulting assets depend on whether you use Microsoft Purview.

Synchronization results without Microsoft Purview

For the best result, use the following relations when you define a multi-path hierarchy:

  • File Storage contains Storage Container
  • Storage Container contains Storage Container
  • Directory contains Directory
  • Storage Container contains File

Synchronization results with Microsoft Purview

For the best result, use the following relations when you define a multi-path hierarchy:

  • File Storage contains Storage Container
  • Storage Container contains Storage Container
  • Directory contains Directory
  • Storage Container contains File
  • File contains Table
  • Table contains Column
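As a rough illustration of how the relations listed above turn a plain list into a tree, the following Python sketch keeps only the asset pairs whose types match one of those relations and renders the result as an indented outline. It is not how Collibra builds the hierarchy internally; the function names, the `(asset type, name)` tuples, and the sample asset names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Asset = Tuple[str, str]  # (asset type, asset name)

# (parent asset type, child asset type) pairs taken from the relations above.
HIERARCHY_RELATIONS = {
    ("File Storage", "Storage Container"),
    ("Storage Container", "Storage Container"),
    ("Directory", "Directory"),
    ("Storage Container", "File"),
    ("File", "Table"),
    ("Table", "Column"),
}


def build_tree(relations: List[Tuple[Asset, Asset]]) -> Dict[Asset, List[Asset]]:
    """Group children under their parent, keeping only hierarchy relations."""
    children: Dict[Asset, List[Asset]] = defaultdict(list)
    for parent, child in relations:
        if (parent[0], child[0]) in HIERARCHY_RELATIONS:
            children[parent].append(child)
    return dict(children)


def render(tree: Dict[Asset, List[Asset]], root: Asset, depth: int = 0) -> List[str]:
    """Return the tree below root as indented lines, two spaces per level."""
    lines = ["  " * depth + f"{root[1]} ({root[0]})"]
    for child in tree.get(root, []):
        lines.extend(render(tree, child, depth + 1))
    return lines


# Hypothetical sample assets.
account = ("File Storage", "mystorageaccount")
container = ("Storage Container", "raw")
data_file = ("File", "sales.csv")
table = ("Table", "sales")
column = ("Column", "amount")

tree = build_tree([
    (account, container),
    (container, data_file),
    (data_file, table),
    (table, column),
    (column, table),  # not a hierarchy relation, so it is ignored
])
outline = render(tree, account)
print("\n".join(outline))
```

Without Microsoft Purview, the same sketch applies with the last two relations (File contains Table, Table contains Column) left out.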

Synchronized metadata per asset type

This table shows the metadata for each ADLS asset type.

| Asset type | Synchronized metadata | Public ID |
|---|---|---|
| ADLS Storage Account | File Storage contains / is part of Storage Container | FileStorageContainsFileContainer |
| ADLS Container | Location | Location |
| ADLS Container | Storage Container contains / is part of Storage Container | FileContainerContainsFileContainer |
| Directory | Storage Container contains / is part of Storage Container | FileContainerContainsFileContainer |
| Directory | Directory contains / is part of Directory | DirectoryContainsDirectory |
| File | File Type | FileType |
| File | Size | Size |
| File | Storage Container contains / contained in File | FileContainerContainsFile |
| Table | Description | Description |
| Table | File contains / is part of Table | FileContainsTable |
| Column | Description | Description |
| Column | Column Position | ColumnPosition |
| Column | Technical Data Type (see the tip below) | TechnicalDataType |
| Column | Column is part of / contains Table | ColumnIsPartOfTable |

Tip For columns that have a structured technical data type, Array or Struct, you can open a dialog box that shows the structure of the data. The technical data type is shown in the Technical Data Type field in the At a glance sidebar of the Column asset. If this sidebar is hidden, expand it first. For a structured type, click the hyperlink to see the structure of the data in a dialog box. In other locations, for example in Table assets, click the View Array or View Struct button to open the dialog box.
This is supported for AVRO, CSV, JSON, ORC, PARQUET, PSV, SSV, TSV, TXT, and XML.
In the capability settings, you can define the maximum level you want to see in the structure.