About the Azure Data Lake Storage file system integration

The Azure Data Lake Storage file system integration lets you register Azure Data Lake Storage (ADLS) as a data source in Collibra and synchronize its metadata. ADLS is a service built on Microsoft Azure Blob Storage. After the integration, the files and directories of the ADLS file system are represented in Collibra by specific asset types and keep their original names.
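To make the mapping of files and directories to assets more concrete, here is a minimal sketch in Python. It is not Collibra's implementation: the `Asset` class, the "Directory" and "File" labels, and the sample path are hypothetical stand-ins for whatever the ADLS operating model defines. It only illustrates the idea that each directory and file on a path could become one asset that keeps its original name.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a Collibra asset (not a real Collibra class)."""
    asset_type: str  # illustrative label only: "Directory" or "File"
    name: str        # original ADLS name, kept unchanged
    full_path: str   # path inside the file system


def assets_for_path(file_path: str) -> list[Asset]:
    """Return one asset per directory on the path, then one for the file."""
    path = PurePosixPath(file_path)
    assets = []
    # parents are listed deepest first, so reverse them to walk top-down
    for parent in reversed(list(path.parents)):
        if parent.name:  # skip the "." or "/" root entry, which has no name
            assets.append(Asset("Directory", parent.name, str(parent)))
    assets.append(Asset("File", path.name, str(path)))
    return assets


# Hypothetical sample path inside an ADLS file system
for asset in assets_for_path("raw/sales/orders.csv"):
    print(asset.asset_type, asset.name, asset.full_path)
```

In this sketch, the names `raw`, `sales`, and `orders.csv` are carried over unchanged, which mirrors the statement that assets retain the original names.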

Important 
  • The ADLS integration supports Azure Data Lake Storage Gen2.
    Azure Data Lake Storage Gen1 is not supported. To verify which ADLS generation you are using, check the Account Kind in the Overview section of your Azure storage account details. StorageV2 indicates that you are using Gen2.

  • You can integrate an Azure Data Lake Storage file system only via Edge.
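If you prefer to script the Account Kind check rather than read it in the Azure portal, the rule above can be sketched as follows. This is a minimal sketch that assumes the storage account details are already available as JSON with a `kind` field holding the Account Kind. The field name, the helper `looks_like_gen2`, and the sample values are assumptions for illustration, not part of the Collibra integration.

```python
import json


def looks_like_gen2(account: dict) -> bool:
    """Apply the rule from the text: Account Kind "StorageV2" indicates Gen2.

    `account` is assumed to be the storage account details parsed from JSON,
    with the Account Kind stored under the (assumed) key "kind".
    """
    return account.get("kind") == "StorageV2"


# Assumed sample: a trimmed-down account description with made-up values.
sample = json.loads('{"name": "examplestorage", "kind": "StorageV2"}')

if looks_like_gen2(sample):
    print("Account Kind is StorageV2: Gen2, supported by the ADLS integration.")
else:
    print("Account Kind is not StorageV2: check your storage account.")
```

The check is deliberately limited to the Account Kind described above. It does not contact Azure and does not validate anything else about the account.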

For detailed information on Microsoft Azure Data Lake Storage Gen2, go to the Azure documentation.

About Microsoft Purview

The ADLS integration supports Microsoft Purview, a service used for schema discovery.
With Purview, you can integrate the schemas, tables, and columns from the files into a single File asset in Collibra rather than multiple File assets. For more details, go to the ADLS operating model.

Important 
  • Profiling and sampling are currently not supported, even if you use Microsoft Purview to integrate schemas and tables.
  • Currently, the ADLS integration can ingest up to 100,000 assets from Purview.

For detailed information on Microsoft Purview, go to the Purview documentation.

What's next?

Steps overview: Integrate an Azure Data Lake Storage file system

Learn more

To learn about the ADLS integration and watch videos, follow our University course.