Create an Azure Data Lake Storage connection to an Edge site

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

Prerequisites

  • In Azure:
    • To integrate ADLS folders, you need an Azure Service Principal user that is defined in Azure and that has permissions to list the files which need to be integrated into Collibra. The Azure Service Principal user must have the "Reader" and "Storage Blob Data Reader" roles for the storage locations of your data. For information, go to the Azure documentation.
    • If you use Microsoft Purview:
      • The Azure Service Principal user must have the "Data reader" role to fetch entities and assets from the Microsoft Purview REST API. For information, go to the Microsoft Purview documentation.
      • If your ADLS storage is private, make sure that the "Allow Azure services on the trusted services list to access this storage account" checkbox under Networking > Firewalls and virtual networks is selected.
  • You have created and installed an Edge site.
  • If you have configured a forward proxy for your Edge site and want the integration API calls to bypass this proxy, update the Edge nonProxy property:
    • Adding login.microsoftonline.com allows the API calls that get access tokens to bypass the proxy.
    • Adding dfs.core.windows.net or blob.core.windows.net allows the ADLS API calls to bypass the proxy.
    • Adding purview.azure.com allows the Purview APIs to bypass the proxy.
  • You have given the Edge Site role the required permissions.
  • You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
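
The proxy bypass entries above behave like hostname suffixes: a call skips the proxy when its host equals a listed entry or is a subdomain of one. The following is a minimal sketch of that idea, not Collibra's implementation. The exact syntax of the Edge nonProxy property is not shown on this page, so the list format, the function name bypasses_proxy, and the storage account name mystorageaccount are assumptions made for illustration.

```python
# Illustrative only: the host names come from the prerequisites above;
# the matching logic is an assumption, not Collibra's implementation.
NON_PROXY_HOSTS = [
    "login.microsoftonline.com",  # API calls that get access tokens
    "dfs.core.windows.net",       # ADLS API calls
    "blob.core.windows.net",      # ADLS API calls (alternative endpoint)
    "purview.azure.com",          # Microsoft Purview APIs
]


def bypasses_proxy(hostname, non_proxy_hosts=NON_PROXY_HOSTS):
    """Return True if hostname equals, or is a subdomain of, a listed host."""
    host = hostname.lower().rstrip(".")
    return any(
        host == entry or host.endswith("." + entry)
        for entry in non_proxy_hosts
    )


if __name__ == "__main__":
    for name in (
        "login.microsoftonline.com",
        "mystorageaccount.dfs.core.windows.net",  # hypothetical account name
        "example.com",
    ):
        print(name, bypasses_proxy(name))
```

A check like this can help you confirm, before you edit the property, that each endpoint your integration calls is covered by one of the entries.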

Steps

  1. Open an Edge site.
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Collibra settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of the Edge sites.
    3. In the table, click the name of the Edge site whose status is Healthy.
      The Edge site page opens.
  2. In the Connections section, click Create connection.
    The Create connection page appears.
  3. Enter the required information.
    Connection settings
    This section contains the general settings of your connection.

    | Field | Description | Required |
    |---|---|---|
    | Name | The name of the Edge connection for Azure Data Lake Storage. | Yes |
    | Description | The description of the connection. | No |
    | Connection provider | The connection provider, which determines the available connection parameters. Select the Azure connection to connect to Azure Data Lake Storage. | Yes |

    Connection parameters
    This section contains the settings to connect to your data source.

    | Field | Description | Required |
    |---|---|---|
    | Service Principal ID | The Application ID of the account used to connect to Azure. For information on the Azure Service Principal user and the Application ID, go to the Azure documentation. | Yes |
    | Service Principal Secret | The application secret for the Service Principal. For information on the application secret value, go to the Azure documentation. | Yes |
    | Encryption options | Select the type of encryption used to store the Service Principal Secret. The default is "To be encrypted by Edge management server". | Yes |
    | Tenant ID | The Tenant ID of your Azure Active Directory. For information on the Directory (tenant) ID, go to the Azure documentation. | Yes |
  4. Click Create.
    The connection is added to the Edge site.
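
The three connection parameters correspond to the values of a standard OAuth 2.0 client credentials request to login.microsoftonline.com. As a way to check your values outside Collibra, the sketch below builds such a request without sending it. This page does not describe how Edge itself obtains tokens, so treat the flow as an assumption. The function name build_token_request, the Azure Storage scope, and the placeholder IDs are chosen for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://storage.azure.com/.default"):
    """Build (but do not send) a client credentials token request.

    tenant_id     -> the Tenant ID field
    client_id     -> the Service Principal ID field
    client_secret -> the Service Principal Secret field
    The default scope targets Azure Storage; this is an assumption.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own before sending anything.
    request = build_token_request(
        "00000000-0000-0000-0000-000000000000",
        "11111111-1111-1111-1111-111111111111",
        "not-a-real-secret",
    )
    print(request.get_method(), request.full_url)
```

To test real credentials, pass the request to urllib.request.urlopen and read the access_token field from the JSON response. A successful response only confirms the ID, secret, and tenant. It does not confirm the role assignments listed in the prerequisites.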

What's next?

You can now add the ADLS synchronization capability to an Edge site.

Available vaults

Tip 

You can use a vault to add your data source information to your Edge site connection.

  • None
  • AWS Secrets Manager
  • Azure Key Vault
  • CyberArk Vault
  • Google Secret Manager
  • HashiCorp Vault

Prerequisites

  • In Azure:
    • To integrate ADLS folders, you need an Azure Service Principal user that is defined in Azure and that has permissions to list the files which need to be integrated into Collibra. The Azure Service Principal user must have the "Reader" and "Storage Blob Data Reader" roles for the storage locations of your data. For information, go to the Azure documentation.
    • If you use Microsoft Purview:
      • The Azure Service Principal user must have the "Data reader" role to fetch entities and assets from the Microsoft Purview REST API. For information, go to the Microsoft Purview documentation.
      • If your ADLS storage is private, make sure that the "Allow Azure services on the trusted services list to access this storage account" checkbox under Networking > Firewalls and virtual networks is selected.
  • You have created and installed an Edge site.
  • You have given the Edge Site role the required permissions.
  • You have added a vault to your Edge site.
  • If your data source connection requires a file from your vault, the file must be encoded into Base64 and stored as a regular secret in your vault.
  • If you have configured a forward proxy for your Edge site and want the integration API calls to bypass this proxy, update the Edge nonProxy property:
    • Adding login.microsoftonline.com allows the API calls that get access tokens to bypass the proxy.
    • Adding dfs.core.windows.net or blob.core.windows.net allows the ADLS API calls to bypass the proxy.
    • Adding purview.azure.com allows the Purview APIs to bypass the proxy.
  • You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
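
If you need to store a file in your vault, as described in the prerequisites above, you first have to turn it into a Base64 string. The sketch below shows one way to do that with the Python standard library. The function name encode_file_for_vault and the file name in the usage example are hypothetical. How you then save the string as a secret depends on your vault product and is not covered here.

```python
import base64
from pathlib import Path


def encode_file_for_vault(path):
    """Return the content of the file as a single-line Base64 string."""
    raw_bytes = Path(path).read_bytes()
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_from_vault(encoded):
    """Reverse the encoding, for example to verify a stored secret."""
    return base64.b64decode(encoded)


if __name__ == "__main__":
    import sys

    # Usage: python encode_file.py my-file.pem  (hypothetical file name)
    print(encode_file_for_vault(sys.argv[1]))
```

Decoding the stored value and comparing it with the original file is a quick way to confirm that nothing was truncated or re-wrapped when you pasted the secret into the vault.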

Steps

  1. Open an Edge site.
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Collibra settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of the Edge sites.
    3. In the table, click the name of the Edge site whose status is Healthy.
      The Edge site page opens.
  2. In the Connections section, click Create connection.
    The Create connection page appears.
  3. Select the Azure connection to connect to Azure Data Lake Storage.
  4. Enter the required information.
    | Field | Description | Required |
    |---|---|---|
    | Name | The name of the Edge connection for Azure Data Lake Storage. | Yes |
    | Description | The description of the connection. | No |
    | Vault | The vault where you store your data source values. | No |
    | Service Principal ID | The Application ID of the account used to connect to Azure. For information on the Azure Service Principal user and the Application ID, go to the Azure documentation. | Yes |
    | Service Principal Secret | The application secret for the Service Principal. For information on the application secret value, go to the Azure documentation. | Yes |
    | Tenant ID | The Tenant ID of your Azure Active Directory. For information on the Directory (tenant) ID, go to the Azure documentation. | Yes |
  5. Click Create.
    The connection is added to the Edge site.
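
Before you click Create, it can help to check that every required field has a value. The sketch below encodes the Required column of the field table for the vault-based connection as a simple pre-flight check. It is an illustration only: the function name missing_required and the dictionary layout are assumptions, and Collibra does not expose the form in this way.

```python
# Field names and required flags are taken from the field table in step 4.
REQUIRED_FIELDS = (
    "Name",
    "Service Principal ID",
    "Service Principal Secret",
    "Tenant ID",
)
OPTIONAL_FIELDS = ("Description", "Vault")


def missing_required(values):
    """Return the required fields that are absent or blank, in table order."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(values.get(field) or "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical draft of the connection form.
    draft = {"Name": "adls-connection", "Vault": "my-vault", "Tenant ID": " "}
    print(missing_required(draft))
```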

What's next?

You can now add the ADLS synchronization capability to an Edge site.