Troubleshooting Azure Data Lake Storage integration

1 Where do I find the Edge Site Id and Job Id?

If you report an error with the Azure Data Lake Storage (ADLS) integration, the Customer Support team may ask you for the Edge Site Id and Job Id. The team needs this information to access details about the error.

To retrieve the Job Id, see View the summary of an Azure Data Lake Storage synchronization.

To retrieve the Site Id:

  1. Go to Settings.
  2. In the Edge section, click Sites.
  3. Click the name of the Edge site.
  4. The Edge site Id is available in the ID field.

2 You receive the error when synchronizing ADLS: You are not allowed to perform this action.

Issue: You receive the following error: "Error while processing crawler catalogingestion: Import job failed with message. You are not allowed to perform this action."

Reason: The ADLS synchronization capability does not store any user credentials. It calls the Import API with the credentials of the Edge site user. By default, the Edge site user is not allowed to add new assets in Collibra.

Solution: Give the Edge site user the 'Manage all resources' permission. To do so, go to Settings > Global Permissions and, for the Edge site role, select the Resources > Manage all resources permission.

3 You receive an error when synchronizing ADLS indicating there is an issue with the Service Principal permissions

Issue: You receive an error when synchronizing ADLS indicating there is an issue with the Service Principal permissions.

Possible reason: Something is wrong in the configuration of the Service Principal account or the definition of the ADLS Include path.

Troubleshooting:

  1. Install the Azure Command-Line Interface (CLI) as described in the Azure documentation.
  2. Open the Azure CLI and log in as the service principal.
    Use the following format: az login --service-principal -u <app-id> -p <password-or-cert> --tenant <tenant>
    For more information, go to the Azure documentation.
  3. List the assigned roles.
    Use the following code: az role assignment list --all
    Output: You receive a list of permissions and roles. You should see the roleDefinitionName values "Reader" and "Storage Blob Data Reader".
    For more information, go to the Azure documentation.
  4. List the containers.
    Use the following format: az storage container list --auth-mode login --account-name <account name> | grep name
    Output: You receive a list of containers. You should see the container that holds your data and that you reference in the Include Path, which has the format https://<storage account name>.blob.core.windows.net/<container name>/<blob name>.
    For more information, go to the Azure documentation.
  5. Check that the directory or blob you reference in the Include Path exists.
    Use the following format: az storage fs directory exists -n <directory> -f <file system> --account-name <account name> --auth-mode login
    Output: The command should report that the directory exists (true).
    For more information, go to the Azure documentation.
  6. To ingest Purview collections into Collibra, the service principal needs read permissions on the Purview collection. You can grant it the Collection admins permission using the Azure CLI.
    1. Log in to the Azure CLI.
      Use the following code: az login
    2. Install the Azure Purview CLI extension.
      Use the following code: az extension add --name purview
    3. Add the service principal as a root collection admin.
      Use the following format: az purview account add-root-collection-admin --name <account name> --object-id <Service principal Object Id> --resource-group <SampleResourceGroup>
      You can find the Service principal Object Id in the IAM account overview.
  7. If your ADLS storage is private, make sure that the Allow Azure services on the trusted services list to access this storage account checkbox is selected in Networking > Firewalls and virtual networks.

    Your ADLS storage is private if you set the Public network access option to Enabled from selected virtual networks and IP addresses.
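Some of the checks in steps 3 to 5 can be partly scripted. The following is a minimal POSIX shell sketch, not a Collibra or Microsoft tool: the function names parse_include_path and missing_roles are hypothetical, and the sketch assumes the Include Path has the form https://<storage account name>.blob.core.windows.net/<container name>/<blob name> with no spaces in the names. parse_include_path splits the URL into the account, container, and path values used by the az storage commands, and missing_roles reports which of the two expected roles are absent from a list of role names.

```shell
#!/bin/sh
# Hypothetical helpers for the service principal checks (steps 3 to 5).
# They are not part of Collibra or the Azure CLI. Function bodies use ( )
# so their variables stay local to a subshell.

# parse_include_path URL
# Assumes an Include Path of the form
#   https://<storage account name>.blob.core.windows.net/<container name>/<blob name>
# and prints "<account> <container> <blob name>" (names must not contain spaces).
parse_include_path() (
  rest=${1#https://}              # drop the scheme
  host=${rest%%/*}                # <account>.blob.core.windows.net
  account=${host%%.*}             # <account>
  case $rest in
    */*) path=${rest#*/} ;;       # <container>/<blob name>
    *)   path="" ;;               # URL has no container part
  esac
  container=${path%%/*}
  case $path in
    */*) blob=${path#*/} ;;       # everything after the container
    *)   blob="" ;;
  esac
  printf '%s %s %s\n' "$account" "$container" "$blob"
)

# missing_roles
# Reads role definition names (one per line) on stdin and prints every
# expected role that is not in the list. Empty output means both are present.
missing_roles() (
  roles=$(cat)
  for expected in "Reader" "Storage Blob Data Reader"; do
    # -F: fixed string, -x: whole-line match, -q: no output
    if ! printf '%s\n' "$roles" | grep -Fxq -- "$expected"; then
      printf '%s\n' "$expected"
    fi
  done
)
```

For example, after logging in as the service principal, you could pipe az role assignment list --all --query "[].roleDefinitionName" -o tsv into missing_roles (adding --assignee <app-id> limits the list to the service principal); empty output means both roles are assigned. The three values printed by parse_include_path correspond to <account name>, <file system> (the container), and <directory> in the az storage fs directory exists command from step 5.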

4 You receive the error when synchronizing ADLS: Crawler path does not exist

Issue: During the synchronization, you receive the following error: Crawler path does not exist on ADLS ....

Possible reason: The Hierarchical namespace property is probably not enabled on your ADLS storage account.

Solution: In Microsoft Azure, open your storage account and, in Overview > Properties, enable the Blob service property Hierarchical namespace.
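To confirm the setting from the command line, you can query the storage account's isHnsEnabled property, for example with az storage account show --name <account name> --query isHnsEnabled -o tsv. The hypothetical hns_enabled function below is a small POSIX shell sketch that interprets that value. It assumes the CLI prints true or True when the namespace is enabled, and it treats anything else (false, an empty value, None) as not enabled.

```shell
#!/bin/sh
# Hypothetical helper: reads the isHnsEnabled value printed by
#   az storage account show --name <account name> --query isHnsEnabled -o tsv
# Returns 0 if Hierarchical namespace is enabled, 1 otherwise.
hns_enabled() (
  value=$(tr -d '[:space:]')      # trim newline/whitespace from the CLI output
  case $value in
    true|True)
      echo "Hierarchical namespace is enabled"
      exit 0 ;;                   # exit leaves only the ( ) subshell
    *)
      echo "Hierarchical namespace is not enabled (got: '${value:-empty}')"
      exit 1 ;;
  esac
)
```

Usage: az storage account show --name <account name> --query isHnsEnabled -o tsv | hns_enabled. A non-zero exit status suggests the Crawler path does not exist error is caused by the missing Hierarchical namespace setting.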

5 You receive an error: Offset should not be greater than 100000

Issue: During the synchronization, you receive the following error: "Illegal argument: Offset should not be greater than 100000."

Reason: Currently, the ADLS integration can ingest only up to 100,000 assets from Purview.