Connecting to Azure Data Lake Storage
This section contains an overview of Azure Data Lake Storage Gen2 (ADLS).
General information
Field | Description |
---|---|
Data source | Azure Data Lake Storage Gen2 (ADLS) |
Supported versions | N/A |
Connection string | abfss:// |
Packaged? | |
Certified? | |
Supported features: Analyze data | |
Supported features: Archive breaking records | |
Supported features: Estimate job | |
Supported features: Pushdown | |
Processing capabilities: Spark agent | |
Processing capabilities: Yarn agent | |
Minimum user permissions
For Collibra DQ to access your Azure Storage containers, you need the following user permissions:
- ROLE_ADMIN in Collibra DQ.
- Read access on your Azure Storage containers to create a basic ADLS connection.
- Read and write access to the Azure Storage containers where Collibra DQ archives breaking records. This is only required if you use the archive breaking records feature.
Recommended and required connection properties
Required | Connection Property | Type | Value |
---|---|---|---|
| Name | Text | The unique name used for your connection. |
| Connection URL | String | The connection string value of your ADLS connection: abfss://<file system>@<account name>.dfs.core.windows.net/<path>/<file name> Tip: See the Connection URL elements section for more information on the syntax. |
| Target Agent | Text | The Agent used to submit your DQ Job. |
| Auth Type | Option | The method used to authenticate your connection. Note: The configuration requirements differ depending on the Auth Type you select. See Authentication for more details on the available authentication types. |
| Driver Properties | String | The configurable driver properties for your connection. Multiple properties must be comma-delimited, for example, abc=123,test=true |
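The Driver Properties value is a single comma-delimited string of key=value pairs. The following Python sketch shows one way to split such a string into a dictionary so you can check its shape before pasting it into the field. The function name parse_driver_properties is hypothetical. It assumes that keys never contain "=" and values never contain ",". Collibra DQ's own parsing rules are not documented on this page.

```python
def parse_driver_properties(raw: str) -> dict[str, str]:
    """Split a comma-delimited 'key=value' string into a dict.

    Hypothetical helper for illustration only. Keys must not contain '=',
    and values must not contain ',' (a value may contain '=' because only
    the first '=' in each pair is treated as the separator).
    """
    props: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        props[key.strip()] = value.strip()
    return props


print(parse_driver_properties("abc=123,test=true"))  # {'abc': '123', 'test': 'true'}
```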
Connection URL elements
Field | Description |
---|---|
File scheme | The abfss protocol is used as the scheme identifier. |
File system | The parent location that holds the files and folders. This is the same as containers in the Azure Storage Blob service. |
Account name | The name given to your storage account during creation. |
Path | A forward-slash-delimited (/) representation of the directory structure. |
File name | The name of the individual file. This parameter is optional when you address a directory. |
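To see how the elements in the table above fit together, here is a minimal Python sketch that assembles a Connection URL from its parts and splits one back apart. The helper names build_abfss_url and split_abfss_url and the sample values (sales, myaccount, raw/2024, orders.csv) are hypothetical. The sketch assumes the public Azure host suffix dfs.core.windows.net.

```python
from urllib.parse import urlsplit


def build_abfss_url(file_system: str, account_name: str, path: str = "", file_name: str = "") -> str:
    """Assemble abfss://<file system>@<account name>.dfs.core.windows.net/<path>/<file name>."""
    # Skip empty segments so a directory-only URL has no trailing file name.
    segments = [s.strip("/") for s in (path, file_name) if s.strip("/")]
    return f"abfss://{file_system}@{account_name}.dfs.core.windows.net/" + "/".join(segments)


def split_abfss_url(url: str) -> dict[str, str]:
    """Break an abfss URL back into its file system, account name, and path."""
    parts = urlsplit(url)
    if parts.scheme != "abfss":
        raise ValueError(f"expected the abfss scheme, got {parts.scheme!r}")
    file_system, _, host = parts.netloc.partition("@")
    return {
        "file_system": file_system,
        "account_name": host.split(".", 1)[0],
        # Path and optional file name together: the URL alone cannot tell a file from a directory.
        "path": parts.path.lstrip("/"),
    }


print(build_abfss_url("sales", "myaccount", "raw/2024", "orders.csv"))
# abfss://sales@myaccount.dfs.core.windows.net/raw/2024/orders.csv
```

Because the file name is optional, calling build_abfss_url without it produces a URL that addresses the directory itself.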
Authentication
Select an authentication type from the dropdown menu. The dropdown menu lists the authentication types currently supported for this data source.
Field | Auth type | Description |
---|---|---|
Storage Account | Key | The Azure Storage account name. |
Key | Key | The authentication key for the storage account. |
TenantId | Service Principal | The Microsoft Entra ID tenant used to access data. If not specified, your default tenant is used. |
ClientId | Service Principal | The client ID assigned when you register your application with an OAuth authorization server. |
Secret | Service Principal | The client secret assigned when you register your application with an OAuth authorization server. |
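Collibra DQ runs jobs on Spark, so the authentication fields are likely passed to Spark as the standard Hadoop ABFS (hadoop-azure) settings. This page does not confirm that, so treat it as an assumption. The Python sketch below shows the documented hadoop-azure property names that correspond to the Key and Service Principal fields, which can help when you troubleshoot access outside Collibra DQ. The helper names and the sample values are hypothetical. The sketch assumes the public cloud hosts dfs.core.windows.net and login.microsoftonline.com, and it needs an explicit tenant ID.

```python
# Assumption: public Azure cloud. Sovereign clouds use different host suffixes.
ABFS_HOST_SUFFIX = "dfs.core.windows.net"


def abfs_key_config(account: str, key: str) -> dict[str, str]:
    """Hadoop ABFS shared-key setting matching the Storage Account and Key fields."""
    host = f"{account}.{ABFS_HOST_SUFFIX}"
    return {f"fs.azure.account.key.{host}": key}


def abfs_service_principal_config(account: str, tenant_id: str, client_id: str, secret: str) -> dict[str, str]:
    """Hadoop ABFS client-credential OAuth settings matching TenantId, ClientId, and Secret."""
    host = f"{account}.{ABFS_HOST_SUFFIX}"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": secret,
        # The token endpoint embeds the tenant ID.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


for name, value in abfs_key_config("myaccount", "<key>").items():
    print(f"{name}={value}")
```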
Configure private endpoints for key-based authentication
- Fill out the required fields on the Details tab.
- Select Key from the Authentication type dropdown menu, then fill out the Storage Account and Key fields.
- In the Driver Properties field on the Properties tab, add cloud.endpoint=<endpoint> to enable the Azure Data Lake Storage (Gen2) private endpoint for key-based authentication.
- Click Submit.
Note: The <endpoint> value can be any Microsoft cloud endpoint. Depending on the location, Azure uses *.windows.net and *.chinacloudapi.cn, and Microsoft Entra ID uses https://login.microsoftonline.com and https://login.chinacloudapi.cn. For more information, go to the official Microsoft documentation.
Example: To use the usgovcloudapi.net endpoint, add cloud.endpoint=usgovcloudapi.net.
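Because Driver Properties are comma-delimited, the cloud.endpoint entry from the steps above is appended to any properties you already have. The Python sketch below builds that value. It also shows the storage host you would expect for a given suffix, assuming that cloud.endpoint replaces the default windows.net suffix in <account name>.dfs.core.windows.net. Microsoft documents hosts such as <account>.dfs.core.usgovcloudapi.net for Azure Government, but how Collibra DQ applies the value internally is an assumption. The helper names with_cloud_endpoint and dfs_host are hypothetical.

```python
def with_cloud_endpoint(driver_properties: str, endpoint: str) -> str:
    """Append cloud.endpoint=<endpoint> to an existing comma-delimited Driver Properties value."""
    entry = f"cloud.endpoint={endpoint}"
    existing = driver_properties.strip().strip(",")
    return f"{existing},{entry}" if existing else entry


def dfs_host(account: str, endpoint: str = "windows.net") -> str:
    """Expected storage host, assuming cloud.endpoint swaps the default windows.net suffix."""
    return f"{account}.dfs.core.{endpoint}"


print(with_cloud_endpoint("abc=123", "usgovcloudapi.net"))  # abc=123,cloud.endpoint=usgovcloudapi.net
print(dfs_host("myaccount", "usgovcloudapi.net"))  # myaccount.dfs.core.usgovcloudapi.net
```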
Configure private endpoints for service principal-based authentication
- Fill out the required fields on the Details tab.
- Select Service Principal from the Authentication type dropdown menu, then fill out the TenantId, ClientId, and Secret fields.
- In the Driver Properties field on the Properties tab, add cloud.endpoint=<endpoint> to enable the Azure Data Lake Storage (Gen2) private endpoint for service principal-based authentication.
- Click Submit.
Note: The <endpoint> value can be any Microsoft cloud endpoint. Depending on the location, Azure uses *.windows.net and *.chinacloudapi.cn, and Microsoft Entra ID uses https://login.microsoftonline.com and https://login.chinacloudapi.cn. For more information, go to the official Microsoft documentation.
Example: To use the microsoftonline.us endpoint, add cloud.endpoint=microsoftonline.us.
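For service principal authentication, the endpoint in the example above is a Microsoft Entra ID login domain rather than a storage suffix. The Python sketch below shows the OAuth 2.0 token URL you would expect for a tenant. It assumes that cloud.endpoint replaces the microsoftonline.com part of https://login.microsoftonline.com. The login hosts themselves (login.microsoftonline.com, login.microsoftonline.us, login.chinacloudapi.cn) are Microsoft's published authorities, but how Collibra DQ uses the value is not stated on this page. The function name token_endpoint and the tenant value are hypothetical.

```python
def token_endpoint(tenant_id: str, endpoint: str = "microsoftonline.com") -> str:
    """OAuth 2.0 token URL, assuming cloud.endpoint names the Entra ID login domain."""
    return f"https://login.{endpoint}/{tenant_id}/oauth2/token"


print(token_endpoint("my-tenant-id", "microsoftonline.us"))
# https://login.microsoftonline.us/my-tenant-id/oauth2/token
```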