Connecting to Hadoop Distributed File System (HDFS)
This section provides an overview of connecting Collibra DQ to Hadoop Distributed File System (HDFS).
General information
Field | Description |
---|---|
Data source | Hadoop Distributed File System (HDFS) |
Supported versions | N/A |
Connection string | hdfs:// |
Packaged? | |
Certified? | |
Supported features: Analyze data | |
Supported features: Archive breaking records | |
Supported features: Estimate job | |
Supported features: Pushdown | |
Processing capabilities: Spark agent | |
Processing capabilities: Yarn agent | |
Minimum user permissions
For Collibra DQ to access your HDFS data, you need the following permissions.
- Read access to the path in your HDFS connection.
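One way to confirm the read permission before you create the connection is the Hadoop shell's `-test -r` check, which exits with status 0 only when the path exists and read permission is granted. The Python sketch below wraps that command. The helper name `has_read_access` is hypothetical, it assumes the `hdfs` CLI is installed and on `PATH`, and it should be run as the same user (or Kerberos principal) that the connection uses, because HDFS evaluates permissions per user.

```python
import subprocess
from typing import Callable


def has_read_access(
    path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if HDFS reports that `path` exists and is readable.

    Hypothetical helper: shells out to `hdfs dfs -test -r <path>`, which
    exits 0 only when the path exists and read permission is granted.
    `run` is injectable so the check can be exercised without a cluster.
    """
    result = run(["hdfs", "dfs", "-test", "-r", path], capture_output=True)
    return result.returncode == 0
```

For example, `has_read_access("hdfs://namenode.example.com:8020/data/sales")` (a hypothetical path; use the one from your Connection URL) returns `False` if the path is missing or not readable by the current user.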
Recommended and required connection properties
Connection Property | Type | Value | Required |
---|---|---|---|
Name | Text | The unique name of your connection. Do not use spaces in your connection name, and use only valid characters. | |
Connection URL | String | The connection string path of your HDFS connection. The path must start with hdfs://. | |
Target Agent | Option | The Agent used to submit your DQ Jobs. | |
Auth Type | Option | The method used to authenticate your connection. Note: The configuration requirements differ depending on the Auth Type you select. See Authentication for details on the available authentication types. | |
Save Credentials | Option | Select this option after you enter your connection details. | |
Properties | String | The configurable driver properties for your connection. Separate multiple properties with commas, for example abc=123,test=true. To keep remote procedure calls (RPC) secure within Collibra DQ, we recommend defining the following driver property: hadoop.rpc.protection=privacy | |
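The Connection URL prefix and the comma-delimited Properties format can be checked before you save a connection. The following Python sketch is illustrative only: the helper names are hypothetical, it assumes each property is a plain `key=value` pair with no commas inside values, and it is not how Collibra DQ itself parses the field.

```python
from typing import Dict, List

# Hypothetical constant mirroring the recommended driver property.
RECOMMENDED_PROPERTIES = {"hadoop.rpc.protection": "privacy"}


def validate_connection_url(url: str) -> None:
    """Raise ValueError unless the URL starts with the hdfs:// prefix."""
    if not url.startswith("hdfs://"):
        raise ValueError(f"Connection URL must start with hdfs://, got {url!r}")


def parse_driver_properties(raw: str) -> Dict[str, str]:
    """Split a comma-delimited 'key=value' string into a dict."""
    props: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Malformed property {item!r}; expected key=value")
        props[key.strip()] = value.strip()
    return props


def missing_recommended(props: Dict[str, str]) -> List[str]:
    """List recommended properties that are absent or set to another value."""
    return [f"{k}={v}" for k, v in RECOMMENDED_PROPERTIES.items() if props.get(k) != v]


if __name__ == "__main__":
    # Hypothetical host and path.
    validate_connection_url("hdfs://namenode.example.com:8020/data/sales")
    props = parse_driver_properties("abc=123,test=true")
    print(props)  # {'abc': '123', 'test': 'true'}
    print(missing_recommended(props))  # ['hadoop.rpc.protection=privacy']
```

Keeping the recommendation in a separate dictionary makes it easy to flag a Properties string that omits `hadoop.rpc.protection=privacy` without rejecting it outright.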
Authentication
Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.
Field | Description |
---|---|
Principal | The service principal used to let Collibra DQ access your connection. |
Key | The keytab used to authorize your connection. |
TGT | The Ticket Granting Ticket used to authorize your connection. |
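In Kerberos, the principal identifies the account, the keytab stores that principal's long-term keys, and the keytab can be exchanged with the KDC for a TGT, typically with `kinit`. The sketch below checks the usual `primary[/instance]@REALM` principal format and builds the MIT Kerberos `kinit -k -t` command that obtains a TGT from a keytab. The helper names, principals, and keytab path are hypothetical, and this section does not state whether Collibra DQ performs this step for you.

```python
import shlex
from typing import List, Optional, Tuple


def parse_principal(principal: str) -> Tuple[str, Optional[str], str]:
    """Split 'primary[/instance]@REALM' into (primary, instance, realm)."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"{principal!r} is not of the form primary[/instance]@REALM")
    primary, _, instance = name.partition("/")
    if not primary:
        raise ValueError(f"{principal!r} has an empty primary component")
    return primary, instance or None, realm


def kinit_command(principal: str, keytab_path: str) -> List[str]:
    """Build a kinit call that requests a TGT using keys from a keytab file.

    -k tells MIT kinit to use a keytab instead of prompting for a password;
    -t names the keytab file.
    """
    parse_principal(principal)  # fail fast on a malformed principal
    return ["kinit", "-k", "-t", keytab_path, principal]


if __name__ == "__main__":
    # Hypothetical service principal and keytab path.
    print(parse_principal("dq/dq-host.example.com@EXAMPLE.COM"))
    # ('dq', 'dq-host.example.com', 'EXAMPLE.COM')
    print(shlex.join(kinit_command("dq-svc@EXAMPLE.COM", "/etc/security/keytabs/dq-svc.keytab")))
    # kinit -k -t /etc/security/keytabs/dq-svc.keytab dq-svc@EXAMPLE.COM
```

After running the printed command on a host with a Kerberos client, `klist` should list the resulting TGT in the credential cache.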