Registering a Databricks file system via the Databricks JDBC connector and Edge
If you register a specific Databricks data source via the Databricks JDBC connector, the resulting assets represent the columns and the tables in the Databricks database.
After registration, you can retrieve sample data and profile and classify the data.
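For orientation, the sketch below shows the general shape of a Databricks JDBC connection URL for personal access token authentication, as described in the Databricks JDBC driver documentation. The helper name `build_databricks_jdbc_url` and the example host and HTTP path are hypothetical. How your Edge connection expects these values (as one connection string or as separate connection properties) is not covered here, so treat this only as an illustration of the values you are likely to need.

```python
def build_databricks_jdbc_url(host: str, http_path: str, port: int = 443) -> str:
    """Build a Databricks JDBC URL without the secret token.

    Hypothetical helper for illustration only. AuthMech=3 with UID=token
    selects personal access token authentication; the token itself (PWD)
    is deliberately left out so it can be supplied through a secure field.
    """
    if not host or "://" in host or "/" in host:
        raise ValueError("host must be a bare hostname, without scheme or path")
    if not http_path:
        raise ValueError("http_path must not be empty")
    return (
        f"jdbc:databricks://{host}:{port};"
        f"httpPath={http_path};AuthMech=3;UID=token"
    )


# Example with placeholder values (assumptions, not real endpoints).
url = build_databricks_jdbc_url(
    "example.cloud.databricks.com", "/sql/1.0/warehouses/abc123"
)
print(url)
```

The host name and HTTP path are typically shown in the connection details of the Databricks compute resource or SQL warehouse that you want to register.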
Before you begin
- You have enabled the following settings:
  - Database registration via Edge to allow registering a data source via Edge.
  - Database profiling via Edge to allow profiling and classification via Edge.
- You have created and installed an Edge site.
Steps
| Phase | Step | What? | Description | Results |
|---|---|---|---|---|
| Preparation | 1 | | Add a Databricks JDBC connection to your Edge site. | |
| Preparation | 2 | Add the following capabilities: | Adds the required capabilities to the Databricks connection. | |
| Setup | 3 | Register the data source | Registering a data source creates the structure for the metadata in Collibra. | |
| Setup | 4 | | Select the schemas and tables that you want to ingest. | The information on the Configuration tab page of the Database asset is filled in. |
| Registration | 5 | | Synchronize the schema of the registered data source to make the metadata available in Collibra. | Schema, Table, Column, and Foreign Key assets are created in the specified domain, and registration data becomes available. |
| Registration | 6 | If needed, profile the synchronized data. | Data profiling creates a summary of a data source that is registered with Data Catalog and determines the data type of the columns in the data source. The summary mainly contains statistics and graphics that give the user an idea of what the registered data is about. | The Table and Column assets contain profiling information. |
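To make the idea of a profiling summary more concrete, here is a minimal sketch of the kind of per-column statistics and data type detection that profiling produces. It is not Collibra's implementation. The function names, the statistics chosen, and the type labels are all assumptions made for illustration.

```python
from typing import Any


def infer_type(values: list[Any]) -> str:
    """Guess a coarse data type from non-null sample values (illustrative labels)."""
    if not values:
        return "unknown"
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "decimal"
    return "text"


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column: row, null, and distinct counts, type, and numeric range."""
    non_null = [v for v in values if v is not None]
    kind = infer_type(non_null)
    summary: dict[str, Any] = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "inferred_type": kind,
    }
    if kind in ("integer", "decimal"):
        summary["min"] = min(non_null)
        summary["max"] = max(non_null)
    return summary


# Hypothetical sample values for a single column.
print(profile_column([3, 1, None, 3]))
```

A real profiler works on samples taken from the data source and reports richer statistics, but the principle is the same: each Column asset receives a summary derived from its values.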
For general information on working with Databricks, go to Ways to work with Databricks.