Registering a Databricks file system via Databricks JDBC connector and Edge
You can register a Databricks data source using the Databricks JDBC connector to ingest your metadata into Collibra. This process creates assets that represent your Databricks tables and columns, providing a clear view of your data landscape. You can also configure the connection to retrieve sample data, profile your data, and set up data classification.
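For orientation, a JDBC connection to Databricks is identified by a connection URL. The following is a minimal sketch, assuming the `jdbc:databricks://<server-hostname>:443;httpPath=<http-path>` form described in the Databricks JDBC driver documentation. The function name, the host name, and the HTTP path are hypothetical placeholders, and credentials are deliberately left out because they are configured on the connection itself.

```python
def build_databricks_jdbc_url(server_hostname, http_path, port=443, properties=None):
    """Assemble a Databricks JDBC URL from its parts (illustrative helper only)."""
    if not server_hostname or not http_path:
        raise ValueError("server_hostname and http_path are required")
    # Base URL followed by semicolon-separated key=value settings.
    parts = [f"jdbc:databricks://{server_hostname}:{port}", f"httpPath={http_path}"]
    for key, value in (properties or {}).items():
        parts.append(f"{key}={value}")
    return ";".join(parts)


# Hypothetical workspace host and SQL warehouse path, for illustration only.
url = build_databricks_jdbc_url(
    "dbc-example.cloud.databricks.com",
    "/sql/1.0/warehouses/0123456789abcdef",
)
print(url)
# jdbc:databricks://dbc-example.cloud.databricks.com:443;httpPath=/sql/1.0/warehouses/0123456789abcdef
```

Building the URL from separate parts makes it easier to check the host name and HTTP path individually before you enter them in the connection settings.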
Prerequisites
- You created and installed an Edge site.
- Required settings for database registration via Edge are enabled. For more information, go to Database registration via Edge.
- If needed, your environment is set up to allow profiling and classification via Edge. For more information, go to Database profiling via Edge and Set up Unified Data Classification.
Steps
| Phase | Step | What | Description | Results |
|---|---|---|---|---|
| Preparation | 1 | | Adds a Databricks JDBC connection to your Edge site. | |
| | 2 | Add the following capabilities: | Adds the required capabilities to the Databricks connection. | |
| Setup | 3 | Register the data source | Registering a data source creates the metadata structure in Collibra. | |
| | 4 | | Selects the schemas and tables that you want to ingest. | The information on the Configuration tab of the Database asset is filled. |
| Registration | 5 | | Synchronizes the schema of the registered data source to make the metadata available in Collibra. | Schema, Table, Column, and Foreign Key assets are created in the specified domain, and registration data becomes available. |
| | 6 | Optionally, profile the synchronized data. | Data Profiling creates a summary of a data source in Data Catalog and determines the data type of columns in the data source. The summary mainly contains statistics and graphics to give the user an idea of what the data is about. Data Profiling is available for registered JDBC data sources. | The Table and Column assets contain profiling information. |
| | 7 | Optionally, classify the synchronized data. | Creates data classification suggestions for the Column assets. | The Column assets are classified. |
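To make the idea of profiling in step 6 more concrete, the following is a minimal sketch of how summary statistics and a data type could be derived from sampled column values. It is not Collibra's implementation: the function names, the three type labels, and the sample values are assumptions chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def infer_type(values):
    """Guess a column type from sampled values: integer, decimal, text, or unknown."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return "unknown"

    def all_parse(cast):
        # True only if every non-empty value can be converted with cast.
        try:
            for v in non_empty:
                cast(str(v))
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


def profile_column(values):
    """Return simple summary statistics for one column (illustrative only)."""
    non_empty = [v for v in values if v not in (None, "")]
    inferred = infer_type(values)
    profile = {
        "row_count": len(values),
        "empty_count": len(values) - len(non_empty),
        "distinct_count": len(set(non_empty)),
        "inferred_type": inferred,
        "most_common": Counter(non_empty).most_common(1),
    }
    if inferred in ("integer", "decimal"):
        numbers = [float(str(v)) for v in non_empty]
        profile.update(minimum=min(numbers), maximum=max(numbers), mean=mean(numbers))
    return profile


# Hypothetical sample of an "age" column, including one empty value.
print(profile_column(["34", "27", None, "41", "27"]))
```

In this sketch, a column is labeled numeric only when every non-empty sample parses as a number, so a single stray text value changes the inferred type to text.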
Helpful resources
For general information on working with Databricks in Collibra, go to Ways to work with Databricks.
For more information about Databricks, go to Databricks documentation.