Registering a Databricks file system via the Databricks JDBC connector and Edge
When you register a Databricks data source via the Databricks JDBC connector, the resulting assets represent the tables and columns in the Databricks database.
You can also extend the setup to allow for the retrieval of sample data, profiling, and classification.
Before you begin
- You have created and installed an Edge site.
- You have enabled the required settings for database registration via Edge. For more information, go to Database registration via Edge.
- If needed, you have set up your environment to allow profiling and classification via Edge. For more information, go to Database profiling via Edge and Set up Unified Data Classification.
Steps
Phase | Step | What? | Description | Results |
---|---|---|---|---|
Preparation | 1 | Create a Databricks JDBC connection. | Adds a Databricks JDBC connection to your Edge site. | |
Preparation | 2 | Add the required capabilities. | Adds the required capabilities to the Databricks connection. | |
Setup | 3 | Register the data source. | Registering a data source creates the structure for the metadata in Collibra. | |
Setup | 4 | Select the schemas and tables that you want to ingest. | Defines which part of the data source is ingested. | The information on the Configuration tab of the Database asset is filled in. |
Registration | 5 | Synchronize the schema. | Synchronizes the schema of the registered data source to make the metadata available in Collibra. | Schema, Table, Column, and Foreign Key assets are created in the specified domain, and registration data becomes available. |
Registration | 6 | If needed, profile the synchronized data. | Data Profiling creates a summary of a data source in Data Catalog and determines the data type of each column in the data source. The summary mainly contains statistics and graphics that give users an idea of what the data is about. Data Profiling is available for registered data sources. | The Table and Column assets contain profiling information. |
Registration | 7 | If needed, classify the synchronized data. | Creates data classification suggestions for the Column assets. | The Column assets are classified. |
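Step 1 relies on a JDBC connection, and step 5 reads schema, table, and column metadata through it. The following Java sketch illustrates both ideas outside Collibra: it builds a connection URL for the Databricks JDBC driver and lists tables and columns through the standard `java.sql.DatabaseMetaData` API. It is not how Collibra Edge performs the synchronization. It assumes that the Databricks JDBC driver is on the classpath, that you authenticate with a personal access token read from a `DATABRICKS_TOKEN` environment variable (a hypothetical name), and that you pass your workspace's server hostname and HTTP path as arguments. Confirm the URL properties against the documentation for your driver version.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class DatabricksJdbcSketch {

    // Builds a connection URL for the Databricks JDBC driver with personal access
    // token authentication (AuthMech=3, UID=token). Assumption: verify the
    // property names against the driver version you deploy.
    public static String buildJdbcUrl(String serverHostname, String httpPath, String token) {
        return "jdbc:databricks://" + serverHostname + ":443"
                + ";transportMode=http;ssl=1"
                + ";httpPath=" + httpPath
                + ";AuthMech=3;UID=token;PWD=" + token;
    }

    // Prints the tables and columns visible through the connection, using only
    // the standard java.sql.DatabaseMetaData API. Tables are collected first so
    // that only one metadata ResultSet is open at a time.
    public static void printMetadata(Connection connection) throws SQLException {
        DatabaseMetaData meta = connection.getMetaData();
        List<String[]> tables = new ArrayList<>();
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.add(new String[] {
                    rs.getString("TABLE_CAT"), rs.getString("TABLE_SCHEM"), rs.getString("TABLE_NAME")});
            }
        }
        for (String[] t : tables) {
            System.out.println(t[1] + "." + t[2]);
            try (ResultSet cols = meta.getColumns(t[0], t[1], t[2], "%")) {
                while (cols.next()) {
                    System.out.println("  " + cols.getString("COLUMN_NAME")
                            + " : " + cols.getString("TYPE_NAME"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical inputs: server hostname and HTTP path as arguments,
        // token from the DATABRICKS_TOKEN environment variable.
        String token = System.getenv("DATABRICKS_TOKEN");
        if (args.length < 2 || token == null) {
            System.err.println("Usage: DatabricksJdbcSketch <server-hostname> <http-path> (with DATABRICKS_TOKEN set)");
            return;
        }
        String url = buildJdbcUrl(args[0], args[1], token);
        try (Connection connection = DriverManager.getConnection(url)) {
            printMetadata(connection);
        }
    }
}
```

When run with a hostname and HTTP path, the sketch prints each table as `schema.table`, followed by its columns and their types. Collecting the table list before calling `getColumns` avoids relying on driver support for nested result sets.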
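Step 6 describes profiling as summarizing a data source and determining the data type of its columns. To illustrate that idea only, and not how Collibra's profiling works, the following Java sketch profiles one column of string values. It counts rows, nulls, and distinct values, and it infers a type. The type names (`INTEGER`, `DECIMAL`, `STRING`, `UNKNOWN`) and the rule of picking the narrowest type that fits every non-null value are assumptions made for the example.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ColumnProfileSketch {

    // Illustrative summary of one column; not Collibra's profiling model.
    public static final class ColumnProfile {
        public final int rowCount;
        public final int nullCount;
        public final int distinctCount;
        public final String inferredType;

        ColumnProfile(int rowCount, int nullCount, int distinctCount, String inferredType) {
            this.rowCount = rowCount;
            this.nullCount = nullCount;
            this.distinctCount = distinctCount;
            this.inferredType = inferredType;
        }
    }

    static boolean isInteger(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isDecimal(String value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Counts rows, nulls, and distinct non-null values, and infers the narrowest
    // of INTEGER, DECIMAL, or STRING that fits every non-null value.
    public static ColumnProfile profile(List<String> values) {
        int nulls = 0;
        Set<String> distinct = new HashSet<>();
        boolean allInteger = true;
        boolean allDecimal = true;
        for (String value : values) {
            if (value == null) {
                nulls++;
                continue;
            }
            distinct.add(value);
            allInteger = allInteger && isInteger(value);
            allDecimal = allDecimal && isDecimal(value);
        }
        String type;
        if (distinct.isEmpty()) {
            type = "UNKNOWN"; // only nulls (or no rows), so nothing to infer from
        } else if (allInteger) {
            type = "INTEGER";
        } else if (allDecimal) {
            type = "DECIMAL";
        } else {
            type = "STRING";
        }
        return new ColumnProfile(values.size(), nulls, distinct.size(), type);
    }

    public static void main(String[] args) {
        ColumnProfile p = profile(Arrays.asList("12", "7", null, "12", "3.5"));
        // Prints: rows=5, nulls=1, distinct=3, type=DECIMAL
        System.out.println("rows=" + p.rowCount + ", nulls=" + p.nullCount
                + ", distinct=" + p.distinctCount + ", type=" + p.inferredType);
    }
}
```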
For general information on working with Databricks, go to Ways to work with Databricks.