Synchronize Databricks Unity Catalog

Important 

In Collibra 2024.02, we've launched a new user interface (UI) in beta for Collibra Data Intelligence Cloud! You can learn more about this latest UI in the UI overview.

Synchronizing Databricks Unity Catalog is the process of integrating metadata from the databases connected to Databricks Unity Catalog and making that metadata available in Collibra Data Intelligence Cloud.

You can synchronize manually, or you can automate it by adding a synchronization schedule.

Before you begin

  • You have created a connection to Databricks in your Edge site.
  • You have added the Databricks Unity Catalog capability for the connection.
  • You know in which System asset you want to add the Databricks Unity Catalog assets.
    • If you have registered Databricks databases via the JDBC driver before, use the same System asset.
    • If you have never registered Databricks databases before, create a new System asset manually and use that one.

Required permissions

Steps

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog Home opens.
  2. In the tab bar, click Data Source Registration.
    The Register Data Source page opens.
  3. On the main toolbar, click the Create icon.
    The Create dialog box appears.
  4. In the Register with Edge section of the Create dialog box, click Register a data source.
    The Register content or Register Data Source page opens, depending on your UI version.
  5. In the Connection name column, locate the Databricks connection that you used when you added the Databricks Unity Catalog capability, and then click the link in the Data sources/Capabilities column.
    The Databricks Unity Catalog synchronization configuration page opens.
  6. In the Configuration or Synchronization Configuration section (the label depends on your UI version), click Add Configuration.
  7. Select the System asset in which you want to add the Databricks assets.
  8. Click Save Configuration.
  9. Click Synchronize.
    A notification indicates the synchronization has started.

To synchronize on a schedule instead of manually, complete steps 1 through 8 above, and then continue as follows:

  9. In the Synchronization Schedule section, click Add Schedule.
  10. Enter the required information in the following fields, and then click Save:
    Repeat
      How often you want to synchronize automatically. The possible values are Daily, Weekly, Monthly, and Cron expression.
    Cron
      The Quartz cron expression that determines when the synchronization takes place.
      This field is only visible if you select Cron expression in the Repeat field.
    Every
      The day of the week on which you want to synchronize, for example, Sunday.
      This field is only visible if you select Weekly in the Repeat field.
    Every first
      The day of the week on which you want to synchronize each month, for example, Tuesday for the first Tuesday of the month.
      This field is only visible if you select Monthly in the Repeat field.
    At
      The time at which you want to synchronize automatically, for example, 14:00.
      • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45. If you enter 8:45, the schedule defaults to 8:00. Use a cron expression if you don't want to schedule on the hour.
      • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
    Time zone
      The time zone for the schedule.
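
Because the At field only accepts times on the hour, an off-the-hour schedule such as 8:45 needs a value in the Cron field. The following Python sketch shows what such Quartz cron expressions might look like. It assumes the standard six-field Quartz layout (seconds, minutes, hours, day of month, month, day of week). The helper names are hypothetical and are not part of Collibra or Quartz, and this page does not state which Quartz features the Cron field accepts, so treat the results as starting points to verify.

```python
from datetime import time


def needs_cron_expression(at: time) -> bool:
    """Return True if the time is not on the hour, so the At field cannot express it."""
    return at.minute != 0 or at.second != 0


def quartz_daily(at: time) -> str:
    """Quartz expression for every day at the given time."""
    # Field order: seconds minutes hours day-of-month month day-of-week.
    # '?' means "no specific value"; Quartz requires it in one of the two day fields.
    return f"{at.second} {at.minute} {at.hour} * * ?"


def quartz_weekly(at: time, day: str) -> str:
    """Quartz expression for every week on the given day, for example 'SUN'."""
    return f"{at.second} {at.minute} {at.hour} ? * {day}"


def quartz_first_weekday_of_month(at: time, day: str) -> str:
    """Quartz expression for the first given weekday of each month, for example 'TUE'."""
    # In Quartz, 'TUE#1' in the day-of-week field means the first Tuesday of the month.
    return f"{at.second} {at.minute} {at.hour} ? * {day}#1"


if __name__ == "__main__":
    print(needs_cron_expression(time(8, 45)))                   # True
    print(quartz_daily(time(8, 45)))                            # 0 45 8 * * ?
    print(quartz_weekly(time(14, 0), "SUN"))                    # 0 0 14 ? * SUN
    print(quartz_first_weekday_of_month(time(14, 0), "TUE"))    # 0 0 14 ? * TUE#1
```

The daily, weekly, and first-weekday helpers roughly mirror the Daily, Weekly, and Monthly options of the Repeat field, which makes it easier to see when switching to a cron expression is actually needed.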

What's next?

The synchronization job integrates the metadata of all databases, schemas, tables and columns.
After the synchronization: