Synchronize Azure Data Lake Storage

Synchronizing an Azure Data Lake Storage (ADLS) file system is the process of ingesting metadata from a selected ADLS repository and making the data available in Collibra Platform.

When you synchronize ADLS, the content of your repository is analyzed and represented in Collibra by means of assets and their characteristics. Collibra also takes into account the defined crawlers.

To synchronize, you can:

  • Manually start a synchronization job for an ADLS File System asset. This is useful if you want to test your crawlers or synchronize immediately.
  • Add a synchronization schedule to synchronize automatically at a fixed interval. You can create only one synchronization schedule.

Important considerations:

  • You can synchronize only one ADLS File System at a time.
  • If a synchronization job is in progress and a synchronization of another ADLS File System is triggered, manually or automatically, the new job is queued.
  • If a synchronization job is in progress and a new synchronization of the same ADLS File System is triggered, manually or automatically, the running synchronization continues and the new request is ignored.
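The considerations above can be read as a small set of rules. The following Python sketch is only a toy model of those rules, not Collibra's implementation: the class name `SyncCoordinator`, its methods, and the file system names are hypothetical, and the source does not say how repeated queued requests are handled, so the sketch simply appends them.

```python
from collections import deque


class SyncCoordinator:
    """Toy model of the documented rules (hypothetical, for illustration)."""

    def __init__(self):
        self.running = None   # file system currently being synchronized
        self.queue = deque()  # file systems waiting for their turn

    def trigger(self, file_system):
        """Return what happens to a new synchronization request."""
        if self.running is None:
            self.running = file_system
            return "started"
        if file_system == self.running:
            # The same file system is already synchronizing: request is dropped.
            return "ignored"
        # Another file system is synchronizing: wait in line.
        # Assumption: duplicates in the queue are not filtered out.
        self.queue.append(file_system)
        return "queued"

    def finish(self):
        """Complete the running job and start the next queued one, if any."""
        self.running = self.queue.popleft() if self.queue else None
        return self.running


coordinator = SyncCoordinator()
print(coordinator.trigger("fs-a"))  # started
print(coordinator.trigger("fs-a"))  # ignored
print(coordinator.trigger("fs-b"))  # queued
print(coordinator.finish())         # fs-b
```

The sketch keeps at most one job running, which mirrors the statement that only one ADLS File System can be synchronized at a time.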

Prerequisites

In your Collibra environment:

Steps

To start a synchronization manually:

  1. Open the ADLS File System asset.
  2. In the tab pane or tab bar, click Configuration.
  3. In the Crawlers section, click Synchronize now.
  4. A notification indicates that the synchronization has started.

Note: If a temporary communication issue results in a partial synchronization, the status of the assets that were not synchronized becomes Missing from source. If those assets are found in the source system during the next fully successful synchronization, their previous statuses are restored.
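The status behavior in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not Collibra's actual logic: the function `apply_sync`, the `saved` dictionary, the asset names, and the statuses Accepted and Candidate are hypothetical. Only the status name Missing from source comes from the note.

```python
MISSING = "Missing from source"


def apply_sync(statuses, saved, seen_in_source, fully_successful):
    """Update asset statuses after one synchronization run.

    statuses: dict of asset name -> current status (modified in place)
    saved: dict of asset name -> status held before being marked missing
    seen_in_source: set of asset names the run managed to read
    fully_successful: True if the run completed without interruption
    """
    for asset, status in statuses.items():
        if asset in seen_in_source:
            # Restore only after a fully successful run, as the note states.
            if status == MISSING and fully_successful and asset in saved:
                statuses[asset] = saved.pop(asset)
        elif status != MISSING:
            # Remember the old status so it can be restored later.
            saved[asset] = status
            statuses[asset] = MISSING
    return statuses


statuses = {"a.csv": "Accepted", "b.csv": "Candidate"}
saved = {}

# Partial run: b.csv could not be read.
apply_sync(statuses, saved, {"a.csv"}, fully_successful=False)
print(statuses["b.csv"])  # Missing from source

# Next fully successful run finds b.csv again.
apply_sync(statuses, saved, {"a.csv", "b.csv"}, fully_successful=True)
print(statuses["b.csv"])  # Candidate
```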

To add a synchronization schedule:

  1. Open the ADLS File System asset.
  2. In the tab pane or tab bar, click Configuration.
  3. In the Synchronization Schedule section, click Add Schedule.
  4. Enter the required information.
     • Repeat: The interval at which you want to synchronize automatically. The possible values are Daily, Weekly, Monthly, and Cron expression.
     • Cron: The Quartz cron expression that determines when the synchronization takes place. This field is visible only if you select Cron expression in the Repeat field.
     • Every: The day of the week on which you want to synchronize, for example, Sunday. This field is visible only if you select Weekly in the Repeat field.
     • Every first: The day of the week on which you want to synchronize each month, for example, Tuesday for the first Tuesday of the month. This field is visible only if you select Monthly in the Repeat field.
     • At: The time at which you want to synchronize automatically, for example, 14:00. You can schedule only on the hour: for example, at 8:00, but not at 8:45. This field is visible only if you select Daily, Weekly, or Monthly in the Repeat field.
     • Time zone: The time zone for the schedule.
  5. Click Save.
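If you choose Cron expression in the Repeat field, it can help to see how on-the-hour schedules look in Quartz syntax. The sketch below builds Quartz cron expressions that correspond to the Daily, Weekly, and Monthly options. It is an illustration only: the function `quartz_expression` is hypothetical, the documentation does not state how Collibra translates these options internally, and treating Every first as "the first occurrence of that weekday in the month" is an assumption.

```python
DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def quartz_expression(repeat, hour, day=None):
    """Build a Quartz cron expression for an on-the-hour schedule.

    Quartz fields, in order:
    seconds minutes hours day-of-month month day-of-week
    """
    if repeat not in ("Daily", "Weekly", "Monthly"):
        raise ValueError(f"unsupported repeat value: {repeat}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if repeat == "Daily":
        # Every day at hour:00:00.
        return f"0 0 {hour} * * ?"
    if day not in DAYS:
        raise ValueError(f"day must be one of {DAYS}")
    if repeat == "Weekly":
        # '?' leaves day-of-month unspecified because day-of-week is set.
        return f"0 0 {hour} ? * {day}"
    # Monthly: '#1' selects the first occurrence of that weekday in the month.
    return f"0 0 {hour} ? * {day}#1"


print(quartz_expression("Daily", 14))          # 0 0 14 * * ?
print(quartz_expression("Weekly", 14, "SUN"))  # 0 0 14 ? * SUN
print(quartz_expression("Monthly", 8, "TUE"))  # 0 0 8 ? * TUE#1
```

Seconds and minutes are fixed at 0 in every expression, which matches the on-the-hour restriction of the At field. Whether that restriction also applies to a custom cron expression is not stated in this section.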

What's next

When the synchronization finishes, the resulting assets, including their attributes and relations, are created, updated, or deleted in the selected domains and on the Data Sources page of Data Catalog.

After the synchronization: