Synchronize Google Cloud Storage
Synchronizing Google Cloud Storage (GCS) is the process of ingesting metadata from a selected GCS repository and making the data available in Collibra Platform.
When you synchronize GCS, the content of your repository is analyzed and represented in Collibra by means of assets and their characteristics. Collibra also takes into account the defined crawlers.
To synchronize, you can:
- Manually start a synchronization job of a GCS File System asset. This can be useful if you want to test your crawlers or synchronize immediately.
- Add a synchronization schedule to synchronize automatically at a fixed interval. You can create only one synchronization schedule.
Important considerations:
- If a synchronization job is in progress and a second one is triggered (manually or automatically), the second job is queued.
- If a synchronization job is still running and a new synchronization of the same GCS File System is triggered (manually or automatically), the running synchronization continues and the new synchronization request is ignored.
After the synchronization, the resulting assets are in the domain that was specified in the crawler. For information on the integrated data, go to Integrated Google Cloud Storage data.
Prerequisites
In your Collibra environment:
- You have registered a GCS file system.
- You have connected the GCS File System asset to the GCS Edge capability.
- If needed, you have defined crawlers.
- You have a resource role with the Configure external system resource permission, for example, Owner.
- You have a global role with the Catalog global permission, for example, Catalog Author.
- You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.
Steps
To synchronize manually:
- Open the GCS File System asset.
- In the tab bar, click Configuration.
- In the Crawlers section, click Synchronize.
A notification indicates that the synchronization has started.
When the synchronization finishes, the resulting assets, including their attributes and relations, are created, edited, or deleted in the selected domains and on the Data Sources page. If a directory in GCS doesn't have a name, Collibra creates a unique name for the corresponding asset.
Note: If a temporary communication issue results in a partial synchronization, the status of the assets that were not synchronized becomes Missing from source. If the assets are identified in the source system during the next fully successful synchronization, the previous statuses are restored.
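The status behavior described in this note can be pictured with a small model. The following Python sketch is purely illustrative: the `Asset` class, the `apply_sync` function, and the example statuses are hypothetical and are not part of any Collibra API. It only models the rule that assets not reached during a partial synchronization are marked Missing from source and get their previous status back after the next fully successful synchronization.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

MISSING = "Missing from source"


@dataclass
class Asset:
    """Hypothetical stand-in for an asset created by a synchronization."""
    name: str
    status: str
    previous_status: Optional[str] = None  # remembered while marked missing


def apply_sync(assets: Dict[str, Asset], seen: Set[str], fully_successful: bool) -> None:
    """Update statuses after a synchronization (illustrative model only).

    seen: names of the assets identified in the source during this run.
    """
    for name, asset in assets.items():
        if not fully_successful:
            # Partial run: assets that were not synchronized are marked missing.
            if name not in seen and asset.status != MISSING:
                asset.previous_status = asset.status
                asset.status = MISSING
        elif name in seen and asset.status == MISSING:
            # Fully successful run: restore the status the asset had before.
            if asset.previous_status is not None:
                asset.status = asset.previous_status
                asset.previous_status = None


# Example statuses below are placeholders chosen for illustration.
assets = {
    "reports/": Asset("reports/", "Accepted"),
    "raw/": Asset("raw/", "Candidate"),
}
apply_sync(assets, seen={"reports/"}, fully_successful=False)
partial_status = assets["raw/"].status
apply_sync(assets, seen={"reports/", "raw/"}, fully_successful=True)
restored_status = assets["raw/"].status
```

In this model, `partial_status` is Missing from source and `restored_status` is the status the asset had before the partial run.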
To add a synchronization schedule:
- Open the GCS File System asset.
- In the tab bar, click Configuration.
- In the Synchronization Schedule section, click Add Schedule.
- Enter the required information.
  Field: Description
  - Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
  - Cron: The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
  - Every: The day on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.
  - Every first: The day of the month on which you want to synchronize, for example, Tuesday. This field is only visible if you select Monthly in the Repeat field.
  - At: The time at which you want to synchronize automatically, for example, 14:00. You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45. This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
  - Time zone: The time zone for the schedule.
- Click Save.
You can view a summary of the results from the Activities list.
You can view the assets in their domain.
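If you choose Cron expression in the Repeat field, it can help to see how the other schedule options might look as Quartz Cron expressions. The Python sketch below is an illustration under stated assumptions, not Collibra's implementation: the function `to_quartz_cron` and the `DAYS` mapping are hypothetical, and reading "Every first" as the first occurrence of the chosen weekday in the month is an assumption. The expressions use the standard Quartz field order (seconds, minutes, hours, day of month, month, day of week).

```python
from typing import Optional

# Quartz day-of-week abbreviations keyed by day name (hypothetical helper).
DAYS = {
    "Sunday": "SUN", "Monday": "MON", "Tuesday": "TUE", "Wednesday": "WED",
    "Thursday": "THU", "Friday": "FRI", "Saturday": "SAT",
}


def to_quartz_cron(repeat: str, at: str, every: Optional[str] = None) -> str:
    """Build a Quartz Cron expression from schedule-style inputs.

    repeat: "Daily", "Weekly", or "Monthly".
    at:     time as "HH:MM"; only on-the-hour times are accepted.
    every:  day name, required for "Weekly" and "Monthly".
    """
    hour_text, minute_text = at.split(":")
    hour, minute = int(hour_text), int(minute_text)
    if minute != 0:
        raise ValueError("Schedules can only be set on the hour, for example 8:00")
    if not 0 <= hour <= 23:
        raise ValueError("Hour must be between 0 and 23")

    if repeat == "Daily":
        return f"0 0 {hour} * * ?"
    if repeat in ("Weekly", "Monthly"):
        if every not in DAYS:
            raise ValueError(f"Unknown day: {every!r}")
        day = DAYS[every]
        if repeat == "Weekly":
            return f"0 0 {hour} ? * {day}"
        # Assumption: "Every first <day>" means the first such weekday of the month.
        return f"0 0 {hour} ? * {day}#1"
    raise ValueError(f"Unsupported repeat value: {repeat!r}")


daily = to_quartz_cron("Daily", "14:00")               # "0 0 14 * * ?"
weekly = to_quartz_cron("Weekly", "8:00", "Sunday")    # "0 0 8 ? * SUN"
monthly = to_quartz_cron("Monthly", "14:00", "Tuesday")  # "0 0 14 ? * TUE#1"
```

Calling `to_quartz_cron("Daily", "8:45")` raises a `ValueError` in this sketch, mirroring the on-the-hour rule for the At field. A Cron expression that you enter yourself is interpreted by Quartz, so verify it against the Quartz documentation before you save the schedule.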