Graph database infrastructure
Some features and applications in Collibra rely on a graph database that holds a partial copy of the main database, including the knowledge graph and operating model. This copy is updated regularly to reflect changes made to the main database.
When you commit a transaction, all changes are transferred to the graph database through a synchronization process called Change Data Capture (CDC). Features and applications that rely on the graph database may be affected by the synchronization status. Some capabilities may become temporarily unavailable or show outdated data. For more information about the specific impact, refer to the documentation for each feature or application.
Snapshot process
When a graph database is first connected to your environment, a full copy of the main database is transferred in a process called a snapshot. The graph database is unavailable during this process, which may also affect the availability of other features and applications.
Occasionally, Collibra may need to re-trigger a full or partial snapshot to deliver new features, fix issues, or prevent more significant problems. When this is planned as part of scheduled maintenance, you are notified in advance through an in-app notification.
Synchronization delay
Changes to the main database are transferred to the graph database after each committed transaction. Transactions are always processed in commit order to ensure database integrity. Large transactions can block the transfer of smaller ones, so it is important to keep your transaction size and duration as small as possible.
Important: Large transactions can significantly delay synchronization and may affect the freshness of data in features that rely on the graph database.
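To see why commit-order processing lets one large transaction hold up smaller ones, consider the following toy model in Python. It is only a sketch: the `Transaction` class, the `transfer_cost` field, and the `simulate_sync` function are hypothetical names used for illustration and do not reflect Collibra's actual CDC implementation.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    commit_time: float    # seconds since start, when the transaction was committed
    transfer_cost: float  # seconds needed to transfer it (hypothetical figure)


def simulate_sync(transactions):
    """Return {name: delay_in_seconds} for strictly in-order transfer.

    Delay is the time between the commit and the moment the change
    has been fully transferred in this toy model.
    """
    delays = {}
    clock = 0.0  # time at which the transfer channel becomes free
    for tx in sorted(transactions, key=lambda t: t.commit_time):
        # A transaction cannot start before it is committed, nor before
        # every earlier transaction has finished transferring.
        start = max(clock, tx.commit_time)
        clock = start + tx.transfer_cost
        delays[tx.name] = clock - tx.commit_time
    return delays


txs = [
    Transaction("bulk_import", commit_time=0.0, transfer_cost=600.0),
    Transaction("small_edit", commit_time=10.0, transfer_cost=1.0),
]
delays = simulate_sync(txs)
alone = simulate_sync([txs[1]])
print(delays["small_edit"], alone["small_edit"])
```

In this model, the one-second edit waits behind the ten-minute transfer, so almost all of its delay comes from the large transaction rather than from its own size.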
Common sources of large transactions
Large transactions are most commonly observed in the following contexts:
| Source | Recommendations |
|---|---|
| Import jobs | |
| Workflows | Prefer bulk operations that rely on the import module, output module, or knowledge graph API. These approaches are optimized for faster processing between user interactions. |
Graph database status
Collibra measures synchronization delay by regularly inserting a heartbeat signal between transactions. To view the synchronization status, go to Collibra settings → General → System and find the Graph DB synchronization status row. The row shows the current status and, if the delay exceeds one minute, the synchronization delay in minutes. Some features and applications also report the delay or status directly. For more information, refer to the documentation for each feature.
| Status | Delay range | Description |
|---|---|---|
| Current | Under 5 minutes | The graph database is up to date. The delay is not shown when under 1 minute. |
| Slightly behind | 5–30 minutes | Minor synchronization lag. Data may be slightly outdated. |
| Moderately behind | 30 minutes–2 hours | Noticeable lag. Some features may show stale data. |
| Heavily behind | 2–4 hours | Significant lag. Review recent large transactions. |
| Critically behind | Over 4 hours | Severe lag. Consider contacting Collibra Support if this persists and you have not committed large transactions recently. |
| Snapshotting in progress | Not applicable | A full or partial snapshot is running. The graph database is temporarily unavailable. |
| Status unavailable | Not applicable | An error occurred while querying the status. Contact Collibra Support if the issue persists. |
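The delay thresholds in the table can be expressed as a small lookup, for example when interpreting a delay value in your own monitoring notes or scripts. The following Python sketch is illustrative only: `delay_from_heartbeat`, `sync_status`, and `delay_is_shown` are hypothetical helpers, not part of any Collibra API. The handling of exact boundary values (for example, exactly 30 minutes) is an assumption, because the table does not specify it.

```python
from datetime import datetime, timedelta


def delay_from_heartbeat(last_synced_heartbeat: datetime, now: datetime) -> float:
    """Delay in minutes since the latest heartbeat seen in the graph database.

    Assumed model: the delay is the age of the most recent heartbeat
    that has been synchronized.
    """
    return max((now - last_synced_heartbeat).total_seconds() / 60.0, 0.0)


def sync_status(delay_minutes: float) -> str:
    """Map a delay to a status label, following the table thresholds.

    Assumption: lower bounds are inclusive, and exactly 4 hours still
    counts as "Heavily behind" because the last row says "Over 4 hours".
    """
    if delay_minutes < 5:
        return "Current"
    if delay_minutes < 30:
        return "Slightly behind"
    if delay_minutes < 120:
        return "Moderately behind"
    if delay_minutes <= 240:
        return "Heavily behind"
    return "Critically behind"


def delay_is_shown(delay_minutes: float) -> bool:
    """The delay is displayed only when it exceeds one minute."""
    return delay_minutes > 1


now = datetime(2024, 1, 1, 12, 0)
delay = delay_from_heartbeat(now - timedelta(minutes=45), now)
status = sync_status(delay)
print(status, delay_is_shown(delay))
```

The two "Not applicable" statuses (snapshotting in progress and status unavailable) do not depend on a delay value, so they are deliberately left out of this mapping.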
Availability
The graph database synchronization status is only available in environments where the CDC process is enabled. The following availability restrictions apply:
- Only commercial cloud environments can be enabled at this time.
- This feature is not available for CPSH or GovCloud environments.
- Commercial cloud production environments are enabled on demand. To have your environment enabled, submit a support request.
Important: The graph database infrastructure is currently in public preview. Service interruptions may occur more frequently than expected as Collibra stabilizes and scales the infrastructure.