Shared Storage and Cloud Storage connections

To create technical lineage for certain data sources, you must provide SQL scripts or data source files for Collibra Data Lineage to parse. You can use a Shared Storage connection or a Cloud Storage connection to provide access to the source files.

For a list of data sources that support Shared Storage and Cloud Storage connections, go to Supported data sources for technical lineage.

Shared Storage connection

A Shared Storage connection stores source files in a directory on the Edge server, which can be a local folder or a network folder. This type of connection is supported only on Edge sites and is not available on Collibra Cloud sites.

Note: Files in a Shared Storage connection are temporary and can be deleted automatically. If data persistence is important, use a Cloud Storage connection.

Cloud Storage connection

A Cloud Storage connection allows Collibra Data Lineage to retrieve source files directly from any of the following cloud storage services:

  • Amazon S3
  • Azure Data Lake Storage (ADLS)
  • Google Cloud Storage (GCS)

This method uses cloud-native permissions and is more reliable than a Shared Storage connection, because source files are not deleted automatically. Use this method for enterprise-scale lineage, long-term storage, and production environments.

Comparison of connection types

| Feature | Shared Storage | Cloud Storage |
|---|---|---|
| Storage quota | 6 GB hard limit | Scales with your cloud provider |
| Data retention | Files are temporary and are automatically deleted in either of the following situations: after a set time period (the default is 180 days), or when an Edge site update restarts the pod containing Shared Storage connection files. | Files are stored permanently until you delete them. |
| Best for | Testing environments | Production environments |

Because of retention limits and automated deletion on Edge, Cloud Storage is recommended to ensure data persistence.
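
As a rough illustration of the Shared Storage limits in the comparison above, the following Python sketch checks a set of source files against the 6 GB quota and the default 180-day retention period. It is not part of Collibra Data Lineage. The SourceFile class and the check_shared_storage function are hypothetical names. The sketch also assumes that 1 GB means 1024^3 bytes and that file age is measured from the last modification time, neither of which is stated on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Limits taken from the comparison table. The byte value of "6 GB" is an assumption.
QUOTA_BYTES = 6 * 1024**3
RETENTION = timedelta(days=180)  # default retention period


@dataclass
class SourceFile:
    """Hypothetical description of one file in a Shared Storage directory."""
    name: str
    size_bytes: int
    modified: datetime  # assumption: age is counted from this timestamp


def check_shared_storage(files, now):
    """Return (total_bytes, over_quota, expired_names) for the given files."""
    total = sum(f.size_bytes for f in files)
    expired = sorted(f.name for f in files if now - f.modified > RETENTION)
    return total, total > QUOTA_BYTES, expired


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    files = [
        SourceFile("daily_load.sql", 4 * 1024**3, now - timedelta(days=10)),
        SourceFile("old_etl.sql", 3 * 1024**3, now - timedelta(days=200)),
    ]
    total, over_quota, expired = check_shared_storage(files, now)
    print(over_quota, expired)  # True ['old_etl.sql']
```

A check like this could run before a synchronization to flag files that are at risk of being deleted or that would push the directory over the quota.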

Full-state synchronization

Collibra Data Lineage uses a full-state synchronization model.

  • Every time you start a synchronization, you must provide all source files that you want to include in the technical lineage graph.
  • If a file is missing during a synchronization, the lineage paths derived from that file are removed from the technical lineage graph.
  • Cloud Storage is recommended for maintaining historical lineage, because it allows permanent storage of source files and ensures consistent technical lineage graphs.
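
The full-state behavior described above can be pictured with a small Python sketch. It is a simplified model only, not how Collibra Data Lineage is implemented: the build_lineage function, the file names, and the table names are all hypothetical, and each file is assumed to have already been parsed into (source, target) edges.

```python
def build_lineage(source_files):
    """Rebuild the set of lineage edges from scratch.

    source_files maps a file name to the (source, target) edges parsed
    from that file. Nothing is carried over from earlier runs.
    """
    edges = set()
    for file_edges in source_files.values():
        edges.update(file_edges)
    return edges


# First synchronization: both scripts are provided.
first_run = {
    "load_orders.sql": {("raw.orders", "staging.orders")},
    "build_report.sql": {("staging.orders", "mart.sales_report")},
}

# Second synchronization: build_report.sql is missing.
second_run = {
    "load_orders.sql": {("raw.orders", "staging.orders")},
}

removed = build_lineage(first_run) - build_lineage(second_run)
print(removed)  # {('staging.orders', 'mart.sales_report')}
```

Because each run starts from an empty set, the edge contributed by the missing file disappears from the result. This is why keeping every source file available for each synchronization matters.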