OpenLineage: Supported data sources

Collibra Data Lineage includes built-in recognition patterns that automatically identify and map data assets from OpenLineage events that follow the namespace format defined in the OpenLineage Naming Specification (Version 1.41.0).

If your OpenLineage producer emits a namespace format that the built-in patterns do not support, you can define a custom namespace and name format in Collibra Data Lineage. This ensures that technical lineage is created accurately and stitched to your Data Catalog, regardless of the source system or ETL tool.

Built-in support

For systems following the standard naming specification, Collibra Data Lineage automatically recognizes the data structure and applies the correct SQL dialect for transformation analysis.

The following table lists the supported database systems.

| Data source | Namespace pattern | Name template | Dialect |
| --- | --- | --- | --- |
| Athena | awsathena://athena.<region>.amazonaws.com | {catalog}.{database}.{table} | HIVE |
| BigQuery | bigquery | {project_id}.{dataset}.{table} | BIGQUERY |
| Cassandra | cassandra://<host>:<port> | {keyspace}.{table} | SYBASE |
| Cosmos | azurecosmos://<host>/dbs/<database> | colls/{table} | AZURE |
| CrateDB | crate://<host>:<port> | {database}.{schema}.{table} | POSTGRES |
| DB2 | db2://<host>:<port> | {database}.{schema}.{table} | DB2 |
| Glue | arn:aws:glue:<region>:<account_id> | table/{database}/{table} | HIVE |
| Kusto | azurekusto://<host>.kusto.windows.net | {database}/{table} | AZURE |
| MSSQL | mssql://<host>:<port> | {database}.{schema}.{table} | MSSQL |
| MySQL | mysql://<host>:<port> | {database}.{table} | MYSQL |
| OceanBase | oceanbase://<host>:<port> | {database}.{table} | MYSQL |
| Oracle | oracle://<host>:<port> | {service}.{schema}.{table} | ORACLE |
| Postgres | postgres://<host>:<port> | {database}.{schema}.{table} | POSTGRES |
| Redshift | redshift://<cluster>.<region>:<port> | {database}.{schema}.{table} | REDSHIFT |
| SnapLogic | SnapLogic | {virtual_database}.{virtual_schema}.{virtual_table_name} | SNAPLOGIC |
| Snowflake | snowflake://<org>-<acct> | {database}.{schema}.{table} | SNOWFLAKE |
| Spanner | spanner://<project>:<instance> | {database}.{schema}.{table} | POSTGRES |
| Synapse | sqlserver://<host>:<port> | {schema}.{table} | MSSQL |
| Teradata | teradata://<host>:<port> | {database}.{table} | TERADATA |
| Trino | trino://<host>:<port> | {catalog}.{schema}.{table} | POSTGRES |
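
To make the table easier to apply, the following Python sketch shows how a namespace and name from an OpenLineage dataset could be checked against a few of the rows above. It is an illustration only, not Collibra's implementation: the `BUILT_IN` list, the regular expressions, and the `template_to_regex` and `resolve` helpers are hypothetical, and the hostnames in the usage note are made up.

```python
import re

# Hypothetical subset of the table above:
# (data source, namespace regex, name template, dialect).
BUILT_IN = [
    ("Postgres", r"postgres://[^/:]+:\d+", "{database}.{schema}.{table}", "POSTGRES"),
    ("MySQL", r"mysql://[^/:]+:\d+", "{database}.{table}", "MYSQL"),
    ("Snowflake", r"snowflake://[^/]+", "{database}.{schema}.{table}", "SNOWFLAKE"),
    ("BigQuery", r"bigquery", "{project_id}.{dataset}.{table}", "BIGQUERY"),
    ("Glue", r"arn:aws:glue:[^:]+:\d+", "table/{database}/{table}", "HIVE"),
]


def template_to_regex(template):
    """Turn a name template such as '{database}.{table}' into a regex with named groups."""
    pieces = []
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # Placeholder: capture one segment that contains no '.' or '/'.
            pieces.append(f"(?P<{part[1:-1]}>[^./]+)")
        else:
            # Literal text between placeholders (separators, fixed prefixes).
            pieces.append(re.escape(part))
    return "".join(pieces)


def resolve(namespace, name):
    """Return the matched data source, dialect, and name parts, or None if nothing matches."""
    for source, ns_pattern, template, dialect in BUILT_IN:
        if re.fullmatch(ns_pattern, namespace):
            match = re.fullmatch(template_to_regex(template), name)
            if match:
                return {"source": source, "dialect": dialect, **match.groupdict()}
    return None
```

For example, `resolve("postgres://db.example.com:5432", "sales.public.orders")` yields the Postgres row with `database`, `schema`, and `table` split out, while a namespace that is not in the hypothetical subset returns `None`, which corresponds to the case where a custom namespace and name format would be needed.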

For the supported non-relational sources, Collibra identifies the resource by its path or topic name.

| Data source | Namespace pattern | Name template |
| --- | --- | --- |
| S3 | s3://<bucket> | {key} |
| GCS | gs://<bucket> | {key} |
| ABFSS | abfss://<container>@<account>.dfs.core.windows.net | {path} |
| WASBS | wasbs://<container>@<account>.blob.core.windows.net/<key> | {key} |
| HDFS | hdfs://<host>:<port> | {path} |
| DBFS | dbfs://<workspace> | {path} |
| Local | file | {path} |
| Remote | file://<host> | {path} |
| Kafka | kafka://<host>:<port> | {topic} |
| PubSub | pubsub | topic:{project}:{topic} |
| InMemory | inmemory:// | {dataset} |
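
As a rough illustration of how the non-relational rows differ from the database rows, the Python sketch below classifies a namespace by its prefix and treats the name as an opaque path, key, or topic. It is a sketch under stated assumptions rather than Collibra's logic: the `PREFIXES` and `EXACT` lookups and the `classify` function are hypothetical names introduced here, and the bucket and project names in the usage note are invented.

```python
# Hypothetical lookups built from the table above.
PREFIXES = [
    ("s3://", "S3"),
    ("gs://", "GCS"),
    ("abfss://", "ABFSS"),
    ("wasbs://", "WASBS"),
    ("hdfs://", "HDFS"),
    ("dbfs://", "DBFS"),
    ("file://", "Remote"),
    ("kafka://", "Kafka"),
    ("inmemory://", "InMemory"),
]
# Namespaces that are fixed strings rather than URI-style prefixes.
EXACT = {"file": "Local", "pubsub": "PubSub"}


def classify(namespace, name):
    """Return the data source and resource for a non-relational dataset, or None."""
    if namespace in EXACT:
        source = EXACT[namespace]
    else:
        source = next((s for prefix, s in PREFIXES if namespace.startswith(prefix)), None)
    if source is None:
        return None
    if source == "PubSub":
        # Name template: topic:{project}:{topic}
        parts = name.split(":", 2)
        if len(parts) != 3 or parts[0] != "topic":
            return None
        return {"source": source, "project": parts[1], "topic": parts[2]}
    # All other sources: the name is the path, key, topic, or dataset as-is.
    return {"source": source, "resource": name}
```

For example, `classify("s3://my-bucket", "raw/events/2024.parquet")` reports the source as S3 with the key kept intact, and `classify("pubsub", "topic:my-project:orders")` splits the name into its project and topic parts.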