OpenLineage: Supported data sources
Collibra Data Lineage includes built-in recognition patterns that automatically identify and map data assets from OpenLineage events whose namespaces follow the format defined in the OpenLineage Naming Specification (Version 1.41.0).
If your OpenLineage producer emits a namespace format that the built-in patterns do not support, you can define a custom namespace and name format in Collibra Data Lineage. This ensures that technical lineage is accurately created and stitched to your Data Catalog, regardless of the source system or ETL tool.
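To make the namespace and name fields concrete, here is a minimal sketch of where they sit in an OpenLineage run event. The event is trimmed to a few fields and is not a complete, valid event. The host names, job name, run ID, and the helper `dataset_identifiers` are illustrative assumptions, not part of Collibra Data Lineage.

```python
# A trimmed OpenLineage run event, written as a Python dict.
# All values are made up for illustration.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
    "job": {"namespace": "example-scheduler", "name": "load_orders"},
    "inputs": [
        {"namespace": "postgres://db.example.com:5432",
         "name": "sales.public.orders"},
    ],
    "outputs": [
        {"namespace": "snowflake://myorg-myacct",
         "name": "ANALYTICS.PUBLIC.ORDERS"},
    ],
}


def dataset_identifiers(event):
    """Return (role, namespace, name) for every dataset in the event."""
    pairs = []
    for role in ("inputs", "outputs"):
        for dataset in event.get(role, []):
            pairs.append((role, dataset["namespace"], dataset["name"]))
    return pairs


for role, namespace, name in dataset_identifiers(event):
    print(role, namespace, name)
```

Each dataset is identified by the pair of `namespace` and `name`; these are the two values that the recognition patterns are matched against.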
Built-in support
For systems following the standard naming specification, Collibra Data Lineage automatically recognizes the data structure and applies the correct SQL dialect for transformation analysis.
The following table lists the supported database systems.
| Data source | Namespace pattern | Name template | Dialect |
|---|---|---|---|
| Athena | awsathena://athena.<region>.amazonaws.com | {catalog}.{database}.{table} | HIVE |
| BigQuery | bigquery | {project_id}.{dataset}.{table} | BIGQUERY |
| Cassandra | cassandra://<host>:<port> | {keyspace}.{table} | SYBASE |
| Cosmos | azurecosmos://<host>/dbs/<database> | colls/{table} | AZURE |
| CrateDB | crate://<host>:<port> | {database}.{schema}.{table} | POSTGRES |
| DB2 | db2://<host>:<port> | {database}.{schema}.{table} | DB2 |
| Glue | arn:aws:glue:<region>:<account_id> | table/{database}/{table} | HIVE |
| Kusto | azurekusto://<host>.kusto.windows.net | {database}/{table} | AZURE |
| MSSQL | mssql://<host>:<port> | {database}.{schema}.{table} | MSSQL |
| MySQL | mysql://<host>:<port> | {database}.{table} | MYSQL |
| OceanBase | oceanbase://<host>:<port> | {database}.{table} | MYSQL |
| Oracle | oracle://<host>:<port> | {service}.{schema}.{table} | ORACLE |
| Postgres | postgres://<host>:<port> | {database}.{schema}.{table} | POSTGRES |
| Redshift | redshift://<cluster>.<region>:<port> | {database}.{schema}.{table} | REDSHIFT |
| SnapLogic | SnapLogic | {virtual_database}.{virtual_schema}.{virtual_table_name} | SNAPLOGIC |
| Snowflake | snowflake://<org>-<acct> | {database}.{schema}.{table} | SNOWFLAKE |
| Spanner | spanner://<project>:<instance> | {database}.{schema}.{table} | POSTGRES |
| Synapse | sqlserver://<host>:<port> | {schema}.{table} | MSSQL |
| Teradata | teradata://<host>:<port> | {database}.{table} | TERADATA |
| Trino | trino://<host>:<port> | {catalog}.{schema}.{table} | POSTGRES |
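The matching described by the table above can be sketched as follows. This is not Collibra's implementation: the regular expressions are rough approximations of the `<host>`, `<port>`, and similar placeholders, only four rows are included, and the names `PATTERNS`, `template_to_regex`, and `resolve` are hypothetical.

```python
import re

# Hypothetical subset of the table:
# (data source, namespace regex, name template, dialect).
# The regexes are approximations written for this sketch.
PATTERNS = [
    ("Postgres", r"postgres://[^:/]+:\d+", "{database}.{schema}.{table}", "POSTGRES"),
    ("MySQL", r"mysql://[^:/]+:\d+", "{database}.{table}", "MYSQL"),
    ("Snowflake", r"snowflake://[^/]+-[^/]+", "{database}.{schema}.{table}", "SNOWFLAKE"),
    ("BigQuery", r"bigquery", "{project_id}.{dataset}.{table}", "BIGQUERY"),
]


def template_to_regex(template):
    """Turn a name template such as '{database}.{table}' into a regex
    with one named group per placeholder."""
    pieces = []
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # Assumption: a single name part contains no '.' or '/'.
            pieces.append(f"(?P<{part[1:-1]}>[^./]+)")
        else:
            pieces.append(re.escape(part))
    return "".join(pieces)


def resolve(namespace, name):
    """Return the data source, dialect, and name parts, or None if no
    pattern in this sketch matches."""
    for source, ns_regex, template, dialect in PATTERNS:
        if not re.fullmatch(ns_regex, namespace):
            continue
        match = re.fullmatch(template_to_regex(template), name)
        if match:
            return {"source": source, "dialect": dialect, **match.groupdict()}
    return None


print(resolve("postgres://db.example.com:5432", "sales.public.orders"))
print(resolve("unknown://host", "a.b"))
```

In this sketch, a dataset whose namespace matches no pattern resolves to `None`, which corresponds to the case where a custom namespace and name format would be needed.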
For the supported non-relational sources, Collibra identifies the resource by its path or topic name.
| Data source | Namespace pattern | Name template |
|---|---|---|
| S3 | s3://<bucket> | {key} |
| GCS | gs://<bucket> | {key} |
| ABFSS | abfss://<container>@<account>.dfs.core.windows.net | {path} |
| WASBS | wasbs://<container>@<account>.blob.core.windows.net/<key> | {key} |
| HDFS | hdfs://<host>:<port> | {path} |
| DBFS | dbfs://<workspace> | {path} |
| Local | file | {path} |
| Remote | file://<host> | {path} |
| Kafka | kafka://<host>:<port> | {topic} |
| PubSub | pubsub | topic:{project}:{topic} |
| InMemory | inmemory:// | {dataset} |
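As a rough illustration of how a path or topic name could be read from these non-relational sources, the sketch below splits the namespace with the standard library's `urlsplit` and treats the name as the resource. The function `describe_resource` and all sample values are assumptions made for this example; the actual resolution logic in Collibra Data Lineage is not documented here.

```python
from urllib.parse import urlsplit


def describe_resource(namespace, name):
    """Split a namespace into scheme and authority, and keep the name
    as the resource (object key, path, or topic)."""
    parts = urlsplit(namespace)
    # Bare namespaces such as "file" or "pubsub" have no "scheme://" prefix.
    scheme = parts.scheme or namespace
    result = {"scheme": scheme, "authority": parts.netloc, "resource": name}
    if scheme == "pubsub":
        # PubSub names follow the template topic:{project}:{topic}.
        kind, project, topic = name.split(":", 2)
        if kind != "topic":
            raise ValueError(f"unexpected PubSub name: {name!r}")
        result["project"] = project
        result["resource"] = topic
    return result


print(describe_resource("s3://my-bucket", "raw/2024/orders.parquet"))
print(describe_resource("kafka://broker.example.com:9092", "orders"))
print(describe_resource("pubsub", "topic:my-project:orders"))
```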