OpenLineage: Defining custom namespace
When Collibra Data Lineage processes an OpenLineage dataset, it must extract the database, schema, and table names from the namespace and dataset name.
Collibra Data Lineage includes built-in recognition patterns for common systems as listed in OpenLineage: Supported data sources.
If your OpenLineage producer emits a namespace format that no built-in pattern supports, you can define a custom namespace definition. These definitions enable Collibra Data Lineage to handle unfamiliar namespace formats without requiring code changes.
When to use a custom namespace definition
You need a custom namespace definition when datasets appear as analyze errors in the transformation code with a message similar to:
Dataset 'MY_DB.MY_SCHEMA.MY_TABLE' in namespace 'snowflake://lw08072.eu-central-1' does not match any supported dataset type.
This error indicates that no built-in pattern matched the namespace. Common scenarios include:
- Matillion sends snowflake://lw08072.eu-central-1 (legacy account format). Built-in patterns expect snowflake://org-acct.
- Salesforce uses salesforce://{orgId}, which has no built-in pattern in Collibra Data Lineage.
- Custom ETL tools may emit proprietary namespace schemes.
Create a custom namespace definition
To add a custom namespace definition when creating technical lineage for OpenLineage:
- Identify the namespace and name format from your OpenLineage events.
- Create the JSON configuration with the namespace pattern, name template, type, and dialect.
- Add the configuration to the Source Configuration field when you add the technical lineage for OpenLineage capability.
For steps on creating technical lineage for OpenLineage, go to Steps overview: Integrate OpenLineage via Edge.
1. Identify the namespace and name format
Review your OpenLineage events to find the exact namespace and name values. For example:
namespace: "snowflake://lw08072.eu-central-1"
name: "MY_DB.MY_SCHEMA.MY_TABLE"
You need these values to create the namespace pattern and name template in the next step.
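If you have captured events as JSON, a short script can list the namespace and name pairs they contain. The following is a minimal sketch, not a Collibra tool: it assumes the standard OpenLineage event layout with inputs and outputs arrays, and the event content, including the MY_DB.MY_SCHEMA.TARGET dataset, is hypothetical.

```python
import json

# A trimmed, hypothetical OpenLineage run event. Only the fields used here are shown.
event_json = """
{
  "eventType": "COMPLETE",
  "inputs": [
    {"namespace": "snowflake://lw08072.eu-central-1", "name": "MY_DB.MY_SCHEMA.MY_TABLE"}
  ],
  "outputs": [
    {"namespace": "snowflake://lw08072.eu-central-1", "name": "MY_DB.MY_SCHEMA.TARGET"}
  ]
}
"""


def list_datasets(event: dict) -> list[tuple[str, str]]:
    """Return the unique (namespace, name) pairs from an event's inputs and outputs."""
    pairs = []
    for key in ("inputs", "outputs"):
        for dataset in event.get(key, []):
            pair = (dataset["namespace"], dataset["name"])
            if pair not in pairs:
                pairs.append(pair)
    return pairs


datasets = list_datasets(json.loads(event_json))
for namespace, name in datasets:
    print(f"namespace: {namespace!r}  name: {name!r}")
```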
2. Create the JSON configuration
Add the custom namespace definitions to the customNamespaceParsers array in your source configuration JSON:
{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}
You can define multiple custom namespace definitions in the array:
{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    }
  ]
}
Required fields
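Each custom namespace definition supports the following fields. Only namespacePattern and nameTemplate are required; type and dialect fall back to their defaults when omitted: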
| Field | Required | Default | Description |
|---|---|---|---|
| type | No | "database" | The dataset category. Options are "database" or "file". |
| namespacePattern | Yes | -- | Regular expression to match the namespace. Use (?P<name>...) for named capture groups. |
| nameTemplate | Yes | -- | Template describing dataset name structure using {field} placeholders. |
| dialect | No | "oracle" | SQL dialect for database datasets. This field is ignored for the file type. |
The following sections explain how to specify each field.
Specify the namespacePattern
Create a regular expression that matches the namespace string. Use (?P<name>...) named groups to capture values from the namespace.
Example:
"namespacePattern": "^snowflake://(?P<host>.+)$"
- Test your regular expression at regex101.com by selecting Python flavor.
- Use ^ and $ anchors to ensure the full namespace is matched.
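As an alternative to an online tester, you can check a pattern locally with Python's re module. This is a minimal sketch that assumes Python-compatible regular expression behavior, as the regex101 recommendation suggests. The prefixed string in the last check is a made-up value used only to show what the anchors prevent.

```python
import re

# The pattern from the example above, compiled with Python's standard re module.
pattern = re.compile(r"^snowflake://(?P<host>.+)$")

# A namespace that should match: the named group returns the captured host.
match = pattern.match("snowflake://lw08072.eu-central-1")
host = match.group("host") if match else None
print(host)

# A namespace from a different scheme should not match at all.
no_match = pattern.match("salesforce://00D3z000000gJOSKG1")
print(no_match)

# Without anchors, a search also matches inside a longer, unrelated string.
unanchored = re.compile(r"snowflake://(?P<host>.+)")
stray = "backup-of-snowflake://lw08072.eu-central-1"  # hypothetical value
matches_unanchored = unanchored.search(stray) is not None
matches_anchored = pattern.search(stray) is not None
print(matches_unanchored, matches_anchored)
```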
Capture field values from the namespace:
Named groups from the namespace pattern are merged with fields from the name template. If you name a capture group database, schema, or table, that value is used as the corresponding field. This is useful when part of the dataset identity is in the namespace rather than the name.
Example:
{
  "namespacePattern": "^mydb://(?P<database>[^/]+)/(?P<host>.+)$",
  "nameTemplate": "{schema}.{table}"
}
For namespace mydb://MY_DB/some-host and name PUBLIC.orders, Collibra Data Lineage produces:
database=MY_DB
schema=PUBLIC
table=orders
Groups with names other than database, schema, or table, for example, host or org_id, are captured but not mapped to assets in Collibra.
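The merge described in this section can be illustrated with a short Python sketch. It is an approximation for illustration only, not Collibra's implementation: the function name parse_dataset is hypothetical, the {schema}.{table} template is handled with a plain split on the first dot, and the sketch assumes that the two sources do not define the same field.

```python
import re
from typing import Optional

NAMESPACE_PATTERN = re.compile(r"^mydb://(?P<database>[^/]+)/(?P<host>.+)$")
MAPPED_FIELDS = ("database", "schema", "table")


def parse_dataset(namespace: str, name: str) -> Optional[dict]:
    """Merge named groups from the namespace with fields parsed from the name."""
    ns_match = NAMESPACE_PATTERN.match(namespace)
    if ns_match is None:
        return None

    # Stand-in for the "{schema}.{table}" template: split on the first dot only,
    # so that the last field keeps any remaining dots.
    schema, separator, table = name.partition(".")
    if not separator:
        return None

    fields = dict(ns_match.groupdict())            # database, host
    fields.update({"schema": schema, "table": table})

    # Only database, schema, and table are mapped; host is captured but dropped.
    return {key: fields[key] for key in MAPPED_FIELDS if key in fields}


result = parse_dataset("mydb://MY_DB/some-host", "PUBLIC.orders")
print(result)
```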
Specify the nameTemplate
The name template describes the structure of the dataset name using {placeholder} syntax. Collibra Data Lineage uses the literal characters between placeholders as separators.
Template syntax:
| Element | Meaning | Example |
|---|---|---|
| {fieldName} | A placeholder that captures a value from the dataset name | {database}, {schema}, {table} |
| Any other character | A literal separator that must appear exactly in the name | . in {database}.{schema}.{table} |
How matching works:
The template is converted to a regular expression internally. Each placeholder becomes a regex group:
- Middle placeholders stop at the next literal separator character. For {database}.{schema}.{table}, the {database} group matches everything up to the first dot (.).
- The last placeholder is greedy and captures everything remaining, including separator characters.
Example:
Template {database}.{schema}.{table} matching name MY_DB.MY_SCHEMA."My.Table":
{database} → MY_DB (stops at first dot)
. → . (literal separator)
{schema} → MY_SCHEMA (stops at next dot)
. → . (literal separator)
{table} → "My.Table" (greedy — captures everything, including the dot)
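The matching rules above can be approximated in a few lines of Python. This is a sketch of the described behavior under stated assumptions, not the converter that Collibra Data Lineage uses internally: the function name template_to_regex is hypothetical, and middle placeholders are assumed to exclude only the first character of the separator that follows them.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex in which middle fields stop at the next separator and the last is greedy."""
    # Splitting on a pattern with one capture group yields:
    # [literal, field, literal, field, ..., literal]
    parts = PLACEHOLDER.split(template)
    literals, fields = parts[0::2], parts[1::2]

    regex = "^" + re.escape(literals[0])
    for index, field in enumerate(fields):
        next_literal = literals[index + 1]
        is_last = index == len(fields) - 1
        if is_last or not next_literal:
            body = ".+"                                       # greedy: takes everything left
        else:
            body = "[^" + re.escape(next_literal[0]) + "]+"   # stops at the separator
        regex += f"(?P<{field}>{body})" + re.escape(next_literal)
    return re.compile(regex + "$")


three_part = template_to_regex("{database}.{schema}.{table}")
quoted = three_part.match('MY_DB.MY_SCHEMA."My.Table"').groupdict()
early_split = three_part.match("my.db.PUBLIC.orders").groupdict()
print(quoted)
print(early_split)
```

The second call shows the limitation of this approach: a dot inside a middle field such as my.db causes the split to happen too early.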
Template examples:
| Template | Name | Result | Explanation |
|---|---|---|---|
| {database}.{schema}.{table} | MY_DB.PUBLIC.orders | database=MY_DB, schema=PUBLIC, table=orders | Each field cleanly separated by dots |
| {database}.{schema}.{table} | MY_DB.PUBLIC."My.Table" | database=MY_DB, schema=PUBLIC, table="My.Table" | Dot in table name is captured correctly |
| {database}.{schema}.{table} | my.db.PUBLIC.orders | database=my, schema=db, table=PUBLIC.orders | Dot in my.db splits too early |
| {database}/{table} | mydb/path/to/file | database=mydb, table=path/to/file | Table name can contain slashes because it's the last field |
| {table} | anything.goes/here:too | table=anything.goes/here:too | Single placeholder captures everything |
Workarounds for separators in middle fields:
If your middle field values contain the separator character, you have two options:
- Restructure the template so the field containing the separator is last. For example, if the database name contains dots, place it last in the template: {schema}/{table}/{database} with a slash separator, assuming the OpenLineage producer sends names in this order.
- Use a single-field template like {table} and let Collibra Data Lineage treat the entire name as the table name. You can then use a datasources entry to set the correct database and schema values.
Common templates:
The following table shows common template patterns for different database systems and file types:
| Template | Use case | Example name |
|---|---|---|
| {database}.{schema}.{table} | Three-part name (Snowflake, Postgres, SQL Server) | MY_DB.public.orders |
| {database}.{table} | Two-part name (MySQL, Teradata) | mydb.orders |
| {table} | Single table name (Salesforce) | Account |
| {database}/{table} | Slash-separated (Kusto) | analytics/metrics |
| {path} | File path (any characters) | /data/reports/output.csv |
Template rules for database type:
- {table} is required. Collibra Data Lineage must know the table name.
- {database} and {schema} are optional. They default to <default> if omitted.
- Use the exact field names database, schema, and table. These map to the database, schema, and table hierarchy in Collibra.
- These field names work in both nameTemplate placeholders and namespacePattern named groups. Values from both are merged together.
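The rules for the database type can be summarized in a small Python sketch. It only illustrates the rules as described here and is not Collibra's code: the function name resolve_database_fields is hypothetical, and it assumes that the {table} requirement is checked against the name template.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")
DEFAULT_VALUE = "<default>"


def resolve_database_fields(name_template: str, parsed: dict) -> dict:
    """Apply the database-type rules: {table} is required, other fields get a default."""
    if "table" not in PLACEHOLDER.findall(name_template):
        raise ValueError("nameTemplate for the database type must contain {table}")
    # Omitted database and schema values fall back to "<default>".
    return {
        field: parsed.get(field, DEFAULT_VALUE)
        for field in ("database", "schema", "table")
    }


resolved = resolve_database_fields("{table}", {"table": "Account"})
print(resolved)
```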
Template rules for file type:
- Any placeholder name works. Collibra Data Lineage uses the raw dataset name directly.
- The template validates the name structure only.
Specify the type and dialect
- Type: Set the type field based on what the dataset represents:
  - "database" (default): For datasets representing database tables
  - "file": For datasets representing files, streams, or topics
- Dialect: For database datasets, set the dialect field to match the SQL dialect of the source system. This affects how SQL queries in the dataset are parsed.

| Dialect | Systems |
|---|---|
| SNOWFLAKE | Snowflake |
| postgres | Postgres, Trino, Spanner, CrateDB |
| mssql | SQL Server, Synapse |
| bigquery | BigQuery |
| hive | Athena, Glue, Hive |
| mysql | MySQL, OceanBase |
| redshift | Redshift |
| oracle | Oracle (default if omitted) |
3. Add to the source configuration
When you create the technical lineage for OpenLineage capability, add your custom namespace definitions in the Source Configuration field.
How definitions are processed
This section explains how Collibra Data Lineage processes custom namespace definitions, including the order in which they are checked, how they are prioritized, and how errors are handled.
Multiple definitions and priority
You can define multiple custom namespace definitions. In general, they are checked in the order in which they appear in the array, and the first match is used.
The exception is when two definitions match the same namespace but use templates of different sizes, for example, 3-part and 2-part. In that case, Collibra Data Lineage automatically tries the more specific template first, that is, the one with more placeholders, so you do not need to order them manually in the configuration.
Custom namespace definitions are always checked before built-in patterns. If your custom definition matches the same namespace as a built-in pattern, your custom definition takes priority.
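One way to picture this ordering is the following Python sketch. It is an interpretation of the behavior described above rather than Collibra's actual algorithm: the function name candidates_for and the mydb:// definitions are hypothetical, and ties are assumed to keep their configuration order.

```python
import re

PLACEHOLDER = re.compile(r"\{\w+\}")

# Two hypothetical definitions for the same namespace, listed with the 2-part template first.
definitions = [
    {"namespacePattern": r"^mydb://(?P<host>.+)$", "nameTemplate": "{database}.{table}"},
    {"namespacePattern": r"^mydb://(?P<host>.+)$", "nameTemplate": "{database}.{schema}.{table}"},
]


def candidates_for(namespace: str, definitions: list) -> list:
    """Return the definitions that match the namespace, most placeholders first."""
    matching = [d for d in definitions if re.match(d["namespacePattern"], namespace)]
    # sorted() is stable, so definitions with equal placeholder counts keep their order.
    return sorted(
        matching,
        key=lambda d: len(PLACEHOLDER.findall(d["nameTemplate"])),
        reverse=True,
    )


ordered = [d["nameTemplate"] for d in candidates_for("mydb://host-1", definitions)]
print(ordered)
```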
Error handling
- Invalid regular expression in namespacePattern: The entry is silently skipped. Other definitions continue to work.
- Invalid template (no {field} placeholders, or missing {table} for database type): The entry is skipped and the error is reported in the output.
- No matching definition for a dataset: Reported as an analyze error. Check your namespacePattern regular expression.
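Because an entry with an invalid regular expression is skipped silently, it can help to check a configuration before you paste it into the Source Configuration field. The following Python sketch is a hypothetical pre-check, not part of Collibra Data Lineage. It assumes Python-compatible regular expressions and covers only the error conditions listed above.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def check_definitions(config: dict) -> list:
    """Return a list of problems found in the customNamespaceParsers entries."""
    problems = []
    for index, entry in enumerate(config.get("customNamespaceParsers", [])):
        pattern = entry.get("namespacePattern")
        if pattern is None:
            problems.append(f"entry {index}: namespacePattern is missing")
        else:
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"entry {index}: invalid namespacePattern ({exc})")

        fields = PLACEHOLDER.findall(entry.get("nameTemplate", ""))
        if not fields:
            problems.append(f"entry {index}: nameTemplate has no {{field}} placeholders")
        elif entry.get("type", "database") == "database" and "table" not in fields:
            problems.append(f"entry {index}: database type requires {{table}}")
    return problems


config = {
    "customNamespaceParsers": [
        {"namespacePattern": "^snowflake://(?P<host>.+)$", "nameTemplate": "{database}.{schema}.{table}"},
        {"namespacePattern": "^snowflake://(?P<host>.+$", "nameTemplate": "{table}"},  # unbalanced parenthesis
        {"namespacePattern": "^mydb://(?P<host>.+)$", "nameTemplate": "{database}.{schema}"},  # no {table}
    ]
}
problems = check_definitions(config)
for problem in problems:
    print(problem)
```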
Use with datasources
The customNamespaceParsers and datasources fields in your source configuration serve different purposes:
- customNamespaceParsers: Defines how to interpret an unknown namespace format.
- datasources: Overrides the values to use after integration for Collibra system name, database, schema, and dialect.
They work together. A custom namespace definition first determines how to parse the namespace structure. Then, if a datasources entry exists for the same namespace, it can override the parsed values.
Example:
{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}
Examples
Map Matillion namespaces to the Snowflake 3-part name format
This configuration addresses scenarios where Matillion or other integrators use a legacy hostname-based namespace. It captures the host from the namespace while ensuring the 3-part dataset name is correctly split into database, schema, and table assets.
{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}
Parse Salesforce objects as single-level table assets
Salesforce datasets typically consist of a flat object name without a database or schema prefix. This definition identifies the Salesforce organization ID from the namespace and maps the entire dataset name to a single Table asset in Collibra.
{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    }
  ]
}
Extract file paths from custom cloud storage namespaces
Use this format for proprietary cloud storage schemes where the namespace identifies the storage bucket and the name represents a nested file path.
{
  "customNamespaceParsers": [
    {
      "type": "file",
      "namespacePattern": "^mycloud://(?P<bucket>.+)$",
      "nameTemplate": "{path}"
    }
  ]
}
Support multiple data source types in a single configuration
You can include multiple definitions in the customNamespaceParsers array to handle different systems in a single OpenLineage integration.
{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespace": "salesforce://00D3z000000gJOSKG1",
      "collibraSystemName": "My Salesforce",
      "dialect": "oracle"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    },
    {
      "type": "file",
      "namespacePattern": "^mycloud://(?P<bucket>.+)$",
      "nameTemplate": "{path}"
    }
  ]
}