OpenLineage: Defining custom namespace

When Collibra Data Lineage processes an OpenLineage dataset, it must extract the database, schema, and table names from the namespace and dataset name.

Collibra Data Lineage includes built-in recognition patterns for common systems as listed in OpenLineage: Supported data sources.

If your OpenLineage producer emits a namespace format that no built-in pattern supports, you can create a custom namespace definition. Custom namespace definitions enable Collibra Data Lineage to handle unfamiliar namespace formats without requiring code changes.

When to use a custom namespace definition

You need a custom namespace definition when datasets appear as analyze errors in the transformation code with a message similar to:

Dataset 'MY_DB.MY_SCHEMA.MY_TABLE' in namespace 'snowflake://lw08072.eu-central-1' does not match any supported dataset type.

This error indicates that no built-in pattern matched the namespace. Common scenarios include:

Matillion sends snowflake://lw08072.eu-central-1 (legacy account format). Built-in patterns expect snowflake://org-acct.
Salesforce uses salesforce://{orgId}, which has no built-in pattern in Collibra Data Lineage.
Custom ETL tools may emit proprietary namespace schemes.

Create a custom namespace definition

To add a custom namespace definition when creating technical lineage for OpenLineage:

Identify the namespace and name format from your OpenLineage events.
Create the JSON configuration with the namespace pattern, name template, type, and dialect.
Add the configuration to the Source Configuration field when you add the technical lineage for OpenLineage capability.

For steps on creating technical lineage for OpenLineage, go to Steps overview: Integrate OpenLineage via Edge.

1. Identify the namespace and name format

Review your OpenLineage events to find the exact namespace and name values. For example:

namespace: "snowflake://lw08072.eu-central-1"
name:      "MY_DB.MY_SCHEMA.MY_TABLE"

You need these values to create the namespace pattern and name template in the next step.
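If you have OpenLineage events saved as JSON, a short script can list the distinct namespace and name pairs for you. The following is a minimal sketch, assuming the standard OpenLineage event layout in which each entry in the inputs and outputs arrays carries a namespace and a name. The function name and the trimmed sample event are hypothetical.

```python
import json

def dataset_identifiers(event: dict) -> set:
    """Collect (namespace, name) pairs from an event's inputs and outputs."""
    pairs = set()
    for key in ("inputs", "outputs"):
        for dataset in event.get(key, []):
            pairs.add((dataset["namespace"], dataset["name"]))
    return pairs

# Hypothetical, heavily trimmed event. Real events carry more fields.
sample_event = json.loads("""
{
  "eventType": "COMPLETE",
  "inputs": [
    {"namespace": "snowflake://lw08072.eu-central-1",
     "name": "MY_DB.MY_SCHEMA.MY_TABLE"}
  ],
  "outputs": []
}
""")

for namespace, name in sorted(dataset_identifiers(sample_event)):
    print(f"namespace: {namespace!r}  name: {name!r}")
```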

2. Create the JSON configuration

Add the custom namespace definitions to the customNamespaceParsers array in your source configuration JSON:

{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}

You can define multiple custom namespace definitions in the array:

{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    }
  ]
}

Required fields

Each custom namespace definition supports the following fields. Only namespacePattern and nameTemplate are required:

Field	Required	Default	Description
type	No	"database"	The dataset category. Options are "database" or "file".
namespacePattern	Yes	--	Regular expression to match the namespace. Use `(?P<name>...)` for named capture groups.
nameTemplate	Yes	--	Template describing dataset name structure using `{field}` placeholders.
dialect	No	"oracle"	SQL dialect for database datasets. This field is ignored for the file type.
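To check a definition before you paste it into the Source Configuration field, you can apply the defaults from the table yourself. The following is a minimal sketch of that check, not the Collibra Data Lineage implementation. The with_defaults function name is hypothetical.

```python
def with_defaults(definition: dict) -> dict:
    """Return a copy of the definition with the documented defaults applied."""
    missing = [f for f in ("namespacePattern", "nameTemplate") if f not in definition]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    # type defaults to "database" and dialect defaults to "oracle"
    resolved = {"type": "database", "dialect": "oracle", **definition}

    if resolved["type"] not in ("database", "file"):
        raise ValueError(f"unsupported type: {resolved['type']!r}")
    if resolved["type"] == "file":
        # dialect is ignored for the file type, so drop it here
        resolved.pop("dialect", None)
    return resolved

print(with_defaults({
    "namespacePattern": "^salesforce://(?P<org_id>.+)$",
    "nameTemplate": "{table}",
}))
```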

The following sections explain how to specify each field.

Specify the `namespacePattern`

Create a regular expression that matches the namespace string. Use (?P<name>...) named groups to capture values from the namespace.

Example:

"namespacePattern": "^snowflake://(?P<host>.+)$"

Tip

Test your regular expression at regex101.com by selecting Python flavor.
Use ^ and $ anchors to ensure the full namespace is matched.
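Because the pattern uses Python-flavored named groups, you can also test it locally with the Python re module. The following sketch shows the captured group and illustrates why the anchors matter. Whether Collibra Data Lineage applies the pattern with match or search semantics is not stated here, so the unanchored comparison is only an illustration.

```python
import re

pattern = re.compile(r"^snowflake://(?P<host>.+)$")

match = pattern.match("snowflake://lw08072.eu-central-1")
print(match.groupdict())  # {'host': 'lw08072.eu-central-1'}

# Without the ^ anchor, a search also accepts a namespace that merely
# contains the scheme somewhere in the middle of the string.
unanchored = re.compile(r"snowflake://(?P<host>.+)")
other = "jdbc:snowflake://lw08072.eu-central-1"
print(bool(unanchored.search(other)))  # True
print(bool(pattern.search(other)))     # False
```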

Capture field values from the namespace:

Named groups from the namespace pattern are merged with fields from the name template. If you name a capture group database, schema, or table, that value is used as the corresponding field. This is useful when part of the dataset identity is in the namespace rather than the name.

Example:

{
  "namespacePattern": "^mydb://(?P<database>[^/]+)/(?P<host>.+)$",
  "nameTemplate": "{schema}.{table}"
}

For namespace mydb://MY_DB/some-host and name PUBLIC.orders, Collibra Data Lineage produces:

database=MY_DB
schema=PUBLIC
table=orders

Groups with names other than database, schema, or table, for example, host or org_id, are captured but not mapped to assets in Collibra.
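The merge can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions: the parse function is hypothetical, and a plain split on the first dot stands in for the {schema}.{table} name template.

```python
import re

namespace_pattern = re.compile(r"^mydb://(?P<database>[^/]+)/(?P<host>.+)$")

def parse(namespace: str, name: str) -> dict:
    ns_match = namespace_pattern.match(namespace)
    if ns_match is None:
        raise ValueError("namespace does not match the pattern")

    # Stand-in for the "{schema}.{table}" name template
    schema, table = name.split(".", 1)

    # Merge the namespace groups with the fields taken from the name
    fields = {**ns_match.groupdict(), "schema": schema, "table": table}

    # Only database, schema, and table are mapped. host is captured but unused.
    return {k: v for k, v in fields.items() if k in ("database", "schema", "table")}

print(parse("mydb://MY_DB/some-host", "PUBLIC.orders"))
# {'database': 'MY_DB', 'schema': 'PUBLIC', 'table': 'orders'}
```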

Specify the `nameTemplate`

The name template describes the structure of the dataset name using {placeholder} syntax. Collibra Data Lineage uses the literal characters between placeholders as separators.

Template syntax:

Element	Meaning	Example
`{fieldName}`	A placeholder that captures a value from the dataset name	`{database}`, `{schema}`, `{table}`
Any other character	A literal separator that must appear exactly in the name	`.` in `{database}.{schema}.{table}`

How matching works:

The template is converted to a regular expression internally. Each placeholder becomes a regex group:

Middle placeholders stop at the next literal separator character. For {database}.{schema}.{table}, the {database} group matches everything up to the first dot (.).
The last placeholder is greedy and captures everything remaining, including separator characters.

Example:

Template {database}.{schema}.{table} matching name MY_DB.MY_SCHEMA."My.Table":

{database}  →  MY_DB           (stops at first dot)
.           →  .               (literal separator)
{schema}    →  MY_SCHEMA       (stops at next dot)
.           →  .               (literal separator)
{table}     →  "My.Table"      (greedy — captures everything, including the dot)

Important

Separator characters cannot appear in middle fields. Only the last placeholder can contain the separator. This is typically not an issue, as database and schema names do not normally contain dots or slashes. However, be aware of this limitation if your data has unusual naming conventions.

Template examples:

Template	Name	Result	Explanation
`{database}.{schema}.{table}`	`MY_DB.PUBLIC.orders`	`database=MY_DB` `schema=PUBLIC` `table=orders`	Each field cleanly separated by dots
`{database}.{schema}.{table}`	`MY_DB.PUBLIC."My.Table"`	`database=MY_DB` `schema=PUBLIC` `table="My.Table"`	Dot in table name is captured correctly
`{database}.{schema}.{table}`	`my.db.PUBLIC.orders`	`database=my` `schema=db` `table=PUBLIC.orders`	Dot in `my.db` splits too early
`{database}/{table}`	`mydb/path/to/file`	`database=mydb` `table=path/to/file`	Table name can contain slashes because it's the last field
`{table}`	`anything.goes/here:too`	`table=anything.goes/here:too`	Single placeholder captures everything
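To predict how a template splits a name, you can approximate the matching rules locally. The following is a sketch of one possible template-to-regex conversion that follows the rules described above: middle placeholders exclude the next separator character and the last placeholder is greedy. It is not the internal Collibra Data Lineage code, and template_to_regex is a hypothetical name.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def template_to_regex(template: str) -> re.Pattern:
    # Splitting on a pattern with one capture group yields
    # [literal, name, literal, name, ..., literal]
    parts = PLACEHOLDER.split(template)
    literals, names = parts[0::2], parts[1::2]

    regex = "^" + re.escape(literals[0])
    for i, name in enumerate(names):
        separator = literals[i + 1]
        is_last = i == len(names) - 1
        if is_last or not separator:
            body = ".+"  # greedy: takes everything that remains
        else:
            # stop at the first character of the next literal separator
            body = f"[^{re.escape(separator[0])}]+"
        regex += f"(?P<{name}>{body})" + re.escape(separator)
    return re.compile(regex + "$")

three_part = template_to_regex("{database}.{schema}.{table}")
print(three_part.match('MY_DB.PUBLIC."My.Table"').groupdict())
print(three_part.match("my.db.PUBLIC.orders").groupdict())
print(template_to_regex("{database}/{table}").match("mydb/path/to/file").groupdict())
```

The three printed results correspond to the second, third, and fourth rows of the table, including the early split for my.db.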

Workarounds for separators in middle fields:

If your middle field values contain the separator character, you have two options:

Restructure the template so the field containing the separator is last. For example, if the database name contains dots, place it last in the template: {schema}/{table}/{database} with a slash separator, assuming the OpenLineage producer sends names in this order.
Use a single-field template like {table} and let Collibra Data Lineage treat the entire name as the table name. You can then use a datasources entry to set the correct database and schema values.

Common templates:

The following table shows common template patterns for different database systems and file types:

Template	Use case	Example name
`{database}.{schema}.{table}`	Three-part name (Snowflake, Postgres, SQL Server)	`MY_DB.public.orders`
`{database}.{table}`	Two-part name (MySQL, Teradata)	`mydb.orders`
`{table}`	Single table name (Salesforce)	`Account`
`{database}/{table}`	Slash-separated (Kusto)	`analytics/metrics`
`{path}`	File path (any characters)	`/data/reports/output.csv`

Template rules for database type:

{table} is required. Collibra Data Lineage must know the table name.
{database} and {schema} are optional. They default to <default> if omitted.
Use the exact field names database, schema, and table. These map to database, schema, and table hierarchy in Collibra.
These field names work in both nameTemplate placeholders and namespacePattern named groups. Values from both are merged together.
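The rules for the database type can be summarized in a few lines. This sketch assumes that the default value is the literal string <default>, as written above. The to_database_fields function name is hypothetical.

```python
def to_database_fields(parsed: dict) -> dict:
    """Apply the database-type rules to the merged field values."""
    if "table" not in parsed:
        raise ValueError("a database-type definition must produce a table value")
    return {
        # database and schema are optional and fall back to <default>
        "database": parsed.get("database", "<default>"),
        "schema": parsed.get("schema", "<default>"),
        "table": parsed["table"],
    }

print(to_database_fields({"table": "Account"}))
# {'database': '<default>', 'schema': '<default>', 'table': 'Account'}
```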

Template rules for file type:

Any placeholder name works. Collibra Data Lineage uses the raw dataset name directly.
The template validates the name structure only.

Specify the `type` and `dialect`

Type:

Set the type field based on what the dataset represents:

"database" (default): For datasets representing database tables
"file": For datasets representing files, streams, or topics

Dialect:

For database datasets, set the dialect field to match the SQL dialect of the source system. This affects how SQL queries in the dataset are parsed.

Dialect	Systems
`SNOWFLAKE`	Snowflake
`postgres`	Postgres, Trino, Spanner, CrateDB
`mssql`	SQL Server, Synapse
`bigquery`	BigQuery
`hive`	Athena, Glue, Hive
`mysql`	MySQL, OceanBase
`redshift`	Redshift
`oracle`	Oracle (default if omitted)

3. Add to the source configuration

When you create the technical lineage for OpenLineage capability, add your custom namespace definitions in the Source Configuration field.

How definitions are processed

This section explains how Collibra Data Lineage processes custom namespace definitions, including the order in which they are checked, how they are prioritized, and how errors are handled.

Multiple definitions and priority

You can define multiple custom namespace definitions. They are checked in order, and the first match is used.

If two definitions match the same namespace but use templates of different sizes, for example, a 3-part and a 2-part template, Collibra Data Lineage automatically tries the more specific template first, that is, the one with more placeholders. You do not need to order them manually in the configuration.

Custom namespace definitions are always checked before built-in patterns. If your custom definition matches the same namespace as a built-in pattern, your custom definition takes priority.
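One way to picture the resulting order is a stable sort by placeholder count, followed by the built-in patterns. This is only one possible reading of the rules above, not the actual implementation. The function names and the built-in entry are hypothetical.

```python
import re

def placeholder_count(template: str) -> int:
    return len(re.findall(r"\{\w+\}", template))

def resolution_order(custom: list, builtin: list) -> list:
    # sorted() is stable, so definitions with the same number of
    # placeholders keep the order they have in the configuration.
    ordered_custom = sorted(custom, key=lambda d: -placeholder_count(d["nameTemplate"]))
    # Custom definitions are always checked before built-in patterns.
    return ordered_custom + builtin

custom = [
    {"namespacePattern": "^mydb://(?P<host>.+)$", "nameTemplate": "{database}.{table}"},
    {"namespacePattern": "^mydb://(?P<host>.+)$", "nameTemplate": "{database}.{schema}.{table}"},
]
builtin = [{"namespacePattern": "^builtin-example://.+$", "nameTemplate": "{table}"}]  # hypothetical

for definition in resolution_order(custom, builtin):
    print(definition["nameTemplate"])
```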

Error handling

Invalid regular expression in namespacePattern: The entry is silently skipped. Other definitions continue to work.
Invalid template (no {field} placeholders, or missing {table} for database type): The entry is skipped and the error is reported in the output.
No matching definition for a dataset: Reported as an analyze error. Check your namespacePattern regular expression.
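Because an invalid regular expression is skipped without a message, it can help to run the same checks yourself before you save the configuration. The following is a minimal sketch of such a pre-check. The load_definitions function is hypothetical, and the {table} check looks only at the template, which is an assumption based on the wording above.

```python
import re

def load_definitions(raw_definitions: list) -> tuple:
    usable, reported = [], []
    for index, definition in enumerate(raw_definitions):
        try:
            pattern = re.compile(definition["namespacePattern"])
        except re.error:
            continue  # invalid regular expression: skipped without a message

        placeholders = re.findall(r"\{(\w+)\}", definition["nameTemplate"])
        kind = definition.get("type", "database")
        if not placeholders:
            reported.append(f"definition {index}: no {{field}} placeholders")
            continue
        if kind == "database" and "table" not in placeholders:
            reported.append(f"definition {index}: missing {{table}} for database type")
            continue
        usable.append((pattern, definition))
    return usable, reported

usable, reported = load_definitions([
    {"namespacePattern": "^snowflake://(?P<host.+$", "nameTemplate": "{table}"},  # broken group
    {"namespacePattern": "^a://.+$", "nameTemplate": "orders"},                   # no placeholders
    {"namespacePattern": "^b://.+$", "nameTemplate": "{database}.{schema}"},      # no {table}
    {"namespacePattern": "^c://.+$", "nameTemplate": "{table}"},                  # valid
])
print(len(usable), reported)
```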

Use with `datasources`

The customNamespaceParsers and datasources properties in your source configuration serve different purposes:

customNamespaceParsers: Defines how to interpret an unknown namespace format.
datasources: Overrides the values to use after integration for Collibra system name, database, schema, and dialect.

They work together. A custom namespace definition first determines how to parse the namespace structure. Then, if a datasources entry exists for the same namespace, it can override the parsed values.

Example:

{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}
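The two-stage behavior, parse first and override second, can be sketched as follows. This is an illustration only. The function name is hypothetical, and the database and schema keys on a datasources entry are assumed from the description above, because the example shows only collibraSystemName and dialect.

```python
def apply_datasource_overrides(namespace: str, parsed: dict, datasources: list) -> dict:
    """Overlay values from the datasources entry that has the same namespace."""
    result = dict(parsed)
    for entry in datasources:
        if entry.get("namespace") == namespace:
            # database and schema keys are assumptions, see the lead-in
            for key in ("collibraSystemName", "database", "schema", "dialect"):
                if key in entry:
                    result[key] = entry[key]
            break
    return result

parsed = {"database": "MY_DB", "schema": "MY_SCHEMA", "table": "MY_TABLE", "dialect": "SNOWFLAKE"}
datasources = [{
    "type": "database",
    "namespace": "snowflake://lw08072.eu-central-1",
    "collibraSystemName": "My Snowflake",
    "dialect": "SNOWFLAKE",
}]

print(apply_datasource_overrides("snowflake://lw08072.eu-central-1", parsed, datasources))
```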

Examples

Map Matillion namespaces to the Snowflake 3-part name format

This configuration addresses scenarios where Matillion or other integrators use a legacy hostname-based namespace. It captures the host from the namespace while ensuring the 3-part dataset name is correctly split into database, schema, and table assets.

{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    }
  ]
}

Parse Salesforce objects as single-level table assets

Salesforce datasets typically consist of a flat object name without a database or schema prefix. This definition identifies the Salesforce organization ID from the namespace and maps the entire dataset name to a single Table asset in Collibra.

{
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    }
  ]
}

Extract file paths from custom cloud storage namespaces

Use this format for proprietary cloud storage schemes where the namespace identifies the storage bucket and the name represents a nested file path.

{
  "customNamespaceParsers": [
    {
      "type": "file",
      "namespacePattern": "^mycloud://(?P<bucket>.+)$",
      "nameTemplate": "{path}"
    }
  ]
}

Support multiple data source types in a single configuration

You can include multiple definitions in the customNamespaceParsers array to handle different systems in a single OpenLineage integration.

{
  "datasources": [
    {
      "type": "database",
      "namespace": "snowflake://lw08072.eu-central-1",
      "collibraSystemName": "My Snowflake",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespace": "salesforce://00D3z000000gJOSKG1",
      "collibraSystemName": "My Salesforce",
      "dialect": "oracle"
    }
  ],
  "customNamespaceParsers": [
    {
      "type": "database",
      "namespacePattern": "^snowflake://(?P<host>.+)$",
      "nameTemplate": "{database}.{schema}.{table}",
      "dialect": "SNOWFLAKE"
    },
    {
      "type": "database",
      "namespacePattern": "^salesforce://(?P<org_id>.+)$",
      "nameTemplate": "{table}",
      "dialect": "oracle"
    },
    {
      "type": "file",
      "namespacePattern": "^mycloud://(?P<bucket>.+)$",
      "nameTemplate": "{path}"
    }
  ]
}
