Matillion source configuration

Note This topic is only relevant if you are creating technical lineage via Edge. If you are using the CLI lineage harvester (deprecated), you need to create a <source ID> configuration file. The CLI harvester will officially reach its End of Life on July 31, 2026.

The Source configuration field in the Technical Lineage for Matillion capability allows you to map the names of databases in Matillion to the names of their corresponding System assets in Data Catalog.

The value of the Source configuration field must be a valid block of JSON code, for example:

{
    "found_dbname=dbtest;found_hostname=test": {
        "collibraSystemName": "mssql-system-name"
    },
    "found_dbname=testsid;found_hostname=*": {
        "dbname": "oracle-database-name",
        "schema": "oracle-schema-name",
        "collibraSystemName": "oracle-system-name"
    }    
}
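
Because the field only accepts valid JSON, it can help to check a draft before pasting it in. The following is a minimal sketch in Python, not part of Collibra. The function name `check_source_configuration` is hypothetical, and the key format and the set of allowed properties are assumptions inferred from the example and the property table in this topic.

```python
import json
import re

# Assumed key shape, inferred from the examples in this topic.
KEY_PATTERN = re.compile(r"^found_dbname=[^;]+;found_hostname=[^;]+$")
# Assumed property set, taken from the property table in this topic.
ALLOWED_PROPERTIES = {"dbname", "schema", "collibraSystemName"}


def _reject_duplicates(pairs):
    """Build a dict from JSON object pairs, failing on repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def check_source_configuration(text):
    """Return a list of problems found in a Source configuration string."""
    try:
        config = json.loads(text, object_pairs_hook=_reject_duplicates)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    except ValueError as err:  # raised by _reject_duplicates
        return [str(err)]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    problems = []
    for key, properties in config.items():
        if not KEY_PATTERN.match(key):
            problems.append(f"unexpected key format: {key!r}")
        if not isinstance(properties, dict):
            problems.append(f"value for {key!r} must be a JSON object")
            continue
        for name in sorted(set(properties) - ALLOWED_PROPERTIES):
            problems.append(f"unknown property {name!r} in {key!r}")
    return problems
```

An empty list means the sketch found nothing to flag. It does not guarantee that Collibra Data Lineage accepts the configuration.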

The following table describes the various properties you can use in your JSON code block.

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>

The connection details of a supported data source in Matillion that Collibra Data Lineage collects.

<database name>
The database name in Matillion.
<server name>
The name of the server that the database is running on. You can specify found_hostname=* to include all servers.
Note Define each connection only once in the connection definitions. That is, specify each combination of found_dbname and found_hostname values only once. If you define a connection multiple times, unexpected lineage and stitching issues might occur.

You can use wildcards to capture multiple connection string combinations:

Pattern   Description
*         Matches everything.
?         Matches any single character.
[seq]     Matches any character in "seq".
[!seq]    Matches any character not in "seq".

Yes
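
The four wildcard patterns listed for this property have the same meaning as the patterns in Python's standard `fnmatch` module, so that module can be used to preview which keys would match a given connection. This is only an approximation: whether Collibra Data Lineage matches in exactly this way, including case sensitivity, is an assumption. The function name `matching_keys` and the `sales_` and `prod-` keys are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_keys(config_keys, dbname, hostname):
    """Return the configuration keys whose pattern matches a found connection."""
    found = f"found_dbname={dbname};found_hostname={hostname}"
    # fnmatchcase compares case-sensitively (an assumption about Collibra).
    return [key for key in config_keys if fnmatchcase(found, key)]


keys = [
    "found_dbname=dbtest;found_hostname=test",
    "found_dbname=testsid;found_hostname=*",
    "found_dbname=sales_?;found_hostname=prod-[ab]",   # hypothetical entry
    "found_dbname=sales_*;found_hostname=prod-[!ab]",  # hypothetical entry
]

# Matches only the "testsid" entry, because * accepts any server name.
print(matching_keys(keys, "testsid", "any-server"))
# Matches only the [ab] entry: "a" is in the set.
print(matching_keys(keys, "sales_1", "prod-a"))
# Matches only the [!ab] entry: "c" is not in the set.
print(matching_keys(keys, "sales_1", "prod-c"))
```

If a call returns more than one key for the same connection, the entries overlap. That may be a sign that the connection is effectively defined more than once.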

dbname

The name of the database asset in Data Catalog. Set this property to the database name that you created when you prepared the physical data layer in Data Catalog.

If you leave this property blank, the database is stitched to the DEFAULT database in Data Catalog.

No

schema

The name of the schema asset in Data Catalog. Set this property to the schema name that you created when you registered the data source.

If you leave this property blank, the schema is stitched to the DEFAULT schema in Data Catalog.

No

collibraSystemName

The system or server name of the data source.

Warning The value of this property must exactly match the name of your System asset in Collibra, including case.

Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

Set this property to the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

If you leave this property blank, the system is stitched to the DEFAULT system in Data Catalog. If lineage is missing or your lineage objects aren't stitching to assets in Data Catalog as you expect, check that this property is specified correctly.

How to configure this property if you have two databases with the same name

For example, suppose you have two databases named Customers. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases, named, say, Customers-Europe and Customers-USA. You can then configure this property as follows.

{
    "found_dbname=dbtest;found_hostname=test": {
        "dbname": "Customers",
        "collibraSystemName": "Customers-Europe"
    },
    "found_dbname=testsid;found_hostname=*": {
        "dbname": "Customers",
        "schema": "oracle-schema-name",
        "collibraSystemName": "Customers-USA"
    }    
}

No
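
The DEFAULT fallbacks described for dbname, schema, and collibraSystemName can be summarized in a short Python sketch. It reads "blank" as either a missing or an empty property, which is an assumption, and the function name `stitching_target` is hypothetical. It only illustrates the rule stated in the table. It is not how Collibra Data Lineage implements stitching.

```python
def stitching_target(properties):
    """Return (system, database, schema) names, using DEFAULT for blanks."""

    def value_or_default(name):
        # Treat a missing or empty property as blank (an assumption).
        value = properties.get(name, "")
        return value if value else "DEFAULT"

    return (
        value_or_default("collibraSystemName"),
        value_or_default("dbname"),
        value_or_default("schema"),
    )


# Only the system name is set, so database and schema fall back to DEFAULT.
print(stitching_target({"collibraSystemName": "mssql-system-name"}))
# → ('mssql-system-name', 'DEFAULT', 'DEFAULT')
```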