Prepare Matillion <source ID> configuration file

You use the lineage harvester configuration file to access Matillion data objects. The lineage harvester processes the data objects to create a technical lineage. However, if the useCollibraSystemName property in the lineage harvester configuration file is set to true, you also have to provide a <source ID> configuration file to define the system name for all sources and targets in the Matillion integration.

This is useful if you have multiple databases with the same name and want the lineage harvester to distinguish between them by the system or server that each database belongs to.

Note To preserve stitching, Data Catalog must contain a System asset with the same name as each system or server that you specify in your <source ID> configuration file.

Prerequisites

  • The useCollibraSystemName property in the lineage harvester configuration file is set to true.
  • You have Admin permission on all objects that you want to harvest.
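As a rough illustration of the first prerequisite, the fragment below shows only the two lineage harvester properties that this topic refers to. The nesting under general and sources is an assumption made for this sketch, the id value matillion-source-1 is illustrative, and every other property that a real lineage harvester configuration file requires is left out.

```json
{
  "general": {
    "useCollibraSystemName": true
  },
  "sources": [
    {
      "id": "matillion-source-1"
    }
  ]
}
```

Check the exact placement of both properties against your own lineage harvester configuration file before relying on this layout.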

Steps

  1. Create a new JSON configuration file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the id property in the lineage harvester configuration file.
    Example If the value of the id property in the lineage harvester configuration file is matillion-source-1, then the name of your JSON file should be matillion-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each Matillion connection, add the following properties to the JSON file:

    Property: found_dbname=<database name>;found_hostname=<server name>
    Mandatory? Yes
    Description: The information about the supported data source in Matillion that Collibra Data Lineage collects.
      • <database name>: The database name in Matillion.
      • <server name>: The name of the server that the database is running on. You can specify found_hostname=* to include all servers.
      Tip You can use wildcards to capture multiple connection string combinations.

    Property: dbname
    Mandatory? No
    Description: The name of the database asset in Data Catalog. Specify the database name that you created when you prepared the Data Catalog physical data layer.
      If you leave this property blank, the database is stitched to the database named DEFAULT in Data Catalog.

    Property: schema
    Mandatory? No
    Description: The name of the schema asset in Data Catalog. Specify the schema name that you created when you registered the data source.
      If you leave this property blank, the schema is stitched to the schema named DEFAULT in Data Catalog.

    Property: dialect
    Mandatory? No
    Description: The dialect of your data source. Specify one of the supported dialects.

    Property: collibraSystemName
    Mandatory? Yes
    Description: The system or server name of a database. Specify this property to ensure mapping between the Matillion source names and the System assets in your Collibra Data Catalog.
      If you leave this property blank, the system is stitched to the system named DEFAULT in Data Catalog. If you are missing lineage, or your lineage objects are not stitching to assets in Data Catalog as you expect, check that this property is specified correctly.
      Warning The value of this property must exactly match, including case, the name of your System asset in Collibra.
      Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Matillion, you are not required to:
      • Set the useCollibraSystemName property in the lineage harvester configuration file to true.
      • Specify a Collibra system name in the <source ID> configuration file.
      However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, you must specify a Collibra system name in the <source ID> configuration file.

  4. Save the configuration file.
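
The steps above might produce a file such as matillion-source-1.conf with content along the following lines. This is a minimal sketch, not a definitive template: it assumes that each found_dbname/found_hostname string is used as a key whose value is an object holding the remaining properties. The database name sales, the server names server-a and server-b, the schema name public, and the system names System A and System B are all hypothetical. The optional dialect property is omitted.

```json
{
  "found_dbname=sales;found_hostname=server-a": {
    "dbname": "sales",
    "schema": "public",
    "collibraSystemName": "System A"
  },
  "found_dbname=sales;found_hostname=server-b": {
    "dbname": "sales",
    "schema": "public",
    "collibraSystemName": "System B"
  }
}
```

In this sketch, two databases that share the name sales are kept apart by their server names, and each one maps to its own System asset. For stitching to work, System assets named exactly System A and System B would have to exist in Data Catalog.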