Azure Data Factory source configuration

Note: This topic is relevant only if you create technical lineage via Edge. If you use the CLI lineage harvester (deprecated), you need to create a <source ID> configuration file instead. The CLI harvester officially reaches its End of Life on July 31, 2026.

The Source configuration field in the Azure Data Factory technical lineage Edge capability allows you to map the names of databases in Azure Data Factory to the names of System assets in Data Catalog.

The value of the Source configuration field must be a valid block of JSON code, for example:

{
     "found_dbname=databasename1;found_hostname=server-name.onmicrosoft.com;found_schema=schema1": {
         "dbname": "mssql-database-name",
         "schema": "mssql-schema-name",
         "dialect": "mssql",
         "collibraSystemName": "mssql-system-name"
     },
     "found_dbname=datafactory_linkedservice;found_hostname=*": {
         "dbname": "linkedservice-dbname",
         "schema": "linkedservice-schema",
         "collibraSystemName": "linkedservice-system-name"
     }
}
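
Because the field accepts only valid JSON, it can help to check a configuration locally before you paste it in. The following is a minimal sketch in Python and is not part of Collibra. The helper name check_source_config is hypothetical, the allowed property names are taken from the table in this topic, and the rule that every key contains found_dbname= is an assumption based on the key formats described there.

```python
import json

# Property names taken from the property table in this topic.
ALLOWED_PROPERTIES = {"dbname", "schema", "dialect", "collibraSystemName"}


def check_source_config(text):
    """Return a list of problems found in a Source configuration string.

    Hypothetical helper for local checks only; it is not a Collibra API.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} (line {err.lineno})"]

    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key, entry in config.items():
        # Assumption: every key carries a found_dbname=... part.
        if "found_dbname=" not in key:
            problems.append(f"{key!r}: key must contain found_dbname=")
        if not isinstance(entry, dict):
            problems.append(f"{key!r}: value must be a JSON object")
            continue
        # Flag misspelled or unsupported property names.
        for name in sorted(set(entry) - ALLOWED_PROPERTIES):
            problems.append(f"{key!r}: unknown property {name!r}")
    return problems
```

An empty list means the sketch found nothing to complain about; it does not guarantee that Collibra Data Lineage accepts the configuration.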

The following table describes the various properties you can use in your JSON code block.

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>
or
found_dbname=<datafactory_name>_<linkedservice_name>;found_hostname=*

Identifies the supported data source in Azure Data Factory whose information Collibra Data Lineage collects. You can specify either of the following values for the found_dbname property:

  • A database name. You can then also specify the following properties:
    • found_hostname=<server name>, where <server name> is the name of the server that the database is running on.
    • found_schema=<schema name>, where <schema name> is the name of the schema. This property is optional.
  • The combination of <datafactory_name>_<linkedservice_name>, where <datafactory_name> is a data factory name and <linkedservice_name> is a linked service name. If you use this combination, specify * for the found_hostname property.
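
The two key formats described in the list above can be assembled mechanically. The following Python sketch is illustrative only; the function names database_key and linked_service_key are hypothetical and not part of any Collibra tooling.

```python
def database_key(dbname, hostname, schema=None):
    """Build a key for a database source; found_schema is optional."""
    parts = [f"found_dbname={dbname}", f"found_hostname={hostname}"]
    if schema is not None:
        parts.append(f"found_schema={schema}")
    return ";".join(parts)


def linked_service_key(datafactory_name, linkedservice_name):
    """Build a key for a linked service; found_hostname is always *."""
    return f"found_dbname={datafactory_name}_{linkedservice_name};found_hostname=*"
```

For example, database_key("databasename1", "server-name.onmicrosoft.com", "schema1") produces the first key of the JSON example in this topic, and linked_service_key("datafactory", "linkedservice") produces the second.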

You can use wildcards to capture multiple connection string combinations:

Pattern   Description
*         Matches everything.
?         Matches any single character.
[seq]     Matches any character in "seq".
[!seq]    Matches any character not in "seq".

Yes
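
The wildcard patterns listed for this property have the same meaning as the shell-style wildcards in Python's fnmatch module, so you can try out a pattern locally before using it. The following is a minimal sketch under two assumptions that this topic does not confirm: that matching is case-sensitive, and that a pattern is compared against the whole key. The database and host names are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_patterns(found, patterns):
    """Return every pattern that matches the found connection string.

    fnmatchcase compares case-sensitively on every platform.
    """
    return [pattern for pattern in patterns if fnmatchcase(found, pattern)]


# Hypothetical keys that use each wildcard from the pattern table.
patterns = [
    "found_dbname=sales_??;found_hostname=*",  # ? matches one character, * matches everything
    "found_dbname=db[12];found_hostname=*",    # [seq] matches db1 or db2
    "found_dbname=db[!12];found_hostname=*",   # [!seq] matches db3, dbX, but not db1 or db2
]
```

The sketch returns all matching patterns rather than a single winner, because this topic does not state which key takes precedence when several match.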

dbname

The name of the database asset in Data Catalog. Set this property to the database name that you created when you prepared the physical data layer in Data Catalog.

No

schema

The name of the schema asset in Data Catalog. Set this property to the schema name that you created when you registered the data source.

If Collibra Data Lineage fails to find the schema that you specify, it uses the default schema.

No

dialect

If you specify a database name for the found_dbname property, specify the dialect of that database, for example mssql or oracle, as used in the examples in this topic. If you specify a linked service name for the found_dbname property, ignore this property.

No

collibraSystemName

The system or server name of the data source.

The value of this property must exactly match the name of your System asset in Collibra, including uppercase and lowercase letters.

Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

Set this property to the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

If you don't specify a value for this property, DEFAULT is shown in the technical lineage.

How to configure this property if you have two databases with the same name

Assume that you have two databases named Customers. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases, for example Customers-Europe and Customers-USA. You can then configure this property as follows, so that both entries use the same dbname but a different collibraSystemName.

{
    "found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
        "dbname": "Customers",
        "schema": "mssql-schema-name",
        "dialect": "mssql",
        "collibraSystemName": "Customers-Europe"
    },
    "found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
        "dbname": "Customers",
        "schema": "oracle-schema-name",
        "dialect": "oracle",
        "collibraSystemName": "Customers-USA"
    }
}

No
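
Because collibraSystemName must match the System asset name exactly, a single letter in the wrong case is enough to break stitching. The following Python sketch shows one way to catch that before saving the configuration. It assumes that you maintain the list of System asset names yourself; the helper name check_system_names is hypothetical, and the sketch does not query Collibra.

```python
def check_system_names(config, system_assets):
    """Report entries whose collibraSystemName will not resolve as intended.

    config is the parsed Source configuration (a dict);
    system_assets is a set of System asset names that you supply yourself.
    """
    problems = []
    for key, entry in config.items():
        name = entry.get("collibraSystemName")
        if name is None:
            # Per this topic, DEFAULT is shown when the property is missing.
            problems.append(f"{key}: no collibraSystemName, DEFAULT will be shown")
        elif name not in system_assets:
            # Set membership is an exact, case-sensitive comparison.
            problems.append(f"{key}: no System asset named {name!r}")
    return problems
```

With the System assets Customers-Europe and Customers-USA, an entry that specifies customers-usa in lowercase is reported, while Customers-USA passes.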