Prepare a <source ID> configuration file

Warning: The lineage harvester is now deprecated and will officially reach its end-of-life on July 31, 2026. To ensure a smooth transition, we encourage you to begin creating technical lineage via Edge, if you haven't already.

Updated: May 6, 2025

Depending on your data source, you might need, or want, to prepare a <source ID> configuration file. Find your data source below for source-specific information.

Azure Data Factory

The lineage harvester uses a lineage harvester configuration file to collect the Azure Data Factory data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example
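
The following is a minimal sketch of a <source ID> configuration file for Azure Data Factory, assembled from the properties described in the steps below. The nesting is an assumption: each found_dbname connection string is used as a key whose value holds the dbname, schema, dialect and collibraSystemName properties. All names and values are hypothetical placeholders, and the mssql dialect is assumed to be an allowed value for this data source.

```json
{
  "found_dbname=my-database;found_hostname=my-server.example.com": {
    "dbname": "MY_DATABASE",
    "schema": "MY_SCHEMA",
    "dialect": "mssql",
    "collibraSystemName": "my-system"
  },
  "found_dbname=my-factory_my-linked-service;found_hostname=*": {
    "dbname": "MY_OTHER_DATABASE",
    "schema": "MY_OTHER_SCHEMA",
    "collibraSystemName": "my-system"
  }
}
```

The second entry uses the <datafactory_name>_<linkedservice_name> form with found_hostname=*, so it leaves out the dialect property.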

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-adf, the name of your JSON file must be my-adf.conf.

For each database in Azure Data Factory, add the following content to the JSON file:

Property	Description	Mandatory?
found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name> | found_dbname=<datafactory_name>_<linkedservice_name>;found_hostname=*	The supported data sources in Azure Data Factory that Collibra Data Lineage collects. For the found_dbname property, specify one of the following values: (1) A database name. In this case, also specify found_hostname=<server name>, where <server name> is the name of the server that the database runs on, and optionally found_schema=<schema name>, where <schema name> is the name of the schema. (2) The combination <datafactory_name>_<linkedservice_name>, where <datafactory_name> is a data factory name and <linkedservice_name> is a linked service name. In this case, specify * for the found_hostname property. Tip: You can use wildcards to capture multiple connection string combinations.	Yes
dbname	The name of the database asset in Data Catalog. Specify the database name that you created when you prepared the Data Catalog physical data layer.
schema	The name of the schema asset in Data Catalog. Specify the schema name that you created when you registered the data source. If Collibra Data Lineage fails to find the schema that you specify, it uses the default schema.
dialect	The dialect of the database. Specify this property only if you specify a database name for the found_dbname property. If you specify a linked service name for the found_dbname property, ignore this property.
collibraSystemName	The system or server name of the data source. Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify the same name as the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. If you don't specify a value for this property, DEFAULT is shown in the technical lineage. Warning: The value of this property must exactly match, including case, the name of your System asset in Collibra.

Save the <source ID> configuration file.

IBM InfoSphere DataStage

The lineage harvester uses a lineage harvester configuration file to collect the DataStage data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example
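
The following is a minimal sketch of a <source ID> configuration file for DataStage, assembled from the property table in the steps below. The nesting follows the order of that table and is an assumption, as is the array form of JobParameters. All names and values are hypothetical placeholders. The optional Jobs and perJobParameters sections are left out because their exact shape is not described here.

```json
{
  "OdbcDataSources": {
    "my-odbc-source": {
      "dbname": "MY_DATABASE",
      "schema": "MY_SCHEMA",
      "dialect": "oracle",
      "collibraSystemName": "my-system"
    }
  },
  "NonOdbcConnectors": {
    "admin@my-db2-database": {
      "dbname": "MY_DB2_DATABASE",
      "schema": "MY_DB2_SCHEMA",
      "dialect": "db2"
    }
  },
  "JobParameters": [
    { "name": "my_parameter", "value": "my_value" }
  ]
}
```

Note that the parameter name is written without enclosing "#" characters, as the name property requires.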

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-datastage, the name of your JSON file must be my-datastage.conf.

For each database in DataStage, add the required content to the JSON file.

Property	Description
OdbcDataSources	Open Database Connectivity data sources in IBM InfoSphere DataStage for which you want to create a technical lineage.
<data-source-name>	The ODBC data source name that you use in your DataStage projects. This section contains the properties to translate the database, schema and dialect.
dbname	The name of your database, to which the ODBC data source connection refers.
schema	The name of your schema, to which the ODBC data source connection refers.
dialect	The dialect of the referenced database. You can enter one of the following values: `azure`, for an Azure SQL Server data source; `bigquery`, for a Google BigQuery data source; `db2`, for an IBM DB2 data source; `hana`, for an SAP HANA data source; `hana-cviews`, for getting lineage from calculated views in an SAP HANA Classic on-premises data source; `hana-cviews-v2`, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source; `hive`, for a HiveQL data source; `greenplum`, for a Greenplum data source; `mssql`, for a Microsoft SQL Server data source; `mysql`, for a MySQL data source; `netezza`, for a Netezza data source; `oracle`, for an Oracle data source; `postgres`, for a PostgreSQL data source; `redshift`, for an Amazon Redshift data source; `snowflake`, for a Snowflake data source; `spark`, for a Spark SQL data source; `sybase`, for a Sybase data source; `teradata`, for a Teradata data source. Important: To get technical lineage that includes calculated views, you must harvest SAP HANA by specifying two data sources in the lineage harvester configuration file. In one data source, specify the `hana` dialect, and in the other, specify the `hana-cviews` or `hana-cviews-v2` dialect.
collibraSystemName	The system or server name of the data source. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. This property is optional.
NonOdbcConnectors	Other data source connectors in IBM InfoSphere DataStage for which you want to create a technical lineage, for example DB2, Oracle or Netezza. Note: This section is optional.
<data-source-connector-ID>	The data source username and database of the connector that you use in your DataStage projects, typically in the form admin@database-name. The combination of the username and database name should be unique. This section contains the properties to translate the database, schema and dialect.
dbname	The name of your database, to which the data source connection refers.
schema	The name of your schema, to which the data source connection refers.
dialect	The dialect of the referenced database. You can enter one of the following values: `azure`, for an Azure SQL Server data source; `bigquery`, for a Google BigQuery data source; `db2`, for an IBM DB2 data source; `hana`, for an SAP HANA data source; `hana-cviews`, for getting lineage from calculated views in an SAP HANA Classic on-premises data source; `hana-cviews-v2`, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source; `hive`, for a HiveQL data source; `greenplum`, for a Greenplum data source; `mssql`, for a Microsoft SQL Server data source; `mysql`, for a MySQL data source; `netezza`, for a Netezza data source; `oracle`, for an Oracle data source; `postgres`, for a PostgreSQL data source; `redshift`, for an Amazon Redshift data source; `snowflake`, for a Snowflake data source; `spark`, for a Spark SQL data source; `sybase`, for a Sybase data source; `teradata`, for a Teradata data source. Important: To get technical lineage that includes calculated views, you must harvest SAP HANA by specifying two data sources in the lineage harvester configuration file. In one data source, specify the `hana` dialect, and in the other, specify the `hana-cviews` or `hana-cviews-v2` dialect.
collibraSystemName	The system or server name of the data source. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. This property is optional.
Jobs	The jobs that you want the lineage harvester to collect and process to create the technical lineage. This section is optional. The following rules apply when you specify this section: (1) Specify jobs that are executed, so that the technical lineage graph does not include any job parameters with undefined values. (2) Specify only the first and parent jobs in a sequence of executed jobs. The lineage harvester automatically collects all jobs that are called by the parent jobs. For details about how Collibra Data Lineage parses DataStage jobs and resolves parameters, see Transformation logic and common errors for DataStage.
JobParameters	The runtime parameters that are not in the DSX and ENV files. You can specify multiple job parameters.
name	The name of the job parameter. You can specify any of the following: a parameter name, a user variable, or a parameter set. Important: Do not enclose the name between "#" characters, for example `"name": "#name#"`.
value	The value of the job parameter. The allowed values depend on the value of the `name` property: If `name` is a parameter name, specify a parameter value or a parameter reference. If `name` is a user variable, specify a parameter value or a parameter set reference. If `name` is a parameter set, specify a value file name. For details about how the values are resolved, see the Parameter resolution section in Transformation logic and common errors for DataStage.
perJobParameters	The parameters of a specific job. Use this section when, for example, you ingest multiple jobs whose parameters have the same name but different values. Note: The values in this section take precedence over the values specified in the JobParameters property. Where no per-job value is specified, the JobParameters values are used as the default.
jobID	The ID of the job.
name	The name of the job parameter. You can specify any of the following: a parameter name, a user variable, or a parameter set. Important: Do not enclose the name between "#" characters, for example `"name": "#name#"`.
value	The value of the job parameter. The allowed values depend on the value of the `name` property: If `name` is a parameter name, specify a parameter value or a parameter reference. If `name` is a user variable, specify a parameter value or a parameter set reference. If `name` is a parameter set, specify a value file name. For details about how the values are resolved, see the Parameter resolution section in Transformation logic and common errors for DataStage.

Save the <source ID> configuration file.

dbt Core

The lineage harvester uses a lineage harvester configuration file to collect the dbt Core data objects. It then sends the metadata to the Collibra Data Lineage service for processing. You can use this <source ID> configuration file to reduce the number of data objects to be processed and improve the lineage harvester performance.

Example
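
The following sketch combines the two fragments shown in the property descriptions below into a single <source ID> configuration file for dbt Core. Placing collibraSystemNames and materializedMapping side by side at the top level is an assumption. The values systemname1 and ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES are the sample values used in those descriptions.

```json
{
  "collibraSystemNames": {
    "projects": [
      { "collibraSystemName": "systemname1" }
    ]
  },
  "materializedMapping": {
    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
  }
}
```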

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-dbt-core, the name of your JSON file must be my-dbt-core.conf.

For each database in dbt Core, add the following content to the JSON file:

Property	Description	Required?
Collibra System Name	The system or server name of the data source. This field is also the full name of your System asset in Data Catalog. The value of this field must be the same as the full name of the System asset that you created when you registered the data source.	No
projects	This section contains the Collibra system names.	No
collibraSystemName	The system or server name of the data source. This is also the name of your System asset in Data Catalog. Specify the same name as the System asset that you created when you registered the data source. In this code example, the project is stitched to the `systemname1` System asset in Data Catalog: `{ "collibraSystemNames":{ "projects":[ {"collibraSystemName":"systemname1"} ] } }`	No
materializedMapping	Indicates how materializations in dbt are mapped. If you do not specify this property, Collibra Data Lineage maps materializations to tables by default. You can change the mapping of a materialization to a view. In the following example, the ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES materialization is mapped to a view: `"materializedMapping":{ "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW" }`	No

Save the <source ID> configuration file.

dbt Cloud

The lineage harvester uses a lineage harvester configuration file to collect the dbt Cloud data objects. It then sends the metadata to the Collibra Data Lineage service for processing. By default, the lineage harvester downloads all accounts that are accessible with the API token that you provided in the lineage harvester configuration file. For each account, the lineage harvester downloads all jobs and the resulting dbt models for each job. You can use this <source ID> configuration file to reduce the number of data objects to be downloaded and improve the lineage harvester performance in the following ways:

Filter the projects and jobs to be downloaded by specifying the filter property. Only the projects and jobs that you include are downloaded.
Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
Map a materialization as a view instead of a table by specifying the materializedMapping property.

Example
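
The following sketch combines the fragments shown in the property descriptions below into a single <source ID> configuration file for dbt Cloud. Placing collibraSystemNames, filter and materializedMapping side by side at the top level is an assumption, and the IDs are sample values. Note that project_id is written as a string, as in the collibraSystemName description, while jobIds and projectIds are integers.

```json
{
  "collibraSystemNames": {
    "projects": [
      { "project_id": "12345", "collibraSystemName": "systemname1" }
    ]
  },
  "filter": {
    "jobIds": [1234],
    "projectIds": [12345]
  },
  "materializedMapping": {
    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
  }
}
```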

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-dbt-cloud, the name of your JSON file must be my-dbt-cloud.conf.

For each database in dbt Cloud, add the following content to the JSON file:

Property	Description	Required?
collibraSystemNames	You can use this section to specify the Collibra System Name for each project.	No
projects	This section contains the project names and the Collibra system names.	No
project_id	Your project ID. You can find the project ID in the dbt URL right after `projects`. For example, if your dbt URL is `https://cloud.getdbt.com/develop/54321/projects/12345`, your project_id is `12345`.	No
collibraSystemName	The system or server name of the data source. This is also the name of your System asset in Data Catalog. Specify the same name as the System asset that you created when you registered the data source. In this code example, the project with the `12345` project ID is stitched to the `systemname1` System asset in Data Catalog: `{ "collibraSystemNames":{ "projects":[ {"project_id":"12345","collibraSystemName":"systemname1"} ] } }`	No
filter	You can use this section to include projects and jobs to be downloaded. Collibra Data Lineage downloads and processes only the specified jobs and projects. In this code example, the job with the 1234 job ID and the projects with the 98 and 5678 project IDs are downloaded: `{ "filter": { "jobIds": [ 1234 ], "projectIds": [ 98, 5678 ] } }`	No
jobIds	The job IDs of the jobs that you want to include. Specify integers, not strings. To get your job ID, in dbt Cloud, select Deploy and then Jobs. Select a job; the job ID is in the URL. For example, if your URL is `cloud.getdbt.com/deploy/65432/projects/23456/jobs/123456`, your job ID is `123456`.	No
projectIds	The project IDs of the projects that you want to include. Specify integers, not strings. You can find the project ID in the dbt URL right after `projects`. For example, if your dbt URL is `https://cloud.getdbt.com/develop/54321/projects/12345`, your project ID is `12345`.	No
materializedMapping	Indicates how materializations in dbt are mapped. If you do not specify this property, Collibra Data Lineage maps materializations to tables by default. You can change the mapping of a materialization to a view. In the following example, the ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES materialization is mapped to a view: `"materializedMapping":{ "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW" }`	No

Save the <source ID> configuration file.

Informatica PowerCenter

The lineage harvester uses a lineage harvester configuration file to collect the Informatica PowerCenter data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example
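
The following is a minimal sketch of a <source ID> configuration file for Informatica PowerCenter, assembled from the property table in the steps below. The nesting is an assumption, in particular the array form of the databases and connections sections. The DWH_EXPORT connection with the DWH database and DBO schema reuses the sample values from the variables note further down. The my-system name and the choice of the mssql dialect are hypothetical.

```json
{
  "connectionDefinitions": {
    "DWH_EXPORT": {
      "dbname": "DWH",
      "schema": "DBO",
      "dialect": "mssql"
    }
  },
  "collibraSystemNames": {
    "databases": [
      { "dbname": "DWH", "collibraSystemName": "my-system" }
    ],
    "connections": [
      { "connectionName": "DWH_EXPORT", "collibraSystemName": "my-system" }
    ]
  }
}
```

The connection is defined only once, and the dbname in the databases section matches the dbname entered for the connection.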

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-infa, the name of your JSON file must be my-infa.conf.

For each database, add the required content to the JSON file.

If certain properties are not specified in the source ID file, an analyze error called CONFIGURATION is displayed in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about the analyze errors, go to Analyze errors and possible solutions in Technical lineage Sources tab page.

Property	Description
connectionDefinitions	This section contains the connection properties to a source in Informatica PowerCenter.
<connectionName>	The type of your source or target data source. This section contains the connection properties to a source or target in Informatica PowerCenter. Note: Define each connection only once in the JSON file; that is, specify a data source with a given `<connectionName>` property only once. If you define a connection multiple times, unexpected lineage and stitching issues might occur.
dbname	The name of your source or target database. When you specify the `dbname` and `schema` properties, Collibra Data Lineage can stitch the data objects to the assets in Data Catalog. If the properties are not specified, the data objects are not stitched.
schema	The name of your source or target schema. When you specify the `dbname` and `schema` properties, Collibra Data Lineage can stitch the data objects to the assets in Data Catalog. If the properties are not specified, the data objects are not stitched.
dialect	The dialect of the referenced database. If you specify a dialect for a database, the value overrides the dialect that you specify in the lineage harvester configuration file for this database. For any database that does not have a dialect specified in the source ID file, the dialect that you specify in the lineage harvester configuration file is used as a global dialect. You can enter one of the following values: `azure`, for an Azure SQL Server data source; `bigquery`, for a Google BigQuery data source; `db2`, for an IBM DB2 data source; `hana`, for an SAP HANA data source; `hana-cviews`, for getting lineage from calculated views in an SAP HANA Classic on-premises data source; `hana-cviews-v2`, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source; `hive`, for a HiveQL data source; `greenplum`, for a Greenplum data source; `mssql`, for a Microsoft SQL Server data source; `mysql`, for a MySQL data source; `netezza`, for a Netezza data source; `oracle`, for an Oracle data source; `postgres`, for a PostgreSQL data source; `redshift`, for an Amazon Redshift data source; `snowflake`, for a Snowflake data source; `spark`, for a Spark SQL data source; `sybase`, for a Sybase data source; `teradata`, for a Teradata data source. Important: To get technical lineage that includes calculated views, you must harvest SAP HANA by specifying two data sources in the lineage harvester configuration file. In one data source, specify the `hana` dialect, and in the other, specify the `hana-cviews` or `hana-cviews-v2` dialect.
collibraSystemNames	This section contains the system or server name that is specified in your database and referenced in your connection. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify the same name as the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. The following rules apply when you specify the `collibraSystemName` properties in this file and the lineage harvester configuration file: (1) If you specify this property for a database or connection, its value overrides the value in the lineage harvester configuration file for that database or connection. (2) For any database or connection that does not have a Collibra system name specified in the source ID file, the value of the `collibraSystemName` property in the lineage harvester configuration file is used as a global value.
databases	This section contains the database information. This is required to connect directly to the system or server of the database.
dbname	The name of the database. The database name is the same as the name you entered in the <connectionName> section.
collibraSystemName	The system or server name of the database.
connections	This section contains the connection information. This is required to reference the system or server of the connection.
connectionName	The name of the connection.
collibraSystemName	The system or server name of the connection.

Important: If you use variables in Informatica PowerCenter, add the value of the variable instead of its name in the connection definitions JSON file. For example, if the parameter file contains $DBConnection_dwh=DWH_EXPORT, add the following connection definitions to the JSON file:

{
	"DWH_EXPORT": { "dbname": "DWH", "schema": "DBO" }
}

Save the <source ID> configuration file.

Informatica Intelligent Cloud Services Data Integration

You use the lineage harvester configuration file to access Informatica Intelligent Cloud Services Data Integration data objects. The lineage harvester processes the data objects to create a technical lineage. You also have to prepare a <source ID> configuration file that defines the Intelligent Cloud Services system name.

Important: You must prepare a <source ID> configuration file regardless of whether the useCollibraSystemName property in your lineage harvester configuration file is set to true or false.

Prerequisites

You have Admin permission on all objects that you want to harvest.

Example
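
The following is a minimal sketch of a <source ID> configuration file for Informatica Intelligent Cloud Services, assembled from the property table in the steps below. The nesting is an assumption, in particular the array form of connections and connectionDefinitions. All names and values are hypothetical placeholders, and snowflake is simply one of the listed dialect values.

```json
{
  "collibraSystemNames": {
    "connections": [
      { "connectionName": "my-connection", "collibraSystemName": "my-system" }
    ]
  },
  "connectionDefinitions": [
    {
      "connectionName": "my-connection",
      "databaseName": "MY_DATABASE",
      "schemaName": "MY_SCHEMA",
      "dialect": "snowflake"
    }
  ]
}
```

The connectionName in connectionDefinitions repeats the connection name used in the connections section, as the property description requires.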

Steps

Create a new JSON configuration file in the lineage harvester config folder.
If a data source for an Informatica Intelligent Cloud Services connection is large, consider creating more than one JSON file for the data source. Each JSON file must have a unique name, and the contents of the JSON files are the same. This helps avoid errors that might occur when the lineage harvester ingests metadata from a single large source.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example: If the value of the Id property in your lineage harvester configuration file is iics-source-1, the name of your JSON file should be iics-source-1.conf.

Important: Your JSON file must have the file extension .conf.

For each Informatica Intelligent Cloud Services connection, you can add the following content to the JSON file:

Property	Description	Required?
collibraSystemNames	This section contains the system information for Informatica Intelligent Cloud Services.
connections	This section contains the system connection information. This is required to reference the system or server of the connection.
connectionName	The name of the connection. The name must match the System asset name in Data Catalog for stitching.	Yes
collibraSystemName	The system or server name of the data source. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.	No
connectionDefinitions	This section contains the database, schema and dialect information for each connection in Informatica Intelligent Cloud Services. Note: You can add connection information for each connection in the `connections` section.
connectionName	The name of the connection. The name must match a connection name in the `connections` section.	Yes
databaseName	The name of your database. The name must match the Database asset name in Data Catalog for stitching.	Yes
schemaName	The name of your schema. The name must match the Schema asset name in Data Catalog for stitching.	Yes
dialect	The dialect of the connection. Specify this property so that Collibra Data Lineage can properly extract and parse queries that are related to this connection. You can enter one of the following values: `bigquery`, `db2`, `hana`, `hive`, `greenplum`, `mssql`, `mysql`, `netezza`, `oracle`, `postgres`, `redshift`, `snowflake`, `spark`, `teradata`.	No

Save the configuration file.

Looker

The lineage harvester uses the lineage harvester configuration file to collect the Looker data objects and send them to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

Filter on the Looker folders from which you want to ingest metadata.
Specify the system names of databases in Looker with the collibraSystemName property, if useCollibraSystemName in the lineage harvester configuration file is set to true. Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.

Example
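
The following is a minimal sketch of a <source ID> configuration file for Looker, assembled from the properties described in the steps below. The nesting is an assumption: each connection name is used as a key under Connections, and filters is written as an array so that more than one domain can be listed. All names and values are hypothetical placeholders, including the all-zero domainId. Only folderNames is shown because the expected type of the folderIds values is not described here.

```json
{
  "Connections": {
    "my-looker-connection": {
      "schema": "MY_SCHEMA",
      "dbname": "MY_DATABASE",
      "collibraSystemName": "my-system"
    }
  },
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000000",
      "description": "Folders ingested for the sales team",
      "folderNames": ["Sales"]
    }
  ]
}
```

If you don't want to filter on Looker folders, leave out the filters section entirely.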

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example: If the value of the Id property in the lineage harvester configuration file is looker-source-1, the name of your JSON file should be looker-source-1.conf.
Important: Your JSON file must have the file extension .conf.

For each database in Looker, add the following content to the JSON file:

Property	Description	Mandatory?
Connections	This section contains all Looker connections for which you want to create a technical lineage.	Yes
<connection name>	The name of a connection object in Looker.	Yes
schema	The name of the default schema of a supported data source in Looker. If the lineage harvester fails to find a specific schema, it uses the default schema.
dbname	The name of the database of a supported data source in Looker.
collibraSystemName	The system or server name of a database. If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file or don't specify a value for the collibraSystemName property in it, the system name in the technical lineage is "DEFAULT".	Yes
filters	Optionally, use this section to specify the Looker folders from which you want to ingest metadata. You can specify more than one Looker folder for ingestion into a single domain in Collibra. For example, suppose you filter on folder B. A Looker Folder asset is created in the specified domain in Collibra, and all of the metadata in folder B is ingested. If folder B has a parent folder A, a Looker Folder asset is also created for folder A (in the domain specified for folder B) to preserve the hierarchy, but no metadata from folder A is ingested. Note: You can filter on Looker folders, but not on Looker data sets, because Looker data sets are linked directly to the server instead of to a folder, as shown in the Looker metadata overview. Looker data sets are ingested in the default domain, regardless of any filtering. Warning: If you don't want to filter on Looker folders, you must completely remove this filters section. Tip: There are significant benefits to filtering by folder ID. For information, see the filters > folderIds property description. Tip: You can use wildcards to capture multiple connection string combinations.
domainId	The unique resource ID of the domain (or domains) in Collibra in which you want to ingest data objects from one or more Looker folders. Tip: You can find the domain ID by clicking the domain type and then looking in the URL of your browser. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.
description	Any description, as you see fit.
folderNames	The name (or names) of the Looker folders from which you want to ingest metadata. Note: You must specify either a folder name, a folder ID, or both.
folderIds	The ID (or IDs) of the Looker folders from which you want to ingest metadata. Note: You must specify either a folder ID, a folder name, or both. Tip: If you filter by folder ID, filtering is carried out via the API instead of on the Collibra Data Lineage service instance. The lineage harvester accesses only the folders that you specify via this property, and sends only that metadata to the Collibra Data Lineage service instance for processing and ingestion in Data Catalog. Conversely, if you filter by folder name (via the folderNames property), metadata from all Looker folders is sent to the Collibra Data Lineage service instance, and only then is filtering applied.

Save the <source ID> configuration file.

Matillion

The lineage harvester uses a lineage harvester configuration file to collect the Matillion data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example
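
The following is a minimal sketch of a <source ID> configuration file for Matillion, assembled from the properties described in the steps below. The nesting is an assumption: each found_dbname and found_hostname combination is used as a key whose value holds the dbname, schema and collibraSystemName properties. All names and values are hypothetical placeholders.

```json
{
  "found_dbname=my-database;found_hostname=my-server.example.com": {
    "dbname": "MY_DATABASE",
    "schema": "MY_SCHEMA",
    "collibraSystemName": "my-system"
  },
  "found_dbname=my-other-database;found_hostname=*": {
    "dbname": "MY_OTHER_DATABASE",
    "schema": "MY_OTHER_SCHEMA",
    "collibraSystemName": "my-system"
  }
}
```

Each found_dbname and found_hostname combination appears only once. The second entry uses found_hostname=* to include all servers.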

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> matches the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example: If the value of the sourceId property in the lineage harvester configuration file is my-matillion, the name of your JSON file must be my-matillion.conf.

Add the required content to the JSON file.

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>

The information of the supported data sources in Matillion to be collected by Collibra Data Lineage.

<database name>: The database name in Matillion.
<server name>: The name of the server that the database is running on. You can specify found_hostname=* to include all servers.

Note Define a connection in the connection definitions only once; specifically, define a data source with the found_dbname and found_hostname properties specified only once in the connection definitions. If you define a connection multiple times, unexpected lineage and stitching issues might occur.

Tip

You can use wildcards to capture multiple connection string combinations:

Yes

dbname

The name of the database asset in Data Catalog. Specify this property with the database name that you created when you prepared the Data Catalog physical data layer.

If you leave this property blank, the database is stitched to the database of DEFAULT in Data Catalog.

schema

The name of the schema asset in Data Catalog. Specify this property with the schema name that you created when you registered the data source.

If you leave this property blank, the schema is stitched to the schema of DEFAULT in Data Catalog.

collibraSystemName

The system or server name of the data source.

Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

If you leave this property blank, the system is stitched to the system of DEFAULT in Data Catalog. If you are missing lineage or your lineage objects aren’t stitching to Catalog assets in Data Catalog as you expect, ensure this property is specified properly.

Warning The value of this property must exactly match (including for case-sensitivity) the name of your System asset in Collibra.

Save the <source ID> configuration file.
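Because a connection should be defined only once, it can help to check the saved file for repeated keys before running the harvester. The following is a minimal Python sketch of such a check, not part of the lineage harvester. It assumes the connection string is the top-level key and the mapping properties are its children; the sample values are placeholders.

```python
import json

def reject_duplicate_keys(pairs):
    """object_pairs_hook: fail if one JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen[key] = value
    return seen

def load_source_conf(text):
    # Without a hook, json.loads silently keeps only the last duplicate.
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)

# Placeholder content; the key and child-property layout is an assumption.
sample = """
{
  "found_dbname=sales;found_hostname=*": {
    "dbname": "SALES",
    "schema": "PUBLIC",
    "collibraSystemName": "my-system"
  }
}
"""
conf = load_source_conf(sample)
print(list(conf))
```

The check only catches keys that are textually identical; two different wildcard patterns that match the same connection would still need a manual review.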

The lineage harvester uses the configuration file to connect to MicroStrategy. You must also prepare a MicroStrategy <source ID> configuration file to:

Specify the default domain, meaning the domain in Collibra in which the corresponding assets of MicroStrategy metadata will be ingested if domain mapping is not configured.
Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
Note Mapping doesn't work for custom SQL.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Example

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example If the value of the Id property in the lineage harvester configuration file is mstr-source-1, then the name of your JSON file should be mstr-source-1.conf.
Important Your JSON file must have the file extension .conf.

Property	Description	Mandatory
default_domain_id	The domain in which you want the corresponding assets of MicroStrategy metadata to be ingested. Note If you configure filtering, only the MicroStrategy Server asset is ingested into this default domain.	Yes
filters	This section allows you to specify: From which MicroStrategy projects you want to harvest metadata. Into which domains in Collibra you want to ingest the corresponding assets. If you don't want to filter on projects, don't include this section in your <source ID> configuration file.	No
domainId	The unique resource ID of the domain (or domains) in Collibra in which you want to ingest the MicroStrategy assets. Tip If you use a `filters` section, you must include the `domainId` property in the section. If you want to filter on certain projects, but you want to ingest all assets into the default domain, then the value of the `domainId` property must match the value of the `default_domain_id` property. Example: "default_domain_id": "1234567890", "filters": [ { "domainId": "1234567890", "projectNames": ["MicroStrategy Tutorial","Testing_MSTR"] } ] How do I find a domain reference ID? Open the relevant domain in Collibra. The URL looks like: https://<yourcollibrainstance>/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26?view=00000000-0000-0000-0000-000000040001. In this example, the reference ID is 22258f64-40b6-4b16-9c08-c95f8ec0da26.	Yes
projectIds	The IDs of the MicroStrategy projects from which you want to ingest metadata.	No
projectNames	The project names of the MicroStrategy projects from which you want to ingest metadata.	No
datasourceMapping	This optional section allows you to configure data source mapping. Include this section only if you need to differentiate between multiple data sources that have the same name. Note Mapping doesn't work for custom SQL.	No
found_datasource	The name of the data source that was returned by the lineage harvester, as shown in the technical lineage. Note The data source name is case-sensitive.	Yes
found_project	The name of the project in which the data source information resides. You can specify an asterisk (*) to search for data source information across all projects.	Yes
mapping	Use this section to map the data source name that was returned by the lineage harvester to the true name of the data source. Example You have a Redshift data source named "RD_pearl", but the lineage harvester has returned the name "Redshift_connection". You can configure the `datasourceMapping` section as follows: { "datasourceMapping": [ { "found_datasource": "Redshift_connection", "found_project": "*", "mapping": { "dbname": "RD_pearl", "collibraSystemName": "TV_dev" } } ] }	Yes
dbname	The name of the database to which you want to map the found data source.	Yes
schema_name	The name of the schema in MicroStrategy.	No
dialect	The dialect of the data source in MicroStrategy.	No
collibraSystemName	The system or server name of a database. If you set the `useCollibraSystemName` property to `true` in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the `collibraSystemName` property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT". If you set the `useCollibraSystemName` property to `false` in your lineage harvester configuration file, leave this property empty as follows: `"collibraSystemName": ""`. How do I configure this property if I have two databases with the same name? Let's assume that you have a data source named Customers. You use this data source connection in two different projects, Project_A and Project_B, but they are actually two different databases. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases. Let's say you named them Customers-North and Customers-South. You can then configure this property as follows. "datasourceMapping": [ { "found_datasource": "Customers", "found_project": "Project_A", "mapping": { "dbname": "Customers", "collibraSystemName": "Customers-North" } }, { "found_datasource": "Customers", "found_project": "Project_B", "mapping": { "dbname": "Customers", "collibraSystemName": "Customers-South" } } ] Warning The value of this property must exactly match the name of your System asset in Collibra.	Yes

Save the <source ID> configuration file.
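The mandatory properties in the table above can be checked with a short script before you run the harvester. The following Python sketch is illustrative only: check_mstr_conf is a hypothetical helper, not a Collibra tool, and the sample values are placeholders taken from the examples in the table.

```python
def check_mstr_conf(conf):
    """Return a list of problems found in a parsed <source ID> configuration."""
    problems = []
    if not conf.get("default_domain_id"):
        problems.append("default_domain_id is mandatory")
    for i, flt in enumerate(conf.get("filters", [])):
        # A filters section must carry a domainId in every entry.
        if not flt.get("domainId"):
            problems.append(f"filters[{i}]: domainId is mandatory")
    for i, entry in enumerate(conf.get("datasourceMapping", [])):
        for key in ("found_datasource", "found_project", "mapping"):
            if key not in entry:
                problems.append(f"datasourceMapping[{i}]: {key} is mandatory")
        if "dbname" not in entry.get("mapping", {}):
            problems.append(f"datasourceMapping[{i}]: mapping.dbname is mandatory")
    return problems

conf = {
    "default_domain_id": "1234567890",
    "filters": [{"domainId": "1234567890", "projectNames": ["MicroStrategy Tutorial"]}],
    "datasourceMapping": [{
        "found_datasource": "Customers",
        "found_project": "Project_A",
        "mapping": {"dbname": "Customers", "collibraSystemName": "Customers-North"},
    }],
}
print(check_mstr_conf(conf))  # an empty list means no problems were found
```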

The lineage harvester uses a lineage harvester configuration file to collect the Power BI data objects. It then sends the metadata to the Collibra Data Lineage service instances.

The <source ID> configuration file allows you to:

Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
Note Mapping doesn't work for custom SQL.
Configure filtering. We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
Specify the system name of databases in Power BI with the collibraSystemName property, if useCollibraSystemName in the lineage harvester configuration file is set to true. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

Tip You can now use filtering v2, which allows you to filter on dashboards and reports — including in-app reports — in addition to capacities and workspaces. You are still free to use your v1 filter configuration. Both methods are addressed in this topic.

Filtering v2
Filtering v1

Example

Filter validation

Filter configurations are validated against the following scenarios:

Duplicate keywords.
Unknown or unsupported keywords.
Contradicting inclusion and exclusion filters.
Mixed filter v1 and filter v2 keywords.
A single workspace is mapped to more than one domain. (In this case, only the first filter is considered.)

If validation fails for any of these scenarios, a warning with failure details is shown in an analyze error on the Technical lineage Sources tab page. Critical errors occur only if the <source ID> configuration file is incorrectly formatted or doesn't contain valid keywords. In such cases, the filter configuration is not processed. If configured inclusion and exclusion filters contradict each other, only the exclusion filter is taken into consideration.
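Two of the scenarios listed above, unknown keywords and mixed v1 and v2 keywords, can be approximated locally. The following Python sketch is an illustration of the idea, not the validation that Collibra Data Lineage performs. The keyword sets are taken from the property tables in this topic, and keyword_warnings is a hypothetical helper.

```python
# Keyword sets as listed in the filtering v1 and v2 property tables.
V1_KEYS = {"workspaceNames", "workspaceIds", "capacityNames", "capacityIds",
           "excludeWorkspaceNames", "excludeWorkspaceIds"}
V2_KEYS = {"capacityFilter", "workspaceFilter", "dashboardFilter", "reportFilter"}
COMMON_KEYS = {"domainId", "description"}

def keyword_warnings(filter_entry):
    """Return warnings for one entry of the filters section."""
    keys = set(filter_entry)
    warnings = []
    unknown = keys - V1_KEYS - V2_KEYS - COMMON_KEYS
    if unknown:
        warnings.append(f"unknown keywords: {sorted(unknown)}")
    if keys & V1_KEYS and keys & V2_KEYS:
        warnings.append("mixed filter v1 and filter v2 keywords")
    return warnings

mixed = {"domainId": "<domain ID>", "workspaceNames": ["A"], "workspaceFilter": {}}
print(keyword_warnings(mixed))
```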

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
Example The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
Important Your JSON file must have the file extension .conf.
For each database in Power BI, add the following content to the JSON file:

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

The database information of supported data sources in Power BI that is typically collected by the lineage harvester. Specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema). You then use the child properties to map the names collected by the lineage harvester to the true names.

Important The keys that you specify must be unique.

Note During metadata analysis, if Collibra Data Lineage cannot match a name that you provide in this mapping – let's say, for example, you mistype the name of the database – an analyze error is produced.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

dbname

The true name (display name) of the database collected by the lineage harvester.

schema

The true name (display name) of the schema collected by the lineage harvester.

If the lineage harvester fails to find a specific schema, it uses the schema you specify in this property.

Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

dialect

The dialect of the supported data source in Power BI.

collibraSystemName

The system or server name of a database.

Warning The value of this property must exactly match (including for case-sensitivity) the name of your System asset in Collibra.

Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Power BI, you are not required to:

Set the useCollibraSystemName property in the lineage harvester configuration file to true.
Specify a Collibra system name in the <source ID> configuration file.

However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, then you must specify a Collibra system name in the <source ID> configuration file.

Yes (unless you are using the <source ID> file to provide the true system names of ODBC databases in Power BI.)

filters

This section allows you to specify the Power BI workspaces from which you want to ingest metadata.

If you specify a capacity, all of the workspaces in that capacity are also ingested.

Workspace filtering takes precedence over capacity filtering, meaning workspaces are filtered first. If capacities that contain workspaces are not explicitly excluded, all capacities that contain workspaces are ingested. Filtering of reports and dashboards is subordinate to workspace filtering: to include reports and dashboards from a certain workspace, that workspace has to be ingested as well. Any configured dashboard and report filtering is then taken into consideration. Reports and dashboards from a single workspace cannot be ingested in different domains.

Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, a workspace with the name Sale and Marketing [automobiles] must be formatted as follows:
Sale and Marketing [[]automobiles[]]

Important If you don't want to specify the Power BI workspaces from which to ingest, you must completely remove this filters section.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

domainId

The unique resource ID of the domain (or domains), in Collibra Platform, in which you want to ingest the Power BI assets.

Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

Yes

description

Any description, as you see fit.

capacityFilter

This section allows you to specify the capacities from which you want to ingest metadata. You can include certain capacities and exclude others.

includedNames

The names of the capacities from which you want to ingest metadata.

includedIds

The IDs of the capacities from which you want to ingest metadata.

excludedNames

The names of the capacities that you want to exclude from metadata ingestion.

excludedIds

The IDs of the capacities that you want to exclude from metadata ingestion.

workspaceFilter

This section allows you to specify the workspaces from which you want to ingest metadata. You can include certain workspaces and exclude others.

Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.

includedNames

The names of the workspaces from which you want to ingest metadata.

includedIds

The IDs of the workspaces from which you want to ingest metadata.

excludedNames

The names of the workspaces that you want to exclude from metadata ingestion.

This is useful if you want to exclude, for example, dedicated development and testing workspaces.

Note The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My workspace is the personal workspace for any Power BI customer to work with their own, personal content.

excludedIds

The IDs of the workspaces that you want to exclude from metadata ingestion.

dashboardFilter

This section allows you to specify the dashboards from which you want to ingest metadata. You can include certain dashboards and exclude others.

includedNames

The names of the dashboards from which you want to ingest metadata.

includedIds

The IDs of the dashboards from which you want to ingest metadata.

excludedNames

The names of the dashboards that you want to exclude from metadata ingestion.

excludedIds

The IDs of the dashboards that you want to exclude from metadata ingestion.

reportFilter

This section allows you to specify the reports from which you want to ingest metadata. You can include certain reports and exclude others.

includedNames

The names of the reports from which you want to ingest metadata.

includedIds

The IDs of the reports from which you want to ingest metadata.

excludedNames

The names of the reports that you want to exclude from metadata ingestion.

excludedIds

The IDs of the reports that you want to exclude from metadata ingestion.

includedInApp

Use this property to specify that you want to ingest reports that are included in published Power BI apps.

To include in-app reports, specify the value "includedInApp": true.

To exclude in-app reports, specify the value "includedInApp": false.

Save the <source ID> configuration file.
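The four wildcard patterns in the tables above have the same form as the patterns in Python's fnmatch module. Assuming the harvester applies comparable, case-sensitive matching (an assumption; the implementation is not named here), the following sketch shows how each pattern behaves and why meta-characters in a workspace name need square brackets. All names other than the Sale and Marketing example are made up.

```python
from fnmatch import fnmatchcase

# (pattern, candidate name) pairs; every pair below is expected to match.
checks = [
    ("*", "Finance"),                # * matches everything
    ("Sales_?", "Sales_1"),          # ? matches any single character
    ("Sales_[12]", "Sales_2"),       # [seq] matches any character in seq
    ("Sales_[!12]", "Sales_3"),      # [!seq] matches any character not in seq
    # Literal [ and ] are written as [[] and []].
    ("Sale and Marketing [[]automobiles[]]", "Sale and Marketing [automobiles]"),
]
for pattern, name in checks:
    print(pattern, "->", fnmatchcase(name, pattern))

# Unescaped, [automobiles] is read as a character set and no longer matches.
unescaped = "Sale and Marketing [automobiles]"
print(fnmatchcase("Sale and Marketing [automobiles]", unescaped))
```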

Considerations

Workspace filtering takes precedence over capacity filtering, meaning workspaces are filtered first. Report filtering and dashboard filtering are subordinate to both capacity filtering and workspace filtering.

Capacities that are empty after workspace filtering and do not pass a filter are excluded. In the following example, workspace "workspace_1" is in capacity "CAPACITY_A". The metadata also includes a capacity "CAPACITY_B", but it is not mentioned for inclusion in the filtering, so it is empty. Only "CAPACITY_A" is included.

{
  "filters": [
    {
      "description": "description",
      "domainId": "d0f2966c-018b-4e8a-9085-266b3c01c46f",
      "capacityFilter": {
        "includedNames": ["CAPACITY_A"]
      },
      "workspaceFilter": {
        "includedNames": ["workspace_1"]
      }
    }
  ]
}

However, if there is no capacity filtering, all capacities are included, even if one or more capacities contain no workspaces due to filtering. This is because all capacities are treated as "included" in filters, unless otherwise specified. In the following example, workspace "workspace_1" is in capacity "CAPACITY_A". The metadata also includes a capacity "CAPACITY_B". Because there is no capacity filtering, both capacities are included.

{
  "filters": [
    {
      "description": "description",
      "domainId": "d0f2966c-018b-4e8a-9085-266b3c01c46f",
      "workspaceFilter": {
        "includedNames": ["workspace_1"]
      }
    }
  ]
}
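The two capacity rules above can be restated as a toy model. The following Python sketch encodes only the wording of this section (a capacity is dropped when it is empty after workspace filtering and does not pass the capacity filter; without a capacity filter every capacity stays). It is not the service's actual algorithm, and the assumption that CAPACITY_B holds a workspace named workspace_2 is made up for the example.

```python
def included_capacities(capacities, included_workspaces, included_capacity_names=None):
    """capacities maps a capacity name to the workspaces it contains."""
    if included_capacity_names is None:
        # No capacity filter: every capacity is treated as included.
        return set(capacities)
    kept = set()
    for name, workspaces in capacities.items():
        has_workspace = any(w in included_workspaces for w in workspaces)
        # Dropped only if empty after workspace filtering AND not named in the filter.
        if has_workspace or name in included_capacity_names:
            kept.add(name)
    return kept

# Hypothetical metadata: workspace_2 is an invented name.
metadata = {"CAPACITY_A": ["workspace_1"], "CAPACITY_B": ["workspace_2"]}
with_filter = included_capacities(metadata, {"workspace_1"}, {"CAPACITY_A"})
without_filter = included_capacities(metadata, {"workspace_1"})
print(sorted(with_filter), sorted(without_filter))
```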

Inclusion and exclusion properties used with the includedInApp property for reports are applied using the AND logical operator. In the following example, "report1", which is published in an app, is included. "report2", which is not published in an app, is not included.

"reportFilter": {
  "includedNames": ["report1", "report2"],
  "includedInApp": true
}

Examples

In the following example:

Only reports with names that match ABC report* and are in workspace ABC1 are included.
Reports that are not in workspace ABC1 are not included.
Reports that are in capacity ABC Capacity are not included.

{
  "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
  "description": "Filter by display name",
  "capacityFilter": {
    "excludedNames": ["ABC Capacity"]
  },
  "workspaceFilter": {
     "includedNames": ["ABC1"]
  },
  "reportFilter": {
    "includedNames": ["ABC report*"]
  }
}

In the following example, reports with names that match ABC report*, in any workspace, are included.

{
  "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
  "description": "Filter by display name",
  "reportFilter": {
    "includedNames": ["ABC report*"]
  }
}

For report filtering, inclusion and exclusion filters used in combination with the includedInApp property are applied using the AND logical operator. In the following example:

In-app report named report1 is included.
Let's say that a report named report2 is not in an app. That report is not included.

"reportFilter": {
    "includedNames": ["report1", "report2"],
    "includedInApp": true
  }
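The AND combination described above can be sketched in a few lines of Python. This is an illustration under assumptions: report_passes is a hypothetical helper, names are matched with fnmatch-style patterns, and a missing property is treated as "no constraint".

```python
from fnmatch import fnmatchcase

def report_passes(name, in_app, report_filter):
    """True if a report satisfies both the name and the in-app condition."""
    names = report_filter.get("includedNames")
    name_ok = names is None or any(fnmatchcase(name, p) for p in names)
    wanted = report_filter.get("includedInApp")
    app_ok = wanted is None or in_app == wanted
    return name_ok and app_ok  # both conditions must hold (logical AND)

report_filter = {"includedNames": ["report1", "report2"], "includedInApp": True}
print(report_passes("report1", True, report_filter))   # published in an app
print(report_passes("report2", False, report_filter))  # not published in an app
```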

In the following example, all reports with names that match report1* are included, with the exception of report report1_backup.

{
  "filters": [
    {
      "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
      "description": "Some description",
      "reportFilter": {
        "includedNames": "report1*",
        "excludedNames": "*_backup"
      }
    }
  ]
}
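To see which names a pair of inclusion and exclusion patterns selects, you can try them against a list of report names. The following Python sketch assumes fnmatch-style, case-sensitive matching (an assumption) and uses lists of patterns; the report names are invented and select is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def select(names, included, excluded):
    """Keep names that match an inclusion pattern and no exclusion pattern."""
    def matches(name, patterns):
        return any(fnmatchcase(name, p) for p in patterns)
    return [n for n in names if matches(n, included) and not matches(n, excluded)]

# Invented report names for illustration.
reports = ["report1", "report1_2024", "report1_backup", "report2"]
print(select(reports, ["report1*"], ["*_backup"]))
```

Note that the exclusion pattern *_backup removes every matching name, not only report1_backup.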

In the following example, all reports named report1 in workspace workspace_name_1 (only) are included.

{
  "filters": [
    {
      "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
      "description": "description",
      "workspaceFilter": {
        "includedNames": "workspace_name_1"
      },
      "reportFilter": {
        "includedNames": "report1"
      }
    }
  ]
}

In the following example, all workspaces in capacity capacity1 and workspace workspace_name_1 are included.

{
  "filters": [
    {
      "capacityFilter": {
        "includedNames": "capacity1"
      },
      "workspaceFilter": {
        "includedNames": "workspace_name_1"
      },
      "description": "workspace and capacity filter",
      "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356"
    }
  ]
}

Filter configuration validation

There are several validation rules to ensure that filter configurations are valid and non-contradictory. Failure to pass validation does not affect the integration; instead, warnings are generated and included in analyze errors and the logs.

In the following example, the same workspace is specified for inclusion and exclusion. In this case, the exclusion filter takes precedence, meaning workspace ABC2 is not included.

"workspaceFilter": {
     "includedNames": ["ABC2"],
     "excludedNames": ["ABC2"]
  }

The following example is the same scenario as the previous example, except that wildcards are used. The result is the same, meaning workspace ABC2 is not included.

"workspaceFilter": {
     "includedNames": ["ABC*"],
     "excludedNames": ["ABC2"]
  }

In the following example, a warning is included in an analysis error because workspace ABC2 is specified in multiple filters.

"workspaceFilter": {
     "includedNames": ["ABC2"]
  },
  "workspaceFilter": {
     "includedNames": ["ABC2", "ABC3"]
  }
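The "specified in multiple filters" case can be detected before harvesting. The following Python sketch follows the rule stated earlier in this topic that only the first filter is considered for such a workspace. It compares exact names only (no wildcards), and assign_workspaces and the domain IDs are hypothetical.

```python
def assign_workspaces(filters):
    """Map each workspace name to the first domain that claims it; collect warnings."""
    domain_of, warnings = {}, []
    for flt in filters:
        for name in flt.get("workspaceFilter", {}).get("includedNames", []):
            if name in domain_of:
                # A later filter repeats the workspace; the first one wins.
                warnings.append(f"{name} is specified in multiple filters")
            else:
                domain_of[name] = flt["domainId"]
    return domain_of, warnings

# Placeholder domain IDs.
filters = [
    {"domainId": "domain-1", "workspaceFilter": {"includedNames": ["ABC2"]}},
    {"domainId": "domain-2", "workspaceFilter": {"includedNames": ["ABC2", "ABC3"]}},
]
domain_of, warnings = assign_workspaces(filters)
print(domain_of, warnings)
```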

The includedInApp property is valid only for reports, meaning in a reportFilter section. In the following example, an analysis error is generated because it is used in the dashboardFilter section.

"dashboardFilter": {
    "includedInApp": true
  }

In the following example, a report Test report would qualify for inclusion because it passes both report includedInApp and report includedNames properties. However, due to the order of filtering, the report was already excluded before the inclusion properties were considered. Therefore, the report Test report is not included.

In this case, the report Test report (and any other report that matches the inclusion criteria but will not be included) is considered to have an "absent" parent. Configurations that result in dashboards or reports with absent parents result in analysis errors, and the metadata of such dashboards and reports is not ingested.

{
  "filters": [
    {
      "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
      "description": "Filter by display name",
      "capacityFilter": {
        "excludedNames": "Excluded Capacity"
      },
      "workspaceFilter": {
        "excludedNames": "Test1"
      },
      "reportFilter": {
        "includedInApp": true
      }
    },
    {
      "domainId": "default",
      "description": "Filter by display name",
      "reportFilter": {
        "includedNames": "Test report*"
      }
    }
  ]
}

Continuing with this example, if you want to include report Test report in one domain, but not another, consider the following configuration:

{
  "filters": [
    {
      "domainId": "12g6d0dc-8291-476a-9bb0-9b13g6cc1356",
      "description": "Filter by display name",
      "workspaceFilter": {
        "excludedNames": ["Test1"]
      }
    },
    {
      "domainId": "d0f2966c-018b-4e8a-9085-266b3c01c46f",
      "description": "Filter by display name",
      "workspaceFilter": {
        "includedNames": ["Test1"]
      },
      "reportFilter": {
        "includedNames": [
          "Test report"
        ]
      }
    }
  ]
}

In this case, in the domain with ID ending "1356", neither the capacity nor the workspace that includes the report Test report is included. Therefore, you can include the already excluded workspace in the second filter, for the domain with ID ending "c46f".

Warnings in generated analysis errors about "absent" parents can help explain filtering behavior. In the following example, workspace filtering happens first, so the report Report in app is ingested in the domain with ID ending "c46f", thereby rendering obsolete the first filter, a report filter that targets the default domain.

{
  "filters": [
    {
      "domainId": "default",
      "description": "Filter by display name",
      "reportFilter": {
        "includedInApp": true
      }
    },
    {
      "domainId": "d0f2966c-018b-4e8a-9085-266b3c01c46f",
      "description": "Filter by display name",
      "workspaceFilter": {
        "includedNames": ["workspace_name_1"]
      },
      "reportFilter": {
        "includedNames": ["Report in app"]
      }
    }
  ]
}

If you want to ingest reports into multiple domains, the following example shows the recommended configuration.

{
  "filters": [
    {
      "domainId": "d0f2966c-018b-4e8a-9085-266b3c01c46f",
      "description": "Filter by display name",
      "workspaceFilter": {
        "includedNames": ["workspace_name_1"]
      },
      "reportFilter": {
         "includedInApp": true
      }
    },
    {
      "domainId": "default",
      "description": "Filter by display name",
      "workspaceFilter": {
        "includedNames": ["workspace_name_2"]
      },
      "reportFilter": {
        "includedInApp": true
      }
    }
  ]
}

Example

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
Example The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
Important Your JSON file must have the file extension .conf.
For each database in Power BI, add the following content to the JSON file:

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

Important The keys that you specify must be unique.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

dbname

The true name (display name) of the database collected by the lineage harvester.

schema

The true name (display name) of the schema collected by the lineage harvester.

If the lineage harvester fails to find a specific schema, it uses the schema you specify in this property.

dialect

The dialect of the supported data source in Power BI.

collibraSystemName

The system or server name of a database.

Warning The value of this property must exactly match (including for case-sensitivity) the name of your System asset in Collibra.

Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Power BI, you are not required to:

Set the useCollibraSystemName property in the lineage harvester configuration file to true.
Specify a Collibra system name in the <source ID> configuration file.

However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, then you must specify a Collibra system name in the <source ID> configuration file.

Yes (unless you are using the <source ID> file to provide the true system names of ODBC databases in Power BI.)

filters

This section allows you to specify the Power BI workspaces from which you want to ingest metadata.

If you specify a capacity, all of the workspaces in that capacity are also ingested.

Important If you don't want to specify the Power BI workspaces from which to ingest, you must completely remove this filters section.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

domainId

The unique resource ID of the domain (or domains), in Collibra Platform, in which you want to ingest the Power BI assets.

Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

Yes

description

Any description, as you see fit.

workspaceNames

The names of Power BI workspaces from which you want to ingest metadata.

Important Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, a workspace with the name "Sale and Marketing [automobiles]" should be formatted as follows:
Sale and Marketing [[]automobiles[]]

workspaceIds

The IDs of Power BI workspaces from which you want to ingest metadata.

Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.

capacityNames

The names of capacities on which you want to filter.

capacityIds

The IDs of capacities on which you want to filter.

Warning Any letters in a capacity ID must be in upper case.

excludeWorkspaceNames

The names of Power BI workspaces that you want to exclude from the ingestion job.

This is useful if you want to exclude, for example, dedicated development and testing workspaces.

For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

excludeWorkspaceIds

The IDs of Power BI workspaces that you want to exclude from the ingestion job.

This is useful if you want to exclude, for example, dedicated development and testing workspaces.

For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

Save the <source ID> configuration file.

Filter configuration validation

In the following example, the same workspace is specified for inclusion and exclusion. In this case, the exclusion filter takes precedence, meaning workspace ABC2 is not included.

"workspaceNames": ["ABC2"],
"excludeWorkspaceNames": ["ABC2"]

The following example is the same scenario as the previous example, except that wildcards are used. The result is the same, meaning workspace ABC2 is not included.

"workspaceNames": ["ABC*"],
"excludeWorkspaceNames": ["ABC2"]

In the following example, a warning is included in an analysis error because workspace ABC2 is specified in multiple filters.

{
  "domainId": "<domain-ref-id>",
  "description": "FirstFilter",
  "workspaceNames": ["ABC2"]
},
{
  "domainId": "<domain-ref-id>",
  "description": "SecondFilter",
  "workspaceNames": ["ABC2", "ABC3"]
}

The lineage harvester uses the lineage harvester configuration file to collect the SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) data objects and send them to the Collibra Data Lineage service.

The <source ID> configuration file allows you to:

Specify the system name of databases in SSRS and PBRS with the collibraSystemName property. This applies if useCollibraSystemName in the lineage harvester configuration file is set to true.
Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all of the information needed to process the SQL source code correctly.

Example

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example The value of the Id property in the lineage harvester configuration file is ssrs-source-1. As a result, the name of your JSON file should be ssrs-source-1.conf.
Important Your JSON file must have the file extension .conf.

For each database in SSRS and PBRS, add the following content to the JSON file:

Property	Description	Required?
DataSources	This section contains all connections for which you want to create a technical lineage. The `DataSources` section refers to shared data sources in SSRS and PBRS. For more information about shared data sources, see the Microsoft documentation.	Yes
<data source type>	The path of a connection object in SSRS and PBRS.	Yes
dbname	The name of the database of a supported data source in SSRS and PBRS.	No
schema	The name of the default schema of a supported data source in SSRS and PBRS.	No
dialect	The dialect of the supported data source in SSRS and PBRS.	No
collibraSystemName	The system or server name of the database. If you set the `useCollibraSystemName` property to `true` in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the `collibraSystemName` property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT". How do I configure this property if I have two databases with the same name? Let's assume you have two databases named Customers. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases. Let's say you named them Customers-Europe and Customers-USA. You can then configure this property as follows. "Redshift": { "dbname": "Customers", "schema": "redshift-schema-name", "dialect": "redshift", "collibraSystemName": "Customers-Europe" }, "Oracle": { "dbname": "Customers", "schema": "oracle-schema-name", "dialect": "oracle", "collibraSystemName": "Customers-USA" }	Yes
CustomDataSources	You can use custom data processing extensions that are used to support embedded data sources of which the data source definition is specified locally in a report or embedded data set. The `CustomDataSources` section refers to embedded data sources in SSRS and PBRS. For more information about embedded data sources, see the Microsoft documentation.	No
<path to report>/<custom data source name>	The full path to the report and the custom data source name. You can use wildcards to match multiple folders, reports or data sets. The connection information in this section is used to add missing information or to overwrite parsed information.	No
dbname	The name of the database of a custom data source in SSRS and PBRS.	No
schema	The name of the schema of a custom data source in SSRS and PBRS. If you don't provide the schema name, the default schema is used.	No
dialect	The dialect of the custom data source in SSRS and PBRS. Possible values: azure, for an Azure SQL Server data source. bigquery, for a Google BigQuery data source. db2, for an IBM DB2 data source. hana, for an SAP HANA data source. hive, for a HiveQL data source. greenplum, for a Greenplum data source. mssql, for a Microsoft SQL Server data source. mysql, for a MySQL data source. netezza, for a Netezza data source. oracle, for an Oracle data source. postgres, for a PostgreSQL data source. redshift, for an Amazon Redshift data source. snowflake, for a Snowflake data source. spark, for a Spark SQL data source. sybase, for a Sybase data source. teradata, for a Teradata data source.	No

Save the <source ID> configuration file.
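
As a rough illustration of the steps above, the following Python sketch assembles an SSRS and PBRS configuration and writes it to a file named after the Id property. The nesting of `DataSources` and `CustomDataSources` (each mapping a connection path to its properties) is an assumption inferred from the table. The helper `write_source_config`, the report path and all values are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path


def write_source_config(config_dir, source_id, config):
    """Write the configuration as JSON to <config_dir>/<source_id>.conf."""
    path = Path(config_dir) / f"{source_id}.conf"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Assumed layout: each section maps a connection path to its properties.
ssrs_config = {
    "DataSources": {
        "Redshift": {
            "dbname": "Customers",
            "schema": "redshift-schema-name",
            "dialect": "redshift",
            "collibraSystemName": "Customers-Europe",
        }
    },
    "CustomDataSources": {
        # Hypothetical <path to report>/<custom data source name>
        "/Reports/Sales/EmbeddedSource": {
            "dbname": "Customers",
            "schema": "oracle-schema-name",
            "dialect": "oracle",
        }
    },
}

# A temporary folder stands in for the lineage harvester config folder.
with tempfile.TemporaryDirectory() as config_dir:
    path = write_source_config(config_dir, "ssrs-source-1", ssrs_config)
    print(path.name)  # ssrs-source-1.conf
```

Generating the file from a script keeps the file name and the Id property in sync, which is the requirement that the steps call out.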

The lineage harvester uses a lineage harvester configuration file to collect the SQL Server Integration Services data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-ssis, the name of your JSON file must be my-ssis.conf.

For each database, add the required content to the JSON file.

Property	Description	Required?
DataSources	The parent element that contains the connection definitions of your data sources in SQL Server Integration Services. If you specify the properties in this section and also the ConnStringRegExTranslation property for a data source, the connection definitions in the ConnStringRegExTranslation property take precedence.	No
DataSourceName	The name of your data source.	No
dialect	The dialect of the database. You can enter one of the following values: `azure`, for an Azure SQL Server data source. `bigquery`, for a Google BigQuery data source. `db2`, for an IBM DB2 data source. `hana`, for an SAP HANA data source. `hana-cviews`, for getting lineage from calculated views in an SAP HANA Classic on-premises data source. `hana-cviews-v2`, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source. Important To get technical lineage including calculated views, you must harvest SAP HANA by specifying two data sources in the lineage harvester configuration file. In one data source, specify the `hana` dialect, and in the other, specify the `hana-cviews` or `hana-cviews-v2` dialect. `hive`, for a HiveQL data source. `greenplum`, for a Greenplum data source. `mssql`, for a Microsoft SQL Server data source. `mysql`, for a MySQL data source. `netezza`, for a Netezza data source. `oracle`, for an Oracle data source. `postgres`, for a PostgreSQL data source. `redshift`, for an Amazon Redshift data source. `snowflake`, for a Snowflake data source. `spark`, for a Spark SQL data source. `sybase`, for a Sybase data source. `teradata`, for a Teradata data source. If you want to use a Spark SQL data source, make sure that you have an AWS host.	No
collibraSystemName	The system or server name of the data source. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.	No
ConnStringRegExTranslation	The parent element that opens the connection definitions. If you specify this property and also the properties in the DataSources section for a data source, the connection definitions in this property take precedence.	No
<regular expression>	A regular expression that must match one or more connection strings. Note Important considerations: By default, the regular expression is not case-sensitive. As a consequence, a regular expression can match connection strings that contain uppercase or lowercase characters. The connection string is part of the SSIS connection manager. SSIS connection managers are included in SSIS package files (DTSX) or in connection manager files (CONMGR). Example Regular expression: `Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.*` Explanation: The first section, up to `.*`, is a literal, but not case-sensitive, match of the characters. The dot (.) can match any single character. The asterisk (*) means zero or more of the previous, in this case any character. Match: Any connection string that starts with `Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306`. Example: `Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306;Persist Security Info=True;Auto Translate=False;`.	No
dbname	The name of your database, to which the data source connection refers.	No
schema	The name of your schema, to which the regular expression refers.	No
dialect	The dialect of the database. You can enter one of the following values: `azure`, for an Azure SQL Server data source. `bigquery`, for a Google BigQuery data source. `db2`, for an IBM DB2 data source. `hana`, for an SAP HANA data source. `hana-cviews`, for getting lineage from calculated views in an SAP HANA Classic on-premises data source. `hana-cviews-v2`, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source. Important To get technical lineage including calculated views, you must harvest SAP HANA by specifying two data sources in the lineage harvester configuration file. In one data source, specify the `hana` dialect, and in the other, specify the `hana-cviews` or `hana-cviews-v2` dialect. `hive`, for a HiveQL data source. `greenplum`, for a Greenplum data source. `mssql`, for a Microsoft SQL Server data source. `mysql`, for a MySQL data source. `netezza`, for a Netezza data source. `oracle`, for an Oracle data source. `postgres`, for a PostgreSQL data source. `redshift`, for an Amazon Redshift data source. `snowflake`, for a Snowflake data source. `spark`, for a Spark SQL data source. `sybase`, for a Sybase data source. `teradata`, for a Teradata data source. If you want to use a Spark SQL data source, make sure that you have an AWS host.	No
collibraSystemName	The system or server name of the data source. Use this property with the `useCollibraSystemName` property in the lineage harvester configuration file to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.	No

Save the <source ID> configuration file.
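
To get a feel for how a regular expression in the ConnStringRegExTranslation section selects connection strings, you can try it out in Python first. This is only an approximation: it assumes that Python's `re` module behaves like the harvester's matching for simple expressions such as the one in the table, written here with its trailing `.*`. The helper names and the values in `translations` are hypothetical.

```python
import re

PATTERN = r"Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.*"


def matches(pattern, connection_string):
    """Case-insensitive match anchored at the start of the connection string."""
    return re.match(pattern, connection_string, flags=re.IGNORECASE) is not None


def find_definition(translations, connection_string):
    """Return the definition of the first expression that matches, or None."""
    for pattern, definition in translations.items():
        if matches(pattern, connection_string):
            return definition
    return None


# Hypothetical section content; all values are placeholders.
translations = {
    PATTERN: {
        "dbname": "STAGEDB",
        "schema": "dbo",
        "dialect": "sybase",
        "collibraSystemName": "sybase-system-name",
    }
}

conn = ("Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306;"
        "Persist Security Info=True;Auto Translate=False;")
print(matches(PATTERN, conn))          # True
print(matches(PATTERN, conn.lower()))  # True: matching ignores case
print(find_definition(translations, "Server=other-host;Port=1433"))  # None
```

Testing an expression this way before you add it to the file helps catch patterns that are too narrow or too broad.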

The lineage harvester uses the configuration file to connect to Tableau. You are not required to create a <source ID> configuration file, but you need one if you want to:

Define your Tableau operating model.
Provide additional information about databases and files in Tableau. For example, you can define the system name of files and connectors in Tableau.
Use the hostnameMapping property to map the database, schema or system names that were returned by the Tableau APIs to the actual names of the assets in Data Catalog. For complete information, go to Tableau hostname, schema, and system name mapping.
Note Mapping doesn't work for custom SQL.
Define in which domains in Collibra you want to ingest assets from your Tableau sites and projects. See the domainMapping and filters properties.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Example

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example If the value of the Id property in the lineage harvester configuration file is tableau-source-1, then the name of your JSON file should be tableau-source-1.conf.
Important Your JSON file must have the file extension .conf.

For each database in Tableau, add the following content to the JSON file:

Tip You can use wildcards to capture multiple string combinations for any of these properties.

Property	Description	Mandatory?
collibraSystemNames	This section contains the system information for different Tableau data sources. Depending on the kind of data source or connection, you have to specify how to connect to this data source. Tip For more information, see the Tableau documentation. We also recommend that you check the list of supported connectors in Tableau.	No
files	This section contains connection information to one or more files in Tableau. Tip If you do not have files in Tableau, you can remove this section.	No
filePath	The full path to the file. For example, the path to a JSON file.	No
collibraSystemName	The system name of the file.	No
connectors	This section contains connection information to one or more connectors in Tableau. Tip If you do not have connectors in Tableau, you can remove this section. The values that you specify for this property are not case-sensitive.	No
connectorUrl	The URL of the connector. For example, the URL to Google Analytics.	No
collibraSystemName	The system name of the connector.	No
cloudFiles	This section contains connection information to one or more cloud files in Tableau's input data. Tip If you do not have cloud files in Tableau, you can remove this section.	No
name	The name of the file. For example, the name of a Zendesk file.	No
collibraSystemName	The system name of the cloud file.	No
hostnameMapping	This section allows you to map Tableau technical database, server and schema names to the respective real names, to preserve stitching. Warning `hostnameMapping` replaces the following deprecated properties, which have been removed from this topic: The `databaseMapping` property. The `databases` sub-section of the `collibraSystemNames` section. `hostnameMapping` must not be used in combination with either of these properties. If you use the `hostnameMapping` section, you can still use the `collibraSystemName` property in conjunction with the `files`, `connectors` or `cloudFiles` sub-sections. Example configuration "hostnameMapping": { "found_dbname=databasename1;found_hostname=*;found_schema=test": { "dbname": "mssql-database-name", "schema": "mssql-schema-name", "dialect": "mssql", "collibraSystemName": "mssql-system-name" } } For more example configurations, go to Tableau hostname, schema, and system name mapping.	No
found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>	The database information of supported data sources in Tableau that is typically collected by the lineage harvester. It allows you to specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema).	No
dbname	The name of the database of a supported data source in Tableau.	No
schema	The name of the default schema of a supported data source in Tableau. If the lineage harvester fails to find a specific schema, it uses the default schema.	No
dialect	The dialect of the supported data source in Tableau. You don't have to specify a dialect; it is detected automatically. If, however, you are using a dialect that is not supported, you can use this property to specify a supported dialect that is a close match. That way, most of your queries will be detected and processed. The dialects of supported data sources in Tableau are: redshift, for an Amazon Redshift data source. azure, for an Azure SQL Server data source. bigquery, for a Google BigQuery data source. greenplum, for a Greenplum data source. hive, for a HiveQL data source. oracle, for an Oracle data source. postgres, for a PostgreSQL data source. mssql, for a Microsoft SQL Server data source. mysql, for a MySQL data source. netezza, for a Netezza data source. hana, for an SAP HANA data source. spark, for a Spark SQL data source. sybase, for a Sybase data source. teradata, for a Teradata data source.	No
filters	This section defines: From which Tableau sites and projects you want to harvest metadata. Into which domains in Collibra you want to ingest the corresponding assets. Filtering is transitive, which means that all resources in a specified project, such as Tableau workbooks and all sub-projects, are ingested. Tableau assets that are not mapped to the specified domains, for example the Tableau Server assets and the parent projects (if you specify their sub-projects), are ingested in the default domain. Important Filtering does not affect the amount of raw metadata that is harvested from Tableau and sent to the Collibra Data Lineage service instance. Rather, it determines which metadata is ingested as assets in Data Catalog. The `domainMapping` and `filters` sections are mutually exclusive. Do not include both `domainMapping` and `filters` sections in your JSON file. Tip If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use the `domainMapping` section. If you want to ingest all of the projects in a Tableau site into the default domain, use only the `domainID` property in the lineage harvester configuration file. The `domainID` property represents the default domain. If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering. If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering. You can use site filtering and project filtering together: If filtering on the same site, this "filtering" is actually domain mapping, because nothing is filtered out. The contents of the projects are ingested in the specified domains, and the rest of the contents of the site are ingested in a different, specified domain. If you are site filtering on a specific site and project filtering a different site, then site filtering is again a form of domain mapping, and the filtered projects are ingested in their specified domains. 
If your lineage harvester configuration file includes sites that are not mentioned in the `filters` section of your <source ID> configuration file, those sites are ingested in the default domain.	No
sites	The Tableau sites to be ingested and the domain in which you want to ingest metadata from the Tableau sites. Tip If you have only one Tableau site, do not include a `sites` section in your <source ID> file. Instead, use a `projects` section, to filter on Tableau projects. Include a `sites` section only if all of the following are true: You have more than one Tableau site. You want to ingest all of the metadata from only one Tableau site into a single domain in Collibra. The domain into which you want to ingest is not the default domain, meaning the domain specified in the `domainId` property in your lineage harvester configuration file.	No
site_name: domain_id	`site_name` The name of the site to be ingested. The site name is case-sensitive. `domain_id` The unique reference ID of the domain in Collibra in which you want to ingest metadata. The domain ID is case-sensitive. To ingest all metadata from a Tableau site in the specified domain, specify the site name and a separate domain ID for each site that you list in the `siteIds` property in the lineage harvester configuration file for Tableau. If the `site_name` or `domain_id` property is not specified for a site, the metadata from the site is ingested in the default domain. How do I find a domain reference ID? Open the relevant domain in Collibra. The URL looks like: https://<yourcollibrainstance>/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26?view=00000000-0000-0000-0000-000000040001. In this example, the reference ID is 22258f64-40b6-4b16-9c08-c95f8ec0da26. Example { "filters":{ "sites":{ "Training":"ca60b822-781b-4b3a-b44d-f65bd107ff92" }, "projects":{ "Testing > Databricks":"e8f4e4a8-4062-4a33-9b44-3ce3e18e4e22", "Product Demo > Customer Insights":"a305e6f7-7a49-49aa-aa85-41b1e689121b" } } }	No
projects	The Tableau projects to be ingested and the domain in which you want to ingest metadata from the Tableau projects or sub-projects. Tip Project filtering is not relevant for those who have an Explorer role in Tableau, because Explorers need to configure permissions for each data object in Tableau that they want to ingest. As the Administrator role has access to all data objects, project filtering allows Administrators to specify which projects to ingest.	No
site_name > project_name : domain_id	The `site_name` should be the Tableau site name. The `project_name` should be the Tableau project name. The `domain_id` should be the unique reference ID of the domain in Collibra in which you want to ingest metadata. When you specify the site and project names, the following rules apply: Add spaces before and after >. The spaces are separators between the site and project. Specify the full exact site and project names. The values are case-sensitive. When you specify a Tableau project, all assets in the project are ingested in the specified domain. If you want to ingest assets from different Tableau projects in one domain, you can specify the same value for `domain_id` for different projects. Example `"Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`	No
site_name > project_name > sub-project_name : domain_id	The `site_name` should be the Tableau site name. The `project_name` should be the Tableau project name. Optionally, use `sub-project_name` to specify the Tableau sub-project name. The `domain_id` property should be the unique reference ID of the domain in Collibra in which you want to ingest metadata. When you specify the site, project and sub-project names, the following rules apply: Add spaces before and after >. The spaces are separators between the site and project. Specify the full exact site and project names. The values are case-sensitive. Example `"Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`	No
domainMapping	This section defines in which domains in Collibra you want to ingest assets from your Tableau sites and Tableau projects. Domain mapping is transitive, meaning that all resources, such as Tableau workbooks and data attributes in a parent Tableau site, project or sub-project, are ingested in the same domain as the parent. Important The `domainMapping` and `filters` sections are mutually exclusive. Do not include both `domainMapping` and `filters` sections in your JSON file. Tip If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use this `domainMapping` section. If you want to ingest all of the projects in a Tableau site into the default domain, use only the `domainID` property in the lineage harvester configuration file. The `domainID` property represents the default domain. Note Tableau assets that are not mapped to specific domains via this `domainMapping` section, for example Tableau Server assets, are ingested in that default domain. If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering. If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering. Show me an example Let's say that you have a Tableau site named "Site-1". You want to ingest all Tableau projects in "Site-1" in a domain named "Domain-1" in Collibra, with the exception of one Tableau project named "Project-Default", which you want to ingest in "Domain-2". You should configure the `domainMapping` section as follows. "domainMapping": { "<Site-1>": "reference-id-of-Domain-1", "<Site-1> > <Project-Default>": "reference-id-of-Domain-2" } If you want to specify a domain for a sub-project of "Project-Default", use the `<site name> > <project name> > <sub-project name>` property, as described below. 
Tip For the properties in this `domainMapping` section, ensure that you maintain the spaces before and after "`>`", for example `"Site-1 > Project-Default"`. The spaces serve as a separator between the site and the projects.	No
site name	The Tableau site name, followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau site. Important In the configuration file, use the actual site name, along with the domain reference ID, for example: `"Collibra_tab_partner_site": "afc8cfb0-91f1-4075-a3e5-7ce6d1f9bcc9"`	No
site name > project name	The Tableau project name, preceded by the name of the Tableau site to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau project. Important In the configuration file, use the actual site and project names, along with the domain reference ID, for example: `"Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`	No
site name > project name > sub-project name	The Tableau sub-project name, preceded by the name of the Tableau site and project to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau sub-project. Important In the configuration file, use the actual site, project and sub-project names, along with the domain reference ID, for example: `"Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`	No
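
The key format used in the `filters` and `domainMapping` sections, and the way a domain reference ID appears in a Collibra URL, can be checked with a small Python sketch. Both helpers are hypothetical. The sketch assumes that site and project names do not themselves contain `>`, and the host name in the sample URL is a placeholder.

```python
from urllib.parse import urlparse


def split_mapping_key(key):
    """Split 'site > project > sub-project' into its names."""
    parts = key.split(" > ")
    for part in parts:
        # A leftover '>' means the spaces around the separator are missing.
        if not part.strip() or ">" in part:
            raise ValueError(f"Expected names separated by ' > ', got {key!r}")
    return parts


def domain_id_from_url(url):
    """Return the path segment that follows 'domain' in a Collibra domain URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "domain" in segments[:-1]:
        return segments[segments.index("domain") + 1]
    raise ValueError(f"No domain reference ID found in {url!r}")


print(split_mapping_key("Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2"))
# ['Collibra_tab_partner_site', 'JB_Test_2812', 'ProjectJJ2']

url = ("https://collibra.example.com/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26"
       "?view=00000000-0000-0000-0000-000000040001")
print(domain_id_from_url(url))  # 22258f64-40b6-4b16-9c08-c95f8ec0da26
```

A key such as `"Site-1>Project-Default"` is rejected by `split_mapping_key` because the spaces around `>` are missing.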

Save the <source ID> configuration file.
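
The keys of the `hostnameMapping` section pack several `found_*` values into one string. The following Python sketch shows one way to read such a key and to test it against the names reported for a data source. It is not the harvester's logic. It assumes shell-style `*` wildcards, case-sensitive comparison, and that an omitted `found_schema` places no constraint on the schema. All function names are hypothetical.

```python
from fnmatch import fnmatchcase

ALLOWED = ("found_dbname", "found_hostname", "found_schema")


def parse_found_key(key):
    """Turn 'found_dbname=a;found_hostname=b;found_schema=c' into a dict."""
    fields = {}
    for item in key.split(";"):
        name, separator, value = item.partition("=")
        if not separator or name not in ALLOWED:
            raise ValueError(f"Unexpected entry {item!r} in {key!r}")
        fields[name] = value
    return fields


def key_matches(key, dbname, hostname, schema=""):
    """Check the found names against every pattern present in the key."""
    found = dict(zip(ALLOWED, (dbname, hostname, schema)))
    return all(
        fnmatchcase(found[name], pattern)
        for name, pattern in parse_found_key(key).items()
    )


key = "found_dbname=databasename1;found_hostname=*;found_schema=test"
print(parse_found_key(key))
# {'found_dbname': 'databasename1', 'found_hostname': '*', 'found_schema': 'test'}
print(key_matches(key, "databasename1", "any-server", "test"))  # True
print(key_matches(key, "databasename1", "any-server", "prod"))  # False
```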