Prepare a <source ID> configuration file

Depending on your data source, you might have to, or want to, prepare a <source ID> configuration file. The following sections provide data source-specific information.

Azure Data Factory

The lineage harvester uses a lineage harvester configuration file to collect the Azure Data Factory data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-adf, the name of your JSON file must be my-adf.conf.

For each database in Azure Data Factory, add the following content to the JSON file:

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name> | found_dbname=<datafactory_name>_<linkedservice_name>;found_hostname=*

Information about the supported data sources in Azure Data Factory that Collibra Data Lineage collects. You can specify either of the following values for the found_dbname property:

A database name. In that case, you can also specify the following properties:
- found_hostname=<server name>, where <server name> is the name of the server that the database is running on.
- found_schema=<schema name>, where <schema name> is the name of the schema. This property is optional.

The combination of <datafactory_name>_<linkedservice_name>, where <datafactory_name> is a data factory name and <linkedservice_name> is a linked service name. If you use this combination, specify * for the found_hostname property.

Tip

You can use wildcards to capture multiple connection string combinations.

Yes

dbname

The name of the database asset in Data Catalog. Specify the database name that you used when you registered the data source or prepared the Data Catalog physical data layer.

schema

The name of the schema asset in Data Catalog. Specify this property with the schema name that you created when you registered the data source.

If Collibra Data Lineage fails to find the schema that you specify, it uses the default schema.

dialect

If you specify a database name for the found_dbname property, specify the dialect of the database. If you specify a data factory and linked service combination for the found_dbname property, ignore this property.

collibraSystemName

The system or server name of a database.

If you don't specify a value for this property, "DEFAULT" is shown in the technical lineage.

Warning The value of this property must exactly match (including for case-sensitivity) the name of your System asset in Collibra.

Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Azure Data Factory, you are not required to:

Set the useCollibraSystemName property in the lineage harvester configuration file to true.
Specify a Collibra system name in the <source ID> configuration file.

However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, you must specify a Collibra system name in the <source ID> configuration file.

Important If you use the Source Configuration field for the purpose of providing the true system name of an ODBC database in Azure Data Factory, you are not required to:

Set the value of the Collibra system name setting to True.
Specify a Collibra system name in the Source Configuration field.

However, if the value of the Collibra system name setting is set to true, you must specify a Collibra system name in the Source Configuration field.

Yes
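
The following is a minimal sketch of an Azure Data Factory <source ID> configuration file. It assumes that each connection-string key maps to a JSON object that holds the child properties described above. All database, server, schema, data factory, and linked service names are hypothetical placeholders, and the dialect value mssql is an assumption borrowed from the dialect list in the Informatica Intelligent Cloud Services section, so check which dialects apply to your data source.

```json
{
  "found_dbname=sales_db;found_hostname=sqlserver01;found_schema=dbo": {
    "dbname": "sales_db",
    "schema": "dbo",
    "dialect": "mssql",
    "collibraSystemName": "sqlserver01"
  },
  "found_dbname=my-factory_SalesLinkedService;found_hostname=*": {
    "dbname": "sales_db",
    "schema": "dbo",
    "collibraSystemName": "sqlserver01"
  }
}
```

The second key uses the <datafactory_name>_<linkedservice_name> form, so it sets found_hostname to * and omits dialect, which is ignored for that form.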

Save the <source ID> configuration file.

DataStage

The lineage harvester uses a lineage harvester configuration file to collect the DataStage data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-datastage, the name of your JSON file must be my-datastage.conf.
For each database in DataStage, add the required content to the JSON file.
Save the <source ID> configuration file.

dbt

The lineage harvester uses a lineage harvester configuration file to collect the dbt data objects. It then sends the metadata to the Collibra Data Lineage service for processing. By default, the lineage harvester downloads all accounts that are accessible with the API token that you provided in the lineage harvester configuration file. For each account, the lineage harvester downloads all jobs and the resulting dbt models for each job. You can use this <source ID> configuration file to reduce the number of data objects to download and improve lineage harvester performance in the following ways:

Filter the projects and jobs to download by specifying the filter property.
Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
Map a materialization to a view instead of a table by specifying the materializedMapping property.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-dbt, the name of your JSON file must be my-dbt.conf.

For each database in dbt, add the following content to the JSON file:

Property	Description	Required?
collibraSystemNames	You can use this section to specify the Collibra system name for each project.	No
projects	This section contains the project names and the Collibra system names.	No
project_id	Your project ID. You can find the project ID in the dbt URL right after `projects`. For example, if your dbt URL is `https://cloud.getdbt.com/develop/54321/projects/12345`, your project_id is `12345`.	No
collibraSystemName	The system or server name of the data source. This is also the name of your System asset in Data Catalog. If you specify this property in the <source ID> configuration file, the `collibraSystemName` property in the lineage harvester configuration file is ignored. Specify this property with the same name as the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. In the following example, the project with the `12345` project ID is stitched to the `systemname1` System asset in Data Catalog: { "collibraSystemNames":{ "projects":[ {"project_id":"12345","collibraSystemName":"systemname1"} ] } }	No
filter	You can use this section to include projects and jobs to be downloaded. Collibra Data Lineage downloads and processes only the specified jobs and projects. In the following example, the job with the 1234 job ID and the projects with the 98 and 5678 project IDs are downloaded: { "filter": { "jobIds": [ 1234 ], "projectIds": [ 98, 5678 ] } }	No
jobIds	The job IDs of the jobs that you want to include. Specify an integer. Do not specify a string. To get your job ID, in dbt, select Deploy and then Jobs. Select a job and you can find your job ID in the URL. For example, if your URL is `cloud.getdbt.com/deploy/65432/projects/23456/jobs/123456`, `123456` is your job ID.	No
projectIds	The IDs of the projects that you want to include. Specify an integer. Do not specify a string. You can find a project ID in the dbt URL right after `projects`. For example, if your URL is `cloud.getdbt.com/deploy/65432/projects/23456/jobs/123456`, `23456` is your project ID.	No
materializedMapping	Indicates how materializations in dbt are mapped. If you do not specify this property, Collibra Data Lineage maps materializations to tables by default. You can change the mapping of a materialization to a view. In the following example, the ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES materialization is mapped to a view: "materializedMapping":{ "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW" }	No
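
All three optional sections can be combined in a single file. This sketch merges the kinds of values shown in the table above into one JSON object; the project ID, job ID, and system name are illustrative only. Note that project_id is a string, whereas jobIds and projectIds take integers.

```json
{
  "collibraSystemNames": {
    "projects": [
      { "project_id": "12345", "collibraSystemName": "systemname1" }
    ]
  },
  "filter": {
    "jobIds": [1234],
    "projectIds": [12345]
  },
  "materializedMapping": {
    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
  }
}
```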

Save the <source ID> configuration file.

Informatica PowerCenter

The lineage harvester uses a lineage harvester configuration file to collect the Informatica PowerCenter data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-powercenter, the name of your JSON file must be my-powercenter.conf.
For each database, add the required content to the JSON file.
If certain properties are not specified in the <source ID> configuration file, an analyze error called CONFIGURATION is displayed in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about analyze errors, see Analyze errors and possible solutions in the technical lineage Sources tab page.
Save the <source ID> configuration file.

Informatica Intelligent Cloud Services Data Integration

You use the lineage harvester configuration file to access Informatica Intelligent Cloud Services Data Integration data objects. The lineage harvester processes the data objects to create a technical lineage. You also have to prepare a specific <source ID> configuration file that defines the Intelligent Cloud Services system name.

Important You must prepare a <source ID> configuration file regardless of whether the useCollibraSystemName property in your lineage harvester configuration file is set to true or false.

Prerequisites

You have Admin permission on all objects that you want to harvest.

Steps

Create a new JSON configuration file in the lineage harvester config folder.
If a data source for an Informatica Intelligent Cloud Services connection is large, consider creating more than one JSON file for the data source. Each JSON file must have a unique name, but the contents of the JSON files are the same. This helps you avoid errors that might occur when the lineage harvester ingests metadata from one large source.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example If the value of the Id property in your lineage harvester configuration file is iics-source-1, then the name of your JSON file should be iics-source-1.conf.

Important Your JSON file must have the file extension .conf.

For each Informatica Intelligent Cloud Services connection, you can add the following content to the JSON file:

Property	Description	Required?
collibraSystemNames	This section contains the system information for Informatica Intelligent Cloud Services.
connections	This section contains the system connection information. It is required to reference the system or server of the connection.
connectionName	The name of the connection. The name must match the System asset name in Data Catalog for stitching.	Yes
collibraSystemName	The system or server name of the connection.	Yes
connectionDefinitions	This section contains the database, schema and dialect information for each connection in Informatica Intelligent Cloud Services. Note You can add connection information for each connection in the `connections` section.
connectionName	The name of the connection. The name must match a connection name in the `connections` section.	Yes
databaseName	The name of your database. The name must match the Database asset name in Data Catalog for stitching.	Yes
schemaName	The name of your schema. The name must match the Schema asset name in Data Catalog for stitching.	Yes
dialect	The dialect of the connection. Specify this property for Collibra Data Lineage to properly extract and parse queries that are related to this connection. You can enter one of the following values: `bigquery` `db2` `hana` `hive` `greenplum` `mssql` `mysql` `netezza` `oracle` `postgres` `redshift` `snowflake` `spark` `teradata`	No
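
A sketch of one connection in an Informatica Intelligent Cloud Services <source ID> configuration file follows. The table does not spell out the exact nesting, so this assumes that connections and connectionDefinitions are arrays of objects and that connectionDefinitions sits at the top level next to collibraSystemNames; confirm the structure before you rely on it. The connection, system, database, and schema names are hypothetical.

```json
{
  "collibraSystemNames": {
    "connections": [
      { "connectionName": "Snowflake_Sales", "collibraSystemName": "snowflake-prod" }
    ]
  },
  "connectionDefinitions": [
    {
      "connectionName": "Snowflake_Sales",
      "databaseName": "SALES",
      "schemaName": "PUBLIC",
      "dialect": "snowflake"
    }
  ]
}
```

The same connectionName value appears in both sections, because each entry in connectionDefinitions must match a connection name in the connections section.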

Save the <source ID> configuration file.

Looker

The lineage harvester uses the lineage harvester configuration file to collect the Looker data objects and send them to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

Filter on the Looker folders from which you want to ingest metadata.
If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Looker.
Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example The value of the Id property in the lineage harvester configuration file is looker-source-1. As a result, the name of your JSON file should be looker-source-1.conf.
Important Your JSON file must have the file extension .conf.

For each database in Looker, add the following content to the JSON file:

Property

Description

Mandatory?

Connections

This section contains all Looker connections for which you want to create a technical lineage.

Yes

<connection name>

The name of a connection object in Looker.

Yes

dialect

The dialect of the supported data source in Looker.

schema

The name of the default schema of a supported data source in Looker.

If the lineage harvester fails to find a specific schema, it uses the default schema.

dbname

The name of the database of a supported data source in Looker.

collibraSystemName

The system or server name of a database.

If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

Yes

filters

Optionally, use this section to specify the Looker folders from which you want to ingest metadata.

Note You can filter on Looker folders, but not on Looker data sets. That's because Looker data sets are linked directly to the server, instead of a folder, as shown in the Looker metadata overview. Looker data sets are ingested in the default domain, regardless of any filtering.

Let’s say, for example, you filter on folder B. A Looker Folder asset is created in the specified domain in Collibra, and all of the metadata in folder B is ingested. If folder B has a parent folder A, then a Looker Folder asset is created (in the domain specified for folder B) to preserve the hierarchy, but no metadata from folder A is ingested.

You can specify more than one Looker folder for ingestion into a single domain in Collibra.

Warning If you don't want to filter on Looker Folders, you must completely remove this filters section.

Tip

You can use wildcards to capture multiple connection string combinations.

domainId

The unique resource ID of the domain (or domains), in Collibra, in which you want to ingest data objects from one or more Looker Folders.

Tip You can find the domain ID by opening the domain in Collibra and looking in the URL of your browser. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

description

Any description, as you see fit.

folderNames

The name (or names) of the Looker Folders from which you want to ingest.

Note You must specify either a folder name, a folder ID, or both.

folderIds

The ID (or IDs) of the Looker Folder you want to ingest.

Note You must specify either a folder ID, a folder name, or both.
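
A sketch of a Looker <source ID> configuration file with one connection and one folder filter follows. It assumes that Connections is an object keyed by the Looker connection name and that filters is an array of objects. The connection name, dialect value, database details, system name, folder name, and domain ID are all hypothetical.

```json
{
  "Connections": {
    "looker_snowflake": {
      "dialect": "snowflake",
      "schema": "PUBLIC",
      "dbname": "ANALYTICS",
      "collibraSystemName": "snowflake-prod"
    }
  },
  "filters": [
    {
      "domainId": "8f1c2a3b-0000-0000-0000-000000000001",
      "description": "Finance folders",
      "folderNames": ["Finance"]
    }
  ]
}
```

This filter specifies only a folder name, which satisfies the requirement to specify a folder name, a folder ID, or both. To ingest without folder filtering, remove the whole filters section.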

Save the <source ID> configuration file.

Matillion

The lineage harvester uses a lineage harvester configuration file to collect the Matillion data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-matillion, the name of your JSON file must be my-matillion.conf.
Add the required content to the JSON file.
Save the <source ID> configuration file.

MicroStrategy

The lineage harvester uses the configuration file to connect to MicroStrategy. You must also prepare a MicroStrategy <source ID> configuration file to:

Specify the default domain, that is, the domain in Collibra into which the assets that correspond to MicroStrategy metadata are ingested if domain mapping is not configured.
Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
Note Mapping doesn't work for custom SQL.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example If the value of the Id property in the lineage harvester configuration file is mstr-source-1, then the name of your JSON file should be mstr-source-1.conf.
Important Your JSON file must have the file extension .conf.

Property	Description	Mandatory
default_domain_id	The domain in which you want the corresponding assets of MicroStrategy metadata to be ingested. Note If you configure filtering, only the MicroStrategy Server asset is ingested into this default domain.	Yes
filters	This section allows you to specify which MicroStrategy projects you want to harvest metadata from and into which domains in Collibra you want to ingest the corresponding assets. If you don't want to filter on projects, don't include this section in your <source ID> configuration file.	No
domainId	The unique resource ID of the domain (or domains) in Collibra into which you want to ingest the MicroStrategy assets. Tip If you use a `filters` section, you must include the `domainId` property in the section. If you want to filter on certain projects but ingest all assets into the default domain, the value of the `domainId` property must match the value of the `default_domain_id` property. For example: { "default_domain_id": "1234567890", "filters": [ { "domainId": "1234567890", "projectNames": ["MicroStrategy Tutorial","Testing_MSTR"] } ] } To find a domain reference ID, open the relevant domain in Collibra. The URL looks like: https://<yourcollibrainstance>/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26?view=00000000-0000-0000-0000-000000040001. In this example, the reference ID is 22258f64-40b6-4b16-9c08-c95f8ec0da26.	No
projectIds	The IDs of the MicroStrategy projects from which you want to ingest metadata.	No
projectNames	The project names of the MicroStrategy projects from which you want to ingest metadata.	No
datasourceMapping	This optional section allows you to configure data source mapping. Include this section only if you need to differentiate between multiple data sources that have the same name. Note Mapping doesn't work for custom SQL.	No
found_datasource	The name of the data source that was returned by the lineage harvester, as shown in the technical lineage. Note The data source name is case-sensitive.	Yes
found_project	The name of the project in which the data source information resides. You can specify an asterisk (*) to search for data source information across all projects.	Yes
mapping	Use this section to map the data source name that was returned by the lineage harvester to the true name of the data source. Example You have a Redshift data source named "RD_pearl", but the lineage harvester has returned the name "Redshift_connection". You can configure the `datasourceMapping` section as follows: { "datasourceMapping": [ { "found_datasource": "Redshift_connection", "found_project": "*", "mapping": { "dbname": "RD_pearl", "collibraSystemName": "TV_dev" } } ] }	Yes
dbname	The name of the database to which you want to map the found data source.	Yes
schema	The name of the schema in MicroStrategy.	No
dialect	The dialect of the data source in MicroStrategy.	No
collibraSystemName	The system or server name of a database. If you set the `useCollibraSystemName` property to `true` in your lineage harvester configuration file, but you either don't create a <source ID> configuration file or don't specify a value for the `collibraSystemName` property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT". If you set the `useCollibraSystemName` property to `false` in your lineage harvester configuration file, leave this property empty as follows: `"collibraSystemName": ""`. If you have two databases with the same name, you can map each of them separately. For example, you have a data source named Customers. You use this data source connection in two different projects, Project_A and Project_B, but they are actually two different databases. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases, named Customers_North and Customers_South. You can then configure this property as follows: "datasourceMapping": [ { "found_datasource": "Customers", "found_project": "Project_A", "mapping": { "dbname": "Customers", "collibraSystemName": "Customers_North" } }, { "found_datasource": "Customers", "found_project": "Project_B", "mapping": { "dbname": "Customers", "collibraSystemName": "Customers_South" } } ] Warning The values of this property must exactly match the name of your System asset in Collibra.	Yes
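
The fragments in the table above can be combined into one complete MicroStrategy <source ID> configuration file. In this sketch, the filter restricts ingestion to the two projects that the data source mapping refers to. The domain ID 1234567890 is the placeholder used in the table; a real domain ID has the UUID format shown in the domainId row.

```json
{
  "default_domain_id": "1234567890",
  "filters": [
    {
      "domainId": "1234567890",
      "projectNames": ["Project_A", "Project_B"]
    }
  ],
  "datasourceMapping": [
    {
      "found_datasource": "Customers",
      "found_project": "Project_A",
      "mapping": {
        "dbname": "Customers",
        "collibraSystemName": "Customers_North"
      }
    },
    {
      "found_datasource": "Customers",
      "found_project": "Project_B",
      "mapping": {
        "dbname": "Customers",
        "collibraSystemName": "Customers_South"
      }
    }
  ]
}
```

Because domainId matches default_domain_id, all assets from the two filtered projects are ingested into the default domain.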

Save the <source ID> configuration file.

Power BI

The lineage harvester uses a lineage harvester configuration file to collect the Power BI data objects. It then sends the metadata to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
Note Mapping doesn't work for custom SQL.
Configure workspace filtering.
Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Power BI. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
Example The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
Important Your JSON file must have the file extension .conf.
For each database in Power BI, add the following content to the JSON file:

Property

Description

Mandatory?

found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

The database information of supported data sources in Power BI that is typically collected by the lineage harvester. Specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema). You then use the child properties to map the names collected by the lineage harvester to the true names.

Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

Important The keys that you specify must be unique.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

Yes

dbname

The name of the database of a supported data source in Power BI.

schema

The name of the default schema of a supported data source in Power BI.

If the lineage harvester fails to find a specific schema, it uses the default schema you specify in this property.

dialect

The dialect of the supported data source in Power BI.

collibraSystemName

The system or server name of a database.

Warning The value of this property must exactly match (including for case-sensitivity) the name of your System asset in Collibra.

Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Power BI, you are not required to:

Set the useCollibraSystemName property in the lineage harvester configuration file to true.
Specify a Collibra system name in the <source ID> configuration file.

However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, then you must specify a Collibra system name in the <source ID> configuration file.

Yes (unless you are using the <source ID> file to provide the true system names of ODBC databases in Power BI)

filters

This section allows you to specify the Power BI workspaces from which you want to ingest metadata.

The filters work as "workspace OR workspace OR capacity OR capacity", meaning that if you specify a capacity, all of the workspaces in that capacity are also ingested.

Warning If you don't want to specify the Power BI workspaces from which to ingest, you must completely remove this filters section.

Tip

You can use wildcards to capture multiple connection string combinations:

Pattern	Description
*	Matches everything.
?	Matches any single character.
[seq]	Matches any character in "seq".
[!seq]	Matches any character not in "seq".

domainId

The unique resource ID of the domain (or domains), in Collibra Data Intelligence Cloud, in which you want to ingest the Power BI assets.

Tip You can find the domain ID by opening the domain in Collibra and looking in the URL of your browser. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

Yes

description

Any description, as you see fit.

Yes

workspaceNames

The names of Power BI workspaces from which you want to ingest metadata.

Important Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, a workspace with the name "Sale and Marketing [automobiles]" should be formatted as follows:
Sale and Marketing [[]automobiles[]]

workspaceIds

The IDs of Power BI workspaces from which you want to ingest metadata.

Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.

capacityNames

The names of capacities on which you want to filter.

capacityIds

The IDs of capacities on which you want to filter.

Warning Any letters in a capacity ID must be in upper case.

excludeWorkspaceNames

The names of Power BI workspaces that you want to exclude from the ingestion job.

This is useful if you want to exclude, for example, dedicated development and testing workspaces.

Note The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My workspace is the personal workspace in which each Power BI user works with their own content.

For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

excludeWorkspaceIds

The IDs of Power BI workspaces that you want to exclude from the ingestion job.

This is useful if you want to exclude, for example, dedicated development and testing workspaces.

For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.
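
A sketch of a Power BI <source ID> configuration file with one exact mapping key, one wildcard mapping key, and one workspace filter follows. It assumes that each mapping key maps to an object with the child properties described above and that filters is an array of objects. The database, server, schema, system, and workspace names and the domain ID are hypothetical, and mssql is an assumed dialect value.

```json
{
  "found_dbname=finance_db;found_hostname=sqlserver01": {
    "dbname": "finance_db",
    "schema": "dbo",
    "dialect": "mssql",
    "collibraSystemName": "finance-sql"
  },
  "found_dbname=sales_db;found_hostname=salesreplica0?": {
    "dbname": "sales_db",
    "schema": "dbo",
    "dialect": "mssql",
    "collibraSystemName": "sales-sql"
  },
  "filters": [
    {
      "domainId": "8f1c2a3b-0000-0000-0000-000000000002",
      "description": "Sales reporting workspaces",
      "workspaceNames": ["Sales*", "Sale and Marketing [[]automobiles[]]"],
      "excludeWorkspaceNames": ["Sales Dev*"]
    }
  ]
}
```

The ? wildcard in the second key matches, for example, both salesreplica01 and salesreplica02, so connections to either server are mapped to the same database and system. The brackets in the second workspace name escape the meta-characters as described for workspaceNames.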

Save the <source ID> configuration file.

Snowflake

When you create technical lineage for Snowflake by using the SQL-API ingestion method, you can create a <source ID> configuration file to configure the metadata that Collibra Data Lineage collects.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-snowflake, the name of your JSON file must be my-snowflake.conf.

For each database in Snowflake, add the following content to the JSON file:

Property	Description	Required?
displaySampleQueries	Indicates whether to display transformations with a question mark (?) or with actual values from queries in the Source code pane in the technical lineage graph. For example, you can choose to display `WHERE amount < 100` or `WHERE amount < ?`. Specify one of the following values: `true` Actual values from queries are displayed. `false` A question mark (?) is displayed. This is the default value.	No
analyzeTemporaryTables	Indicates whether to parse the CREATE TEMPORARY TABLE statement in the ingested queries. Specify one of the following values: `true` Collibra Data Lineage examines the queries and parses the CREATE TEMPORARY TABLE statement when the following conditions are met: The query starts with the CREATE TEMPORARY TABLE statement. Collibra Data Lineage did not encounter the CREATE TEMPORARY TABLE statement before this query. `false` Collibra Data Lineage does not examine or parse the CREATE TEMPORARY TABLE statement in the ingested queries. This is the default value.	No
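
A minimal sketch of a Snowflake <source ID> configuration file that turns on both options follows. It assumes that the values are JSON Booleans rather than strings. To keep the default behavior for an option, set it to false or leave it out.

```json
{
  "displaySampleQueries": true,
  "analyzeTemporaryTables": true
}
```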

Save the <source ID> configuration file.
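The file-naming rule and the two Snowflake properties can be combined in a short Python sketch that writes the <source ID> configuration file. It assumes, without confirmation from this page, that both properties sit as top-level keys of a single JSON object. The temporary directory stands in for your lineage harvester config folder, and `my-snowflake` is a hypothetical sourceId.

```python
import json
import tempfile
from pathlib import Path


def snowflake_settings(display_sample_queries: bool = False,
                       analyze_temporary_tables: bool = False) -> dict:
    """Build the settings; both properties are optional and default to false."""
    for value in (display_sample_queries, analyze_temporary_tables):
        if not isinstance(value, bool):
            raise TypeError("expected true or false (a JSON boolean)")
    # Assumption: a flat JSON object with the two properties as top-level keys.
    return {
        "displaySampleQueries": display_sample_queries,
        "analyzeTemporaryTables": analyze_temporary_tables,
    }


def write_source_id_config(config_dir, source_id: str, settings: dict) -> Path:
    """Write settings to <config_dir>/<source_id>.conf and return the path."""
    path = Path(config_dir) / f"{source_id}.conf"  # the extension must be .conf
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as config_dir:  # stands in for the config folder
        path = write_source_id_config(config_dir, "my-snowflake",
                                      snowflake_settings(display_sample_queries=True))
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```

Writing the default `false` values explicitly is optional; it simply makes the chosen behavior visible to the next person who opens the file.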

The lineage harvester uses the lineage harvester configuration file to collect the SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) data objects and send them to the Collibra Data Lineage service.

The <source ID> configuration file allows you to:

Specify the system name of databases in SSRS and PBRS by using the collibraSystemName property, if useCollibraSystemName in the lineage harvester configuration file is set to true.
Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all the information needed to process the SQL source code correctly.

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example The value of the Id property in the lineage harvester configuration file is ssrs-source-1. As a result, the name of your JSON file should be ssrs-source-1.conf.
Important Your JSON file must have the file extension .conf.

For each database in SSRS and PBRS, add the following content to the JSON file:

Property	Description	Required?
DataSources	This section contains all connections for which you want to create a technical lineage. The `DataSources` section refers to shared data sources in SSRS and PBRS. For more information about shared data sources, see the Microsoft documentation.	Yes
<data source type>	The path of a connection object in SSRS and PBRS.	Yes
dbname	The name of the database of a supported data source in SSRS and PBRS.	No
schema	The name of the default schema of a supported data source in SSRS and PBRS.	No
dialect	The dialect of the supported data source in SSRS and PBRS.	No
collibraSystemName	The system or server name of the database. If you set the `useCollibraSystemName` property to `true` in your lineage harvester configuration file but either don't create a <source ID> configuration file or don't specify a value for the `collibraSystemName` property in it, the system name in the technical lineage is "DEFAULT". If you have two databases with the same name, for example two databases named Customers, you create a System asset for each of them when you prepare the physical data layer in Data Catalog. If you name these System assets Customers-Europe and Customers-USA, you can configure this property as follows: `"Redshift": { "dbname": "Customers", "schema": "redshift-schema-name", "dialect": "redshift", "collibraSystemName": "Customers-Europe" }, "Oracle": { "dbname": "Customers", "schema": "oracle-schema-name", "dialect": "oracle", "collibraSystemName": "Customers-USA" }`	Yes
CustomDataSources	This section refers to embedded data sources in SSRS and PBRS, whose data source definition is specified locally in a report or embedded data set. You can use custom data processing extensions to support these embedded data sources. For more information about embedded data sources, see the Microsoft documentation.	No
<path to report>/<custom data source name>	The full path to the report and the custom data source name. You can use wildcards to match multiple folders, reports or data sets. The connection information in this section is used to add missing information or to overwrite parsed information.	No
dbname	The name of the database of a custom data source in SSRS and PBRS.	No
schema	The name of the schema of a custom data source in SSRS and PBRS. If you don't provide the schema name, the default schema is used.	No
dialect	The dialect of the custom data source in SSRS and PBRS. Possible values: azure, for an Azure SQL Server data source. bigquery, for a Google BigQuery data source. db2, for an IBM DB2 data source. hana, for an SAP HANA data source. hive, for a HiveQL data source. greenplum, for a Greenplum data source. mssql, for a Microsoft SQL Server data source. mysql, for a MySQL data source. netezza, for a Netezza data source. oracle, for an Oracle data source. postgres, for a PostgreSQL data source. redshift, for an Amazon Redshift data source. snowflake, for a Snowflake data source. spark, for a Spark SQL data source. sybase, for a Sybase data source. teradata, for a Teradata data source.	No

Save the <source ID> configuration file.
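The following Python sketch builds an SSRS/PBRS <source ID> configuration as a dictionary, checks each `dialect` against the values listed for custom data sources, and looks up a `CustomDataSources` entry by report path. Assumptions: the nesting (`DataSources` and `CustomDataSources` each mapping a key to an object with `dbname`, `schema`, `dialect` and, for shared data sources, `collibraSystemName`) is inferred from the table above; the wildcard syntax is assumed to behave like Python's `fnmatch`, where `*` also matches `/`; and the report path `/Sales/*/EmbeddedSource` and database `sales_db` are hypothetical.

```python
import fnmatch
import json

# Dialect values listed for custom data sources in SSRS and PBRS.
SUPPORTED_DIALECTS = {
    "azure", "bigquery", "db2", "hana", "hive", "greenplum", "mssql", "mysql",
    "netezza", "oracle", "postgres", "redshift", "snowflake", "spark",
    "sybase", "teradata",
}

# Two same-named databases told apart by collibraSystemName, plus one
# hypothetical embedded (custom) data source matched with a wildcard.
EXAMPLE_CONFIG = {
    "DataSources": {
        "Redshift": {"dbname": "Customers", "schema": "redshift-schema-name",
                     "dialect": "redshift", "collibraSystemName": "Customers-Europe"},
        "Oracle": {"dbname": "Customers", "schema": "oracle-schema-name",
                   "dialect": "oracle", "collibraSystemName": "Customers-USA"},
    },
    "CustomDataSources": {
        "/Sales/*/EmbeddedSource": {"dbname": "sales_db", "dialect": "mssql"},
    },
}


def validate_dialects(config: dict) -> list:
    """Return one message per entry whose dialect is not in the documented list."""
    problems = []
    for section in ("DataSources", "CustomDataSources"):
        for key, entry in config.get(section, {}).items():
            dialect = entry.get("dialect")  # dialect is optional
            if dialect is not None and dialect not in SUPPORTED_DIALECTS:
                problems.append(f"{section}/{key}: unsupported dialect {dialect!r}")
    return problems


def find_custom_source(config: dict, report_path: str, source_name: str):
    """Return the first CustomDataSources entry whose key matches the report path."""
    target = f"{report_path}/{source_name}"
    for pattern, entry in config.get("CustomDataSources", {}).items():
        if fnmatch.fnmatchcase(target, pattern):  # assumption: glob-style wildcards
            return entry
    return None


if __name__ == "__main__":
    print(validate_dialects(EXAMPLE_CONFIG))
    print(json.dumps(EXAMPLE_CONFIG, indent=2))  # content for the <source ID> file
```

Returning the first match keeps the lookup predictable when several wildcard keys overlap; whether the lineage harvester resolves overlaps the same way is not stated on this page.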

The lineage harvester uses a lineage harvester configuration file to collect the SQL Server Integration Services data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

Create a new JSON file in the lineage harvester config folder.
Name the JSON file as <sourceId>.conf, where <sourceId> is the same as the value of the sourceId property in the lineage harvester configuration file and the file extension must be .conf.
Example If the value of the sourceId property in the lineage harvester configuration file is my-ssis, the name of your JSON file must be my-ssis.conf.
For each database, add the required content to the JSON file.
Save the <source ID> configuration file.
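Before running the lineage harvester, you might want to confirm that the file is where the harvester expects it. This Python sketch checks for `<sourceId>.conf` in a given folder, flags a file saved with the wrong extension, and verifies that the content parses as a JSON object. The requirement that the top level is a JSON object is an assumption, and `my-ssis` is a hypothetical sourceId.

```python
import json
import tempfile
from pathlib import Path


def check_source_id_config(config_dir, source_id: str) -> list:
    """Return problems with <config_dir>/<source_id>.conf; an empty list means OK."""
    path = Path(config_dir) / f"{source_id}.conf"
    if not path.is_file():
        message = f"missing file: {path.name}"
        wrong_ext = path.with_suffix(".json")  # a common slip: saved as .json
        if wrong_ext.is_file():
            message += f" (found {wrong_ext.name}; the extension must be .conf)"
        return [message]
    try:
        content = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name} is not valid JSON: {exc.msg}"]
    if not isinstance(content, dict):  # assumption: top level is a JSON object
        return [f"{path.name} must contain a JSON object"]
    return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as config_dir:  # stands in for the config folder
        print(check_source_id_config(config_dir, "my-ssis"))
        (Path(config_dir) / "my-ssis.conf").write_text("{}", encoding="utf-8")
        print(check_source_id_config(config_dir, "my-ssis"))
```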

The lineage harvester uses the lineage harvester configuration file to connect to Tableau. You are not required to create a <source ID> configuration file, but you need one if you want to:

Define your Tableau operating model.
Provide additional information about databases and files in Tableau. For example, you can define the system name of files and connectors in Tableau.
Use the hostnameMapping property to map the database, schema or system names that were returned by the Tableau APIs to the actual names of the assets in Data Catalog. For complete information, go to Tableau hostname, schema, and system name mapping.
Note Mapping doesn't work for custom SQL.
Define in which domains in Collibra you want to ingest assets from your Tableau sites and projects. See the domainMapping and filters properties.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Steps

Create a new JSON file in the lineage harvester config folder.
Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
Example If the value of the Id property in the lineage harvester configuration file is tableau-source-1, then the name of your JSON file should be tableau-source-1.conf.
Important Your JSON file must have the file extension .conf.

For each database in Tableau, add the following content to the JSON file:

Tip You can use wildcards to capture multiple string combinations for any of these properties.
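Keys such as `found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>` in the `hostnameMapping` section combine several patterns in one string. The Python sketch below shows one plausible way such a key could be matched against the names returned by Tableau. These are assumptions, not documented behavior: wildcards follow `fnmatch` rules and are case-sensitive, an omitted `found_schema` matches any schema, and the first matching key wins.

```python
from fnmatch import fnmatchcase


def parse_mapping_key(key: str) -> dict:
    """Split 'found_dbname=a;found_hostname=b;found_schema=c' into its fields."""
    fields = {}
    for part in key.split(";"):
        name, sep, pattern = part.partition("=")
        if not sep:
            raise ValueError(f"malformed segment {part!r} in {key!r}")
        fields[name.strip()] = pattern.strip()
    return fields


def resolve_hostname_mapping(mapping: dict, dbname: str, hostname: str, schema: str):
    """Return the target of the first key whose patterns all match, else None."""
    found = {"found_dbname": dbname, "found_hostname": hostname, "found_schema": schema}
    for key, target in mapping.items():
        patterns = parse_mapping_key(key)
        # Assumption: fields absent from the key (e.g. found_schema) match anything.
        if all(fnmatchcase(found.get(name, ""), pattern)
               for name, pattern in patterns.items()):
            return target
    return None


# The first entry is the hostnameMapping example from the table; the second is hypothetical.
HOSTNAME_MAPPING = {
    "found_dbname=databasename1;found_hostname=*;found_schema=test": {
        "dbname": "mssql-database-name", "schema": "mssql-schema-name",
        "dialect": "mssql", "collibraSystemName": "mssql-system-name",
    },
    "found_dbname=sales_*;found_hostname=*": {"dbname": "sales", "dialect": "mssql"},
}

if __name__ == "__main__":
    print(resolve_hostname_mapping(HOSTNAME_MAPPING, "databasename1", "sql01", "test"))
```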

Property	Description	Required?
collibraSystemNames	This section contains the system information for different Tableau data sources. Depending on the kind of data source or connection, you have to specify how to connect to this data source. Tip For more information, see the Tableau documentation. We also recommend checking the list of supported connectors in Tableau.
hostnameMapping	This section allows you to map Tableau technical database, server and schema names to the respective real names, to preserve stitching. Warning `hostnameMapping` replaces the following deprecated properties, which have been removed from this topic: the `databaseMapping` property, and the `databases` sub-section of the `collibraSystemNames` section. `hostnameMapping` must not be used in combination with either of these properties. If you use the `hostnameMapping` section, you can still use the `collibraSystemName` property in conjunction with the `files`, `connectors` or `cloudFiles` sub-sections. Example configuration: `"hostnameMapping": { "found_dbname=databasename1;found_hostname=*;found_schema=test": { "dbname": "mssql-database-name", "schema": "mssql-schema-name", "dialect": "mssql", "collibraSystemName": "mssql-system-name" } }`	No
found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>	The database information of supported data sources in Tableau that is typically collected by the lineage harvester. It allows you to specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema).	No
dbname	The name of the database of a supported data source in Tableau.	No
schema	The name of the default schema of a supported data source in Tableau. If the lineage harvester fails to find a specific schema, it uses the default schema.	No
dialect	The dialect of the supported data source in Tableau. You don't have to specify a dialect; it is detected automatically. If, however, you are using a dialect that is not supported, you can use this property to specify a closely comparable supported dialect. That way, most of your queries are still detected and processed. Dialects of supported data sources in Tableau: redshift, for an Amazon Redshift data source. azure, for an Azure SQL Server data source. bigquery, for a Google BigQuery data source. greenplum, for a Greenplum data source. hive, for a HiveQL data source. oracle, for an Oracle data source. postgres, for a PostgreSQL data source. mssql, for a Microsoft SQL Server data source. mysql, for a MySQL data source. netezza, for a Netezza data source. hana, for an SAP HANA data source. spark, for a Spark SQL data source. sybase, for a Sybase data source. teradata, for a Teradata data source.	No
collibraSystemName	The system or server name of the database. Warning The value of this property must exactly match the name of your System asset in Collibra.	No
files	This section contains connection information to one or more files in Tableau. Tip If you do not have files in Tableau, you can remove this section.
filePath	The full path to the file. For example, the path to a JSON file.
collibraSystemName	The system name of the file.
connectors	This section contains connection information to one or more connectors in Tableau. Tip If you do not have connectors in Tableau, you can remove this section. The values that you specify for this property are not case-sensitive.
connectorUrl	The URL of the connector. For example, the URL to Google Analytics.
collibraSystemName	The system name of the connector.
cloudFiles	This section contains connection information to one or more cloud files in Tableau's input data. Tip If you do not have cloud files in Tableau, you can remove this section.
name	The name of the file. For example, the name of a Zendesk file.
collibraSystemName	The system name of the cloud file.
filters	This section defines from which Tableau sites and projects you want to harvest metadata, and into which domains in Collibra you want to ingest the corresponding assets. Filtering is transitive, which means that all resources in a specified project, such as Tableau workbooks and all sub-projects, are ingested. Tableau assets that are not mapped to the specified domains, for example the Tableau Server assets and the parent projects (if you specify their sub-projects), are ingested in the default domain. Important Filtering does not affect the amount of raw metadata that is harvested from Tableau and sent to the Collibra Data Lineage service instance. Rather, it determines which metadata is ingested as assets in Data Catalog. The `domainMapping` and `filters` sections are mutually exclusive. Do not include both `domainMapping` and `filters` sections in your JSON file. Tip If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use the `domainMapping` section. If you want to ingest all of the projects in a Tableau site into the default domain, use only the `domainID` property in the lineage harvester configuration file. The `domainID` property represents the default domain. If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering. If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering. You can use site filtering and project filtering together: If you use both on the same site, this "filtering" is actually domain mapping, because nothing is filtered out: the contents of the projects are ingested in the specified domains, and the rest of the contents of the site are ingested in a different, specified domain. If you use site filtering on one site and project filtering on a different site, site filtering is again a form of domain mapping, and the filtered projects are ingested in their specified domains. If your lineage harvester configuration file includes sites that are not mentioned in the `filters` section of your <source ID> configuration file, those sites are ingested in the default domain.
sites	The Tableau sites to be ingested and the domain in which you want to ingest metadata from the Tableau sites. Tip If you have only one Tableau site, do not include a `sites` section in your <source ID> file. Instead, use a `projects` section, to filter on Tableau projects. Include a `sites` section only if all of the following are true: You have more than one Tableau site. You want to ingest all of the metadata from only one Tableau site into a single domain in Collibra. The domain into which you want to ingest is not the default domain, meaning the domain specified in the `domainId` property in your lineage harvester configuration file.
site_name: domain_id	`site_name` The name of the site to be ingested. The site name is case-sensitive. `domain_id` The unique reference ID of the domain in Collibra in which you want to ingest metadata. The domain ID is case-sensitive. To ingest all metadata from a Tableau site in the specified domain, specify the site name and a separate domain ID for each site that you list in the `siteIds` property in the lineage harvester configuration file for Tableau. If the `site_name` or `domain_id` property is not specified for a site, the metadata from the site is ingested in the default domain. To find a domain reference ID, open the relevant domain in Collibra. The URL looks like: https://<yourcollibrainstance>/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26?view=00000000-0000-0000-0000-000000040001. In this example, the reference ID is 22258f64-40b6-4b16-9c08-c95f8ec0da26. Example: `{ "filters":{ "sites":{ "Training":"ca60b822-781b-4b3a-b44d-f65bd107ff92" }, "projects":{ "Testing > Databricks":"e8f4e4a8-4062-4a33-9b44-3ce3e18e4e22", "Product Demo > Customer Insights":"a305e6f7-7a49-49aa-aa85-41b1e689121b" } } }`
projects	The Tableau projects to be ingested and the domain in which you want to ingest metadata from the Tableau projects or sub-projects. Tip Project filtering is not relevant for those who have an Explorer role in Tableau, because Explorers need to configure permissions for each data object in Tableau that they want to ingest. As the Administrator role has access to all data objects, project filtering allows Administrators to specify which projects to ingest.
site_name > project_name : domain_id	The `site_name` should be the Tableau site name. The `project_name` should be the Tableau project name. The `domain_id` should be the unique reference ID of the domain in Collibra in which you want to ingest metadata. When you specify the site and project names, the following rules apply: Add spaces before and after >. The spaces are separators between the site and project. Specify the full exact site and project names. The values are case-sensitive. When you specify a Tableau project, all assets in the project are ingested in the specified domain. If you want to ingest assets from different Tableau projects in one domain, you can specify the same value for `domain_id` for different projects. Example `"Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`
site_name > project_name > sub-project_name : domain_id	The `site_name` should be the Tableau site name. The `project_name` should be the Tableau project name. Optionally, use `sub-project_name` to specify the Tableau sub-project name. The `domain_id` property should be the unique reference ID of the domain in Collibra in which you want to ingest metadata. When you specify the site, project and sub-project names, the following rules apply: Add spaces before and after >. The spaces are separators between the site, project and sub-project. Specify the full exact site, project and sub-project names. The values are case-sensitive. Example `"Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`
domainMapping	This section defines in which domains in Collibra you want to ingest assets from your Tableau sites and Tableau projects. Domain mapping is transitive, meaning that all resources, such as Tableau workbooks and data attributes in a parent Tableau site, project or sub-project, are ingested in the same domain as the parent. Important The `domainMapping` and `filters` sections are mutually exclusive. Do not include both `domainMapping` and `filters` sections in your JSON file. Tip If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use this `domainMapping` section. If you want to ingest all of the projects in a Tableau site into the default domain, use only the `domainID` property in the lineage harvester configuration file. The `domainID` property represents the default domain. Note Tableau assets that are not mapped to specific domains via this `domainMapping` section, for example Tableau Server assets, are ingested in that default domain. If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering. If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering. Example Let's say that you have a Tableau site named "Site-1". You want to ingest all Tableau projects in "Site-1" in a domain named "Domain-1" in Collibra, with the exception of one Tableau project named "Project-Default", which you want to ingest in "Domain-2". You should configure the `domainMapping` section as follows: `"domainMapping": { "Site-1": "reference-id-of-Domain-1", "Site-1 > Project-Default": "reference-id-of-Domain-2" }`. If you want to specify a domain for a sub-project of "Project-Default", use the `<site name> > <project name> > <sub-project name>` property, as described below. Tip For the properties in this `domainMapping` section, ensure that you maintain the spaces before and after "`>`", for example `"Site-1 > Project-Default"`. The spaces serve as a separator between the site and the projects.
site name	The Tableau site name, followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau site. Important In the configuration file, use the actual site name, along with the domain reference ID, for example: `"Collibra_tab_partner_site": "afc8cfb0-91f1-4075-a3e5-7ce6d1f9bcc9"`
site name > project name	The Tableau project name, preceded by the name of the Tableau site to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau project. Important In the configuration file, use the actual site and project names, along with the domain reference ID, for example: `"Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`
site name > project name > sub-project name	The Tableau sub-project name, preceded by the name of the Tableau site and project to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau sub-project. Important In the configuration file, use the actual site, project and sub-project names, along with the domain reference ID, for example: `"Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"`
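Two small helpers can make the `filters` and `domainMapping` values easier to fill in: one extracts the domain reference ID from a domain URL in Collibra (the segment after `/domain/`, ignoring the `view` query parameter), and the other joins site, project and sub-project names with the required ` > ` separator. The function names and the host `example.collibra.com` are hypothetical; the URL shape follows the example in the table above.

```python
import re
from urllib.parse import urlparse

# A domain URL in Collibra looks like https://<instance>/domain/<uuid>?view=<uuid>.
_DOMAIN_ID = re.compile(r"/domain/([0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})")


def domain_id_from_url(url: str) -> str:
    """Return the reference ID from a domain URL, ignoring the query string."""
    match = _DOMAIN_ID.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"no /domain/<reference ID> segment in {url!r}")
    return match.group(1)


def mapping_key(*names: str) -> str:
    """Join site, project and sub-project names with ' > ' (spaces required)."""
    if not names or any(not name for name in names):
        raise ValueError("site, project and sub-project names must be non-empty")
    return " > ".join(names)  # names are used exactly as given: case-sensitive


if __name__ == "__main__":
    url = ("https://example.collibra.com/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26"
           "?view=00000000-0000-0000-0000-000000040001")
    print({mapping_key("Collibra_tab_partner_site", "JB_Test_2812"): domain_id_from_url(url)})
```

Parsing only the URL path matters here because the `view` parameter is also a UUID and would otherwise be an easy false match.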

Save the <source ID> configuration file.
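The rules for `domainMapping` and `filters` can be checked before you save the file. This Python sketch flags a file that contains both sections, flags keys whose `>` separator is missing a space, and resolves a site/project path to a domain by taking the most specific mapped ancestor, falling back to the default domain set in the lineage harvester configuration file. That resolution order is an assumption inferred from the transitive-mapping description and the Site-1/Project-Default example, not a documented algorithm.

```python
import re

# A '>' that lacks a space immediately before or after it.
_BAD_SEPARATOR = re.compile(r"(?<! )>|>(?! )")


def validate_tableau_config(config: dict) -> list:
    """Return problems with the filters/domainMapping sections (empty if none)."""
    problems = []
    if "domainMapping" in config and "filters" in config:
        problems.append("domainMapping and filters are mutually exclusive")
    keys = list(config.get("domainMapping", {}))
    keys += list(config.get("filters", {}).get("projects", {}))
    for key in keys:
        if _BAD_SEPARATOR.search(key):
            problems.append(f"use spaces before and after '>' in {key!r}")
    return problems


def resolve_domain(domain_mapping: dict, path: list, default_domain: str) -> str:
    """Return the domain of the most specific mapped ancestor of a site/project path."""
    for depth in range(len(path), 0, -1):
        key = " > ".join(path[:depth])
        if key in domain_mapping:
            return domain_mapping[key]
    return default_domain  # e.g. Tableau Server assets or unmapped sites


DOMAIN_MAPPING = {
    "Site-1": "reference-id-of-Domain-1",
    "Site-1 > Project-Default": "reference-id-of-Domain-2",
}

if __name__ == "__main__":
    print(validate_tableau_config({"domainMapping": DOMAIN_MAPPING}))
    print(resolve_domain(DOMAIN_MAPPING, ["Site-1", "Project-Default", "Sub"], "default"))
```

Under this reading, a sub-project of "Project-Default" inherits "Domain-2" unless it has its own `site name > project name > sub-project name` entry.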