Prepare a <source ID> configuration file

Depending on your data source, you might need, or want, to prepare a <source ID> configuration file. Find your data source below for data source-specific information.

The lineage harvester uses a lineage harvester configuration file to collect the Azure Data Factory data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example 
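The following is a rough sketch of what such a file might contain. It assumes that each found_dbname string is a top-level key whose value is an object holding the translation properties described under Steps. That nesting is an assumption, and every name and value shown (sales_db, sqlserver01.example.com, contoso-adf, BlobLinkedService, sales-system) is a hypothetical placeholder. The dialect property is left out because its accepted values are not listed on this page.

```json
{
  "found_dbname=sales_db;found_hostname=sqlserver01.example.com;found_schema=dbo": {
    "dbname": "sales_db",
    "schema": "dbo",
    "collibraSystemName": "sales-system"
  },
  "found_dbname=contoso-adf_BlobLinkedService;found_hostname=*": {
    "dbname": "blob_landing",
    "collibraSystemName": "sales-system"
  }
}
```

The second entry uses the <datafactory_name>_<linkedservice_name> form, which is why found_hostname is set to *.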

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-adf, the name of your JSON file must be my-adf.conf.
  3. For each database in Azure Data Factory, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name> | found_dbname=<datafactory_name>_<linkedservice_name>;found_hostname=*

    Information about the supported data sources in Azure Data Factory that Collibra Data Lineage collects. You can specify any of the following values for the found_dbname property:

    • A database name. You can then specify the following properties:
      • found_hostname=<server name>, where <server name> is the name of the server that the database is running on.
      • found_schema=<schema name>, where <schema name> is the name of the schema. This property is optional.
    • The combination of <datafactory_name>_<linkedservice_name>, where <datafactory_name> is a data factory name and <linkedservice_name> is a linked service name. If you use this combination, specify * for the found_hostname property.
    Tip You can use wildcards to capture multiple connection string combinations.

    Yes

    dbname

    The name of the database asset in Data Catalog. Specify this property with the database name that you created when you prepared the Data Catalog physical data layer.

    No

    schema

    The name of the schema asset in Data Catalog. Specify this property with the schema name that you created when you registered the data source.

    If Collibra Data Lineage fails to find the schema that you specify, it uses the default schema.

    No

    dialect

    If you specify a database name for the found_dbname property, select one of the following dialects. If you specify a linked service name for the found_dbname property, ignore this property.

    No

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    If you don't specify a value for this property, DEFAULT is shown in the technical lineage.

    Warning The value of this property must exactly match, including letter case, the name of your System asset in Collibra.

    No

  4. Save the <source ID> configuration file.

The lineage harvester uses a lineage harvester configuration file to collect the DataStage data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example 
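A file of this kind could look roughly like the sketch below. It assumes that OdbcDataSources and NonOdbcConnectors are objects keyed by data source name and connector ID, that Jobs is an array of job names, and that JobParameters is an array of name and value pairs. Those shapes are inferred from the property descriptions under Steps and are not confirmed here. All names and values, including the dialect values oracle and db2 and the TARGET_SCHEMA parameter, are hypothetical placeholders.

```json
{
  "OdbcDataSources": {
    "oracle-dsn": {
      "dbname": "orders_db",
      "schema": "orders",
      "dialect": "oracle",
      "collibraSystemName": "orders-system"
    }
  },
  "NonOdbcConnectors": {
    "admin@database-name": {
      "dbname": "database-name",
      "schema": "reporting",
      "dialect": "db2"
    }
  },
  "Jobs": ["job1", "job4"],
  "JobParameters": [
    { "name": "TARGET_SCHEMA", "value": "reporting" }
  ]
}
```

The Jobs list names only job1 and job4, following the rule that only the first and parent jobs of a sequence are specified.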

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-datastage, the name of your JSON file must be my-datastage.conf.
  3. For each database in DataStage, add the required content to the JSON file.

    Property

    Description

    OdbcDataSources

    Open Database Connectivity data sources in IBM InfoSphere DataStage for which you want to create a technical lineage.

    <data-source-name>

    The ODBC data source name that you use in your DataStage projects.

    This section contains the properties to translate the database, schema and dialect.

    dbname
    The name of your database, to which the ODBC data source connection refers.
    schema

    The name of your schema, to which the ODBC data source connection refers.

    dialect

    The dialect of the referenced database.

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    This property is optional.

    NonOdbcConnectors

    Other data source connectors in IBM InfoSphere DataStage for which you want to create a technical lineage. For example, DB2, Oracle or Netezza.

    Note This section is optional.

    <data-source-connector-ID>

    The data source username and database of the connector that you use in your DataStage projects. This usually looks like admin@database-name. The combination of the username and database name should be unique.

    The following section contains the properties to translate the database, schema and dialect.

    dbname
    The name of your database, to which the data source connection refers.
    schema

    The name of your schema, to which the data source connection refers.

    dialect

    The dialect of the referenced database.

    collibraSystemName
    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    This property is optional.

    Jobs

    The jobs that you want the lineage harvester to collect and process to create the technical lineage.

    This section is optional. The following rules apply when you specify this section:

    • Specify jobs that are executed so that the technical lineage graph does not include any job parameters with undefined values.
    • Specify only the first and parent jobs in a sequence of executed jobs. The lineage harvester automatically collects all jobs that are called by the parent jobs.
      For example, suppose you have a sequence of jobs that includes job1, job2, job3, job4, and job5, where job1 calls job2, job2 calls job3, job3 calls job5, and job4 calls job3. Specify only job1 and job4. The lineage harvester then collects and processes all five jobs based on the sequence.

    If you do not specify this section, the lineage harvester collects all jobs, but without proper sequencing. Therefore, some inherited parameters might not be parsed.

    JobParameters
    The runtime parameters that are not in the DSX and ENV files. You can specify multiple job parameters.
    name
    The name of the job parameter.
    value
    The value of the job parameter.
  4. Save the <source ID> configuration file.

The lineage harvester uses a lineage harvester configuration file to collect the dbt data objects. It then sends the metadata to the Collibra Data Lineage service for processing.

By default, the lineage harvester downloads all accounts that are accessible with the API token that you provided in the lineage harvester configuration file. For each account, the lineage harvester downloads all jobs and the resulting dbt models for each job. You can use the <source ID> configuration file to reduce the number of data objects that are downloaded and improve lineage harvester performance in the following ways:

  • Filter the projects and jobs to be downloaded. Include projects and jobs to be downloaded by specifying the filter property.
  • Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
  • Map a materialization as a view instead of a table by specifying the materializedMapping property.
Example 
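As an illustration only, the three options above might be combined as in the sketch below. It assumes that projects is an array of objects nested under collibraSystemNames and that jobIds is an array nested under filter. That layout is inferred from the property descriptions under Steps. Whether project_id is written as a number or as a quoted string is also an assumption, and analytics-system is a hypothetical system name. The IDs reuse the example URLs given in the property descriptions.

```json
{
  "collibraSystemNames": {
    "projects": [
      { "project_id": 12345, "collibraSystemName": "analytics-system" }
    ]
  },
  "filter": {
    "jobIds": [123456]
  },
  "materializedMapping": {
    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
  }
}
```

The job ID is written as an integer, not a string, as the jobIds description requires. The projectIds filter is not shown; it takes integers in the same way.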

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-dbt, the name of your JSON file must be my-dbt.conf.
  3. For each database in dbt, add the following content to the JSON file:

    Property

    Description

    Required?

    collibraSystemNames

    You can use this section to specify the Collibra System Name for each project.

    No

    projects

    This section contains the project names and the Collibra system names.

    No

    project_id

    Your project ID. You can find the project ID in the dbt URL right after projects. For example, if your dbt URL is https://cloud.getdbt.com/develop/54321/projects/12345, your project_id is 12345.

    No

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    No
    filter

    You can use this section to include projects and jobs to be downloaded. Collibra Data Lineage downloads and processes only the specified jobs and projects.

    No

    jobIds

    The job IDs of the jobs that you want to include.

    Specify an integer. Do not specify a string.

    To get your job ID, in your dbt, select Deploy and then Jobs. Select a job and you can find your job ID in the URL. For example, if your URL is cloud.getdbt.com/deploy/65432/projects/23456/jobs/123456, 123456 is your job ID.

    No

    projectIds

    The account IDs of the accounts that you want to include.

    Specify an integer. Do not specify a string.

    To get your account ID, in dbt, click the gear icon in the upper right, select Account Settings and find your account ID in the URL. For example, if your URL is cloud.getdbt.com/settings/accounts/65432, 65432 is your account ID.

    No
    materializedMapping

    Indicates how materializations in dbt are mapped. If you do not specify this property, Collibra Data Lineage maps materializations to tables by default. You can change the mapping of a materialization to a view.

    In the following example, the ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES materialization is mapped to a view.

    	"materializedMapping":{
    	    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW"
    	}
    No
  4. Save the <source ID> configuration file.

The lineage harvester uses a lineage harvester configuration file to collect the Informatica PowerCenter data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example 
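A minimal sketch of such a file is shown below. It assumes that connectionDefinitions is an object keyed by connection name and that databases and connections are arrays nested under collibraSystemNames. That nesting is inferred from the property descriptions under Steps and is not confirmed here. The connection name and database values reuse the DWH_EXPORT example from the variables note, and dwh-system is a hypothetical System asset name.

```json
{
  "connectionDefinitions": {
    "DWH_EXPORT": {
      "dbname": "DWH",
      "schema": "DBO"
    }
  },
  "collibraSystemNames": {
    "databases": [
      { "dbname": "DWH", "collibraSystemName": "dwh-system" }
    ],
    "connections": [
      { "connectionName": "DWH_EXPORT", "collibraSystemName": "dwh-system" }
    ]
  }
}
```

No dialect is set for DWH_EXPORT in this sketch, so the dialect from the lineage harvester configuration file would apply to it as the global dialect.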

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-infa, the name of your JSON file must be my-infa.conf.
  3. For each database, add the required content to the JSON file.

    If certain properties are not specified in the source ID file, an analyze error called CONFIGURATION is displayed in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about the analyze errors, go to Analyze errors and possible solutions in Technical lineage Sources tab page.

    Property

    Description

    connectionDefinitions

    This section contains the connection properties to a source in Informatica PowerCenter.

    <connectionName>

    The type of your source or target data source.

    This section contains the connection properties to a source or target in Informatica PowerCenter.

    Note Define a connection in the JSON file only once; specifically, define a data source with the <connectionName> property specified only once in the JSON file. If you define a connection multiple times, unexpected lineage and stitching issues might occur.
    dbname

    The name of your source or target database.

    When you specify the dbname and schema properties, Collibra Data Lineage can stitch the data objects to the assets in Data Catalog. If the properties are not specified, the data objects are not stitched.

    schema

    The name of your source or target schema.

    When you specify the dbname and schema properties, Collibra Data Lineage can stitch the data objects to the assets in Data Catalog. If the properties are not specified, the data objects are not stitched.

    dialect

    The dialect of the referenced database.

    If you specify a dialect for a database, the value overrides the dialect that you specify in the lineage harvester configuration file for this database.

    For any databases that do not have a dialect specified in the source ID file, the dialect that you specify in the lineage harvester configuration file is used as a global dialect.

    collibraSystemNames

    This section contains the system or server name that is specified in your database and referenced in your connection.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    The following rules apply when you specify the collibraSystemName properties in this file and the lineage harvester configuration file:

    • If you specify this property for a database or connection, the value of this property overrides the value in the lineage harvester configuration file for the database or connection.
    • For any databases or connections that do not have a Collibra system name specified in the source ID file, the value of the collibraSystemName property in the lineage harvester configuration file is used as a global value.
    databases

    This section contains the database information. This is required to connect directly to the system or server of the database.

    dbname
    The name of the database. The database name is the same as the name you entered in the <connectionName> section.
    collibraSystemName

    The system or server name of the database.

    connections

    This section contains the connection information. This is required to reference the system or server of the connection.

    connectionName

    The name of the connection.

    collibraSystemName

    The system or server name of the connection.

    Important If you are using variables in Informatica PowerCenter, add the value of the variable instead of the name in the connection definitions in the JSON file. For example, if the parameter file contains $DBConnection_dwh=DWH_EXPORT, add the following connection definitions to the JSON file:
    {
        "DWH_EXPORT": { "dbname": "DWH", "schema": "DBO" }
    }

  4. Save the <source ID> configuration file.

You use the lineage harvester configuration file to access Informatica Intelligent Cloud Services Data Integration data objects. The lineage harvester processes the data objects to create a technical lineage. You also have to prepare a specific <source ID> configuration file that defines the Intelligent Cloud Services system name.

Important You must prepare a <source ID> configuration file regardless of whether the useCollibraSystemName property in your lineage harvester configuration file is set to true or false.

Prerequisites

You have Admin permission on all objects that you want to harvest.

Example 
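As a hedged illustration, a file for a single connection might look like the sketch below. It assumes that connections is an array nested under collibraSystemNames and that connectionDefinitions is an array of per-connection objects. That layout is inferred from the property descriptions under Steps. The names snowflake-sales, SALES, PUBLIC and sales-system are hypothetical placeholders; snowflake is one of the dialect values listed under Steps.

```json
{
  "collibraSystemNames": {
    "connections": [
      { "connectionName": "snowflake-sales", "collibraSystemName": "sales-system" }
    ]
  },
  "connectionDefinitions": [
    {
      "connectionName": "snowflake-sales",
      "databaseName": "SALES",
      "schemaName": "PUBLIC",
      "dialect": "snowflake"
    }
  ]
}
```

The connectionName value is identical in both sections, because a connection definition must match a connection name in the connections section.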

Steps

  1. Create a new JSON configuration file in the lineage harvester config folder.

    If you have a large data source for an Informatica Intelligent Cloud Services connection, consider creating more than one JSON file for the data source. Each JSON file must have a unique name, but the contents of the JSON files are the same. This helps you avoid errors that might occur when the lineage harvester ingests metadata from a single large source.

  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in your lineage harvester configuration file is iics-source-1, then the name of your JSON file should be iics-source-1.conf.
  3. Important Your JSON file must have the file extension .conf.
  4. For each Informatica Intelligent Cloud Services connection, you can add the following content to the JSON file:

    Property

    Description

    Required?

    collibraSystemNames

    This section contains the system information for Informatica Intelligent Cloud Services.

    connections

    This section contains the system connection information. This is required to reference the system or server of the connection.

    connectionName

    The name of the connection. The name must match the System asset name in Data Catalog for stitching.

    Yes
    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    No

    connectionDefinitions

    This section contains the database, schema and dialect information for each connection in Informatica Intelligent Cloud Services.

    Note You can add connection information for each connection in the connections section.

    connectionName

    The name of the connection. The name must match a connection name in the connections section.

    This property is required.

    Yes
    databaseName

    The name of your database. The name must match the Database asset name in Data Catalog for stitching.

    Yes
    schemaName

    The name of your schema. The name must match the Schema asset name in Data Catalog for stitching.

    Yes
    dialect

    The dialect of the connection. Specify this property for Collibra Data Lineage to properly extract and parse queries that are related to this connection.

    You can enter one of the following values:

    • bigquery
    • db2
    • hana
    • hive
    • greenplum
    • mssql
    • mysql
    • netezza
    • oracle
    • postgres
    • redshift
    • snowflake
    • spark
    • teradata
    No
  5. Save the configuration file.

The lineage harvester uses the lineage harvester configuration file to collect the Looker data objects and send them to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

  • Filter on the Looker folders from which you want to ingest metadata.
  • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Looker.
    Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.
Example 
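A sketch of such a file follows. It assumes that Connections is an object keyed by Looker connection name and that filters is an array of objects, each tying one Collibra domain to one or more folders. That layout is inferred from the property descriptions under Steps. The domain ID, connection name, folder ID and system name are hypothetical placeholders, and writing folder IDs as quoted strings is likewise an assumption. The optional dialect property is omitted because its accepted values are not listed on this page.

```json
{
  "Connections": {
    "warehouse-connection": {
      "dbname": "analytics",
      "schema": "public",
      "collibraSystemName": "warehouse-system"
    }
  },
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000001",
      "description": "Finance dashboards",
      "folderIds": ["42"]
    }
  ]
}
```

This sketch filters by folder ID rather than folder name, so that only the specified folders are sent to the Collibra Data Lineage service instance. If you don't want to filter on folders, the whole filters section has to be removed.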

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example The value of the Id property in the lineage harvester configuration file is looker-source-1. As a result, the name of your JSON file should be looker-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each database in Looker, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    Connections

    This section contains all Looker connections for which you want to create a technical lineage.

    Yes

    <connection name>

    The name of a connection object in Looker.

    Yes

    dialect

    The dialect of the supported data source in Looker.

    No

    schema

    The name of the default schema of a supported data source in Looker.

    If the lineage harvester fails to find a specific schema, it uses the default schema.

    No

    dbname
    The name of the database of a supported data source in Looker.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Yes

    filters

    Optionally, use this section to specify the Looker folders from which you want to ingest metadata.

    Note You can filter on Looker folders, but not on Looker data sets. That's because Looker data sets are linked directly to the server, instead of a folder, as shown in the Looker metadata overview. Looker data sets are ingested in the default domain, regardless of any filtering.

    Let’s say, for example, you filter on folder B. A Looker Folder asset is created in the specified domain in Collibra, and all of the metadata in folder B is ingested. If folder B has a parent folder A, then a Looker Folder asset is created (in the domain specified for folder B) to preserve the hierarchy, but no metadata from folder A is ingested.

    You can specify more than one Looker folder for ingestion into a single domain in Collibra.

    Warning If you don't want to filter on Looker Folders, you must completely remove this filters section.

    Tip There are significant benefits to filtering by folder ID. For information, see the filters > folderIds property description.

    Tip You can use wildcards to capture multiple connection string combinations.

    No
    domainId

    The unique resource ID of the domain (or domains), in Collibra, in which you want to ingest data objects from one or more Looker Folders.

    Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

     
    description
    Any description, as you see fit. 
    folderNames

    The name (or names) of the Looker Folders from which you want to ingest.

    Note You must specify either a folder name, a folder ID, or both.

     
    folderIds

    The ID (or IDs) of the Looker Folder you want to ingest.

    Note You must specify either a folder ID, a folder name, or both.

    Tip If you filter by folder ID, filtering is carried out via the API, instead of on the Collibra Data Lineage service instances.

    When you filter by folder ID, the lineage harvester accesses only the folders you specify via this property, and sends only that metadata to the Collibra Data Lineage service instance for processing and ingestion in Data Catalog. Conversely, if you filter by folder name (via the folderNames property), metadata from all Looker folders is sent to the Collibra Data Lineage service instance. Only then is filtering applied.
     
  4. Save the <source ID> configuration file.

The lineage harvester uses a lineage harvester configuration file to collect the Matillion data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example 
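A minimal sketch of such a file is shown below. It assumes that the found_dbname string is a top-level key whose value is an object holding the dbname, schema and collibraSystemName properties described under Steps. That nesting is an assumption, and warehouse, public and warehouse-system are hypothetical placeholders.

```json
{
  "found_dbname=warehouse;found_hostname=*": {
    "dbname": "warehouse",
    "schema": "public",
    "collibraSystemName": "warehouse-system"
  }
}
```

Here found_hostname=* includes all servers, and the connection is defined only once, in line with the note about duplicate connection definitions.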

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-matillion, the name of your JSON file must be my-matillion.conf.
  3. Add the required content to the JSON file.

    Property

    Description

    Mandatory?

    found_dbname=<database name>;found_hostname=<server name>

    Information about the supported data sources in Matillion that Collibra Data Lineage collects.

    <database name>
    The database name in Matillion.
    <server name>
    The name of the server that the database is running on. You can specify found_hostname=* to include all servers.
    Note Define a connection in the connection definitions only once; specifically, define a data source with the found_dbname and found_hostname properties specified only once in the connection definitions. If you define a connection multiple times, unexpected lineage and stitching issues might occur.
    Tip You can use wildcards to capture multiple connection string combinations.

    Yes

    dbname

    The name of the database asset in Data Catalog. Specify this property with the database name that you created when you prepared the Data Catalog physical data layer.

    If you leave this property blank, the database is stitched to the database of DEFAULT in Data Catalog.

    No

    schema

    The name of the schema asset in Data Catalog. Specify this property with the schema name that you created when you registered the data source.

    If you leave this property blank, the schema is stitched to the schema of DEFAULT in Data Catalog.

    No

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    If you leave this property blank, the system is stitched to the system of DEFAULT in Data Catalog. If you are missing lineage or your lineage objects aren’t stitching to Catalog assets in Data Catalog as you expect, ensure this property is specified properly.

    Warning The value of this property must exactly match, including letter case, the name of your System asset in Collibra.

    No

  4. Save the <source ID> configuration file.

The lineage harvester uses the configuration file to connect to MicroStrategy. You must also prepare a MicroStrategy <source ID> configuration file to:

  • Specify the default domain, meaning the domain in Collibra in which the corresponding assets of MicroStrategy metadata will be ingested if domain mapping is not configured.
    Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
  • Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
  • Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
    Note Mapping doesn't work for custom SQL.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Example 
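As an illustration, a complete file might look roughly like the sketch below. It assumes that default_domain_id is a top-level property, that filters is an array of objects pairing a domainId with project names or IDs, and that datasourceMapping is an array. That layout is inferred from the property descriptions under Steps. Both domain IDs, the project name and the data source names (Warehouse_connection, WH_main, warehouse-system) are hypothetical placeholders.

```json
{
  "default_domain_id": "00000000-0000-0000-0000-000000000001",
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000002",
      "projectNames": ["Sales Reporting"]
    }
  ],
  "datasourceMapping": [
    {
      "found_datasource": "Warehouse_connection",
      "found_project": "*",
      "mapping": {
        "dbname": "WH_main",
        "collibraSystemName": "warehouse-system"
      }
    }
  ]
}
```

With a filters section present, only the MicroStrategy Server asset would be ingested into the default domain. The assets of the filtered project would be ingested into the domain given by domainId.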

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in the lineage harvester configuration file is mstr-source-1, then the name of your JSON file should be mstr-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. Add the required content to the JSON file.

    Property

    Description

    Mandatory

    default_domain_id

    The domain in which you want the corresponding assets of MicroStrategy metadata to be ingested.

    Note If you configure filtering, only the MicroStrategy Server asset is ingested into this default domain.

    Yes

    filters

    This section allows you to specify:

    • From which MicroStrategy projects you want to harvest metadata.
    • Into which domains in Collibra you want to ingest the corresponding assets.

    If you don't want to filter on projects, don't include this section in your <source ID> configuration file.

    No

    domainId

    The unique resource ID of the domain (or domains) in Collibra in which you want to ingest the MicroStrategy assets.

    Tip If you use a filters section, you must include the domainId property in the section. If, by chance, you want to filter on certain projects, but you want to ingest all assets into the default domain, then the value of the domainId property must match the value of the default_domain_id property.

    No

    projectIds
    The IDs of the MicroStrategy projects from which you want to ingest metadata.

    No

    projectNames
    The project names of the MicroStrategy projects from which you want to ingest metadata.

    No

    datasourceMapping

    This optional section allows you to configure data source mapping. Include this section only if you need to differentiate between multiple data sources that have the same name.

    Note Mapping doesn't work for custom SQL.

    No

    found_datasource

    The name of the data source that was returned by the lineage harvester, as shown in the technical lineage.

    Note The data source name is case-sensitive.

    Yes

    found_project

    The name of the project in which the data source information resides. You can specify an asterisk (*) to search for data source information across all projects.

    Yes

    mapping

    Use this section to map the data source name that was returned by the lineage harvester to the true name of the data source.

    Example You have a Redshift data source named "RD_pearl", but the lineage harvester has returned the name "Redshift_connection". You can configure the datasourceMapping section as follows:
    {
        "datasourceMapping": [
            {
                "found_datasource": "Redshift_connection",
                "found_project": "*",
                "mapping": {
                    "dbname": "RD_pearl",
                    "collibraSystemName": "TV_dev"
                }
            }
        ]
    }

    Yes

    dbname

    The name of the database to which you want to map the found data source.

    Yes

    schema

    The name of the schema in MicroStrategy.

    No

    dialect

    The dialect of the data source in MicroStrategy.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    If you set the useCollibraSystemName property to false in your lineage harvester configuration file, leave this property empty as follows: "collibraSystemName": "".

    Warning The value of this property must exactly match the name of your System asset in Collibra.

    Yes

  4. Save the <source ID> configuration file.
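
As a rough illustration of how the filters and datasourceMapping sections described above might sit together in one MicroStrategy <source ID> configuration file, consider the following sketch. The domain ID, project names, schema and dialect values are hypothetical placeholders, and the shape of the filters section (an array of objects) is an assumption rather than something this topic confirms.

```json
{
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000001",
      "projectNames": ["Sales Analytics", "Finance Reporting"]
    }
  ],
  "datasourceMapping": [
    {
      "found_datasource": "Redshift_connection",
      "found_project": "*",
      "mapping": {
        "dbname": "RD_pearl",
        "schema": "public",
        "dialect": "redshift",
        "collibraSystemName": "TV_dev"
      }
    }
  ]
}
```

If you do not filter on projects, the filters section would be left out entirely, and the optional schema and dialect properties can be omitted from the mapping.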

The lineage harvester uses a lineage harvester configuration file to collect the Power BI data objects. It then sends the metadata to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

  • Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
    Note Mapping doesn't work for custom SQL.
  • Configure workspace filtering.
    Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
  • Specify the system name of databases in Power BI with the collibraSystemName property, if useCollibraSystemName is set to true in the lineage harvester configuration file. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
    Example The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each database in Power BI, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

    The database information of supported data sources in Power BI that is typically collected by the lineage harvester. Specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema). You then use the child properties to map the names collected by the lineage harvester to the true names.

    Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

    Important The keys that you specify must be unique.
    Tip You can use wildcards to capture multiple connection string combinations.

    Yes

    dbname
    The name of the database of a supported data source in Power BI.

    No

    schema

    The name of the schema of a supported data source in Power BI.

    If the lineage harvester fails to find a specific schema, it uses the schema you specify in this property.

    Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

    No

    dialect

    The dialect of the supported data source in Power BI.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Warning The value of this property must exactly match the name of your System asset in Collibra, including case.

    Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Power BI, you are not required to:
    • Set the useCollibraSystemName property in the lineage harvester configuration file to true.
    • Specify a Collibra system name in the <source ID> configuration file.
    However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, then you must specify a Collibra system name in the <source ID> configuration file.

    Yes (unless you are using the <source ID> file to provide the true system names of ODBC databases in Power BI.)

    filters

    This section allows you to specify the Power BI workspaces from which you want to ingest metadata.

    The filters work as "workspace AND workspace AND capacity AND capacity", meaning that if you specify a capacity, all of the workspaces in that capacity are also ingested.

    Warning If you don't want to specify the Power BI workspaces from which to ingest, you must completely remove this filters section.

    Tip You can use wildcards to capture multiple connection string combinations.

    No

    domainId

    The unique resource ID of the domain (or domains), in Collibra Data Intelligence Platform, in which you want to ingest the Power BI assets.

    Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

    Yes

    description

    Any description, as you see fit.

    Yes

    workspaceNames

    The names of Power BI workspaces from which you want to ingest metadata.

    Important Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, a workspace with the name "Sale and Marketing [automobiles]" should be formatted as follows:
    Sale and Marketing [[]automobiles[]]

    No

    workspaceIds

    The IDs of Power BI workspaces from which you want to ingest metadata.

    Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.

    No
    capacityNames

    The names of capacities on which you want to filter.

    No
    capacityIds

    The IDs of capacities on which you want to filter.

    Warning Any letters in a capacity ID must be uppercase.

    No
    excludeWorkspaceNames

    The names of Power BI workspaces that you want to exclude from the ingestion job.

    This is useful if you want to exclude, for example, dedicated development and testing workspaces.

    Note The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My workspace is the personal workspace for any Power BI customer to work with their own, personal content.

    For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

    No
    excludeWorkspaceIds

    The IDs of Power BI workspaces that you want to exclude from the ingestion job.

    This is useful if you want to exclude, for example, dedicated development and testing workspaces.

    For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

    No
  4. Save the <source ID> configuration file.
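
To make the property descriptions above more concrete, here is a minimal sketch of a Power BI <source ID> configuration file that combines one mapping entry with one workspace filter. The host, database, schema, system and workspace names and the domain ID are hypothetical placeholders, and the shape of the filters section (an array of objects) is an assumption. The workspace name reuses the square-bracket escaping described for workspaceNames.

```json
{
  "found_dbname=sales_db;found_hostname=sql01.example.com;found_schema=dbo": {
    "dbname": "SALES",
    "schema": "reporting",
    "dialect": "mssql",
    "collibraSystemName": "sales-system"
  },
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000001",
      "description": "Production sales workspaces",
      "workspaceNames": ["Sale and Marketing [[]automobiles[]]"],
      "excludeWorkspaceNames": ["Sales Dev"]
    }
  ]
}
```

Each found_dbname key must be unique in the file. If you do not want to filter on workspaces, the filters section would be removed completely rather than left empty.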

When you create technical lineage for Snowflake by using the SQL-API ingestion method, you can create a <source ID> configuration file to configure the metadata that Collibra Data Lineage collects.


Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-snowflake, the name of your JSON file must be my-snowflake.conf.
  3. For each database in Snowflake, add the following content to the JSON file:

    Property

    Description

    Required?

    displaySampleQueries

    Indicates whether to display transformations with a question mark (?) or with actual values from queries in the Source code pane in the technical lineage graph. For example, you can choose to display WHERE amount < 100 or WHERE amount < ?.

    Specify one of the following values:

    • true: Actual values from queries are displayed.
    • false: A question mark (?) is displayed. This is the default value.

    No

    analyzeTemporaryTables

    Indicates whether to parse the CREATE TEMPORARY TABLE statement in the ingested queries. Specify one of the following values:

    • true: Collibra Data Lineage examines the queries and parses the CREATE TEMPORARY TABLE statement when the following conditions are met:
      • The query starts with the CREATE TEMPORARY TABLE statement.
      • Collibra Data Lineage did not encounter the CREATE TEMPORARY TABLE statement before this query.
    • false: Collibra Data Lineage does not examine or parse the CREATE TEMPORARY TABLE statement in the ingested queries. This is the default value.

    No

  4. Save the <source ID> configuration file.
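
A minimal sketch of a Snowflake <source ID> configuration file that sets both properties described above. It assumes that the two properties are placed at the top level of the JSON file, which this topic does not state explicitly.

```json
{
  "displaySampleQueries": false,
  "analyzeTemporaryTables": true
}
```

With these values, the Source code pane would show a question mark (?) in place of actual query values, and CREATE TEMPORARY TABLE statements would be parsed under the conditions listed for analyzeTemporaryTables.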

The lineage harvester uses the lineage harvester configuration file to collect the SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) data objects and send them to the Collibra Data Lineage service.

The <source ID> configuration file allows you to:

  • Specify the system name of databases in SSRS and PBRS with the collibraSystemName property, if useCollibraSystemName is set to true in the lineage harvester configuration file.
  • Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all information to process the SQL source code correctly.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example The value of the Id property in the lineage harvester configuration file is ssrs-source-1. As a result, the name of your JSON file should be ssrs-source-1.conf.

    Important Your JSON file must have the file extension .conf.

  3. For each database in SSRS and PBRS, add the following content to the JSON file:

    Property

    Description

    Required?

    DataSources

    This section contains all connections for which you want to create a technical lineage.

    The DataSources section refers to shared data sources in SSRS and PBRS. For more information about shared data sources, see the Microsoft documentation.

    Yes

    <data source type>

    The path of a connection object in SSRS and PBRS.

    Yes

    dbname
    The name of the database of a supported data source in SSRS and PBRS.

    No

    schema

    The name of the default schema of a supported data source in SSRS and PBRS.

    No

    dialect

    The dialect of the supported data source in SSRS and PBRS.

    No

    collibraSystemName

    The system or server name of the database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Yes

    CustomDataSources

    Use this section for custom data processing extensions, which support embedded data sources. The definition of an embedded data source is specified locally in a report or embedded data set.

    The CustomDataSources section refers to embedded data sources in SSRS and PBRS. For more information about embedded data sources, see the Microsoft documentation.

    No

    <path to report>/<custom data source name>

    The full path to the report and the custom data source name.

    You can use wildcards to match multiple folders, reports or data sets. The connection information in this section is used to add missing information or to overwrite parsed information.

    No

    dbname
    The name of the database of a custom data source in SSRS and PBRS.

    No

    schema

    The name of the schema of a custom data source in SSRS and PBRS. If you don't provide the schema name, the default schema is used.

    No

    dialect

    The dialect of the custom data source in SSRS and PBRS.

    No

  4. Save the <source ID> configuration file.
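
The following sketch shows how the DataSources and CustomDataSources sections described above could look in an SSRS or PBRS <source ID> configuration file. All paths, database, schema and system names are hypothetical placeholders, and the nesting (each section as an object keyed by a data source path) is an assumption based on the property descriptions.

```json
{
  "DataSources": {
    "/Data Sources/SalesWarehouse": {
      "dbname": "SALES_DW",
      "schema": "dbo",
      "dialect": "mssql",
      "collibraSystemName": "sales-system"
    }
  },
  "CustomDataSources": {
    "/Reports/Finance/*/LegacyOracle": {
      "dbname": "FINDB",
      "schema": "FIN",
      "dialect": "oracle"
    }
  }
}
```

The wildcard in the CustomDataSources key is meant to illustrate matching the same embedded data source name across several report folders.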

The lineage harvester uses a lineage harvester configuration file to collect the SQL Server Integration Services data objects. It then sends the metadata to the Collibra Data Lineage service instance.


Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-ssis, the name of your JSON file must be my-ssis.conf.
  3. For each database, add the required content to the JSON file.

    Property

    Description

    Required?

    DataSources

    The parent element that contains the connection definitions of your data sources in SQL Server Integration Services.

    If you specify the properties in this section and also the ConnStringRegExTranslation property for a data source, the connection definitions in the ConnStringRegExTranslation property take precedence.

    No

    DataSourceName

    The name of your data source.

    No

    dialect

    The dialect of the referenced database.

    No

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    No

    ConnStringRegExTranslation

    The parent element that opens the connection definitions.

    If you specify this property and also the properties in the DataSources section for a data source, the connection definitions in this property take precedence.

    No

    <regular expression>

    A regular expression that must match one or more connection strings.

    Note 

    Important considerations:

    • By default, the regular expression is not case-sensitive. As a consequence, a regular expression can match connection strings that contain uppercase or lowercase characters.
    • The connection string is part of the SSIS connection manager.
    • SSIS connection managers are included in SSIS package files (DTSX) or in connection manager files (CONMGR).
    Example 

    Regular expression: Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.*
    Explanation: The first section, up to .*, is a literal, case-insensitive match of the characters. The dot (.) matches any single character, and the asterisk (*) means zero or more occurrences of the preceding element, so .* matches any sequence of characters.
    Match: Any connection string that starts with Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.
    Example: Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306;Persist Security Info=True;Auto Translate=False;.

    No

    dbname

    The name of your database, to which the data source connection refers.

    No

    schema

    The name of your schema, to which the regular expression refers.

    No

    dialect

    The dialect of the referenced database.

    No

    collibraSystemName

    The system or server name of the data source.

    Use this property with the useCollibraSystemName property in the lineage harvester configuration file to override the default Collibra System asset name for this data source.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    No

  4. Save the <source ID> configuration file.
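
As an illustration of the two sections described above, the following sketch pairs a DataSources entry with a ConnStringRegExTranslation entry that reuses the regular expression from the example. The data source name, schema, dialect and system names are hypothetical placeholders, and the exact nesting (DataSources as an array, ConnStringRegExTranslation as an object keyed by regular expression) is an assumption.

```json
{
  "DataSources": [
    {
      "DataSourceName": "StageDbConnection",
      "dialect": "sybase",
      "collibraSystemName": "staging-system"
    }
  ],
  "ConnStringRegExTranslation": {
    "Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.*": {
      "dbname": "STAGEDB",
      "schema": "dbo",
      "dialect": "sybase",
      "collibraSystemName": "staging-system"
    }
  }
}
```

If both entries apply to the same data source, the connection definitions under ConnStringRegExTranslation take precedence, as noted in the property descriptions.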

The lineage harvester uses the configuration file to connect to Tableau. You are not required to create a <source ID> configuration file, but you need one if you want to:

  • Define your Tableau operating model.
  • Provide additional information about databases and files in Tableau. For example, you can define the system name of files and connectors in Tableau.
  • Use the hostnameMapping property to map the database, schema or system names that were returned by the Tableau APIs to the actual names of the assets in Data Catalog. For complete information, go to Tableau hostname, schema, and system name mapping.
    Note Mapping doesn't work for custom SQL.
  • Define in which domains in Collibra you want to ingest assets from your Tableau sites and projects. See the domainMapping and filters properties.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.


Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in the lineage harvester configuration file is tableau-source-1, then the name of your JSON file should be tableau-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each database in Tableau, add the following content to the JSON file:
    Tip You can use wildcards to capture multiple string combinations for any of these properties.

    Property

    Description
    collibraSystemNames

    This section contains the system information for different Tableau data sources. Depending on the kind of data source or connection, you have to specify how to connect to this data source.

    Tip For more information, see the Tableau documentation. We also recommend checking the list of supported connectors in Tableau.

    files

    This section contains connection information to one or more files in Tableau.

    Tip If you do not have files in Tableau, you can remove this section.

    filePath
    The full path to the file. For example, the path to a JSON file.
    collibraSystemName
    The system name of the file.
    connectors

    This section contains connection information to one or more connectors in Tableau.

    Tip 
    • If you do not have connectors in Tableau, you can remove this section.
    • The values that you specify for this property are not case-sensitive.
    connectorUrl
    The URL of the connector. For example, the URL to Google Analytics.
    collibraSystemName
    The system name of the connector.
    cloudFiles

    This section contains connection information to one or more cloud files in Tableau's input data.

    Tip If you do not have cloud files in Tableau, you can remove this section.

    name
    The name of the file. For example, the name of a Zendesk file.
    collibraSystemName
    The system name of the cloud file.
    hostnameMapping

    This section allows you to map Tableau technical database, server and schema names to the respective real names, to preserve stitching.

    Warning 
    • hostnameMapping replaces the following deprecated properties, which have been removed from this topic:
      • The databaseMapping property.
      • The databases sub-section of the collibraSystemNames section.

    hostnameMapping must not be used in combination with either of these properties.

    If you use the hostnameMapping section, you can still use the collibraSystemName property in conjunction with the files, connectors or cloudFiles sub-sections.

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

    The database information of supported data sources in Tableau that is typically collected by the lineage harvester. It allows you to specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema).

    dbname

    The name of the database of a supported data source in Tableau.

    schema

    The name of the default schema of a supported data source in Tableau.

    If the lineage harvester fails to find a specific schema, it uses the default schema.

    dialect

    The dialect of the supported data source in Tableau.

    You don't have to specify a dialect; it will automatically be detected. If, however, you are using a dialect that is not supported, you can use this property to specify a supported dialect that is a close comparison. That way, most of your queries will be detected and processed.

    filters

    This section defines:

    • From which Tableau sites and projects you want to harvest metadata.
    • Into which domains in Collibra you want to ingest the corresponding assets.

    Filtering is transitive, which means that all resources in a specified project, such as Tableau workbooks and all sub-projects, are ingested.

    Tableau assets that are not mapped to the specified domains, for example the Tableau Server assets and the parent projects (if you specify their sub-projects), are ingested in the default domain.

    Important 
    • Filtering does not affect the amount of raw metadata that is harvested from Tableau and sent to the Collibra Data Lineage service instance. Rather, it determines which metadata is ingested as assets in Data Catalog.
    • The domainMapping and filters sections are mutually exclusive. Do not include both domainMapping and filters sections in your JSON file.
    Tip 
    • If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use the domainMapping section.
    • If you want to ingest all of the projects in a Tableau site into the default domain, use only the domainId property in the lineage harvester configuration file. The domainId property represents the default domain.
    • If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering.
    • If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering.
    • You can use site filtering and project filtering together:
      • If filtering on the same site, this "filtering" is actually domain mapping, because nothing is filtered out. The contents of the projects are ingested in the specified domains, and the rest of the contents of the site are ingested in a different, specified domain.
      • If you are site filtering on a specific site and project filtering a different site, then site filtering is again a form of domain mapping, and the filtered projects are ingested in their specified domains.
      • If your lineage harvester configuration file includes sites that are not mentioned in the filters section of your <source ID> configuration file, those sites are ingested in the default domain.
    sites

    The Tableau sites to be ingested and the domain in which you want to ingest metadata from the Tableau sites.

    Tip If you have only one Tableau site, do not include a sites section in your <source ID> file. Instead, use a projects section, to filter on Tableau projects. Include a sites section only if all of the following are true:
    • You have more than one Tableau site.
    • You want to ingest all of the metadata from only one Tableau site into a single domain in Collibra.
    • The domain into which you want to ingest is not the default domain, meaning the domain specified in the domainId property in your lineage harvester configuration file.
    site_name: domain_id
    site_name
    The name of the site to be ingested. The site name is case-sensitive.
    domain_id
    The unique reference ID of the domain in Collibra in which you want to ingest metadata. The domain ID is case-sensitive.
    To ingest all metadata from a Tableau site in the specified domain, specify the site name and a separate domain ID for each site that you list in the siteIds property in the lineage harvester configuration file for Tableau. If the site_name or domain_id property is not specified for a site, the metadata from the site is ingested in the default domain.
    projects

    The Tableau projects to be ingested and the domain in which you want to ingest metadata from the Tableau projects or sub-projects.

    Tip Project filtering is not relevant for those who have an Explorer role in Tableau, because Explorers need to configure permissions for each data object in Tableau that they want to ingest. As the Administrator role has access to all data objects, project filtering allows Administrators to specify which projects to ingest.

    site_name > project_name : domain_id

    The site_name should be the Tableau site name. The project_name should be the Tableau project name.

    The domain_id should be the unique reference ID of the domain in Collibra in which you want to ingest metadata.

    When you specify the site and project names, the following rules apply:

    • Add spaces before and after >. The spaces are separators between the site and project.
    • Specify the full exact site and project names.
    • The values are case-sensitive.

    When you specify a Tableau project, all assets in the project are ingested in the specified domain. If you want to ingest assets from different Tableau projects in one domain, you can specify the same domain_id value for different projects.

    Example

    "Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"

    site_name > project_name > sub-project_name : domain_id

    The site_name should be the Tableau site name. The project_name should be the Tableau project name. Optionally, use sub-project_name to specify the Tableau sub-project name.

    The domain_id property should be the unique reference ID of the domain in Collibra in which you want to ingest metadata.

    When you specify the site, project and sub-project names, the following rules apply:

    • Add spaces before and after each >. The spaces are separators between the site, project and sub-project names.
    • Specify the full exact site, project and sub-project names.
    • The values are case-sensitive.

    Example

    "Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"

    domainMapping

    This section defines in which domains in Collibra you want to ingest assets from your Tableau sites and Tableau projects.

    Domain mapping is transitive, meaning that all resources, such as Tableau workbooks and data attributes in a parent Tableau site, project or sub-project, are ingested in the same domain as the parent.

    Important The domainMapping and filters sections are mutually exclusive. Do not include both domainMapping and filters sections in your JSON file.

    Tip 
    • If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use this domainMapping section.
    • If you want to ingest all of the projects in a Tableau site into the default domain, use only the domainId property in the lineage harvester configuration file. The domainId property represents the default domain.
      Note Tableau assets that are not mapped to specific domains via this domainMapping section, for example Tableau Server assets, are ingested in that default domain.
    • If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering.
    • If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering.
    site name

    The Tableau site name, followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau site.

    Important In the configuration file, use the actual site name, along with the domain reference ID, for example: "Collibra_tab_partner_site": "afc8cfb0-91f1-4075-a3e5-7ce6d1f9bcc9"
    site name > project name

    The Tableau project name, preceded by the name of the Tableau site to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau project.

    Important In the configuration file, use the actual site and project names, along with the domain reference ID, for example: "Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"
    site name > project name > sub-project name

    The Tableau sub-project name, preceded by the name of the Tableau site and project to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau sub-project.

    Important In the configuration file, use the actual site, project and sub-project names, along with the domain reference ID, for example: "Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"
  4. Save the <source ID> configuration file.
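
To tie the Tableau properties above together, here is a minimal sketch of a <source ID> configuration file that uses the collibraSystemNames, hostnameMapping and filters sections. The file path, connector URL, cloud file name, host, database, schema and system names are hypothetical placeholders. The project keys and domain IDs are taken from the examples in this topic. The nesting of each section is an assumption based on the property descriptions.

```json
{
  "collibraSystemNames": {
    "files": [
      { "filePath": "C:/data/orders.json", "collibraSystemName": "file-system" }
    ],
    "connectors": [
      { "connectorUrl": "https://analytics.google.com", "collibraSystemName": "google-analytics" }
    ],
    "cloudFiles": [
      { "name": "zendesk-tickets", "collibraSystemName": "zendesk" }
    ]
  },
  "hostnameMapping": {
    "found_dbname=sales_db;found_hostname=sql01.example.com;found_schema=dbo": {
      "dbname": "SALES",
      "schema": "reporting",
      "dialect": "mssql",
      "collibraSystemName": "sales-system"
    }
  },
  "filters": {
    "projects": {
      "Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82",
      "Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"
    }
  }
}
```

Because the domainMapping and filters sections are mutually exclusive, a file that maps whole sites and projects to domains would use a domainMapping section with the same "site name > project name" keys in place of the filters section shown here.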