Prepare a <source ID> configuration file

Depending on your data source, you might be required to prepare a <source ID> configuration file, or you might choose to prepare one. The sections below give the information that is specific to each data source.

The lineage harvester uses a lineage harvester configuration file to collect the Azure Data Factory data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-adf, the name of your JSON file must be my-adf.conf.
  3. For each database in Azure Data Factory, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name> | found_dbname=<datafactory_name>_<linkedservice_name>;found_hostname=*

    The information of the supported data sources in Azure Data Factory to be collected by Collibra Data Lineage. You can specify any of the following values for the found_dbname property:

    • A database name. In that case, you can also specify the following properties:
      • found_hostname=<server name>, where <server name> is the name of the server that the database is running on.
      • found_schema=<schema name>, where <schema name> is the name of the schema. This property is optional.
    • The combination of <datafactory_name>_<linkedservice_name>, where <datafactory_name> is a data factory name and <linkedservice_name> is a linked service name. If you use this combination, specify * for the found_hostname property.
    Tip 

    You can use wildcards to capture multiple connection string combinations:

    Yes

    dbname

    The name of the database asset in Data Catalog. Specify the database name that you created when you prepared the Data Catalog physical data layer or registered the data source.

    No

    schema

    The name of the schema asset in Data Catalog. Specify this property with the schema name that you created when you registered the data source.

    If Collibra Data Lineage fails to find the schema that you specify, it uses the default schema.

    No

    dialect

    If you specify a database name for the found_dbname property, select one of the following dialects. If you specify a linked service name for the found_dbname property, ignore this property.

    No

    collibraSystemName

    The system or server name of a database.

    If you don't specify a value for this property, "DEFAULT" is shown in the technical lineage.

    Warning The value of this property must exactly match the name of your System asset in Collibra, including letter case.

    Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Azure Data Factory, you are not required to:
    • Set the useCollibraSystemName property in the lineage harvester configuration file to true.
    • Specify a Collibra system name in the <source ID> configuration file.
    However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, you must specify a Collibra system name in the <source ID> configuration file.
    Important If you use the Source Configuration field to provide the true system name of an ODBC database in Azure Data Factory and the Collibra system name setting is set to true, you must specify a Collibra system name in the Source Configuration field.

    Yes

  4. Save the <source ID> configuration file.
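
As a rough illustration of step 3, a <source ID> configuration file for Azure Data Factory might look like the following sketch. The nesting (each found_dbname key mapping to an object that holds the dbname, schema, dialect and collibraSystemName properties) is an assumption based on the property table above. All values, such as sales_db, sql-prod-01, my-data-factory, my-linked-service and SalesSystem, are hypothetical placeholders, and mssql is assumed to be an accepted dialect value.

```json
{
  "found_dbname=sales_db;found_hostname=sql-prod-01;found_schema=dbo": {
    "dbname": "SalesDatabase",
    "schema": "dbo",
    "dialect": "mssql",
    "collibraSystemName": "SalesSystem"
  },
  "found_dbname=my-data-factory_my-linked-service;found_hostname=*": {
    "dbname": "SalesDatabase",
    "collibraSystemName": "SalesSystem"
  }
}
```

The second entry uses the <datafactory_name>_<linkedservice_name> form, so found_hostname is set to * and the dialect property is left out.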

The lineage harvester uses a lineage harvester configuration file to collect the DataStage data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-datastage, the name of your JSON file must be my-datastage.conf.
  3. For each database in DataStage, add the required content to the JSON file.
  4. Save the <source ID> configuration file.

The lineage harvester uses a lineage harvester configuration file to collect the dbt data objects. It then sends the metadata to the Collibra Data Lineage service for processing. By default, the lineage harvester downloads all accounts that are accessible with the API token that you provided in the lineage harvester configuration file. For each account, the lineage harvester downloads all jobs and the resulting dbt models for each job. You can use the <source ID> configuration file to reduce the number of data objects that are downloaded and improve the lineage harvester performance in the following ways:

  • Filter the projects and jobs to be downloaded by specifying the filter property.
  • Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
  • Map a materialization as a view instead of a table by specifying the materializedMapping property.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-dbt, the name of your JSON file must be my-dbt.conf.
  3. For each database in dbt, add the following content to the JSON file:

    Property

    Description

    Required?

    collibraSystemNames

    You can use this section to specify the Collibra System Name for each project.

    No

    projects

    This section contains the project names and the Collibra system names.

    No

    project_id

    Your project ID. You can find the project ID in the dbt URL right after projects. For example, if your dbt URL is https://cloud.getdbt.com/develop/54321/projects/12345, your project_id is 12345.

    No

    collibraSystemName

    The system or server name of the data source. This is also the name of your System asset in Data Catalog. If you specify this property in the <source ID> configuration file, the collibraSystemName property in the lineage harvester configuration file is ignored.

    Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.

    No

    filter

    You can use this section to include projects and jobs to be downloaded. Collibra Data Lineage downloads and processes only the specified jobs and projects.

    No

    jobIds

    The job IDs of the jobs that you want to include.

    Specify an integer. Do not specify a string.

    To get your job ID, in dbt, select Deploy and then Jobs. Select a job; the job ID is at the end of the URL. For example, if your URL is cloud.getdbt.com/deploy/65432/projects/23456/jobs/123456, 123456 is your job ID.

    No

    projectIds

    The account IDs of the accounts that you want to include.

    Specify an integer. Do not specify a string.

    To get your account ID, in dbt, click the gear icon in the upper right, select Account Settings and find your account ID in the URL. For example, if your URL is cloud.getdbt.com/settings/accounts/65432, 65432 is your account ID.

    No

    materializedMapping

    Indicates how materializations in dbt are mapped. If you do not specify this property, Collibra Data Lineage maps materializations to tables by default. You can change the mapping of a materialization to a view.

    In the following example, the ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES materialization is mapped to a view.

        "materializedMapping": {
            "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
        }

    No
  4. Save the <source ID> configuration file.
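
Put together, the properties in step 3 might be combined as in the following sketch. The nesting is an assumption inferred from the property table: projects is shown as a list of project_id and collibraSystemName pairs inside collibraSystemNames, and jobIds and projectIds are shown as lists of integers inside filter. The ID values are taken from the URL examples in the table, my-dbt-system is a hypothetical System asset name, and writing project_id as a number rather than a string is also an assumption.

```json
{
  "collibraSystemNames": {
    "projects": [
      {
        "project_id": 12345,
        "collibraSystemName": "my-dbt-system"
      }
    ]
  },
  "filter": {
    "jobIds": [123456],
    "projectIds": [65432]
  },
  "materializedMapping": {
    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES": "VIEW"
  }
}
```

All three sections are optional, so a real file would contain only the sections that you need.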

The lineage harvester uses a lineage harvester configuration file to collect the Informatica PowerCenter data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-powercenter, the name of your JSON file must be my-powercenter.conf.
  3. For each database, add the required content to the JSON file.

    If certain properties are not specified in the <source ID> configuration file, an analyze error called CONFIGURATION is shown in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about analyze errors, see Analyze errors and possible solutions in Technical lineage Sources tab page.

  4. Save the <source ID> configuration file.

You use the lineage harvester configuration file to access Informatica Intelligent Cloud Services Data Integration data objects. The lineage harvester processes the data objects to create a technical lineage. You also have to prepare a specific <source ID> configuration file that defines the Intelligent Cloud Services system name.

Important You must prepare a <source ID> configuration file regardless of whether the useCollibraSystemName property in your lineage harvester configuration file is set to true or false.

Prerequisites

You have Admin permission on all objects that you want to harvest.

Steps

  1. Create a new JSON configuration file in the lineage harvester config folder.

    If the data source for an Informatica Intelligent Cloud Services connection is large, consider creating more than one JSON file for the data source. Each JSON file must have a unique name, and the contents of the JSON files are the same. This helps you avoid errors that might occur when the lineage harvester ingests metadata from a single large source.

  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in your lineage harvester configuration file is iics-source-1, then the name of your JSON file should be iics-source-1.conf.
  3. Make sure that your JSON file has the file extension .conf.
  4. For each Informatica Intelligent Cloud Services connection, you can add the following content to the JSON file:

    Property

    Description

    Required?

    collibraSystemNames

    This section contains the system information for Informatica Intelligent Cloud Services.

    connections

    This section contains the system connection information. This information is required to reference the system or server of the connection.

    connectionName

    The name of the connection. The name must match the System asset name in Data Catalog for stitching.

    Yes

    collibraSystemName

    The system or server name of the connection.

    Yes

    connectionDefinitions

    This section contains the database, schema and dialect information for each connection in Informatica Intelligent Cloud Services.

    Note You can add connection information for each connection in the connections section.

    connectionName

    The name of the connection. The name must match a connection name in the connections section.

    Yes

    databaseName

    The name of your database. The name must match the Database asset name in Data Catalog for stitching.

    Yes

    schemaName

    The name of your schema. The name must match the Schema asset name in Data Catalog for stitching.

    Yes

    dialect

    The dialect of the connection. Specify this property for Collibra Data Lineage to properly extract and parse queries that are related to this connection.

    You can enter one of the following values:

    • bigquery
    • db2
    • hana
    • hive
    • greenplum
    • mssql
    • mysql
    • netezza
    • oracle
    • postgres
    • redshift
    • snowflake
    • spark
    • teradata

    No

  5. Save the configuration file.

The lineage harvester uses the lineage harvester configuration file to collect the Looker data objects and send them to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

  • Filter on the Looker folders from which you want to ingest metadata.
  • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Looker.
    Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example The value of the Id property in the lineage harvester configuration file is looker-source-1. As a result, the name of your JSON file should be looker-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each database in Looker, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    Connections

    This section contains all Looker connections for which you want to create a technical lineage.

    Yes

    <connection name>

    The name of a connection object in Looker.

    Yes

    dialect

    The dialect of the supported data source in Looker.

    No

    schema

    The name of the default schema of a supported data source in Looker.

    If the lineage harvester fails to find a specific schema, it uses the default schema.

    No

    dbname

    The name of the database of a supported data source in Looker.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Yes

    filters

    Optionally, use this section to specify the Looker folders from which you want to ingest metadata.

    Note You can filter on Looker folders, but not on Looker data sets. That's because Looker data sets are linked directly to the server, instead of a folder, as shown in the Looker metadata overview. Looker data sets are ingested in the default domain, regardless of any filtering.

    Let’s say, for example, you filter on folder B. A Looker Folder asset is created in the specified domain in Collibra, and all of the metadata in folder B is ingested. If folder B has a parent folder A, then a Looker Folder asset is created (in the domain specified for folder B) to preserve the hierarchy, but no metadata from folder A is ingested.

    You can specify more than one Looker folder for ingestion into a single domain in Collibra.

    Warning If you don't want to filter on Looker Folders, you must completely remove this filters section.

    Tip 

    You can use wildcards to capture multiple connection string combinations:

    No

    domainId

    The unique resource ID of the domain (or domains), in Collibra, in which you want to ingest data objects from one or more Looker Folders.

    Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

    description

    Any description, as you see fit.

    folderNames

    The name (or names) of the Looker Folders from which you want to ingest.

    Note You must specify either a folder name, a folder ID, or both.

    folderIds

    The ID (or IDs) of the Looker Folder you want to ingest.

    Note You must specify either a folder ID, a folder name, or both.

  4. Save the <source ID> configuration file.
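
The following sketch shows one way the properties in step 3 might fit together. The nesting is an assumption based on the property table: each connection name is shown as a key inside Connections, and filters is shown as a list of objects. The connection name my_looker_connection, the database, schema and system names, the folder name Sales and the all-zero domain ID are hypothetical placeholders, and snowflake is assumed to be an accepted dialect value.

```json
{
  "Connections": {
    "my_looker_connection": {
      "dialect": "snowflake",
      "schema": "PUBLIC",
      "dbname": "ANALYTICS",
      "collibraSystemName": "my-looker-db-system"
    }
  },
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000000",
      "description": "Folders for the sales team",
      "folderNames": ["Sales"]
    }
  ]
}
```

If you don't want to filter on Looker Folders, leave out the filters section entirely, as the warning in the table states.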

The lineage harvester uses a lineage harvester configuration file to collect the Matillion data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-matillion, the name of your JSON file must be my-matillion.conf.
  3. Add the required content to the JSON file.
  4. Save the <source ID> configuration file.

The lineage harvester uses the configuration file to connect to MicroStrategy. You must also prepare a MicroStrategy <source ID> configuration file to:

  • Specify the default domain, meaning the domain in Collibra in which the corresponding assets of MicroStrategy metadata will be ingested if domain mapping is not configured.
    Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
  • Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
  • Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
    Note Mapping doesn't work for custom SQL.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in the lineage harvester configuration file is mstr-source-1, then the name of your JSON file should be mstr-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. Add the following content to the JSON file:

    Property

    Description

    Mandatory?

    default_domain_id

    The domain in which you want the corresponding assets of MicroStrategy metadata to be ingested.

    Note If you configure filtering, only the MicroStrategy Server asset is ingested into this default domain.

    Yes

    filters

    This section allows you to specify:

    • From which MicroStrategy projects you want to harvest metadata.
    • Into which domains in Collibra you want to ingest the corresponding assets.

    If you don't want to filter on projects, don't include this section in your <source ID> configuration file.

    No

    domainId

    The unique resource ID of the domain (or domains) in Collibra in which you want to ingest the MicroStrategy assets.

    Tip If you use a filters section, you must include the domainId property in that section. If you want to filter on certain projects but ingest all assets into the default domain, the value of the domainId property must match the value of the default_domain_id property.

    No

    projectIds

    The IDs of the MicroStrategy projects from which you want to ingest metadata.

    No

    projectNames

    The project names of the MicroStrategy projects from which you want to ingest metadata.

    No

    datasourceMapping

    This optional section allows you to configure data source mapping. Include this section only if you need to differentiate between multiple data sources that have the same name.

    Note Mapping doesn't work for custom SQL.

    No

    found_datasource

    The name of the data source that was returned by the lineage harvester, as shown in the technical lineage.

    Note The data source name is case-sensitive.

    Yes

    found_project

    The name of the project in which the data source information resides. You can specify an asterisk (*) to search for data source information across all projects.

    Yes

    mapping

    Use this section to map the data source name that was returned by the lineage harvester to the true name of the data source.

    Example You have a Redshift data source named "RD_pearl", but the lineage harvester has returned the name "Redshift_connection". You can configure the datasourceMapping section as follows:
    {
        "datasourceMapping": [
            {
                "found_datasource": "Redshift_connection",
                "found_project": "*",
                "mapping": {
                    "dbname": "RD_pearl",
                    "collibraSystemName": "TV_dev"
                }
            }
        ]
    }

    Yes

    dbname

    The name of the database to which you want to map the found data source.

    Yes

    schema

    The name of the schema in MicroStrategy.

    No

    dialect

    The dialect of the data source in MicroStrategy.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    If you set the useCollibraSystemName property to false in your lineage harvester configuration file, leave this property empty as follows: "collibraSystemName": "".

    Warning The value of this property must exactly match the name of your System asset in Collibra.

    Yes

  4. Save the <source ID> configuration file.
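
For the default domain and project filtering described in step 3, a file might look like the following sketch. It assumes that filters is a list of objects, each with a domainId and the projects to ingest into that domain; this nesting is inferred from the property table. The two domain IDs and the project name Sales Analytics are hypothetical placeholders.

```json
{
  "default_domain_id": "00000000-0000-0000-0000-000000000001",
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000002",
      "projectNames": ["Sales Analytics"]
    }
  ]
}
```

With this configuration, only the MicroStrategy Server asset would go to the default domain, as noted for the default_domain_id property. A datasourceMapping section, as shown in the table, can be added at the same level if you need it.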

The lineage harvester uses a lineage harvester configuration file to collect the Power BI data objects. It then sends the metadata to the Collibra Data Lineage service instance.

The <source ID> configuration file allows you to:

  • Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
    Note Mapping doesn't work for custom SQL.
  • Configure workspace filtering.
    Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
  • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Power BI. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
    Example The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
  3. Make sure that your JSON file has the file extension .conf.
  4. For each database in Power BI, add the following content to the JSON file:

    Property

    Description

    Mandatory?

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

    The database information of supported data sources in Power BI that is typically collected by the lineage harvester. Specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema). You then use the child properties to map the names collected by the lineage harvester to the true names.

    Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

    Important The keys that you specify must be unique.
    Tip 

    You can use wildcards to capture multiple connection string combinations:

    Yes

    dbname

    The name of the database of a supported data source in Power BI.

    No

    schema

    The name of the default schema of a supported data source in Power BI.

    If the lineage harvester fails to find a specific schema, it uses the default schema you specify in this property.

    Important Schema mapping is available for schemas that come from Power Query connections. It is not available, however, if a Power Query connection is created with SQL (or MDX) statements and the schema is specified in those statements.

    No

    dialect

    The dialect of the supported data source in Power BI.

    No

    collibraSystemName

    The system or server name of a database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Warning The value of this property must exactly match the name of your System asset in Collibra, including letter case.

    Important If you are using a <source ID> configuration file for the purpose of providing the true system name of an ODBC database in Power BI, you are not required to:
    • Set the useCollibraSystemName property in the lineage harvester configuration file to true.
    • Specify a Collibra system name in the <source ID> configuration file.
    However, if the useCollibraSystemName property is set to true in the lineage harvester configuration file, then you must specify a Collibra system name in the <source ID> configuration file.

    Yes (unless you are using the <source ID> file to provide the true system names of ODBC databases in Power BI.)

    filters

    This section allows you to specify the Power BI workspaces from which you want to ingest metadata.

    The filters work as "workspace AND workspace AND capacity AND capacity", meaning that if you specify a capacity, all of the workspaces in that capacity are also ingested.

    Warning If you don't want to specify the Power BI workspaces from which to ingest, you must completely remove this filters section.

    Tip 

    You can use wildcards to capture multiple connection string combinations:

    No

    domainId

    The unique resource ID of the domain (or domains), in Collibra Data Intelligence Cloud, in which you want to ingest the Power BI assets.

    Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

    Yes

    description

    Any description, as you see fit.

    Yes

    workspaceNames

    The names of Power BI workspaces from which you want to ingest metadata.

    Important Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, a workspace with the name "Sale and Marketing [automobiles]" should be formatted as follows:
    Sale and Marketing [[]automobiles[]]

    No

    workspaceIds

    The IDs of Power BI workspaces from which you want to ingest metadata.

    Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.

    No

    capacityNames

    The names of capacities on which you want to filter.

    No

    capacityIds

    The IDs of capacities on which you want to filter.

    Warning Any letters in a capacity ID must be in upper case.

    No

    excludeWorkspaceNames

    The names of Power BI workspaces that you want to exclude from the ingestion job.

    This is useful if you want to exclude, for example, dedicated development and testing workspaces.

    Note The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My workspace is the personal workspace for any Power BI customer to work with their own, personal content.

    For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

    No

    excludeWorkspaceIds

    The IDs of Power BI workspaces that you want to exclude from the ingestion job.

    This is useful if you want to exclude, for example, dedicated development and testing workspaces.

    For complete details on the advantages, limitations and configuration considerations of this property, see Filtering Power BI workspaces.

    No
  5. Save the <source ID> configuration file.
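
A minimal sketch of such a file is shown below. It assumes that each found_dbname key maps to an object that holds the child properties, and that filters is a list of objects; both are inferred from the property table. The database, server, schema and system names, the description text and the all-zero domain ID are hypothetical placeholders, and mssql is assumed to be an accepted dialect value. The workspace name reuses the square-bracket escaping example from the workspaceNames row.

```json
{
  "found_dbname=sales_db;found_hostname=sql-prod-01;found_schema=dbo": {
    "dbname": "SalesDatabase",
    "schema": "dbo",
    "dialect": "mssql",
    "collibraSystemName": "SalesSystem"
  },
  "filters": [
    {
      "domainId": "00000000-0000-0000-0000-000000000000",
      "description": "Sales and marketing workspaces",
      "workspaceNames": ["Sale and Marketing [[]automobiles[]]"]
    }
  ]
}
```

If you don't want to restrict ingestion to specific workspaces, leave out the filters section entirely, as the warning in the table states.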

When you create technical lineage for Snowflake by using the SQL-API ingestion method, you can create a <source ID> configuration file to configure the metadata that Collibra Data Lineage collects.

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-snowflake, the name of your JSON file must be my-snowflake.conf.
  3. For each database in Snowflake, add the following content to the JSON file:

    Property

    Description

    Required?

    displaySampleQueries

    Indicates whether to display transformations with a question mark (?) or with actual values from queries in the Source code pane in the technical lineage graph. For example, you can choose to display WHERE amount < 100 or WHERE amount < ?.

    Specify one of the following values:

    • true: Actual values from queries are displayed.
    • false: A question mark (?) is displayed. This is the default value.

    No

    analyzeTemporaryTables

    Indicates whether to parse the CREATE TEMPORARY TABLE statement in the ingested queries. Specify one of the following values:

    • true: Collibra Data Lineage examines the queries and parses the CREATE TEMPORARY TABLE statement when the following conditions are met:
      • The query starts with the CREATE TEMPORARY TABLE statement.
      • Collibra Data Lineage did not encounter the CREATE TEMPORARY TABLE statement before this query.
    • false: Collibra Data Lineage does not examine or parse the CREATE TEMPORARY TABLE statement in the ingested queries. This is the default value.

    No
  4. Save the <source ID> configuration file.
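
As an illustration, a file that turns on both options might look like the following sketch. It assumes that both properties sit at the top level of the file and take JSON boolean values; the table above gives only the property names and the values true and false.

```json
{
  "displaySampleQueries": true,
  "analyzeTemporaryTables": true
}
```

Both properties default to false, so you need to include a property only when you want to change its default.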

The lineage harvester uses the lineage harvester configuration file to collect the SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) data objects and send them to the Collibra Data Lineage service.

The <source ID> configuration file allows you to:

  • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in SSRS and PBRS.
  • Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all information to process the SQL source code correctly.
Example 

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example The value of the Id property in the lineage harvester configuration file is ssrs-source-1. As a result, the name of your JSON file should be ssrs-source-1.conf.

    Important Your JSON file must have the file extension .conf.

  3. For each database in SSRS and PBRS, add the following content to the JSON file:

    Property

    Description

    Required?

    DataSources

    This section contains all connections for which you want to create a technical lineage.

    The DataSources section refers to shared data sources in SSRS and PBRS. For more information about shared data sources, see the Microsoft documentation.

    Yes

    <data source type>

    The path of a connection object in SSRS and PBRS.

    Yes

    dbname
    The name of the database of a supported data source in SSRS and PBRS.

    No

    schema

    The name of the default schema of a supported data source in SSRS and PBRS.

    No

    dialect

    The dialect of the supported data source in SSRS and PBRS.

    No

    collibraSystemName

    The system or server name of the database.

    If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but you either don't create a <source ID> configuration file, or don't specify a value for the collibraSystemName property in your <source ID> configuration file, the system name in the technical lineage is "DEFAULT".

    Yes

    CustomDataSources

    You can use custom data processing extensions to support embedded data sources, that is, data sources whose definition is specified locally in a report or embedded data set.

    The CustomDataSources section refers to embedded data sources in SSRS and PBRS. For more information about embedded data sources, see the Microsoft documentation.

    No

    <path to report>/<custom data source name>

    The full path to the report and the custom data source name.

    You can use wildcards to match multiple folders, reports or data sets. The connection information in this section is used to add missing information or to overwrite parsed information.

    No

    dbname
    The name of the database of a custom data source in SSRS and PBRS.

    No

    schema

    The name of the schema of a custom data source in SSRS and PBRS. If you don't provide the schema name, the default schema is used.

    No

    dialect

    The dialect of the custom data source in SSRS and PBRS.

    No

  4. Save the <source ID> configuration file.
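
The following Python sketch shows one way the sections in the table could fit together. It also models the "DEFAULT" fallback described for the collibraSystemName property. The nesting, with connection paths as keys under DataSources and CustomDataSources and the database properties under each path, is an assumption based on the order of the table rows. All paths, names and the system_name helper are hypothetical. The dialect property is left out because this topic does not list its valid values.

```python
import json

# Hypothetical example values; replace them with your own paths and names.
source_config = {
    "DataSources": {
        # Key: path of a shared connection object (hypothetical path).
        "/Data Sources/SalesDb": {
            "dbname": "sales",
            "schema": "dbo",
            "collibraSystemName": "sales-server",
        },
    },
    "CustomDataSources": {
        # Key: <path to report>/<custom data source name>.
        # The * wildcard syntax is an assumption.
        "/Reports/*/EmbeddedSource": {
            "dbname": "sales",
            "schema": "dbo",
        },
    },
}


def system_name(use_collibra_system_name, config, connection_path):
    """Model the documented fallback rule; this is not the harvester's code.

    Returns "DEFAULT" when useCollibraSystemName is true but there is no
    <source ID> configuration file (config is None) or no collibraSystemName
    value. Returns None when useCollibraSystemName is false, because the
    text does not describe that case.
    """
    if not use_collibra_system_name:
        return None
    entry = (config or {}).get("DataSources", {}).get(connection_path, {})
    return entry.get("collibraSystemName") or "DEFAULT"


# The JSON text that would be saved as <source ID>.conf.
print(json.dumps(source_config, indent=2))
```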

The lineage harvester uses a lineage harvester configuration file to collect the SQL Server Integration Services data objects. It then sends the metadata to the Collibra Data Lineage service instance.

Example 

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Name the JSON file <sourceId>.conf, where <sourceId> is the value of the sourceId property in the lineage harvester configuration file. The file extension must be .conf.
    Example If the value of the sourceId property in the lineage harvester configuration file is my-adf, the name of your JSON file must be my-adf.conf.
  3. For each database, add the required content to the JSON file.
  4. Save the <source ID> configuration file.
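
Before you run the lineage harvester, you might want to check that the file follows the naming rule in step 2 and contains valid JSON. The following Python function is a minimal sketch of such a check. It is a hypothetical helper, not a tool that ships with the lineage harvester, and it assumes that the file content is strict JSON.

```python
import json
from pathlib import Path


def check_source_config(path, source_id):
    """Return a list of problems found with a <source ID> configuration file.

    An empty list means that the file name is <source_id>.conf and that the
    content parses as JSON. Property names and values are not checked.
    """
    path = Path(path)
    problems = []
    expected = f"{source_id}.conf"
    if path.name != expected:
        problems.append(f"file name is {path.name!r}, expected {expected!r}")
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        problems.append("file does not exist")
    except json.JSONDecodeError as err:
        problems.append(f"content is not valid JSON: {err.msg}")
    return problems
```

For example, check_source_config("config/my-adf.conf", "my-adf") returns an empty list if the file exists and contains valid JSON.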

The lineage harvester uses the configuration file to connect to Tableau. You are not required to create a <source ID> configuration file, but you need one if you want to:

  • Define your Tableau operating model.
  • Provide additional information about databases and files in Tableau. For example, you can define the system name of files and connectors in Tableau.
  • Use the hostnameMapping property to map the database, schema or system names that were returned by the Tableau APIs to the actual names of the assets in Data Catalog. For complete information, go to Tableau hostname, schema, and system name mapping.
    Note Mapping doesn't work for custom SQL.
  • Define in which domains in Collibra you want to ingest assets from your Tableau sites and projects. See the domainMapping and filters properties.

Tip "<source ID>" refers to the value of the Id property in the lineage harvester configuration file.

Example 

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example If the value of the Id property in the lineage harvester configuration file is tableau-source-1, then the name of your JSON file should be tableau-source-1.conf.
    Important Your JSON file must have the file extension .conf.
  3. For each database in Tableau, add the following content to the JSON file:
    Tip You can use wildcards to capture multiple string combinations for any of these properties.

    Property

    Description
    collibraSystemNames

    This section contains the system information for different Tableau data sources. Depending on the kind of data source or connection, you have to specify how to connect to this data source.

    Tip For more information, see the Tableau documentation. We also recommend checking the list of supported connectors in Tableau.

    hostnameMapping

    This section allows you to map Tableau technical database, server and schema names to the respective real names, to preserve stitching.

    Warning 
    • hostnameMapping replaces the following deprecated properties, which have been removed from this topic:
      • The databaseMapping property.
      • The databases sub-section of the collibraSystemNames section.

    hostnameMapping must not be used in combination with either of these properties.

    If you use the hostnameMapping section, you can still use the collibraSystemName property in conjunction with the files, connectors or cloudfiles sub-sections.

    No

    found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>

    The database information of supported data sources in Tableau that is typically collected by the lineage harvester. It allows you to specify the name of the database (found_dbname), on which server a database is running (found_hostname), and optionally, the name of the schema (found_schema).

    No

    dbname

    The name of the database of a supported data source in Tableau.

    No

    schema

    The name of the default schema of a supported data source in Tableau.

    If the lineage harvester fails to find a specific schema, it uses the default schema.

    No

    dialect

    The dialect of the supported data source in Tableau.

    You don't have to specify a dialect; it is detected automatically. If, however, you are using a dialect that is not supported, you can use this property to specify a supported dialect that closely resembles it. That way, most of your queries will be detected and processed.

    No

    collibraSystemName

    The system or server name of the database.

    Warning The value of this property must exactly match the name of your System asset in Collibra.

    No

    files

    This section contains connection information to one or more files in Tableau.

    Tip If you do not have files in Tableau, you can remove this section.

    filePath
    The full path to the file. For example, the path to a JSON file.
    collibraSystemName
    The system name of the file.
    connectors

    This section contains connection information to one or more connectors in Tableau.

    Tip 
    • If you do not have connectors in Tableau, you can remove this section.
    • The values that you specify for this property are not case-sensitive.
    connectorUrl
    The URL of the connector. For example, the URL to Google Analytics.
    collibraSystemName
    The system name of the connector.
    cloudFiles

    This section contains connection information to one or more cloud files in Tableau's input data.

    Tip If you do not have cloud files in Tableau, you can remove this section.

    name
    The name of the file. For example, the name of a Zendesk file.
    collibraSystemName
    The system name of the cloud file.

    filters

    This section defines:

    • From which Tableau sites and projects you want to harvest metadata.
    • Into which domains in Collibra you want to ingest the corresponding assets.

    Filtering is transitive, which means that all resources in a specified project, such as Tableau workbooks and all sub-projects, are ingested.

    Tableau assets that are not mapped to the specified domains, for example the Tableau Server assets and the parent projects (if you specify their sub-projects), are ingested in the default domain.

    Important 
    • Filtering does not affect the amount of raw metadata that is harvested from Tableau and sent to the Collibra Data Lineage service instance. Rather, it determines which metadata is ingested as assets in Data Catalog.
    • The domainMapping and filters sections are mutually exclusive. Do not include both domainMapping and filters sections in your JSON file.
    Tip 
    • If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use the domainMapping section.
    • If you want to ingest all of the projects in a Tableau site into the default domain, use only the domainId property in the lineage harvester configuration file. The domainId property represents the default domain.
    • If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering.
    • If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering.
    • You can use site filtering and project filtering together:
      • If you apply site filtering and project filtering to the same site, the "filtering" is actually domain mapping, because nothing is filtered out. The contents of the filtered projects are ingested in their specified domains, and the rest of the contents of the site are ingested in the domain specified for the site.
      • If you are site filtering on a specific site and project filtering a different site, then site filtering is again a form of domain mapping, and the filtered projects are ingested in their specified domains.
      • If your lineage harvester configuration file includes sites that are not mentioned in the filters section of your <source ID> configuration file, those sites are ingested in the default domain.
    sites

    The Tableau sites to be ingested and the domain in which you want to ingest metadata from the Tableau sites.

    Tip If you have only one Tableau site, do not include a sites section in your <source ID> configuration file. Instead, use a projects section to filter on Tableau projects. Include a sites section only if all of the following are true:
    • You have more than one Tableau site.
    • You want to ingest all of the metadata from only one Tableau site into a single domain in Collibra.
    • The domain into which you want to ingest is not the default domain, meaning the domain specified in the domainId property in your lineage harvester configuration file.
    site_name: domain_id
    site_name
    The name of the site to be ingested. The site name is case-sensitive.
    domain_id
    The unique reference ID of the domain in Collibra in which you want to ingest metadata. The domain ID is case-sensitive.
    To ingest all metadata from a Tableau site in the specified domain, specify the site name and a separate domain ID for each site that you list in the siteIds property in the lineage harvester configuration file for Tableau. If the site_name or domain_id property is not specified for a site, the metadata from the site is ingested in the default domain.
    projects

    The Tableau projects to be ingested and the domain in which you want to ingest metadata from the Tableau projects or sub-projects.

    Tip Project filtering is not relevant for those who have an Explorer role in Tableau, because Explorers need to configure permissions for each data object in Tableau that they want to ingest. As the Administrator role has access to all data objects, project filtering allows Administrators to specify which projects to ingest.

    site_name > project_name : domain_id

    The site_name should be the Tableau site name. The project_name should be the Tableau project name.

    The domain_id should be the unique reference ID of the domain in Collibra in which you want to ingest metadata.

    When you specify the site and project names, the following rules apply:

    • Add spaces before and after >. The spaces are separators between the site and project.
    • Specify the full exact site and project names.
    • The values are case-sensitive.

    When you specify a Tableau project, all assets in the project are ingested in the specified domain. If you want to ingest assets from different Tableau projects in one domain, you can specify the same domain_id value for different projects.

    Example

    "Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"

    site_name > project_name > sub-project_name : domain_id

    The site_name should be the Tableau site name. The project_name should be the Tableau project name. Optionally, use sub-project_name to specify the Tableau sub-project name.

    The domain_id property should be the unique reference ID of the domain in Collibra in which you want to ingest metadata.

    When you specify the site, project and sub-project names, the following rules apply:

    • Add spaces before and after each >. The spaces are separators between the site, project and sub-project.
    • Specify the full exact site, project and sub-project names.
    • The values are case-sensitive.

    Example

    "Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"

    domainMapping

    This section defines in which domains in Collibra you want to ingest assets from your Tableau sites and Tableau projects.

    Domain mapping is transitive, meaning that all resources, such as Tableau workbooks and data attributes in a parent Tableau site, project or sub-project, are ingested in the same domain as the parent.

    Important The domainMapping and filters sections are mutually exclusive. Do not include both domainMapping and filters sections in your JSON file.

    Tip 
    • If you want to ingest all of the projects in a Tableau site into multiple domains in Collibra, use this domainMapping section.
    • If you want to ingest all of the projects in a Tableau site into the default domain, use only the domainId property in the lineage harvester configuration file. The domainId property represents the default domain.
      Note Tableau assets that are not mapped to specific domains via this domainMapping section, for example Tableau Server assets, are ingested in that default domain.
    • If you want to ingest all of the projects in a Tableau site into a single domain in Collibra, use site filtering.
    • If you want to ingest metadata from only some of the projects in a Tableau site, use project filtering.
    site name

    The Tableau site name, followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau site.

    Important In the configuration file, use the actual site name, along with the domain reference ID, for example: "Collibra_tab_partner_site": "afc8cfb0-91f1-4075-a3e5-7ce6d1f9bcc9"
    site name > project name

    The Tableau project name, preceded by the name of the Tableau site to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau project.

    Important In the configuration file, use the actual site and project names, along with the domain reference ID, for example: "Collibra_tab_partner_site > JB_Test_2812": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"
    site name > project name > sub-project name

    The Tableau sub-project name, preceded by the name of the Tableau site and project to which it belongs, and followed by the unique reference ID of the domain in Collibra in which you want to ingest resources from the Tableau sub-project.

    Important In the configuration file, use the actual site, project and sub-project names, along with the domain reference ID, for example: "Collibra_tab_partner_site > JB_Test_2812 > ProjectJJ2": "d224a1a5-43b4-43b2-8df0-ddf8f2726b82"
  4. Save the <source ID> configuration file.
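
The key formats and the mutual-exclusivity rule in the table can be sketched in Python as follows. The helper functions are hypothetical and are not part of the lineage harvester. The nesting, with dbname, schema and collibraSystemName under a found_dbname key in hostnameMapping and with project keys under filters and projects, is an assumption based on the order of the table rows. The database, host and system names are placeholders. The site name, project name and domain ID are taken from the example in the table.

```python
def hostname_mapping_key(dbname, hostname, schema=None):
    """Build 'found_dbname=...;found_hostname=...[;found_schema=...]'."""
    key = f"found_dbname={dbname};found_hostname={hostname}"
    if schema is not None:  # found_schema is optional
        key += f";found_schema={schema}"
    return key


def project_key(site_name, project_name, sub_project_name=None):
    """Build 'site > project[ > sub-project]' with a space on each side of >."""
    parts = [site_name, project_name]
    if sub_project_name is not None:
        parts.append(sub_project_name)
    return " > ".join(parts)


def build_tableau_source_config(hostname_mapping=None, filters=None,
                                domain_mapping=None):
    """Assemble the JSON content as a dict.

    Raises ValueError if both filters and domainMapping are given, because
    the two sections are mutually exclusive.
    """
    if filters and domain_mapping:
        raise ValueError("Use either filters or domainMapping, not both")
    config = {}
    if hostname_mapping:
        config["hostnameMapping"] = hostname_mapping
    if filters:
        config["filters"] = filters
    if domain_mapping:
        config["domainMapping"] = domain_mapping
    return config


# Usage with placeholder database, host and system names.
config = build_tableau_source_config(
    hostname_mapping={
        hostname_mapping_key("sales", "db-host-1", "public"): {
            "dbname": "sales_prod",
            "schema": "public",
            "collibraSystemName": "sales-system",
        }
    },
    filters={
        "projects": {
            project_key("Collibra_tab_partner_site", "JB_Test_2812"):
                "d224a1a5-43b4-43b2-8df0-ddf8f2726b82",
        }
    },
)
```

Building the keys with helper functions keeps the separators consistent, which matters because site and project names are case-sensitive and must match exactly.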