Add an Edge capability to an Edge site

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

After you have created and installed an Edge site, you can add an Edge capability to perform specific tasks on a data source. For example, you can register a data source by using a JDBC connection that belongs to an Edge capability.

Prerequisites

Steps

Tip 

The information in this section varies depending on the capability template that you select.

  1. Open an Edge site.
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Collibra settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of the Edge sites.
    3. In the table, click the name of the Edge site whose status is Healthy.
      The Edge site page opens.
  2. In the Capabilities section, click Add capability.
    The Add capability page is shown.
  3. Select the capability template that you want to use. The available templates include Catalog Data Classification, Catalog JDBC ingestion, JDBC Profiling, Catalog JDBC Sampling, S3 synchronization, GCS synchronization, Databricks Unity Catalog synchronization, Technical Lineage for dbt, Collibra Protect for AWS Lake Formation, Collibra Protect for BigQuery, Collibra Protect for Databricks, Collibra Protect for Snowflake, and Google Dataplex Catalog synchronization.
    Note When you select a capability template, you may need to add required custom properties. For example, if you select the S3 synchronization capability template, you have to add credentials to configure the S3 connection.
  4. Enter the required information.
    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    S3 synchronization

    Yes

    S3 service account

    This section contains information about how to connect to Amazon S3.
    AWS Connection
    The AWS connection to be used.

    Yes

    IAM role
    The IAM role to be used by the AWS Glue crawlers.

    Yes

    Delete Glue database left after previous synchronization of the file system

    Select the checkbox if you want the capability to delete the Glue databases created by previous runs of the capability, before the capability starts the synchronization.
    If you deselect this checkbox, the Glue databases created by previous runs are not removed. This can be useful for troubleshooting.

    By default, this checkbox is selected.

    No

    Save input metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting.
    Select this option only on request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the S3 synchronization.

    By default, this checkbox is not selected.

    No

    Finalization Strategy

    Define what you want to do if an asset has been deleted from the S3 data source after an initial synchronization.
    The possible values are:

    • Change Status (default): If an asset has been deleted from the S3 data source after an initial synchronization, we update the status of the asset in Collibra to "Missing from source".
    • Remove Resources: If an asset has been deleted from the S3 data source after an initial synchronization, we remove the asset from Collibra.
    • Ignore: If an asset has been deleted from the S3 data source after an initial synchronization, we don't change anything for the asset in Collibra.

    Yes
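The three Finalization Strategy values can be pictured with a small Python sketch. This is only an illustration of the behavior described above, not how the capability is implemented: the function name `apply_finalization`, the dictionary of asset names and statuses, and the "Implemented" status in the usage note are hypothetical.

```python
MISSING_STATUS = "Missing from source"
STRATEGIES = ("Change Status", "Remove Resources", "Ignore")


def apply_finalization(collibra_assets, source_names, strategy="Change Status"):
    """Return the assets kept in Collibra after a synchronization.

    collibra_assets: dict mapping asset name -> current status (hypothetical model).
    source_names: set of asset names still present in the S3 data source.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown finalization strategy: {strategy}")

    result = {}
    for name, status in collibra_assets.items():
        if name in source_names or strategy == "Ignore":
            # Asset still exists in the source, or deletions are ignored.
            result[name] = status
        elif strategy == "Change Status":
            # Asset was deleted from the source: keep it, but flag it.
            result[name] = MISSING_STATUS
        # "Remove Resources": the deleted asset is simply not carried over.
    return result
```

For example, with assets `{"a.csv": "Implemented", "b.csv": "Implemented"}` and only `a.csv` left in the source, the default strategy keeps both assets and sets the status of `b.csv` to "Missing from source".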

    Logging parameter

    You can use this field to customize the debug logging.

    Important Only complete this field on request of or together with Collibra Support.

    No

    Custom parameter

    Use this field to define that you want to ingest File Group assets as File assets.

    • Name: file-group-as-file
    • Type: Text
    • Encryption: Not encrypted (Plain text)
    • Value: true

    Depending on the UI version, the same settings can instead appear as:

    • Type: Text
    • Value Type: Plaintext
    • Name: file-group-as-file
    • Value: true

    No

    Glue database configuration

    Glue database configuration

    Text in JSON format to define the Glue database names, regions, and domain IDs that you want to integrate.

    Tip  Use this parameter if the current S3 synchronization crawler configuration doesn’t meet your needs. With this parameter, you can integrate an AWS Glue database for which you defined crawlers in AWS Glue itself. This allows you to use all crawler options from the AWS Glue Console. If you use this parameter, you don't need to create crawlers in Collibra.

    Important  If you use this parameter, any crawlers that you create in Collibra are not taken into account during the S3 synchronization. However, you still need to create a dummy crawler in Collibra to start the synchronization. A dummy crawler is a crawler with an invalid include path, such as s3://dummy.
    In a future release, we'll remove the need for a dummy crawler.

    • The text must be in JSON format and can contain a block per database that you want to integrate.
      You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use.
    • In a block, you can specify the Glue database name, region, and domain ID that must be ingested. The format is:
      • "glueDbName": "the name of the AWS Glue database"
      • "glueDbRegion": "the region of the AWS Glue database"
      • "dgcDomainId": "the domain ID in Collibra where assets of the AWS Glue database must be added"
        If you don't add the domain ID, the assets are added in the same domain as the S3 File System asset.

    Example 
    [
    	{
    		"glueDbName": "integrations-auto-1",
    		"glueDbRegion": "eu-west-1",
    		"dgcDomainId": "a3fe0607-65af-43d6-bc2c-7c3adae6e162"
    	},
    	{
    		"glueDbName": "integrations-auto-2",
    		"glueDbRegion": "eu-west-1"
    	}
    ]

    In this example:

    • Assets from the AWS Glue database "integrations-auto-1" will be ingested into the domain with ID "a3fe0607-65af-43d6-bc2c-7c3adae6e162".
    • Assets from the AWS Glue database "integrations-auto-2" will be ingested into the same domain as the S3 File System asset.


    No
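Before pasting text into the Glue database configuration field, you may want to check it locally instead of using an online JSON validator. The following Python sketch is one way to do that. It assumes that glueDbName and glueDbRegion are needed in every block and that dgcDomainId, when present, is a UUID, as in the example above. The function name `check_glue_db_config` is hypothetical and is not part of Collibra.

```python
import json
import uuid

# Assumption: name and region are needed in every block; the domain ID is optional.
REQUIRED_KEYS = {"glueDbName", "glueDbRegion"}
OPTIONAL_KEYS = {"dgcDomainId"}


def check_glue_db_config(text):
    """Return a list of problems found in a Glue database configuration string."""
    try:
        blocks = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(blocks, list):
        return ["top level must be a JSON array of database blocks"]

    problems = []
    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            problems.append(f"block {i}: must be a JSON object")
            continue
        keys = set(block)
        for key in sorted(REQUIRED_KEYS - keys):
            problems.append(f"block {i}: missing {key}")
        for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
            problems.append(f"block {i}: unexpected key {key}")
        domain_id = block.get("dgcDomainId")
        if domain_id is not None:
            try:
                uuid.UUID(domain_id)
            except (ValueError, AttributeError, TypeError):
                problems.append(f"block {i}: dgcDomainId is not a UUID")
    return problems
```

An empty list means that the text has the documented shape. The sketch cannot tell whether the Glue database, region, or domain actually exists.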

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you only send Edge infrastructure log files to Collibra Data Intelligence Platform when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    An option to determine the verbosity of the log files. The default value is No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Catalog Data Classification

    Yes

    Connection

    This section contains information to connect to the data source.

    JDBC connection

    The connection to the data source.

    Yes

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you only send Edge infrastructure log files to Collibra Data Intelligence Platform when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Databricks Unity Catalog synchronization

    Yes

    Databricks Connection

     
    Databricks Connection
    The Databricks connection to be used.

    Yes

    Configuration

    This section contains information on how to connect to Databricks Unity Catalog. 
    Save input metadata
    If you select this option, the metadata extracted from the data source is saved in a file that can be used for troubleshooting. Select this option only on request of Collibra Support.

    No

    Exclude Schemas (will be removed soon, use domain mapping instead)

    Comma-separated list of the schemas that you don't want to integrate.

    No

    (deprecated) Filters and Domain Mapping

    Important This field is deprecated in the latest UI. You can now define the mappings in the integration configuration.
    If you have existing mappings here, they will continue to work. However, we advise you to move them to the integration configuration.

    Text in JSON format to include or exclude databases and schemas, and to configure domain mappings.

    • The text must be in JSON format and can contain an include and an exclude block. You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use.
    • In the include block, you can specify the domain in which specific catalogs or schemas must be ingested. The format is: "Catalog/Database > schema": "domain ID". For example, "HR > address-schema": "30000000-0000-0000-0000-000000000000".
    • In the exclude block, you can specify the catalogs or schemas that you don't want to ingest. For example, "* > test".
    • The exclude block has priority over the include block.
    • If the include block is not present, we ingest all assets into the same domain as the System asset.
    • If there is no explicit domain mapping for a schema, we use the domain specified for the database.
    • You can use the keyword default as a domain ID. In that case, the catalog or schema will be ingested in the same domain as the System asset.
    • A match with a database has priority over a match with a schema.
    • The integration fails before the synchronization starts if one or more domain IDs specified in the include block don't exist.
    • The integration fails before the synchronization starts if a domain ID is left empty in the include block.
    • You can use the ? and * wildcards in the catalog and schema names. If a catalog or schema matches multiple lines, the most detailed match is taken into account.

    No
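The wildcard and priority rules for this field can be sketched in Python. This is an illustration under several assumptions, not the integration's own logic: it assumes that the include block is a JSON object that maps patterns to domain IDs, that the exclude block is a list of patterns, that a pattern is compared with the full "catalog > schema" string, and that domain IDs are UUIDs or the keyword default. All function names are hypothetical.

```python
import re
import uuid


def pattern_to_regex(pattern):
    """Translate a pattern with ? and * wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # * matches any run of characters
        elif ch == "?":
            parts.append(".")    # ? matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_excluded(catalog, schema, exclude_patterns):
    """True if 'catalog > schema' matches any exclude pattern.

    An excluded schema is skipped even if an include pattern also matches,
    because the exclude block has priority.
    """
    full_name = f"{catalog} > {schema}"
    return any(pattern_to_regex(p).fullmatch(full_name) for p in exclude_patterns)


def check_include_block(include):
    """Return problems that can be detected offline in an include block."""
    problems = []
    for pattern, domain_id in include.items():
        if not domain_id:
            # An empty domain ID makes the integration fail before it starts.
            problems.append(f"{pattern}: domain ID is empty")
        elif domain_id != "default":
            try:
                uuid.UUID(domain_id)
            except ValueError:
                problems.append(f"{pattern}: domain ID is not a UUID")
    return problems
```

With the exclude pattern "* > test", `is_excluded("HR", "test", ["* > test"])` is true, while the schema "address-schema" in the same catalog is kept. The sketch cannot verify that a domain ID exists in Collibra.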

    (deprecated) Extensible Properties Mapping

    Via the Extensible Properties Mapping field, Databricks Unity Catalog allows you to add additional properties to Catalog, Schema, and Table objects.

    Important 
    • This field is deprecated in the latest UI. You can now define the mappings in the integration configuration. If you have existing mappings here, they will continue to work. However, we advise you to move them to the integration configuration.
    • If you use this feature, make sure to set up all required characteristic assignments for the asset types.

    Three possible JSON formats are available.

    • Version 0.1: This version allows you to ingest custom properties only. You can ingest the values from the Properties field from Catalog, Schema, and Table objects into specific attributes in Collibra assets. You do this by adding the mapping between the Properties fields for the objects in Databricks Unity Catalog and the Collibra attribute IDs to ingest the data in, using a JSON string.
      • The text must be in JSON format and can contain a Catalogs, Schemas, and Tables block. The Catalogs block refers to Database assets, the Schemas block to Schema assets, and the Tables block to Table assets.
      • In each block, you specify the property name and the attribute ID to which you want to map the value in the property. The format is: "[property name]": "[attribute resource ID]". For example, "Description from source system": "00000000-0000-0000-0001-000500000074".
      Example 
      {
        "catalogs": {
          "color": "00000000-0000-0000-0000-000000001234",
          "Description from source system": "00000000-0000-0000-0001-000500000074"
        },
        "schemas": {
          "File Location": "00000000-0000-0000-0001-000500000004"
        },
        "tables": {
          "delta.lastCommitTimestamp": "00000000-0000-0000-0000-000000003114"
        }
      }

      In this example:

      • In the Database assets that we create, we'll add the color value in attribute 00000000-0000-0000-0000-000000001234, and the Description from source system value in attribute 00000000-0000-0000-0001-000500000074.
      • In the Schema assets that we create, we'll add the File Location value in attribute 00000000-0000-0000-0001-000500000004.
      • In the Table assets that we create, we'll add the delta.lastCommitTimestamp value in attribute 00000000-0000-0000-0000-000000003114.
    • Version 0.2: This version allows you to ingest both default system properties and custom properties. You can ingest most values from the Details page from Catalog, Schema, and Table objects into specific attributes in Collibra assets. You do this by adding the mapping between the fields for the objects in Databricks Unity Catalog and the Collibra attribute IDs to ingest the data in, using a JSON string.
      • The text must be in JSON format.
      • A Version block referencing 0.2 must be added.
      • A Catalogs, Schemas, and Tables block can be added. The Catalogs block refers to Database assets, the Schemas block to Schema assets, and the Tables block to Table assets.
      • Inside a Catalogs, Schemas, or Tables block, you can add a systemAttributes and a customParameters block. systemAttributes refers to the default system properties. customParameters refers to the custom properties.
      • In each block, you specify the property name and the attribute ID to which you want to map the value in the property. The format is: "[property name]": "[attribute resource ID]". For example, "Description from source system": "00000000-0000-0000-0001-000500000074".
        The following system properties are supported:
        • Catalogs: "browse_only", "catalog_type", "connection_name", "created_at", "created_by", "isolation_mode", "metastore_id", "provider_name", "provisioning_info", "securable_kind", "securable_type", "share_name", "storage_location", "storage_root", "updated_at", and "updated_by".
        • Schemas: "catalog_type", "created_at", "created_by", "metastore_id", "securable_type", "securable_kind", "storage_location", "storage_root", "updated_at", and "updated_by".
        • Tables: "access_point", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".
          Table mappings apply to both tables and views.
      Example 
      {
        "version": 0.2,
        "catalogs": {
          "systemAttributes": {
            "metastore_id": "00000000-0000-0000-0000-000000004224"
          },
          "customParameters": {
            "color": "00000000-0000-0000-0000-000000001234",
            "Description from source system": "00000000-0000-0000-0001-000500000074"
          }
        },
        "schemas": {
          "customParameters": {
            "File Location": "00000000-0000-0000-0001-000500000004"
          }
        },
        "tables": {
          "systemAttributes": {
            "metastore_id": "00000000-0000-0000-0000-000000004224"
          },
          "customParameters": {
            "delta.lastCommitTimestamp": "00000000-0000-0000-0000-000000003114"
          }
        }
      }

      In this example:

      • In the Database assets that we create, we'll add the metastore_id value in attribute 00000000-0000-0000-0000-000000004224, the color value in attribute 00000000-0000-0000-0000-000000001234, and the Description from source system value in attribute 00000000-0000-0000-0001-000500000074.
      • In the Schema assets that we create, we'll add the File Location value in attribute 00000000-0000-0000-0001-000500000004.
      • In the Table and View assets that we create, we'll add the metastore_id value in attribute 00000000-0000-0000-0000-000000004224 and the delta.lastCommitTimestamp value in attribute 00000000-0000-0000-0000-000000003114.
    • Version 0.3: This version allows you to ingest both default system properties and custom properties, and to define separate mappings for tables and views. You can ingest most values from the Details page from Catalog, Schema, Table, and View objects into specific attributes in Collibra assets. You do this by adding the mapping between the fields for the objects in Databricks Unity Catalog and the Collibra attribute IDs to ingest the data in, using a JSON string.
      • The text must be in JSON format.
      • A Version block referencing 0.3 must be added.
      • A Catalogs, Schemas, Tables, and Views block can be added. The Catalogs block refers to Database assets, the Schemas block to Schema assets, the Tables block to Table assets, and the Views block to Database View assets.
      • Inside a Catalogs, Schemas, Tables, or Views block, you can add a systemAttributes and a customParameters block. systemAttributes refers to the default system properties. customParameters refers to the custom properties.
      • In each block, you specify the property name and the attribute ID to which you want to map the value in the property. The format is: "[property name]": "[attribute resource ID]". For example, "Description from source system": "00000000-0000-0000-0001-000500000074".
        The following system properties are supported:
        • Catalogs: "browse_only", "catalog_type", "connection_name", "created_at", "created_by", "isolation_mode", "metastore_id", "provider_name", "provisioning_info", "securable_kind", "securable_type", "share_name", "storage_location", "storage_root", "updated_at", and "updated_by".
        • Schemas: "catalog_type", "created_at", "created_by", "metastore_id", "securable_type", "securable_kind", "storage_location", "storage_root", "updated_at", and "updated_by".
        • Tables: "access_point", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".
        • Views: "access_point", "created_at", "created_by", "data_access_configuration_id", "data_source_format", "deleted_at", "metastore_id", "securable_type", "securable_kind", "sql_path", "storage_credential_name", "storage_location", "table_type", "updated_at", "updated_by", and "view_definition".
      Example 
      {
        "version": 0.3,
        "catalogs": {
          "systemAttributes": {
            "metastore_id": "00000000-0000-0000-0000-000000004224"
          },
          "customParameters": {
            "color": "00000000-0000-0000-0000-000000001234",
            "Description from source system": "00000000-0000-0000-0001-000500000074"
          }
        },
        "schemas": {
          "customParameters": {
            "File Location": "00000000-0000-0000-0001-000500000004"
          }
        },
        "tables": {
          "systemAttributes": {
            "metastore_id": "00000000-0000-0000-0000-000000004224"
          },
          "customParameters": {
            "delta.lastCommitTimestamp": "00000000-0000-0000-0000-000000003114"
          }
        },
        "views": {
          "systemAttributes": {
            "metastore_id": "00000000-0000-0000-0000-000000004224"
          },
          "customParameters": {
            "view.sqlConfig.spark.sql.session.timeZone": "018cedbf-37fc-7da3-9ea8-da2af754222e"
          }
        }
      }

      In this example:

      • In the Database assets that we create, we'll add the metastore_id value in attribute 00000000-0000-0000-0000-000000004224, the color value in attribute 00000000-0000-0000-0000-000000001234, and the Description from source system value in attribute 00000000-0000-0000-0001-000500000074.
      • In the Schema assets that we create, we'll add the File Location value in attribute 00000000-0000-0000-0001-000500000004.
      • In the Table assets that we create, we'll add the metastore_id value in attribute 00000000-0000-0000-0000-000000004224 and the delta.lastCommitTimestamp value in attribute 00000000-0000-0000-0000-000000003114.
      • In the Database View assets that we create, we'll add the metastore_id value in attribute 00000000-0000-0000-0000-000000004224 and the view.sqlConfig.spark.sql.session.timeZone value in attribute 018cedbf-37fc-7da3-9ea8-da2af754222e.

    No
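A mapping in the version 0.2 or 0.3 format can be checked locally before you paste it into the field. The following Python sketch covers only the structural rules listed above: the version value, the allowed Catalogs, Schemas, Tables, and Views blocks, and the systemAttributes and customParameters blocks inside them. It assumes that attribute IDs are UUIDs, as in the examples, and it does not handle the version 0.1 format. The function name `check_properties_mapping` is hypothetical.

```python
import json
import uuid

# Blocks allowed per version, as described in the text above.
ENTITY_BLOCKS = {
    0.2: {"catalogs", "schemas", "tables"},
    0.3: {"catalogs", "schemas", "tables", "views"},
}
INNER_BLOCKS = {"systemAttributes", "customParameters"}


def check_properties_mapping(text):
    """Return a list of structural problems in a 0.2 or 0.3 mapping string."""
    try:
        mapping = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(mapping, dict):
        return ["top level must be a JSON object"]

    version = mapping.get("version")
    if not isinstance(version, (int, float)) or version not in ENTITY_BLOCKS:
        return [f"unsupported version: {version!r}"]

    problems = []
    for name, block in mapping.items():
        if name == "version":
            continue
        if name not in ENTITY_BLOCKS[version]:
            problems.append(f"{name}: not allowed in version {version}")
            continue
        if not isinstance(block, dict):
            problems.append(f"{name}: must be a JSON object")
            continue
        for inner_name, properties in block.items():
            if inner_name not in INNER_BLOCKS or not isinstance(properties, dict):
                problems.append(f"{name}.{inner_name}: unexpected block")
                continue
            for prop, attribute_id in properties.items():
                try:
                    uuid.UUID(attribute_id)
                except (ValueError, AttributeError, TypeError):
                    problems.append(
                        f"{name}.{inner_name}.{prop}: attribute ID is not a UUID"
                    )
    return problems
```

An empty list means that the structure is consistent with the rules above. The sketch does not check that the property names are supported or that the attribute IDs exist in Collibra.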

    Compute Resource HTTP Path

    The HTTP path of the compute resource in Databricks Unity Catalog that we use to extract the source tags.

    You can find the HTTP path in the connection details of your cluster. For details, go to Get connection details for a cluster in the Databricks documentation.

    No

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    ADLS synchronization

    Yes

    ADLS service account

    This section contains the information on how to connect to Azure Data Lake Storage.
    Azure Connection
    The ADLS connection to be used.

    Yes

    Synchronization Source

    Choose which Microsoft data source you want to integrate from.
    The possible values are:

    • (default): If you select this option, the integration adds assets up to File level. Tables and columns are not integrated.
    • : If you select this option, a File asset can contain Table and Column assets.

    For more information on the difference, go to Azure Data Lake Storage asset types and operating model.

    Yes

    Microsoft Purview Account Name

    Only complete this field if you selected in Synchronization Source.

    The name of your Microsoft Purview account.
    If you enter a Purview account name, the integration uses Microsoft Purview for the integration.

    No

    Save Input Metadata

    If you select this option, the metadata extracted from the data source is saved in a file that can be used for troubleshooting. Select this option only on request of Collibra Support.

    No

    Max Schema Level

    For columns that have a structured technical data type, Array or Struct, you can register the structure of the data. This is supported for AVRO, CSV, JSON, ORC, PARQUET, PSV, SSV, TSV, TXT, and XML.

    In this field, enter the maximum level of the structure you want to see. For example, 3.

    Note If you include a high number of levels, this can have an impact on the integration performance.

    No

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you only send Edge infrastructure log files to Collibra Data Intelligence Platform when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Azure Connection
    The Azure connection to be used.

    Yes

    Subscription ID
    The ID of your Azure subscription.

    Yes

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from Azure ML.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you only send Edge infrastructure log files to Collibra Data Intelligence Platform when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from Azure ML.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following capability template to ingest Collibra Data Quality & Observability user-defined rules, metrics, and dimensions into Collibra Data Catalog:

    DQ Connector

    Yes

    DQ

    This section contains information about the Collibra Data Quality & Observability connection.
    Base URL
    Your Collibra Data Quality & Observability URL

    Yes

    Username
    The Collibra Data Quality & Observability username for this connection.

    Yes

    Password
    The Collibra Data Quality & Observability password for this connection.

    Yes

    Encryption options

    Select the type of encryption to use.

    Default: To be encrypted by Edge management server.

    Issuer of the JWT
    If you have selected Encrypted with public key, enter your JWT issuer.

    No

    Collibra metadata model
    This section contains information about where to ingest Collibra Data Quality & Observability assets.
    DQ Rules domain id
    The UUID of the Rulebook Domain for the ingested Collibra Data Quality & Observability rules.

    Yes

    DQ Metrics domain id
    The UUID of the Rulebook Domain for the ingested Collibra Data Quality & Observability metrics.

    Yes

    DQ Dimensions domain id
    The UUID of the Governance Asset Domain for the ingested Collibra Data Quality & Observability dimensions.

    Yes

    Default DQ Dimension name

    The default Data Quality Dimension, for example, Accuracy, Completeness, or Consistency.

    Default: Completeness.

    Yes

    DQ Metric classified by DQ Dimension relation type id
    The UUID of the Data Quality Metric classified by / classifies Data Quality Dimension relation. If left unspecified, this relation will not be added.

    No

    Assets are imported in batches of this size

    The batch size of the ingestion.

    Default: 5000.

    Yes

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select your Edge capability template.

    Note When you select a capability template, you may need to add required custom properties. For example, if you select the S3 synchronization capability template, you have to add credentials to configure the S3 connection.

    Yes

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you only send Edge infrastructure log files to Collibra Data Intelligence Platform when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    GCS synchronization

    Yes

    GCP service account

    This section contains information on how to connect to Google Cloud Storage.
    GCP Connection
    The GCP connection to be used.

    Yes

    Configuration
    This section contains information on the configuration of the crawlers.
    Maximum number of files per crawler
    The maximum number of files that can be registered per crawler. The default value is 1,000.

    Yes

    Save input metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the synchronization.

    This checkbox is not selected by default.

    No

    Integrate Schemas from Dataplex

    Select the checkbox if you want to integrate the schemas from Dataplex based on the crawler path that will be specified in the GCS integration configuration.
    If the checkbox is not selected, no Dataplex data will be ingested.

    This checkbox is selected by default.

    No

    Project IDs
    Add a comma-separated list of the Project IDs where Dataplex is enabled.
    The capability will search in these projects for schemas based on the crawler path that will be specified in the GCS integration configuration. If the Project IDs field is empty, the integration will search in the project included in the provided GCP Service Account Credentials JSON.

    No
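
    The fallback described for the Project IDs field can be sketched as follows. This is a minimal illustration, not the capability's implementation: the `resolve_project_ids` function is hypothetical, and trimming whitespace around each ID is an assumption. The `project_id` key is the field under which a GCP service account key file stores its project.

```python
import json
from typing import List


def resolve_project_ids(project_ids_field: str, credentials_json: str) -> List[str]:
    """Return the projects to search, falling back to the credentials' project."""
    # Split on commas and drop surrounding whitespace and empty entries (assumption).
    ids = [part.strip() for part in project_ids_field.split(",") if part.strip()]
    if ids:
        return ids
    # Empty field: use the project named in the service account credentials.
    return [json.loads(credentials_json)["project_id"]]


# Hypothetical credentials containing only the keys needed for this sketch.
credentials = json.dumps({"type": "service_account", "project_id": "example-project"})
explicit = resolve_project_ids("proj-a, proj-b", credentials)
fallback = resolve_project_ids("", credentials)
print(explicit, fallback)
```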

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Catalog JDBC ingestion

    Yes

    Connection

    This section contains information to connect to the data source.

    JDBC connection

    The connection to the data source.

    Yes

    JDBC data source type (Deprecated)

    Deprecated field. The field was used to indicate the type of the data source. You no longer need to change this field. The required value is automatically identified.

    Note The automatically identified value is not shown on this page.

    Yes

    Supports schemas

    A text field where you have to enter True to enable database registration of data sources that have no schema. If the data source has schemas, you can ignore this field.

    Tip If the data source does not have a schema, Data Catalog creates a Schema asset with the same name as the full name of the database.

    No

    Other Settings

    Others

    This section can contain additional capability properties.
    Click Add property or Add Other Settings to add a property.

    Note No validation is performed on the values you add.

    No

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    For more information, go to logging.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    JDBC Profiling

    Yes

    Connection

    This section contains information to connect to the data source.

    JDBC connection

    The connection to the data source.

    Yes

    Other Settings

    Others

    This section can contain additional capability properties.

    Warning Adding additional properties can have a significant impact on your Edge site. Only add or update them together with Collibra Support.

    Click Add property or Add Other Settings to add a property.

    Note No validation is performed on the values you add.

    No

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    For more information, go to logging.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    AWS Connection
    The AWS connection to be used.

    Yes

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from Amazon SageMaker.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from Amazon SageMaker.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    AWS Connection
    The AWS connection to be used.

    Yes

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from Amazon Bedrock.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from Amazon Bedrock.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    SAP AI Core Connection
    The SAP AI Core connection to be used.

    Yes

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from SAP AI Core.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from SAP AI Core.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Catalog JDBC Sampling

    Yes

    Connection

    This section contains information to connect to the data source.

    JDBC connection

    The connection to the data source.

    Yes

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    For more information, go to logging.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Amazon Redshift

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes
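
    To see the effect of the schema filter from the example, the following sketch runs both variants against an in-memory SQLite table. The `views` table and its rows are a hypothetical stand-in for the data source's system catalog; the actual Views query provided with the capability runs against the data source itself and is not reproduced here.

```python
import sqlite3

# Hypothetical stand-in for a catalog listing of views and their schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (table_schema TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO views VALUES (?, ?)",
    [
        ("pg_catalog", "pg_tables"),
        ("information_schema", "columns"),
        ("sales", "v_orders"),
        ("another_schema", "v_tmp"),
    ],
)

base_query = "SELECT v.table_schema, v.table_name FROM views v "
# Filter from the example: skip the two system schemas.
default_filter = "WHERE v.table_schema NOT IN ('pg_catalog', 'information_schema')"
# Extended filter: also skip another_schema.
extended_filter = (
    "WHERE v.table_schema NOT IN "
    "('pg_catalog', 'information_schema', 'another_schema')"
)

default_rows = conn.execute(base_query + default_filter + " ORDER BY 1").fetchall()
extended_rows = conn.execute(base_query + extended_filter + " ORDER BY 1").fetchall()
print(default_rows)
print(extended_rows)
```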

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
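
    The relationship between the three Processing Level values, and how the deprecated Analyze Only option maps to them, can be sketched as follows. The enum, the stage names, and the helper functions are hypothetical and only model the descriptions in this table; they are not part of any Collibra API.

```python
from enum import Enum
from typing import Dict, List


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level includes the stages of the level before it.
STAGES: Dict[ProcessingLevel, List[str]] = {
    ProcessingLevel.LOAD: ["load"],
    ProcessingLevel.ANALYZE: ["load", "analyze"],
    ProcessingLevel.SYNC: ["load", "analyze", "synchronize"],
}


def stages_for(level: ProcessingLevel) -> List[str]:
    """Return the stages that run for the selected Processing Level."""
    return list(STAGES[level])


def level_from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to its Processing Level equivalent."""
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


print(stages_for(ProcessingLevel.ANALYZE))
print(level_from_analyze_only(False).value)
```

    In this model, setting every data source to Analyze and triggering a single synchronization afterwards corresponds to the recommendation above for multiple data sources.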

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes
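
    One way to read the note about default query updates is sketched below. This is an interpretation under stated assumptions, not the actual Collibra Data Lineage logic: the `queries_after_sync` function, the dictionary representation, and the query texts are all hypothetical.

```python
from typing import Dict

Queries = Dict[str, str]


def queries_after_sync(current: Queries, previous_default: Queries,
                       new_default: Queries) -> Queries:
    """Return the queries a capability would use after a default-query update."""
    if current == new_default or current == previous_default:
        # Still on a recognized default set: adopt the new defaults.
        return dict(new_default)
    # Older defaults or manual edits are treated as customized and kept as-is.
    return dict(current)


# Hypothetical query texts, used only to tell the versions apart.
older = {"Columns": "columns query v0", "Views": "views query v0"}
previous = {"Columns": "columns query v1", "Views": "views query v1"}
new = {"Columns": "columns query v2", "Views": "views query v2"}

updated = queries_after_sync(previous, previous, new)
kept = queries_after_sync(older, previous, new)
print(updated == new, kept == older)
```

    In this reading, a capability whose queries are two or more versions behind is indistinguishable from one with customized queries, which is why copying the defaults from a newly created capability is the suggested way to catch up.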

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
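
    As an illustration of how a file name mask narrows the set of files, the following sketch applies a pattern to a list of file names. It assumes glob-style matching, which this page does not confirm; the `select_files` function and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_files(file_names: Iterable[str], mask: str = "*") -> List[str]:
    """Return the file names that match the glob-style mask (assumed semantics)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "README.txt"]
all_files = select_files(files)            # default mask * matches every file
sql_files = select_files(files, "*.sql")   # only files ending in .sql
print(all_files, sql_files)
```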

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
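The precedence rule in the Database and Schema notes can be sketched as follows. This is an illustration only, not how Collibra Data Lineage is implemented. The function name, the argument names, and the per-name fallback are assumptions.

```python
def resolve_stitching_names(sql_database, sql_schema, field_database, field_schema):
    """Pick the database and schema names used for stitching.

    Names found in the SQL statement take precedence. The values of the
    Database and Schema fields are used only when the statement has none.
    Assumption: the fallback is applied to each name separately.
    """
    database = sql_database if sql_database else field_database
    schema = sql_schema if sql_schema else field_schema
    return database, schema


# Statement such as: SELECT * FROM sales_db.public.orders
qualified = resolve_stitching_names("sales_db", "public", "default_db", "dbo")

# Statement such as: SELECT * FROM orders
unqualified = resolve_stitching_names(None, None, "default_db", "dbo")

print(qualified)
print(unqualified)
```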

    Database-System mapping

    This optional field allows you to map each database to the system that it belongs to, so that stitching can succeed. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
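Conceptually, the Database-System mapping works like a lookup with a fallback. The sketch below uses a plain Python dictionary to show the idea. The actual format of the field is not described in this section, so the dictionary shape, the names, and the system_for helper are hypothetical.

```python
def system_for(database, database_system_mapping, default_system):
    """Return the system that a database belongs to.

    Databases without an explicit mapping fall back to the default system,
    which corresponds to the Collibra System Name field.
    """
    return database_system_mapping.get(database, default_system)


# Hypothetical mapping: two databases that live on different systems.
mapping = {"sales_db": "crm-server", "finance_db": "erp-server"}

mapped = system_for("sales_db", mapping, "default-server")
unmapped = system_for("hr_db", mapping, "default-server")
print(mapped)
print(unmapped)
```

Without the mapping, all three databases in this example would be associated with default-server, which is the situation that leads to missing stitching.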

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
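If you are moving existing configurations away from the deprecated Analyze Only option, the equivalence described above can be written as a small helper. This is an illustrative sketch. The function name is hypothetical and is not part of any Collibra API.

```python
def processing_level_from_analyze_only(analyze_only):
    """Translate the deprecated Analyze Only checkbox to a Processing Level value.

    Selected (True)  -> "Analyze"
    Cleared (False)  -> "Sync"
    """
    return "Analyze" if analyze_only else "Sync"


print(processing_level_from_analyze_only(True))
print(processing_level_from_analyze_only(False))
```

The "Load" value of the Processing Level setting has no Analyze Only equivalent, so the helper never returns it.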

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
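The three Processing Level values build on each other. The following sketch summarizes which stages each value covers, based on the descriptions in the table above. The stage names and the helper function are illustrative assumptions.

```python
# Stages covered by each Processing Level value, as described in the table.
STAGES_BY_LEVEL = {
    "Load": ["load"],
    "Analyze": ["load", "analyze"],
    "Sync": ["load", "analyze", "synchronize"],
}


def synchronizes_immediately(level):
    """Return True if synchronization starts (or is queued) right after analysis."""
    return "synchronize" in STAGES_BY_LEVEL[level]


for level in ("Load", "Analyze", "Sync"):
    print(level, STAGES_BY_LEVEL[level], synchronizes_immediately(level))
```

For several data sources, the recommendation above corresponds to using "Analyze" for each capability and triggering one synchronization afterward.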

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map each database to the system that it belongs to, so that stitching can succeed. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Authentication Type

    The authentication details for signing in to Azure Data Factory. You can select one of the following values:

    Service Principal
    When you select this authentication type, ensure that you entered the application secret for the Service Principal in the Service Principal Secret field when you created the Azure connection.
    Resource Owner Password Credentials
    When you select this authentication type, ensure that you specify the Username field in this capability and that you entered the password in the Service Principal Secret field when you created the Azure connection.

    Yes

    ADF Connection

    The Azure connection that you created.

    Yes

    Username

    The email address of your Azure Active Directory user.

    This field applies only when you selected Resource Owner Password Credentials for the Authentication Type field.

    No
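The dependency between the Authentication Type and Username fields can be expressed as a simple check. This is a hedged sketch for validating your own configuration notes before you fill in the form. The function name is hypothetical, and the check is not performed by this code in Collibra.

```python
def missing_username(authentication_type, username=None):
    """Return True when the Username field is needed but empty.

    Username is only relevant for Resource Owner Password Credentials.
    For Service Principal it can stay empty.
    """
    needs_username = authentication_type == "Resource Owner Password Credentials"
    return needs_username and not username


print(missing_username("Service Principal"))
print(missing_username("Resource Owner Password Credentials"))
print(missing_username("Resource Owner Password Credentials", "user@example.com"))
```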

    Resource Group Name

    The name of the resource group that the data factory belongs to.

    Yes

    Subscription ID

    The subscription ID of the resource group.

    Yes

    Factories

    The Azure Data Factory factories that Collibra Data Lineage collects and processes. Specify this property with an array of Azure Data Factory factory names. This property is optional.

    The following rules apply when you specify this property:

    • Enter the factory names in square brackets ([ ]), enclose each factory name in double quotes (" "), and separate them by a comma, for example, ["MyFirstFactory", "MySecondFactory"].
    • The factory name is not case-sensitive. For example, the MyFactory and myfactory factories are considered the same by Azure Data Factory and Collibra Data Lineage.
    • If you do not specify any factory name, Collibra Data Lineage collects and processes all factories that have datasets and pipelines in them.

    No
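The formatting rules for the Factories field match the syntax of a JSON array of strings, so one way to produce a valid value is to serialize a list. The sketch below also removes names that differ only in case, because factory names are not case-sensitive. The factories_value helper is hypothetical.

```python
import json


def factories_value(names):
    """Build the text for the Factories field from a list of factory names.

    Names that differ only in case are treated as the same factory,
    so only the first spelling is kept.
    """
    seen = set()
    unique = []
    for name in names:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return json.dumps(unique)


value = factories_value(["MyFirstFactory", "MySecondFactory", "myfirstfactory"])
print(value)
```

For this input, the result is ["MyFirstFactory", "MySecondFactory"], which has the same form as the example in the rules above.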

    Source Configuration

    The source configuration for database mapping, system mapping, schema mapping, and filtering. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <sourceId>.conf file in this field.
    Example 

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for ADF

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Authentication Type

    The authentication details for signing in to Azure Data Factory. You can select one of the following values:

    Service Principal
    When you select this authentication type, ensure that you entered the application secret for the Service Principal in the Service Principal Secret field when you created the Azure connection.
    Resource Owner Password Credentials
    When you select this authentication type, ensure that you specify the Username field in this capability and that you entered the password in the Service Principal Secret field when you created the Azure connection.

    Yes

    ADF Connection

    The Azure connection that you created.

    Yes

    Username

    The email address of your Azure Active Directory user.

    This field applies only when you selected Resource Owner Password Credentials for the Authentication Type field.

    No

    Resource Group Name

    The name of the resource group that the data factory belongs to.

    Yes

    Subscription ID

    The subscription ID of the resource group.

    Yes

    Factories

    The Azure Data Factory factories that Collibra Data Lineage collects and processes. Specify this property with an array of Azure Data Factory factory names. This property is optional.

    The following rules apply when you specify this property:

    • Enter the factory names in square brackets ([ ]), enclose each factory name in double quotes (" "), and separate them by a comma, for example, ["MyFirstFactory", "MySecondFactory"].
    • The factory name is not case-sensitive. For example, the MyFactory and myfactory factories are considered the same by Azure Data Factory and Collibra Data Lineage.
    • If you do not specify any factory name, Collibra Data Lineage collects and processes all factories that have datasets and pipelines in them.

    No

    Source Configuration

    The source configuration for database mapping, system mapping, schema mapping, and filtering. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the connection_definitions.conf file in this field.
    Example 

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

     

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    GCP Connection

    The GCP connection that you created.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    No

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Logging configuration, Memory (MiB), and JVM arguments

    These fields contain configuration options that can help when investigating issues with the capability.

    Important Only complete these fields on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Azure

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the column, table, schema, and database or project fields in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes
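If you maintain the list of excluded schemas outside of Collibra, the schema filter from the Views query example above can be generated rather than typed by hand. The following Python sketch builds the same where clause. The helper name is hypothetical, and the v.table_schema column is taken from the example, so adjust it if your default query uses a different alias.

```python
def schema_exclusion_filter(excluded_schemas, column="v.table_schema"):
    """Build a 'where ... not in (...)' filter for the given schema names.

    Single quotes inside a name are doubled so the SQL string literal stays valid.
    """
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where {column} not in ({quoted});"


default_filter = schema_exclusion_filter(["pg_catalog", "information_schema"])
extended_filter = schema_exclusion_filter(
    ["pg_catalog", "information_schema", "another_schema"]
)
print(default_filter)
print(extended_filter)
```

Keep in mind that a modified query is treated as a customized query, as described in the notes above.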

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    ValueDescription
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
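The three Processing Level values build on each other, which can be easier to see as data. The sketch below only restates the table above. The names STAGES_BY_LEVEL, stages_for, and starts_synchronization are hypothetical and do not correspond to any Collibra API.

```python
# Stages that each Processing Level value runs, as described in the table above.
STAGES_BY_LEVEL = {
    "Load": ("load",),
    "Analyze": ("load", "analyze"),
    "Sync": ("load", "analyze", "synchronize"),
}


def stages_for(level):
    """Return the ordered stages for a Processing Level value."""
    try:
        return STAGES_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"Unknown Processing Level: {level!r}") from None


def starts_synchronization(level):
    """Return True only when synchronization follows analysis in the same run."""
    return "synchronize" in stages_for(level)


for level in STAGES_BY_LEVEL:
    print(level, "->", ", ".join(stages_for(level)))
```

With "Analyze", starts_synchronization returns False, which matches the note that synchronization must then be triggered separately.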

    Active

    This option determines whether the technical lineage of the data source is included or excluded.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    FieldDescriptionRequired?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, add them to the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your technical lineage Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you don't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    QueryDescription
    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes
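The note about default query updates describes a three-way decision. The sketch below illustrates that rule only as it is worded in the note. It is not Collibra's implementation: the function name resolve_queries, the status labels, and the placeholder query texts are all assumptions made for illustration.

```python
def resolve_queries(current, previous_default, new_default):
    """Return (queries_to_use, status) for one capability's query set."""
    if current == new_default:
        return current, "up to date"
    if current == previous_default:
        # Only an exact match with the previous defaults can be upgraded.
        return new_default, "updated"
    # Older defaults and hand-edited queries cannot be told apart here,
    # so both are treated as customized and left unchanged.
    return current, "customized"


# Placeholder query texts; real default queries depend on the data source.
previous = {"Views": "views query, previous default"}
new = {"Views": "views query, new default"}
older = {"Views": "views query, older default"}

print(resolve_queries(previous, previous, new)[1])
print(resolve_queries(older, previous, new)[1])
```

The second call shows why copying the default queries from a newly created capability is needed for capabilities that have fallen more than one version behind.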

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
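If you track capability settings outside Collibra, for example in a migration checklist, the equivalence between the deprecated Analyze Only option and the Processing Level setting can be written as a one-line rule. This is a sketch with a hypothetical function name, not a Collibra API.

```python
def processing_level_from_analyze_only(analyze_only: bool) -> str:
    """Map the deprecated Analyze Only checkbox to a Processing Level value."""
    # Selected  -> "Analyze"; cleared -> "Sync", per the equivalence listed above.
    return "Analyze" if analyze_only else "Sync"


print(processing_level_from_analyze_only(True))
print(processing_level_from_analyze_only(False))
```

The "Load" value has no Analyze Only equivalent, so it never results from this mapping.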

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    ValueDescription
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether the technical lineage of the data source is included or excluded.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    FieldDescriptionRequired?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
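To preview which files a Mask value would pick up, you can test the pattern locally. The sketch below assumes glob-style wildcard matching, which is consistent with the default value * but is not confirmed in this section. The file names and the function select_files are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match the mask, using glob-style wildcards."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "README.txt"]
everything = select_files(files)          # default mask * matches every file
sql_only = select_files(files, "*.sql")   # restrict to .sql files
print(everything)
print(sql_only)
```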

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
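The precedence rule in the notes for the Database and Schema fields can be sketched as a simple fallback. This is an illustration with hypothetical names, not Collibra's stitching logic. Applying the fallback to the database and the schema independently is an assumption, because the note only covers the cases where both names are present or both are absent.

```python
def resolve_stitching_names(statement_db, statement_schema, field_db, field_schema):
    """Prefer names found in the SQL statement; fall back to the capability fields."""
    # A value of None means the SQL statement does not name that object.
    database = statement_db if statement_db else field_db
    schema = statement_schema if statement_schema else field_schema
    return database, schema


# The statement qualifies its tables, so the field values are ignored.
qualified = resolve_stitching_names("sales_db", "reporting", "default_db", "public")
# The statement is unqualified, so the field values are used.
unqualified = resolve_stitching_names(None, None, "default_db", "public")
print(qualified)
print(unqualified)
```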

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. Use it to resolve missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
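Conceptually, the Database-System mapping is a lookup that overrides the default system for specific databases. The sketch below shows only that idea. The dictionary layout and all names in it are hypothetical, and the syntax that the field itself expects is not shown in this section.

```python
def system_for(database, mapping, default_system):
    """Return the mapped system for a database, or the default system name."""
    return mapping.get(database, default_system)


# Hypothetical values: two databases that live on different systems.
mapping = {"finance_db": "finance-server", "hr_db": "hr-server"}
default_system = "main-server"  # stands in for the Collibra System Name value

print(system_for("finance_db", mapping, default_system))
print(system_for("sales_db", mapping, default_system))
```

Without a mapping entry, every database resolves to the default system, which is the situation that causes missing stitching when databases actually belong to different systems.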

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    ValueDescription
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether the technical lineage of the data source is included or excluded.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    FieldDescriptionRequired?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. Use it to resolve missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    ValueDescription
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether the technical lineage of the data source is included or excluded.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    FieldDescriptionRequired?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Azure

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, add them to the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your technical lineage Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you don't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    QueryDescription
    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select the Analyze option in the Edge capability of each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
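
    The three Processing Level values build on one another. The following sketch models which stages run for each value. The enum, the stage names, and the function stages_for are illustrative assumptions, not part of any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level runs the stages of the previous level plus one more.
STAGES = {
    ProcessingLevel.LOAD: ["load"],
    ProcessingLevel.ANALYZE: ["load", "analyze"],
    ProcessingLevel.SYNC: ["load", "analyze", "synchronize"],
}


def stages_for(level):
    """Return the ordered list of stages that run for a processing level."""
    return list(STAGES[level])


for level in ProcessingLevel:
    print(level.value, "->", ", ".join(stages_for(level)))
```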

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, and databases or projects, in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes
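
    The effect of the schema filter from the Views query example in the Queries field can be tried on mock data. The sketch below uses an in-memory SQLite table as a stand-in for the data source. The table layout, the sample rows, and the helper names are assumptions for illustration only. In the capability itself, you edit the SQL text of the query directly.

```python
import sqlite3


def exclusion_filter(schemas):
    """Build a parameterized 'not in' clause for the given schema names."""
    placeholders = ", ".join("?" for _ in schemas)
    return f"where v.table_schema not in ({placeholders})"


def visible_views(conn, excluded):
    """Return (schema, view) pairs that survive the exclusion filter."""
    sql = (
        "select v.table_schema, v.table_name from views v "
        f"{exclusion_filter(excluded)} order by v.table_name"
    )
    return conn.execute(sql, list(excluded)).fetchall()


# Mock catalog: two system schemas and two schemas with customer data.
conn = sqlite3.connect(":memory:")
conn.execute("create table views (table_schema text, table_name text)")
conn.executemany(
    "insert into views values (?, ?)",
    [
        ("pg_catalog", "pg_stats"),
        ("information_schema", "columns"),
        ("sales", "v_orders"),
        ("another_schema", "v_tmp"),
    ],
)

print(visible_views(conn, ["pg_catalog", "information_schema"]))
print(visible_views(conn, ["pg_catalog", "information_schema", "another_schema"]))
```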

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
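
    If you are moving away from the deprecated Analyze Only option, the equivalence stated above amounts to a simple mapping. The following sketch shows it. The function name processing_level_for is hypothetical.

```python
def processing_level_for(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level value."""
    # Selected  -> "Analyze" (load and analyze, no synchronization).
    # Cleared   -> "Sync" (load, analyze, and synchronize).
    return "Analyze" if analyze_only else "Sync"


print(processing_level_for(True))
print(processing_level_for(False))
```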

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select the Analyze option in the Edge capability of each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
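
    As an illustration of how a file name mask narrows the set of files, the sketch below filters a list of names with Python's fnmatch module. It assumes that the mask uses glob-style wildcards, which this page does not state explicitly. The file names are invented.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match the mask (glob-style, assumed)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "readme.txt"]
print(select_files(files))            # default mask "*" keeps every file
print(select_files(files, "*.sql"))   # only the SQL files
```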

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
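
    The precedence rule for the Database and Schema fields can be summarized in a few lines. This is a sketch under assumptions: the function resolve_names is hypothetical, and the per-part fallback (for a statement that names a schema but no database) is a guess that this page does not confirm.

```python
def resolve_names(sql_database, sql_schema, field_database, field_schema):
    """Pick the database and schema names used for stitching.

    Names found in the SQL statement win. The values of the Database and
    Schema fields of the capability are used only when the statement
    does not provide them (None).
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# Fully qualified statement: the capability fields are ignored.
print(resolve_names("dwh", "sales", "default_db", "public"))
# Unqualified statement: the capability fields are used.
print(resolve_names(None, None, "default_db", "public"))
```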

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
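
    Conceptually, the Database-System mapping overrides the default system name per database. The sketch below shows that lookup with a plain dictionary. The dictionary, the sample names, and the function system_for are hypothetical; the actual format that the field expects is not shown on this page.

```python
def system_for(database, mapping, default_system):
    """Return the system for a database, falling back to the default."""
    return mapping.get(database, default_system)


# Hypothetical mapping of database names to system names.
mapping = {"finance_db": "erp-prod", "crm_db": "crm-prod"}

print(system_for("finance_db", mapping, "default-system"))
print(system_for("staging_db", mapping, "default-system"))
```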

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select the Analyze option in the Edge capability of each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
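
    The difference between the two recommendations for multiple data sources comes down to how many synchronization jobs are started. The following sketch is a simplified model, not a description of the actual scheduler: it assumes one job per data source that is set to Sync, plus one job when you trigger synchronization yourself. The function name sync_job_count is hypothetical.

```python
def sync_job_count(levels, manual_sync=False):
    """Estimate the number of synchronization jobs in this simplified model.

    levels: the Processing Level chosen for each data source.
    manual_sync: True if synchronization is triggered once by hand.
    """
    per_source_jobs = sum(1 for level in levels if level == "Sync")
    return per_source_jobs + (1 if manual_sync else 0)


# Three data sources set to Sync: three separate jobs.
print(sync_job_count(["Sync", "Sync", "Sync"]))
# Three data sources set to Analyze, then one manual trigger: a single job.
print(sync_job_count(["Analyze", "Analyze", "Analyze"], manual_sync=True))
```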

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select the Analyze option in the Edge capability of each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Azure

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, and databases or projects, in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
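    The dialect restriction for Dependent On Sources can be expressed as a small pre-check. The sketch below is illustrative only: the function name and the lowercase dialect identifiers are assumptions, not values taken from the product.

```python
# Dialects for which, per the note above, lowercase column names work
# with Dependent On Sources. The identifier spelling is an assumption.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def dependent_source_supported(dialect, column_names):
    """Return True if no analyze error is expected for this combination."""
    has_lowercase = any(
        char.islower() for name in column_names for char in name
    )
    if not has_lowercase:
        # Without lowercase column names, the restriction does not apply.
        return True
    return dialect.lower() in LOWERCASE_SAFE_DIALECTS


print(dependent_source_supported("Snowflake", ["id", "Name"]))
print(dependent_source_supported("postgres", ["id", "Name"]))
```

    When the check returns False, the documented workaround applies: consolidate the SQL statements and the DDL file in a single data source.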

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
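    The three Processing Level values build on each other. The following sketch models them as an ordered list of stages. The stage names and the function are hypothetical and serve only to make the cumulative behavior explicit.

```python
# Stages performed for each Processing Level value, as described above.
STAGES_BY_LEVEL = {
    "Load": ["load"],
    "Analyze": ["load", "analyze"],
    "Sync": ["load", "analyze", "synchronize"],
}


def stages_for(level):
    """Return the stages that run for the given Processing Level."""
    try:
        return STAGES_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"Unknown processing level: {level!r}") from None


for level in ("Load", "Analyze", "Sync"):
    print(level, "->", ", ".join(stages_for(level)))
```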

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field    Description    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
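    To see the effect of such a filter before you edit a query, you can try it on sample data. The sketch below uses an in-memory SQLite table as a stand-in for the views metadata of the data source. The table name demo_views and the sample rows are made up, and the actual default Views query of your data source will look different.

```python
import sqlite3

# In-memory stand-in for the views metadata of a data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_views (table_schema TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO demo_views VALUES (?, ?)",
    [
        ("pg_catalog", "pg_views"),
        ("information_schema", "tables"),
        ("sales", "v_orders"),
        ("another_schema", "v_tmp"),
    ],
)


def visible_views(excluded_schemas):
    """Return the views that remain after the NOT IN filter is applied."""
    placeholders = ", ".join("?" for _ in excluded_schemas)
    query = (
        "SELECT v.table_schema, v.table_name FROM demo_views v "
        f"WHERE v.table_schema NOT IN ({placeholders}) "
        "ORDER BY v.table_schema, v.table_name"
    )
    return conn.execute(query, list(excluded_schemas)).fetchall()


print(visible_views(["pg_catalog", "information_schema"]))
print(visible_views(["pg_catalog", "information_schema", "another_schema"]))
```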

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query    Description
    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
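    If you are migrating existing configurations away from the deprecated option, the equivalence above amounts to a one-line mapping. The function name is hypothetical. The returned strings are the Processing Level values named in this section.

```python
def processing_level_from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level value."""
    # Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    return "Analyze" if analyze_only else "Sync"


print(processing_level_from_analyze_only(True))
print(processing_level_from_analyze_only(False))
```

    The "Load" value has no counterpart in the deprecated option, so it never results from this mapping.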

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
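    To preview which files a mask would select, you can test the pattern locally. The sketch below assumes that the mask follows shell-style wildcard rules, which this page does not confirm. The file names are made up.

```python
from fnmatch import fnmatchcase


def matching_files(file_names, mask="*"):
    """Return the file names that match the mask (shell-style wildcards assumed)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "readme.txt"]

print(matching_files(files))           # default mask "*" keeps every file
print(matching_files(files, "*.sql"))  # keeps only the .sql files
```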

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
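    The precedence rule in the notes for the Database and Schema fields can be sketched as a simple fallback. This is an illustration with hypothetical names, not the stitching logic itself. It also assumes that the database and the schema fall back independently of each other, which the notes do not state.

```python
def resolve_stitching_names(sql_database, sql_schema, field_database, field_schema):
    """Return the (database, schema) pair that would be used for stitching.

    Names found in the SQL statement win; the capability fields are used
    only when the statement does not name a database or schema.
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# Statement such as: SELECT * FROM dwh.finance.invoices
print(resolve_stitching_names("dwh", "finance", "default_db", "public"))
# Statement such as: SELECT * FROM invoices
print(resolve_stitching_names(None, None, "default_db", "public"))
```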

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
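    Conceptually, the mapping is a lookup from database name to system name, with the Collibra System Name value as the fallback. The sketch below shows that idea with a plain dictionary. It does not reflect the format that the field expects, and all names in it are hypothetical.

```python
def system_for_database(database, mapping, default_system):
    """Return the system a database belongs to, or the default system name."""
    return mapping.get(database, default_system)


# Hypothetical mapping of databases to their systems.
mapping = {"sales_db": "crm-server", "finance_db": "erp-server"}

print(system_for_database("sales_db", mapping, "default-system"))
print(system_for_database("hr_db", mapping, "default-system"))
```

    Without a mapping, every database resolves to the default system name, which is the situation that leads to missing stitching.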

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
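    The recommendation for multiple data sources comes down to the number of synchronization jobs. The following sketch is a simplified model with hypothetical names: each data source set to "Sync" starts its own synchronization job, while data sources set to "Analyze" wait for a single job that is triggered separately, for example through the Technical Lineage Admin Edge capability.

```python
def sync_job_count(levels, admin_trigger=False):
    """Count synchronization jobs for a list of per-source Processing Levels."""
    # Every source set to "Sync" starts (or queues) its own job.
    jobs = sum(1 for level in levels if level == "Sync")
    if admin_trigger:
        # One separately triggered job covers all analyzed sources.
        jobs += 1
    return jobs


print(sync_job_count(["Sync", "Sync", "Sync"]))
print(sync_job_count(["Analyze", "Analyze", "Analyze"], admin_trigger=True))
```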

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field    Description    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Custom Technical Lineage

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
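
    The three Processing Level values build on each other. The following Python sketch only restates the descriptions above as data; the class name ProcessingLevel and the stage labels are hypothetical and do not correspond to a Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    """Processing Level values, each with the stages it runs, in order."""
    LOAD = ("load",)
    ANALYZE = ("load", "analyze")
    SYNC = ("load", "analyze", "synchronize")


def stages(level: ProcessingLevel) -> tuple:
    """Return the ordered stages that a Processing Level value covers."""
    return level.value


for level in ProcessingLevel:
    # Prints one line per value, for example "Sync: load -> analyze -> synchronize".
    print(f"{level.name.title()}: {' -> '.join(stages(level))}")
```

    Note that only the Sync value includes the synchronization stage; with Analyze, synchronization has to be triggered separately, as described above.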

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Databricks Connection

    The Databricks connection that you created.

    Yes

    Compute Resource HTTP Path

    The HTTP path of the compute resource in Databricks Unity Catalog that Collibra Data Lineage collects and processes to create technical lineage.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Time Frame

    Specify the duration, in days, for data collection. The default value is 365, which means that Collibra Data Lineage collects the data of the past 365 days.
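
    To see which dates a given Time Frame value covers, you can compute the window yourself. This is a minimal Python sketch; the function name collection_window is hypothetical, and whether Collibra Data Lineage treats the boundary days as inclusive is an assumption that is not confirmed here.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def collection_window(time_frame_days: int = 365,
                      today: Optional[date] = None) -> Tuple[date, date]:
    """Return (start, end) of a window that looks back time_frame_days days."""
    end = today or date.today()
    return end - timedelta(days=time_frame_days), end


# With the default of 365 days, evaluated on 1 January 2025.
start, end = collection_window(today=date(2025, 1, 1))
print(start, end)
```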

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Filters

    Use this section to include or exclude databases and schemas to be ingested. Enter the filters in JSON format. If you used filters when you integrated Databricks Unity Catalog, you can enter the content from the Filters and Domain Mapping field of the Databricks Unity Catalog capability in this field. Note that Collibra Data Lineage ignores the UUIDs that are specified in the content.

    Text in JSON format to include or exclude databases and schemas, and to configure domain mappings.

    • The text must be in JSON format and can contain an include and an exclude block. You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use.
    • In the include block, you can specify the domain in which specific catalogs or schemas must be ingested. The format is: "Catalog/Database > schema": "domain ID". For example, "HR > address-schema": "30000000-0000-0000-0000-000000000000".
    • In the exclude block, you can specify the catalogs or schemas that you don't want to ingest. For example, "* > test".
    • The exclude block has priority over the include block.
    • If the include block is not present, we ingest all assets into the same domain as the System asset.
    • If there is no explicit domain mapping for a schema, we use the domain specified for the database.
    • You can use the keyword default as a domain ID. In that case, the catalog or schema will be ingested in the same domain as the System asset.
    • A match with a database has priority over a match with a schema.
    • The integration fails before the synchronization starts if one or more domain IDs specified in the include block don't exist.
    • The integration fails before the synchronization starts if a domain ID is left empty in the include block.
    • You can use the ? and * wildcards in the catalog and schema names. If a catalog or schema matches multiple lines, the most detailed match is taken into account.
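
    Because the filter text must be valid JSON, you can build and check it locally instead of pasting it into an online validator. The following Python sketch is an illustration under assumptions: it assumes that the include block is a JSON object that maps "Catalog/Database > schema" keys to domain IDs and that the exclude block is a JSON array of patterns. The "Sales > *" entry and the function name validate_filters are hypothetical.

```python
import json

# Hypothetical filter content. The HR entry and the exclude pattern are the
# examples from the text above; the Sales entry is made up for illustration.
filters = {
    "include": {
        "HR > address-schema": "30000000-0000-0000-0000-000000000000",
        "Sales > *": "default",
    },
    "exclude": ["* > test"],
}


def validate_filters(text: str) -> dict:
    """Parse the filter text locally and reject empty domain IDs."""
    parsed = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for key, domain_id in parsed.get("include", {}).items():
        if not str(domain_id).strip():
            raise ValueError(f"Empty domain ID for {key!r}")
    return parsed


text = json.dumps(filters, indent=2)
print(text)
validate_filters(text)
```

    The check for empty domain IDs mirrors one of the failure conditions listed above. The sketch cannot verify that a domain ID actually exists in your Collibra environment.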

    No

    Logging configuration, Memory (MiB), and JVM arguments

    These fields contain configuration options that can help when investigating issues with the capability.

    Important Only complete these fields on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Databricks Unity Catalog

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Databricks Connection

    The Databricks connection that you created.

    Yes

    Compute Resource HTTP Path

    The HTTP path of the compute resource in Databricks Unity Catalog that Collibra Data Lineage collects and processes to create technical lineage.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Time Frame

    Specify the duration, in days, for data collection. The default value is 365, which means that Collibra Data Lineage collects the data of the past 365 days.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Save Input Metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Filters

    Use this section to include or exclude databases and schemas to be ingested. Enter the filters in JSON format. If you used filters when you integrated Databricks Unity Catalog, you can enter the content from the Filters and Domain Mapping field of the Databricks Unity Catalog capability in this field. Note that Collibra Data Lineage ignores the UUIDs that are specified in the content.

    Text in JSON format to include or exclude databases and schemas, and to configure domain mappings.

    • The text must be in JSON format and can contain an include and an exclude block. You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use.
    • In the include block, you can specify the domain in which specific catalogs or schemas must be ingested. The format is: "Catalog/Database > schema": "domain ID". For example, "HR > address-schema": "30000000-0000-0000-0000-000000000000".
    • In the exclude block, you can specify the catalogs or schemas that you don't want to ingest. For example, "* > test".
    • The exclude block has priority over the include block.
    • If the include block is not present, we ingest all assets into the same domain as the System asset.
    • If there is no explicit domain mapping for a schema, we use the domain specified for the database.
    • You can use the keyword default as a domain ID. In that case, the catalog or schema will be ingested in the same domain as the System asset.
    • A match with a database has priority over a match with a schema.
    • The integration fails before the synchronization starts if one or more domain IDs specified in the include block don't exist.
    • The integration fails before the synchronization starts if a domain ID is left empty in the include block.
    • You can use the ? and * wildcards in the catalog and schema names. If a catalog or schema matches multiple lines, the most detailed match is taken into account.
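
    To get a feel for how the ? and * wildcards behave in an exclude pattern, you can test patterns locally. This Python sketch uses the standard fnmatch module as a stand-in; it is an assumption that Collibra matches patterns the same way (fnmatch also interprets [seq] character classes, and the case sensitivity of the real matching is not stated here). The function name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(catalog: str, schema: str, exclude_patterns: list) -> bool:
    """Return True if 'catalog > schema' matches any exclude pattern."""
    for pattern in exclude_patterns:
        # Split "catalog > schema" into its two parts and trim the spaces.
        catalog_pattern, _, schema_pattern = (
            part.strip() for part in pattern.partition(">")
        )
        # Assumption: a pattern without a schema part matches every schema.
        schema_pattern = schema_pattern or "*"
        if fnmatchcase(catalog, catalog_pattern) and fnmatchcase(schema, schema_pattern):
            return True
    return False


print(is_excluded("HR", "test", ["* > test"]))
print(is_excluded("HR", "address-schema", ["* > test"]))
```

    The sketch covers only the exclude block. It does not model the priority rules between database and schema matches in the include block.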

    No

    Logging configuration, Memory (MiB), and JVM arguments

    These fields contain configuration options that can help when investigating issues with the capability.

    Important Only complete these fields on request of or together with Collibra Support.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The connection definitions, where you specify relevant translations for each data source. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <sourceId>.conf file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for DataStage

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.
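
    Before you set a mask, you can preview which files it would select. This is a minimal Python sketch that assumes the mask follows common glob-style matching, which is not confirmed here; the file names and the function name select_files are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names: list, mask: str = "*") -> list:
    """Return the file names that match the mask; the default "*" keeps all."""
    return [name for name in file_names if fnmatchcase(name, mask)]


# Hypothetical directory content.
files = ["job_a.dsx", "job_b.dsx", "notes.txt"]

print(select_files(files))           # default mask, every file is selected
print(select_files(files, "*.dsx"))  # only the files that end in .dsx
```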

    No

    Source Configuration

    The connection definitions, where you specify relevant translations for each data source. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the connection_definitions.conf file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Db2

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the filter, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
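
    If you maintain the list of excluded schemas elsewhere, you can generate the filter clause from that list. This is a minimal Python sketch; the function name views_schema_filter is hypothetical, and the output is only the clause from the example above, which you still have to paste into the Views query yourself.

```python
def views_schema_filter(excluded_schemas: list) -> str:
    """Build the schema filter clause for a Views query from a list of names."""
    # Double any single quote so that a schema name cannot break the literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where v.table_schema not in ({quoted});"


print(views_schema_filter(["pg_catalog", "information_schema"]))
print(views_schema_filter(["pg_catalog", "information_schema", "another_schema"]))
```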

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns
    This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views
    This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
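    The three Processing Level values build on each other: Load only harvests and uploads, Analyze adds analysis, and Sync adds synchronization. The following is a minimal sketch of that relationship, not Collibra code. The names ProcessingLevel and stages_for are hypothetical and exist only for illustration.

```python
from enum import Enum


class ProcessingLevel(Enum):
    # Values mirror the options in the Processing Level drop-down list.
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stage names are illustrative labels, not product identifiers.
_STAGES = ["load", "analyze", "sync"]
_ORDER = [ProcessingLevel.LOAD, ProcessingLevel.ANALYZE, ProcessingLevel.SYNC]


def stages_for(level):
    """Return the stages that run for the given Processing Level."""
    # Each level includes every stage of the levels before it.
    return _STAGES[: _ORDER.index(level) + 1]


for level in ProcessingLevel:
    print(level.value, "->", stages_for(level))
```

    With several data sources, the recommendation above corresponds to using Analyze for each source and triggering the final synchronization once.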

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
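    If you maintain the list of excluded schemas elsewhere, you can generate the filter text instead of typing it. The following is a minimal sketch under that assumption. The function name schema_exclusion_filter is hypothetical, and the column alias v.table_schema is taken from the example above, so it may differ for your data source.

```python
def schema_exclusion_filter(schemas, column="v.table_schema"):
    """Build a `where ... not in (...)` filter for a Views query."""
    if not schemas:
        raise ValueError("at least one schema name is required")
    # Double any single quote so each name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in schemas)
    return f"where {column} not in ({quoted})"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```

    Paste the generated text into the query in place of the default filter, and check that the result still uses supported SQL syntax.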

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your technical lineage capability are older than the previous set of queries (or if you have customized them), they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that into your existing capabilities. You can, then, modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.
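    If you keep capability settings in scripts or notes, the equivalence above can be written as a simple mapping. This is only an illustrative sketch. The function name processing_level_from_analyze_only is hypothetical and is not part of any Collibra API.

```python
def processing_level_from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level value."""
    # Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    return "Analyze" if analyze_only else "Sync"


print(processing_level_from_analyze_only(True))
print(processing_level_from_analyze_only(False))
```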

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.
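    To preview which files a mask would select before you save the capability, you can test the pattern locally. The following sketch assumes that the mask behaves like a shell-style wildcard pattern, which this page does not confirm. The function name files_matching_mask and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match the mask, keeping their order."""
    # Assumption: the mask uses shell-style wildcards such as * and ?.
    return [name for name in file_names if fnmatchcase(name, mask)]


sample = ["orders.sql", "readme.txt", "views.sql"]
print(files_matching_mask(sample))            # default mask keeps every file
print(files_matching_mask(sample, "*.sql"))   # keeps only the .sql files
```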

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.
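    The precedence described in the note can be pictured as a simple fallback. The following is a simplified sketch, not the actual stitching logic. The function name resolve_stitching_names is hypothetical, and it assumes that the database name and the schema name fall back independently of each other.

```python
def resolve_stitching_names(sql_database, sql_schema, field_database, field_schema):
    """Pick the database and schema names used for stitching.

    Names found in the SQL statement take precedence. The Database and
    Schema fields of the capability are used only when the statement
    does not contain the corresponding name (None or empty string here).
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# The statement names both parts, so the capability fields are ignored.
print(resolve_stitching_names("sales_db", "public", "default_db", "default_schema"))
# The statement names neither part, so the capability fields are used.
print(resolve_stitching_names(None, None, "default_db", "default_schema"))
```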

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    dbt Connection

    The dbt connection that you created.

    Yes

    Environment Ids

    The IDs of the environments that Collibra Data Lineage uses to download job artifacts.

    Enter an array of environment IDs, for example [123456, 987654]. This field is required if you do not enter a value for the Admin URL field in the dbt connection.

    If you enter values for both the Admin URL and Environment Ids fields, the Environment Ids field takes precedence.
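    The two rules above (the array format, and the precedence of Environment Ids over Admin URL) can be checked before you paste the values. The following is a minimal sketch of those rules only. The function names and the sample URL are hypothetical and are not part of any Collibra or dbt API.

```python
import json


def parse_environment_ids(text):
    """Parse the Environment Ids value, for example "[123456, 987654]"."""
    ids = json.loads(text)
    is_int_list = isinstance(ids, list) and all(
        isinstance(i, int) and not isinstance(i, bool) for i in ids
    )
    if not is_int_list:
        raise ValueError("Environment Ids must be an array of integer IDs")
    return ids


def effective_dbt_scope(admin_url, environment_ids_text):
    """Return which setting determines the environments to download from."""
    # Environment Ids take precedence when both fields have a value.
    if environment_ids_text.strip():
        return "environment_ids", parse_environment_ids(environment_ids_text)
    if admin_url.strip():
        return "admin_url", admin_url.strip()
    # Environment Ids are required when the Admin URL field is empty.
    raise ValueError("Enter Environment Ids or an Admin URL")


print(effective_dbt_scope("https://dbt.example.invalid", "[123456, 987654]"))
```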

    No

    Source Configuration

    The source configuration to reduce the amount of data objects to be downloaded and enhance the performance of Collibra Data Lineage in the following ways:

    • Filter the projects and jobs to be downloaded. Include projects and jobs to be downloaded by specifying the filter property.
    • Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
    • Map a materialization as a view instead of a table by specifying the materializedMapping property.

    Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <sourceId>.conf file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for dbt Cloud

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    dbt Connection

    The dbt connection that you created.

    Yes

    Environment Ids

    The IDs of the environments that Collibra Data Lineage uses to download job artifacts.

    Enter an array of environment IDs, for example [123456, 987654]. This field is required if you do not enter a value for the Admin URL field in the dbt connection.

    If you enter values for both the Admin URL and Environment Ids fields, the Environment Ids field takes precedence.

    No
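The Environment Ids rules above can be sketched as a small check. This is an illustration only: the function name and the way the two field values are passed in are hypothetical, and it assumes that environment IDs are integers, as in the example [123456, 987654]. The real validation happens in the Edge capability form.

```python
import json


def check_environment_ids(environment_ids_text, admin_url):
    """Return the parsed environment IDs, or None when the Admin URL is used.

    Hypothetical helper that mirrors the documented rules:
    - Environment Ids is required if the dbt connection has no Admin URL.
    - If both are set, Environment Ids takes precedence.
    """
    text = (environment_ids_text or "").strip()
    if not text:
        if not admin_url:
            raise ValueError("Environment Ids is required when Admin URL is empty")
        return None  # fall back to the Admin URL of the dbt connection

    ids = json.loads(text)  # the field expects an array such as [123456, 987654]
    is_int_list = isinstance(ids, list) and all(
        isinstance(i, int) and not isinstance(i, bool) for i in ids
    )
    if not is_int_list:
        raise ValueError("Environment Ids must be an array of integers")
    return ids  # takes precedence even if admin_url is also set
```

For example, check_environment_ids("[123456, 987654]", None) returns the two IDs as a list, while an empty value combined with an empty Admin URL raises an error.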

    Source Configuration

    The source configuration, which reduces the number of data objects to be downloaded and improves the performance of Collibra Data Lineage. You can use it to:

    • Filter the projects and jobs to be downloaded by specifying the filter property.
    • Specify different Collibra system names for different projects by specifying the collibraSystemNames property.
    • Map a materialization as a view instead of a table by specifying the materializedMapping property.

    Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <sourceId>.conf file in this field.

    No
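Because the Source Configuration field takes JSON, it can help to check the content before pasting it. The sketch below only verifies that the text is well-formed JSON and reports which of the three properties named above it sets. It assumes the configuration is a single JSON object, and it does not know or check the structure inside each property. The function name is hypothetical.

```python
import json

# Property names taken from the field description above.
KNOWN_PROPERTIES = ("filter", "collibraSystemNames", "materializedMapping")


def summarize_source_configuration(text):
    """Parse the JSON text and list which documented properties it sets."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(config, dict):
        # Assumption: the field expects one JSON object at the top level.
        raise ValueError("Source Configuration must be a JSON object")
    return [name for name in KNOWN_PROPERTIES if name in config]
```

The empty objects in a call such as summarize_source_configuration('{"filter": {}}') are placeholders, not a valid filter definition.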

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
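If you are moving existing capabilities away from the deprecated Analyze Only option, the equivalence stated above reduces to a one-line mapping. The function below is a hypothetical helper for planning such a migration, not part of any Collibra API.

```python
def processing_level_from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level value.

    Selected Analyze Only  -> "Analyze"
    Cleared Analyze Only   -> "Sync"
    """
    return "Analyze" if analyze_only else "Sync"
```

Note that the Load value has no Analyze Only equivalent, so it never results from this mapping.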

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field    Description    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No
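To preview which files a Mask value would select, you can try the pattern locally. The sketch below assumes the mask follows shell-style wildcard rules, which the default value * suggests but this page does not confirm. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match a shell-style mask (assumed semantics)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


# Hypothetical directory listing.
names = ["manifest.json", "catalog.json", "notes.txt"]
json_only = files_matching_mask(names, "*.json")
everything = files_matching_mask(names)  # default mask "*"
```

With these inputs, json_only holds the two .json names and everything holds all three.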

    Source Configuration

    The source configuration, which reduces the number of data objects to be processed and improves the performance of Collibra Data Lineage.

    Specify the following properties in JSON format and enter the content in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
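The guidance in the Processing Level descriptions can be summarized as a simple rule of thumb. The helper below is hypothetical. It encodes only the recommendation for multiple data sources; for a single data source this page makes no recommendation, so the sketch assumes Sync, which runs all three stages in one job.

```python
def recommended_processing_level(data_source_count):
    """Suggest a Processing Level based on how many data sources you synchronize."""
    if data_source_count < 1:
        raise ValueError("At least one data source is required")
    if data_source_count > 1:
        # Analyze each source, then trigger one synchronization for all of them.
        return "Analyze"
    # Assumption: a single source can load, analyze, and synchronize in one job.
    return "Sync"
```

When the result is Analyze, synchronization still has to be triggered separately, as described under the Analyze value.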

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for dbt

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The source configuration, which reduces the number of data objects to be processed and improves the performance of Collibra Data Lineage.

    Specify the following properties in JSON format and enter the content in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for BigQuery

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Billing ID

    Important This field is currently optional. In a future version of Collibra it will become mandatory.

    The billing ID is a JDBC connection parameter that is required to execute the SQL statements to harvest the metadata. Enter the project ID of any single project for which you want to harvest metadata.

    Tip You can then use the Project ID field to specify all of the other projects from which you want to harvest metadata.

    No

    Project ID

    Use this field to specify (by project ID) the project or projects from which you want to harvest metadata. Leave this field empty if you want to harvest the metadata from all projects for which the service account has permissions.

    Tip Each field can only contain a single project ID. To list multiple project IDs, click Add property, and then add the next project ID.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries (or if you have customized them), they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.
    Query    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Columns Tail

    This query retrieves all columns tails.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes
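The schema filter from the Queries example above can be generated from a list of schema names, which avoids quoting mistakes when the list grows. This is a minimal sketch: the function name is hypothetical, the column alias v.table_schema is taken from the example, and the output is only the filter clause, not a complete default query.

```python
def exclude_schemas_filter(schemas, column="v.table_schema"):
    """Build a 'where ... not in (...)' filter that excludes the given schemas."""
    if not schemas:
        raise ValueError("Provide at least one schema to exclude")
    # Double any single quote so that each name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in schemas)
    return f"where {column} not in ({quoted});"


clause = exclude_schemas_filter(["pg_catalog", "information_schema", "another_schema"])
```

Here clause contains the same text as the extended filter in the example. Remember that customized queries are not updated automatically and are not covered by Collibra Support.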

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field    Description    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Billing ID

    Important This field is currently optional. In a future version of Collibra it will become mandatory.

    The billing ID is a JDBC connection parameter that is required to execute the SQL statements to harvest the metadata. Enter the project ID of any single project for which you want to harvest metadata.

    Tip You can then use the Project ID field to specify all of the other projects from which you want to harvest metadata.

    No

    Project ID

    Use this field to specify (by project ID) the project or projects from which you want to harvest metadata. Leave this field empty if you want to harvest the metadata from all projects for which the service account has permissions.

    Tip Each field can only contain a single project ID. To list multiple project IDs, click Add property, and then add the next project ID.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries (or if you have customized them), they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.
    Query    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Columns Tail

    This query retrieves all columns tails.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
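
The relationship between the three Processing Level values and the deprecated Analyze Only checkbox can be sketched as follows. This is only an illustration of the descriptions above: the names `STAGES` and `processing_level_from_analyze_only` are hypothetical and are not part of any Collibra API.

```python
# Hypothetical illustration of the Processing Level descriptions above.
# Each level includes the stages of the previous level.
STAGES = {
    "Load": ["load"],
    "Analyze": ["load", "analyze"],
    "Sync": ["load", "analyze", "synchronize"],
}


def processing_level_from_analyze_only(analyze_only: bool) -> str:
    """Map the deprecated Analyze Only checkbox to a Processing Level value.

    Selecting Analyze Only is equivalent to "Analyze";
    clearing it is equivalent to "Sync".
    """
    return "Analyze" if analyze_only else "Sync"


if __name__ == "__main__":
    for checkbox in (True, False):
        level = processing_level_from_analyze_only(checkbox)
        print(checkbox, level, STAGES[level])
```

Note that "Load" has no Analyze Only equivalent, so a capability that used the deprecated checkbox never maps to it.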

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
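
To get a feel for how a file-name mask narrows down the files in a directory, the following sketch filters a list of names with Python's standard `fnmatch` module. It assumes that the Mask field uses glob-style wildcards, which this page does not state explicitly; the function name `select_files` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match a glob-style mask.

    Assumption: the mask behaves like a shell glob, where "*" matches
    any sequence of characters. The default "*" therefore keeps all files.
    """
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    files = ["orders.sql", "customers.sql", "readme.txt"]
    print(select_files(files))           # default mask keeps every file
    print(select_files(files, "*.sql"))  # keeps only names ending in .sql
```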

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
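
The precedence rule in the notes for the Database and Schema fields can be sketched as a simple fallback. This is an illustration only, not how Collibra Data Lineage is implemented: the function `names_for_stitching` is hypothetical, and applying the fallback to the database and the schema independently is an assumption.

```python
def names_for_stitching(sql_database, sql_schema, field_database, field_schema):
    """Pick the database and schema names used for stitching.

    Names found in the SQL statements win. The values of the Database
    and Schema fields are used only when the SQL statements do not
    contain a name (represented here by None or an empty string).
    """
    database = sql_database if sql_database else field_database
    schema = sql_schema if sql_schema else field_schema
    return database, schema


if __name__ == "__main__":
    # The SQL statement qualifies the table as sales_db.public.orders.
    print(names_for_stitching("sales_db", "public", "default_db", "default_schema"))
    # The SQL statement uses an unqualified table name.
    print(names_for_stitching(None, None, "default_db", "default_schema"))
```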

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Greenplum

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, add them to the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.

    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes
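
If you maintain the list of excluded schemas outside Collibra, a small helper can generate the filter shown in the Queries example so that the quoting stays consistent. This is a minimal sketch: the function `schema_exclusion_filter` is hypothetical, and the `v.table_schema` column comes from the example above and may differ in your own queries. Paste the resulting text into the query yourself; nothing here connects to Collibra or to your database.

```python
def schema_exclusion_filter(schemas, column="v.table_schema"):
    """Build a "where ... not in (...)" filter that excludes the given schemas.

    Single quotes inside a schema name are doubled so that the
    resulting SQL string literals remain valid.
    """
    if not schemas:
        raise ValueError("Provide at least one schema to exclude.")
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in schemas)
    return f"where {column} not in ({quoted});"


if __name__ == "__main__":
    print(schema_exclusion_filter(["pg_catalog", "information_schema"]))
    print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```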

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, add them to the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.

    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
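As a rough illustration of the Processing Level values described above, the following Python sketch lists the stages that each level runs and counts how many synchronization jobs a batch of data sources would start on its own. The names `ProcessingLevel`, `STAGES`, `stages_for`, and `sync_jobs_started` are hypothetical and are not part of any Collibra API. The sketch only restates the table.

```python
from enum import Enum


class ProcessingLevel(Enum):
    """Hypothetical model of the three Processing Level values."""
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages that each level runs, as described in the table.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def stages_for(level):
    """Return the stages that run for the given processing level."""
    return STAGES[level]


def sync_jobs_started(levels):
    """Count the synchronization jobs started by the sources themselves.

    Each source set to Sync is processed as a separate job. Sources set to
    Load or Analyze start no synchronization job of their own.
    """
    return sum(1 for level in levels if level is ProcessingLevel.SYNC)


if __name__ == "__main__":
    print(stages_for(ProcessingLevel.ANALYZE))
    print(sync_jobs_started([ProcessingLevel.SYNC] * 3))
    print(sync_jobs_started([ProcessingLevel.ANALYZE] * 3))
```

With three sources set to Sync, the sketch counts three separate jobs. With all three set to Analyze, it counts none, which matches the recommendation to analyze each source and then trigger a single synchronization.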

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
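To get a feel for how a file-name mask selects files, here is a minimal Python sketch. It assumes that the Mask field follows shell-style wildcard matching, which this page does not confirm. The function name `files_matching_mask` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match the mask.

    Assumption: the mask uses shell-style wildcards, where * matches any
    sequence of characters. The default mask * matches every file name.
    """
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    names = ["orders.sql", "customers.sql", "readme.txt"]
    print(files_matching_mask(names))           # default mask: all files
    print(files_matching_mask(names, "*.sql"))  # only the .sql files
```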

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
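The precedence rule in the notes for the Database and Schema fields can be sketched as a simple fallback. The function `stitching_names` and its parameters are hypothetical. Treating the database and the schema independently is an assumption, because the note describes them together.

```python
def stitching_names(sql_database, sql_schema, capability_database, capability_schema):
    """Pick the database and schema names used for stitching.

    Names found in the SQL statements take precedence. The values of the
    Database and Schema fields in the capability are used only when the
    SQL statements do not contain a name (represented here as None).
    """
    database = sql_database or capability_database
    schema = sql_schema or capability_schema
    return database, schema


if __name__ == "__main__":
    # The SQL statement names both the database and the schema.
    print(stitching_names("sales_db", "public", "default_db", "default_schema"))
    # The SQL statement names neither, so the capability fields are used.
    print(stitching_names(None, None, "default_db", "default_schema"))
```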

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
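Conceptually, the Database-System mapping behaves like a lookup with a default. The following sketch is only an illustration of that idea. The dictionary is not the input format of the field, and the names `system_for`, `mapping`, and the sample values are hypothetical.

```python
def system_for(database, mapping, default_system):
    """Return the system name for a database.

    Databases listed in the mapping resolve to their own system. Any other
    database falls back to the default Collibra System Name.
    """
    return mapping.get(database, default_system)


if __name__ == "__main__":
    # Hypothetical mapping: two databases that live on different systems.
    mapping = {"finance_db": "finance-server", "hr_db": "hr-server"}
    print(system_for("finance_db", mapping, "default-system"))
    print(system_for("marketing_db", mapping, "default-system"))
```

Without a mapping, every database resolves to the default system name, which is the situation that leads to missing stitching.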

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
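If you are migrating existing configurations, the equivalence between the deprecated Analyze Only checkbox and the Processing Level values can be expressed in a few lines. This is a minimal sketch. The function name `processing_level_from_analyze_only` is hypothetical and is not a Collibra API.

```python
def processing_level_from_analyze_only(analyze_only):
    """Translate the deprecated Analyze Only checkbox to a Processing Level.

    Selecting Analyze Only is equivalent to "Analyze".
    Clearing Analyze Only is equivalent to "Sync".
    """
    return "Analyze" if analyze_only else "Sync"


if __name__ == "__main__":
    print(processing_level_from_analyze_only(True))
    print(processing_level_from_analyze_only(False))
```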

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map each database to the system it belongs to, so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Hive

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No
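The role of the External Database Name in the full path can be sketched as follows. The function `full_path` and the sample names are hypothetical. Defaulting to CData reflects the case where no custom name was set through the CustomizedDefaultCatalogName property.

```python
def full_path(system, schema, table, external_database_name="CData"):
    """Build the full path: system -> database -> schema -> table.

    For a database-less data source, the database segment comes from the
    External Database Name. CData is the placeholder value unless a custom
    database name was specified when the data source was registered.
    """
    return " -> ".join([system, external_database_name, schema, table])


if __name__ == "__main__":
    print(full_path("hive-prod", "sales", "orders"))
    print(full_path("hive-prod", "sales", "orders", external_database_name="my_catalog"))
```

For stitching to succeed, the database segment has to match the database name that was used when the data source was registered.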

    Database Name

    The name of the database or schema (these terms are synonymous for Hive) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Object Names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes
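To see the effect of the schema filter from the example in the Queries row, you can try it against mock data. The following Python sketch uses an in-memory SQLite table as a stand-in for a view catalog. The table name `views`, its rows, and the function `list_views` are all hypothetical, and the SQL dialect of your data source may differ.

```python
import sqlite3

# Mock rows standing in for a data source's view catalog (hypothetical data).
MOCK_VIEWS = [
    ("pg_catalog", "pg_views"),
    ("information_schema", "tables"),
    ("sales", "v_orders"),
    ("another_schema", "v_tmp"),
]


def list_views(where_clause):
    """Run a Views-style query with the given filter against the mock catalog."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table views (table_schema text, table_name text)")
        conn.executemany("insert into views values (?, ?)", MOCK_VIEWS)
        sql = (
            "select v.table_schema, v.table_name from views v "
            + where_clause
            + " order by v.table_name"
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


default_filter = "where v.table_schema not in ('pg_catalog', 'information_schema')"
extended_filter = (
    "where v.table_schema not in "
    "('pg_catalog', 'information_schema', 'another_schema')"
)

if __name__ == "__main__":
    print(list_views(default_filter))
    print(list_views(extended_filter))
```

The first filter keeps the views in sales and another_schema. The extended filter keeps only the view in sales.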

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No

    Database Name

    The name of the database or schema (these terms are synonymous for Hive) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Object Names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select "Analyze" in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
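The three Processing Level values can be read as cumulative stages. The following is a minimal sketch of that reading, not Collibra code: the `ProcessingLevel` enum, the stage names, and both helper functions are hypothetical names introduced only for illustration.

```python
from enum import Enum


class ProcessingLevel(Enum):
    # Hypothetical enum mirroring the three values in the drop-down list.
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages that each Processing Level value runs, per the descriptions above.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def stages_for(level):
    """Return the ordered stages that run for one data source."""
    return STAGES[level]


def sync_jobs_started(levels):
    """Count the synchronization jobs that the capabilities start themselves.

    Each data source set to Sync starts (or queues) its own job. Sources set
    to Load or Analyze start none and wait for a later synchronization.
    """
    return sum(1 for level in levels if level is ProcessingLevel.SYNC)


if __name__ == "__main__":
    # Three sources all set to Sync: one synchronization job per source.
    print(sync_jobs_started([ProcessingLevel.SYNC] * 3))
    # Three sources set to Analyze: no job until synchronization is triggered.
    print(sync_jobs_started([ProcessingLevel.ANALYZE] * 3))
```

Under this model, setting every data source to Analyze leaves synchronization to a single, separately triggered job, which is the pattern the Important note for the Analyze value recommends.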

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
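To get a feel for how a file name pattern such as the default * selects files, the following sketch uses Python's standard `fnmatch` module. It assumes glob-style matching; this page does not state the exact matching rules that the Mask field applies, and the file names and the `select_files` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match a glob-style mask (assumed semantics)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    # Hypothetical directory content.
    files = ["orders.sql", "customers.sql", "README.txt"]
    print(select_files(files))           # default mask: every file matches
    print(select_files(files, "*.sql"))  # only the SQL files match
```

You can use a check like this locally to preview which files in your directory a given mask would pick up before you enter it in the field.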

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
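The precedence rule in the Database and Schema notes can be sketched as a simple fallback. This is an illustration only, not how Collibra Data Lineage is implemented: the `resolve_names` function and the sample names are hypothetical, and applying the fallback to each name independently is an assumption.

```python
def resolve_names(sql_database, sql_schema, field_database, field_schema):
    """Pick the database and schema names that are used for stitching.

    Names found in the SQL statement win. The values of the Database and
    Schema fields are used only when the statement does not name them.
    Falling back per name (rather than for both together) is an assumption.
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


if __name__ == "__main__":
    # Statement names both parts: the field values are ignored.
    print(resolve_names("SALES_DB", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA"))
    # Statement names neither part: the field values are used.
    print(resolve_names(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA"))
```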

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select "Analyze" in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select "Analyze" in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    IICS connection

    The Informatica Intelligent Cloud Services (IICS) connection that you created.

    Note Collibra Data Intelligence Platform 2023.03 or newer is required to use the Informatica Intelligent Cloud Services (IICS) connection.

    Yes

    Objects

    The objects that you want to retrieve.

    Each object requires a path and a type, where:

    path
    The path to the object, which is relative to the Explore directory in IICS, for example, Sales.
    type

    The type of the object, for example, Taskflow.

    The IICS scanner's starting point is a Taskflow or Linear Taskflow (Workflow). Therefore, the only meaningful types to retrieve are Taskflow, Workflow, Project, and Folder.

    The types are not case sensitive.

    Tip For more information about the objects that you can retrieve and the required information, go to the Informatica documentation.

    Yes
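As an illustration of the path and type pairing described for the Objects field, the following sketch builds a list of objects and checks each type against the four meaningful types, ignoring case. The JSON layout (an array of entries with "path" and "type" keys), the `make_object` helper, and the "Sales/Archive" path are assumptions made for this example; refer to the Informatica documentation for the authoritative format.

```python
import json

# The four types named above as meaningful starting points, in lowercase
# because the types are not case sensitive.
MEANINGFUL_TYPES = {"taskflow", "workflow", "project", "folder"}


def make_object(path, object_type):
    """Build one object entry after a case-insensitive type check."""
    if object_type.lower() not in MEANINGFUL_TYPES:
        raise ValueError(f"Type {object_type!r} is not a meaningful type to retrieve")
    return {"path": path, "type": object_type}


if __name__ == "__main__":
    objects = [
        make_object("Sales", "Taskflow"),
        make_object("Sales/Archive", "folder"),  # hypothetical path
    ]
    # Assumed layout: a JSON array of path/type pairs.
    print(json.dumps(objects, indent=2))
```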

    Parameter Files

    Upload a ZIP file that contains Informatica Intelligent Cloud Services parameter files. You can name the ZIP file as you prefer. Ensure that the ZIP file contains all parameter files that you want Collibra Data Lineage to collect.

    No

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select "Analyze" in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for Informatica Intelligent Cloud Services (IICS)

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    IICS connection

    The Informatica Intelligent Cloud Services (IICS) connection that you created.

    Note Collibra Data Intelligence Platform 2023.03 or newer is required to use the Informatica Intelligent Cloud Services (IICS) connection.

    No

    Objects

    The objects that you want to retrieve.

    Each object requires a path and a type, where:

    path
    The path to the object, which is relative to the Explore directory in IICS, for example, Sales.
    type

    The type of the object, for example, Taskflow.

    The IICS scanner's starting point is a Taskflow or Linear Taskflow (Workflow). Therefore, the only meaningful types to retrieve are Taskflow, Workflow, Project, and Folder.

    The types are not case sensitive.

    Tip For more information about the objects that you can retrieve and the required information, go to the Informatica documentation.

    Yes

    Parameter Files

    Upload a ZIP file that contains Informatica Intelligent Cloud Services parameter files. You can name the ZIP file as you prefer. Ensure that the ZIP file contains all parameter files that you want Collibra Data Lineage to collect.

    No

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
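The dialect restriction for Dependent On Sources can be expressed as a small pre-check. This is a sketch under stated assumptions, not product behavior: the function name, the lowercase dialect identifiers, and the sample column names are hypothetical, and "contains lowercase column names" is interpreted here as any column name that has at least one lowercase character.

```python
# Dialects for which the feature works even with lowercase column names,
# as listed in the Important note above (identifiers are hypothetical).
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def dependent_sources_supported(dialect, column_names):
    """Return True if Dependent On Sources is expected to work.

    The feature is restricted only when the dependent source has lowercase
    column names and the dialect is not one of the three listed dialects.
    """
    has_lowercase = any(
        any(char.islower() for char in name) for name in column_names
    )
    return (not has_lowercase) or dialect.lower() in LOWERCASE_SAFE_DIALECTS


if __name__ == "__main__":
    print(dependent_sources_supported("Snowflake", ["customer_id"]))
    print(dependent_sources_supported("postgres", ["customer_id"]))
    print(dependent_sources_supported("postgres", ["CUSTOMER_ID"]))
```

When a check like this returns False, the note above applies: expect an analyze error, and consolidate the SQL statements and the DDL file in a single data source.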

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select "Analyze" in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No
    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    If the connection definitions are provided but certain properties are not specified, an analyze error called CONFIGURATION is displayed in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about the analyze errors, go to Analyze errors and possible solutions in Technical lineage Sources tab page.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No
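Because unspecified properties in a connection definition surface as UNDEFINED in a CONFIGURATION analyze error, it can help to check a definition before you paste it into the Source Configuration field. The following sketch shows one way to do that locally. The property names "dbname" and "schema", the connection name, and both helper functions are hypothetical; this page does not list the properties that Collibra Data Lineage expects.

```python
UNDEFINED = "UNDEFINED"


def missing_properties(connection_definition, expected_properties):
    """List the expected properties that the definition does not specify."""
    return [name for name in expected_properties if name not in connection_definition]


def mark_undefined(connection_definition, expected_properties):
    """Return the expected properties, with missing ones marked UNDEFINED."""
    return {
        name: connection_definition.get(name, UNDEFINED)
        for name in expected_properties
    }


if __name__ == "__main__":
    # Hypothetical property names and connection definition.
    expected = ["dbname", "schema"]
    definition = {"dbname": "SALES_DB"}
    print(missing_properties(definition, expected))
    print(mark_undefined(definition, expected))
```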

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
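
    The equivalence between the deprecated checkbox and the Processing Level setting can be written as a small mapping. This is an illustrative sketch only; the function name is hypothetical and is not part of any Collibra API.

```python
def processing_level_from_analyze_only(analyze_only: bool) -> str:
    """Translate the deprecated Analyze Only checkbox to a Processing Level.

    Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    """
    return "Analyze" if analyze_only else "Sync"


selected = processing_level_from_analyze_only(True)
cleared = processing_level_from_analyze_only(False)
```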

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Informatica PowerCenter

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    If the connection definitions are provided but certain properties are not specified, an analyze error called CONFIGURATION is displayed in the transformations table on the Sources tab page when the technical lineage is created. The unspecified properties are marked as UNDEFINED in the analyze error. For more information about the analyze errors, go to Analyze errors and possible solutions in Technical lineage Sources tab page.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
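
    The lowercase-column restriction can be expressed as a quick pre-check. This is a minimal sketch, assuming that "contains lowercase column names" means any column name with at least one lowercase character; the function and constant names are hypothetical.

```python
# Dialects for which the page says lowercase column names are supported.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def needs_ddl_workaround(dialect, column_names):
    """Return True if an analyze error asking for the DDL file is expected.

    Assumption: a column name counts as lowercase if it contains at
    least one lowercase character.
    """
    has_lowercase = any(
        any(ch.islower() for ch in name) for name in column_names
    )
    return has_lowercase and dialect.lower() not in LOWERCASE_SAFE_DIALECTS


postgres_lower = needs_ddl_workaround("PostgreSQL", ["customer_id"])
snowflake_lower = needs_ddl_workaround("Snowflake", ["customer_id"])
postgres_upper = needs_ddl_workaround("PostgreSQL", ["CUSTOMER_ID"])
```

    When the check returns True, the workaround described above applies: consolidate the SQL statements and the DDL file in a single data source.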

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
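
    The three processing levels build on each other, which the following sketch makes explicit. It is only a model of the descriptions above, not Collibra code; the stage names and the function are hypothetical.

```python
# Stages that each Processing Level runs, as described in the table above.
STAGES = {
    "Load": ("load",),
    "Analyze": ("load", "analyze"),
    "Sync": ("load", "analyze", "synchronize"),
}


def sync_jobs_started(levels):
    """Count the synchronization jobs started directly by the capabilities.

    Each capability set to "Sync" starts (or queues) its own job.
    """
    return sum(1 for level in levels if "synchronize" in STAGES[level])


# Three data sources, each configured with "Sync": three separate jobs.
separate_jobs = sync_jobs_started(["Sync", "Sync", "Sync"])

# Three data sources configured with "Analyze": no job starts on its own,
# so a single synchronization can be triggered afterwards for all of them.
deferred_jobs = sync_jobs_started(["Analyze", "Analyze", "Analyze"])
```

    This illustrates why "Analyze" is the recommended level when you have multiple data sources: synchronization is deferred and can then run once for all of them.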

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for Looker

    Yes

    Main Properties

    The required information for creating a technical lineage.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Looker instance. Ingesting the same Looker instance under different source IDs will fail.
    • Any single Looker instance can be ingested only once. If you create more than one connection for the same Looker instance, integration will fail. If you want to ingest from multiple unique Looker instances, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes
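
    Before you add capabilities for several Looker instances, you can check a planned setup for the conflict described in the warnings. This is a minimal sketch; the pairs of source ID and instance URL are hypothetical planning data, not something that Collibra exposes in this form.

```python
from collections import defaultdict


def find_conflicts(capabilities):
    """Find Looker instances that are planned under more than one source ID.

    capabilities: iterable of (source_id, looker_instance) pairs.
    Returns a dict that maps each conflicting instance to its source IDs.
    """
    by_instance = defaultdict(set)
    for source_id, instance in capabilities:
        by_instance[instance].add(source_id)
    return {
        instance: sorted(ids)
        for instance, ids in by_instance.items()
        if len(ids) > 1
    }


# Hypothetical plan: the first instance is listed under two source IDs.
plan = [
    ("looker_prod", "https://a.example.com"),
    ("looker_copy", "https://a.example.com"),
    ("looker_dev", "https://b.example.com"),
]
conflicts = find_conflicts(plan)
```

    An empty result means that each Looker instance is planned under exactly one source ID.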

    Looker connection

    The Looker connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Looker.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Looker assets.

    Yes
    Paging limit

    Optional property for customizing the Looker API pagination settings. The default value of "50" is sufficient in most cases; however, you can decrease it to help mitigate node limit errors, or increase it to speed up API calls.

    Note The paging limit option is known to cause issues when used with Looker Core instances. If you experience issues, for example a Received RST_STREAM: Protocol error, we recommend disabling pagination by setting the value to "0".

    No
    Concurrency level

    This optional property is intended to help if you are experiencing HTTP 401 Unauthorized errors caused by too many concurrent HTTP calls that use the same token. It allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is "15", meaning as many as 15 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to "1" effectively disables concurrency, so that HTTP requests run one after another instead of in parallel.

    No
    Connection timeout

    This optional property is intended to help avoid timeout errors when Edge attempts to connect to your Looker instance. The default value is "30", meaning a timeout error is thrown if a connection is not established within 30 seconds.

    If timeout errors persist, try setting the value to "60" or "90".

    No
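
    The three optional tuning properties and their defaults can be summarized in one structure. This is a sketch only: the class and attribute names are hypothetical, and the validation bounds are assumptions inferred from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookerTuning:
    """Hypothetical container for the optional Looker tuning properties."""

    paging_limit: int = 50        # 0 disables pagination
    concurrency_level: int = 15   # 1 means requests run one at a time
    connection_timeout: int = 30  # seconds

    def __post_init__(self):
        # Assumed bounds; the page does not state formal limits.
        if self.paging_limit < 0:
            raise ValueError("paging_limit must be 0 (disabled) or positive")
        if self.concurrency_level < 1:
            raise ValueError("concurrency_level must be at least 1")
        if self.connection_timeout <= 0:
            raise ValueError("connection_timeout must be positive")


defaults = LookerTuning()
# Settings in line with the note about Looker Core and persistent timeouts.
looker_core = LookerTuning(paging_limit=0, connection_timeout=60)
```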
    Source configuration

    This field allows you to provide JSON code to:

    • Filter on the Looker folders from which you want to ingest metadata.
    • Specify the system name of databases in Looker by using the collibraSystemName property. This applies if useCollibraSystemName is set to true in the lineage harvester configuration file.
      Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.

    If you previously integrated Looker via the lineage harvester, you can paste the JSON code from your Looker <source ID> configuration file into this field.

    Example 
    No

    Custom Properties

    This section identifies to which Collibra Data Lineage service instance you want to upload the Looker metadata.

    Warning This applies only for Collibra Cloud for Government customers.

    techlinKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only for Collibra Cloud for Government customers.

    No

    techlinHost

    The URL of the Collibra Data Lineage service instance to which you want to upload Looker metadata, for example "techlin-gcp-eu.collibra.com".

    Warning This applies only for Collibra Cloud for Government customers.

    No

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Looker instance. Ingesting the same Looker instance under different source IDs will fail.
    • Any single Looker instance can be ingested only once. If you create more than one connection for the same Looker instance, integration will fail. If you want to ingest from multiple unique Looker instances, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    We highly recommend that you specify only one source ID per Looker service account.

    Yes

    Looker connection

    The Looker connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Looker.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Looker assets.

    This is the default domain.

    If you want to ingest the contents of specific Looker Folders into specific domains in Collibra, you specify the domain reference IDs in the filters section of your source configuration. See the Source Configuration field below.

    Yes
    Paging limit

    Optional property for customizing the Looker API pagination settings. The default value of "50" is sufficient in most cases; however, you can decrease it to help mitigate node limit errors, or increase it to speed up API calls.

    Note The paging limit option is known to cause issues when used with Looker Core instances. If you experience issues, for example a Received RST_STREAM: Protocol error, we recommend disabling pagination by setting the value to "0".

    No

    Concurrency level

    This optional property is intended to help if you are experiencing HTTP 401 Unauthorized errors caused by too many concurrent HTTP calls that use the same token. It allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is "15", meaning as many as 15 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to "1" effectively disables concurrency, so that HTTP requests run one after another instead of in parallel.

    No
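
    What a concurrency level means in practice can be illustrated with a bounded worker pool. This is a generic sketch of the idea and not how Edge is implemented; the function names are hypothetical and the fetch callable stands in for an HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, items, concurrency_level=15):
    """Run fetch(item) for every item with at most concurrency_level workers.

    With concurrency_level=1 the calls run one after another.
    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=concurrency_level) as pool:
        return list(pool.map(fetch, items))


def fake_request(dashboard_id):
    # Stand-in for an HTTP call; returns a made-up response label.
    return f"dashboard-{dashboard_id}"


parallel = fetch_all(fake_request, [1, 2, 3])
sequential = fetch_all(fake_request, [1, 2, 3], concurrency_level=1)
```

    Both calls return the same results; lowering the level only reduces how many requests are in flight at once, which is the lever against HTTP 401 Unauthorized errors described above.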
    Connection timeout

    This optional property is intended to help avoid timeout errors when Edge attempts to connect to your Looker instance. The default value is "30", meaning a timeout error is thrown if a connection is not established within 30 seconds.

    If timeout errors persist, try setting the value to "60" or "90".

    No

    Source configuration

    This field allows you to provide JSON code to:

    • Filter on the Looker folders from which you want to ingest metadata.
    • Specify the system name of databases in Looker by using the collibraSystemName property. This applies if useCollibraSystemName is set to true in the lineage harvester configuration file.
      Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog.

    If you previously integrated Looker via the lineage harvester, you can paste the JSON code from your Looker <source ID> configuration file into this field.

    Example 
    No
    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for SAP Analytics Cloud

    Yes

    Main Properties

    The required information for creating a technical lineage.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per SAP Analytics Cloud system. Ingesting the same SAP Analytics Cloud system under different source IDs will fail.
    • Any single SAP Analytics Cloud system can be ingested only once. If you create more than one connection for the same SAP Analytics Cloud system, integration will fail. If you want to ingest from multiple unique SAP Analytics Cloud systems, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Yes

    SAP Datasphere connection

    The SAP Datasphere Catalog connection that you created for ingestion in Collibra Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to SAP Datasphere Catalog.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the SAP Analytics Cloud assets.

    This is the default domain.

    If you want to ingest the contents of specific SAP Analytics Cloud systems, you must also specify the same domain reference ID in the filters section of your source configuration. See the Source Configuration field below.

    Yes
    SAP Analytics Cloud tenant ID

    If you have multiple SAP Analytics Cloud systems, but you only want to ingest metadata from one of them, this optional field allows you to specify the system from which you want to ingest. If you leave this field empty, Edge ingests the metadata from all SAP Analytics Cloud systems.

    Note Collibra Data Lineage can only ingest assets in SAP Datasphere that are published. For information on how to publish SAP Analytics Cloud assets in SAP Datasphere Catalog, see Set up SAP Analytics Cloud.

    No
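
    The effect of the tenant ID field can be sketched as a simple selection rule. This is only an illustration of the behavior described above; the function name and tenant IDs are hypothetical.

```python
def systems_to_ingest(available_tenant_ids, tenant_id=""):
    """Return the SAP Analytics Cloud systems that would be ingested.

    An empty tenant_id means all systems; otherwise only the matching one.
    """
    if not tenant_id:
        return list(available_tenant_ids)
    return [t for t in available_tenant_ids if t == tenant_id]


tenants = ["tenant-a", "tenant-b"]  # hypothetical tenant IDs
everything = systems_to_ingest(tenants)
only_b = systems_to_ingest(tenants, "tenant-b")
```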
    Source Configuration

    This field allows you to provide JSON code to specify the SAP Analytics Cloud containers and folders from which you want to ingest metadata.

    Example 
    No

    Custom Properties

    This section identifies to which Collibra Data Lineage service instance you want to upload the SAP Analytics Cloud metadata.

    Warning This applies only for Collibra Cloud for Government customers.

    techlinKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only for Collibra Cloud for Government customers.

    No

    techlinHost

    The URL of the Collibra Data Lineage service instance to which you want to upload SAP Analytics Cloud metadata, for example "techlin-gcp-eu.collibra.com".

    Warning This applies only for Collibra Cloud for Government customers.

    No

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
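The equivalence between the deprecated Analyze Only checkbox and the Processing Level setting can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical and is not part of any Collibra API.

```python
def processing_level_from_analyze_only(analyze_only: bool) -> str:
    """Return the Processing Level value that matches a legacy Analyze Only choice.

    Hypothetical helper for illustration only.
    """
    # Selecting Analyze Only is equivalent to "Analyze";
    # clearing it is equivalent to "Sync".
    return "Analyze" if analyze_only else "Sync"


print(processing_level_from_analyze_only(True))   # Analyze
print(processing_level_from_analyze_only(False))  # Sync
```

Note that the "Load" value has no Analyze Only equivalent, so it does not appear in this mapping.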

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
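One way to read the three Processing Level values is as cumulative stages: each level runs everything the previous level runs, plus one more step. The sketch below models that reading. The stage names and helper functions are assumptions made for illustration; they are not Collibra identifiers.

```python
from typing import Tuple

# Cumulative stages per Processing Level, as described in the table above.
# The stage labels are hypothetical.
STAGES_BY_LEVEL = {
    "Load": ("load",),
    "Analyze": ("load", "analyze"),
    "Sync": ("load", "analyze", "synchronize"),
}


def stages_for(level: str) -> Tuple[str, ...]:
    """Return the stages that run for a given Processing Level value."""
    if level not in STAGES_BY_LEVEL:
        raise ValueError(f"Unknown processing level: {level!r}")
    return STAGES_BY_LEVEL[level]


def starts_synchronization(level: str) -> bool:
    """True only for the level that triggers synchronization after analysis."""
    return "synchronize" in stages_for(level)


for level in ("Load", "Analyze", "Sync"):
    print(level, stages_for(level), starts_synchronization(level))
```

Only "Sync" reports that synchronization starts; for "Load" and "Analyze", synchronization has to be triggered separately.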

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Containers page size

    These options allow you to customize the size of data returned in API calls. You can customize the containers page size, the assets page size, and the transformations page size.

    For each option, the default value is 100 records per page. This is sufficient in most cases; however, you can decrease the value to help mitigate timeout errors, or increase it to speed up API calls.

    If you are experiencing timeout errors, we recommend reducing the value of each of these options to 75 or 50.

    No
    Assets page size
    No
    Transformations page size
    No
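The page size guidance above (default 100, reduce to 75 or 50 when timeouts occur) can be sketched as a simple step-down rule. The helper names and the dictionary keys are hypothetical; in practice you enter the values directly in the three page size fields.

```python
DEFAULT_PAGE_SIZE = 100
# Smaller values suggested above for mitigating timeout errors.
FALLBACK_PAGE_SIZES = (75, 50)


def next_page_size(current: int) -> int:
    """Return the next smaller suggested page size, or the current value
    if no smaller suggestion exists."""
    for size in FALLBACK_PAGE_SIZES:
        if size < current:
            return size
    return current


def reduce_all(page_sizes: dict) -> dict:
    """Step down every page size setting together, since the recommendation
    applies to each of the three options."""
    return {name: next_page_size(value) for name, value in page_sizes.items()}


settings = {
    "containers": DEFAULT_PAGE_SIZE,
    "assets": DEFAULT_PAGE_SIZE,
    "transformations": DEFAULT_PAGE_SIZE,
}
print(reduce_all(settings))
```

Applying `reduce_all` once moves all three settings from 100 to 75; applying it again moves them to 50, after which the values stay unchanged.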

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No
    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per SAP Analytics Cloud system. Ingesting the same SAP Analytics Cloud system under different source IDs will fail.
    • Any single SAP Analytics Cloud system can be ingested only once. If you create more than one connection for the same SAP Analytics Cloud system, integration will fail. If you want to ingest from multiple unique SAP Analytics Cloud systems, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    We highly recommend that you specify only one source ID per SAP Analytics Cloud system.

    Yes

    SAP Datasphere connection

    The SAP Datasphere Catalog connection that you created for ingestion in Collibra Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to SAP Datasphere Catalog.

    Yes

    Domain ID

    Warning 
    • This field is deprecated and should be left empty.
    • You need to specify the relevant domain ID in the Domain field in the Integrations page of the Data Catalog UI.
    • If you correctly specify one or more domains in the Data Catalog UI, but also specify a different domain in this field, integration will fail.
    Yes

    SAP Analytics Cloud tenant ID

    If you have multiple SAP Analytics Cloud systems, but you only want to ingest metadata from one of them, this optional field allows you to specify the system from which you want to integrate. If you leave this field empty, Edge ingests the metadata from all SAP Analytics Cloud systems.

    Note Collibra Data Lineage can only ingest assets in SAP Datasphere that are published. For information on how to publish SAP Analytics Cloud assets in SAP Datasphere Catalog, see Set up SAP Analytics Cloud.

    No

    Source configuration

    This field is no longer relevant and should be left empty.

    If you want to ingest specific containers or folders into specific domains in Collibra, you need to configure filtering in the Integrations Configuration tab in Data Catalog. See the section "Synchronize your technical lineage" in this topic.

    No
    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
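The recommendation for multiple data sources can be illustrated with a simplified count of synchronization jobs. The model below assumes, based on the descriptions above, that every capability set to "Sync" starts its own synchronization job and that a capability set to "Analyze" or "Load" starts none. The function name is hypothetical.

```python
def count_sync_jobs(levels):
    """Count the synchronization jobs that a set of capabilities would start.

    Simplified model: each capability with Processing Level "Sync" starts
    (or queues) one synchronization job after its analysis completes.
    """
    return sum(1 for level in levels if level == "Sync")


# Three data sources, all set to "Sync": three separate jobs.
print(count_sync_jobs(["Sync", "Sync", "Sync"]))        # 3

# Two sources set to "Analyze" and one set to "Sync": a single job.
print(count_sync_jobs(["Analyze", "Analyze", "Sync"]))  # 1

# All set to "Analyze": no job starts until you trigger one yourself.
print(count_sync_jobs(["Analyze", "Analyze", "Analyze"]))  # 0
```

In the last case, synchronization is triggered separately, for example through the Technical Lineage Admin Edge capability.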

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Containers page size

    These options allow you to customize the size of data returned in API calls. You can customize the containers page size, the assets page size, and the transformations page size.

    For each option, the default value is 100 records per page. This is sufficient in most cases; however, you can decrease the value to help mitigate timeout errors, or increase it to speed up API calls.

    If you are experiencing timeout errors, we recommend reducing the value of each of these options to 75 or 50.

    No
    Assets page size
    No
    Transforms page size
    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for SSRS/PBRS

    Yes

    Main Properties

    The required information for creating a technical lineage.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per SQL Server Reporting Service (SSRS) or Power BI Report Server (PBRS). Ingesting the same SQL Server Reporting Service (SSRS) or Power BI Report Server (PBRS) under different source IDs will fail.
    • Any single SSRS or PBRS can be ingested only once. If you create more than one connection for the same SSRS or PBRS, integration will fail. If you want to ingest from multiple unique SSRS or PBRS, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes
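Before you create several capabilities, it can help to check your planned source IDs against the rules in the warnings above. The following sketch assumes you keep your own list of (system, source ID) pairs; the function name and the sample values are hypothetical. It only checks the source ID rules and does not detect the same system being ingested through more than one connection.

```python
from collections import defaultdict


def find_source_id_conflicts(capabilities):
    """Return a list of problems found in (system, source_id) pairs.

    Flags a system that has more than one source ID, and a source ID
    that is reused for more than one system.
    """
    ids_by_system = defaultdict(set)
    systems_by_id = defaultdict(set)
    for system, source_id in capabilities:
        ids_by_system[system].add(source_id)
        systems_by_id[source_id].add(system)

    problems = []
    for system, ids in sorted(ids_by_system.items()):
        if len(ids) > 1:
            problems.append(
                f"{system} has multiple source IDs: {', '.join(sorted(ids))}"
            )
    for source_id, systems in sorted(systems_by_id.items()):
        if len(systems) > 1:
            problems.append(
                f"source ID {source_id} is shared by: {', '.join(sorted(systems))}"
            )
    return problems


planned = [("ssrs-prod", "ssrs_prod"), ("pbrs-eu", "pbrs_eu")]
print(find_source_id_conflicts(planned))  # []
```

An empty list means that each system has exactly one source ID and no source ID is reused.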

    Microsoft SSRS/PBRS connection

    The Microsoft SSRS/PBRS connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to SSRS-PBRS.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the SSRS assets.

    Yes
    Folder Filter

    This field allows you to include only specific folders that contain reports or KPIs in the ingestion process.

    Important This field is mandatory. If you want to ingest all folders, enter *.

    You can filter on multiple folders by:

    • Specifying folder names.
    • Specifying the full path to folders.
    • Using a wildcard.
    • Using a combination of these approaches. For example: folder1, /database/folder2, /folder3/*

    Tip For more information about connecting to a SSRS or PBRS folder, see the Microsoft documentation.

    Yes
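To make the filter syntax more concrete, the sketch below parses a comma-separated Folder Filter value and tests folder paths against it. The matching rules are assumptions made for illustration: a bare name is compared with the last segment of the path, and a value that starts with / is compared with the full path, with * matching any characters. The actual matching performed during ingestion may differ.

```python
from fnmatch import fnmatchcase


def parse_folder_filter(raw: str):
    """Split a comma-separated filter value into trimmed patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def folder_matches(path: str, patterns) -> bool:
    """Return True if a folder path matches any pattern (assumed semantics)."""
    segments = [segment for segment in path.split("/") if segment]
    last_segment = segments[-1] if segments else ""
    for pattern in patterns:
        if pattern.startswith("/"):
            # Full path pattern, optionally with a wildcard.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(last_segment, pattern):
            # Bare folder name (or the lone wildcard *).
            return True
    return False


patterns = parse_folder_filter("folder1, /database/folder2, /folder3/*")
for path in ("/reports/folder1", "/database/folder2", "/folder3/sales", "/other/x"):
    print(path, folder_matches(path, patterns))
```

With these assumed rules, the first three paths match and /other/x does not. A filter value of * matches every folder.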

    Source configuration

    This field allows you to provide <source ID> configuration file JSON code.

    The <source ID> configuration file allows you to:
    • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in SSRS and PBRS.
    • Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all information to process the SQL source code correctly.

    If you previously integrated SSRS-PBRS via the lineage harvester, you can copy and paste in this field the JSON code from your SSRS-PBRS <source ID> configuration file.

    Example 
    No

    Custom Properties

    This section identifies to which Collibra Data Lineage service instance you want to upload the SSRS-PBRS metadata.

    Warning This applies only for Collibra Cloud for Government customers.

    techlinKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only for Collibra Cloud for Government customers.

    No

    techlinHost

    The URL of the Collibra Data Lineage service instance to which you want to upload SSRS-PBRS metadata, for example techlin-gcp-eu.collibra.com.

    Warning This applies only for Collibra Cloud for Government customers.

    No

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No
    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per SQL Server Reporting Service (SSRS) or Power BI Report Server (PBRS). Ingesting the same SQL Server Reporting Service (SSRS) or Power BI Report Server (PBRS) under different source IDs will fail.
    • Any single SSRS or PBRS can be ingested only once. If you create more than one connection for the same SSRS or PBRS, integration will fail. If you want to ingest from multiple unique SSRS or PBRS, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    We highly recommend that you specify only one source ID per SSRS or PBRS service account.

    Yes

    Microsoft SSRS/PBRS connection

    The Microsoft SSRS/PBRS connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to SSRS-PBRS.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the SSRS assets.

    Yes
    Folder Filter

    This field allows you to include only specific folders that contain reports or KPIs in the ingestion process.

    Important This field is mandatory. If you want to ingest all folders, enter *.

    You can filter on multiple folders by:

    • Specifying folder names.
    • Specifying the full path to folders.
    • Using a wildcard.
    • Using a combination of these approaches. For example: folder1, /database/folder2, /folder3/*

    Tip For more information about connecting to a SSRS or PBRS folder, see the Microsoft documentation.

    Yes

    Source configuration

    This field allows you to provide <source ID> configuration file JSON code.

    The <source ID> configuration file allows you to:
    • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in SSRS and PBRS.
    • Provide additional information about databases in SSRS and PBRS, which is necessary if the databases do not contain all information to process the SQL source code correctly.

    If you previously integrated SSRS-PBRS via the lineage harvester, you can copy and paste in this field the JSON code from your SSRS-PBRS <source ID> configuration file.

    Example 
    No
    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Matillion connection

    The Matillion connection that you created.

    Note Collibra Data Intelligence Platform 2023.03 or newer is required to use the Matillion connection.

    No

    Group Name

    The name of your group in Matillion.

    Yes

    Project Name

    The name of your project in Matillion.

    You can only add the name of one project. If you want to create a technical lineage for other projects, add a separate Technical Lineage for Matillion capability for each project.

    Note Each capability requires a separate Matillion connection.

    Yes

    Environment Name

    The name of your environment in Matillion.

    You can only add the name of one environment. If you want to create a technical lineage for other environments, add a separate Technical Lineage for Matillion capability for each environment.

    Yes
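Because each capability accepts a single project name and a single environment name, covering several projects and environments means planning one capability per combination. The sketch below lists the combinations you would need. All names, including the dictionary keys and the naming pattern for source IDs, are hypothetical and are not Collibra or Matillion identifiers.

```python
def plan_capabilities(group: str, environments_by_project: dict):
    """Return one planned capability per (project, environment) pair."""
    plan = []
    for project, environments in environments_by_project.items():
        for environment in environments:
            plan.append(
                {
                    # Hypothetical naming pattern that keeps source IDs unique.
                    "source_id": f"matillion-{project}-{environment}",
                    "group_name": group,
                    "project_name": project,
                    "environment_name": environment,
                }
            )
    return plan


planned = plan_capabilities(
    "analytics", {"sales": ["dev", "prod"], "finance": ["prod"]}
)
for capability in planned:
    print(capability["source_id"])
print(len(planned), "capabilities, each with its own Matillion connection")
```

In this example, three capabilities are planned, which also means three separate Matillion connections.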

    Dialect

    The dialect of the database.

    Yes

    Start timestamp

    The timestamp of tasks in Matillion, which indicates the amount of metadata that technical lineage via Edge collects.

    Enter a UNIX timestamp in milliseconds. The default value is 1, which gets as much history as Matillion provides. Matillion provides 7 days of history by default.

    Yes
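If you want a start timestamp other than the default, you need the value as a UNIX timestamp in milliseconds. The following sketch shows one way to compute it with the Python standard library. The helper names are hypothetical; only the resulting number is entered in the Start timestamp field.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a UNIX timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("Use a timezone-aware datetime to avoid ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def start_timestamp_days_ago(days: int, now: datetime = None) -> int:
    """Return the timestamp for a point in time the given number of days ago."""
    now = now or datetime.now(timezone.utc)
    return to_unix_millis(now - timedelta(days=days))


print(to_unix_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
print(start_timestamp_days_ago(7))
```

A value older than the history that Matillion retains does not return more metadata than Matillion provides.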

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    Note A value is required, but it is not used when technical lineage for Matillion is created.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
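The relationship between the three Processing Level values, and between the deprecated Analyze Only checkbox and its replacement, can be summarized in a short sketch. The names `ProcessingLevel`, `STAGES`, and `from_analyze_only` are hypothetical and exist only for illustration; they are not part of any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    """Hypothetical model of the values in the Processing Level drop-down list."""
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level includes the stages of the level before it, as described in the table.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to its Processing Level equivalent."""
    # Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


if __name__ == "__main__":
    for level in ProcessingLevel:
        print(level.value, "->", ", ".join(STAGES[level]))
```

The deprecated checkbox has no equivalent for "Load", which is why the helper never returns that value.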

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Matillion

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Matillion connection

    The Matillion connection that you created.

    Note Collibra Data Intelligence Platform 2023.03 or newer is required to use the Matillion connection.

    No

    Group Name

    The name of your group in Matillion.

    Yes

    Project Name

    The name of your project in Matillion.

    You can only add the name of one project. If you want to create a technical lineage for other projects, add a separate Technical Lineage for Matillion capability for each project.

    Note Each capability requires a separate Matillion connection.

    Yes

    Environment Name

    The name of your environment in Matillion.

    You can only add the name of one environment. If you want to create a technical lineage for other environments, add a separate Technical Lineage for Matillion capability for each environment.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Start timestamp

    The timestamp of tasks in Matillion, which determines how much metadata technical lineage via Edge collects.

    Enter a UNIX timestamp in milliseconds. The default value is 1, which gets as much history as Matillion provides. Matillion provides 7 days of history by default.
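A UNIX timestamp in milliseconds is the number of milliseconds elapsed since 1970-01-01 00:00:00 UTC. The following minimal sketch shows one way to compute such a value; the helper name `to_unix_millis` is hypothetical and is not part of Collibra or Matillion.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a UNIX timestamp in milliseconds."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous because they carry no time zone.
        raise ValueError("Use a timezone-aware datetime")
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    three_days_ago = datetime.now(timezone.utc) - timedelta(days=3)
    print(to_unix_millis(three_days_ago))
```

For example, to limit collection to roughly the last three days, you could enter the value printed by the script in the Start timestamp field.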

    Yes

    Source Configuration

    The connection definitions and system names. Specify the following properties in JSON format and enter the content in this field.

    Tip If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the source ID configuration file in this field.

    No

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    Note A value is required, but it is not used when technical lineage for Matillion is created.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for MicroStrategy

    Yes

    Main Properties

    The required information for creating a technical lineage.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per MicroStrategy Intelligence Server. Ingesting the same MicroStrategy Intelligence Server under different source IDs will fail.
    • Any single MicroStrategy Intelligence Server can be ingested only once. If you create more than one connection for the same MicroStrategy Intelligence Server, integration will fail. If you want to ingest from multiple unique MicroStrategy Intelligence Servers, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes

    MicroStrategy connection

    The MicroStrategy connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to MicroStrategy.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the MicroStrategy assets.

    Yes

    URL for reports

    This optional property ensures that the correct URL to data objects in MicroStrategy is included on the asset pages of corresponding MicroStrategy assets. The required value depends on the platform on which you run MicroStrategy:
    • For J2EE, use: "MicroStrategy/servlet/mstrWeb"
    • For .NET, use: "MicroStrategy/asp/Main.aspx"

    No

    MicroStrategy Library URL

    If you are using a custom URL to connect to the MicroStrategy Library Server, use this field to specify the custom library URL.

    Important You only need to specify the URL if both of the following are true:
    • You are connecting to a proxy server.
    • You are not using the default, hardcoded URL to the MicroStrategy Library Server.

      Example If the URL to your MicroStrategy Library is https://collibra.microstrategy.com/MicroStrategyLibrary/api, you don't need to use this field, as that is the default, hardcoded URL. However, if the URL is something like https://collibra.microstrategy.com/MicroStrategyLibraryProd/api, then use this field and configure it as follows:
      "microStrategyLibraryUrl": "MicroStrategyLibraryProd"
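As a rough illustration of the example above, the following sketch extracts the path segment that precedes `/api` from a library URL and reports whether it differs from the default. It assumes that the value to enter is always that single path segment, as in the example; the function name and the constant are hypothetical, and the check does not cover the proxy-server condition.

```python
from typing import Optional
from urllib.parse import urlparse

# Default segment taken from the example URL in the text above.
DEFAULT_LIBRARY_SEGMENT = "MicroStrategyLibrary"


def library_url_setting(library_api_url: str) -> Optional[str]:
    """Return the custom library segment, or None if the default URL is used."""
    parts = [part for part in urlparse(library_api_url).path.split("/") if part]
    if len(parts) < 2 or parts[-1] != "api":
        raise ValueError("Expected a URL that ends in /<library>/api")
    segment = parts[-2]
    return None if segment == DEFAULT_LIBRARY_SEGMENT else segment


if __name__ == "__main__":
    url = "https://collibra.microstrategy.com/MicroStrategyLibraryProd/api"
    print(library_url_setting(url))
```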

    No

    Source configuration

    This field allows you to provide JSON code, to:

    • Specify the default domain, meaning the domain in Collibra in which the corresponding assets of MicroStrategy metadata will be ingested if domain mapping is not configured.
      Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
    • Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
    • Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
      Note Mapping doesn't work for custom SQL.

    If you previously integrated MicroStrategy via the lineage harvester, you can copy and paste in this field the JSON code from your MicroStrategy <source ID> configuration file.

    Example 
    No

    Custom Properties

    This section identifies to which Collibra Data Lineage service instance you want to upload the MicroStrategy metadata.

    Warning This applies only for Collibra Cloud for Government customers.

    techlinKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only for Collibra Cloud for Government customers.

    No

    techlinHost

    The URL of the Collibra Data Lineage service instance to which you want to upload MicroStrategy metadata, for example "techlin-gcp-eu.collibra.com".

    Warning This applies only for Collibra Cloud for Government customers.

    No

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Maximum parallel requests

    This optional property allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is "1", which means that HTTP requests are run one at a time instead of in parallel. A value of "5", for example, means that as many as 5 HTTP requests can take place in parallel.

    A lower value reduces the chances of experiencing HTTP 401 Unauthorized errors.
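To make the effect of this setting concrete, the following generic sketch caps the number of simultaneous tasks with a thread pool and measures the peak concurrency. It only illustrates the idea of bounded parallelism; it is not how the Edge capability is implemented, and `peak_concurrency` and the simulated request are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(task_count: int, max_parallel_requests: int) -> int:
    """Run simulated requests and return the highest number in flight at once."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_request(_index: int) -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # Stand-in for the duration of an HTTP request.
        with lock:
            in_flight -= 1

    # max_workers plays the role of the Maximum parallel requests value.
    with ThreadPoolExecutor(max_workers=max_parallel_requests) as pool:
        list(pool.map(fake_request, range(task_count)))
    return peak


if __name__ == "__main__":
    print(peak_concurrency(10, 1))
    print(peak_concurrency(10, 5))
```

With a limit of 1, the requests run strictly one after another; with a limit of 5, at most 5 overlap at any moment.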

    No

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

     

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per MicroStrategy Intelligence Server. Ingesting the same MicroStrategy Intelligence Server under different source IDs will fail.
    • Any single MicroStrategy Intelligence Server can be ingested only once. If you create more than one connection for the same MicroStrategy Intelligence Server, integration will fail. If you want to ingest from multiple unique MicroStrategy Intelligence Servers, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    We highly recommend that you specify only one source ID per MicroStrategy Intelligence Server.

    Yes

    MicroStrategy connection

    The MicroStrategy connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to MicroStrategy.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the MicroStrategy assets.

    Yes

    URL for reports

    This optional property ensures that the correct URL to data objects in MicroStrategy is included on the asset pages of corresponding MicroStrategy assets. The required value depends on the platform on which you run MicroStrategy:
    • For J2EE, use: "MicroStrategy/servlet/mstrWeb"
    • For .NET, use: "MicroStrategy/asp/Main.aspx"

    No

    MicroStrategy Library URL

    If you are using a custom URL to connect to the MicroStrategy Library Server, use this field to specify the custom library URL.

    Important You only need to specify the URL if both of the following are true:
    • You are connecting to a proxy server.
    • You are not using the default, hardcoded URL to the MicroStrategy Library Server.

      Example If the URL to your MicroStrategy Library is https://collibra.microstrategy.com/MicroStrategyLibrary/api, you don't need to use this field, as that is the default, hardcoded URL. However, if the URL is something like https://collibra.microstrategy.com/MicroStrategyLibraryProd/api, then use this field and configure it as follows:
      "microStrategyLibraryUrl": "MicroStrategyLibraryProd"

    No

    Source configuration

    This field allows you to provide JSON code, to:

    • Specify the default domain, meaning the domain in Collibra in which the corresponding assets of MicroStrategy metadata will be ingested if domain mapping is not configured.
      Note If you do configure domain mapping, the default domain is still the destination domain of the MicroStrategy Server asset.
    • Optionally, specify from which MicroStrategy projects you want to ingest metadata, and into which domains you want to ingest the corresponding assets.
    • Optionally, configure data source mapping, to map the name of a data source returned by the lineage harvester to the true name of the data source.
      Note Mapping doesn't work for custom SQL.

    If you previously integrated MicroStrategy via the lineage harvester, you can copy and paste in this field the JSON code from your MicroStrategy <source ID> configuration file.

    Example 
    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Maximum parallel requests

    This optional property allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is "1", which means that HTTP requests are run one at a time instead of in parallel. A value of "5", for example, means that as many as 5 HTTP requests can take place in parallel.

    A lower value reduces the chances of experiencing HTTP 401 Unauthorized errors.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for MySQL

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the filter to, for example, where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
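If you maintain a longer list of schemas to exclude, a small helper can generate the filter text for you. This is a minimal sketch: the function name `excluded_schemas_filter` is hypothetical, and the column name `v.table_schema` is taken from the example above and may differ in your query.

```python
from typing import Iterable


def excluded_schemas_filter(schemas: Iterable[str], column: str = "v.table_schema") -> str:
    """Build a 'where <column> not in (...)' filter for the given schema names."""
    names = list(schemas)
    if not names:
        raise ValueError("Provide at least one schema name")
    # Double any single quotes so each name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in names)
    return f"where {column} not in ({quoted});"


if __name__ == "__main__":
    print(excluded_schemas_filter(["pg_catalog", "information_schema", "another_schema"]))
```

Paste the generated text into the Views query, and confirm that the result still uses only supported SQL syntax.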

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.
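When migrating away from the deprecated checkbox, the two equivalences above amount to a one-line mapping. This sketch uses a hypothetical function name; only the "Analyze" and "Sync" values come from this page.

```python
def processing_level_for(analyze_only: bool) -> str:
    """Map the deprecated Analyze Only checkbox to a Processing Level value."""
    # Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    return "Analyze" if analyze_only else "Sync"


print(processing_level_for(True))
print(processing_level_for(False))
```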

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
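To see why "Analyze" is recommended for multiple data sources, it can help to count the resulting synchronization jobs. The sketch below is a deliberately simplified model of the behavior described above; the source names and the function are hypothetical.

```python
def count_sync_jobs(levels: dict[str, str], manual_trigger: bool = False) -> int:
    """Count synchronization jobs under a simplified model of the text above.

    Each source set to "Sync" starts its own job; sources set to "Analyze"
    wait for a job started elsewhere, for example one manual trigger.
    """
    jobs = sum(1 for level in levels.values() if level == "Sync")
    return jobs + (1 if manual_trigger else 0)


all_sync = {"sales_db": "Sync", "hr_db": "Sync", "finance_db": "Sync"}
all_analyze = {name: "Analyze" for name in all_sync}

print(count_sync_jobs(all_sync))                          # 3 separate jobs
print(count_sync_jobs(all_analyze, manual_trigger=True))  # 1 job for all sources
```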

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
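If you maintain the list of excluded schemas elsewhere, the filter can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper name; it assumes the v.table_schema alias from the example above and only builds the text to paste into the query.

```python
def schema_exclusion_filter(excluded_schemas: list[str]) -> str:
    """Build a 'not in' filter like the one in the example above."""
    if not excluded_schemas:
        # An empty 'not in ()' list is not valid SQL.
        raise ValueError("Provide at least one schema to exclude.")
    # Double any single quote so a schema name cannot break the string literal.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in excluded_schemas)
    return f"where v.table_schema not in ({quoted});"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```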

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. The next time your data sources are synchronized, the new default queries are compared with the previous default queries and, if they differ, the new queries are used. However, Collibra Data Lineage can only compare the new default queries with the immediately preceding set of default queries. If the queries in your technical lineage capability are older than that preceding set, or if you have customized them, they are treated as customized queries and are not updated, so you won't benefit from the performance improvements.

    To benefit from the performance improvements, create a new capability and copy its default queries into your existing capabilities. You can then modify them to suit your needs.
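The update rule in the note above can be pictured as a single comparison. The sketch below is an interpretation of that description, not Collibra's implementation; the function name and the dictionary representation of a query set are assumptions.

```python
def queries_after_update(
    current: dict[str, str],
    previous_defaults: dict[str, str],
    new_defaults: dict[str, str],
) -> dict[str, str]:
    """Return the query set a capability would use after a default-query update."""
    if current == previous_defaults:
        # Untouched defaults from the preceding release are replaced.
        return dict(new_defaults)
    # Anything else (older defaults or edited queries) counts as customized
    # and is left as-is.
    return dict(current)


previous = {"Columns": "select 1", "Views": "select 2"}
new = {"Columns": "select 1 /* faster */", "Views": "select 2"}
customized = {"Columns": "select 1 where x", "Views": "select 2"}

print(queries_after_update(previous, previous, new) == new)
print(queries_after_update(customized, previous, new) == customized)
```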
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.
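To preview which files a mask would pick up, you can test it locally. This sketch assumes the mask follows shell-style wildcard rules, which this page does not confirm; the file names and the helper are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names: list[str], mask: str = "*") -> list[str]:
    """Return the file names that match the mask (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "readme.txt", "customers.sql"]
print(select_files(files))           # default mask "*" keeps every file
print(select_files(files, "*.sql"))  # keeps only the .sql files
```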

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If they do not, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If they do not, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.
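The precedence rule in the note can be summarized as a fallback. This is a minimal sketch with hypothetical names; it only models which value wins, not how Collibra Data Lineage parses SQL.

```python
from typing import Optional


def stitching_names(
    sql_database: Optional[str],
    sql_schema: Optional[str],
    field_database: str,
    field_schema: str,
) -> tuple[str, str]:
    """Pick the database and schema names used for stitching."""
    # Names found in the SQL statement win; the capability fields are the fallback.
    return (sql_database or field_database, sql_schema or field_schema)


print(stitching_names("SALES", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA"))
print(stitching_names(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA"))
```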

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.
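Conceptually, the mapping overrides the default system name for specific databases. The sketch below uses a plain dictionary as a stand-in, because this section does not specify the format the field expects; all names are hypothetical.

```python
def system_for(database: str, default_system: str, mapping: dict[str, str]) -> str:
    """Return the system a database belongs to, falling back to the default."""
    return mapping.get(database, default_system)


# Hypothetical mapping: two databases that live on a different system.
db_to_system = {"finance_db": "erp-server", "hr_db": "erp-server"}

print(system_for("finance_db", "warehouse-server", db_to_system))
print(system_for("sales_db", "warehouse-server", db_to_system))
```

Without the mapping, every database would be associated with the default system name, which is the situation that leads to missing stitching.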

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If they do not, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If they do not, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Netezza

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. The next time your data sources are synchronized, the new default queries are compared with the previous default queries and, if they differ, the new queries are used. However, Collibra Data Lineage can only compare the new default queries with the immediately preceding set of default queries. If the queries in your technical lineage capability are older than that preceding set, or if you have customized them, they are treated as customized queries and are not updated, so you won't benefit from the performance improvements.

    To benefit from the performance improvements, create a new capability and copy its default queries into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
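
    If you maintain the list of excluded schemas outside of Collibra, a small helper can generate the filter text to paste into the Views query. The sketch below is an assumption-laden illustration: the function name and the default column alias v.table_schema are taken from the example above or invented for this sketch, and the output must still be checked against the default query of your data source.

```python
def schema_exclusion_filter(excluded_schemas, column="v.table_schema"):
    """Build a `where ... not in (...)` filter like the one in the example.

    Hypothetical helper; `column` defaults to the alias used in the example.
    """
    if not excluded_schemas:
        raise ValueError("Provide at least one schema to exclude.")
    # Double any single quote so a schema name cannot break the SQL literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where {column} not in ({quoted});"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```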

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.

    Query | Description
    Columns | This query retrieves all columns, tables, schemas, and databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.
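
    To preview which files a mask would select before you save the capability, you can test the pattern locally. The sketch below assumes that the mask behaves like a shell-style wildcard pattern, which this page does not confirm; the function name and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match the mask.

    Assumption: the mask is a shell-style wildcard pattern (not confirmed here).
    """
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in file_names if fnmatchcase(name, mask)]


names = ["orders.sql", "customers.sql", "readme.txt"]
print(files_matching_mask(names))           # default mask selects every file
print(files_matching_mask(names, "*.sql"))  # only the .sql files
```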

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.
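
    The precedence rule in the note can be pictured as a simple fallback. The following sketch is not Collibra's implementation: the function and parameter names are hypothetical, and it assumes that the database name and the schema name fall back independently, which the note does not state explicitly.

```python
from typing import Optional, Tuple


def resolve_stitching_names(
    sql_database: Optional[str],
    sql_schema: Optional[str],
    field_database: str,
    field_schema: str,
) -> Tuple[str, str]:
    """Pick the database and schema names used for stitching.

    Names found in the SQL statement win; the capability fields are the fallback.
    Assumption: each name falls back on its own.
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# The SQL statement names both objects, so the capability fields are ignored.
print(resolve_stitching_names("SALES_DB", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA"))
# The SQL statement names neither, so the capability fields are used.
print(resolve_stitching_names(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA"))
```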

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.
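
    The dialect restriction above can be turned into a quick pre-check on your own metadata. This is a rough sketch under stated assumptions: the function name is hypothetical, the lowercase dialect identifiers are placeholders rather than the values shown in the Dialect field, and "lowercase column name" is interpreted as a name that contains at least one lowercase letter.

```python
# Dialects for which the feature works with lowercase column names, per the note above.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def dependent_on_sources_supported(dialect, column_names):
    """Return True if Dependent On Sources is expected to work for this source.

    Hypothetical pre-check; dialect identifiers are placeholders.
    """
    has_lowercase = any(ch.islower() for name in column_names for ch in name)
    return (not has_lowercase) or dialect.lower() in LOWERCASE_SAFE_DIALECTS


print(dependent_on_sources_supported("Snowflake", ["order_id"]))
print(dependent_on_sources_supported("postgres", ["order_id"]))
print(dependent_on_sources_supported("postgres", ["ORDER_ID"]))
```

    If the check returns False, the page's guidance applies: consolidate your SQL statements and DDL file in a single data source.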

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
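
    The three Processing Level values build on each other, which a small lookup makes easy to see. The sketch below only restates the descriptions above; the stage labels and function names are hypothetical and do not correspond to any Collibra API.

```python
# Stages that run for each Processing Level value, as described above.
STAGES_BY_LEVEL = {
    "Load": ("load",),
    "Analyze": ("load", "analyze"),
    "Sync": ("load", "analyze", "synchronize"),
}


def stages_for(level):
    """Return the stages that run for a Processing Level value."""
    if level not in STAGES_BY_LEVEL:
        raise ValueError(f"Unknown Processing Level: {level!r}")
    return STAGES_BY_LEVEL[level]


def starts_synchronization(level):
    """True only for "Sync"; with "Analyze", synchronization is triggered separately."""
    return "synchronize" in stages_for(level)


for level in ("Load", "Analyze", "Sync"):
    print(level, stages_for(level), starts_synchronization(level))
```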

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.
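
    Conceptually, the mapping overrides the default system name for the databases that you list. The sketch below illustrates that idea only: the input format that the field actually expects is not shown in this section, so the dictionary, the function name, and the sample names are all hypothetical.

```python
def system_for_database(database, mapping, default_system):
    """Return the system a database belongs to.

    Hypothetical illustration: mapped databases use their own system,
    all others fall back to the Collibra System Name value.
    """
    return mapping.get(database, default_system)


# Hypothetical names; "warehouse-prod" stands in for the Collibra System Name value.
mapping = {"FINANCE_DB": "finance-server"}
print(system_for_database("FINANCE_DB", mapping, "warehouse-prod"))
print(system_for_database("SALES_DB", mapping, "warehouse-prod"))
```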

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Oracle

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Oracle Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.

    Query | Description
    Columns | This query retrieves all columns, tables, schemas, and databases or projects in the form: database or project > schema > table > column.
    Database Links | This query retrieves links to other databases.
    Synonyms | This query retrieves the alternative names for the database objects.
    Views | This query retrieves the view definitions.
    Materialized Views | This query retrieves materialized view definitions.
    Other Queries | This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
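
The three Processing Level values build on one another, which can be summarized in a short Python sketch. The stage labels ("load", "analyze", "synchronize") and all names below are illustrative assumptions, not Collibra identifiers.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level runs the stages of the previous level plus one more.
# The stage labels are illustrative only.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def stages_for(level: ProcessingLevel) -> tuple:
    """Return the stages that run for the given Processing Level."""
    return STAGES[level]


def starts_synchronization(level: ProcessingLevel) -> bool:
    """Only the Sync level triggers synchronization by itself."""
    return "synchronize" in STAGES[level]


if __name__ == "__main__":
    for level in ProcessingLevel:
        print(level.value, stages_for(level), starts_synchronization(level))
```

With "Analyze" on every data source, synchronization is left to a single later trigger, which matches the recommendation for synchronizing multiple data sources in one job.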

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
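
If you maintain the list of excluded schemas outside of Collibra, a filter like the one in the example can be generated instead of typed by hand. The following Python sketch is an assumption-laden illustration: the helper `build_schema_filter` is hypothetical, and the column alias `v.table_schema` is taken from the example above and may differ in your query.

```python
def build_schema_filter(excluded_schemas, column="v.table_schema"):
    """Build a 'where ... not in (...)' filter for a Views query."""
    if not excluded_schemas:
        raise ValueError("Provide at least one schema to exclude.")
    # Escape single quotes so that each schema name is a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where {column} not in ({quoted});"


if __name__ == "__main__":
    print(build_schema_filter(["pg_catalog", "information_schema"]))
    print(build_schema_filter(["pg_catalog", "information_schema", "another_schema"]))
```

The generated text still has to be pasted into the query manually, and it is subject to the same limitations as any other customized query.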

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.
    Database Links | This query retrieves links to other databases.
    Synonyms | This query retrieves the alternative names for the database objects.
    Views | This query retrieves the view definitions.
    Materialized Views | This query retrieves materialized view definitions.
    Other Queries | This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Database Link Mapping

    If you are using DBLinks, this optional field allows you to configure, per data source, the database and schema to which DBLink points.

    The configuration format is as follows:

    {"<dblink_name>": {"database":"<database>","schema":"<schema>"}, ...}

    Tip If you’re using a DBLink to target another source, you need to share the database model between the targeted (independent) source and the dependent source. Use the Dependent On Sources option to configure that dependency and share the database model.

    Important If the same DBLink, for example dblink.example.com, exists in multiple databases, the formatting shown in the previous example still applies, but you need to enclose it in curly brackets and specify the relevant database, as follows:
    • Basic formatting, as shown in the previous example:
      "dblink.example.com": {"database":"Database_A","schema":"Schema_A1"}
    • Formatting if the DBLink exists in multiple databases and you want to apply it only in a database named "dbScope1":
      "dbScope1": {"dblink.example.com": {"database":"Database_A","schema":"Schema_A1"}}

    If a DBLink is referenced in multiple mappings, as shown in the following example, the first mapping is used.

    {"dbScope1": {
       "dblink.example.com": {"database":"DevDB_A","schema":"DevSch_A1"}
    },
       "dblink.example.com": {"database":"Database_A","schema":"Schema_A1"}}

    In this case, occurrences of dblink.example.com in the database named "dbScope1" are mapped to:

    "database":"DevDB_A","schema":"DevSch_A1"

    No
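
Before you paste a Database Link Mapping value into the field, it can help to check that it is valid JSON and to see which target a given DBLink would resolve to. The Python sketch below is an illustration only. It assumes that entries are evaluated in the order in which they appear and that the first applicable entry wins, which is one reading of "the first mapping is used". The function name `resolve_dblink` is hypothetical.

```python
import json


def resolve_dblink(mapping_json, dblink, current_database):
    """Return (database, schema) for a DBLink, or None if no entry applies."""
    mapping = json.loads(mapping_json)  # raises ValueError on malformed JSON
    for key, value in mapping.items():  # dict order follows the JSON text
        # Unscoped entry: "<dblink_name>": {"database": ..., "schema": ...}
        if key == dblink and "database" in value and "schema" in value:
            return value["database"], value["schema"]
        # Scoped entry: "<database>": {"<dblink_name>": {...}}
        if key == current_database and isinstance(value.get(dblink), dict):
            target = value[dblink]
            return target["database"], target["schema"]
    return None


EXAMPLE = """{
  "dbScope1": {"dblink.example.com": {"database": "DevDB_A", "schema": "DevSch_A1"}},
  "dblink.example.com": {"database": "Database_A", "schema": "Schema_A1"}
}"""

if __name__ == "__main__":
    print(resolve_dblink(EXAMPLE, "dblink.example.com", "dbScope1"))
    print(resolve_dblink(EXAMPLE, "dblink.example.com", "otherDb"))
```

Under these assumptions, the DBLink resolves to DevDB_A and DevSch_A1 inside "dbScope1" and to Database_A and Schema_A1 elsewhere.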

    Database-System Mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
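
To preview which files a Mask value would select, you can test the pattern locally. The Python sketch below assumes that the mask behaves like a shell-style wildcard pattern. The documentation only states that the default value is *, so treat the matching rules as an assumption. The helper `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match a shell-style wildcard mask."""
    # fnmatchcase compares case-sensitively on every operating system.
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    names = ["orders.sql", "readme.txt", "customers.sql"]
    print(select_files(names))           # default mask: every file
    print(select_files(names, "*.sql"))  # only files ending in .sql
```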

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
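
The precedence rule for the Database and Schema fields can be expressed as a simple fallback. The Python sketch below is illustrative only: the function `stitching_names` is hypothetical, and applying the fallback to the database and the schema separately is an assumption, because the note only describes the cases in which both names are present or both are absent.

```python
def stitching_names(sql_database, sql_schema, field_database, field_schema):
    """Return the (database, schema) pair that would be used for stitching."""
    # Names found in the SQL statement take precedence over the field values.
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


if __name__ == "__main__":
    # The SQL statement names both parts: the field values are ignored.
    print(stitching_names("sales", "public", "default_db", "default_schema"))
    # The SQL statement names neither part: the field values are used.
    print(stitching_names(None, None, "default_db", "default_schema"))
```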

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for PostgreSQL

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that into your existing capabilities. You can then modify them to suit your needs.
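
The update behavior described in this note can be modeled as a comparison between three sets of queries. The Python sketch below is a simplified illustration, not the actual Collibra Data Lineage logic. The function `resolve_queries`, the status labels, and the representation of queries as dictionaries are all assumptions.

```python
def resolve_queries(capability_queries, previous_defaults, new_defaults):
    """Return (queries_to_use, status) under the simplified model above."""
    if capability_queries == new_defaults:
        return new_defaults, "current"
    if capability_queries == previous_defaults:
        # Recognized as the previous defaults: the new defaults replace them.
        return new_defaults, "updated"
    # Older defaults and hand-edited queries cannot be told apart,
    # so both are treated as customized and left untouched.
    return capability_queries, "customized"


if __name__ == "__main__":
    previous = {"Views": "select 1"}
    new = {"Views": "select 2"}
    print(resolve_queries(previous, previous, new))
    print(resolve_queries({"Views": "select 0"}, previous, new))
```

In this model, a capability that skipped one round of default updates ends up in the "customized" branch, which is why copying the queries from a newly created capability is suggested.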
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.
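
The equivalence between the deprecated checkbox and the Processing Level values can be written as a one-line mapping. This is an illustrative sketch only; the enum and function names are hypothetical and do not correspond to any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


def from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Translate the deprecated Analyze Only checkbox to a Processing Level."""
    # Selected checkbox -> "Analyze"; cleared checkbox -> "Sync".
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC
```

Note that "Load" has no counterpart in the deprecated option, so the mapping never produces it.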

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
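
The recommendation for multiple data sources can be summarized in a short planning helper. This is a sketch under assumptions: the function names are hypothetical, and choosing "Sync" for a single data source is an illustrative default rather than something the text above prescribes.

```python
from typing import Dict, List


def recommended_levels(source_ids: List[str]) -> Dict[str, str]:
    """Suggest a Processing Level per source ID.

    With several data sources, every capability is set to "Analyze" so that
    one later synchronization covers them all.
    """
    level = "Analyze" if len(source_ids) > 1 else "Sync"
    return {source_id: level for source_id in source_ids}


def separate_sync_jobs(levels: Dict[str, str]) -> int:
    """Count the capabilities that would each start their own sync job."""
    return sum(1 for level in levels.values() if level == "Sync")
```

With every capability set to "Analyze", no capability starts its own synchronization job; a single synchronization is then triggered in one of the two ways described for the Analyze value.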

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Important This field is mandatory, but the value you specify is not taken into consideration. We will remove this field in a future Collibra version.

    Yes

    Database Name Override

    We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process. If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
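
If you maintain the list of excluded schemas outside Collibra, the filter can be generated rather than typed by hand. This is a minimal sketch; the helper name is hypothetical, and the v.table_schema column alias is taken from the example above and may differ for other data sources.

```python
from typing import Sequence


def schema_exclusion_filter(schemas: Sequence[str], column: str = "v.table_schema") -> str:
    """Build a 'where <column> not in (...)' filter for a Views query."""
    if not schemas:
        raise ValueError("at least one schema name is required")
    # Double any single quote so the name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in schemas)
    return "where " + column + " not in (" + quoted + ")"
```

The helper returns the clause without a trailing semicolon, so append one if the query you are editing needs it.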

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Oracle Edge capability or Technical Lineage for Snowflake capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.
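
To preview which files a mask would select, you can test it locally. This sketch assumes the mask follows shell-style wildcard rules, which the text above does not confirm; the function name and file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def files_matching_mask(file_names: Sequence[str], mask: str = "*") -> List[str]:
    """Return the file names that match a shell-style mask (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, mask)]
```

With the default mask `*`, every file name is returned; a mask such as `*.sql` keeps only names ending in `.sql`.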

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.
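
The precedence rule in the note above can be pictured as a simple fallback. This is an illustrative sketch, not Collibra's implementation: the function name is hypothetical, and applying the fallback to the database and the schema independently is an assumption.

```python
from typing import Optional, Tuple


def names_for_stitching(
    sql_database: Optional[str],
    sql_schema: Optional[str],
    field_database: str,
    field_schema: str,
) -> Tuple[str, str]:
    """Pick the database and schema names used for stitching.

    Names found in the SQL statement win; the capability's Database and
    Schema field values are used only when the statement has none.
    """
    return (sql_database or field_database, sql_schema or field_schema)
```

For a statement that references `sales.public.orders`, the names `sales` and `public` are used; for a statement that references only `orders`, the field values apply.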

    Yes

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching can occur. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.
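
Conceptually, the mapping overrides the default system name for the databases it lists. The sketch below only illustrates that lookup; the dictionary is a hypothetical stand-in and does not show the format that the field itself expects.

```python
from typing import Dict


def system_for_database(database: str, mapping: Dict[str, str], default_system: str) -> str:
    """Return the mapped system for a database, else the default system name."""
    return mapping.get(database, default_system)


# Hypothetical database and system names, for illustration only.
example_mapping = {"finance_db": "erp-system", "web_db": "web-system"}
```

Here `system_for_database("finance_db", example_mapping, "default-system")` yields `erp-system`, while an unmapped database falls back to `default-system`.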

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching can occur. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for Power BI

    Yes

    Main Properties

    The required custom properties are based on the selected capability template.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Power BI service. Ingesting the same Power BI service under different source IDs will fail.
    • Any single Power BI service can be ingested only once. If you create more than one connection for the same Power BI service, integration will fail. If you want to ingest from multiple unique Power BI services, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.
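
Before you create several Power BI capabilities, it can help to check your planned configuration against the rules above. This is a minimal sketch: the function name and the (source ID, Power BI service) pairs are hypothetical planning data, not something read from Collibra.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_source_id_conflicts(capabilities: Iterable[Tuple[str, str]]) -> List[str]:
    """Report planned capabilities that break the one-to-one rule.

    Each item is a (source_id, power_bi_service) pair.
    """
    ids_by_service = defaultdict(set)
    services_by_id = defaultdict(set)
    for source_id, service in capabilities:
        ids_by_service[service].add(source_id)
        services_by_id[source_id].add(service)

    problems = []
    # The same Power BI service must not be ingested under different source IDs.
    for service, ids in sorted(ids_by_service.items()):
        if len(ids) > 1:
            problems.append("service %r has several source IDs: %s" % (service, sorted(ids)))
    # Each Power BI service needs its own unique source ID.
    for source_id, services in sorted(services_by_id.items()):
        if len(services) > 1:
            problems.append("source ID %r is shared by: %s" % (source_id, sorted(services)))
    return problems
```

An empty list means that every Power BI service in the plan has exactly one source ID and no source ID is reused.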

    Yes

    Power BI connection

    The Power BI connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Power BI.

    Yes

    API URL

    The API URL of your Power BI service.

    The default value is https://api.powerbi.com.

    Important This property is only relevant for US government or national cloud Power BI customers, in which case you must include and specify values for both this property and the scope property. For complete information, consult Microsoft's documentation on Power BI for US government customers.

    No
    Scope

    Optional property that is intended only for customers with a different scope, such as Chinese tenants.

    Example https://analysis.chinacloudapi.cn/powerbi/api/.default

    Important If you are a US government or national cloud Power BI customer, you must include and specify values for both this property and the apiUrl property. For complete information, consult Microsoft's documentation on Power BI for US government customers.
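
The requirement that API URL and Scope be specified together can be checked with a few lines. This sketch rests on an assumption: it treats the two properties as "both or neither", which is how the notes above read for US government and national cloud customers. The function name is hypothetical.

```python
from typing import Optional

DEFAULT_API_URL = "https://api.powerbi.com"


def check_cloud_settings(api_url: Optional[str], scope: Optional[str]) -> Optional[str]:
    """Return a message if only one of API URL and Scope is customized."""
    custom_api = bool(api_url) and api_url != DEFAULT_API_URL
    has_scope = bool(scope)
    if custom_api and not has_scope:
        return "API URL is customized but Scope is empty."
    if has_scope and not custom_api:
        return "Scope is set but API URL is left at its default."
    return None
```

Leaving both fields empty, or supplying both, returns `None`; supplying only one returns a message that names the missing counterpart.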

    No

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Power BI assets.

    Yes
    Source configuration

    This field allows you to provide JSON code for database mapping, workspace filtering and specifying the name of a System asset in Collibra.

    • Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
      Note Mapping doesn't work for custom SQL.
    • Configure workspace filtering.
      Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
    • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Power BI. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

    If you previously integrated Power BI via the lineage harvester, you can copy and paste in this field the JSON code from your Power BI <source ID> configuration file.

    Example 
    No

    Custom Properties

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Metadata is harvested and uploaded in a ZIP file to a Collibra Data Lineage service instance, for processing.

    Use this optional property to specify whether or not the raw metadata should be deleted after it has been processed.

    If you select this option, the raw metadata is deleted after processing. If you don't select this option, it is stored in an Amazon S3 bucket.

     

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
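The relationship between the three Processing Level values, and how the deprecated Analyze Only checkbox maps onto them, can be summarized in a short sketch. This is only an illustration of the table above; the names `ProcessingLevel`, `STAGES` and `from_analyze_only` are hypothetical and are not part of any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages that each level runs, following the descriptions in the table.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to a Processing Level.

    Selected is equivalent to "Analyze"; cleared is equivalent to "Sync".
    """
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


if __name__ == "__main__":
    for checked in (True, False):
        level = from_analyze_only(checked)
        print(checked, level.value, STAGES[level])
```

Note that "Load" has no Analyze Only equivalent, which is one reason the new setting is mandatory.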

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No
    Use HTTP/1.1 protocol
    Option to use HTTP/1.1 streams if file-size limitations cause timeout errors with the default HTTP/2 streams.

    No

    Enable lineage for DAX queries

    Note This feature is not available on Collibra Cloud for Government.

    Option to enable DAX analysis via Collibra AI. This feature:

    • Creates column-level lineage that includes your calculated columns and measures in Power BI.
    • Enables stitching between calculated columns in the technical lineage and the corresponding Power BI Column assets in Data Catalog.

    Select this option to enable DAX analysis.

    Clear the checkbox to disable DAX analysis.

    For complete information on this feature, go to DAX analysis via Collibra AI.

     

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

     

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field    Description    Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Power BI service. Ingesting the same Power BI service under different source IDs will fail.
    • Any single Power BI service can be ingested only once. If you create more than one connection for the same Power BI service, integration will fail. If you want to ingest from multiple unique Power BI services, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes
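Before you create several Power BI capabilities, it can help to check your planned source IDs against the two warnings above. The following is a minimal sketch, assuming you keep your planned capabilities as a list of (source ID, Power BI service) pairs; that data shape and the function name `find_conflicts` are hypothetical.

```python
from collections import defaultdict


def find_conflicts(capabilities):
    """Return a list of problems for (source_id, service) pairs.

    Checks two rules from the warnings: a Power BI service must not be
    ingested under different source IDs, and each service needs its own
    unique source ID.
    """
    ids_per_service = defaultdict(set)
    services_per_id = defaultdict(set)
    for source_id, service in capabilities:
        ids_per_service[service].add(source_id)
        services_per_id[source_id].add(service)

    problems = []
    for service, ids in sorted(ids_per_service.items()):
        if len(ids) > 1:
            problems.append(f"{service}: multiple source IDs {sorted(ids)}")
    for source_id, services in sorted(services_per_id.items()):
        if len(services) > 1:
            problems.append(f"{source_id}: reused for {sorted(services)}")
    return problems


if __name__ == "__main__":
    planned = [("pbi-emea", "tenant-a"), ("pbi-us", "tenant-a")]
    print(find_conflicts(planned))
```

The sketch does not detect the same pair listed twice, so it is not a complete check of the "ingested only once" rule.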

    Power BI Connection

    The Power BI connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Power BI.

    Yes

    API URL

    The API URL of your Power BI service.

    The default value is https://api.powerbi.com.

    Important This property is only relevant for US government or national cloud Power BI customers, in which case you must include and specify values for both this property and the scope property. For complete information, consult Microsoft's documentation on Power BI for US government customers.

    No

    Scope

    Optional property that is intended only for customers with a different scope, such as Chinese tenants.

    Example https://analysis.chinacloudapi.cn/powerbi/api/.default

    Important If you are a US government or national cloud Power BI customer, you must include and specify values for both this property and the apiUrl property. For complete information, consult Microsoft's documentation on Power BI for US government customers.

    No
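For US government or national cloud Power BI customers, the API URL and Scope fields have to be set together. The sketch below illustrates that pairing rule only. The function name is hypothetical, the URL `https://api.example.invalid` is a placeholder rather than a real endpoint, and treating two empty fields as "use the defaults" is an assumption.

```python
DEFAULT_API_URL = "https://api.powerbi.com"  # default stated in the API URL field


def check_national_cloud_settings(api_url=None, scope=None):
    """Return a list of problems; an empty list means the pair is consistent.

    Assumption: leaving both fields empty means the defaults apply.
    """
    problems = []
    if api_url and not scope:
        problems.append("API URL is set but Scope is empty")
    if scope and not api_url:
        problems.append("Scope is set but API URL is empty")
    return problems


if __name__ == "__main__":
    # Scope value taken from the example in the Scope field description.
    scope = "https://analysis.chinacloudapi.cn/powerbi/api/.default"
    print(check_national_cloud_settings(scope=scope))
    print(check_national_cloud_settings("https://api.example.invalid", scope))
```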

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Power BI assets.

    Yes

    Source Configuration

    This field allows you to provide JSON code for database mapping, workspace filtering and specifying the name of a System asset in Collibra.

    • Map the names of the server, database and schema that were collected by the lineage harvester to their true names.
      Note Mapping doesn't work for custom SQL.
    • Configure workspace filtering.
      Tip We highly recommend that you read through Filtering Power BI workspaces for important information and guidance before configuring your filters.
    • If useCollibraSystemName in the lineage harvester configuration file is set to true, use the collibraSystemName property to specify the system name of databases in Power BI. Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.

    If you previously integrated Power BI via the lineage harvester, you can copy the JSON code from your Power BI <source ID> configuration file and paste it in this field.

    Example 
    No
    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
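The dialect restriction for Dependent On Sources can be expressed as a simple pre-check. This is a sketch of the rule as stated in the Important note, not of how Collibra Data Lineage implements it; the function name and the lowercase dialect identifiers are hypothetical.

```python
# Dialects for which lowercase column names are supported, per the note above.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def dependency_supported(dialect: str, column_names) -> bool:
    """Return True if the dependent source is expected to analyze cleanly.

    A source with lowercase column names is supported only for the
    dialects listed in LOWERCASE_SAFE_DIALECTS.
    """
    has_lowercase = any(
        any(ch.islower() for ch in name) for name in column_names
    )
    return (not has_lowercase) or dialect.lower() in LOWERCASE_SAFE_DIALECTS


if __name__ == "__main__":
    print(dependency_supported("Snowflake", ["customer_id"]))
    print(dependency_supported("postgres", ["customer_id"]))
    print(dependency_supported("postgres", ["CUSTOMER_ID"]))
```

When the check returns False, the workaround described above applies: consolidate the SQL statements and the DDL file in a single data source.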

    Delete Raw Metadata After Processing

    Metadata is harvested and uploaded in a ZIP file to a Collibra Data Lineage service instance, for processing.

    Use this optional property to specify whether or not the raw metadata should be deleted after it has been processed.

    If you select this option, the raw metadata is deleted after processing. If you don't select this option, it is stored in an Amazon S3 bucket.

     

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Use HTTP/1.1 protocol

    Option to use HTTP/1.1 streams if file-size limitations cause timeout errors with the default HTTP/2 streams.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Enable lineage for DAX queries

    Note This feature is not available on Collibra Cloud for Government.

    Option to enable DAX analysis via Collibra AI. This feature:

    • Creates column-level lineage that includes your calculated columns and measures in Power BI.
    • Enables stitching between calculated columns in the technical lineage and the corresponding Power BI Column assets in Data Catalog.

    Select this option to enable DAX analysis.

    Clear the checkbox to disable DAX analysis.

    For complete information on this feature, go to DAX analysis via Collibra AI.

     
    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SAP HANA

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Note If you are migrating an SAP HANA data source from the lineage harvester, ensure that you run the ignore-source command with the source ID from the lineage harvester configuration file. When you synchronize this capability, an error occurs if the source ID from the lineage harvester exists even if you use the same source ID for this field. For more information, go to Migrate the technical lineage of a data source.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, adjust the filter, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Views

    This query retrieves the view definitions.

    Calculated Views

    This query retrieves calculated views.

    Dependencies of Calculated Views

    This query retrieves dependencies of calculated views.

    Cross-references of Calculated Views

    This query retrieves cross-references of calculated views.

    Yes
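If you maintain the list of excluded schemas outside the capability, the filter from the Views query example can be generated rather than typed by hand. The following is a minimal sketch; the function name is hypothetical, and the column reference `v.table_schema` is taken from the example and may differ in the default queries of your data source.

```python
def schema_exclusion_filter(schemas, column="v.table_schema"):
    """Build a 'where ... not in (...)' filter like the one in the example.

    Single quotes inside schema names are doubled so the result stays
    valid SQL.
    """
    if not schemas:
        raise ValueError("at least one schema name is required")
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in schemas)
    return f"where {column} not in ({quoted});"


if __name__ == "__main__":
    print(schema_exclusion_filter(["pg_catalog", "information_schema"]))
    print(schema_exclusion_filter(
        ["pg_catalog", "information_schema", "another_schema"]
    ))
```

Keep in mind that modified queries are treated as customized, so they are no longer replaced when the default queries are updated.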

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    SQL Active

    An option to determine whether to include or exclude the technical lineage of the data source with the SQL-based input.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    If you have a capability with this option selected and the synchronization of the capability fails with the "Missing required parameter hanaUseCloudScanner" error message, go to the article "In Edge Harvester, previously configured/working SAP HANA Classic capabilities fail to submit to Edge" in the Collibra Support Portal for a solution.

    No

    Calculated Views Active

    An option to determine whether to include or remove the technical lineage from calculated views in an SAP HANA Classic on-premises data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    If you have a capability with this option selected and the synchronization of the capability fails with the "Missing required parameter hanaUseCloudScanner" error message, go to the article "In Edge Harvester, previously configured/working SAP HANA Classic capabilities fail to submit to Edge" in the Collibra Support Portal for a solution.

    No

    Use Hana Cloud for Calculated Views

    An option to determine whether to include or remove the technical lineage from calculated views in an SAP HANA Cloud/Advanced data source.

    To include technical lineage from the SAP HANA Cloud/Advanced data source, you must select this option and the Calculated Views Active option.

    Note Do not select this checkbox if:
    • You are not getting technical lineage from calculated views.
    • You want to exclude the technical lineage of this data source.

    No
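The two calculated-view checkboxes work as a pair, which the following sketch makes explicit. It only restates the field descriptions above; the function name and return strings are hypothetical, and treating "Use Hana Cloud" without "Calculated Views Active" as a misconfiguration is an interpretation of the note, not documented behavior.

```python
def calculated_view_source(calculated_views_active: bool,
                           use_hana_cloud: bool) -> str:
    """Describe which calculated views contribute technical lineage."""
    if use_hana_cloud and not calculated_views_active:
        # The note advises against selecting the cloud option in this case.
        return "misconfigured: also select Calculated Views Active"
    if not calculated_views_active:
        return "no calculated-view lineage"
    if use_hana_cloud:
        return "SAP HANA Cloud/Advanced calculated views"
    return "SAP HANA Classic on-premises calculated views"


if __name__ == "__main__":
    for active in (False, True):
        for cloud in (False, True):
            print(active, cloud, "->", calculated_view_source(active, cloud))
```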

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field    Description    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Note If you are migrating an SAP HANA data source from the lineage harvester, ensure that you run the ignore-source command with the source ID from the lineage harvester configuration file. When you synchronize this capability, an error occurs if the source ID from the lineage harvester exists even if you use the same source ID for this field. For more information, go to Migrate the technical lineage of a data source.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, adjust the filter, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Views

    This query retrieves the view definitions.

    Calculated Views

    This query retrieves calculated views.

    Dependencies of Calculated Views

    This query retrieves dependencies of calculated views.

    Cross-references of Calculated Views

    This query retrieves cross-references of calculated views.

    Yes
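The note about default-query updates describes a comparison against the previous set of defaults only. A minimal sketch of that decision, under the assumption that queries are compared as plain text, could look like this; the function name and status labels are hypothetical and do not reflect Collibra's internal implementation.

```python
def resolve_queries(current: str, previous_default: str, new_default: str):
    """Decide which query text a capability ends up with after an update.

    Only queries identical to the previous defaults are replaced. Anything
    else, whether older defaults or hand-edited text, counts as customized
    and is left unchanged.
    """
    if current == new_default:
        return current, "up to date"
    if current == previous_default:
        return new_default, "updated"
    return current, "customized"


if __name__ == "__main__":
    v1, v2, v3 = "select 1", "select 2", "select 3"
    print(resolve_queries(v2, previous_default=v2, new_default=v3))
    print(resolve_queries(v1, previous_default=v2, new_default=v3))
```

The second call shows why an older default set is never upgraded: it matches neither the previous nor the new defaults, so it is indistinguishable from a customized query.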

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value    Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    SQL Active

    An option to determine whether to include or exclude the technical lineage of the data source with the SQL-based input.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    If you have a capability with this option selected and the synchronization of the capability fails with the "Missing required parameter hanaUseCloudScanner" error message, go to the article "In Edge Harvester, previously configured/working SAP HANA Classic capabilities fail to submit to Edge" in the Collibra Support Portal for a solution.

    No

    Calculated Views Active

    An option to determine whether to include or remove the technical lineage from calculated views in an SAP HANA Classic on-premises data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    If you have a capability with this option selected and the synchronization of the capability fails with the "Missing required parameter hanaUseCloudScanner" error message, go to the article "In Edge Harvester, previously configured/working SAP HANA Classic capabilities fail to submit to Edge" in the Collibra Support Portal for a solution.

    No

    Use Hana Cloud for Calculated Views

    An option to determine whether to include or remove the technical lineage from calculated views in an SAP HANA Cloud/Advanced data source.

    To include technical lineage from the SAP HANA Cloud/Advanced data source, you must select this option and the Calculated Views Active option.

    Note Do not select this checkbox if:
    • You are not getting technical lineage from calculated views.
    • You want to exclude the technical lineage of this data source.

    No

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field    Description    Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
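
As a rough illustration of how a file-name mask narrows down the files in the SQL directory, the sketch below filters a list of names with Python's `fnmatch` module. It assumes that the Mask field behaves like a shell-style wildcard pattern, which this page does not confirm. The function name `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match the mask.

    Assumption: the mask is a shell-style wildcard pattern.
    The default "*" mirrors the default value of the Mask field.
    """
    return [name for name in file_names if fnmatchcase(name, mask)]


# Hypothetical directory content.
files = ["orders.sql", "customers.sql", "readme.txt"]

all_files = select_files(files)           # default mask keeps every file
sql_files = select_files(files, "*.sql")  # keeps only names ending in .sql
```

With the default mask every file is kept, so a narrower mask is only needed when the directory also holds files that should not be analyzed.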

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
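
The precedence rule in the notes for the Database and Schema fields can be pictured with a small sketch. This is not how Collibra Data Lineage is implemented. It only models the rule that names found in a SQL statement win and the capability fields act as the fallback. The function name and the per-name fallback are assumptions made for illustration.

```python
def resolve_stitching_names(statement_db, statement_schema, field_db, field_schema):
    """Pick the database and schema names used for stitching.

    Names taken from the SQL statement take precedence. The values of the
    Database and Schema fields are used only when the statement has none.
    Assumption: the fallback is applied to each name separately.
    """
    database = statement_db or field_db
    schema = statement_schema or field_schema
    return database, schema


# Statement is fully qualified: the field values are ignored.
qualified = resolve_stitching_names("SALES", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA")

# Statement has no database or schema name: the field values are used.
unqualified = resolve_stitching_names(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA")
```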

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
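
To see why a database-to-system mapping resolves missing stitching, consider the minimal sketch below. It models the mapping as a plain Python dictionary with the Collibra System Name as the fallback. The dictionary layout, the function name, and the sample names are hypothetical. The input format that the field actually expects is not shown on this page.

```python
def system_for_database(database, mapping, default_system):
    """Return the system that a database belongs to.

    Databases without an entry in the mapping fall back to the default
    system name, which is the situation that can cause missing stitching
    when several databases live on different systems.
    """
    return mapping.get(database, default_system)


# Hypothetical mapping: one database lives on a different system.
mapping = {"finance_db": "finance-server"}

mapped = system_for_database("finance_db", mapping, "default-server")
unmapped = system_for_database("sales_db", mapping, "default-server")
```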

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.
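
The dialect restriction above can be expressed as a simple check. The sketch below is an illustration only. It assumes that "contains lowercase column names" means that at least one column name has a lowercase character, and the function name is hypothetical.

```python
# Dialects for which the feature works with lowercase column names,
# as listed in the Important note above.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def can_use_dependent_source(dialect, column_names):
    """Return True if sharing table definitions is expected to work.

    Assumption: a column name counts as lowercase when it contains at
    least one lowercase character.
    """
    has_lowercase = any(
        any(ch.islower() for ch in name) for name in column_names
    )
    return not has_lowercase or dialect.lower() in LOWERCASE_SAFE_DIALECTS


ok_snowflake = can_use_dependent_source("Snowflake", ["order_id", "AMOUNT"])
ok_other = can_use_dependent_source("PostgreSQL", ["order_id", "AMOUNT"])
ok_upper = can_use_dependent_source("PostgreSQL", ["ORDER_ID", "AMOUNT"])
```

When the check fails, the workaround stated above still applies: consolidate the SQL statements and the DDL file in a single data source.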

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
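
The relationship between the three Processing Level values, and their relation to the deprecated Analyze Only option, can be summarized in a short sketch. The enum, the stage names, and the helper function are hypothetical. They restate the table above and are not part of any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages that run for each level, as described in the table above.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level.

    Selected corresponds to "Analyze", cleared corresponds to "Sync".
    """
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


level = from_analyze_only(True)
stages = STAGES[level]
```

Each level includes the stages of the previous one, which is why selecting "Analyze" for every data source and triggering a single synchronization afterward avoids running one synchronization job per source.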

    Active

    An option that determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field

    Description

    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    An option that determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field

    Description

    Required?

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Snowflake

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Ingestion Method

    The Snowflake ingestion methods that Collibra Data Lineage uses to ingest metadata from Snowflake data sources. Select one of the following values:

    SQL
    The SQL Snowflake ingestion mode. Collibra Data Lineage creates a column-level technical lineage based on SQL statements.
    SQL-API
    The SQL-API Snowflake ingestion mode. Collibra Data Lineage creates a column-level technical lineage based on Snowflake schemas and the access history.

    For more information, go to Technical lineage for Snowflake ingestion methods.

    Yes

    Days

    The number of days of the user access history that Collibra Data Lineage collects and processes. For example, if you set the value to 20, Collibra Data Lineage collects the last 20 days of user access history.

    You can use this field to limit data retrieval from the ACCESS_HISTORY table. This field only takes effect when you use the SQL-API Snowflake ingestion mode.

    Specify a value in the range of 1 - 366. If you do not enter a value, all user access history is collected by default.

    Note A higher value of this field results in Collibra Data Lineage retrieving more data from Snowflake. This might cause a Usage of EmptyDir volume "output" exceeds the limit "15Gi" error when Collibra Data Lineage analyzes the metadata to create the technical lineage.

    No
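
The rules for the Days field (an optional value in the range 1 - 366, with all history collected when the field is empty) can be checked as in the sketch below. The function name is hypothetical, and representing "collect all history" as `None` is a choice made for this example only.

```python
def validate_days(value):
    """Validate a value for the Days field.

    Returns None when no value is given, meaning that all user access
    history is collected. Otherwise returns the number of days as an int.
    """
    if value is None or str(value).strip() == "":
        return None
    days = int(value)
    if not 1 <= days <= 366:
        raise ValueError("Days must be in the range 1 - 366")
    return days


last_20_days = validate_days("20")
everything = validate_days("")
```

Keeping the value low limits how much of the ACCESS_HISTORY table is retrieved, which reduces the risk of the volume limit error mentioned in the note above.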

    Extra Database Definitions

    The name of the database from which Collibra Data Lineage collects metadata, but the database is excluded from the technical lineage that is created. This field is useful for stitching across databases. You can specify a cross-referenced database to ensure correct lineage across all databases that Collibra Data Lineage processes to create the technical lineage.

    Tip You can add extra database definitions by clicking Add property.

    No

    Schema Names

    The schema name of your data source. This field takes effect only when you use the SQL-API Snowflake ingestion mode. You can use this field as a filter to include lineage for objects only in the specified schema.

    Ensure that the schema name you specify matches the Schema asset name that you created when you registered the data source in Data Catalog.

    Tip You can add extra schema names by clicking Add property.

    No

    Source Configuration

    The source configuration for the data source. Specify the following property in JSON format and enter the content in this field. This field applies only when you select the SQL-API Snowflake ingestion mode.

    Property

    Description

    Required?

    displaySampleQueries

    Indicates whether to display transformations with a question mark (?) or with actual values from queries in the Source code pane in the technical lineage graph. For example, you can choose to display WHERE amount < 100 or WHERE amount < ?.

    Specify one of the following values:

    true
    Actual values from queries are displayed.
    false
    A question mark (?) is displayed. This is the default value.
    No
    analyzeTemporaryTables

    Indicates whether to parse the CREATE TEMPORARY TABLE statement in the ingested queries. Specify one of the following values: 

    true
    Collibra Data Lineage examines the queries and parses the CREATE TEMPORARY TABLE statement when the following conditions are met:
    • The query starts with the CREATE TEMPORARY TABLE statement.

    • Collibra Data Lineage did not encounter the CREATE TEMPORARY TABLE statement before this query.

    false
    Collibra Data Lineage does not examine or parse the CREATE TEMPORARY TABLE statement in the ingested queries. This is the default value.
    No

    No
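
A possible way to produce the JSON content for the Source Configuration field is sketched below. It assumes that both properties are top-level keys of a single JSON object with boolean values. The exact layout that Collibra expects is not shown on this page, so verify it before use. The function name is hypothetical.

```python
import json


def build_source_configuration(display_sample_queries=False,
                               analyze_temporary_tables=False):
    """Return the source configuration as a JSON string.

    Both arguments default to False, matching the documented defaults.
    Assumption: the properties are top-level keys of one JSON object.
    """
    config = {
        "displaySampleQueries": display_sample_queries,
        "analyzeTemporaryTables": analyze_temporary_tables,
    }
    return json.dumps(config, indent=2)


# Show actual query values in the Source code pane, keep the other default.
source_configuration = build_source_configuration(display_sample_queries=True)
print(source_configuration)
```

Generating the text with a JSON library rather than typing it by hand avoids syntax slips such as trailing commas or capitalized `True` and `False`.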

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
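
If you maintain the list of excluded schemas elsewhere, a filter like the one in the example can be generated instead of typed. The sketch below is only an illustration. The function name is hypothetical, and the column name `v.table_schema` is taken from the example above and may differ in your query.

```python
def schema_exclusion_filter(schemas, column="v.table_schema"):
    """Build a WHERE filter that excludes the given schemas.

    Single quotes inside schema names are doubled so that the generated
    SQL string literal stays valid.
    """
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in schemas)
    return f"where {column} not in ({quoted});"


default_filter = schema_exclusion_filter(["pg_catalog", "information_schema"])
extended_filter = schema_exclusion_filter(
    ["pg_catalog", "information_schema", "another_schema"]
)
```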

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Technical Lineage for Snowflake capability are older than the previous set of queries (or if you have customized them), they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that into your existing capabilities. You can, then, modify them to suit your needs.
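
The update behavior described in the note can be modeled roughly as follows. This is a sketch of the described rule, not the actual Collibra logic. The function name is hypothetical, and comparing the query texts for exact equality is an assumption.

```python
def queries_after_update(current, previous_default, new_default):
    """Return the queries in use after a default-query update.

    The second element of the result tells whether the queries were
    replaced. Queries that differ from the previous defaults (because
    they are older or customized) are treated as customized and kept.
    """
    if current == previous_default:
        return new_default, True
    return current, False


previous = {"Views": "select v1"}
new = {"Views": "select v2"}

# Capability still uses the previous defaults: it receives the new ones.
updated = queries_after_update({"Views": "select v1"}, previous, new)

# Capability uses older or customized queries: nothing changes.
kept = queries_after_update({"Views": "select v0"}, previous, new)
```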

    If you select the SQL Snowflake ingestion mode, the following queries apply:

    Query

    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Procedures

    This query retrieves the stored procedures.

    Views

    This query retrieves the view definitions.

    If you select the SQL-API Snowflake ingestion mode, the following queries apply:

    Query

    Description

    Object Dependencies

    This query retrieves view definitions.

    Columns Joined

    This query retrieves table and column definition information.

    If upstream lineage information is missing when you create technical lineage for Snowflake with the SQL-API ingestion mode, you can use this query as a workaround.

    Access History

    This query retrieves lineage and transformation details.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    An option that determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field

    Description

    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Ingestion Method

    The Snowflake ingestion methods that Collibra Data Lineage uses to ingest metadata from Snowflake data sources. Select one of the following values:

    SQL
    The SQL Snowflake ingestion mode. Collibra Data Lineage creates a column-level technical lineage based on SQL statements.
    SQL-API
    The SQL-API Snowflake ingestion mode. Collibra Data Lineage creates a column-level technical lineage based on Snowflake schemas and the access history.

    For more information, go to Technical lineage for Snowflake ingestion methods.

    Yes

    Days

    The number of days of the user access history that Collibra Data Lineage collects and processes. For example, if you set the value to 20, Collibra Data Lineage collects the last 20 days of user access history.

    You can use this field to limit data retrieval from the ACCESS_HISTORY table. This field only takes effect when you use the SQL-API Snowflake ingestion mode.

    Specify a value in the range of 1 - 366. If you do not enter a value, all user access history is collected by default.

    Note A higher value of this field results in Collibra Data Lineage retrieving more data from Snowflake. This might cause a Usage of EmptyDir volume "output" exceeds the limit "15Gi" error when Collibra Data Lineage analyzes the metadata to create the technical lineage.

    No

    Extra Database Definitions

    The name of the database from which Collibra Data Lineage collects metadata, but the database is excluded from the technical lineage that is created. This field is useful for stitching across databases. You can specify a cross-referenced database to ensure correct lineage across all databases that Collibra Data Lineage processes to create the technical lineage.

    Tip You can add extra database definitions by clicking Add property.

    No

    Schema Names

    The schema name of your data source. This field takes effect only when you use the SQL-API Snowflake ingestion mode. You can use this field as a filter to include lineage for objects only in the specified schema.

    Ensure that the schema name you specify matches the Schema asset name that you created when you registered the data source in Data Catalog.

    Tip You can add extra schema names by clicking Add property.

    No

    Source Configuration

    The source configuration for the data source. Specify the following property in JSON format and enter the content in this field. This field applies only when you select the SQL-API Snowflake ingestion mode.

    Property

    Description

    Required?

    displaySampleQueries

    Indicates whether to display transformations with a question mark (?) or with actual values from queries in the Source code pane in the technical lineage graph. For example, you can choose to display WHERE amount < 100 or WHERE amount < ?.

    Specify one of the following values:

    true
    Actual values from queries are displayed.
    false
    A question mark (?) is displayed. This is the default value.
    No
    analyzeTemporaryTables

    Indicates whether to parse the CREATE TEMPORARY TABLE statement in the ingested queries. Specify one of the following values: 

    true
    Collibra Data Lineage examines the queries and parses the CREATE TEMPORARY TABLE statement when the following conditions are met:
    • The query starts with the CREATE TEMPORARY TABLE statement.

    • Collibra Data Lineage did not encounter the CREATE TEMPORARY TABLE statement before this query.

    false
    Collibra Data Lineage does not examine or parse the CREATE TEMPORARY TABLE statement in the ingested queries. This is the default value.
    No

    No

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This query excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the query to, for example where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of default queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
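    The update rule described in this note can be modeled in a few lines. This is a simplified illustration of the documented behavior, not the actual Collibra Data Lineage implementation; the function name and the dictionary representation of the queries are assumptions.

```python
from typing import Dict, Tuple

Queries = Dict[str, str]  # query name -> SQL text (assumed representation)


def refresh_queries(current: Queries, previous_default: Queries, new_default: Queries) -> Tuple[Queries, bool]:
    """Return the queries to use and whether they were updated.

    Only queries identical to the previous default set are replaced.
    Anything else (older defaults or hand-edited SQL) counts as customized
    and is left untouched.
    """
    if current == previous_default:
        return dict(new_default), True
    return dict(current), False


if __name__ == "__main__":
    previous = {"Columns": "select 1"}
    new = {"Columns": "select 2"}
    print(refresh_queries({"Columns": "select 1"}, previous, new))  # replaced by the new defaults
    print(refresh_queries({"Columns": "select 0"}, previous, new))  # kept as customized
```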

    If you select the SQL Snowflake ingestion mode, the following queries apply:

    Query

    Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Procedures

    This query retrieves the stored procedures.

    Views

    This query retrieves the view definitions.

    If you select the SQL-API Snowflake ingestion mode, the following queries apply:

    Query

    Description

    Object Dependencies

    This query retrieves view definitions.

    Columns Joined

    This query retrieves table and column definition information.

    If upstream lineage information is missing when you create technical lineage for Snowflake with the SQL-API ingestion mode, you can use this query as a workaround.

    Access History

    This query retrieves lineage and transformation details.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
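    The three processing levels build on each other, and the deprecated Analyze Only option maps onto two of them. The sketch below summarizes that relationship as described in this table. The enum and function names are hypothetical and are not part of any Collibra API.

```python
from enum import Enum
from typing import Tuple


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages performed at each level, as described in the table above.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def stages_for(level: ProcessingLevel) -> Tuple[str, ...]:
    """Return the stages that run for the given processing level."""
    return STAGES[level]


def level_for_analyze_only(analyze_only_selected: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to its Processing Level equivalent."""
    return ProcessingLevel.ANALYZE if analyze_only_selected else ProcessingLevel.SYNC


if __name__ == "__main__":
    for level in ProcessingLevel:
        print(level.value, "->", ", ".join(stages_for(level)))
```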

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field

    Description

    Required?

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
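    To preview which files a mask would select before you save the capability, you can test the pattern locally. The sketch below assumes that the mask follows shell-style wildcard rules (* and ?), which this page does not state explicitly, so treat the result as an approximation. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def files_matching_mask(file_names: Iterable[str], mask: str = "*") -> List[str]:
    """Return the file names that match the mask (assumed shell-style wildcards)."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    names = ["orders.sql", "customers.sql", "readme.txt"]
    print(files_matching_mask(names, "*.sql"))
    print(files_matching_mask(names))  # the default mask * selects every file
```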

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
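    The precedence rule in the note above can be expressed as a simple fallback. This is an illustrative sketch only: the function name is hypothetical, and falling back separately for the database and the schema is an assumption, because the note only describes the cases in which both names are present or both are absent.

```python
from typing import Optional, Tuple


def names_for_stitching(
    statement_database: Optional[str],
    statement_schema: Optional[str],
    field_database: str,
    field_schema: str,
) -> Tuple[str, str]:
    """Return the (database, schema) pair used for stitching.

    Names found in the SQL statement take precedence; the Database and
    Schema fields of the capability are used only when a name is missing.
    """
    database = statement_database or field_database
    schema = statement_schema or field_schema
    return database, schema


if __name__ == "__main__":
    print(names_for_stitching("SALES", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA"))
    print(names_for_stitching(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA"))
```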

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to the correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field

    Description

    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to the correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field

    Description

    Required?

    Capability

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Spark SQL

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the Technical Lineage capability template for your data source to create technical lineage for the JDBC data source.

    Important Technical lineage via Edge is only available in private beta. Please create a support ticket to get access.

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No
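    The effect of this field on the full path can be illustrated as follows. This is a minimal sketch with a hypothetical function name and made-up asset names. It only shows how the placeholder CData or a custom name fills the database position for a data source that has no database level of its own.

```python
def full_path(system: str, schema: str, table: str, external_database_name: str = "CData") -> str:
    """Compose the full path: system -> database -> schema -> table.

    For a database-less data source, the database position is filled with
    the External Database Name: either the CData placeholder or the custom
    name set through CustomizedDefaultCatalogName at registration.
    """
    return " -> ".join([system, external_database_name, schema, table])


if __name__ == "__main__":
    # The system, schema, and table names below are invented for illustration.
    print(full_path("spark-prod", "sales", "orders"))
    print(full_path("spark-prod", "sales", "orders", external_database_name="my_catalog"))
```

    The value must match what was used when the data source was registered. Otherwise the path built for lineage does not line up with the assets in Data Catalog and stitching fails.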

    Database Name

    The name of the database or schema (these terms are synonymous for Spark SQL) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, adjust the filter, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of default queries, or if you have customized them, they are recognized as customized queries and cannot be updated. In that case, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.
    Query

    Description

    Columns

    This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.

    Object names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value

    Description

    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field

    Description

    Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No

    Database Name

    The name of the database or schema (these terms are synonymous for Spark SQL) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example To exclude schemas from a Views query, add the following filter: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');
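If you maintain the list of excluded schemas outside of Collibra, the filter from the example can be generated rather than typed by hand. The following is a minimal sketch only: the function name and its argument are hypothetical, and the v.table_schema column alias is taken from the example, so check it against the default Views query of your data source before you paste the result.

```python
def views_schema_filter(excluded_schemas):
    """Build a `where ... not in (...)` filter for a Views query.

    Hypothetical helper: the `v.table_schema` alias comes from the
    documentation example and may differ in your default query.
    """
    if not excluded_schemas:
        raise ValueError("provide at least one schema to exclude")
    # Double any single quote so that each name is a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where v.table_schema not in ({quoted});"


if __name__ == "__main__":
    print(views_schema_filter(["pg_catalog", "information_schema", "another_schema"]))
```

The generated text is meant to be pasted into the query field; it still has to follow the SQL syntax that your data source supports.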

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.

    Query | Description

    Columns

    This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.

    Object names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
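The three Processing Level values are cumulative, which the following minimal sketch makes explicit. It is an illustration only: the enum, the stage names, and the helper function are hypothetical and are not part of any Collibra API. The mapping from the deprecated Analyze Only option follows the equivalence stated above.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Stages that run for each level, as described in the table above.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to its Processing Level equivalent."""
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


if __name__ == "__main__":
    for level in ProcessingLevel:
        print(level.value, "->", ", ".join(STAGES[level]))
```

Read this way, setting every data source to Analyze and triggering one synchronization afterward is what lets all sources be synchronized in a single job.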

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
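To preview which files a Mask value would select, you can test the pattern locally before you save the capability. The sketch below assumes that the mask uses shell-style wildcards such as * and ?; this page only states that the default value is *, so treat the exact matching rules as an assumption. The file names and the function name are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match a shell-style mask (assumed semantics)."""
    # fnmatchcase compares case-sensitively on every operating system.
    return [name for name in file_names if fnmatchcase(name, mask)]


if __name__ == "__main__":
    files = ["orders.sql", "customers.sql", "README.txt"]
    print(select_files(files))           # default mask: every file
    print(select_files(files, "*.sql"))  # only files that end in .sql
```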

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
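The precedence rule for the Database and Schema fields can be pictured as a simple fallback. The following sketch is illustrative only and is not how Collibra Data Lineage is implemented: the function name is hypothetical, the name splitting is naive and ignores quoted identifiers, and the handling of a name that has a schema but no database is an assumption.

```python
def resolve_full_path(table_ref, system, default_database, default_schema):
    """Return (system, database, schema, table) for a table reference.

    Names written in the SQL statement win; the capability's Database and
    Schema field values are used only for the parts that are missing.
    """
    parts = table_ref.split(".")  # naive: does not handle quoted identifiers
    if len(parts) == 3:
        database, schema, table = parts
    elif len(parts) == 2:
        # Assumption: a two-part name is schema.table.
        database = default_database
        schema, table = parts
    elif len(parts) == 1:
        database, schema, table = default_database, default_schema, parts[0]
    else:
        raise ValueError(f"unsupported table reference: {table_ref!r}")
    return (system, database, schema, table)


if __name__ == "__main__":
    print(resolve_full_path("sales.public.orders", "my_system", "dwh", "staging"))
    print(resolve_full_path("orders", "my_system", "dwh", "staging"))
```

The resulting tuple mirrors the full path (system > database > schema > table) that stitching compares with the assets in Data Catalog.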

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No
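Conceptually, the Database-System mapping acts as a lookup that overrides the default system name for specific databases. The sketch below only illustrates that idea with a plain Python dictionary. It does not show the input format that the field expects, which is not described here, and all names in it are hypothetical.

```python
def system_for_database(database, mapping, default_system):
    """Return the mapped system for a database, or the default system name."""
    return mapping.get(database, default_system)


if __name__ == "__main__":
    # Hypothetical mapping: two databases that live on different systems.
    mapping = {"finance_db": "finance_system", "hr_db": "hr_system"}
    for db in ("finance_db", "hr_db", "sales_db"):
        print(db, "->", system_for_database(db, mapping, "default_system"))
```

Without the mapping, all three databases would be associated with the default system name, which is the situation that causes missing stitching.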

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SQL Server

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example To exclude schemas from a Views query, add the following filter: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, we update the default queries to improve performance. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy its set of default queries into your existing capabilities. You can then modify them to suit your needs.

    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, and databases or projects in the form: database or project > schema > table > column.

    Database Links

    This query retrieves links to other databases.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example To exclude schemas from a Views query, add the following filter: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Oracle Edge capability or Technical Lineage for Snowflake capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Database Links

    This query retrieves links to other databases.

    Synonyms

    This query retrieves the alternative names for the database objects.

    Views

    This query retrieves the view definitions.

    Other Queries

    This query retrieves other data that technical lineage needs, for example stored procedures, functions, and packages.

    Yes
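The schema filter from the Queries example can be generated rather than typed by hand. The following is a minimal sketch, not part of the product: the helper name build_schema_filter and the default column alias v.table_schema are assumptions taken from the example, and the resulting clause still has to be pasted into the Views query yourself.

```python
def build_schema_filter(excluded_schemas, column="v.table_schema"):
    """Return a WHERE clause that excludes the given schemas.

    Hypothetical helper: it only formats text in the shape of the
    filter shown in the Queries example.
    """
    if not excluded_schemas:
        raise ValueError("at least one schema name is required")
    # Double any single quote so each name stays a valid SQL string literal.
    quoted = ", ".join(
        "'" + name.replace("'", "''") + "'" for name in excluded_schemas
    )
    return f"where {column} not in ({quoted});"


default_filter = build_schema_filter(["pg_catalog", "information_schema"])
extended_filter = build_schema_filter(
    ["pg_catalog", "information_schema", "another_schema"]
)
print(default_filter)
print(extended_filter)
```

Keep in mind that a query edited this way counts as a customized query, so it is no longer replaced when the default queries are updated.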

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
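The lowercase column name restriction for Dependent On Sources can be checked before you link data sources. This is a rough sketch under stated assumptions: the dialect strings and the function name are hypothetical, and "contains lowercase column names" is interpreted here as any column name with at least one lowercase letter.

```python
# Dialects for which the source text says lowercase column names work.
SUPPORTED_LOWERCASE_DIALECTS = {"teradata", "snowflake", "oracle"}


def lowercase_columns_supported(dialect, column_names):
    """Return True if Dependent On Sources is expected to work.

    Assumption: a column name counts as lowercase when it contains
    at least one lowercase character.
    """
    has_lowercase = any(
        any(ch.islower() for ch in name) for name in column_names
    )
    if not has_lowercase:
        # No lowercase names, so the dialect restriction does not apply.
        return True
    return dialect.lower() in SUPPORTED_LOWERCASE_DIALECTS


print(lowercase_columns_supported("Snowflake", ["order_id"]))
print(lowercase_columns_supported("postgres", ["order_id"]))
print(lowercase_columns_supported("postgres", ["ORDER_ID"]))
```

When the check returns False, the documented workaround is to consolidate the SQL statements and the DDL file in a single data source.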

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
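The relationship between the three Processing Level values, and between that setting and the deprecated Analyze Only option, can be summarized as a small model. This is an illustrative sketch only: the class, the stage names, and the helper function are hypothetical and do not correspond to any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level includes the stages of the level before it.
STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level.

    Selected corresponds to "Analyze"; cleared corresponds to "Sync".
    """
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


print(from_analyze_only(True).value)
print(STAGES[ProcessingLevel.SYNC])
```

Following the recommendation for multiple data sources, most capabilities would use the Analyze level, with synchronization triggered once afterward.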

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
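To get a feel for how a file name pattern narrows down the files in a directory, the sketch below uses Python's standard glob-style matching. It assumes that the Mask field follows similar wildcard semantics, which the text does not confirm beyond the default value of *; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_files(file_names, mask="*"):
    """Return the file names that match a glob-style mask.

    Assumption: the mask behaves like a shell wildcard pattern.
    """
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "readme.txt"]
all_files = select_files(files)          # default mask "*" keeps everything
sql_files = select_files(files, "*.sql")  # keeps only the .sql files
print(all_files)
print(sql_files)
```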

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
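The precedence rule for the Database and Schema fields can be pictured as a simple fallback. The sketch below is an interpretation, not the actual stitching logic: the function name is hypothetical, and it assumes the fallback applies to each name independently, which the note does not state explicitly.

```python
def resolve_names(sql_database, sql_schema, field_database, field_schema):
    """Pick the names used for stitching.

    Names found in the SQL statement win; the capability's Database and
    Schema field values are used only when the statement has none.
    Assumption: database and schema fall back independently.
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# Statement such as "select * from sales.public.orders": SQL names win.
print(resolve_names("sales", "public", "default_db", "default_schema"))
# Statement such as "select * from orders": field values are used.
print(resolve_names(None, None, "default_db", "default_schema"))
```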

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The connection definitions, where you specify relevant translations for each data source. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <source ID>.conf file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SQL Server Integration Services (SSIS)

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    Yes

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Source Configuration

    The connection definitions, where you specify relevant translations for each data source. Specify the following properties in JSON format and enter the content in this field.

    If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <source ID>.conf file in this field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     
    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
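
    The difference between the three Processing Level values can be sketched in a few lines of Python. This is only an illustrative model, not Collibra code: the STAGES dictionary, the automatic_sync_jobs function, and the data source names are hypothetical, and the sketch assumes that every data source set to Sync starts its own synchronization job, as the Important note for the Sync value describes.

```python
# Hypothetical model of the Processing Level values; not part of any Collibra API.
STAGES = {
    "Load": ["load"],
    "Analyze": ["load", "analyze"],
    "Sync": ["load", "analyze", "synchronize"],
}


def automatic_sync_jobs(levels_by_source):
    """Count the synchronization jobs that start automatically.

    Assumption: each data source whose level includes the synchronize
    stage is processed as a separate synchronization job.
    """
    return sum(
        1 for level in levels_by_source.values() if "synchronize" in STAGES[level]
    )


# Illustrative data source names.
all_sync = {"sales_db": "Sync", "hr_db": "Sync", "ops_db": "Sync"}
all_analyze = {name: "Analyze" for name in all_sync}

print(automatic_sync_jobs(all_sync))     # 3 separate jobs
print(automatic_sync_jobs(all_analyze))  # 0; synchronization is triggered later, once
```

    With Analyze on every data source, no synchronization job starts on its own, which is what makes it possible to synchronize all data sources together in a single job afterward.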

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Sybase

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.
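
    If you maintain the list of excluded schemas outside of Collibra, a small helper can generate the filter text shown in the example above. The following Python sketch is an assumption for illustration only: the schema_exclusion_filter function and its column parameter are hypothetical, and the generated clause still has to be pasted into the query manually.

```python
def schema_exclusion_filter(excluded_schemas, column="v.table_schema"):
    """Build a 'where ... not in (...)' filter for a Views query.

    Hypothetical helper: it only produces text in the shape of the
    documented example. Single quotes in names are doubled, which is
    the standard SQL way to escape them inside a string literal.
    """
    if not excluded_schemas:
        raise ValueError("Provide at least one schema to exclude.")
    quoted = ", ".join(
        "'" + name.replace("'", "''") + "'" for name in excluded_schemas
    )
    return f"where {column} not in ({quoted});"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```

    The helper raises an error for an empty list because a "not in ()" clause is not valid in most SQL dialects.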

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

     

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No
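
    The dialect restriction for Dependent On Sources can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the lowercase dialect identifiers are assumptions, and the set of dialects is taken from the Important note above.

```python
# Dialects for which the feature works with lowercase column names,
# according to the Important note above. The lowercase spelling of the
# identifiers is an assumption made for this sketch.
LOWERCASE_SAFE_DIALECTS = {"teradata", "snowflake", "oracle"}


def dependent_sources_usable(dialect, has_lowercase_columns):
    """Return True if Dependent On Sources is expected to work.

    Hypothetical helper: returns False for the case in which the note
    says an analyze error is raised and a DDL file is requested.
    """
    if not has_lowercase_columns:
        return True
    return dialect.lower() in LOWERCASE_SAFE_DIALECTS


print(dependent_sources_usable("Snowflake", has_lowercase_columns=True))  # True
print(dependent_sources_usable("Sybase", has_lowercase_columns=True))     # False
```

    When the check returns False, the only workaround named in the note is to consolidate the SQL statements and the DDL file in a single data source.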

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No
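
    If you are migrating existing configurations away from the deprecated Analyze Only option, the two equivalences listed above amount to a one-line mapping. The sketch below is a hypothetical Python illustration. The ProcessingLevel enum and the from_analyze_only function are not part of any Collibra API.

```python
from enum import Enum


class ProcessingLevel(Enum):
    """The three values of the Processing Level setting."""
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


def from_analyze_only(analyze_only):
    """Map the deprecated Analyze Only checkbox to a Processing Level.

    Selected corresponds to "Analyze" and cleared corresponds to "Sync".
    "Load" has no Analyze Only equivalent, so it is never returned.
    """
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


print(from_analyze_only(True).value)   # Analyze
print(from_analyze_only(False).value)  # Sync
```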

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database Name

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
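
    The update behavior described in the note above can be pictured as a comparison against two known sets of default queries. The following Python sketch is one possible reading of that note, not the actual Collibra Data Lineage implementation. The resolve_queries function and the sample query strings are hypothetical.

```python
def resolve_queries(current, previous_default, new_default):
    """Decide which queries a capability uses after a default-query update.

    Returns a (queries, updated) tuple. Assumed behavior, based on the note:
    only queries identical to the previous defaults are replaced. Anything
    else, whether older defaults or customized queries, is treated as
    customized and kept as-is.
    """
    if current == new_default:
        return new_default, False
    if current == previous_default:
        return new_default, True
    return current, False


# Illustrative query sets; the SQL text is a placeholder.
previous = {"Columns": "select c1", "Views": "select v1"}
new = {"Columns": "select c2", "Views": "select v2"}
customized = {"Columns": "select c1 where x", "Views": "select v1"}

print(resolve_queries(previous, previous, new)[1])    # True: defaults are replaced
print(resolve_queries(customized, previous, new)[1])  # False: treated as customized
```

    Under this reading, copying the default queries from a newly created capability brings an existing capability back to a state in which future updates can be detected, as long as the copied queries are not modified afterward.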
    Query | Description
    Columns | This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
    Views | This query retrieves the view definitions.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes
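
    To preview which files a Mask value would select, you can test the pattern locally before you configure the capability. The sketch below assumes that the mask follows shell-style wildcard rules, which this page does not confirm. The files_matching_mask function and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match a shell-style mask.

    Assumption: the Mask field behaves like a glob pattern. The default
    "*" matches every file name, in line with the documented default.
    """
    return [name for name in file_names if fnmatchcase(name, mask)]


# Illustrative directory listing.
files = ["orders.sql", "customers.sql", "readme.txt"]

print(files_matching_mask(files))           # all three files
print(files_matching_mask(files, "*.sql"))  # only the .sql files
```

    fnmatchcase is used instead of fnmatch so that the result does not depend on the case-sensitivity rules of the operating system on which the sketch runs.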

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes
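
    The precedence rule in the notes for the Database and Schema fields can be summarized as a fallback. The following Python sketch is an illustration under assumptions: the resolve_stitching_names function is hypothetical, and it treats the database name and the schema name independently, which the notes do not state explicitly.

```python
def resolve_stitching_names(sql_database, sql_schema, field_database, field_schema):
    """Return the (database, schema) pair assumed to be used for stitching.

    Names found in the SQL statements win. The capability's Database and
    Schema field values are used only when the SQL statement does not
    name them (represented here by None).
    """
    database = sql_database if sql_database else field_database
    schema = sql_schema if sql_schema else field_schema
    return database, schema


# The SQL statement names both parts: the field values are ignored.
print(resolve_stitching_names("SALES", "PUBLIC", "DEFAULT_DB", "DEFAULT_SCHEMA"))

# The SQL statement names neither part: the field values are used.
print(resolve_stitching_names(None, None, "DEFAULT_DB", "DEFAULT_SCHEMA"))
```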

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to their correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical lineage for Tableau

    Yes

    Main Properties

    The required information for creating a technical lineage.

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Tableau server or Tableau online account. Ingesting the same Tableau server or Tableau online account under different source IDs will fail.
    • Any single Tableau server or Tableau online account can be ingested only once. If you create more than one connection for the same Tableau server or Tableau online account, integration will fail. If you want to ingest from multiple unique Tableau server or Tableau online accounts, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes

    Tableau connection

    The Tableau connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Tableau.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Tableau assets.

    Yes
    REST only

    Indicates whether to harvest Tableau metadata by using only the Tableau REST API, instead of both the Tableau REST API and the Tableau Metadata API.

    • Cleared (default): The lineage harvester will use the REST API and Metadata API to harvest Tableau metadata.
    • Selected: The lineage harvester will only use the REST API to harvest Tableau metadata.
    Note This field must be cleared to:
    • Enable technical lineage and the automatic stitching of Column assets to Tableau Data Attribute assets.
    • Harvest owner information for Tableau projects, workbooks and data models.

    No

    Exclude images

    Indicates whether to exclude images from being downloaded.

    • Cleared: Images are downloaded.
    • Selected (default): Images are not downloaded.

    Note The maximum number of images that can be uploaded to Collibra per day is determined by the configuration of the file upload service, in Collibra Console. For complete details, see the Upload configuration settings in DGC service configuration: options.

    No

    Site ID

    The site IDs of the Tableau sites that you want to include in the ingestion process.

    To ingest from multiple Tableau sites, enter each site ID in a separate Site ID field.

    To ingest the default Tableau site, enter "Default" or leave the field empty. This field is not case sensitive.

    Warning If you enter "Default", you must include the double quotation marks. The site IDs of any other Tableau sites must not be enclosed in double quotation marks. If the formatting of the site IDs does not conform to this detail, ingestion will fail.
    Example 
    Tip Ensure that you specify the correct value. The correct value is the URL of the site to which you want to sign in. When you manually sign in to Tableau Server or Tableau Online, the site ID is the value that appears after /site/ in the browser address bar. In the following example URLs, the site ID is MarketingTeam:
    • Tableau Server: http://MyServer/#/site/MarketingTeam/projects
    • Tableau Online: https://10ay.online.tableau.com/#/site/MarketingTeam/workbooks

    On Tableau Server, however, the URL of the default site does not specify the site. For example, the URL for a view named Profits, on a site named Sales, is http://localhost/#/site/sales/views/profits. The URL for this same view on the default site is http://localhost/#/views/profits. The site name Sales does not figure in the URL.

    Yes
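The Site ID rules above can be sketched in a few lines of Python. This is only an illustration, not part of Collibra or Tableau: the function names `site_id_from_url` and `site_id_field_value` are hypothetical, and the sketch assumes that the site ID is always the path segment that follows `site` in the URL fragment, as in the example URLs.

```python
from typing import Optional
from urllib.parse import urlparse


def site_id_from_url(url: str) -> Optional[str]:
    """Return the value after /site/ in a Tableau URL, or None for the default site."""
    parsed = urlparse(url)
    # Tableau keeps the route after '#', so look at the fragment first.
    route = parsed.fragment or parsed.path
    parts = [part for part in route.split("/") if part]
    if "site" in parts:
        index = parts.index("site")
        if index + 1 < len(parts):
            return parts[index + 1]
    # No /site/ segment: the URL points to the default site.
    return None


def site_id_field_value(site_id: Optional[str]) -> str:
    """Format a site ID for the Site ID field (hypothetical helper)."""
    # The default site is entered as "Default", including the double quotation marks.
    if site_id is None or site_id.strip('"').lower() == "default":
        return '"Default"'
    # Any other site ID must not be enclosed in double quotation marks.
    return site_id.strip('"')


print(site_id_from_url("http://MyServer/#/site/MarketingTeam/projects"))
print(site_id_field_value(site_id_from_url("http://localhost/#/views/profits")))
```

The second helper always returns "Default" for the default site, although leaving the field empty is equally valid according to the description above.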

    Site Name

    The site name, or names, of the Tableau sites you specified in the Site ID field.

    If you don't provide a site ID in the Site ID field, in which case the default Tableau site is ingested, leave this field blank.

    No

    Concurrency level

    This field is intended to help if you are experiencing HTTP 401 Unauthorized errors due to too many concurrent HTTP calls using the same token. It allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is 10, meaning as many as 10 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to 1 effectively disables concurrency, so that HTTP requests run one at a time instead of in parallel.

    No
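To illustrate what a concurrency level means in practice, the following Python sketch caps the number of tasks that run at the same time. It is a generic illustration that assumes a thread pool; it does not show how Edge implements the setting, and `run_with_concurrency` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_concurrency(
    tasks: Iterable[T],
    worker: Callable[[T], R],
    concurrency_level: int = 10,
) -> List[R]:
    """Run worker on every task with at most concurrency_level tasks in flight."""
    if concurrency_level < 1:
        raise ValueError("concurrency_level must be at least 1")
    # max_workers is the cap: with a value of 1, tasks run one after another.
    with ThreadPoolExecutor(max_workers=concurrency_level) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(worker, tasks))


# Hypothetical stand-in for an HTTP call that uses a shared token.
print(run_with_concurrency(["a", "b", "c"], str.upper, concurrency_level=1))
```

Lowering the cap trades speed for fewer simultaneous calls on the same token, which is the trade-off the field description refers to.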

    Source configuration

    This field allows you to provide JSON code for system mapping, database mapping, domain mapping and filtering.

    Tip If you previously integrated Tableau via the lineage harvester, you can copy and paste in this field the JSON code from your Tableau <source ID> configuration file. For more information, go to Tableau hostname, schema, and system name mapping.

    Example 

    No

    Custom Properties

    This section identifies to which Collibra Data Lineage service instance you want to upload the Tableau metadata.

    Warning This applies only to Collibra Cloud for Government customers.

    techlinKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only to Collibra Cloud for Government customers.

    No

    techlinHost

    The URL of the Collibra Data Lineage service instance to which you want to upload Tableau metadata, for example "techlin-gcp-eu.collibra.com".

    Warning This applies only to Collibra Cloud for Government customers.

    No

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

     

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes
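The relationship between the three Processing Level values and the deprecated Analyze Only option can be summarized as a small lookup. The Python sketch below is purely illustrative: the enum, the stage names and the function names are hypothetical and only restate the descriptions above.

```python
from enum import Enum
from typing import Tuple


class ProcessingLevel(Enum):
    LOAD = "Load"
    ANALYZE = "Analyze"
    SYNC = "Sync"


# Each level includes the stages of the previous level.
_STAGES = {
    ProcessingLevel.LOAD: ("load",),
    ProcessingLevel.ANALYZE: ("load", "analyze"),
    ProcessingLevel.SYNC: ("load", "analyze", "synchronize"),
}


def stages_for(level: ProcessingLevel) -> Tuple[str, ...]:
    """Return the stages that run for a given processing level."""
    return _STAGES[level]


def level_from_analyze_only(analyze_only: bool) -> ProcessingLevel:
    """Map the deprecated Analyze Only checkbox to its Processing Level equivalent."""
    # Selected corresponds to "Analyze"; cleared corresponds to "Sync".
    return ProcessingLevel.ANALYZE if analyze_only else ProcessingLevel.SYNC


print(stages_for(ProcessingLevel.ANALYZE))
print(level_from_analyze_only(False).value)
```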

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Paging

    This option allows you to customize the Tableau API pagination settings.

    The default values are sufficient in most cases; however, you can decrease them to help mitigate node limit errors, or increase them to speed up API calls.

    If the integration fails because of timeout errors due to page sizing limits, Collibra Data Lineage automatically adjusts the limits and retries. For example, if failure occurs with worksheetsPageSize set to 100, the value is automatically reduced to 50 and another integration attempt is automatically started. If it fails again, the value is again halved. If integration is still unsuccessful with an adjusted value of 1, an error is thrown and no further attempts are started. If integration is eventually successful, the page size value is restored to its original value, in this example 100, for the next synchronization.

    No
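The automatic page size adjustment described above can be sketched as a halving retry loop. This Python example is a simplified illustration, not the Collibra Data Lineage implementation: `PageSizeTimeout` and `run_with_page_size_backoff` are hypothetical names, and integer halving (for example, 25 becomes 12) is an assumption.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class PageSizeTimeout(Exception):
    """Hypothetical error that stands in for a timeout caused by page sizing limits."""


def run_with_page_size_backoff(attempt: Callable[[int], T], page_size: int) -> T:
    """Retry attempt with a halved page size until it succeeds or fails at 1."""
    # Work on a copy so that the configured value is untouched for the next run.
    current = page_size
    while True:
        try:
            return attempt(current)
        except PageSizeTimeout:
            if current <= 1:
                # Still failing with a page size of 1: give up and surface the error.
                raise
            current = max(1, current // 2)


def demo_attempt(size: int) -> str:
    # Hypothetical integration call that only succeeds with 25 or fewer items per page.
    if size > 25:
        raise PageSizeTimeout(f"timeout with page size {size}")
    return f"succeeded with page size {size}"


print(run_with_page_size_backoff(demo_attempt, 100))
```

Because the function never changes the value passed in by the caller, the next synchronization starts again from the originally configured page size, which matches the behavior described above.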

    Logging

    This section contains the properties for debug logging. This setting is not valid for this integration.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Source ID

    The name of the data source. You can give this any name, as long as it is unique.

    Warning 
    • You can only specify one source ID per Tableau server or Tableau online account. Ingesting the same Tableau server or Tableau online account under different source IDs will fail.
    • Any single Tableau server or Tableau online account can be ingested only once. If you create more than one connection for the same Tableau server or Tableau online account, integration will fail. If you want to ingest from multiple unique Tableau server or Tableau online accounts, you have to create a new Edge connection for each one, configure a new capability template for each one, and each must have a unique source ID.

    Warning If you are switching between the lineage harvester and Edge, the value in this field must exactly match the value of the id property in your lineage harvester configuration file.

    Yes

    Tableau connection

    The Tableau connection that you created for ingestion in Data Catalog.

    Tip Select the name that you provided in the Name field when you created a connection to Tableau.

    Yes

    Domain ID

    The unique reference ID of the domain in Collibra Data Intelligence Platform in which you want to ingest the Tableau assets.

    Yes

    REST only

    Indicates whether to harvest Tableau metadata by using only the Tableau REST API, instead of both the Tableau REST API and the Tableau Metadata API.

    • Cleared (default): The lineage harvester will use the REST API and Metadata API to harvest Tableau metadata.
    • Selected: The lineage harvester will only use the REST API to harvest Tableau metadata.
    Note This field must be cleared to:
    • Enable technical lineage and the automatic stitching of Column assets to Tableau Data Attribute assets.
    • Harvest owner information for Tableau projects, workbooks and data models.

    No

    Exclude images

    Indicates whether to exclude images from being downloaded.

    • Cleared: Images are downloaded.
    • Selected (default): Images are not downloaded.

    Note The maximum number of images that can be uploaded to Collibra per day is determined by the configuration of the file upload service, in Collibra Console. For complete details, see the Upload configuration settings in DGC service configuration: options.

    No

    Site ID

    The site IDs of the Tableau sites that you want to include in the ingestion process.

    To ingest from multiple Tableau sites, enter each site ID in a separate Site ID field.

    To ingest the default Tableau site, enter "Default" or leave the field empty. This field is not case sensitive.

    Warning If you enter "Default", you must include the double quotation marks. The site IDs of any other Tableau sites must not be enclosed in double quotation marks. If the formatting of the site IDs does not conform to this detail, ingestion will fail.
    Example 
    Tip Ensure that you specify the correct value. The correct value is the URL of the site to which you want to sign in. When you manually sign in to Tableau Server or Tableau Online, the site ID is the value that appears after /site/ in the browser address bar. In the following example URLs, the site ID is MarketingTeam:
    • Tableau Server: http://MyServer/#/site/MarketingTeam/projects
    • Tableau Online: https://10ay.online.tableau.com/#/site/MarketingTeam/workbooks

    On Tableau Server, however, the URL of the default site does not specify the site. For example, the URL for a view named Profits, on a site named Sales, is http://localhost/#/site/sales/views/profits. The URL for this same view on the default site is http://localhost/#/views/profits. The site name Sales does not figure in the URL.

    Yes

    Site Name

    The site name, or names, of the Tableau sites you specified in the Site ID field.

    If you don't provide a site ID in the Site ID field, in which case the default Tableau site is ingested, leave this field blank.

    No

    Concurrency level

    This field is intended to help if you are experiencing HTTP 401 Unauthorized errors due to too many concurrent HTTP calls using the same token. It allows you to specify the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is 10, meaning as many as 10 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to 1 effectively disables concurrency, so that HTTP requests run one at a time instead of in parallel.

    No

    Source configuration

    This field allows you to provide JSON code for database mapping, domain mapping and filtering.

    If you previously integrated Tableau via the lineage harvester, you can copy and paste in this field the JSON code from your Tableau <source ID> configuration file.

    Example 

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

     

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    No

    Paging

    This option allows you to customize the Tableau API pagination settings.

    The default values are sufficient in most cases; however, you can decrease them to help mitigate node limit errors, or increase them to speed up API calls.

    If the integration fails because of timeout errors due to page sizing limits, Collibra Data Lineage automatically adjusts the limits and retries. For example, if failure occurs with worksheetsPageSize set to 100, the value is automatically reduced to 50 and another integration attempt is automatically started. If it fails again, the value is again halved. If integration is still unsuccessful with an adjusted value of 1, an error is thrown and no further attempts are started. If integration is eventually successful, the page size value is restored to its original value, in this example 100, for the next synchronization.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for Teradata

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No

    Database Name

    The name of the database or schema (these terms are synonymous for Teradata) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, adjust the filter, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Edge capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, you can create a new capability and copy the set of default queries from that capability into your existing capabilities. You can then modify them to suit your needs.
    Query | Description
    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Object Names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes
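If you maintain the schema filter from the example in several queries, a small helper can generate the clause so that the list of excluded schemas stays consistent. The following Python sketch is an illustration only: `schema_exclusion_filter` is a hypothetical helper, and the column name `v.table_schema` is taken from the example above and may differ in your queries.

```python
from typing import Iterable


def schema_exclusion_filter(schemas: Iterable[str], column: str = "v.table_schema") -> str:
    """Build a 'where <column> not in (...)' clause for the given schema names."""
    names = list(schemas)
    if not names:
        raise ValueError("at least one schema name is required")
    # Double any single quote so that the name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in names)
    return f"where {column} not in ({quoted})"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```

The generated text is meant to be pasted into the query field by hand. Customized queries are still subject to the notes above about supported SQL syntax and support.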

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Logging

    This section contains general information about logging.

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    JDBC Connection

    The JDBC connection that you created for Catalog JDBC ingestion.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    External Database Name

    The database value to be used as the database name in the full path (system -> database -> schema -> table). Use this field to ensure successful stitching for a database-less data source. You can specify one of the following values:

    • CData, which CData drivers return as a placeholder. Use this value if you did not create a custom database name by using the CustomizedDefaultCatalogName property when you registered your data source.
    • The custom database name that you specified for the CustomizedDefaultCatalogName property when you registered your data source.

    No

    Database Name

    The name of the database or schema (these terms are synonymous for Teradata) from which you want to harvest metadata.

    Yes

    Queries

    The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use. The query code is automatically available. However, you can modify the query code if needed.

    Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema'); This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. To exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');
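If you maintain the exclusion list outside Collibra, a filter like the one in the example can be generated rather than typed by hand. The sketch below is only an illustration: `schema_exclusion_filter` is a hypothetical helper, and the `v` alias and `table_schema` column are taken from the example above, so they may differ for other data sources.

```python
def schema_exclusion_filter(excluded_schemas, alias="v"):
    """Build a `where ... not in (...)` filter for a Views query."""
    if not excluded_schemas:
        raise ValueError("provide at least one schema to exclude")
    # Double any single quote so each name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where {alias}.table_schema not in ({quoted});"


print(schema_exclusion_filter(["pg_catalog", "information_schema", "another_schema"]))
```

Paste the generated text into the query only after checking that it uses SQL syntax that is supported for your data source.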

    Note 
    • If you change queries, you can only use supported SQL syntax.
    • Collibra Support does not provide support for customized SQL files.
    Note From time to time, we update the default queries to improve performance. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used. However, Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Oracle Edge capability or Technical Lineage for Snowflake capability are older than the previous set of default queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

    To benefit from the performance improvements, create a new capability and copy its set of default queries into your existing capabilities. You can then modify the queries to suit your needs.
    Query | Description

    Columns

    This query retrieves the columns, tables, schemas, databases or projects fields in the form: database or project > schema > table > column.

    Object Names

    This query retrieves a list of object names from which technical lineage can be created. The objects can include stored procedures, views, macros, and so on.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
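The three Processing Level values described above build on each other, and the deprecated Analyze Only option maps onto two of them. The following sketch restates that in code. The stage names and both function names are illustrative only; they are not part of any Collibra API.

```python
# Stages named in the Processing Level descriptions; each level adds one stage.
STAGES_BY_LEVEL = {
    "Load": ["load"],
    "Analyze": ["load", "analyze"],
    "Sync": ["load", "analyze", "synchronize"],
}


def stages_for(level):
    """Return the stages that run for a given Processing Level value."""
    return list(STAGES_BY_LEVEL[level])


def level_for_analyze_only(analyze_only_selected):
    """Translate the deprecated Analyze Only checkbox into a Processing Level."""
    return "Analyze" if analyze_only_selected else "Sync"


print(stages_for("Analyze"))
print(level_for_analyze_only(False))
```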

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Debug

    An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (beta). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

    Select one of the following values:

    True
    Enables logging of the JDBC job.
    False
    Disables logging of the JDBC job. This is the default value.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field | Description | Required?
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Main Properties

    This section contains the information for creating a technical lineage.

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.
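To preview which files a given mask would select, you can test the pattern locally. This minimal sketch assumes that the mask behaves like a shell-style wildcard pattern, which this page does not state explicitly; `files_matching_mask` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching_mask(file_names, mask="*"):
    """Return the file names that match the mask (assumed shell-style wildcards)."""
    return [name for name in file_names if fnmatchcase(name, mask)]


files = ["orders.sql", "customers.sql", "README.txt"]
print(files_matching_mask(files))           # the default mask * keeps every file
print(files_matching_mask(files, "*.sql"))  # keeps only the .sql files
```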

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.
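The precedence rule in the notes above can be summarized in a few lines. This is a sketch under one assumption: each name falls back to its field value independently, which the notes do not spell out for the case where a statement names only one of the two. The function `stitching_names` is hypothetical.

```python
from typing import Optional, Tuple


def stitching_names(
    sql_database: Optional[str],
    sql_schema: Optional[str],
    field_database: str,
    field_schema: str,
) -> Tuple[str, str]:
    """Return the (database, schema) pair that would be used for stitching.

    Names found in the SQL statement win; the Database and Schema fields of
    the capability are used only when the statement does not provide a name.
    """
    database = sql_database or field_database
    schema = sql_schema or field_schema
    return database, schema


# The statement names both parts, so the field values are ignored.
print(stitching_names("sales", "public", "default_db", "default_schema"))
# The statement names neither part, so the field values are used.
print(stitching_names(None, None, "default_db", "default_schema"))
```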

    Yes

    Database-System mapping

    This optional field allows you to map databases to the correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Advanced Properties

    This section contains the advanced properties for creating a technical lineage.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required?

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    Yes

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Technical Lineage for SqlDirectory

    Yes

    Source ID

    The name of the data source. Specify a name that is unique.

    Yes

    Shared Storage Connection

    The Shared Storage connection that you created.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    Yes

    Dialect

    The dialect of the database.

    Yes

    Collibra System Name

    The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

    The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the Technical Lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Prepare the SQL directory and Automatic stitching for technical lineage.

    Yes

    Database-System mapping

    This optional field allows you to map databases to the correct systems so that stitching succeeds. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Cloud for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

    To use this option, enter the source ID of the independent source.

    Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Teradata, Snowflake, and Oracle. For all other dialects:
    • An analyze error is raised, prompting you to provide the DDL file.
    • The only workaround is to consolidate your SQL statements and DDL file in a single data source.

    For complete information, go to Sharing database models across data sources.

     

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

    Select one of the following values:

    Value | Description
    Load

    Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

    Analyze

    Load and analyze the metadata on the Collibra Data Lineage service instance.

    Synchronization does not start after analysis; it starts only after either:

    • You trigger synchronization of another data source for which you specify "Sync" in the Processing Level drop-down list.
    • You configure the Technical Lineage Admin Edge capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

    Important  If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.
    Sync

    Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

    Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

    Yes

    Active

    This option determines whether to include or exclude the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Clear the checkbox to exclude the technical lineage of this data source.

    Yes

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Collibra Protect for AWS Lake Formation

    Yes

    AWS Lake Formation Connection

    The AWS Lake Formation connection to connect to AWS Lake Formation.

    Yes

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Collibra Protect for BigQuery

    Yes

    GCP Connection

    The GCP connection to connect to Google Cloud Platform.

    Yes

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Collibra Protect for Databricks

    Yes

    JDBC Connection

    The JDBC connection to connect to Databricks.

    Yes

    Field | Description | Required

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    Collibra Protect for Snowflake

    Yes

    JDBC Connection

    The JDBC connection to connect to Snowflake.

    Yes

    Snowflake role testing

    An option that determines how Snowflake checks roles (that is, Protect groups) for applying data protection standards and data access rules. This is to accommodate Snowflake users who have multiple roles.

    This field contains the following options:

    • CURRENT_ROLE: Checks only the primary role assigned to the Snowflake user. This is the default option.
    • IS_ROLE_IN_SESSION: Checks all the roles assigned to the Snowflake user, including secondary roles, within the active session.
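The difference between the two options matters for users who have secondary roles. The sketch below models it in plain Python. It is a simplification that compares role names directly and does not model Snowflake role hierarchy; the function `role_applies` and the role names are hypothetical.

```python
def role_applies(option, protect_group, primary_role, secondary_roles=()):
    """Return True if the Protect group (a Snowflake role) counts as active."""
    if option == "CURRENT_ROLE":
        # Only the primary role of the user is compared.
        return protect_group == primary_role
    if option == "IS_ROLE_IN_SESSION":
        # The primary role and every secondary role in the session are compared.
        return protect_group == primary_role or protect_group in secondary_roles
    raise ValueError(f"unknown option: {option}")


# A user whose primary role is ANALYST and whose secondary role is FINANCE.
print(role_applies("CURRENT_ROLE", "FINANCE", "ANALYST", ["FINANCE"]))
print(role_applies("IS_ROLE_IN_SESSION", "FINANCE", "ANALYST", ["FINANCE"]))
```

In this model, a rule that targets the FINANCE group reaches the user only when IS_ROLE_IN_SESSION is selected.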

    Yes

    Field | Description | Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following Edge capability:

    SAP Datasphere synchronization

    Yes

    Connection

     
    SAP Connection

    The SAP Datasphere Catalog connection to be used.

    Tip If you already integrated SAP Analytics Cloud, you can also reuse its SAP Datasphere Catalog connection.

    Yes

    Configuration

    This section contains information on how to connect to SAP Datasphere. 
    Save input metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. If this option is selected, you can download the files from the Synchronization Result dialog box once the synchronization activity is completed.

    No

    Advanced Configuration
    • Logging configuration
    • Memory
    • JVM arguments

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This setting is not valid for this integration. It should be set to false.

    No

    Log level

    This setting is not valid for this integration. It should be set to No logging.

    No

    Field | Description | Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    GCP service account

    This section contains information on how to connect to Google Cloud Storage.
    GCP Connection

    The GCP connection to be used.

    Yes

    Configuration

    This section contains information on the configuration of the crawlers.
    Project IDs (Deprecated)

    Add a comma-separated list of the Project IDs where Dataplex is enabled.

    This field is deprecated in the latest user interface and replaced by the Project IDs field on the Synchronize Metadata page. You can add the Project IDs when you synchronize Google Dataplex.

    The following rules apply when you add Project IDs:
    • If you enter a value in this field and do not add Project IDs on the Synchronize Metadata page, the capability searches the projects listed in this field when you synchronize the capability.
    • If you leave this field empty and do not add project IDs on the Synchronize Metadata page, the capability will search in the projects that you entered in the GCP Service Account field in the GCP connection.
    • Do not enter a value in this field and also add Project IDs on the Synchronize Metadata page; otherwise, the synchronization will end with an error when you synchronize the capability.
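The rules above amount to a small decision procedure, sketched below. Both function names are hypothetical, and one branch is an inference rather than a stated rule: when only the Synchronize Metadata page lists Project IDs, the sketch assumes that those IDs are used, because that field replaces this one.

```python
def parse_project_ids(text):
    """Split a comma-separated Project IDs value into a clean list."""
    return [part.strip() for part in text.split(",") if part.strip()]


def projects_to_search(capability_field, sync_page_field, connection_projects):
    """Decide which projects are searched, following the rules listed above."""
    capability_ids = parse_project_ids(capability_field)
    sync_page_ids = parse_project_ids(sync_page_field)
    if capability_ids and sync_page_ids:
        # Setting both places makes the synchronization end with an error.
        raise ValueError("Project IDs are set in the capability and on the Synchronize Metadata page")
    if sync_page_ids:
        return sync_page_ids  # inferred, not stated in the rules
    if capability_ids:
        return capability_ids
    # Nothing set anywhere: fall back to the projects from the GCP connection.
    return list(connection_projects)


print(projects_to_search("proj-a, proj-b", "", ["proj-from-connection"]))
print(projects_to_search("", "", ["proj-from-connection"]))
```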

    No

    Save input metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the synchronization.

    This checkbox is not selected by default.

    No

    (deprecated) Filters and Domain Mapping
    Important 

    This field is deprecated in the latest UI. You can now define the mappings in the integration configuration.
    If you have existing mappings here, they will continue to work. However, we advise you to move them to the integration configuration.

    Text in JSON format to include or exclude lakes and zones, and to configure domain mappings.

    • The text must be in JSON format and can contain an include and an exclude block.
    • In the include block:
      • You can specify the domain in which specific lakes or zones must be ingested. The format is: "project ID > lake ID > zone ID": "domain ID". For example, "integrations-automated-uer > testlake > testzone": "c8fe882a-a12e-4284-b655-7ac2a4fb08cb".
      • You can also specify the domain in which specific tables and columns must be ingested. The format is: "project ID > lake ID > zone ID > table ID": "domain ID".
    • In the exclude block, you can specify the lakes or zones that you don't want to ingest. For example, "* > test".
    • The exclude block has priority over the include block.
    • If the include block is not present, we ingest all assets into the same domain as the System asset.
    • If there is no explicit domain mapping for a zone, we use the domain specified for the Lake.
    • You can use the keyword default as a domain ID. In that case, the lake or zone will be ingested in the same domain as the System asset.
    • A match with a lake has priority over a match with a zone.
    • The integration fails before the synchronization starts, if one or more domain IDs specified in the include block don't exist.
    • The integration fails before the synchronization starts if a domain ID is left empty in the include block.
    • You can use the ? and * wildcards in the zone and lake names. If a lake or zone matches multiple lines, the most detailed match is taken into account.
    • If you registered the BigQuery data source via the BigQuery JDBC connector, and then integrate Google Dataplex, assets will be ingested in the same domains that were registered during JDBC ingestion. Specifically, Project assets are registered in the Database domains, and Zone assets are registered in the Schema domains. The mapping created by JDBC ingestion takes priority over the configurations in this field. In this way, no duplicated tables or columns are created. For more information, go to Ways to work with Google Cloud Platform (GCP).
    Examples
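Because the integration fails before synchronization starts when a domain ID in the include block is empty, it can help to check the JSON text before you paste it into the field. The following sketch does only that check. The project, lake, and zone names are made up, the helper `empty_domain_ids` is hypothetical, and writing the exclude block as a list of patterns is an assumption about its shape.

```python
import json


def empty_domain_ids(mapping_text):
    """Return the include-block paths whose domain ID is missing or blank."""
    config = json.loads(mapping_text)
    include = config.get("include", {})
    return [
        path
        for path, domain_id in include.items()
        if not isinstance(domain_id, str) or not domain_id.strip()
    ]


example = json.dumps(
    {
        "include": {
            # The keyword default ingests into the domain of the System asset.
            "my-project > my-lake": "default",
            # An empty domain ID like this one would make the integration fail.
            "my-project > my-lake > my-zone": "",
        },
        # Assumed shape: a list of patterns to skip.
        "exclude": ["* > test"],
    }
)

print(empty_domain_ids(example))
```

The check does not verify that a domain ID exists in Collibra; it only catches blank values.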

    No

    Extensible Properties Mapping

    Via the Extensible Properties Mapping field, you can integrate additional properties from Dataplex: Table creation date, Table modified date, System (showing where the table comes from), and type (the Zone type).

    Important 

    If you use this feature, make sure to set up all required characteristic assignments for the asset types.

    You do this by adding the mapping between the fields for the objects in Dataplex and the Collibra attribute IDs to ingest the data in, using a JSON string.

    • The text must be in JSON format and can contain a tables block and a zones block.
    • In each block, you specify the property name and the attribute ID to which you want to map the value in the property. The format is: "[property name]": "[attribute resource ID]". For example, "system": "19a27fda-8c50-48a8-87b3-f275ad450fe5".
    Example 
        {
           "tables": {
              "system": "19a27fda-8c50-48a8-87b3-f275ad450fe5",
              "create_time": "00c57a11-37ca-4259-9c38-0ac5e522e9e8",
              "update_time": "a415c2a6-8289-4a4d-8d49-3685712d7622"
           },
           "zones": {
              "zone_type": "c217db55-b5d6-4430-ad80-8534e691e54a"
           }
        }
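A mapping like the one in the example can be checked locally before you save the capability. The sketch below parses the text and reports values that do not parse as a UUID. It assumes that attribute resource IDs are always UUIDs and that only the tables and zones blocks are allowed; `validate_properties_mapping` is a hypothetical helper.

```python
import json
import uuid

ALLOWED_BLOCKS = {"tables", "zones"}


def validate_properties_mapping(mapping_text):
    """Return a list of problems found in an Extensible Properties Mapping."""
    problems = []
    mapping = json.loads(mapping_text)  # raises ValueError on invalid JSON
    for block, properties in mapping.items():
        if block not in ALLOWED_BLOCKS or not isinstance(properties, dict):
            problems.append(f"unexpected block: {block}")
            continue
        for prop, attribute_id in properties.items():
            try:
                uuid.UUID(attribute_id)  # checks only that the value parses as a UUID
            except (ValueError, TypeError, AttributeError):
                problems.append(f"{block}.{prop}: not a valid resource ID")
    return problems


sample = """
{
  "tables": {"system": "19a27fda-8c50-48a8-87b3-f275ad450fe5"},
  "zones": {"zone_type": "c217db55-b5d6-4430-ad80-8534e691e54a"}
}
"""
print(validate_properties_mapping(sample))
```

Because `json.loads` is strict, it also rejects text with a trailing comma after the last property in a block.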

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments on request of or together with Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from Google Dataplex.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from Google Dataplex.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

    Field    Description    Required
    Capability

    This section contains general information about the capability.

    Name

    The name of the Edge capability.

    Yes

    Description

    The description of the Edge capability.

    No

    GCP service account

    This section contains information on how to connect to Google Cloud Storage.
    GCP Connection

    The GCP connection to be used.

    Yes

    Configuration

    This section contains information on the configuration of the crawlers.

    Project IDs (Deprecated)

    Add a comma-separated list of the Project IDs where Dataplex is enabled.

    This field is deprecated in the latest user interface and replaced by the Project IDs field on the Synchronize Metadata page. You can add the Project IDs when you synchronize Google Dataplex Catalog.

    The following rules apply when you add Project IDs:
    • If you enter a value in this field and do not add Project IDs on the Synchronize Metadata page, the capability searches the projects listed in this field during synchronization.
    • If you leave this field empty and do not add Project IDs on the Synchronize Metadata page, the capability searches the projects that you entered in the GCP Service Account field in the GCP connection.
    • Do not enter Project IDs both in this field and on the Synchronize Metadata page. If you do, the synchronization ends with an error.
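
    The rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not Collibra code: the function and parameter names are hypothetical, and the case where Project IDs are entered only on the Synchronize Metadata page is an assumption based on the deprecation note rather than one of the listed rules.

```python
from typing import List


def parse_ids(raw: str) -> List[str]:
    """Split a comma-separated list of Project IDs and drop empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def resolve_project_ids(
    capability_field: str,
    sync_page_field: str,
    connection_projects: List[str],
) -> List[str]:
    """Return the projects that a synchronization would search.

    Hypothetical helper: models the documented rules, nothing more.
    """
    capability_ids = parse_ids(capability_field)
    sync_ids = parse_ids(sync_page_field)

    # Rule 3: values in both places make the synchronization fail.
    if capability_ids and sync_ids:
        raise ValueError(
            "Project IDs are set in both the capability and the "
            "Synchronize Metadata page"
        )
    # Rule 1: only the deprecated capability field is filled in.
    if capability_ids:
        return capability_ids
    # Assumption: only the Synchronize Metadata page is filled in.
    if sync_ids:
        return sync_ids
    # Rule 2: nothing entered, so fall back to the GCP connection's projects.
    return list(connection_projects)


print(resolve_project_ids("proj-a, proj-b", "", ["conn-proj"]))
print(resolve_project_ids("", "", ["conn-proj"]))
```

    Modeling the conflicting case as an error rather than choosing one source mirrors the documented behavior, where the synchronization ends with an error instead of silently preferring one list.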

    No

    Save input metadata

    Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only on request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the synchronization.

    This checkbox is not selected by default.

    No

    (Deprecated) Filters and Domain Mapping
    Important 

    This field is deprecated. Define any mappings in the integration configuration.

    No

    Extensible Properties Mapping
    Important 

    This field does not apply if you use Google Dataplex Catalog ingestion. Define any mappings in the integration configuration.

    No

    Advanced Configuration

    These configuration options help when investigating issues with the capability.

    Important Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments at the request of, or together with, Collibra Support.

    No

    Debug

    This field is ignored when you integrate metadata from Google Dataplex Catalog.

    An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    No

    Log level

    This field is ignored when you integrate metadata from Google Dataplex Catalog.

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

  5. Click Create.
    The capability is added to the Edge site.
    The fields become read-only.

More information

ADLS integration

Catalog Data Classification

Catalog JDBC ingestion

JDBC Profiling

Catalog JDBC Sampling

S3 synchronization

GCS synchronization

Databricks Unity Catalog integration

DQ Connector

Technical lineage via Edge

Protect for AWS Lake Formation

Protect for BigQuery

Protect for Databricks

Protect for Snowflake

Azure ML for Azure ML

AWS SageMaker AI for AWS SageMaker AI

AWS Bedrock AI for AWS Bedrock AI

SAP AI Core for SAP AI Core

SAP Datasphere integration

Google Dataplex integration

Google Dataplex Catalog integration