Create a technical lineage via Edge

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

This topic provides an overview of the necessary steps to create a technical lineage via Edge.

You can also use the Collibra Catalog Cloud Ingestions API to create or update a technical lineage capability and start or schedule a synchronization to create a technical lineage. For more information about using APIs, go to Collibra Developer Portal.
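As a rough sketch, such an API call could be scripted as follows. The endpoint path and payload field names below are illustrative assumptions, not the documented contract of the Catalog Cloud Ingestions API; consult the Collibra Developer Portal for the actual schema.

```python
import json
import urllib.request

def build_capability_payload(edge_site_id, connection_id):
    """Assemble an illustrative request body; the field names are assumptions,
    not the documented schema of the Cloud Ingestions API."""
    return {
        "edgeSiteId": edge_site_id,
        "connectionId": connection_id,
        "capabilityType": "TECHNICAL_LINEAGE",
    }

def create_lineage_capability(base_url, token, edge_site_id, connection_id):
    """POST the capability definition and return the parsed JSON response."""
    body = json.dumps(build_capability_payload(edge_site_id, connection_id)).encode()
    req = urllib.request.Request(
        f"{base_url}/rest/catalog/1.0/ingestions",  # hypothetical endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```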

To view the steps to create a technical lineage for your data source, select the data source and connection type, if applicable. For a list of supported data sources and their corresponding connection types, go to Supported data sources for technical lineage.

Available vaults

Tip 

You can use a vault to add your data source information to your Edge site connection.

None
AWS Secrets Manager
Azure Key Vault
CyberArk Vault
Google Secret Manager
HashiCorp Vault
 
Important Collibra Data Lineage support for Databricks Unity Catalog leverages the system tables feature in Databricks Unity Catalog. The system tables feature is in Public Preview. For details, go to Databricks Previews support & details in Databricks documentation.

Before you begin

  • This feature is available only in the latest UI.
  • Use Collibra Data Intelligence Platform 2023.03 or newer. Some data sources require a newer version (2023.08, 2024.02, or 2024.07 or newer); check the requirements for your data source.
  • Create and install an Edge site in Collibra Data Intelligence Platform. Ensure that you use Edge 2024.02 or newer.
  • Integrate Google Dataplex or register Google BigQuery databases by using the BigQuery JDBC connector. For details, go to Ways to work with Google Cloud Platform (GCP).
  • Register the data source via Edge. Before you register the data source, ensure that you add the Catalog JDBC ingestion capability, so that Collibra Data Lineage can stitch the data objects in your technical lineage to the assets in Data Catalog.
  • Integrate Databricks Unity Catalog or register a Databricks file system.
  • Review the Supported transformation details topic to understand the lineage information Collibra Data Lineage ingests from Databricks Unity Catalog.

Requirements and permissions

The following requirements and permissions are needed for the technical lineage process. Additional Edge-related roles and resources are mentioned in each of the specific steps.

  • A global role with the following global permissions:
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permissions on the community level in which you created the domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • As a technical lineage user, ensure that your Catalog Author global role has the following global permissions. With these permissions, Collibra Data Lineage can process the lineage and synchronize the results to Data Catalog to create technical lineage.
    • Catalog > Advanced Data Type > Add
    • Catalog > Advanced Data Type > Remove
    • Catalog > Advanced Data Type > Update
    • Catalog > Technical lineage
  • As a Data Catalog user, ensure that your Edge integration engineer global role has the following global permissions. With these permissions, you can create connections and capabilities on Edge, configure the integration, and synchronize the integration.
    • Manage connections and capabilities
    • View Edge connections and capabilities
  • As a Databricks Unity Catalog user, ensure that you have the following permissions in Databricks. The access token of this user must be specified in the Databricks connection so that Collibra Data Lineage can access the system tables (Public Preview) after connecting to Databricks Unity Catalog.
    • Enable the lineage system tables.
    • Have the USE CATALOG privilege on the system catalog.
    • Have the USE SCHEMA and SELECT privileges on the system.access schema.
    • For details, go to Enable system tables and Grant access to system tables in Databricks documentation.

      If you do not have the required access, the Could not get column lineage data error occurs when you synchronize the Technical Lineage for Databricks Unity Catalog capability. Contact Databricks support if you encounter issues getting access to the system tables.

  • Necessary permissions to all database objects that technical lineage via Edge accesses.
  • Tip Some data sources require specific permissions. The following requirements apply, depending on your data source:
    You need read access on the SYS schema.
    You need read access on the SYS schema and the View Definition Permission in your SQL Server.
    You need read access on information_schema:
    • bigquery.datasets.get
    • bigquery.tables.get
    • bigquery.tables.list
    • bigquery.jobs.create
    • resourcemanager.projects.get
    • bigquery.routines.get
    • bigquery.routines.list
    • bigquery.readsessions.create
    • bigquery.readsessions.getData
    • GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
    • The role of the user that you specify in the username property in the lineage harvester configuration file must be the owner of the views in PostgreSQL.
    The role of the user must be the owner of the views in PostgreSQL, and the username of the user must be specified in the JDBC connection that you use to access PostgreSQL.
    You need read access on information_schema. Only views that you own are processed.
    Ensure that your service account token has the Read-Only permission.
    Ensure that you have the permission to copy the target/ directory, which is generated by running the dbt compile command, to a Shared Storage connection folder. For more information about the Shared Storage connection folder, go to Step 1 Create a Shared Storage connection.
    • SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
    • Read access to the SYS schema or the tables in the schema.

    You need Monitoring role permissions.

    To create technical lineage from calculated views in an SAP HANA Classic on-premises data source, you also need the following permissions: 

    • SELECT on the following views:
      • _SYS_REPO.ACTIVE_OBJECT
      • _SYS_REPO.ACTIVE_OBJECTCROSSREF
      • SYS.OBJECT_DEPENDENCIES
    • The CATALOG READ system privilege
    A role with the LOGIN option.
    SELECT WITH GRANT OPTION, at Table level.
    CONNECT ON DATABASE
    The following permissions are required, regardless of the ingestion mode: SQL or SQL-API.
    • Ensure that the Snowflake user has the appropriate allowed host list. For details, go to Allowing Hostnames in Snowflake documentation.
    • You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the OBJECT_VIEWER database role on the shared database to the user. The username of the user must be specified in the JDBC connection that you use to access Snowflake.
    You need read access on the DBC database.
    You need read access to the following dictionary views:
    • all_tab_cols
    • all_col_comments
    • all_objects
    • ALL_DB_LINKS
    • all_mviews
    • all_source
    • all_synonyms
    • all_views
    Note By default, the lineage harvester queries the all_source table to retrieve package bodies. However, this requires the EXECUTE privilege. As an alternative, you can direct the harvester to query the dba_source table, which requires the SELECT_CATALOG_ROLE role. To do so:
    • If via Edge: Replace all_source with dba_source in the Other Queries field in your Edge capability.
    • If via the CLI lineage harvester: Replace all_source with dba_source in the file ./sql/oracle/queries.sql, which is included in the ZIP file when you download the lineage harvester.
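    For the CLI lineage harvester route, that substitution can be scripted. A minimal sketch (the ./sql/oracle/queries.sql path is the one from the harvester download; back up the file first if you want to revert):

```python
from pathlib import Path

def switch_to_dba_source(queries_file):
    """Replace all_source with dba_source in the harvester's Oracle queries file."""
    path = Path(queries_file)
    path.write_text(path.read_text().replace("all_source", "dba_source"))

# Example, using the path from the harvester download:
# switch_to_dba_source("./sql/oracle/queries.sql")
```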
    You need read access on definition_schema.
    • Your user role must have privileges to export assets.
    • You must have read permission on all assets that you want to export.
    • You have at least a Matillion Enterprise license.
    • You have generated the Matillion certificate. Ensure that the certificate is signed by a certificate authority. Self-signed certificates are not supported when you create a technical lineage via Edge.
    • You have added the Matillion certificate to a Java truststore. For more information about adding a certificate to a Java truststore, go to Add a Certificate to a Truststore Using Keytool.
    • As a technical lineage user, ensure that your Catalog Author global role has the following global permissions. With these permissions, Collibra Data Lineage can process the lineage and synchronize the results to Data Catalog to create technical lineage.
      • Catalog > Advanced Data Type > Add
      • Catalog > Advanced Data Type > Remove
      • Catalog > Advanced Data Type > Update
      • Catalog > Technical lineage
    • As a Data Catalog user, ensure that your Edge integration engineer global role has the following global permissions. With these permissions, you can create connections and capabilities on Edge, configure the integration, and synchronize the integration.
      • Manage connections and capabilities
      • View Edge connections and capabilities
    • As a Google Dataplex user, ensure that you have the following access. Use the service account of this user when you create a GCP connection so that Collibra Data Lineage can harvest lineage from Dataplex.
      • Enable the Data Lineage API in Dataplex for the projects that you want to harvest lineage from. For more information, go to Data Lineage API in Google Cloud documentation.
      • The Data Lineage Viewer role.
      • The BigQuery Admin role if you want Collibra Data Lineage to collect lineage not only from stored procedures that you created but also from those that other Dataplex users created.
      • The bigquery.jobs.get permission.
        For more information, go to IAM basic and predefined roles reference in the Google Cloud documentation.
      • When you synchronize technical lineage for Google Dataplex, you can add Project IDs that you want to harvest lineage from. If you want to have Project IDs available for selection when you add Project IDs, ensure that the service account has the resourcemanager.projects.get permission to GCP Projects where Dataplex is enabled. If the service account does not have this permission, you can enter the Project IDs manually on the Synchronization configuration page.
    • You need the following Admin API permissions:
      1. The first call we make to MicroStrategy is to authenticate. We connect to:
        <MSTR URL>:<Port>/MicroStrategyLibrary/api-docs/ and use GET api/auth/login.
        For complete information, see the MicroStrategy documentation.
        If this API call can be made successfully, you can ingest the metadata.
      2. The same connection:
        <MSTR URL>:<Port>/MicroStrategyLibrary/api-docs/, but with GET api/model/tables/<tableId>.
        For complete information, see the MicroStrategy documentation.
        This endpoint is needed to create lineage and stitching.
    • You need permissions to access the library server.
    • The lineage harvester uses port 443. If the port is not open, you also need permissions to access the repository.
    • You have to configure the MicroStrategy Modeling Service. For complete information, see the MicroStrategy documentation.
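    As a quick sanity check of the two MicroStrategy endpoints described above, you can build the URLs from your library server details. The base path /MicroStrategyLibrary/api and the values in the example are assumptions for illustration:

```python
def mstr_endpoints(mstr_url, port, table_id):
    """Build the two REST endpoints used for authentication and table metadata.

    Assumes the REST base path is /MicroStrategyLibrary/api; table_id is a
    placeholder for the ID of a table you want lineage for.
    """
    root = f"{mstr_url}:{port}/MicroStrategyLibrary/api"
    return {
        "login": f"{root}/auth/login",               # authentication check
        "table": f"{root}/model/tables/{table_id}",  # lineage and stitching metadata
    }

# Example (placeholder host, port, and table ID):
# mstr_endpoints("https://mstr.example.com", 443, "abc123")
```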
    Important  Before you start the Power BI integration process, you have to perform a number of tasks in Power BI and Microsoft Azure. These tasks, which are performed outside of Collibra, are needed to enable the lineage harvester to reach your Power BI application and collect its metadata. For complete information, go to Set up Power BI.

    Collibra Data Lineage supports:

    • Power BI on the Microsoft Power Platform.
    • Power BI on Fabric.
    The configuration requirements and the integration are the same, regardless of your setup.

    Important 
    Before you start the SAP Analytics Cloud integration process, you have to perform a number of tasks in SAP. For complete information, go to Set up SAP Analytics Cloud.
    Important 
    Before you start the Tableau integration process, you have to perform a number of tasks in Tableau. For complete information, go to the following topics:
    Important Before you start the Looker integration process, you need to set up Looker.
    Warning 

    Collibra Data Lineage uses the API 4.0 endpoints GET /queries/<query_id> and GET /running_queries. Due to a security update by Looker, the behavior of these endpoints has changed. Therefore, you must now:

    • Select the "Disallow Numeric Query IDs" option in Looker.
    • Ensure that your Looker user has the Admin role. The Admin role has the Administer permission, which is not available in the custom permission set.

    For complete information, see the Looker Query ID API Patch Notice.

    Important 

    You need the following roles, with user access to the server from which you want to ingest:

    • A system-level role that is at least a System user role.
    • An item-level role that is at least a Content Manager role.

    We recommend that you use SQL Server 2019 Reporting Services or newer. We can't guarantee that older versions will work.
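    Returning to the Databricks Unity Catalog requirements above: the system-table privileges can be granted with standard Unity Catalog GRANT statements. A minimal sketch that builds them (the principal is a placeholder account or group; run the statements in a Databricks SQL session, for example with the databricks-sql-connector package):

```python
def system_table_grants(principal):
    """Build the Unity Catalog GRANT statements for the privileges listed in the
    requirements: USE CATALOG on system, and USE SCHEMA plus SELECT on system.access."""
    return [
        f"GRANT USE CATALOG ON CATALOG system TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO `{principal}`",
        f"GRANT SELECT ON SCHEMA system.access TO `{principal}`",
    ]

# Example (placeholder principal; cursor would come from a Databricks SQL connection):
# for stmt in system_table_grants("lineage-harvester@example.com"):
#     cursor.execute(stmt)
```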

Steps

  1. Set up Tableau.
  2. Set up Power BI.
  3. Set up Looker.
  4. Set up SAP Analytics Cloud.
  5. Set up SSRS-PBRS.
  6. Set up MicroStrategy.
  7. Which custom lineage definition option are you using?

What's next?

Important 
  • Collibra Data Lineage visualizes lineage for Google Dataplex down to the table level. To view the technical lineage for Google Dataplex, ensure that you select Objects in the toolbar of your technical lineage graph.
  • Currently, stitching is not supported for table-level lineage. This support will be added in a future release with the addition of column-level lineage support. When this support is available, stitching will work, regardless of whether you integrated Google Dataplex or registered Google BigQuery databases by using the BigQuery JDBC connector.
The following example shows a technical lineage for Google Dataplex.

View the technical lineage.

Important 
  • Databricks Unity Catalog does not provide source code for each transformation. Therefore, no source code is shown in the source code pane in the technical lineage graph.
  • Collibra Data Lineage ingests lineage for Databases, Schemas, Tables, and Columns, but does not ingest any other assets such as Notebooks or Workflows.

For more information, go to Supported transformation details.