Prepare the lineage harvester configuration file for Power BI

Before you run the lineage harvester, you have to prepare a technical lineage configuration file. The lineage harvester uses this file to retrieve metadata from Power BI and send it to the Collibra Data Lineage service, where it is scanned, processed, and analyzed.


Before you begin

Requirements and permissions

  • A global role with the following global permissions:
    • Catalog, for example Catalog Author
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permissions on the community level in which you created the BI Data Catalog domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add

Steps

  1. Open the lineage-harvester.conf file that was created when you installed the lineage harvester, and enter the values for each property.
    Properties and descriptions
    general

    This section describes the necessary connection information.

    techlin

    This section contains information that is necessary to connect to the Collibra Data Lineage service instance.

    Warning This applies only to US government customers.

    url

    The URL of the Collibra Data Lineage service instance.

    Example "url": "https://techlin-gov.collibra.com"

    Warning This applies only to US government customers.

    userKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This applies only to US government customers.

    catalog

    This section contains information that is necessary to connect to Data Catalog.

    url

    The URL of your Collibra environment.

    Note You can only enter the public URL of your Collibra DGC environment. Other URLs are not accepted.

    username

    The username that you use to sign in to Collibra.

    useCollibraSystemName

    Indicates whether or not you want to use the system or server name of a data source to match to the System asset in Data Catalog during automatic stitching. This is useful when you have multiple databases with the same name.

    By default, the useCollibraSystemName property is set to false. If you want to use it, set it to true.

    Important 
    • If you set this property to true, the lineage harvester reads the value of the collibraSystemName property in your Power BI <source ID> configuration file.
    • If you set the useCollibraSystemName property to false, the lineage harvester ignores the collibraSystemName property in the Power BI <source ID> configuration file.
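
    As a rough illustration of how the properties described so far might fit together, the following sketch shows a general section. The nesting of techlin and catalog inside general, and the placement of useCollibraSystemName, are inferred from the order of the descriptions above and are assumptions. All values in angle brackets are placeholders. The techlin block is shown only for its shape; per the warnings above, it applies only to US government customers.

```json
{
  "general": {
    "techlin": {
      "url": "https://techlin-gov.collibra.com",
      "userKey": "<your-user-key>"
    },
    "catalog": {
      "url": "<public-URL-of-your-Collibra-environment>",
      "username": "<your-Collibra-username>"
    },
    "useCollibraSystemName": false
  }
}
```

    Setting useCollibraSystemName to true in this sketch would make the lineage harvester read the collibraSystemName property from the Power BI <source ID> configuration file.
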
    sources

    This section describes the data sources for which you want to create the technical lineage. You have to create a configuration section for each data source.

    Note You can add multiple data sources to the same configuration file.

    scope

    Optional property that is intended only for customers whose Power BI service requires a different scope, such as tenants in the China national cloud.

    Example "scope": "https://analysis.chinacloudapi.cn/powerbi/api/.default"

    Important If you are a US government or national cloud Power BI customer, you must include and specify values for both this property and the apiUrl property. For complete information, consult Microsoft's documentation on Power BI for US government customers.

    apiUrl

    The API URL of your Power BI service.

    The default value is https://api.powerbi.com.

    Important This property is only relevant for US government or national cloud Power BI customers, in which case you must include and specify values for both this property and the scope property. For complete information, consult Microsoft's documentation on Power BI for US government customers.
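
    For a US government or national cloud tenant, the two properties would appear together in the same source entry. The sketch below reuses the scope value from the example above. The apiUrl value is a placeholder, because the correct endpoint depends on your cloud and should be taken from Microsoft's documentation. Representing sources as a list of entries is an assumption, and all other properties of the entry are omitted.

```json
{
  "sources": [
    {
      "scope": "https://analysis.chinacloudapi.cn/powerbi/api/.default",
      "apiUrl": "<API-URL-of-your-national-cloud-Power-BI-service>"
    }
  ]
}
```
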

    type
    The kind of data source. In this case, the value has to be PowerBI.
    id

    The unique ID to identify the Power BI service metadata that was uploaded to the Collibra Data Lineage service.

    Warning In the sources section of your lineage harvester configuration file, you can only specify one id property per Power BI service. If you have multiple id properties for a single Power BI service, ingestion will fail. If you have multiple id properties in the configuration file, it means you intend to ingest from multiple unique Power BI services.
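
    To make the rule in the warning concrete, the following sketch shows a sources section that ingests from two distinct Power BI services, each with exactly one id. It assumes that sources is a list with one entry per data source. The ID values are placeholders, and the remaining properties of each entry are omitted for brevity.

```json
{
  "sources": [
    {
      "type": "PowerBI",
      "id": "<source-ID-of-first-Power-BI-service>"
    },
    {
      "type": "PowerBI",
      "id": "<source-ID-of-second-Power-BI-service>"
    }
  ]
}
```
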

    tenantDomain

    The Power BI tenant domain is the domain associated with the Microsoft Azure tenant.

    This domain is either a default domain or a custom domain. You can specify this property as either the domain name, such as collibrapowerbi.onmicrosoft.com, or the tenant ID, such as e**b****-****-****-****-1b**d****4663.

    Note Usually, you can find a list of Power BI tenant or server domains in your Azure Active Directory or in the top right menu.

    loginFlow

    This section describes the authentication information for accessing your Power BI metadata.

    The lineage harvester supports two authentication methods: service principal, and username and password. For complete information on your authentication options, see Authentication.

    type

    This depends on the authentication method you use.

    • If you use service principal: The value should be ServicePrincipal.
    • If you use username and password: The value should be ResourceOwnerPasswordCredentials.
    applicationId
    The Application (client) ID of your Microsoft Azure application.
    username

    The email address of your Azure Active Directory user.

    Tip This property only applies if you are using the username and password authentication method.
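
    The two authentication methods could lead to loginFlow sections along the following lines. These are sketches with placeholder values, not copies of a reference configuration. Secrets such as a client secret or password are not among the properties described here, so they are left out.

    With a service principal:

```json
{
  "loginFlow": {
    "type": "ServicePrincipal",
    "applicationId": "<application-client-ID>"
  }
}
```

    With username and password:

```json
{
  "loginFlow": {
    "type": "ResourceOwnerPasswordCredentials",
    "applicationId": "<application-client-ID>",
    "username": "<email-address-of-your-Azure-AD-user>"
  }
}
```
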

    domainId
    The reference ID of the domain in Collibra in which you want to ingest Power BI metadata.
    useHttp1
    Optional property that makes the lineage harvester use HTTP1 streams instead of the default HTTP2 streams. Use it if file-size limitations cause timeout errors with HTTP2 streams.
    deleteRawMetadataAfterProcessing

    The lineage harvester harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance, for processing.

    You can use this optional property to specify whether or not the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    The default value is false.

    If the property is set to true, the raw source metadata is deleted after processing. If set to false, it is stored in the Collibra infrastructure.

    Note Setting this property to true can negatively impact performance.

  2. Save the configuration file.
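
A completed file for a single Power BI service with service principal authentication might look roughly like the sketch below. The overall nesting is inferred from the property descriptions in step 1 and is an assumption, as are the boolean type of useHttp1 and its value of false. Every value in angle brackets is a placeholder that you replace with your own. The techlin section and the scope and apiUrl properties are omitted because they apply only to US government or national cloud customers.

```json
{
  "general": {
    "catalog": {
      "url": "<public-URL-of-your-Collibra-environment>",
      "username": "<your-Collibra-username>"
    },
    "useCollibraSystemName": false
  },
  "sources": [
    {
      "type": "PowerBI",
      "id": "<Power-BI-source-ID>",
      "tenantDomain": "<Power-BI-tenant-domain-or-tenant-ID>",
      "loginFlow": {
        "type": "ServicePrincipal",
        "applicationId": "<application-client-ID>"
      },
      "domainId": "<BI-Data-Catalog-domain-ID>",
      "useHttp1": false,
      "deleteRawMetadataAfterProcessing": false
    }
  ]
}
```

The id value in this file is the source ID referred to by the Power BI <source ID> configuration file.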

What's next?

Prepare the Power BI <source ID> configuration file.