Prepare the Power BI configuration file (deprecated)

You create a configuration file for the Power BI metadata that you want to ingest. This configuration file is used by the Power BI harvester to retrieve metadata from Power BI and send it to Collibra to be scanned, processed and analyzed.

Tip The content in this topic differs according to the authentication method.

Prerequisites

  • You have access to the Power BI harvester on the Downloads page.
  • You have completed all prerequisite tasks.
    • You have registered Power BI in Microsoft Azure.
    • You have a user with Power BI administrator rights in Microsoft Azure.
    • The user with Power BI administrator rights in Microsoft Azure is part of a security group and has the Contributor role in the Power BI workspaces.
    • You have enabled the service principal option in the Power BI Admin portal.
    • The service principal is part of a security group and has the Contributor role in the Power BI workspaces.
  • You have a dedicated domain to ingest the Power BI assets.
  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have a global role with the Technical lineage global permission.
  • You have a global role with the Data Stewardship Manager global permission.
  • You have a resource role with the following resource permissions at the level of the community in which you created the BI Data Catalog domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • Your environment meets the system requirements to run the Power BI harvester and the lineage harvester.

Tip For a full ingestion, we highly recommend that you have a Power BI Premium subscription.

Steps

  1. In the Power BI harvester folder, open the empty configuration file.
  2. Enter the values for each property.
    Property

    Description

    Mandatory

    powerbi

    This section contains information that is necessary to connect to your Power BI application.

    Yes

    tenantDomain

    The Power BI tenant domain is the domain associated with the Microsoft Azure tenant.

    This domain is either a default domain or a custom domain. For example, collibrapowerbi.onmicrosoft.com.

    Note Usually, you can find a list of Power BI tenant or server domains in your Azure Active Directory or in the top right menu.

    Yes

    applicationId
    The Application (client) ID of your application in Microsoft Azure.

    Yes

    userName

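    If you use the username / password authentication method, this is the username that you use to access Power BI. The username is an email address.

    If you use the service principal authentication method, you do not need this property. Leave it empty.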

    Tip If you cannot store your username in the configuration file for security or other reasons, delete this field and provide the username via command line or when prompted by the Power BI harvester.

    No

    password

    The password or client secret key that you use to access Power BI. With the username / password authentication method, this is the password that you use when you sign in to your Power BI application. With the service principal authentication method, this is the Power BI application client secret key.

    If the password is an empty string, leave this field empty.

    Tip If you cannot store your password in the configuration file for security or other reasons, delete this field and provide the password via command line or when prompted by the Power BI harvester.

    No

    workspaceFilter

    An option to exclude specific Power BI workspaces from the ingestion process. You can add multiple workspaces, for example "workspace1, workspace2, workspace3".

    If the workspaceFilter field remains empty or is deleted from the configuration file, all accessible Power BI workspaces are processed and ingested.

    Tip For more information about the query options to filter Power BI workspaces, see the Microsoft documentation. Be aware that the "IN" operator is currently not supported.

    Important If you use a Power BI harvester version older than 1.1.0.0, the workspaceFilter property is named groupFilter. This change is backward compatible. However, if you download a new Power BI harvester, we highly recommend that you update your configuration file.

    No

    techlin

    This section contains information to identify your Power BI metadata on the Collibra Data Lineage server.

    Yes

    sourceId

    The unique ID of your Power BI metadata.

    The lineage harvester uses this ID to locate the Power BI metadata on the Collibra Data Lineage server.

    Tip This value can be anything as long as it is a unique, human-readable ID and matches the value of the Id property in the lineage harvester configuration file. The Power BI and lineage harvesters use the ID to identify a batch of data on the Collibra Data Lineage server.

    Yes

    catalog

    This section contains information that is necessary to connect to Data Catalog.

    Yes

    domainId

    The unique resource ID of the domain in Collibra Data Intelligence Cloud in which you want to ingest the Power BI assets.

    Tip You can find the domain ID by clicking the domain type. Then look in the URL of your browser to find the ID. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

    Yes

    url

    The URL of your Collibra Data Intelligence Cloud instance.

    Note You can only enter the public URL of your Collibra Data Intelligence Cloud environment. Other URLs will not be accepted.

    Yes

    userName

    The username that you use to sign in to Collibra.

    Tip If you cannot store your username in the configuration file for security or other reasons, delete this field and provide the username via command line or when prompted by the Power BI harvester.

    No

    password

    The password that you use to sign in to Collibra.

    Tip If you cannot store your password in the configuration file for security or other reasons, delete this field and provide the password via command line or when prompted by the Power BI harvester.

    No

    useCollibraSystemName

    Indicates whether you want to use the system or server name of a data source to match to the System asset in Data Catalog during automatic stitching. This is useful when you have multiple databases with the same name.

    By default, the useCollibraSystemName property is set to false. If you want to use it, set it to true.

    • If you set the useCollibraSystemName property to true, the Power BI harvester reads the <source-ID> configuration file and takes the value in the collibraSystemName property into account.
    • If you set the useCollibraSystemName property to false, the Power BI harvester ignores the collibraSystemName property in the <source-ID> configuration file.

    Warning Unless you have multiple databases with the same name, we highly recommend that you keep the default value.

    Yes

  3. Save the configuration file.
  4. Trigger the Power BI harvester to upload the Power BI metadata:
    • If your configuration file is in its default location, run the following command: .\powerbi-harvester.bat
    • If you moved the configuration file to a different location, run the harvester with the path to the configuration file: .\bin\powerbi-harvester.exe .\config\powerbi-harvester.conf

    Note We highly recommend that you run the Power BI harvester via command line. This enables you to follow the metadata upload and see any errors that occur.

  5. If the Power BI harvester prompts for credentials, enter them or use command line options to provide them.
    Note Credentials provided via command line override the credentials in the configuration file.
  6. The Power BI harvester collects the Power BI metadata and sends it to the Collibra Data Lineage server. Collibra scans and analyzes the metadata.
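
The domainId tip in the property table describes reading the domain ID from a URL of the form https://<yourcollibrainstance>/domain/<domain ID>?<view>. As a minimal sketch of that lookup, assuming the URL follows exactly that pattern, the ID could be extracted as follows. The function name domain_id_from_url and the example URL are hypothetical and not part of any Collibra tool.

```python
from urllib.parse import urlparse


def domain_id_from_url(url: str) -> str:
    """Return the path segment that follows /domain/ in a domain URL."""
    # Split the path into non-empty segments, ignoring the query string.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "domain" not in segments:
        raise ValueError(f"No /domain/ segment in URL: {url}")
    index = segments.index("domain")
    if index + 1 >= len(segments):
        raise ValueError(f"No domain ID after /domain/ in URL: {url}")
    return segments[index + 1]


# Hypothetical URL: the host, the ID and the view name are made up.
example = "https://example.collibra.com/domain/hypothetical-domain-id?view=assets"
print(domain_id_from_url(example))
```

The returned value is what you would enter in the domainId property of the catalog section.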

Tip If you want to ingest multiple Power BI applications, create a new configuration file using a unique ID and repeat these steps. In the lineage harvester configuration file, you can add multiple Power BI sections that each refer to a different ID.
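
Because each Power BI configuration file needs its own unique ID, it can help to check a set of configuration files for reused sourceId values before running the harvester. The following is a minimal sketch, assuming each configuration has already been parsed into a Python dictionary. The function name duplicate_source_ids and the sample IDs are hypothetical.

```python
from collections import Counter


def duplicate_source_ids(configs: list) -> list:
    """Return the sourceId values that occur in more than one configuration."""
    counts = Counter(
        config.get("techlin", {}).get("sourceId") for config in configs
    )
    # Configurations without a sourceId are counted under None and skipped here.
    return sorted(sid for sid, n in counts.items() if sid is not None and n > 1)


# Hypothetical configurations, reduced to the techlin section.
configs = [
    {"techlin": {"sourceId": "powerbi-sales"}},
    {"techlin": {"sourceId": "powerbi-finance"}},
    {"techlin": {"sourceId": "powerbi-sales"}},
]
print(duplicate_source_ids(configs))
```

An empty result means that every configuration in the list uses a different sourceId.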

Note If you are not able to run the Power BI harvester, go to the troubleshooting section to resolve your issues.

Example

This example shows a configuration file with the username / password authentication method.

{
 "powerbi": {
  "tenantDomain": "<organization.onmicrosoft.com>",
  "applicationId": "<microsoft-azure-id>",
  "userName": "<your-power-bi-email-address>",
  "password": "<password-to-access-power-bi>",
  "workspaceFilter": "workspace-name1, workspace-name2"
 },
 "techlin": {
  "sourceId": "<unique-power-bi-ID>"
 },
 "catalog": {
  "domainId": "<your-catalog-domain>",
  "url": "<url-to-collibra>",
  "userName": "<my-collibra-username>",
  "password": "<my-collibra-password>"
 },
 "useCollibraSystemName": false
}
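
Before you run the harvester, a configuration file like this one can be checked against the properties that the table in the steps marks as mandatory. The following is a minimal sketch, assuming the file content is plain JSON. The function name missing_properties and the sample values are hypothetical, and the check only tests for presence, not for correct values.

```python
import json

# Mandatory properties according to the property table: section -> keys.
MANDATORY = {
    "powerbi": ["tenantDomain", "applicationId"],
    "techlin": ["sourceId"],
    "catalog": ["domainId", "url"],
}


def missing_properties(config: dict) -> list:
    """Return the mandatory sections and properties that are absent or empty."""
    missing = []
    for section, keys in MANDATORY.items():
        block = config.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        for key in keys:
            if not block.get(key):
                missing.append(f"{section}.{key}")
    # useCollibraSystemName is a top-level property and may legitimately be false.
    if "useCollibraSystemName" not in config:
        missing.append("useCollibraSystemName")
    return missing


# Hypothetical configuration text with an empty applicationId.
config_text = """
{
 "powerbi": {"tenantDomain": "example.onmicrosoft.com", "applicationId": ""},
 "techlin": {"sourceId": "powerbi-sales"},
 "catalog": {"domainId": "hypothetical-domain-id", "url": "https://example.collibra.com"},
 "useCollibraSystemName": false
}
"""
print(missing_properties(json.loads(config_text)))
```

For this sample, the check reports powerbi.applicationId as missing. Because json.loads raises an error on malformed JSON, the same step also catches syntax mistakes in the file.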

This example shows a configuration file with the service principal authentication method.

{
 "powerbi": {
  "tenantDomain": "<organization.onmicrosoft.com>",
  "applicationId": "<microsoft-azure-id>",
  "userName": "",
  "password": "<secret-key>",
  "workspaceFilter": "<filter-workspace-name>"
 },
 "techlin": {
  "sourceId": "<unique-power-bi-ID>"
 },
 "catalog": {
  "domainId": "<your-catalog-domain>",
  "url": "<url-to-collibra>",
  "userName": "<my-collibra-username>",
  "password": "<my-collibra-password>"
 },
 "useCollibraSystemName": false
}
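
The tips in the property table suggest deleting the userName and password fields when credentials must not be stored in the configuration file. The following is a minimal sketch of that cleanup, assuming the configuration has already been loaded into a Python dictionary. The function name without_credentials is hypothetical and not part of the Power BI harvester.

```python
import copy

CREDENTIAL_KEYS = ("userName", "password")


def without_credentials(config: dict) -> dict:
    """Return a copy of the configuration without userName and password fields."""
    stripped = copy.deepcopy(config)  # leave the caller's dictionary untouched
    for section in ("powerbi", "catalog"):
        block = stripped.get(section)
        if isinstance(block, dict):
            for key in CREDENTIAL_KEYS:
                block.pop(key, None)
    return stripped


# Hypothetical configuration with placeholder values.
config = {
    "powerbi": {
        "tenantDomain": "example.onmicrosoft.com",
        "applicationId": "hypothetical-application-id",
        "userName": "",
        "password": "hypothetical-secret",
    },
    "techlin": {"sourceId": "powerbi-sales"},
    "catalog": {
        "domainId": "hypothetical-domain-id",
        "url": "https://example.collibra.com",
        "userName": "hypothetical-user",
        "password": "hypothetical-password",
    },
    "useCollibraSystemName": False,
}
print(without_credentials(config))
```

With the fields removed, the credentials have to be provided via command line or when the Power BI harvester prompts for them.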

Warning If you are ingesting a large amount of Power BI data and you use the workspace filter (workspaceFilter), the Power BI harvester might time out, resulting in an Internal Server Error. If you get this error, we highly advise you not to use the workspace filter. See the known issues in Power BI ingestion limitations.

What's next?

You can now download and install the lineage harvester and prepare the lineage harvester configuration file. The lineage harvester triggers Collibra to create new Power BI assets, stitch them and show a technical lineage for them.

To refresh the Power BI metadata in Data Catalog, you can run the Power BI harvester and lineage harvester again or schedule jobs to run them automatically.
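
For scheduled runs, the two Power BI harvester commands from the steps can be assembled programmatically. The following is a minimal sketch, assuming it runs from the Power BI harvester folder on Windows. The function name harvester_command is hypothetical, and the lineage harvester command is not covered because this topic does not specify it.

```python
from typing import List, Optional


def harvester_command(config_path: Optional[str] = None) -> List[str]:
    """Build the Power BI harvester command line as a list of arguments."""
    if config_path is None:
        # Configuration file in its default location.
        return [r".\powerbi-harvester.bat"]
    # Configuration file moved to a different location.
    return [r".\bin\powerbi-harvester.exe", config_path]


print(harvester_command())
print(harvester_command(r".\config\powerbi-harvester.conf"))
```

The resulting list can be handed to a job scheduler of your choice or to a process launcher such as Python's subprocess.run.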