Power BI workspaces

Power BI workspaces represent the most used metadata in Power BI. They contain, for example, reports and data sets. If you want a full ingestion, you have to make sure that the lineage harvester can access all metadata in your Power BI workspaces.

Regardless of your filtering method, meaning v1 or v2:

  • You use a Power BI <source ID>_filter configuration file to configure your filters.
  • If you use technical lineage via Edge, you can copy the JSON code from your <source ID> configuration file into the Source configuration field in your technical lineage Edge capacity.

Filtering v2

As of lineage harvester version 2025.4.1, which is available as a CLI harvester via the Collibra Downloads page or via Edge version 2025.02.70, you can use filtering v2. Filtering v2 has two primary benefits:

  • In addition to filtering on capacities and workspaces, you can filter on dashboards and reports.
  • You can filter on reports that are published in apps in Power BI.

The original Power BI filtering configuration — which we refer to as "filtering v1" — is still supported.
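If you want to check a version number against the minimum versions mentioned above, a comparison could look like the following minimal sketch. It assumes that version numbers are dot-separated integers that can be compared part by part. The function names are hypothetical and are not part of any Collibra tooling.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2025.02.70' into (2025, 2, 70)."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions quoted in the text above.
MIN_CLI_HARVESTER = parse_version("2025.4.1")
MIN_EDGE = parse_version("2025.02.70")


def supports_filtering_v2(version: str, via_edge: bool) -> bool:
    """Assumption: a version qualifies if it is equal to or newer than the minimum."""
    minimum = MIN_EDGE if via_edge else MIN_CLI_HARVESTER
    return parse_version(version) >= minimum


print(supports_filtering_v2("2025.4.1", via_edge=False))   # True
print(supports_filtering_v2("2025.01.12", via_edge=True))  # False
```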

Important considerations

  • Depending on the authentication type, you must have specific roles and permissions to access the metadata in the Power BI workspaces.
  • You can only fully ingest new Power BI workspaces. This means that classic workspaces and My Workspace in Power BI are not supported.
  • To ingest Power BI dataflows:
    • You need access to the Power BI environment in which the dataflow is stored.
    • The data set in the dataflow must exist in a premium workspace.
  • Workspace filtering takes precedence over capacity filtering, meaning workspaces are filtered first.

Filtering Power BI workspaces

By default, the lineage harvester accesses the metadata of all Power BI workspaces. If you don't use filtering, the metadata of all workspaces is uploaded to the Collibra Data Lineage service instance and ingested in Data Catalog. Filtering allows you to process and ingest only the metadata that matters most to you.

Inclusion and exclusion filters

Filtering v2

You can use the following inclusion filters to ingest only the Power BI capacities and workspaces you specify:

  • capacityFilter
    • includedNames
  • workspaceFilter
    • includedNames

You can use the following exclusion filters to ingest all workspaces except for those you specify:

  • capacityFilter
    • excludedNames
  • workspaceFilter
    • excludedNames

Wildcards are supported for all inclusion and exclusion filters, for capacities, workspaces, dashboards, and reports.

You can combine inclusion and exclusion filters in the same <source ID> configuration file.
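The following sketch shows how a filtering v2 fragment that combines inclusion and exclusion filters might be assembled and printed as JSON. Only the property names capacityFilter, workspaceFilter, includedNames and excludedNames come from this page. The nesting, the list values, the example names and the use of * as the wildcard character are assumptions, so check them against the configuration file reference before you use them. Dashboard and report filters are left out because their property names are not listed here.

```python
import json

# Hypothetical filtering v2 fragment. The property names are taken from the
# lists above; the structure and all values are illustrative assumptions.
filter_v2 = {
    "capacityFilter": {
        "includedNames": ["Premium*"],          # assumed "*" wildcard
    },
    "workspaceFilter": {
        "includedNames": ["Sales*", "Finance"],
        "excludedNames": ["*-dev"],             # inclusion and exclusion combined
    },
}

# Print the fragment as JSON, the format of the <source ID> configuration file.
print(json.dumps(filter_v2, indent=2))
```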

Filtering v1

You can use the following inclusion filters to ingest only the Power BI capacities and workspaces you specify:

  • capacityNames
  • capacityIds
  • workspaceNames
  • workspaceIds

You can use the following exclusion filters to ingest all workspaces except for those you specify:

  • excludeWorkspaceNames
  • excludeWorkspaceIds

Wildcards are supported for the capacityNames, workspaceNames and excludeWorkspaceNames properties.

You can combine inclusion and exclusion filters in the same <source ID> configuration file.
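To illustrate how combined filtering v1 name filters with wildcards might select workspaces, consider the following sketch. It is not the harvester's implementation. It assumes that * is the wildcard character, that matching is case-sensitive, and that an exclusion filter wins when a workspace matches both an inclusion and an exclusion filter. The workspace names and the select_workspaces helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical filtering v1 values; only the property names come from this page.
filter_v1 = {
    "workspaceNames": ["Sales*"],
    "excludeWorkspaceNames": ["*-dev"],
}


def matches_any(name, patterns):
    """Return True if the name matches at least one glob-style pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def select_workspaces(names, filters):
    """Keep names that pass the inclusion filter and are not excluded."""
    included = filters.get("workspaceNames", [])
    excluded = filters.get("excludeWorkspaceNames", [])
    selected = []
    for name in names:
        if included and not matches_any(name, included):
            continue  # an inclusion filter exists and the name is not in it
        if matches_any(name, excluded):
            continue  # assumption: exclusion wins over inclusion
        selected.append(name)
    return selected


workspaces = ["Sales EMEA", "Sales EMEA-dev", "Finance", "Sales APAC"]
print(select_workspaces(workspaces, filter_v1))
# ['Sales EMEA', 'Sales APAC']
```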

Where is filtering done?

The filter properties that you use in your Power BI <source ID> configuration file determine whether filtering is done by the lineage harvester or done on the Collibra Data Lineage service instance. The following table highlights the advantages, limitations and configuration considerations.

Filtering Description

Done by the lineage harvester

The lineage harvester accesses only the workspaces specified in your <source ID> configuration file, and sends metadata from only those workspaces to the Collibra Data Lineage service instance for processing and ingestion in Data Catalog.

Advantages

  • Faster integration testing, as you can filter on a single workspace.
  • Enhanced data security and privacy by excluding workspaces that contain sensitive information. Metadata from workspaces that are filtered out by the lineage harvester is not sent to the Collibra Data Lineage service instance for processing.
  • Improved processing times by excluding workspaces dedicated to, for example, development and testing. This is especially beneficial for organizations with more than 50,000 workspaces.

Limitations

  • For this to work as described, you can only use workspace ID properties in your <source ID> configuration file:
    • Filtering v2: You can only use a workspaceFilter section, with the includedIds property.
    • Filtering v1: You can only use the workspaceIds property.
  • You cannot use wildcards.

Done on the Collibra Data Lineage service instance

The lineage harvester accesses all workspaces and filtering is carried out only after knowing the names and IDs of all workspaces and capacities. As a result, the raw metadata is accessed by the lineage harvester, but only the filtered metadata is processed on the Collibra Data Lineage service instance and ingested in Data Catalog.

Advantages

  • Greater choice of filtering options:

    Filtering v2

    You can use any of the following properties:

    • capacityFilter
      • excludedIds
      • includedNames
      • excludedNames
    • workspaceFilter
      • includedIds
      • excludedIds
      • includedNames
      • excludedNames

    You can use wildcards with any of these properties.

    Filtering v1

    You can use any of the following properties:

    • capacityNames
    • capacityIds
    • workspaceNames
    • excludeWorkspaceNames
    • excludeWorkspaceIds

    You can use wildcards with the following properties:

    • capacityNames
    • workspaceNames
    • excludeWorkspaceNames

Limitations

  • Longer processing times, especially if you have many thousands of workspaces.
  • Although you can limit which workspaces are processed and ingested, you can't limit which workspaces are uploaded to the Collibra Data Lineage service instance. The raw metadata from all workspaces is uploaded.
    Tip: You can use the deleteRawMetadataAfterProcessing property in your lineage harvester configuration file to automatically delete the uploaded raw metadata that you don't want to ingest in Data Catalog.
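The rule described in the table can be summarized as follows: filtering is done by the lineage harvester only if the sole filter property in use is the workspace ID inclusion property. The sketch below expresses that rule for a filter fragment represented as a Python dictionary. The function names and the dictionary layout are hypothetical and serve only to illustrate the decision. This is not how the lineage harvester is implemented.

```python
def used_properties(filters: dict) -> set[str]:
    """Return the names of all non-empty filter properties.

    Filtering v2 sections (assumed to be nested objects) are reported as
    'section.property'; filtering v1 properties are reported as-is.
    """
    used = set()
    for key, value in filters.items():
        if isinstance(value, dict):   # v2 section such as workspaceFilter
            used |= {f"{key}.{sub}" for sub, items in value.items() if items}
        elif value:                   # v1 property such as workspaceIds
            used.add(key)
    return used


# The only properties that allow filtering by the lineage harvester.
HARVESTER_SIDE = {"workspaceFilter.includedIds", "workspaceIds"}


def filtering_location(filters: dict) -> str:
    used = used_properties(filters)
    if not used:
        return "no filtering"
    if used <= HARVESTER_SIDE:
        return "lineage harvester"
    return "Collibra Data Lineage service instance"


print(filtering_location({"workspaceFilter": {"includedIds": ["<workspace ID>"]}}))
# lineage harvester
print(filtering_location({"workspaceIds": ["<workspace ID>"],
                          "excludeWorkspaceNames": ["*-dev"]}))
# Collibra Data Lineage service instance
```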

Note: The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My Workspace is the personal workspace for any Power BI customer to work with their own, personal content.

Best practices

Workspace states

On Power BI Workspace asset pages, you can include the attribute type State, to show the state of ingested Power BI workspaces, for example Active, Orphaned or Deleted. To do so, you have to edit the global assignment of the Power BI Workspace asset type and assign the attribute type State.

For complete information on Power BI workspaces and possible states, see the Microsoft Power BI documentation.

Tip: If you only want to see Power BI workspaces that have the state Active:
  1. Ensure that the attribute type State is assigned to the Power BI Workspace asset type via the global assignment.
  2. Go to the Global view, and then create an advanced filter and filter by the following clauses:
    1. Asset type equals Power BI Workspace
    2. Characteristic State equals Active.

Deleted workspaces

If you delete a Power BI workspace, the workspace is maintained for a 90-day grace period, during which a Power BI administrator can restore the workspace. During the grace period, the workspace has the state Deleted. When you ingest Power BI metadata in Data Catalog, this deleted workspace is ingested.

When the grace period elapses, the state of the workspace becomes Removing, for a short time, while it is being permanently removed. The state then becomes Not found. At this point, as the workspace no longer exists in Power BI, the Power BI Workspace asset in Collibra will also be deleted upon the next synchronization.

If a workspace becomes inactive, meaning no reports or dashboards have been viewed in the past 60 days, it is excluded from the ingestion.
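The two time windows mentioned in this section, the 90-day grace period and the 60-day inactivity window, can be expressed as simple date checks. The following is a minimal sketch with hypothetical function names. How the exact boundary days are treated is an assumption, because this page does not specify it.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=90)       # restore window for deleted workspaces
INACTIVITY_WINDOW = timedelta(days=60)  # no views in this window means inactive


def is_restorable(deleted_on: date, today: date) -> bool:
    """True while the workspace is within the grace period (state Deleted).

    Assumption: day 90 itself still counts as within the grace period.
    """
    return today - deleted_on <= GRACE_PERIOD


def is_inactive(last_viewed: date, today: date) -> bool:
    """True if no report or dashboard was viewed in the inactivity window.

    Assumption: a view exactly 60 days ago still counts as activity.
    """
    return today - last_viewed > INACTIVITY_WINDOW


today = date(2025, 6, 30)
print(is_restorable(date(2025, 5, 1), today))  # True: deleted 60 days ago
print(is_inactive(date(2025, 3, 1), today))    # True: last viewed 121 days ago
```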

Why are deleted workspaces ingested?

Let's imagine that you ingest a Power BI workspace with the Active state and that over time, you add comments, tags and characteristics to the asset in Collibra. Now let's imagine that the workspace is deleted in Power BI and we do not ingest the deleted workspace. In this case, the Power BI Workspace asset in Collibra is deleted upon the next synchronization. But what if the Power BI administrator decides, during the 90-day grace period, to restore the workspace in Power BI? Upon the next synchronization, a new Power BI Workspace asset is created in Collibra, but all of the comments, tags and characteristics that were part of the deleted asset are lost.

By ingesting deleted Power BI workspaces, we safeguard against losing any of the additional information on the Power BI Workspace asset, in case a Power BI administrator decides to restore a workspace during the grace period.