Power BI workspace filtering
As a BI Admin, use workspace filtering to manage which Power BI metadata Collibra ingests. This process allows you to include or exclude specific reports and semantic models, which ensures your Data Catalog remains focused on high-value assets.
Regardless of your filtering method (v1 or v2), where you enter the filter configuration depends on how you harvest metadata:
- If you use technical lineage via Edge, enter the filter configuration, as JSON code, in the Source configuration field in your technical lineage Edge capacity. If you previously used the CLI lineage harvester and have a prepared <source ID> configuration file, you can copy the JSON and paste it in the Source configuration field.
- If you use the CLI lineage harvester (deprecated), use a Power BI <source ID>_filter configuration file to configure your filters.
Filtering v2
Filtering v2 is available as of lineage harvester version 2025.4.1 and Edge version 2025.02.70. The CLI lineage harvester is deprecated, but is still available via the Collibra Downloads page. Filtering v2 has two primary benefits:
- You can filter on reports that are published in apps in Power BI.
- In addition to filtering on capacities and workspaces, you can filter on dashboards and reports.
The original Power BI filtering configuration — which we refer to as "filtering v1" — is still supported.
For complete information on how to configure filtering v2 (and filtering v1), go to Prepare a <source ID> configuration file.
Important considerations
- Depending on the authentication type, you must have specific roles and permissions to access the metadata in the Power BI workspaces.
- You can only fully ingest new Power BI workspaces. This means that classic workspaces and My Workspace in Power BI are not supported.
- To ingest Power BI dataflows:
  - You need access to the Power BI environment in which the dataflow is stored.
  - The semantic model in the dataflow must exist in a premium workspace.
- Workspace filtering takes precedence over capacity filtering, meaning workspaces are filtered first. If there is no explicit exclusion of capacities containing workspaces, all capacities containing workspaces are ingested. Filtering of reports and dashboards is subordinate to workspace filtering: to include reports and dashboards from a certain workspace, that workspace has to be ingested as well. Reports and dashboards from a single workspace cannot be ingested in different domains.
- When processing workspace filters, processing stops after the first matching filter. To illustrate, in the following example configuration, the first filter dictates that all workspaces (courtesy of the wildcard *) are ingested in DOMAIN_A. Therefore, the second filter, which dictates that my_workspace is ingested in DOMAIN_B, is not considered.
[
{
"domainId": "DOMAIN_A",
"workspaceNames": [
"*"
]
},
{
"domainId": "DOMAIN_B",
"workspaceNames": [
"my_workspace"
]
}
]
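The first-match rule can be sketched in a few lines of Python. This is an illustrative model only, not Collibra's implementation: the function name `resolve_domain` is hypothetical, and the sketch assumes that the wildcard behaves like shell-style globbing (Python's `fnmatch`), which this page does not specify.

```python
from fnmatch import fnmatchcase

# Same two filters as in the example configuration above.
FILTERS = [
    {"domainId": "DOMAIN_A", "workspaceNames": ["*"]},
    {"domainId": "DOMAIN_B", "workspaceNames": ["my_workspace"]},
]


def resolve_domain(workspace_name, filters):
    """Return the domainId of the first filter that matches, or None."""
    for flt in filters:
        for pattern in flt["workspaceNames"]:
            # Assumption: "*" behaves like a shell-style wildcard.
            if fnmatchcase(workspace_name, pattern):
                return flt["domainId"]  # stop at the first matching filter
    return None


# The wildcard filter is listed first, so it wins for every workspace.
print(resolve_domain("my_workspace", FILTERS))  # DOMAIN_A

# Listing the specific filter first lets it take effect.
print(resolve_domain("my_workspace", list(reversed(FILTERS))))  # DOMAIN_B
```

In this model, placing the most specific filters before any wildcard filter is what allows them to be considered.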
Filtering Power BI workspaces
By default, the lineage harvester accesses the metadata of all Power BI workspaces. If you don't use filtering, the metadata of all workspaces is uploaded to the Collibra Data Lineage service instance and ingested in Data Catalog. Filtering allows you to process and ingest only the metadata that matters most to you.
Inclusion and exclusion filters
| Filtering v2 | Filtering v1 |
|---|---|
| Inclusion filters ingest only the Power BI capacities, workspaces, reports, and dashboards you specify. | Inclusion filters ingest only the Power BI capacities and workspaces you specify. |
| Exclusion filters ingest all capacities, workspaces, reports, and dashboards except for those you specify. | Exclusion filters ingest all workspaces except for those you specify. |
| Wildcards are supported for all inclusion and exclusion filters, for capacities, workspaces, dashboards, and reports. | Wildcards are supported for the … |
| You can combine inclusion and exclusion filters in the same <source ID> configuration file. | You can combine inclusion and exclusion filters in the same <source ID> configuration file. |
| Example: the metadata from all workspaces is uploaded to the Collibra Data Lineage service instance. Then, the metadata in all of the workspaces in CapacityABC, except for Workspace1, is ingested in Data Catalog. Assets are ingested in the domain in Collibra with reference ID … | Example: the metadata from all workspaces is uploaded to the Collibra Data Lineage service instance. Then, the metadata in all of the workspaces in CapacityABC, except for Workspace1, is ingested in Data Catalog. Assets are ingested in the domain in Collibra with reference ID … |
Filter on reports that are included in published Power BI apps
If you add a report or dashboard to an app in Power BI, what actually happens is that a copy of the original report or dashboard is created in the app. The original report or dashboard still exists outside of the app.
When integrating Power BI, by default Collibra Data Lineage:
- Harvests the original report and the in-app version of the report.
- Ingests both of them, meaning it creates 2 Power BI Report assets in Data Catalog:
  - One for the original report, for example: report-abc
  - One for the in-app version. The prefix "[App]" is used to identify the in-app report, for example: [App] report-abc.
There are, however, two filter keywords that you can use alone or in combination, to modify how Collibra Data Lineage handles in-app reports.
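| Keyword | Description |
|---|---|
| createAppReports | Use this keyword to specify that you don't want to ingest the in-app versions of reports. If true, the in-app versions of reports are created and ingested. If false, they are not created or ingested. If you don't use the keyword, the behavior is the same as true. |
| includedInApp | Use this keyword to specify how you want Collibra Data Lineage to address reports that are included in published Power BI apps. If true, only reports that are included in an app are ingested. If false, only reports that are not included in an app are ingested. If you don't use the keyword, reports are ingested regardless of whether they are included in an app. |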
Let's say that you have 8 reports in Power BI:
- 5 reports are not included in an app.
- 3 reports are included in an app, which means there are also 3 in-app versions of these reports.
The following table shows which of these reports are ingested, based on how you use the 2 keywords.
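| | "createAppReports": true (or not used) | "createAppReports": false |
|---|---|---|
| includedInApp is not used | 11 reports are ingested: the 5 reports that are not included in an app, the 3 original reports that are included in an app, and the 3 in-app versions. | 8 reports are ingested: the 5 reports that are not included in an app and the 3 original reports that are included in an app. The 3 in-app versions of these reports are not created or ingested. |
| "includedInApp": true | 6 reports are ingested: the 3 original reports that are included in an app and the 3 in-app versions. | 3 reports are ingested: the 3 original reports that are included in an app. The 3 in-app versions of these reports are not created or ingested. |
| "includedInApp": false | 5 reports are ingested: the 5 reports that are not included in an app. | 5 reports are ingested: the 5 reports that are not included in an app. |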
"filters":[
{
"reportFilter": {
"excludedNames": "*restricted*",
"createAppReports": false,
"includedInApp": true
}
}
]
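The interaction between the two keywords can be modeled with a short Python sketch that reproduces the counts from the 8-report scenario above. The function `count_ingested` and its parameters are hypothetical names used for illustration; the logic is inferred from the table, not taken from Collibra's code.

```python
def count_ingested(not_in_app, in_app, create_app_reports=True, included_in_app=None):
    """Count the report assets ingested for one combination of the two keywords.

    not_in_app: number of reports that are not included in any app
    in_app: number of original reports that are included in an app
    included_in_app: True, False, or None when the keyword is not used
    """
    total = 0
    if included_in_app in (None, False):
        total += not_in_app      # reports outside of apps
    if included_in_app in (None, True):
        total += in_app          # original reports that are in an app
        if create_app_reports:
            total += in_app      # their "[App]" versions
    return total


# 5 reports outside of apps, 3 reports included in an app.
for included in (None, True, False):
    for create in (True, False):
        print(included, create, count_ingested(5, 3, create, included))
```

Under this model, the example configuration above ("createAppReports": false with "includedInApp": true) would leave only the 3 original in-app reports, before the excludedNames filter is applied.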
Where filtering is carried out
The filter properties that you use in your Power BI <source ID> configuration file determine whether filtering is done by the lineage harvester before metadata collection or done on the Collibra Data Lineage service instance. The following table highlights the advantages, limitations and configuration considerations.
| Filtering | Description |
|---|---|
| Done by the lineage harvester | The lineage harvester accesses only the workspaces specified in your <source ID> configuration file, and sends metadata from only those workspaces to the Collibra Data Lineage service instance for processing and ingestion in Data Catalog. To ensure that … This property applies to filtering v2 only. Example: only the specified workspace is harvested, even though … |
| Done on the Collibra Data Lineage service instance | The lineage harvester accesses all workspaces, and filtering is carried out only after the names and IDs of all workspaces and capacities are known. As a result, the raw metadata is accessed by the lineage harvester, but only the filtered metadata is processed on the Collibra Data Lineage service instance and ingested in Data Catalog. |
Note: The metadata of inactive and personal workspaces is not harvested or uploaded to the Collibra Data Lineage service instance. An inactive workspace is one for which no reports or dashboards have been viewed in the past 60 days. My workspace is the personal workspace in which any Power BI customer works with their own, personal content.
Filter validation
Filter configurations are validated against the following scenarios:
- Duplicate keywords.
- Unknown or unsupported keywords.
- Contradicting inclusion and exclusion filters.
- Mixed filter v1 and filter v2 keywords.
- A single workspace is mapped to more than one domain. (In this case, only the first filter is considered.)
If validation fails for any of these scenarios, a warning with failure details is shown in an analyze error on the Technical lineage Sources tab page. Critical errors occur only if the <source ID> configuration file is incorrectly formatted or doesn't contain valid keywords. In such cases, the filter configuration is not processed. If the configured inclusion and exclusion filters contradict each other, only the exclusion filter is taken into consideration.
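Before submitting a configuration, you may want to run a local pre-check for one of these scenarios. The following Python sketch flags workspace names that are listed under more than one domain. It is not Collibra's validator: the function name is hypothetical, it assumes the domainId and workspaceNames layout from the example configuration earlier on this page, and it compares literal names only, without expanding wildcards.

```python
from collections import defaultdict


def workspaces_in_multiple_domains(filters):
    """Return {workspace name: [domainIds]} for names listed under more than one domain."""
    domains_by_name = defaultdict(list)
    for flt in filters:
        for name in flt.get("workspaceNames", []):
            if flt["domainId"] not in domains_by_name[name]:
                domains_by_name[name].append(flt["domainId"])
    return {name: doms for name, doms in domains_by_name.items() if len(doms) > 1}


# Hypothetical configuration in which "finance" is mapped to two domains.
filters = [
    {"domainId": "DOMAIN_A", "workspaceNames": ["sales", "finance"]},
    {"domainId": "DOMAIN_B", "workspaceNames": ["finance"]},
]
conflicts = workspaces_in_multiple_domains(filters)
for name, doms in conflicts.items():
    # Per the validation rule, only the first filter is considered.
    print(f"warning: {name} is mapped to {doms}; only {doms[0]} is used")
```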
Best practices
For complete information on how to configure filtering v2 (and filtering v1), go to Prepare a <source ID> configuration file.
You can filter on a capacity to ingest the metadata from all workspaces in that capacity. Let's say, for example, that you have 50,000 workspaces but you only want to ingest metadata from the workspaces related to a specific department in your organization. You could specify each of the relevant workspaces in the configuration file, but that would be tedious. Furthermore, if someone in your organization creates a new workspace, it will have to be added to your configuration file. Instead, you can filter on a capacity. Then, when a new workspace is created, ensure that it is added to the relevant capacity and metadata from that workspace is automatically ingested, without having to update the configuration file.
Let’s say that you have three workspaces, each dedicated to a different department in your organization: HR, Finance, and Marketing. You want to ingest the metadata from these three workspaces into three different domains and set the permissions in Collibra so that people can access only the domain for the department to which they belong.
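As a rough illustration of the three-department scenario, the following Python sketch generates one filter entry per workspace, each targeting its own domain. It reuses the domainId and workspaceNames layout from the example configuration earlier on this page. This section does not state whether that layout corresponds to filtering v1 or v2 syntax, so treat it as an assumption and check Prepare a <source ID> configuration file for the exact keywords. The domain reference IDs and workspace names are hypothetical placeholders.

```python
import json

# Hypothetical workspace names mapped to hypothetical domain reference IDs.
DEPARTMENT_DOMAINS = {
    "HR": "DOMAIN_HR",
    "Finance": "DOMAIN_FINANCE",
    "Marketing": "DOMAIN_MARKETING",
}


def build_filters(domains_by_workspace):
    """Build one filter entry per workspace, each targeting its own domain."""
    return [
        {"domainId": domain_id, "workspaceNames": [workspace]}
        for workspace, domain_id in domains_by_workspace.items()
    ]


config = build_filters(DEPARTMENT_DOMAINS)
print(json.dumps(config, indent=2))
```

Because each workspace lands in its own domain, permissions can then be set per domain in Collibra, as described above.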
If you use only the workspace IDs inclusion properties in your configuration:
- The lineage harvester (via CLI or Edge) connects only to those workspaces.
- Collibra bypasses the GetModifiedWorkspaces endpoint, which lists all of the workspaces in your tenant.
If the GetModifiedWorkspaces endpoint is bypassed, Collibra is unaware of any other workspaces that exist in your tenant. This makes the harvesting job a lot faster and more secure.
If this method is not preferred in your organization, you can instead use another filter property and include the deleteMetadataAfterProcessing property in your configuration file (or select the option in your Edge capacity) and set the value to true. In this case, filtering is done on the Collibra Data Lineage service instance, as described in the table above:
- All of the raw metadata in the capacity is harvested and sent to the Collibra Data Lineage service instance.
- Only the metadata from the workspaces you want to ingest is ingested in Collibra.
- After processing, all metadata is permanently deleted from the Collibra Data Lineage service instance.
This approach, however, does not enhance performance.
Workspace states
On Power BI Workspace asset pages, you can include the attribute type State, to show the state of ingested Power BI workspaces, for example Active, Orphaned or Deleted. To do so, you have to edit the global assignment of the Power BI Workspace asset type and assign the attribute type State.
For complete information on Power BI workspaces and possible states, see the Microsoft Power BI documentation.
- Ensure that the attribute type State is assigned to the Power BI Workspace asset type via the global assignment.
- Go to the Global View, and then create an advanced filter with the following clauses:
  - Asset type equals Power BI Workspace
  - Characteristic State equals Active
Deleted workspaces
If you delete a Power BI workspace, the workspace is maintained for a 90-day grace period, during which a Power BI administrator can restore the workspace. During the grace period, the workspace has the state Deleted. When you ingest Power BI metadata in Data Catalog, this deleted workspace is ingested.
When the grace period elapses, the state of the workspace becomes Removing, for a short time, while it is being permanently removed. The state then becomes Not found. At this point, as the workspace no longer exists in Power BI, the Power BI Workspace asset in Collibra is also deleted upon the next synchronization.
If a workspace becomes inactive, meaning no reports or dashboards have been viewed in the past 60 days, it is excluded from the ingestion.
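The rules in this section can be summarized as a small decision function. This Python sketch only restates the behavior described above. The function name and return strings are hypothetical, checking inactivity before the state is an assumption, and states whose handling this page does not describe (for example Removing or Orphaned) are deliberately left uncovered.

```python
def ingestion_outcome(state, inactive=False):
    """Return what happens to a workspace at the next synchronization."""
    if inactive:
        # No reports or dashboards viewed in the past 60 days.
        return "excluded from ingestion"
    if state in ("Active", "Deleted"):
        # Deleted workspaces are still ingested during the 90-day grace period.
        return "ingested"
    if state == "Not found":
        # The workspace no longer exists in Power BI.
        return "asset deleted on next synchronization"
    return "not covered by this sketch"


for state in ("Active", "Deleted", "Not found"):
    print(state, "->", ingestion_outcome(state))
```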
Why deleted workspaces are ingested
Let's imagine that you ingest a Power BI workspace with the Active state and that over time, you add comments, tags and characteristics to the asset in Collibra. Now let's imagine that the workspace is deleted in Power BI and we do not ingest the deleted workspace. In this case, the Power BI Workspace asset in Collibra is deleted upon the next synchronization. But what if the Power BI administrator decides, during the 90-day grace period, to restore the workspace in Power BI? Upon the next synchronization, a new Power BI Workspace asset is created in Collibra, but all of the comments, tags and characteristics that were part of the deleted asset are lost.
By ingesting deleted Power BI workspaces, we safeguard against losing any of the additional information on the Power BI Workspace asset, in case a Power BI administrator decides to restore a workspace during the grace period.