Prepare Power BI <source ID> configuration file
The lineage harvester uses the lineage harvester configuration file to collect Power BI data objects and then sends the metadata to the Collibra Data Lineage server. If the useCollibraSystemName property in the lineage harvester configuration file is set to true, you must also provide a <source ID> configuration file that defines the system names of the databases in Power BI.
Collibra Data Lineage uses the system names to match the structure of databases in Power BI to assets in Data Catalog.
- You can also include a filters section in your <source ID> configuration file to specify the Power BI workspaces from which you want to ingest metadata.
- The name "<source ID>" refers to the value of the sourceId property in the lineage harvester configuration file.
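To illustrate how a connection that the harvester finds in Power BI could be matched to an entry in this file, here is a minimal Python sketch. It assumes, without confirmation from this documentation, that each discovered connection is compared as a found_dbname=...;found_hostname=...;found_schema=... string against the keys of the <source ID> configuration file, using glob-style wildcards like those of Python's fnmatch module (which supports the same *, ?, [seq], and [!seq] patterns listed in the steps below). The function names, the case-sensitive first-match rule, and returning None when nothing matches are hypothetical; only the "DEFAULT" fallback comes from the collibraSystemName description.

```python
import fnmatch
from typing import Optional

# Hypothetical sketch, not the lineage harvester's implementation.
# Assumptions: keys are matched case-sensitively with fnmatch-style wildcards
# (*, ?, [seq], [!seq]) and the first matching key wins.


def found_key(dbname: str, hostname: str, schema: str) -> str:
    """Build a connection string in the same form as the configuration file keys."""
    return f"found_dbname={dbname};found_hostname={hostname};found_schema={schema}"


def resolve_system_name(config: dict, dbname: str, hostname: str, schema: str) -> Optional[str]:
    """Return the system name of the first matching entry, or None if no key matches."""
    connection = found_key(dbname, hostname, schema)
    for pattern, entry in config.items():
        if pattern == "filters":
            continue  # the filters section is not a database entry
        if fnmatch.fnmatchcase(connection, pattern):
            # Documented behavior: no collibraSystemName means "DEFAULT".
            return entry.get("collibraSystemName", "DEFAULT")
    return None


example_config = {
    "found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
        "dbname": "mssql-database-name",
        "dialect": "mssql",
        "collibraSystemName": "mssql-system-name",
    },
    "found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
        "dbname": "oracle-database-name",
        "dialect": "oracle",
    },
    "filters": [],
}

print(resolve_system_name(example_config, "databasename1", "any-host.example.com", "schema1"))
print(resolve_system_name(example_config, "databasename2", "server-name.onmicrosoft.com", "schema2"))
print(resolve_system_name(example_config, "databasename3", "host", "schema1"))
```

In this sketch, the first lookup matches through the * wildcard on the host name, the second matches an entry without collibraSystemName and falls back to "DEFAULT", and the third matches no key.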
Steps
- Create a new JSON file in the lineage harvester config folder.
- Give the JSON file the same name as the value of the sourceId property in the lineage harvester configuration file.
  Example: The value of the sourceId property in the lineage harvester configuration file is power-bi-source-1. Therefore, the name of your JSON file should be power-bi-source-1.conf.
  Important: Your JSON file must have the file extension .conf.
- For each database in Power BI, add the following content to the JSON file:
  Property: found_dbname=<database name>;found_hostname=<server name>;found_schema=<schema name>
  Description: The database information that the lineage harvester typically collects for supported data sources in Power BI. It describes the server on which the database runs (found_hostname), the name of the database (found_dbname), and, optionally, the name of the schema (found_schema).
  Tip: You can use wildcards to capture multiple connection string combinations. Supported wildcards:
    - * matches everything.
    - ? matches any single character.
    - [seq] matches any character in "seq".
    - [!seq] matches any character not in "seq".
  Mandatory: Yes

  Property: dbname
  Description: The name of the database of a supported data source in Power BI.
  Mandatory: No

  Property: schema
  Description: The name of the default schema of a supported data source in Power BI. If the lineage harvester fails to find a specific schema, it uses the default schema.
  Mandatory: No

  Property: dialect
  Description: The dialect of the supported data source in Power BI. The supported dialects are:
    - azure, for an Azure SQL Server data source.
    - bigquery, for a Google BigQuery data source.
    - mssql, for a Microsoft SQL Server data source.
    - oracle, for an Oracle data source.
    - redshift, for an Amazon Redshift data source.
    - snowflake, for a Snowflake data source.
    - sybase, for a Sybase data source.
  Mandatory: No

  Property: collibraSystemName
  Description: The system or server name of a database. If you don't specify a value for this property, the value is "DEFAULT".
  Warning: The value of this property must exactly match the name of your System asset in Collibra.
  Important: If you are using a <source ID> configuration file to provide the true system name of an ODBC database in Power BI, you are not required to:
    - Set the useCollibraSystemName property in the lineage harvester configuration file to true.
    - Specify a Collibra system name in the <source ID> configuration file.
  If the useCollibraSystemName property is set to true in the lineage harvester configuration file, you must specify a Collibra system name in the <source ID> configuration file.
  Mandatory: Yes (unless you are using a <source ID> configuration file to provide the true system names of ODBC databases in Power BI)

  Property: filters
  Description: This section allows you to specify the Power BI workspaces from which you want to ingest metadata.
  Warning: If you don't want to specify the Power BI workspaces from which to ingest, you must remove this filters section completely.
  Note: The filters work as "workspace OR workspace OR capacity OR capacity", meaning that if you specify a capacity, all of the workspaces in that capacity are also ingested.
  Tip: You can use wildcards to match multiple workspaces or capacities. Supported wildcards:
    - * matches everything.
    - ? matches any single character.
    - [seq] matches any character in "seq".
    - [!seq] matches any character not in "seq".
  Mandatory: No

  Property: domainId (in each filters entry)
  Description: The unique resource ID of the domain (or domains) in Collibra Data Intelligence Cloud into which you want to ingest the Power BI assets.
  Tip: You can find the domain ID by clicking the domain type and then looking at the URL in your browser. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.
  Mandatory: Yes

  Property: description (in each filters entry)
  Description: Any description of the filter, as you see fit.
  Mandatory: Yes

  Property: workspaceNames (in each filters entry)
  Description: The names of the Power BI workspaces from which you want to ingest metadata.
  Important: Any meta-characters in the name of a workspace must be enclosed in square brackets "[ ]". For example, format a workspace named "Sale and Marketing [automobiles]" as follows: Sale and Marketing [[]automobiles[]]
  Mandatory: No

  Property: workspaceIds (in each filters entry)
  Description: The IDs of the Power BI workspaces from which you want to ingest metadata.
  Mandatory: No

  Property: capacityNames (in each filters entry)
  Description: The names of the capacities on which you want to filter.
  Mandatory: No

  Property: capacityIds (in each filters entry)
  Description: The IDs of the capacities on which you want to filter.
  Warning: Any letters in a capacity ID must be in upper case.
  Mandatory: No

  Example <source ID> configuration file:
    {
      "found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
        "dbname": "mssql-database-name",
        "schema": "mssql-schema-name",
        "dialect": "mssql",
        "collibraSystemName": "mssql-system-name"
      },
      "found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
        "dbname": "oracle-database-name",
        "schema": "oracle-schema-name",
        "dialect": "oracle",
        "collibraSystemName": "oracle-system-name"
      },
      "filters": [
        {
          "domainId": "<domain-ref-id>",
          "description": "FirstFilter",
          "workspaceNames": ["workspace1", "workspace2"],
          "workspaceIds": ["id3", "id4"],
          "capacityNames": ["capacity1", "capacity2"]
        },
        {
          "domainId": "<domain-ref-id>",
          "description": "SecondFilter",
          "workspaceNames": ["workspace3", "workspace4"],
          "capacityIds": ["ID1", "ID2"]
        }
      ]
    }
- Save the <source ID> configuration file.
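If you prefer to generate the <source ID> configuration file instead of writing it by hand, the following minimal Python sketch writes it as JSON to a file named after the sourceId value with the .conf extension, and checks a few of the rules described in the steps: a collibraSystemName for every database entry when useCollibraSystemName is true, the mandatory domainId and description in each filter, and no lower-case letters in capacity IDs. The helper names and the use_collibra_system_name parameter are hypothetical; the sketch does not read the real lineage harvester configuration file, and it ignores the ODBC exception to the collibraSystemName rule.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical helpers that check a few rules stated in this documentation.
# They do not read the real lineage harvester configuration file, and they
# ignore the ODBC exception to the collibraSystemName rule.


def validate_source_config(config: dict, use_collibra_system_name: bool) -> list:
    """Return a list of problems found in a <source ID> configuration (empty if none)."""
    problems = []
    for key, entry in config.items():
        if key == "filters":
            continue  # only database entries carry collibraSystemName
        if use_collibra_system_name and "collibraSystemName" not in entry:
            problems.append(f"{key}: collibraSystemName is required")
    for i, flt in enumerate(config.get("filters", [])):
        for required in ("domainId", "description"):
            if required not in flt:
                problems.append(f"filter {i}: {required} is required")
        for capacity_id in flt.get("capacityIds", []):
            if capacity_id != capacity_id.upper():
                problems.append(f"filter {i}: capacity ID {capacity_id} must be upper case")
    return problems


def write_source_config(config_dir: Path, source_id: str, config: dict) -> Path:
    """Write the configuration as JSON to <config_dir>/<sourceId>.conf."""
    path = Path(config_dir) / f"{source_id}.conf"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


config = {
    "found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
        "dbname": "mssql-database-name",
        "dialect": "mssql",
        "collibraSystemName": "mssql-system-name",
    },
    "filters": [
        {"domainId": "<domain-ref-id>", "description": "FirstFilter", "capacityIds": ["id1"]},
    ],
}

print(validate_source_config(config, use_collibra_system_name=True))
# ['filter 0: capacity ID id1 must be upper case']

with tempfile.TemporaryDirectory() as tmp:
    path = write_source_config(Path(tmp), "power-bi-source-1", config)
    print(path.name)  # power-bi-source-1.conf
```

A temporary folder stands in for the lineage harvester config folder here; in practice you would pass the path of that folder instead.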
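The note on filters says that specifying a capacity also ingests all of the workspaces in that capacity, so within one filter a workspace is selected if it matches any of the listed workspace names, workspace IDs, capacity names, or capacity IDs. The following minimal Python sketch illustrates that reading for a single filter. The Workspace record, the sample workspaces, and the use of fnmatch for wildcards are assumptions for illustration, not the harvester's data model; how the harvester combines several filters, each with its own domainId, is not covered.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Workspace:
    """Hypothetical workspace record; not the harvester's data model."""
    name: str
    workspace_id: str
    capacity_name: str
    capacity_id: str


def any_match(value: str, patterns: list) -> bool:
    """Assumption: filter values use fnmatch-style wildcards (*, ?, [seq], [!seq])."""
    return any(fnmatch.fnmatchcase(value, pattern) for pattern in patterns)


def matches_filter(ws: Workspace, flt: dict) -> bool:
    """True if the workspace matches ANY name or ID listed in one filter."""
    return (
        any_match(ws.name, flt.get("workspaceNames", []))
        or any_match(ws.workspace_id, flt.get("workspaceIds", []))
        or any_match(ws.capacity_name, flt.get("capacityNames", []))
        or any_match(ws.capacity_id, flt.get("capacityIds", []))
    )


workspaces = [
    Workspace("workspace1", "id-a", "capacityX", "CAP-X"),  # matched by workspace name
    Workspace("finance", "id3", "capacityX", "CAP-X"),      # matched by workspace ID
    Workspace("hr", "id-c", "capacity1", "CAP-1"),          # matched by capacity name
    Workspace("marketing", "id-d", "capacityY", "CAP-Y"),   # not matched
]

# The first filter from the example configuration file.
first_filter = {
    "domainId": "<domain-ref-id>",
    "description": "FirstFilter",
    "workspaceNames": ["workspace1", "workspace2"],
    "workspaceIds": ["id3", "id4"],
    "capacityNames": ["capacity1", "capacity2"],
}

selected = [ws.name for ws in workspaces if matches_filter(ws, first_filter)]
print(selected)  # ['workspace1', 'finance', 'hr']
```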
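The domainId tip says the ID appears in the browser URL in the form https://<yourcollibrainstance>/domain/<domain ID>?<view>. A small Python sketch using only the standard library can pull it out of a copied URL. The function name is hypothetical and the URL below is made up to follow that pattern.

```python
from urllib.parse import urlparse


def domain_id_from_url(url: str) -> str:
    """Return the path segment that follows /domain/ in a Collibra domain URL."""
    segments = urlparse(url).path.strip("/").split("/")
    index = segments.index("domain")  # raises ValueError if there is no /domain/ segment
    return segments[index + 1]


# Made-up URL that follows the documented pattern; the query string is ignored.
url = "https://example.collibra.com/domain/01234567-89ab-cdef-0123-456789abcdef?someView"
print(domain_id_from_url(url))  # 01234567-89ab-cdef-0123-456789abcdef
```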
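The square-bracket rule for workspaceNames follows the usual glob convention: a meta-character enclosed in brackets is treated as a literal character. The following Python sketch escapes a workspace name this way and shows why an unescaped name can match the wrong workspace. It assumes fnmatch-compatible matching, and the helper name escape_workspace_name is hypothetical.

```python
import fnmatch

# Characters with a special meaning in the wildcard syntax (*, ?, [seq], [!seq]).
META_CHARACTERS = "*?[]"


def escape_workspace_name(name: str) -> str:
    """Enclose every meta-character in square brackets, e.g. "[" becomes "[[]"."""
    return "".join(f"[{ch}]" if ch in META_CHARACTERS else ch for ch in name)


name = "Sale and Marketing [automobiles]"
escaped = escape_workspace_name(name)
print(escaped)  # Sale and Marketing [[]automobiles[]]

# Assumption: names are matched like fnmatch.fnmatchcase does.
print(fnmatch.fnmatchcase(name, escaped))  # True
# Unescaped, "[automobiles]" is a character set: it misses the real name
# but matches a different workspace such as "Sale and Marketing a".
print(fnmatch.fnmatchcase(name, name))  # False
print(fnmatch.fnmatchcase("Sale and Marketing a", name))  # True
```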