Create a technical lineage via the lineage harvester
This topic describes the general steps on how to use the lineage harvester to create a technical lineage.
|
Select a data source, to show Currently, information is shown for: |
Amazon Redshift
Azure Data Factory
Azure SQL Data Warehouse
Azure SQL Server
Azure Synapse Analytics
DB2
Google BigQuery
Greenplum
HiveQL
IBM InfoSphere DataStage
Informatica Intelligent Cloud Services
Informatica PowerCenter
Looker
Matillion
MicroStrategy
Oracle
PostgreSQL
Power BI
MySQL
Netezza
SAP Hana
Snowflake
Spark SQL
Downloaded SQL files
SQL Server
SQL Server Integration Services
SSRS-PBRS
Sybase
Tableau
Teradata
Custom technical lineage
|
-
Ensure that you meet the Azure Data Factory-specific permissions described in Set up Azure Data Factory.
-
You need read access on information_schema. Only views that you own are processed.
-
You need read access on the SYS schema.
-
You need read access on information_schema:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.routines.get
- bigquery.routines.list
-
SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
-
You need Monitoring role permissions.
-
A role with the LOGIN option.
-
SELECT WITH GRANT OPTION, at Table level.
-
CONNECT ON DATABASE
-
You need read access on the SYS schema and the View Definition Permission in your SQL Server.
-
You need read access on definition_schema.
- GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
-
The role of the user that you specify in the
usernameproperty in lineage harvester configuration file must be the owner of the views in PostgreSQL. -
You need read access on the DBC.
-
You need read access to the following dictionary views:
- all_tab_cols
- all_col_comments
- all_objects
- ALL_DB_LINKS
- all_mviews
- all_source
- all_synonyms
- all_views
- Your user role must have privileges to export assets.
- You must have read permission on all assets that you want to export.
- You have added the Matillion certificate to a Java truststore.
- You have at least a Matillion Enterprise license.
-
The source code files must be in the same directory as the lineage.json file. Otherwise, an error occurs indicating that the lineage harvester cannot find the source code files. For complete information, go to Working with custom technical lineage.
-
You need the following roles, with user access to the server from which you want to ingest:
- A system-level role that is at least a System user role.
- An item-level role that is at least a Content Manager role.
We recommend that you use SQL Server 2019 Reporting Services or newer. We can't guarantee that older versions will work.
-
You need Admin API permissions.
The first call we make to MicroStrategy is to authenticate. We connect to <MSTR URL>:<Port>/MicroStrategyLibrary/api-docs/ and use POST api/auth/login. You have to ensure that this API call can be made successfully. - You need permissions to access the library server.
- The lineage harvester uses port 443. If the port is not open, you also need permissions to access the repository.
- If you have a MicroStrategy on-premises environment, you need the permissions for all of the database objects that the lineage harvester accesses.
- You have to configure the MicroStrategy Modeling Service. For complete information, see the MicroStrategy documentation.
- Necessary permissions to all database objects that the lineage harvester accesses.Show me the data source-specific permissions
Data source
Required permissions
Amazon Redshift
You need read access on information_schema. Only views that you own are processed.Azure Data Factory
Ensure that you meet the Azure Data Factory-specific permissions described in Set up Azure Data Factory.Azure SQL server
You need read access on the SYS schema.Azure Synapse Analytics
SELECT, at table level. Grant this to every table for which you want to create a technical lineage.Google BigQuery
You need read access on information_schema:- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.routines.get
- bigquery.routines.list
Greenplum
A role with the LOGIN option.HiveQL
SELECT WITH GRANT OPTION, at Table level.IBM DB2
CONNECT ON DATABASEInformatica Intelligent Cloud Services
- Your user role must have privileges to export assets.
- You must have read permission on all assets that you want to export.
Matillion
- You have added the Matillion certificate to a Java truststore.
- You have at least a Matillion Enterprise license.
Microsoft SQL Server
You need read access on the SYS schema and the View Definition Permission in your SQL Server.MySQL
SELECT, at table level. Grant this to every table for which you want to create a technical lineage.Netezza
You need read access on definition_schema.Oracle You need read access to the following dictionary views:- all_tab_cols
- all_col_comments
- all_objects
- ALL_DB_LINKS
- all_mviews
- all_source
- all_synonyms
- all_views
PostgreSQL
- GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
-
The role of the user that you specify in the
usernameproperty in lineage harvester configuration file must be the owner of the views in PostgreSQL.
Snowflake The following permissions are the same, regardless of the ingestion mode:SQLorSQL-API.You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the IMPORTED PRIVILEGES privilege on the shared database to the user that runs the lineage harvester.Tip If the default role in Snowflake does not have the IMPORTED PRIVILEGES privilege, you can use thecustomConnectionPropertiesproperty in the lineage harvester configuration file to assign the appropriate role to the user. For example:"customConnectionProperties": "role=METADATA"Teradata
You need read access on the DBC.
SQL or SQL-API.customConnectionProperties property in the lineage harvester configuration file to assign the appropriate role to the user. For example:"customConnectionProperties": "role=METADATA"The following permissions apply only to MicroStrategy on-premises customers.
Steps
- Optionally, connect to a proxy server.
- Ensure that you meet the Azure Data Factory prerequisites.
- Ensure that you have the correct Tableau versions and permissions, as described in the Set up Tableau topics.
- Complete the tasks in Power BI and Microsoft Azure, as described in the Set up Power BI topics.
- If you are a MicroStrategy on-premises customer, ensure that you have enabled Collibra to access your MicroStrategy data, as described in Set up MicroStrategy.
- Ensure that you have API3 credentials for authorization and access control. For complete information, go to Set up Looker.
- Prepare the Data Catalog physical data layer.
- Prepare an external directory folder for the lineage harvester.
- Prepare a domain for BI asset ingestion.
- Optionally, assign the attribute type State to the global assignment of the Power BI Workspace asset type. For complete information, go to Power BI workspaces.
- Download and install the lineage harvester.
- Create a custom technical lineage JSON file.
- Prepare the lineage harvester configuration file.Note The project name in the configuration file must be the same as the full name of the Database asset.
- If necessary, prepare a <source ID> configuration file.Tip Hostname mapping (beta) will replace database mapping and the
collibraSystemNamesection for databases in a future Collibra version. For complete information and examples of hostname mapping, go to Tableau hostname mapping (beta). - Manually refresh your Power BI datasets.
Important The first time you integrate Power BI, you need to make sure that the data in your Power BI datasets is up-to-date. Carry out this step only if this is the first time you're integrating Power BI in Data Catalog. After that, Microsoft automatically refreshes the datasets every 90 days. For complete information, see:
- Run the lineage harvester.
What's next?
You can check the progress of the ingestion in Activities. The results field indicates how many relations were imported into Data Catalog.
"Heartbeat" messages in the lineage harvester logs indicate that the connection to your Power BI service is active and that metadata is still loading. Heartbeat messages, like this example, can be shown throughout the harvesting process.
2023-07-14T15:43:36+02 INFO [src=PowerBITenant] Heartbeat
After the metadata is ingested in Data Catalog, you can go to the domain that you specified in your lineage harvester configuration file and view the newly created assets. These assets are automatically stitched to existing assets in Data Catalog.
You can also view the Tableau technical lineage.
Warning We strongly recommend that you not edit the full names of any BI assets. Doing so will likely lead to errors during the synchronization process.
Warning We highly recommend that you do not move the ingested assets to a different domain. If you do, the assets will be deleted and recreated in the initial BI Catalog domain when you synchronize. As a consequence, any manually added data of those assets is lost.