Prepare the lineage harvester configuration file for MicroStrategy (NEW)(Beta)

Before you run the lineage harvester, prepare the lineage harvester configuration file. This file provides the information needed to connect your MicroStrategy server and remote data source to the Collibra Data Lineage service instance and to the domain in which you want to ingest the MicroStrategy assets.

Before you begin

Install the newest lineage harvester.
Prepare a BI Catalog domain in which you want to ingest MicroStrategy metadata.
Prepare the Data Catalog physical data layer.

Requirements and permissions

Collibra Data Intelligence Cloud.
A global role with the following global permissions:
- Catalog, for example Catalog Author
- Data Stewardship Manager
- Manage all resources
- System administration
- Technical lineage
A resource role with the following resource permissions at the level of the community in which you created the BI Catalog domain:
- Asset: add
- Attribute: add
- Domain: add
- Attachment: add

Necessary permissions to all database objects that the lineage harvester accesses.

Tip

Some data sources require specific permissions.

Ensure that you meet the Azure Data Factory prerequisites.

You need read access on the SYS schema.

You need read access on the SYS schema and the View Definition Permission in your SQL Server.

You need read access on information_schema:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.list
bigquery.jobs.create
bigquery.routines.get
bigquery.routines.list

GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

You need read access on information_schema. Only views that you own are processed.

SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

The role of the user that you specify in the username property in lineage harvester configuration file must be the owner of the views in PostgreSQL.

A role with the LOGIN option.

SELECT WITH GRANT OPTION, at Table level.

CONNECT ON DATABASE

Note The following permissions are the same, regardless of the ingestion mode: SQL or SQL-API.

You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the IMPORTED PRIVILEGES privilege on the shared database to the user that runs the lineage harvester.

Tip If the default role in Snowflake does not have the IMPORTED PRIVILEGES privilege, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the appropriate role to the user. For example:
"customConnectionProperties": "role=METADATA"

You need read access on the DBC.

You need read access to the following dictionary views:

all_tab_cols
all_col_comments
all_objects
ALL_DB_LINKS
all_mviews
all_source
all_synonyms
all_views

You need read access on definition_schema.

Your user role must have privileges to export assets.
You must have read permission on all assets that you want to export.

You have added the Matillion certificate to a Java truststore.
You have at least a Matillion Enterprise license.

In MicroStrategy:
- Admin API permissions.
- Permissions to access the library server.
- The lineage harvester uses port 443. If the port is not open, you also need permissions to access the repository.

Steps

Open the lineage-harvester.conf file that was created when you installed the lineage harvester, and enter the values for each property.

Properties	Description
general	This section describes the connection information between the lineage harvester and Data Catalog.
techlin	This section contains information that is necessary to connect to the Collibra Data Lineage service instance. Warning This applies only to US government customers.
url	The URL of the Collibra Data Lineage service instance: `"url": "https://techlin-gov.collibra.com"`. Warning This applies only to US government customers.
userKey	The unique API key to connect to the Collibra Data Lineage service instance. A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager. Warning This applies only to US government customers.
catalog	This section contains information that is necessary to connect to Data Catalog.
url	The URL of your Collibra Data Intelligence Cloud environment. Note You can only enter the public URL of your Collibra DGC environment. Other URLs will not be accepted.
username	The username that you use to sign in to Collibra.
useCollibraSystemName	Indicates whether to use the system or server name of a data source to match to the System asset in Data Catalog during automatic stitching. This is useful when you have multiple databases with the same name. The default value is `false`. To use the system name for matching, set the property to `true`. Important If you set this property to `true`, the lineage harvester reads the value of the `collibraSystemName` property in your MicroStrategy <source ID> configuration file. If you set it to `false`, the lineage harvester ignores the `collibraSystemName` property in the MicroStrategy <source ID> configuration file.
sources	This section contains all MicroStrategy connection properties.
id	The unique ID of your MicroStrategy metadata. For example, `my_microstrategy`. Warning In the `sources` section of your lineage harvester configuration file, you can only specify one `id` property per MicroStrategy Intelligence Server. If you have multiple `id` properties for a single MicroStrategy Intelligence Server, ingestion will fail. If you have multiple `id` properties in the configuration file, it means you intend to ingest from multiple unique MicroStrategy Intelligence Servers. Tip This value can be anything as long as it is unique and human readable. The ID identifies the batch of MicroStrategy metadata on the Collibra Data Lineage service.
type	The kind of data source. In this case, the value has to be `MSTR_V2`.
url	The URL of your MicroStrategy account.
username	The username that you use to sign in to MicroStrategy.
maxParallelRequests	This optional property specifies the internal sizing, meaning the number of tasks that can be executed at the same time. The default value is "1", which means that HTTP requests run one at a time instead of in parallel. A value of "5", for example, means that as many as 5 HTTP requests can take place in parallel. A lower value reduces the chance of HTTP 401 Unauthorized errors.
deleteRawMetadataAfterProcessing	The lineage harvester harvests raw metadata from the specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance for processing. Use this optional property to specify whether the raw metadata is deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed. The default value is `false`. If set to `true`, the raw source metadata is deleted after processing. If set to `false`, it is stored in the Collibra infrastructure. Note Setting this property to `true` can negatively impact performance.
appUrlSuffix	This optional property ensures that the correct URL to data objects in MicroStrategy is included on the asset pages of the corresponding MicroStrategy assets. The required value depends on the platform on which you run MicroStrategy. For J2EE, use: `"appUrlSuffix": "MicroStrategy/servlet/mstrWeb"`. For .NET, use: `"appUrlSuffix": "MicroStrategy/asp/Main.aspx"`.
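
To show how the properties in the table could fit together, here is a minimal sketch of a lineage-harvester.conf file. The nesting is an assumption inferred from the order of the table: `techlin` and `catalog` are placed inside `general`, and `sources` is written as an array with one object per MicroStrategy Intelligence Server. All values in angle brackets are hypothetical placeholders, and `maxParallelRequests` is written as a JSON number although the table quotes its value as "1". Compare the layout with the file that the installer created before relying on it. The `techlin` section applies only to US government customers and should be left out otherwise.

```json
{
  "general": {
    "techlin": {
      "url": "https://techlin-gov.collibra.com",
      "userKey": "<your-user-key>"
    },
    "catalog": {
      "url": "https://<your-environment>.collibra.com",
      "username": "<your-collibra-username>"
    },
    "useCollibraSystemName": false
  },
  "sources": [
    {
      "id": "my_microstrategy",
      "type": "MSTR_V2",
      "url": "<your-microstrategy-url>",
      "username": "<your-microstrategy-username>",
      "maxParallelRequests": 1,
      "deleteRawMetadataAfterProcessing": false,
      "appUrlSuffix": "MicroStrategy/servlet/mstrWeb"
    }
  ]
}
```

If this layout matches your file, ingesting from a second MicroStrategy Intelligence Server would mean adding another object to `sources` with its own unique `id` and `url`. Do not add a second `id` for the same server, because ingestion fails in that case.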

Save the configuration file.

What's next?

Prepare your <source ID> configuration file.