Prepare the lineage harvester configuration file for Looker

You have to prepare a configuration file before you run the lineage harvester. The lineage harvester collects your Looker metadata and sends it to the Collibra Data Lineage service, where it is processed and analyzed. Collibra Data Intelligence Cloud then imports the Looker assets and relations to Data Catalog.

Before you begin

Set up the latest lineage harvester.
Create one or more BI Data Catalog domains in which you want to ingest the Looker assets.

Requirements and permissions

Collibra Data Intelligence Cloud.
A global role with the following global permissions:
- Catalog, for example Catalog Author
- Data Stewardship Manager
- Manage all resources
- System administration
- Technical lineage
A resource role with the following resource permissions on the community level in which you created the BI Data Catalog domain:
- Asset: add
- Attribute: add
- Domain: add
- Attachment: add

Steps

Run the lineage harvester once to create an empty configuration file:
- Windows: `.\bin\lineage-harvester.bat`
- Other operating systems: `chmod +x bin/lineage-harvester`, and then `./bin/lineage-harvester`
An empty configuration file is created in the config folder.

Open the lineage-harvester.conf file and enter the values for each property.

Properties	Description
general	This section describes the connection information between the lineage harvester and Data Catalog.
techlin	This section contains the information that is necessary to connect to the Collibra Data Lineage service instance. Warning: This section applies only to US government customers.
url	The URL of the Collibra Data Lineage service instance. Example: `"url": "https://techlin-gov.collibra.com"`. Warning: This section applies only to US government customers.
userKey	The unique API key to connect to the Collibra Data Lineage service instance. A unique user key is needed for each Collibra environment. If you're not sure what your user key is, contact your Collibra Customer Success Manager. Warning: This section applies only to US government customers.
catalog	This section contains information that is necessary to connect to Data Catalog.
url	The URL of your Collibra Data Intelligence Cloud environment. Note: You can enter only the public URL of your Collibra DGC environment. Other URLs are not accepted.
username	The username that you use to sign in to Collibra.
useCollibraSystemName	Indicates whether to use the system or server name of a data source to match it to the System asset in Data Catalog. Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog. This is useful when you have multiple databases with the same name. The default value is `false`. Important: If you set this property to `true`, the lineage harvester reads the value of the `collibraSystemName` property in your Looker <source-ID> configuration file. If you set it to `false`, the lineage harvester ignores the `collibraSystemName` property in that file.
sources	This section contains all Looker connection properties.
id	The unique ID of your Looker metadata, for example `my_looker`. The ID identifies the batch of Looker metadata on the Collibra Data Lineage service. Tip: This value can be anything as long as it is unique and human readable. Warning: In the `sources` section of your lineage harvester configuration file, you can specify only one `id` property per Looker instance. If you specify multiple `id` properties for a single Looker instance, ingestion fails. Multiple `id` properties in the configuration file indicate that you intend to ingest from multiple unique Looker instances.
type	The kind of data source. In this case, the value must be `Looker`.
lookerUrl	The URL of your Looker API. Tip: You can find the Looker API URL in two places. (1) In the API Host URL field in the Looker Admin menu. If this field is empty, you can use the default Looker API URL, which you can find in the interactive API documentation. (2) In the interactive API documentation URL, where it is the part of the URL before `/api-docs/`. Note: Looker 3.1 APIs are deprecated; however, the API3 credentials for authorization and access control remain valid.
clientId	The username you use to access the Looker API.
domainId	The unique ID of the domain in Collibra Data Intelligence Cloud in which you want to ingest the Looker assets. This is the default domain. To ingest the contents of specific Looker Folders into specific domains in Collibra, specify the domain reference IDs in the filters section of the Looker <source-ID> configuration file.
pagingLimit	Optional property for customizing the Looker API pagination settings. The default value of `50` is sufficient in most cases; however, you can decrease it to help mitigate node limit errors, or increase it to speed up API calls. Example: `"pagingLimit": 10`
deleteRawMetadataAfterProcessing	Optional property that specifies whether the raw metadata is deleted from the Collibra Data Lineage service instance after processing. The lineage harvester harvests raw metadata from the specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance for processing. If the property is set to `true`, the raw source metadata is deleted after the metadata that is targeted for ingestion in Data Catalog is processed. If it is set to `false`, which is the default, the raw metadata is stored in the Collibra infrastructure. Note: Setting this property to `true` can negatively impact performance.
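
The properties in the table can also be assembled with a short script. The following Python sketch is one possible way to do that, not the harvester's own tooling. It assumes the nesting that the table order suggests (catalog inside general, one object per Looker instance in sources) and leaves out the techlin section, which applies only to US government customers. The function names, the example URLs, and the use of lookerUrl to recognize a Looker instance are hypothetical choices made for illustration.

```python
import json


def looker_api_url(api_docs_url):
    """Return the part of an interactive API documentation URL before /api-docs/."""
    base, separator, _ = api_docs_url.partition("/api-docs/")
    if not separator:
        raise ValueError("URL does not contain /api-docs/")
    return base


def looker_source(source_id, api_docs_url, client_id, domain_id, paging_limit=50):
    """Describe one Looker instance; 50 is the documented pagingLimit default."""
    return {
        "id": source_id,
        "type": "Looker",
        "lookerUrl": looker_api_url(api_docs_url),
        "clientId": client_id,
        "domainId": domain_id,
        "pagingLimit": paging_limit,
        "deleteRawMetadataAfterProcessing": False,
    }


def validate_sources(sources):
    """Reject repeated ids and more than one id for the same Looker instance.

    Assumption: two sources with the same lookerUrl are the same instance.
    """
    seen_ids, seen_urls = set(), set()
    for source in sources:
        if source["id"] in seen_ids:
            raise ValueError(f"id {source['id']!r} is not unique")
        if source["lookerUrl"] in seen_urls:
            raise ValueError(f"more than one id for {source['lookerUrl']}")
        seen_ids.add(source["id"])
        seen_urls.add(source["lookerUrl"])


def build_config(catalog_url, username, sources, use_system_name=False):
    """Assemble the configuration as a dict (the nesting is an assumption)."""
    validate_sources(sources)
    return {
        "general": {
            "catalog": {"url": catalog_url, "username": username},
            "useCollibraSystemName": use_system_name,
        },
        "sources": sources,
    }


if __name__ == "__main__":
    source = looker_source(
        "my_looker",
        "https://looker.example.com:19999/api-docs/index.html",  # hypothetical URL
        "<client-id>",
        "<domain-id>",
    )
    config = build_config("https://<organization>.collibra.com", "<username>", [source])
    print(json.dumps(config, indent=2))
```

The printed JSON can be compared with the generated lineage-harvester.conf file before any values are copied over; the generated file remains the reference for the actual structure.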

Save the configuration file.
In the console, start the lineage harvester again by running the following command:
- Windows: `.\bin\lineage-harvester.bat full-sync`
- Other operating systems: `./bin/lineage-harvester full-sync`
When prompted, enter the password or client secret to connect to your Collibra Data Intelligence Cloud and Looker environments.
The passwords are encrypted and stored in /config/pwd.conf.
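
If you want to start the full-sync step from a script, for example as part of a scheduled job, a wrapper along the following lines may help. It is a minimal sketch: the function names and the harvester folder argument are hypothetical, and only the launcher names and the `full-sync` argument come from the steps above.

```python
import platform
import subprocess
from pathlib import Path


def harvester_command(harvester_dir, system):
    """Build the full-sync command for the given operating system name."""
    launcher = "lineage-harvester.bat" if system == "Windows" else "lineage-harvester"
    # An absolute launcher path avoids depending on the current working directory.
    return [str(Path(harvester_dir).resolve() / "bin" / launcher), "full-sync"]


def run_full_sync(harvester_dir):
    """Run full-sync from the harvester folder and return the exit code."""
    command = harvester_command(harvester_dir, platform.system())
    # stdin is inherited, so any password prompt still appears in the console.
    return subprocess.run(command, cwd=harvester_dir).returncode
```

Calling `run_full_sync("<harvester-folder>")` starts the harvester in that folder and returns its exit code.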

What's next?

The lineage harvester triggers Collibra to import Looker assets and their relations and create a technical lineage for Looker Look assets.

Looker assets are not yet stitched to other assets in Data Catalog.

If issues occur during the Looker ingestion process, see the Looker troubleshooting section.

To refresh the Looker metadata, you can run the lineage harvester again or schedule jobs to run it automatically.

Tip: You can check the progress of the Looker ingestion in Activities. The results field indicates how many relations were imported into Data Catalog.