Prepare the lineage harvester configuration file for Looker

You have to prepare a configuration file before you run the lineage harvester. The lineage harvester collects your Looker metadata and sends it to the Collibra Data Lineage service, where it is processed and analyzed. Collibra Data Intelligence Cloud then imports the Looker assets and relations to Data Catalog.

Before you begin

Requirements and permissions

  • Collibra Data Intelligence Cloud.
  • A global role with the following global permissions:
    • Catalog, for example Catalog Author
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permissions on the community in which you created the BI Data Catalog domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add

Steps

  1. Start the lineage harvester to create an empty lineage harvester configuration file by entering the following command:
    • Windows: .\bin\lineage-harvester.bat
    • For other operating systems: chmod +x bin/lineage-harvester and then ./bin/lineage-harvester
    An empty configuration file is created in the config folder.
  2. Open the lineage-harvester.conf file and enter the values for each property.
    Properties and descriptions
    general

    This section describes the connection information between the lineage harvester and Data Catalog.

    techlin

    This section contains information that is necessary to connect to the Collibra Data Lineage service instance.

    Warning This section applies only to US government customers.

    url

    The URL of the Collibra Data Lineage service instance.

    Example "url": "https://techlin-gov.collibra.com"

    Warning This property applies only to US government customers.

    userKey

    The unique API key to connect to the Collibra Data Lineage service instance.

    A unique user key is needed for each Collibra environment. If you're not sure what your user key is, please contact your Collibra Customer Success Manager.

    Warning This property applies only to US government customers.

    catalog

    This section contains information that is necessary to connect to Data Catalog.

    url

    The URL of your Collibra Data Intelligence Cloud environment.

    Note You can only enter the public URL of your Collibra Data Intelligence Cloud environment. Other URLs are not accepted.

    username

    The username that you use to sign in to Collibra.

    useCollibraSystemName

    Indicates whether or not you want to use the system or server name of a data source to match to the System asset in Data Catalog. Collibra Data Lineage uses the system names to match the structure of databases in Looker to assets in Data Catalog. This is useful when you have multiple databases with the same name.

    By default, the useCollibraSystemName property is set to false. If you want to use it, set it to true.

    Important 
    • If you set this property to true, the lineage harvester reads the value of the collibraSystemName property in your Looker <source-ID> configuration file.
    • If you set the useCollibraSystemName property to false, the lineage harvester ignores the collibraSystemName property in the Looker <source-ID> configuration file.
    sources

    This section contains all Looker connection properties.

    id

    The unique ID of your Looker metadata. For example, my_looker.

    Tip This value can be anything as long as it is unique and human readable. The ID identifies the batch of Looker metadata on the Collibra Data Lineage service.

    Warning In the sources section of your lineage harvester configuration file, specify exactly one id property per Looker instance. If a single Looker instance has multiple id properties, ingestion fails. Multiple id properties in the configuration file are interpreted as multiple distinct Looker instances.

    type

    The kind of data source. In this case, the value has to be Looker.

    lookerUrl

    The URL to your Looker API.

    Tip There are two ways to find the Looker API URL:
    • In the API Host URL field in the Looker Admin menu. If this field is empty, you can use the default Looker API URL which you can find in the interactive API documentation.
    • In the interactive API documentation URL. It is the part of the URL before /api-docs/.

    Note Looker 3.1 APIs are deprecated; however, the API3 credentials for authorization and access control remain valid.

    clientId

    The client ID that you use to access the Looker API. It acts as the username part of your Looker API3 credentials.

    domainId

    The unique ID of the domain in Collibra Data Intelligence Cloud in which you want to ingest the Looker assets.

    This is the default domain.

    If you want to ingest the contents of specific Looker Folders into specific domains in Collibra, you specify the domain reference IDs in the filters section of the Looker <source ID> configuration file.

    pagingLimit

    Optional property for customizing the Looker API pagination settings.

    The default value of 50 is sufficient in most cases; however, you can decrease it to help mitigate node limit errors, or increase it to speed up API calls.

    Example "pagingLimit": 10

    deleteRawMetadataAfterProcessing

    The lineage harvester harvests raw metadata from the specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance for processing.

    You can use this optional property to specify whether the raw metadata is deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog has been processed.

    The default value is false.

    If the property is set to true, the raw source metadata is deleted after processing. If set to false, it is stored in the Collibra infrastructure.

    Note Setting this property to true can negatively impact performance.

  3. Save the configuration file.
  4. In the console, start the lineage harvester again by running the following command:
    • Windows: .\bin\lineage-harvester.bat full-sync
    • For other operating systems: ./bin/lineage-harvester full-sync
  5. When prompted, enter the password for your Collibra Data Intelligence Cloud environment and the client secret for your Looker environment.
    The passwords are encrypted and stored in /config/pwd.conf.
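
As an illustration of step 2, the following Python sketch assembles a configuration with the properties described above and prints it as JSON. It is a minimal sketch under several assumptions, not the official file layout: the nesting of the properties is inferred from the order of the property table, the techlin section is omitted because it applies only to US government customers, and the function names (looker_api_url, build_config), host names, and placeholder values are hypothetical. The uniqueness checks are a local convenience that mirrors the warning about the id property; they are not something the lineage harvester is known to run.

```python
import json


def looker_api_url(api_docs_url):
    """Return the part of the interactive API documentation URL before /api-docs/."""
    base, separator, _ = api_docs_url.partition("/api-docs/")
    if not separator:
        raise ValueError("expected a URL that contains /api-docs/")
    return base


def build_config(catalog_url, username, sources, use_system_name=False):
    """Assemble the configuration as a dictionary (nesting is an assumption)."""
    ids = [source["id"] for source in sources]
    if len(ids) != len(set(ids)):
        raise ValueError("each source id must be unique")
    urls = [source["lookerUrl"] for source in sources]
    if len(urls) != len(set(urls)):
        raise ValueError("specify only one id per Looker instance")
    return {
        "general": {
            "catalog": {"url": catalog_url, "username": username},
            "useCollibraSystemName": use_system_name,
        },
        "sources": sources,
    }


# All values below are hypothetical placeholders; replace them with your own.
config = build_config(
    catalog_url="https://example.collibra.com",
    username="jane.doe",
    sources=[
        {
            "id": "my_looker",
            "type": "Looker",
            "lookerUrl": looker_api_url(
                "https://looker.example.com:19999/api-docs/index.html"
            ),
            "clientId": "<looker-client-id>",
            "domainId": "<domain-id>",
            "pagingLimit": 50,  # optional; 50 is the documented default
            "deleteRawMetadataAfterProcessing": False,  # optional; default is false
        }
    ],
)

print(json.dumps(config, indent=2))
```

Compare the printed structure with the empty lineage-harvester.conf file that step 1 creates before copying any values, because the generated file is the authoritative template.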

What's next?

The lineage harvester triggers Collibra to import Looker assets and their relations and create a technical lineage for Looker Look assets.

Looker assets are not yet stitched to other assets in Data Catalog.

If issues occur during the Looker ingestion process, check the Looker troubleshooting section to solve your problems.

To refresh the Looker metadata, you can run the lineage harvester again or schedule a job to run it automatically.
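
A scheduled refresh could be driven by a small wrapper script such as the Python sketch below. It only rebuilds the full-sync command shown in the steps above for the current operating system and runs it. The function names (harvester_command, run_full_sync) and the installation path are hypothetical, and the sketch assumes that the stored passwords in pwd.conf let the harvester run without prompting, which you should verify in your environment.

```python
import platform
import subprocess
from pathlib import Path


def harvester_command(harvester_dir, system):
    """Build the full-sync command for the given operating system name."""
    script = "lineage-harvester.bat" if system == "Windows" else "lineage-harvester"
    return [str(Path(harvester_dir) / "bin" / script), "full-sync"]


def run_full_sync(harvester_dir):
    """Run the lineage harvester from its installation folder; return the exit code."""
    command = harvester_command(harvester_dir, platform.system())
    completed = subprocess.run(command, cwd=harvester_dir)
    return completed.returncode


# Example call from a scheduled job (hypothetical installation path):
# exit_code = run_full_sync("/opt/lineage-harvester")
```

A scheduler such as cron or Windows Task Scheduler can then call this script at the interval you need.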

Tip You can check the progress of the Looker ingestion in Activities. The results field indicates how many relations were imported into Data Catalog.