Looker source configuration

Note: This topic is relevant only if you create technical lineage via Edge. If you use the CLI lineage harvester (deprecated), you need to create a <source ID> configuration file. The CLI harvester will officially reach its End of Life on July 31, 2026.

The Source configuration field in the Looker technical lineage Edge capability allows you to:

The value of the Source configuration field must be a valid block of JSON code, for example:

{
   "Connections":{
      "connection-object1":{
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      },
      "connection-object2":{
         "schema":"oracle-schema-name",
         "dbname":"oracle-database-name",
         "collibraSystemName":"oracle-system-name"
      }
   },
   "filters":[
      {
         "domainId":"605fa8ae-f8c6-4261-938b-8326e2806f3d",
         "description":"Databricks_folder",
         "folderIds":[
            "abc-123",
            "def-456"
         ]
      },
      {
         "domainId":"245bc5a4-4c30-44b5-8356-ddbe708b56d6",
         "description":"personal",
         "folderIds":[
            "hij-789*",
            "jkl-101112"
         ]
      }
   ]
}

The following table describes the various properties you can use in your JSON code block.

Property

Description

Mandatory?

Connections

This section contains all Looker connections for which you want to create a technical lineage.

Yes

<connection name>

The name of a connection object in Looker.

Yes

schema

The name of the default schema of a supported data source in Looker.

If the lineage harvester fails to find a specific schema, it uses the default schema.

No

dbname

The name of the database of a supported data source in Looker.

No

collibraSystemName

The system or server name of a database.

If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but either don't create a <source ID> configuration file or don't specify a value for the collibraSystemName property in it, the system name in the technical lineage is "DEFAULT".

How to configure this property if you have two databases with the same name

Let's assume you have two databases named Customers. When you prepare the physical data layer in Data Catalog, you create a System asset for each of these databases. Let's say you named them Customers-Europe and Customers-USA. You can then configure this property as follows.

"connection-object1": {
    "dialect": "mssql",
    "schema": "mssql-schema-name",
    "dbname": "Customers",
    "collibraSystemName": "Customers-Europe"
},
"connection-object2": {
    "dialect": "oracle",
    "schema": "oracle-schema-name",
    "dbname": "Customers",    
    "collibraSystemName": "Customers-USA"
}

Yes

filters

Optionally, use this section to specify the Looker folders from which you want to ingest metadata.

Important 
  • If you don't want to filter on Looker Folders, you must completely remove this filters section.
  • You can filter on Looker folders, but not on Looker data sets. That's because Looker data sets are linked directly to the server, instead of a folder, as shown in the Looker metadata overview. Looker data sets are ingested in the default domain, regardless of any filtering.
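
As a minimal sketch of the first point, a source configuration that does not filter on folders simply omits the filters section. The connection name and the schema, database, and system names below are the placeholder values from the example at the top of this topic, not real values.

```json
{
   "Connections":{
      "connection-object1":{
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      }
   }
}
```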

Let’s say, for example, you filter on folder B. A Looker Folder asset is created in the specified domain in Collibra, and all of the metadata in folder B is ingested. If folder B has a parent folder A, then a Looker Folder asset is created (in the domain specified for folder B) to preserve the hierarchy, but no metadata from folder A is ingested.

You can specify more than one Looker folder for ingestion into a single domain in Collibra.

There are significant benefits to filtering by folder ID. For more information, see the description of the filters > folderIds property.

You can use wildcards to capture multiple connection string combinations:

Pattern    Description
*          Matches everything.
?          Matches any single character.
[seq]      Matches any character in "seq".
[!seq]     Matches any character not in "seq".
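
The following sketch shows how such patterns might look in a filter. It assumes that wildcards are accepted in folderIds values, as the hij-789* entry in the example at the top of this topic suggests. The domain ID, folder IDs, and connection values are placeholders.

```json
{
   "Connections":{
      "connection-object1":{
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      }
   },
   "filters":[
      {
         "domainId":"245bc5a4-4c30-44b5-8356-ddbe708b56d6",
         "description":"wildcard patterns",
         "folderIds":[
            "hij-789*",
            "abc-12?",
            "def-[45]56"
         ]
      }
   ]
}
```

Under the pattern rules listed above, hij-789* would match any folder ID that starts with hij-789, abc-12? would match abc-12 followed by exactly one character, and def-[45]56 would match def-456 and def-556.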
No

domainId

The unique resource ID of the domain (or domains) in Collibra into which you want to ingest data objects from one or more Looker Folders.

To find the domain ID, click the domain type and look at the URL in your browser. The URL looks like https://<yourcollibrainstance>/domain/<domain ID>?<view>.

 
description

Any description, as you see fit.

folderNames

The name (or names) of the Looker Folders from which you want to ingest metadata.

You must specify either a folder name, a folder ID, or both.
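
For illustration, a filter entry that selects folders by name might look like the following sketch. It assumes that folderNames takes a list of strings in the same way as folderIds. The folder names "Sales" and "Finance" are hypothetical, and the domain ID and connection values are the placeholders from the example at the top of this topic.

```json
{
   "Connections":{
      "connection-object1":{
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      }
   },
   "filters":[
      {
         "domainId":"605fa8ae-f8c6-4261-938b-8326e2806f3d",
         "description":"Folders selected by name",
         "folderNames":[
            "Sales",
            "Finance"
         ]
      }
   ]
}
```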

 
folderIds

The ID (or IDs) of the Looker Folder (or Folders) from which you want to ingest metadata.

You must specify either a folder ID, a folder name, or both.

Tip If you filter by folder ID, filtering is carried out via the API, instead of on the Collibra Data Lineage service instances.

When you filter by folder ID, the lineage harvester accesses only the folders you specify via this property, and sends only that metadata to the Collibra Data Lineage service instance for processing and ingestion in Data Catalog. Conversely, if you filter by folder name (via the folderNames property), metadata from all Looker folders is sent to the Collibra Data Lineage service instance. Only then is filtering applied.
The advantages of filtering by folder ID are as follows:
  • Faster integration testing, as you can filter on a single folder.
  • Enhanced data security and privacy by not harvesting folders that contain sensitive information.
  • Improved processing times by not including folders dedicated to, for example, development and testing. This is especially beneficial for organizations with a lot of data in Looker.
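
The first advantage can be sketched as a configuration that restricts a test run to a single folder by ID. The folder ID abc-123, the domain ID, and the connection values are the placeholders from the example at the top of this topic. Replace them with your own values.

```json
{
   "Connections":{
      "connection-object1":{
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      }
   },
   "filters":[
      {
         "domainId":"605fa8ae-f8c6-4261-938b-8326e2806f3d",
         "description":"Integration test, single folder",
         "folderIds":[
            "abc-123"
         ]
      }
   ]
}
```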