Example | CLI lineage harvester to Edge: Migration and cleanup workflow

The CLI lineage harvester is deprecated and reaches its End of Life on July 31, 2026. To continue generating technical lineage for selected data sources, Collibra Data Lineage customers who still use the CLI lineage harvester need to transition to technical lineage via Edge before this date.

Scenario

In this use case, you'll migrate the technical lineage of an Oracle data source, using an Amazon Web Services (AWS) connection. You'll decommission the existing CLI harvester configuration, provision an equivalent Oracle Edge capability, and resolve common "useCollibraSystemName" configuration mismatches that result in synchronization failures.

The steps in this use case are specific to the latest Collibra UI.

Steps overview

# Step Description
1 Review the preflight checks. Key considerations to help ensure successful integration, including required Edge, technical lineage, and data source-specific permissions, network requirements and more.
2 Copy the source ID and decommission the CLI lineage harvester.

Take note of the source ID of the data source for which you want to migrate lineage. Decommissioning the CLI harvester eliminates the risk that both the CLI harvester and Edge attempt to update the same lineage batch.

3 Prepare and store your SQL files in an AWS S3 bucket.

You need to provide SQL files that include your SQL queries. Collibra Data Lineage processes the metadata based on your queries to create the technical lineage.

The focus of this use case is an AWS S3 bucket.

4 Create an AWS connection.

You need to create an AWS connection to the AWS S3 bucket in which your SQL files are stored.

5 Add the Technical Lineage for SqlDirectory (Cloud) capability.

Add the technical lineage capability to your Edge or Collibra Cloud site. The capability allows the lineage harvester to retrieve data from the AWS S3 bucket.

6 Synchronize your technical lineage.

You can synchronize your technical lineage manually or automatically by adding a synchronization schedule. In this use case, you'll synchronize manually.

7 Cleanup and deletion: Delete the technical lineage of a data source.

Use the technical lineage admin "ignore sources" option to remove stale source IDs that cause "useCollibraSystemName" errors or persist as gray nodes in the lineage viewer.

1 Review the preflight checks

To ensure successful metadata ingestion and lineage generation, complete the following preflight checks.

In your Oracle environment

You need read access to the following dictionary views:
  • all_tab_cols
  • all_col_comments
  • all_objects
  • ALL_DB_LINKS
  • all_mviews
  • all_source
  • all_synonyms
  • all_views
Note By default, the lineage harvester queries the all_source table to retrieve Package bodies. However, this requires the EXECUTE privilege. As an alternative, you can direct the harvester to query the dba_source table, which requires the SELECT_CATALOG_ROLE role. To do so, you need to:
  • If via Edge: Replace all_source by dba_source in the Other Queries field in your Edge capability.
  • If via the CLI lineage harvester: Replace all_source by dba_source in the file ./sql/oracle/queries.sql, which is included in the ZIP file when you download the lineage harvester.
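
As a rough aid for the read-access check above, the following sketch generates one inexpensive probe query per dictionary view. It only builds SQL text. Running the statements with your own Oracle client, under the account that the lineage harvester uses, is left to you. The function name and the probe query shape are assumptions for illustration and are not part of Collibra Data Lineage.

```python
# Dictionary views listed in the preflight checks above.
REQUIRED_VIEWS = [
    "all_tab_cols", "all_col_comments", "all_objects", "all_db_links",
    "all_mviews", "all_source", "all_synonyms", "all_views",
]


def build_probe_queries(views=REQUIRED_VIEWS, use_dba_source=False):
    """Return one cheap SELECT per view (hypothetical helper).

    A statement that raises an error when executed points to missing
    read access on that view.
    """
    queries = []
    for view in views:
        # Mirror the documented alternative: dba_source instead of all_source.
        if use_dba_source and view == "all_source":
            view = "dba_source"
        queries.append(f"SELECT 1 FROM {view} WHERE ROWNUM = 1")
    return queries


if __name__ == "__main__":
    for query in build_probe_queries():
        print(query + ";")
```

Pass use_dba_source=True if you plan to replace all_source by dba_source as described in the note.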

In your Collibra environment

  • Technical lineage via Edge is enabled in your Collibra environment.
  • You either created and installed an Edge site or were granted a Collibra Cloud site.

  • The Edge site status must be Healthy.
  • You've registered the data source via Edge.
  • Edge can connect to all Collibra Data Lineage service instances in your geographic location.

Collibra permissions

You can connect to Collibra Data Lineage by using the basic or OAuth authentication method. The following permissions are required only if you use the basic authentication method. 

To create the Amazon Web Services (AWS) connection and add the Edge capability:

To create a Technical Lineage Admin connection:

To synchronize technical lineage:

2 Copy the source ID and decommission the CLI lineage harvester

  1. Open your existing lineage harvester configuration file.
  2. Take note of the value of the id property for the data source you want to migrate. For example, id: "marketing_snowflake_prod". You'll need this when you add the Edge capability.
  3. Remove the entire section for that data source from the lineage harvester configuration file. For example:
    {
    	"dialect" : "oracle",
    	"id" : "informatica_source",
    	"type" : "ExternalDirectory",
    	"dirType" : "powercenter",
    	"path" : "/path/to/the/informatica/folder/",
    	"mask" : "*",
    	"recursive" : false,
    	"deleteRawMetadataAfterProcessing": false
    }
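
If you prefer to script the removal, the following is a minimal sketch. It assumes that the configuration file is valid JSON and that the data sources are stored in a top-level "sources" array. Verify this against your own file before you rely on it. The function names are hypothetical.

```python
import json


def pop_source(config, source_id):
    """Remove the source whose "id" equals source_id and return it.

    Assumption: data sources live in a top-level "sources" array.
    """
    sources = config.get("sources", [])
    for index, source in enumerate(sources):
        if source.get("id") == source_id:
            return sources.pop(index)
    raise KeyError(f"no source with id {source_id!r}")


def decommission(path, source_id):
    """Rewrite the configuration file at path without the given source."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    removed = pop_source(config, source_id)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return removed
```

The returned section contains the id value that you need later, so consider saving it. Back up the configuration file before you overwrite it.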

3 Prepare your Oracle SQL files for cloud storage

  1. Create your SQL files. Ensure that the following requirements are met for the SQL files:

    • The SQL files must be UTF-8 encoded.
    • The SQL files can't have white spaces in their names.
    • For better ingestion, include one SQL statement in one SQL file.
    • SQL files that contain Data Definition Language (DDL) statements must be processed before SQL files that contain Data Manipulation Language (DML) statements.
      For the data sources that are listed in Supported SQL statements, Collibra Data Lineage automatically detects DDL statements, regardless of the SQL file names.
      For other JDBC data sources, Collibra Data Lineage processes SQL files in alphanumeric order. To ensure that DDL statements are processed first, name SQL files that contain DDL statements so they are before files that contain DML statements.
    • The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Add a technical lineage capability to your Edge site and Automatic stitching for technical lineage.
    • For Collibra Data Lineage to correctly highlight the transformation logic in the Source code pane, we strongly recommend that your SQL files have Unix line endings. Non-Unix line endings, for example Carriage Return (CR) and Line Feed (LF) line breaks, do not influence the extracted lineage but can result in incorrect highlighting.

    For more information, go to Supported SQL syntax.

  2. Store the SQL files in your AWS S3 bucket.
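
Before you upload the files, you can check a few of the requirements above locally. The following sketch covers only the file name, encoding, and line ending checks. It does not check statement count or DDL ordering. The function names and the default "*.sql" pattern are assumptions for illustration.

```python
from pathlib import Path


def check_sql_file(path):
    """Return a list of problems found for one SQL file (empty if none)."""
    path = Path(path)
    problems = []
    # File names can't contain white space.
    if any(ch.isspace() for ch in path.name):
        problems.append("file name contains white space")
    data = path.read_bytes()
    # Files must be UTF-8 encoded.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # Unix line endings are recommended for correct highlighting.
    if b"\r" in data:
        problems.append("non-Unix line endings")
    return problems


def check_directory(directory, pattern="*.sql"):
    """Map each matching file name to its problems, skipping clean files."""
    report = {}
    for path in sorted(Path(directory).glob(pattern)):
        problems = check_sql_file(path)
        if problems:
            report[path.name] = problems
    return report
```

An empty dictionary from check_directory means that none of the three checks found a problem in the matching files.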

4 Create an AWS connection

If you use a vault to add your data source information to your Edge site connection, go to Oracle: Create an AWS connection for complete information.

  1. Open a site.
    1. On the main toolbar, click the Products icon, then click Settings (Cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the table, click the name of the site whose status is Healthy.
      The site page opens.
  2. In the Connections section, click Create connection.
    The Create connection page appears.
  3. Select the AWS connection to connect to Amazon S3.
  4. Enter the required information.
    Field Description Required Available for vaults?
    Name

    The name of the Edge or Collibra Cloud site AWS connection.

    Yes No
    Description

    The description of the connection.

    No No
    Vault The vault where you store your data source values. No No
    Authentication type

    The type of authentication you use. The possible values are IAM and EC2.

    Use the EC2 type if you want to connect to an AWS EC2 instance that is configured with role-based authentication. For more details, go to Prepare S3 for Edge.

    Yes No
    Access Key ID

    The access key ID of the programmatic AWS user.

    Yes for IAM authentication type. Yes
    Secret Access Key

    The secret access key of the programmatic AWS user.

    Yes for IAM authentication type. Yes
  5. Click Create.
    The connection is added to the Edge or Collibra Cloud site.
    The fields become read-only.

5 Add the Oracle capability for Cloud Storage connections

  1. Open a site.
    1. On the main toolbar, click the Products icon, then click Settings (Cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the table, click the name of the site whose status is Healthy.
      The site page opens.
  2. In the Capabilities section, click Add capability.
    The Add capability page appears.
  3. Select the Technical Lineage for SqlDirectory (Cloud) capability.
  4. Enter the required information.
    Field Description Required?

    Name

    The name of the capability.

    Yes

    Description

    The description of the capability.

    Yes

    Source ID

    Enter the source ID that you copied in step 2. For this use case: marketing_snowflake_prod.

    Yes

    TechLin Admin Connection (in preview)

    If you want to use the OAuth authentication type to connect to the Collibra Data Lineage service instances, you have to create a Technical Lineage Admin Edge or Collibra Cloud site connection and select the OAuth authentication type. Then, in this field, you specify the name of the Technical Lineage Admin Edge or Collibra Cloud site connection.

    No

    Cloud Connection

    The name of the AWS connection that you created in step 4.

    Yes

    Cloud Storage Bucket/Container

    The name of the bucket or container in the cloud-based storage system. Do not include the protocol or prefix. For example, don't include s3:// or gs://.

    Yes

    Cloud Storage Region

    The AWS S3 cloud storage region.

    No

    Azure Cloud Storage Account

    Not applicable for this use case.

    No

    Cloud Storage Path

    The path to the folder (in the container or bucket) that contains the files.

    No

    Mask

    The pattern of the file names in the directory. By default, the value is *.

    No

    Dialect

    The dialect of the database. In this use case: oracle

    Yes

    Collibra System Name

    Enter the name of your System asset in Data Catalog. For successful stitching, the names must exactly match.

    Important Be sure to read the following section The "useCollibraSystemName" trap.

    Yes

    Database

    The name of your database, which is also the name of your Database asset in Data Catalog.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching.

    Yes

    Schema

    The name of the default schema, if not specified in the data source itself. This corresponds to the name of your Schema asset.

    Note The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching.

    Yes

    Database Link Mapping

    Not applicable for this use case.

    No

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Platform for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Dependent On Sources

    Not applicable for this use case.

    No

    Database-System mapping

    This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

    No

    Delete Raw Metadata After Processing

    Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

    Select this option to indicate that the raw source metadata is deleted after processing.

    Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

    No

    Analyze Only (Deprecated)

    Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.
    • The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
    • The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

    No

    Processing Level

    Select Sync to immediately synchronize the lineage.

    Yes

    Active

    The option determines whether to include or remove the technical lineage of the data source.

    Select this option to include the technical lineage of this data source.

    Yes

  5. Click Create.
    The capability is added to the Edge or Collibra Cloud site.
    The fields become read-only.
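
The Cloud Storage Bucket/Container and Cloud Storage Path fields expect the bucket name and the folder path as separate values, without a protocol. If you only have a full storage URI, the following sketch shows one way to split it. The function name and the example bucket are hypothetical.

```python
from urllib.parse import urlparse


def split_storage_uri(uri):
    """Split "s3://bucket/some/path" into ("bucket", "some/path")."""
    parsed = urlparse(uri)
    if parsed.scheme:
        # With a protocol, the bucket is the network location part.
        bucket, path = parsed.netloc, parsed.path
    else:
        # No protocol: treat the first path segment as the bucket name.
        bucket, _, path = uri.partition("/")
    return bucket, path.strip("/")


if __name__ == "__main__":
    bucket, path = split_storage_uri("s3://example-lineage-bucket/sql/oracle/")
    print("Cloud Storage Bucket/Container:", bucket)
    print("Cloud Storage Path:", path)
```

The first value goes in the Cloud Storage Bucket/Container field and the second in the Cloud Storage Path field.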

6 Synchronize Oracle lineage for the Cloud Storage connection

You can synchronize your technical lineage manually or automatically by adding a synchronization schedule. In this use case, you'll synchronize manually.

  1. On the main toolbar, click Products icon Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Find the connection that you used when you added the technical lineage capability, and click the link in the Capabilities column. If multiple capabilities exist for the connection, expand them to find your capability.
    The capability configuration page opens.
  5. On the Synchronize Configuration tab pane, click Synchronize.
    A notification indicates that the synchronization job has started. The lineage is ingested based on the configuration that you provided.

7 Cleanup and deletion: Delete the technical lineage of a data source

If necessary, remove stale source IDs that cause "useCollibraSystemName" errors or persist as gray nodes in the lineage viewer. This is the most critical pain point for users.

The "useCollibraSystemName" trap

A common source of synchronization failure is a mismatch in the "Collibra system name" setting. This happens if the value of the useCollibraSystemName property in your lineage harvester configuration file does not match the corresponding Edge setting in Collibra Console.

Specifically:

  • In your lineage harvester configuration file: the useCollibraSystemName property must be set to true or false.
  • In Collibra Console: the Collibra system name setting must match the value of the useCollibraSystemName property in your lineage harvester configuration file, either true or false.
    For information on this Collibra Console setting, go to Enable technical lineage via Edge.

The useCollibraSystemName property in your lineage harvester configuration file and the Collibra system name setting in Collibra Console determine whether the System name is included in the full path - and therefore the full names - of assets. If the property and field values do not match, stitching cannot be achieved.

Important If you change the values either in your lineage harvester configuration file or in Collibra Console to avoid a mismatch, you must remove the old source ID to clear the cached configuration. Then synchronize as a fresh data source.
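
To see why a mismatch prevents stitching, consider the following simplified illustration. It is not Collibra's actual naming logic. The ">" separator, the function name, and the asset names are assumptions chosen only to show that including or omitting the System segment produces different full names.

```python
def asset_full_name(system, database, schema, table, use_system_name, sep=">"):
    """Build an illustrative full name, with or without the System segment."""
    parts = [database, schema, table]
    if use_system_name:
        parts.insert(0, system)
    return sep.join(parts)


# One side includes the System name, the other side does not.
catalog_name = asset_full_name("oracle-prod", "SALES", "HR", "EMPLOYEES", True)
lineage_name = asset_full_name("oracle-prod", "SALES", "HR", "EMPLOYEES", False)

# Stitching relies on matching full names, so this mismatch would block it.
can_stitch = catalog_name == lineage_name

if __name__ == "__main__":
    print(catalog_name)
    print(lineage_name)
    print("can stitch:", can_stitch)
```

When both sides use the same setting, the two names are identical and the comparison succeeds.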

Clear the Active checkbox

Ensure that the Active checkbox in the relevant technical lineage Edge capability is cleared.

  1. Open an Edge or Collibra Cloud site.
    1. On the main toolbar, click the Products icon, then click Settings (Cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the Edge or Collibra Cloud site overview, click the name of the Edge or Collibra Cloud site to which you added the technical lineage capability for the data source.
      The Edge or Collibra Cloud site page appears.
  2. In the Capabilities section, locate and click the technical lineage capability for the data source you want to delete.
  3. Clear the Active checkbox.
  4. Click Save.
    The capability is updated.

Create a Technical Lineage Admin connection

  1. Open a site.
    1. On the main toolbar, click the Products icon, then click Settings (Cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the site overview, click the name of a site.
      The site page appears.
  2. In the Connections section, click Create Connection.

    The Create Connection dialog box appears.
  3. Select the Technical Lineage Admin connection.
  4. Enter the connection information.

    Field Description Required
    Name

    A name for the Edge connection.

    Yes
    Description

    A description of the connection.

    No

    Authentication Type

    The authentication method you use to connect to Collibra Data Lineage:

    • Basic Authentication
      If you choose this method, ignore the rest of the fields.
    • OAuth
      If you choose this method, you must use the following fields to provide a client ID and client secret. This authentication method is recommended for enhanced security.
      Important OAuth authentication is not yet available for Collibra Platform for Government customers.
    Yes

    Client ID

    Your client ID for OAuth authentication.

    Yes

    Client Secret

    Your client secret for OAuth authentication.

    Yes

  5. Click Create.

Add a Technical Lineage Admin capability to your Edge or Collibra Cloud site

  1. Open a site.
    1. On the main toolbar, click the Products icon, then click Settings (Cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the table, click the name of the site whose status is Healthy.
      The site page opens.
  2. Select the relevant capability template: Technical Lineage Admin.
  3. Enter the required information.
    Field Description Required?

    Name

    The name of the capability.

    Yes

    Description

    The description of the capability.

    Yes

    Admin connection

    The name of the Edge connection you created in the previous step.

    Yes

    Property

    This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.

    You can use this field to set the HTTP timeout duration by adding the httpTimeout property: 

    Warning If you are a Collibra Platform for Government customer, this field is required to connect to a Collibra Data Lineage service instance:

    Yes for US government customers.

    Debug

    Select False.

    No

    Log level

    Leave this set to No logging.

    No

  4. Click Create.
    The capability is added to the Edge or Collibra Cloud site.
    The fields become read-only.

Run the "Ignore sources" option in Data Catalog

Important 

To use the Ignore sources option:

  1. Your metadata must be refreshed. You can either wait for the next scheduled synchronization to run, or you can edit the Integration configuration data refresh schedule setting in Collibra Console so that the refresh is done sooner.
  2. If you edit the Integration configuration data refresh schedule setting, you must restart Collibra.

If you don't refresh your metadata (and restart Collibra, if necessary), an error is shown on the Integration Configuration tab.

For each source that you want to ignore, ensure that the Active checkbox in the respective technical lineage Edge capability is cleared.

  1. On the main toolbar, click Products icon Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Find the connection that you used when you added the technical lineage capability, and click the link in the Capabilities column. If multiple capabilities exist for the connection, expand them to find your capability.
    The capability configuration page opens.
  5. In the Synchronize Configuration section, click Edit Configuration.
  6. In the Admin command drop-down list, select Ignore sources.
  7. In the Sources drop-down list, select the source or sources you want excluded from the technical lineage.
  8. Click Save.
  9. In the Synchronize Configuration section, click Synchronize.
    When synchronization is complete, the technical lineage of the data source is deleted.

View the synchronization results for the "Ignore sources" job

  1. Open the Activities list.
  2. In the row containing the job, click Result.
    The Synchronization Results dialog box appears.

Run the "Sync" option in Data Catalog, for any other data source

This step ensures that the technical lineage is regenerated. You don't have to create a new Edge connection or add another capability. For the data source that you sync, ensure that:

  • The Active option is selected in the Edge capability.
  • The Processing Level setting is set to Sync.

After a successful synchronization, the deleted data source is removed from the technical lineage graph.