Lineage harvester change log

Collibra Data Lineage is updated and improved on a regular basis. On this page, you can see the most important changes between different versions of the lineage harvester. For a complete list, see the release notes.

Note In the documentation, we assume that you have the most recent version of the lineage harvester. We highly recommend that you download and use the newest lineage harvester from the Collibra downloads page, even if you are on an older version of Collibra Data Intelligence Cloud.

Warning If you upgrade to lineage harvester 1.3.0 or newer, you have to follow an upgrade procedure.

The following list contains the most important changes to the lineage harvester and its configuration file.

Changed in version

New lineage harvester functionality

2022.03
  • By default, the lineage harvester no longer harvests images. If you want to include images, include the optional excludeImages property in your configuration file and set the value to false.
  • When ingesting Tableau metadata, you can now leave the collibraSystemName property in your configuration file empty, even if the useCollibraSystemName property is set to true.
  • The lineage harvester now correctly shows the help overview when you run the --help command.
  • The Hive source now skips harvesting the DDL of exclusively locked tables.
  • When you change the domain reference ID in the lineage harvester configuration file, Tableau assets are now successfully deleted from the previous domain and recreated in the new domain.
  • You no longer see a Fiber Failed error while running the lineage harvester.
  • Protobuf is upgraded to version 3.19.3.
  • Fixed an issue that was causing incomplete technical lineage and stitching issues when using custom SQL in Tableau.
  • Fixed an issue that resulted in a TableauHarvesterError when ingesting Tableau metadata via the lineage harvester.
  • Fixed a NullPointerException when no column data type is harvested.
  • Fixed an issue that was causing the ingestion of Looker metadata to fail.
  • Fixed an issue that was causing a JsonParseError when ingesting Tableau metadata.
2022.02
1.4.4

The lineage harvester now supports:

  • Technical lineage for Matillion. Redshift and Snowflake projects in Matillion are supported.
  • Snowflake syntax for the CONNECT BY clause.
1.4.3

The lineage harvester log output now includes Collibra Data Lineage server processing information.

1.4.2

Collibra Data Lineage has improved Teradata parsing.

1.4.1

The lineage harvester for IBM DataStage now supports environment files.

1.4.1

You can now add connection information to the Informatica Intelligent Cloud Services <source ID> configuration file.

1.4.1

You can now request MicroStrategy as a lineage harvester integration in beta.

1.4.0

You can now request the following lineage harvester integrations in beta:

  • AWS Glue script annotations
  • Matillion
  • Power BI Report Server
  • SQL Server Reporting Services

1.4.0

The lineage harvester logs now show message codes to inform you of an issue.

1.4.0

The user that runs the lineage harvester no longer needs elevated permissions to access Snowflake metadata.

You need a role that can access the Snowflake shared read-only database.

To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.
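
As a rough illustration of the grant described above, the following sketch renders the statement an account administrator might run. It assumes the shared database has Snowflake's default name, SNOWFLAKE, and that the privilege is granted to a role held by the user that runs the lineage harvester, since Snowflake grants imported privileges to roles. The helper name grant_imported_privileges and the role name LINEAGE_HARVESTER_ROLE are hypothetical.

```python
import re

# Accept only plain, unquoted identifiers so the rendered SQL stays safe to paste.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def grant_imported_privileges(role: str) -> str:
    """Render the GRANT statement for the shared SNOWFLAKE database.

    Assumption: the shared database is named SNOWFLAKE and the harvester
    user holds the given role. The role name is a placeholder.
    """
    if not _IDENTIFIER.fullmatch(role):
        raise ValueError(f"not a plain Snowflake identifier: {role!r}")
    return f"GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE {role};"


# Hypothetical role name, for illustration only.
print(grant_imported_privileges("LINEAGE_HARVESTER_ROLE"))
```

The statement itself has to be run in Snowflake by an account administrator; the sketch only builds the text.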

1.3.5

The lineage harvester configuration file and Power BI harvester configuration files now have a useCollibraSystemName property. You use this property to enable the harvesters to process the value in collibraSystemName properties and map the structure of the data source to system > database > schema > table > column, which you can see in the technical lineage Browse tab pane.

By default, this property is set to false.

1.3.5

You can now create a separate configuration file for each data source to define the collibraSystemName property.

1.3.5

You can now use the customConnectionProperties field for Microsoft SQL Server JDBC sources.

1.3.4

HiveQL sources no longer have connection type JDBC. You can only create a technical lineage for HiveQL sources via a folder.

Tip If you previously had a database section in the configuration file with a HiveQL source, change the section to match the properties of a directory section before you run lineage harvester 1.3.4 or newer.

1.3.4

You can now create a technical lineage for Informatica Intelligent Cloud Services, specifically for the Cloud Data Integration service.

You add the connection information to the lineage harvester configuration file.

1.3.4

You can now also add other data source connectors to the connection definition file for DataStage.

1.3.4

The lineage harvester configuration file for HiveQL, Spark SQL, PostgreSQL, Redshift and Snowflake data sources can now have a customConnectionProperties property to provide specific connection properties.

1.3.3

You can now use connection definitions to create a technical lineage for SQL Server Integration Services.

1.3.3

You can add multiple Google BigQuery projects in the configuration file in the "projectIDs" property. The "projectName" property is now deprecated.
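
A minimal sketch of how an existing source entry could be moved from the deprecated "projectName" property to "projectIDs". Only those two property names come from the change above. The helper name migrate_bigquery_source, the "id" key and the sample values are hypothetical, and the sketch assumes the old "projectName" value can be reused as a project ID.

```python
import copy


def migrate_bigquery_source(source: dict) -> dict:
    """Return a copy of a source entry that uses projectIDs, not projectName.

    Assumption: the value stored in the deprecated "projectName" property
    is usable as an entry in the "projectIDs" list.
    """
    migrated = copy.deepcopy(source)
    project_ids = list(migrated.get("projectIDs", []))
    legacy = migrated.pop("projectName", None)
    # Keep the legacy project first, and avoid listing it twice.
    if legacy and legacy not in project_ids:
        project_ids.insert(0, legacy)
    if project_ids:
        migrated["projectIDs"] = project_ids
    return migrated


# Hypothetical source entry, for illustration only.
old_source = {"id": "my-bigquery-source", "projectName": "analytics-prod"}
new_source = migrate_bigquery_source(old_source)
print(new_source)
```

The function returns a new dictionary and leaves the input untouched, so the original entry can still be compared against the result before the configuration file is saved.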

1.3.2

Collibra Data Lineage can now process transformation logic for IBM DataStage.

1.3.2

The Collibra Data Lineage server now has an IP address for a server located in Canada, for Google Cloud users: 35.197.182.41.

1.3.1

The Collibra Data Lineage server now has an IP address for a server located in Canada, for AWS users: 15.222.200.199.

1.3.1

You can now create a technical lineage for MySQL data sources.

1.3.0

The lineage harvester now gives you the option to provide your passwords via stdin or a password manager.

1.3.0

The lineage harvester now supports IBM InfoSphere DataStage.

1.3.0

The lineage harvester now supports Looker integration.

1.3.0

The lineage harvester now connects to one of the servers with the following IP addresses:

  • 18.198.89.106 (techlin-aws-eu)
  • 54.242.194.190 (techlin-aws-us)
  • 35.205.146.124 (techlin-gcp-eu)
  • 34.73.33.120 (techlin-gcp-us)

Note The lineage harvester connects to different servers based on your geographical location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources and you have to restart your DGC service.
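
When you review firewall rules or outbound connection logs, it can help to map an address back to one of the servers listed for version 1.3.0. The following is a small sketch that does this with Python's standard ipaddress module. The table holds only the four addresses listed above, and the function name server_for_ip is hypothetical.

```python
import ipaddress
from typing import Optional

# Server names and addresses as listed for lineage harvester 1.3.0.
LINEAGE_SERVER_IPS = {
    "techlin-aws-eu": "18.198.89.106",
    "techlin-aws-us": "54.242.194.190",
    "techlin-gcp-eu": "35.205.146.124",
    "techlin-gcp-us": "34.73.33.120",
}


def server_for_ip(address: str) -> Optional[str]:
    """Return the server name for a listed IP address, or None if unlisted."""
    target = ipaddress.ip_address(address)  # raises ValueError on malformed input
    for name, ip in LINEAGE_SERVER_IPS.items():
        if ipaddress.ip_address(ip) == target:
            return name
    return None


print(server_for_ip("35.205.146.124"))
print(server_for_ip("10.0.0.1"))
```

Parsing both sides with ipaddress.ip_address compares addresses by value rather than by string, so malformed input is rejected early.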

1.2.1

You can now use the lineage harvester to import new Power BI assets, relations and a technical lineage into Data Catalog.

1.2.0

The general section of the configuration file shows the following:

  • A catalog section: This part contains the connection details needed to connect to Data Catalog.

You no longer need an API key to connect to Collibra cloud. This part of the configuration file is optional and not shown when you create the file via the lineage harvester. You can no longer use it in lineage harvester 1.3.0.

{
 "general": {
  "catalog": {
   "url": ""
  }
 }
}

1.2.0

You can now create a technical lineage for Netezza and Sybase ASE data sources.

1.2.0

Collibra Data Lineage now supports SSIS transformations.

1.1.7

You can now create a technical lineage for SQL Server Integration Services (SSIS).

1.1.7

You can now create a custom technical lineage using a JSON file.

1.1.3

You need to provide specific information necessary to connect to Collibra cloud in the techlin section of the configuration file.

{
	"general": {
		"techlin": {
			"userKey": "my-userkey"
		}
	}
}

1.1.3

The extractQueries field is now removed from the configuration file. The queries of your database are downloaded automatically.

1.1.3

You can now create a technical lineage for Informatica PowerCenter.

1.1.1

You can now create a technical lineage for the following data sources:

  • Amazon Redshift
  • Azure SQL Server
  • Google BigQuery
  • HiveQL
  • IBM DB2
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • SAP HANA
  • Snowflake
  • Spark SQL
  • Teradata