Lineage harvester change log

Collibra Data Lineage is updated and improved on a regular basis. On this page, you can see the most important changes between different versions of the lineage harvester. For a complete list, see the release notes.

Note In the documentation, we assume that you have the most recent version of the lineage harvester. We highly recommend that you download and use the newest lineage harvester from the Collibra downloads page, even if you are on an older version of Collibra Data Intelligence Cloud.

Warning If you upgrade to lineage harvester 1.3.0 or newer, you have to follow an upgrade procedure.
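To decide whether the warning above applies to a planned upgrade, you can compare version numbers numerically rather than as text. The following is a minimal sketch. It assumes that the upgrade procedure only matters when you move from a version older than 1.3.0 to 1.3.0 or newer, which this page does not state explicitly. The function names are hypothetical and not part of the lineage harvester.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.4.4" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade_procedure(current, target, threshold="1.3.0"):
    """Return True when going from `current` to `target` crosses `threshold`.

    Assumption: versions that are already at or above the threshold
    do not need to repeat the upgrade procedure.
    """
    return parse_version(current) < parse_version(threshold) <= parse_version(target)


if __name__ == "__main__":
    print(needs_upgrade_procedure("1.2.1", "1.4.4"))
    print(needs_upgrade_procedure("1.3.2", "1.4.4"))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "1.10.0" would sort before "1.9.0".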

The following list contains the most important changes to the lineage harvester and its configuration file.

Changed in version

New lineage harvester functionality

Old lineage harvester functionality

1.4.4

The lineage harvester now supports:

  • Technical lineage for Matillion. Redshift and Snowflake projects in Matillion are supported.
  • Snowflake syntax for the CONNECT BY clause.

The lineage harvester did not support:

  • Technical lineage for Matillion.
  • Snowflake syntax for the CONNECT BY clause.

1.4.3

The lineage harvester log output now includes Collibra Data Lineage server processing information.

Collibra Data Lineage server processing information was not included in the log output.

1.4.2

Collibra Data Lineage has improved Teradata parsing.

Collibra Data Lineage can successfully process Teradata SQL statements, but there might be some parsing errors.

1.4.1

The lineage harvester for IBM DataStage now supports environment files.

The lineage harvester for IBM DataStage only supports DataStage project files.

1.4.1

You can now add connection information to the Informatica Intelligent Cloud Services <source ID> configuration file.

You cannot add additional connection information to the Informatica Intelligent Cloud Services <source ID> configuration file.

1.4.1

You can now request MicroStrategy as a lineage harvester integration in preview mode.

Collibra Data Lineage does not support MicroStrategy.

1.4.0

You can now request the following lineage harvester integrations in preview mode:

  • AWS Glue script annotations
  • Matillion
  • Power BI Report Server
  • SQL Server Reporting Services

The lineage harvester integrations are not available.

1.4.0

The lineage harvester logs now show message codes to inform you of an issue.

The lineage harvester logs do not show message codes.

1.4.0

The user that runs the lineage harvester no longer needs elevated permissions to access Snowflake metadata.

You need a role that can access the Snowflake shared read-only database.

To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.

The user that runs the lineage harvester needs elevated permissions to access Snowflake metadata.

You need read access on information_schema.

To access information_schema, the Snowflake database requires you to have the admin user permission. If you do not have this permission, not all view definitions can be collected and processed.

1.3.5

The lineage harvester configuration file and Power BI harvester configuration files now have a useCollibraSystemName property. You use this property to enable the harvesters to process the value in collibraSystemName properties and map the structure of the data source to system > database > schema > table > column, which you can see in the technical lineage Browse tab pane.

By default, this property is set to False.

The useCollibraSystemName property cannot be used. You can add the system or server name of a database to the lineage harvester configuration file in the collibraSystemName property, but it won't be taken into account while processing the data source.
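The effect of useCollibraSystemName on the mapped structure can be illustrated with a small sketch. This is not the harvester's implementation. It only models the behavior described above, assuming that the structure starts at the database level when the property is false and that a missing collibraSystemName value is simply skipped. The function name and the dictionary keys are hypothetical.

```python
def lineage_path(column_ref, collibra_system_name=None, use_collibra_system_name=False):
    """Build a display path for a column, optionally prefixed with the system name.

    `column_ref` is a hypothetical dict with "database", "schema",
    "table" and "column" keys.
    """
    parts = [
        column_ref["database"],
        column_ref["schema"],
        column_ref["table"],
        column_ref["column"],
    ]
    # The system level is only added when the property is enabled
    # and a system name was actually provided (assumption).
    if use_collibra_system_name and collibra_system_name:
        parts.insert(0, collibra_system_name)
    return " > ".join(parts)


if __name__ == "__main__":
    ref = {"database": "sales", "schema": "public", "table": "orders", "column": "id"}
    print(lineage_path(ref, "warehouse-01", use_collibra_system_name=True))
    print(lineage_path(ref, "warehouse-01"))
```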

1.3.5

You can now create a separate configuration file for each data source to define the collibraSystemName property.

Separate configuration files are not supported.

1.3.5

You can now use the customConnectionProperties field for Microsoft SQL Server JDBC sources.

You cannot use the customConnectionProperties field for Microsoft SQL Server JDBC sources.

1.3.4

HiveQL sources no longer have connection type JDBC. You can only create a technical lineage for HiveQL sources via folder.

Tip If your configuration file has a database section with a HiveQL source, change that section to match the properties of a directory section before you run lineage harvester 1.3.4 or newer.

HiveQL sources have connection type JDBC and folder.

1.3.4

You can now create a technical lineage for Informatica Intelligent Cloud Services, specifically for the Cloud Data Integration service.

You add the connection information to the lineage harvester configuration file.

You cannot create a technical lineage for Informatica Intelligent Cloud Services.

1.3.4

You can now also add other data source connectors to the connection definition file for DataStage.

You can only create a connection definition file for ODBC data sources in DataStage.

1.3.4

The lineage harvester configuration file for HiveQL, Spark SQL, PostgreSQL, Redshift and Snowflake data sources can now have a customConnectionProperties property to provide specific connection properties.

The lineage harvester cannot manage custom connection properties.

1.3.3

You can now use connection definitions to create a technical lineage for SQL Server Integration Services.

You cannot use connection definitions to create a technical lineage for SQL Server Integration Services.

1.3.3

You can add multiple Google BigQuery projects in the configuration file in the "projectIDs" property. The "projectName" property is now deprecated.

You can add only one Google BigQuery project in the configuration file in the "projectName" property.
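If you maintain older configuration files, the move from "projectName" to "projectIDs" can be scripted. The sketch below works on a single source entry that has already been loaded as a Python dictionary. It assumes that "projectIDs" holds a list of strings and that the old "projectName" value can be carried over as the first entry. Neither detail is confirmed on this page, and the function name and the "id" key in the example are hypothetical.

```python
def migrate_bigquery_source(source):
    """Return a copy of `source` that uses "projectIDs" instead of "projectName"."""
    migrated = dict(source)
    if "projectName" in migrated:
        name = migrated.pop("projectName")
        # Keep any IDs that are already present and avoid duplicates.
        ids = list(migrated.get("projectIDs", []))
        if name and name not in ids:
            ids.insert(0, name)
        migrated["projectIDs"] = ids
    return migrated


if __name__ == "__main__":
    old_source = {"id": "my-bigquery", "projectName": "analytics-prod"}
    print(migrate_bigquery_source(old_source))
```

The function returns a new dictionary and leaves the input untouched, so you can compare both versions before you write anything back to disk.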

1.3.2

Collibra Data Lineage can now process transformation logic for IBM DataStage.

Collibra Data Lineage does not support transformation logic for IBM DataStage.

1.3.2

The Collibra Data Lineage server now has an IP address for a server located in Australia, for Google Cloud users: 35.197.182.41.

The Collibra Data Lineage server does not have a location in Australia.

1.3.1

The Collibra Data Lineage server now has an IP address for a server located in Canada, for AWS users: 15.222.200.199.

The Collibra Data Lineage server does not have a location in Canada.

1.3.1

You can now create a technical lineage for MySQL data sources.

You cannot create a technical lineage for MySQL data sources.

1.3.0

The lineage harvester now gives you the option to provide your passwords via stdin or a password manager.

The lineage harvester encrypts and stores your passwords in the lineage harvester folder.

1.3.0

The lineage harvester now supports IBM InfoSphere DataStage.

The lineage harvester does not support IBM InfoSphere DataStage.

1.3.0

The lineage harvester now supports Looker integration.

The lineage harvester does not support Looker integration.

1.3.0

The lineage harvester now connects to one of the servers with the following IP addresses:

  • 18.198.89.106 (techlin-aws-eu)
  • 54.242.194.190 (techlin-aws-us)
  • 35.205.146.124 (techlin-gcp-eu)
  • 34.73.33.120 (techlin-gcp-us)

Note The lineage harvester connects to different servers based on your geographical location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources and you have to restart your DGC service.

The lineage harvester only connects to the Collibra Data Lineage server with IP address 3.125.57.74.
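If you manage outbound firewall rules, it can help to keep the four server addresses listed for version 1.3.0 in one place. The following sketch stores them in a lookup table keyed by the server names shown above. The helper functions are hypothetical, and the addresses added in later versions are left out because this page does not give their server names.

```python
import ipaddress

# Server names and addresses as listed for lineage harvester 1.3.0.
LINEAGE_SERVERS = {
    "techlin-aws-eu": "18.198.89.106",
    "techlin-aws-us": "54.242.194.190",
    "techlin-gcp-eu": "35.205.146.124",
    "techlin-gcp-us": "34.73.33.120",
}


def server_ip(cloud, region):
    """Look up the address for a cloud provider ("aws" or "gcp") and a region ("eu" or "us")."""
    name = f"techlin-{cloud.lower()}-{region.lower()}"
    try:
        address = LINEAGE_SERVERS[name]
    except KeyError:
        raise ValueError(f"no known lineage server for {name!r}") from None
    # Parse the literal so that a typo in the table fails loudly.
    return str(ipaddress.ip_address(address))


def firewall_allowlist():
    """Return every known server address, sorted as text, for an outbound allowlist."""
    return sorted(LINEAGE_SERVERS.values())


if __name__ == "__main__":
    print(server_ip("AWS", "eu"))
    print(firewall_allowlist())
```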

1.2.1

You can now use the lineage harvester to import new Power BI assets, relations and a technical lineage into Data Catalog.

The lineage harvester does not support Power BI.

1.2.0

The general section of the configuration file shows the following:

  • A catalog section: This part contains the connection details needed to connect to Data Catalog.

You no longer need an API key to connect to Collibra cloud. The techlin section that holds the API key is optional and is not shown when you create the file via the lineage harvester. You can no longer use it in lineage harvester 1.3.0.

{
 "general": {
  "catalog": {
   "url": ""
  }
 },

The general section of the configuration file shows the following:

  • A techlin section: You need an API key to connect to Collibra cloud. This part of the configuration file is mandatory.
  • A collibra section: This part contains the connection details needed to connect to Data Catalog.

{
 "general": {
  "techlin": {
   "userKey": ""
  },
  "collibra": {
   "url": ""
  }
 },
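The change to the general section in version 1.2.0 can be applied to an existing configuration with a short script. The sketch below assumes that the old collibra section maps directly onto the new catalog section, since both are described as holding the Data Catalog connection details, and that the techlin section can be dropped because the API key is no longer needed. The function name and the example URL are hypothetical.

```python
import copy


def migrate_general_section(config):
    """Return a copy of `config` with the general section in the newer layout."""
    migrated = copy.deepcopy(config)
    general = migrated.get("general", {})
    # The API key section is no longer needed.
    general.pop("techlin", None)
    # Assumption: the old "collibra" section becomes the "catalog" section.
    if "collibra" in general:
        general["catalog"] = general.pop("collibra")
    return migrated


if __name__ == "__main__":
    old_config = {
        "general": {
            "techlin": {"userKey": "my-userkey"},
            "collibra": {"url": "https://example.invalid"},
        }
    }
    print(migrate_general_section(old_config))
```

Working on a deep copy keeps the original configuration available in case you need to fall back to the older layout.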

1.2.0

You can now create a technical lineage for Netezza and Sybase ASE data sources.

You cannot create a technical lineage for Netezza or Sybase ASE data sources.

1.2.0

Collibra Data Lineage now supports SSIS transformations.

Collibra Data Lineage does not support SSIS transformations.

1.1.7

You can now create a technical lineage for SQL Server Integration Services (SSIS).

You cannot create a technical lineage for SQL Server Integration Services (SSIS).

1.1.7

You can now create a custom technical lineage using a JSON file.

You cannot create a custom technical lineage using a JSON file.

1.1.3

You need to provide specific information necessary to connect to Collibra cloud in the techlin section of the configuration file.

{
	"general": {
		"techlin": {
			"userKey": "my-userkey"},

You need to provide specific information necessary to connect to Collibra cloud in the sqldep section of the configuration file.

{
	"general": {
		"sqldep": {
			"userKey": "my-userkey"},

1.1.3

The extractQueries field is now removed from the configuration file. The queries of your database are downloaded automatically.

You use the extractQueries field to indicate whether or not you want to download the queries of your database.

1.1.3

You can now create a technical lineage for Informatica PowerCenter.

You cannot create a technical lineage for Informatica PowerCenter.

1.1.1

You can now create a technical lineage for the following data sources:

  • Amazon Redshift
  • Azure SQL server
  • Google BigQuery
  • HiveQL
  • IBM DB2
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • SAP HANA
  • Snowflake
  • Spark SQL
  • Teradata

You can create a technical lineage for the following data sources:

  • Amazon Redshift
  • Azure SQL server
  • Google BigQuery
  • HiveQL
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • Snowflake
  • Spark SQL
  • Teradata