The lineage harvester change log

Collibra Data Lineage is updated and improved on a regular basis. On this page, you can see the most important changes between different versions of the lineage harvester. For a complete list, see the release notes.

Note In the documentation, we assume that you have the most recent version of the lineage harvester. We highly recommend that you download and use the newest lineage harvester from the Collibra downloads page, even if you are on an older version of Collibra Data Intelligence Cloud.

Warning If you upgrade to lineage harvester 1.3.0 or newer, you have to follow an upgrade procedure.

The following list contains the most important changes to the lineage harvester and its configuration file.

Changed in version

New lineage harvester improvements

2022.08
  • Previously, when you created a technical lineage for a supported BI tool, the nodes in the technical lineage graph had a gray background, even if the data objects from your data source were stitched to assets in Data Catalog. Data objects now have the intended yellow background when creating a technical lineage for Power BI. This enhancement was introduced for Tableau and Looker in Collibra 2022.07. Soon, the enhancement will also apply to SSRS and PBRS.
  • When synchronizing Tableau, the synchronization no longer fails if two data sources in the same project with the same name are returned from the Tableau API. The assets of both data sources are now synchronized in Collibra.
  • You can now filter on the Tableau project level.
  • When integrating Power BI, you can now ingest measures and show them in the technical lineage. Measures are included as the value in the Role in Report attribute on Power BI Column asset pages.
  • When attempting to integrate Power BI with invalid Power BI credentials, the lineage harvester log file now provides a more helpful error message.
  • When you specify the Power BI workspaces for ingestion, the filters are no longer case-sensitive.
  • When integrating Looker, the ownership information (email address only) for folders, Looks and dashboards is now ingested in Collibra. The new Owner in source attribute is included on Looker Folder, Looker Look and Looker Dashboard asset pages.
  • The lineage harvester log file now identifies whether you are using Tableau Online or Tableau Server, and the version of your Tableau environment.
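
The case-insensitive workspace filtering mentioned in the 2022.08 list can be pictured with a short Python sketch. It is not the harvester's implementation: the function name matches_workspace_filter and the idea that a filter is a plain list of exact workspace names are assumptions made for illustration only.

```python
def matches_workspace_filter(workspace_name: str, filters: list[str]) -> bool:
    """Return True if workspace_name equals any filter entry, ignoring case.

    Hypothetical helper: the real filter format is defined by the
    lineage harvester configuration file, not by this sketch.
    """
    # casefold() is the string method intended for caseless comparison.
    wanted = {entry.casefold() for entry in filters}
    return workspace_name.casefold() in wanted
```

With this sketch, a filter entry written as "sales reports" would also select a workspace named "Sales Reports".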
2022.07
  • The lineage harvester now retries getting a batch status if the first HTTP call failed due to a network error.
  • Fixed an issue that was causing custom SQL queries to be identified as belonging to two different Tableau data sources. This resulted in a "Unique constraint failed" error.
  • Fixed an issue that was resulting in the No asset matches the specified criteria error.
  • When the lineage harvester fetches an access key for a data store, only active records are now fetched. Inactive records are ignored.
  • The lineage harvester is more resilient against authorization expiration when ingesting Looker metadata.
  • The lineage harvester log file now includes the following information:
    • Your Tableau environment type: Tableau Online or Tableau Server
    • The version of your Tableau environment
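
The retry behavior for the batch status call in the 2022.07 list follows a common pattern, sketched below in Python. This is only an illustration under stated assumptions: the name call_with_retry, the use of ConnectionError to represent a network error, and the default of two attempts are all hypothetical and do not describe the harvester's internals.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fetch: Callable[[], T], attempts: int = 2,
                    delay_seconds: float = 0.0) -> T:
    """Call fetch(); if it raises ConnectionError, try again.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise  # no attempts left: surface the network error
            time.sleep(delay_seconds)  # optional pause before retrying
    raise AssertionError("unreachable")
```

Only network-style failures are retried in this sketch; any other exception is passed straight to the caller.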
2022.06
  • When synchronizing Power BI, the last sync time is now correctly shown in the Sources tab page.
  • Fixed an issue that was causing the processing of harvested metadata batches to run without coming to completion.
  • When ingesting Power BI, if there are Oracle data sources, the Oracle service name is now used, instead of the database name.
  • When processing Tableau metadata, the Collibra Data Lineage servers no longer replace ">>" by "<}", which was resulting in parsing errors.
  • Fixed an [SQLITE_ERROR] issue that was breaking the technical lineage when attempting to synchronize a data source.
  • When processing Power BI metadata, SQL statements are now in upper case.
  • When creating a technical lineage for Tableau, any unnecessary brackets "][" in the names of schemas are now removed.
  • When integrating Power BI, you can now ingest measures without DAX. They are shown as attribute type Role in Report on Power BI Column asset pages.
2022.05

Warning The lineage harvester 2022.05 includes an internal format change to the password manager pwd.conf file. This means that if you use the lineage harvester 2022.05, you can no longer use the pwd.conf file with an older harvester.

  • You can now integrate Power BI in Data Catalog via the lineage harvester, meaning you no longer need to use the Power BI harvester. Additional benefits include the following:
    • Support for Power BI Data Flows.
    • Descriptions of Power BI Reports.
    • Statuses of Power BI Workspaces.
    • Filtering and domain mapping.
    Note The new Power BI integration method is specifically for new integrations. For those who have been ingesting Power BI via the Power BI harvester, we will soon release a migration script.
  • Collibra Data Lineage now also supports the following BI integrations:
    • MicroStrategy
    • SQL Server Reporting Services and Power BI Report Server.
  • You can now use token-based authentication when creating a technical lineage for Matillion.
    Warning This enhancement is not backwards compatible. You must update your configuration file. If you use the lineage harvester 2022.05, you can no longer use the pwd.conf file with an older harvester.
  • The useCollibraSystemName property is now solely used for the configuration of the system name.
  • If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but don't define the system name in the Tableau <source ID> configuration file, the Tableau technical lineage shows DEFAULT as the system name.
  • If using a Tableau <source ID> configuration file:
    • You can now use wildcards throughout the file.
    • The hostName and connectorUrl properties are no longer case-sensitive.
  • The PostgreSQL JDBC driver is now upgraded from 42.3.2 to 42.3.3.
  • The Apache Hive JDBC driver is now upgraded from 2.6.17.1020 to 2.6.19.2022.
  • The lineage harvester no longer hangs when harvesting metadata from certain data sources.
  • The lineage harvester automatically refreshes Tableau tokens.
  • You can now use the optional concurrencyLevel property in the lineage harvester configuration file to specify the internal sizing, meaning the number of tasks that can be executed at the same time.
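
The idea behind concurrencyLevel, a cap on how many tasks run at the same time, can be illustrated with a small Python sketch. It is a generic illustration of bounded concurrency, not the harvester's own scheduling code; run_tasks and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, concurrency_level: int):
    """Run zero-argument callables, at most concurrency_level at a time.

    Results are returned in the same order as the tasks were given.
    """
    # max_workers bounds how many tasks execute simultaneously.
    with ThreadPoolExecutor(max_workers=concurrency_level) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

A higher value lets more tasks overlap, at the cost of more load on the machine and on the data sources being harvested.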
2022.04
  • You can now use the databaseMapping property in your Tableau <source ID> configuration file, to map a Tableau technical database name to the real database name.
  • When providing connection definitions for Informatica PowerCenter, the dbname property is no longer case-sensitive.
  • When integrating Informatica PowerCenter data sources, Collibra Data Lineage now correctly creates a technical lineage when useCollibraSystemName is set to true.
2022.03
  • By default, the lineage harvester no longer harvests images. If you want to include images, include the optional excludeImages property in your configuration file and set the value to false.
  • When ingesting Tableau metadata, you can now leave the collibraSystemName property empty in your configuration file, even if the useCollibraSystemName property is set to true.
  • The lineage harvester now correctly shows the help overview when you run the --help command.
  • The Hive source now skips harvesting the DDL of exclusively locked tables.
  • When you change the domain reference ID in the lineage harvester configuration file, Tableau assets are now successfully deleted from the previous domain and recreated in the new domain.
  • You no longer see a Fiber Failed error while running the lineage harvester.
  • Protobuf is upgraded to version 3.19.3.
  • Fixed an issue that was causing incomplete technical lineage and stitching issues when using custom SQL in Tableau.
  • Fixed an issue that resulted in a TableauHarvesterError when ingesting Tableau metadata via the lineage harvester.
  • Fixed a NullPointerException when no column data type is harvested.
  • Fixed an issue that was causing the ingestion of Looker metadata to fail.
  • Fixed an issue that was causing a JsonParseError when ingesting Tableau metadata.
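
The excludeImages default described in the 2022.03 list reads a little like a double negative, so the following Python sketch spells out the logic. It assumes only what the list states: the property is optional, images are not harvested by default, and setting it to false includes them. The function name and the flat settings dictionary are hypothetical; where the property sits in the real configuration file is not shown here.

```python
import json


def images_are_harvested(settings: dict) -> bool:
    """Return True only when excludeImages is explicitly set to false."""
    # A missing property is treated as true: images are excluded by default.
    return not settings.get("excludeImages", True)


# Hypothetical fragment that opts back in to harvesting images.
opted_in = json.loads('{"excludeImages": false}')
```

Here images_are_harvested({}) is False, while images_are_harvested(opted_in) is True.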
2022.02
1.4.4

The lineage harvester now supports:

  • Technical lineage for Matillion. Redshift and Snowflake projects in Matillion are supported.
  • Snowflake syntax for the CONNECT BY clause.
1.4.3

The lineage harvester log output now includes Collibra Data Lineage server processing information.

1.4.2

Collibra Data Lineage has improved Teradata parsing.

1.4.1

The lineage harvester for IBM DataStage now supports environment files.

1.4.1

You can now add connection information to the Informatica Intelligent Cloud Services <source ID> configuration file.

1.4.1

You can now request MicroStrategy as a lineage harvester integration in beta.

1.4.0

You can now request the following lineage harvester integrations in beta:

  • AWS Glue script annotations
  • Matillion
  • Power BI Report Server
  • SQL Server Reporting Services

1.4.0

The lineage harvester logs now show message codes to inform you of an issue.

1.4.0

The user that runs the lineage harvester no longer needs elevated permissions to access Snowflake metadata.

You need a role that can access the Snowflake shared read-only database.

To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.
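
As a rough sketch of the grant described above, the Python helper below builds the SQL statement an account administrator could run. In Snowflake, IMPORTED PRIVILEGES on the shared SNOWFLAKE database is granted to a role, which the harvester user then uses. The helper name and the role name in the usage note are hypothetical.

```python
import re


def imported_privileges_grant(role_name: str) -> str:
    """Build a GRANT IMPORTED PRIVILEGES statement for the given role."""
    # Accept only simple unquoted identifiers so no unsafe SQL is built.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", role_name):
        raise ValueError(f"not a simple identifier: {role_name!r}")
    return f"GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE {role_name};"
```

For a hypothetical role named LINEAGE_HARVESTER_ROLE, this produces GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE LINEAGE_HARVESTER_ROLE;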

1.3.5

The lineage harvester configuration file and Power BI harvester configuration files now have a useCollibraSystemName property. You use this property to enable the harvesters to process the value in collibraSystemName properties and map the structure of the data source to system > database > schema > table > column, which you can see in the technical lineage Browse tab pane.

By default, this property is set to False.

1.3.5

You can now create a separate configuration file for each data source to define the collibraSystemName property. For more information about this option, see the following topics:

1.3.5

You can now use the customConnectionProperties field for Microsoft SQL Server JDBC sources.