The lineage harvester change log

Collibra Data Lineage is updated and improved on a regular basis. On this page, you can see the most important changes between different versions of the lineage harvester. For a complete list, see the release notes.

Note In the documentation, we assume that you have the most recent version of the lineage harvester. We highly recommend that you download and use the newest lineage harvester from the Collibra downloads page, even if you are on an older version of Collibra Data Intelligence Cloud.

Warning If you upgrade to lineage harvester 1.3.0 or newer, you have to follow an upgrade procedure.

The following list contains the most important changes to the lineage harvester and its configuration file.

Changed in version	New lineage harvester improvements
2022.08	Previously, when you created a technical lineage for a supported BI tool, the nodes in the technical lineage graph had a gray background, even if the data objects from your data source were stitched to assets in Data Catalog. Data objects now have the intended yellow background when you create a technical lineage for Power BI. This enhancement was introduced for Tableau and Looker in Collibra 2022.07. Soon, the enhancement will also apply to SSRS and PBRS. When synchronizing Tableau, the synchronization no longer fails if two data sources in the same project with the same name are returned from the Tableau API. The assets of both data sources are now synchronized in Collibra. You can now filter on the Tableau project level. When integrating Power BI, you can now ingest measures and show them in the technical lineage. Measures are included as the value in the Role in Report attribute on Power BI Column asset pages. When attempting to integrate Power BI with invalid Power BI credentials, the lineage harvester log file now provides a more helpful error message. When you specify the Power BI workspaces for ingestion, the filters are no longer case-sensitive. When integrating Looker, the ownership information (email address only) for folders, Looks and dashboards is now ingested in Collibra. The new Owner in source attribute is included on Looker Folder, Looker Look and Looker Dashboard asset pages. The lineage harvester log file now identifies whether you are using Tableau Online or Tableau Server, and shows the version of your Tableau environment.
2022.07	The lineage harvester now retries getting a batch status if the first HTTP call failed due to a network error. Fixed an issue that was causing custom SQL queries to be identified as belonging to two different Tableau data sources. This resulted in a "Unique constraint failed" error. Fixed an issue that was resulting in the "No asset matches the specified criteria" error. When the lineage harvester fetches an access key for a data store, only active records are now fetched. Inactive records are ignored. The lineage harvester is more resilient against authorization expiration when ingesting Looker metadata. The lineage harvester log file now includes the following information: your Tableau environment type (Tableau Online or Tableau Server) and the version of your Tableau environment.
2022.06	When synchronizing Power BI, the last sync time is now correctly shown on the Sources tab page. Fixed an issue that was causing the processing of harvested metadata batches to run without completing. When ingesting Power BI, if there are Oracle data sources, the Oracle service name is now used instead of the database name. When processing Tableau metadata, the Collibra Data Lineage servers no longer replace ">>" by "<}", which was resulting in parsing errors. Fixed an [SQLITE_ERROR] issue that was breaking the technical lineage when attempting to synchronize a data source. When processing Power BI metadata, SQL statements are now in upper case. When creating a technical lineage for Tableau, any unnecessary brackets "][" in the names of schemas are now removed. When integrating Power BI, you can now ingest measures without DAX. They are shown in the Role in Report attribute on Power BI Column asset pages.
2022.05	Warning The lineage harvester 2022.05 includes an internal format change to the password manager pwd.conf file. This means that if you use lineage harvester 2022.05, you can no longer use the pwd.conf file with an older harvester. You can now integrate Power BI in Data Catalog via the lineage harvester, meaning you no longer need to use the Power BI harvester. Additional benefits include the following: support for Power BI Data Flows; descriptions of Power BI Reports; statuses of Power BI Workspaces; filtering and domain mapping. Note The new Power BI integration method is specifically for new integrations. For those who have been ingesting Power BI via the Power BI harvester, we will soon release a migration script. Collibra Data Lineage now also supports the following BI integrations: MicroStrategy, SQL Server Reporting Services and Power BI Report Server. You can now use token-based authentication when creating a technical lineage for Matillion. Warning This enhancement is not backwards compatible. You must update your configuration file. The `useCollibraSystemName` property is now solely used for the configuration of the system name. If you set the `useCollibraSystemName` property to `true` in your lineage harvester configuration file, but don't define the system name in the Tableau <source ID> configuration file, the Tableau technical lineage shows DEFAULT as the system name. If using a Tableau <source ID> configuration file: You can now use wildcards throughout the file. The hostName and connectorUrl properties are no longer case-sensitive. The PostgreSQL JDBC driver is now upgraded from 42.3.2 to 42.3.3. The Apache Hive JDBC driver is now upgraded from 2.6.17.1020 to 2.6.19.2022. The lineage harvester no longer hangs when harvesting metadata from certain data sources. The lineage harvester automatically refreshes Tableau tokens. You can now use the optional `concurrencyLevel` property in the lineage harvester configuration file to specify the internal sizing, meaning the number of tasks that can be executed at the same time.
2022.04	You can now use the `databaseMapping` property in your Tableau <source ID> configuration file, to map a Tableau technical database name to the real database name. When providing connection definitions for Informatica PowerCenter, the `dbname` property is no longer case-sensitive. When integrating Informatica PowerCenter data sources, Collibra Data Lineage now correctly creates a technical lineage when `useCollibraSystemName` is set to `true`.
2022.03	By default, the lineage harvester no longer harvests images. If you want to include images, include the optional `excludeImages` property in your configuration file and set the value to `false`. When ingesting Tableau metadata, you can now leave the `collibraSystemName` property in your configuration file empty, even if the `useCollibraSystemName` property is set to `true`. The lineage harvester now correctly shows the help overview when you run the `--help` command. For Hive sources, the lineage harvester now skips harvesting the DDL of exclusively locked tables. When you change the domain reference ID in the lineage harvester configuration file, Tableau assets are now successfully deleted from the previous domain and recreated in the new domain. You no longer see a Fiber Failed error while running the lineage harvester. Protobuf is upgraded to version 3.19.3. Fixed an issue that was causing incomplete technical lineage and stitching issues when using custom SQL in Tableau. Fixed an issue that resulted in a TableauHarvesterError when ingesting Tableau metadata via the lineage harvester. Fixed a NullPointerException when no column data type is harvested. Fixed an issue that was causing the ingestion of Looker metadata to fail. Fixed an issue that was causing a JsonParseError when ingesting Tableau metadata.
2022.02	General changes: The Hive JDBC driver is upgraded to AmazonHiveJDBC42-2.6.17.1020. Netty libraries are upgraded to version 4.1.72. The system name is now added to the creation of the SQL sub-batch. When ingesting metadata from Microsoft SQL Server data sources, the dash character "-" in database names no longer causes ingestion to fail. Upgraded JDBC drivers for MySQL, PostgreSQL, Teradata, Snowflake, HiveQL, Spark SQL and Microsoft SQL Server data sources. Discontinued support for Active Directory authentication for Azure data sources. Applied the "log4j2.formatMsgNoLookups=true" system property to the lineage harvester, as a mitigation step for CVE-2021-44228. The lineage harvester now correctly handles the * EXCEPT syntax for SQL scripts in BigQuery. The lineage harvester can now harvest parameter files in IICS data sources. The lineage harvester now renews the Looker API token if harvesting takes longer than one hour, to avoid an HTTP 401 Unauthorized error. When metadata from Snowflake data sources is analyzed, schema names are no longer wrapped in double quotes. Fixed a computational inefficiency in the SQL scanner in case of multiple nested subSELECTs with wildcards. Support added for the ALTER TABLE RENAME TO statement in Postgres. Support added for Oracle package specifications split into multiple source files (for example, the package definition in one source file and the package body in another source file). Parsing enhancements for various data source types: Microsoft SQL Server: all variants of hexadecimal literals. BigQuery: PARTITION BY clause in CREATE TABLE statements. Oracle: ENCRYPT algorithm in CREATE TABLE column definition; ON OVERFLOW clause of LISTAGG function. Redshift: optional enclosing brackets [] for table references; DELETE queries that have WITH statements. SQL Server Integration Services: components of type "Microsoft.ScriptComponentHost", which is a subtype of the "Microsoft.ManagedComponentHost". HiveQL: support for table references starting with numerical digits; support for "pivot" as a table alias; support for digit-starting column references; support for digit-starting aliases; "default" allowed as schema name; support for grouping sets; better support for parsing "array" and "map" data types; support for parsing "struct" data types. IBM DB2: support for the STRIP function; support for ¬=, ¬>, ¬<, !> and !< operators; support for special registers, for example CURRENT SQLID and CURRENT SERVER; support for CCSID, APPEND, VOLATILE, DATA CAPTURE and AUDIT clauses in CREATE TABLE statements; fix for the CASE expression; some keywords allowed as column names and column aliases. Tableau-specific changes: The lineage harvester can now connect to Tableau Server or Tableau Online and ingest its metadata. Minor Tableau API improvements, including a fix for an issue that affected databases and tables. Upgraded JDBC drivers for MySQL, PostgreSQL, Teradata, Snowflake, Hive/Spark and Microsoft SQL. Dropped support for Active Directory authentication for Azure sources. The new Tableau integration via lineage harvester supports the Tableau Explorer (can publish) role. Applied the "log4j2.formatMsgNoLookups=true" system property to the harvester as a mitigation step for CVE-2021-44228. You can now define custom pagination settings to help avoid node limit errors.
1.4.4	The lineage harvester now supports: Technical lineage for Matillion. Redshift and Snowflake projects in Matillion are supported. Snowflake syntax for the CONNECT BY clause.
1.4.3	The lineage harvester log output now includes Collibra Data Lineage server processing information.
1.4.2	Collibra Data Lineage has improved Teradata parsing. Newly supported SQL syntax: ALTER TYPE; ALTER PROCEDURE; CREATE/REPLACE AUTHORIZATION; UPDATE/INSERT (update set... else insert...); NORMALIZE table attribute in CREATE TABLE; MLOAD (MultiLoad); RECORD (FastLoad); BEGIN/END QUERY LOGGING; Functions with schema, for example schema_name.function.name(args...); Functions with conversation, for example function_name(args...) RETURNS VARCHAR(<number>) CHARACTER SET LATIN; DEFAULT TIME data type attribute; CHARACTER LARGE OBJECT as data type (equals CLOB); INLINE LENGTH data type attribute; AS LOCATOR parameter type attribute; Macro argument attributes; ALL keyword in UPDATE statements. Additionally, Collibra Data Lineage now supports the BTEQ command after a line ending with a multiline comment and has improved the parsing of the TABLE function.
1.4.1	The lineage harvester for IBM DataStage now supports environment files.
1.4.1	You can now add connection information to the Informatica Intelligent Cloud Services <source ID> configuration file.
1.4.1	You can now request MicroStrategy as a lineage harvester integration in beta.
1.4.0	You can now request the following lineage harvester integrations in beta: AWS Glue script annotations, Matillion, Power BI Report Server, SQL Server Reporting Services.
1.4.0	The lineage harvester logs now show message codes to inform you of an issue.
1.4.0	The user that runs the lineage harvester no longer needs elevated permissions to access Snowflake metadata. You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.
1.3.5	The lineage harvester configuration file and Power BI harvester configuration files now have a `useCollibraSystemName` property. You use this property to enable the harvesters to process the value in `collibraSystemName` properties and map the structure of the data source to system > database > schema > table > column, which you can see in the technical lineage Browse tab pane. By default, this property is set to `false`.
1.3.5	You can now create a separate configuration file for each data source to define the `collibraSystemName` property. For more information about this option, see the following topics: The Informatica <source ID> configuration file. The IBM DataStage or SQL Server Integration Services connection definition configuration files. The Informatica Intelligent Cloud Services <source ID> configuration file. The Power BI <source ID> configuration file. The Looker <source ID> configuration file. The JSON files with a predefined lineage.
1.3.5	You can now use the `customConnectionProperties` field for Microsoft SQL Server JDBC sources.
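
Several entries in this change log describe default values for configuration properties: `excludeImages` (images are not harvested by default as of 2022.03), `useCollibraSystemName` (disabled by default as of 1.3.5) and the optional `concurrencyLevel` (2022.05). The following Python sketch is not part of the lineage harvester. It only illustrates how those documented defaults combine with the values you set. It assumes a flat JSON object, whereas your actual configuration file may nest these properties per data source, and the function names `effective_settings` and `tableau_system_name` are hypothetical.

```python
import json
from typing import Optional

# Defaults as described in the change log. The flat layout is an assumption
# made for this sketch; it is not the real configuration file structure.
DEFAULTS = {
    "excludeImages": True,           # 2022.03: images are not harvested by default
    "useCollibraSystemName": False,  # 1.3.5: disabled by default
    "concurrencyLevel": None,        # 2022.05: optional; no default is stated here
}


def effective_settings(config_text: str) -> dict:
    """Return the documented defaults, overridden by any values in the JSON text."""
    config = json.loads(config_text)
    return {name: config.get(name, default) for name, default in DEFAULTS.items()}


def tableau_system_name(use_collibra_system_name: bool,
                        configured_name: Optional[str]) -> Optional[str]:
    """Illustrate the 2022.05 rule for the Tableau technical lineage.

    If useCollibraSystemName is true but no system name is defined,
    the lineage shows DEFAULT as the system name.
    """
    if not use_collibra_system_name:
        # The change log does not describe this case in detail;
        # None is only a placeholder in this sketch.
        return None
    return configured_name or "DEFAULT"


if __name__ == "__main__":
    settings = effective_settings('{"excludeImages": false, "useCollibraSystemName": true}')
    print(settings)
    print(tableau_system_name(settings["useCollibraSystemName"], None))
```

To include images in a harvest, you would set `excludeImages` to `false`, as in the usage example at the bottom of the sketch. Leaving a property out falls back to the documented default.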