The lineage harvester change log

Collibra Data Lineage is updated and improved on a regular basis. On this page, you can see the most important changes between different versions of the lineage harvester. For a complete list, see the release notes.

Note We highly recommend that you download and use the newest lineage harvester from the Collibra downloads page, even if you are on an older version of Collibra Data Intelligence Cloud.

Note Hidden Tableau worksheets were briefly excluded from the Tableau ingestion, as part of code changes in the 2022.11 release. This change was reverted. Instead, we urge you to review the worksheets that have the attribute “Visible on server” in Tableau set to "false". If a worksheet is not hidden, the "Visible on server" attribute in Collibra is now set to “yes”. You can track the hiding or unhiding of the worksheet in the History tab of the asset page. If you have Collibra Data Marketplace, you can filter out hidden worksheets by excluding those that have "Visible on server" set to "false".

The following list contains the most important changes to the lineage harvester and its configuration file.

Changed in version

New lineage harvester improvements

2023.01
  • When you integrate Power BI:
    • Collibra Data Lineage now supports the Power Query M function Table.Combine. If Collibra Data Lineage can’t determine the column names in multiple sources, a dummy column with the value “*” is now created in the sources and Power BI tables, which preserves the technical lineage at the table level. When the Table.Combine function is used, you can now view a technical lineage at the table level, where previously the analysis error “Cannot determine source table for column” was shown. For complete details, see Supported Power Query M functions.
    • The technical lineage now correctly shows a yellow background when columns and tables are stitched.
    • If you use a <source ID> configuration file, you no longer have to include the filters section.
  • When you integrate Tableau:
    • If a Tableau worksheet is hidden in Tableau, the “Visible on server” attribute of the Tableau Worksheet asset in Collibra now has the value false. If it is not hidden, the attribute has the value true.
    • Metadata batches no longer fail if CREATE TECHLIN VIEW statements fail due to analysis errors.
    • Collibra Data Lineage service benefits from improved parsing of BigQuery quoted identifiers, for example `a.b`.`c`.
    • Tableau filtering now works as intended. Previously, filtering didn't work if, for example, you moved an older Tableau project under a newer project.
    • Fixed the ordering of columns for Tableau technical lineage custom queries.
    • Tableau Data Attributes are no longer shown twice, once with the UUID in the name and once without, in the technical lineage Browse tab pane.
    • The "Document size" attribute type and value are now shown for Tableau Workbook assets.
    • If you don't have permissions to access a parent project, but the lineage harvester identifies published data sources that belong to the project, the lineage harvester creates an ‘Unknown project’ that has the UUID of the inaccessible parent project. To avoid an error, the lineage harvester can now correctly link the published data sources to the unknown project.
  • Collibra Data Lineage service now supports the Power Query M function Value.NativeQuery.
    Note Query parameters are supported, but core parameters are not.
  • When you integrate Power BI or Azure Data Factory (currently in public beta), the lineage harvester now connects to the Microsoft cloud instance, instead of the login.microsoftonline.com host.
  • When you ingest SQL Server Reporting Services (SSRS) and you set the “useCollibraSystemName” property to “true”, SSRS now has its own node in the navigation tree of the Technical lineage Browse tab pane.
  • When you ingest Oracle data sources using the DatabaseOracle source type, passwords are now stored per url, username and db instead of just url and username. With this enhancement, you can connect to Oracle Pluggable Databases, for which a single user can have the same username and different passwords for each of their pluggable databases.
  • For Informatica PowerCenter technical lineage, when a PowerCenter mapplet had an associated shortcut, technical lineage in Collibra would be broken up. Now, there is end-to-end lineage within PowerCenter even when a mapplet has an associated shortcut.
  • Fixed a ValidationError related to the unsupported Exasol dialect. The Postgres dialect is now used in place of Exasol dialect.
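The Oracle password change in the 2023.01 list above can be pictured as widening a lookup key from (url, username) to (url, username, db). The following is a minimal sketch of that idea only. The CredentialKey type, the lookup_password function and the example values are hypothetical, and the sketch does not reflect the actual format of the password manager file.

```python
from typing import Dict, NamedTuple


class CredentialKey(NamedTuple):
    """Hypothetical lookup key: one password per url, username and db."""
    url: str
    username: str
    db: str


def lookup_password(store: Dict[CredentialKey, str], url: str, username: str, db: str) -> str:
    """Return the password stored for this exact url/username/db combination."""
    return store[CredentialKey(url, username, db)]


# Assumed example: two pluggable databases on one host, same user,
# different passwords. With a (url, username) key these would collide.
ORACLE_URL = "jdbc:oracle:thin:@//db.example.com:1521"
store = {
    CredentialKey(ORACLE_URL, "harvester", "PDB1"): "secret-one",
    CredentialKey(ORACLE_URL, "harvester", "PDB2"): "secret-two",
}
```

With the narrower key, the second entry would overwrite the first, which is why a single user with different passwords per pluggable database could not be represented.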
2022.11
  • When you integrate Power BI:
    • Inactive workspaces and personal workspaces are no longer ingested.
    • Filtering is improved. You can now use the optional properties excludeWorkspaceNames and excludeWorkspaceIds to exclude specified workspaces. Before configuring your filters, ensure that you read all about the advantages, limitations and configuration considerations in Power BI workspaces.
    • The ownership information (admin and creator email addresses only) for reports is now ingested in Collibra. The "Owner in source" attribute is included on Power BI Report asset pages.
    • The email addresses of all admins and creators of Power BI data models and workspaces are now ingested. Previously only a single email address was ingested, even if there were multiple admins or creators of the data object in Power BI.
  • When you ingest Snowflake data sources, the databaseNames property is now correctly taken into consideration.
  • When you integrate Tableau:
    • Previously, when you filtered on a site, a Tableau Site asset was created in Collibra, but no metadata was ingested. Now, when you filter on a site, all metadata in the site is ingested in the specified domain. If, however, a site is specified in the lineage harvester configuration file, but not in the filters and domainMapping properties in the Tableau <source ID> configuration file, the metadata is ingested in the default domain.
    • You can now use wildcards in the filters property in the Tableau <source ID> configuration file. Also, the filters property is no longer case-sensitive.
    • You can now ingest sites that don't have workbooks.
    • Ownership information (email addresses only) for projects, data models, workbooks and dashboards is now ingested in Collibra. The Owner in source attribute is included on Tableau Project, Tableau Data Model, Tableau Workbook and Tableau Dashboard asset pages.
  • When you ingest Informatica PowerCenter data sources, the lineage harvester now correctly processes session mapplets. Previously, this failed with error message "'NoneType' object has no attribute 'lower'".
  • When you ingest Informatica Intelligent Cloud Services data sources and the useCollibraSystemNames property is set to true, databases are now shown in the Technical lineage Browse tab pane with the specified system name, or as “UNDEFINED” if a database could not be mapped to a system name. If the property is set to false, all databases are now shown directly under the DATABASE node.
  • When you ingest metadata from Oracle data sources, you can now add a new DatabaseOracle section in your lineage harvester configuration file, to specify the Oracle database name and ensure stitching without any workarounds.
  • If you integrate SSRS-PBRS and use a <source ID> configuration file, the CustomDataSource section in the <source ID> configuration file is no longer mandatory.
  • The lineage harvester now uses Looker 4.0 APIs, with paging options.
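The filtering changes in the 2022.11 list above (wildcards, filters that are not case-sensitive, and exclusion lists such as excludeWorkspaceNames) can be illustrated with a short sketch. It assumes glob-style wildcards where “*” matches any run of characters, and it assumes that exclusions take precedence over inclusions. Neither assumption is confirmed by this page, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matches_any(name: str, patterns: Iterable[str]) -> bool:
    """Case-insensitive glob match of a name against any of the patterns."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


def select_names(names: Iterable[str], include: Iterable[str], exclude: Iterable[str] = ()) -> List[str]:
    """Keep names that match an include pattern and no exclude pattern.

    Assumption: exclusions win over inclusions.
    """
    include_patterns = list(include)
    exclude_patterns = list(exclude)
    return [
        name
        for name in names
        if matches_any(name, include_patterns) and not matches_any(name, exclude_patterns)
    ]
```

For example, select_names(["Sales EMEA", "Sales APAC", "Finance"], ["sales*"], ["*apac"]) keeps only "Sales EMEA".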
2022.10
  • The lineage harvester now supports the following IBM DB2 constructs: PREVVAL FOR <sequence>, PREVIOUS VALUE FOR <sequence>, NEXTVAL FOR <sequence> and NEXT VALUE FOR <sequence>.
  • You can now use the new optional "deleteRawMetadataAfterProcessing" property in your lineage harvester configuration file. With this property, you can delete your raw metadata from the Collibra Data Lineage service after processing. This property is applicable for all supported data sources.
  • When you specify a Data Catalog URL in the lineage harvester configuration file, it no longer matters whether you include a trailing slash (/) in the URL.
  • The Collibra Data Lineage service now supports the following transformations: Table.FromRecords and Table.IsEmpty.
  • Collibra Data Lineage now supports key-pair authentication when ingesting Snowflake data sources.
  • The PostgreSQL JDBC Driver is upgraded to version 42.4.1.
  • The Collibra Data Lineage service can now compute indirect lineage from set queries, that is, queries that use the UNION keyword together with an ORDER BY clause.
  • When you integrate Power BI, the lineage harvester is now more resilient to OutOfMemory errors.
  • When you integrate Tableau and filter on a sub-project, the metadata of the parent project is no longer ingested in Collibra. However, the parent Tableau Project asset is created in the default domain, to preserve the hierarchy required for stitching.
  • Looker integration no longer fails if the "collibraSystemName" property is not included in the lineage harvester configuration file. If you want to specify the system name of a database in Looker, use the "collibraSystemName" property in the Looker source ID configuration file. If you don't specify a system name in the source ID configuration file, the system name in the technical lineage graph will be Default.
  • In the case of a lookup procedure when ingesting Informatica Intelligent Cloud Services data sources, if the CONNECTIONSUBTYPE parameter is empty, the Collibra Data Lineage service now looks to the CONNECTIONREFERENCE parameter for the name. If that is also empty, then the name in the VARIABLE parameter is used. This ensures the correct detection of the SQL dialect.
  • Fixed an issue related to dialect extraction when ingesting Informatica Intelligent Cloud Services data sources.
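The fallback order described in the 2022.10 list above for Informatica Intelligent Cloud Services lookup procedures (CONNECTIONSUBTYPE, then CONNECTIONREFERENCE, then VARIABLE) amounts to taking the first non-empty value. A minimal sketch, assuming the parameters arrive as a simple mapping and that whitespace-only values count as empty; the function name is hypothetical and this is not the service's actual code.

```python
from typing import Mapping, Optional

# Order taken from the change log entry; everything else is illustrative.
FALLBACK_ORDER = ("CONNECTIONSUBTYPE", "CONNECTIONREFERENCE", "VARIABLE")


def resolve_connection_name(params: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the first non-empty parameter value in the fallback order."""
    for key in FALLBACK_ORDER:
        value = params.get(key)
        # Assumption: missing, None, empty and whitespace-only all mean "empty".
        if value and value.strip():
            return value.strip()
    return None
```

If all three parameters are empty, the sketch returns None. What the service does in that case is not stated on this page.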
2022.09
  • Previously, when you created a technical lineage for Power BI, SQL Server Reporting Services (SSRS) or Power BI Report Server (PBRS), the nodes in the technical lineage graph had a gray background, even if the data objects from your data source were stitched to assets in Data Catalog. Data objects now have the intended yellow background when creating a technical lineage for Power BI, SSRS or PBRS. We introduced this enhancement for Tableau and Looker in Collibra 2022.07.
  • When you integrate Tableau, for every Tableau Workbook that you have permission to ingest, all Tableau Dashboards in the Workbooks are now correctly shown in the technical lineage graph. If you do not have permission on the Workbook or Dashboard level, the metadata of these data objects is not ingested.
  • When integrating Power BI, the ownership information (email address only) for reports is now ingested in Collibra. The new Owner in source attribute is included on Power BI Report asset pages.
  • The lineage harvester now uses Looker 4.0 APIs, with paging options.
  • When you integrate Power BI, the lineage harvester is now more resilient against OutOfMemory errors.
  • When you integrate Tableau and use domain mapping, subprojects are now ingested in the domains of their parent projects.
  • The Collibra Data Lineage service instances now benefit from the following parsing enhancements when integrating Snowflake data sources:
    • Support for the COLLATE keyword.
    • Support for EXTERNAL TABLE syntax.
  • When integrating Power BI, the descriptions of Data Set Tables and Data Set Columns in Power BI are now harvested.
  • Fixed an issue that was resulting in a processing error when a column referenced in an ORDER BY clause references a repeated column in the SELECT column list.
  • When integrating Tableau, you can now ingest sub-projects that you have permission to ingest, even if you don’t have permission to ingest the parent projects.
2022.08
  • Previously, when you created a technical lineage for a supported BI tool, the nodes in the technical lineage graph had a gray background, even if the data objects from your data source were stitched to assets in Data Catalog. Data objects now have the intended yellow background when creating a technical lineage for Power BI. This enhancement was introduced for Tableau and Looker in Collibra 2022.07. Soon, the enhancement will also apply to SSRS and PBRS.
  • When synchronizing Tableau, the synchronization no longer fails if two data sources in the same project with the same name are returned from the Tableau API. The assets of both data sources are now synchronized in Collibra.
  • You can now filter on the Tableau project level.
  • When integrating Power BI, you can now ingest measures and show them in the technical lineage. Measures are included as the value in the Role in Report attribute on Power BI Column asset pages.
  • When attempting to integrate Power BI with invalid Power BI credentials, the lineage harvester log file now provides a more helpful error message.
  • When you specify the Power BI workspaces for ingestion, the filters are no longer case-sensitive.
  • When integrating Looker, the ownership information (email address only) for folders, Looks and dashboards is now ingested in Collibra. The new Owner in source attribute is included on Looker Folder, Looker Look and Looker Dashboard asset pages.
  • When integrating Power BI, the ownership information (email address only) for data sets and workspaces is now ingested in Collibra. The new Owner in source attribute is included on Power BI Data Model and Power BI Workspace asset pages.
  • The lineage harvester log file now identifies whether you are using Tableau Online or Tableau Server, and the version of your Tableau environment.
2022.07
  • The lineage harvester now retries getting the batch status if the first HTTP call fails due to a network error.
  • Fixed an issue that was causing custom SQL queries to be identified as belonging to two different Tableau data sources. This resulted in a "Unique constraint failed" error.
  • Fixed an issue that was resulting in the "No asset matches the specified criteria" error.
  • When the lineage harvester fetches an access key for a data store, only active records are now fetched. Inactive records are ignored.
  • The lineage harvester is more resilient against authorization expiration when ingesting Looker metadata.
  • The lineage harvester log file now includes the following information:
    • Your Tableau environment type: Tableau Online or Tableau Server
    • The version of your Tableau environment
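The batch-status retry in the 2022.07 list above follows a common pattern: repeat a call when it fails with a network-level error, and give up after a bounded number of attempts. The sketch below illustrates that pattern only. The number of attempts, the delay and the function name are assumptions; the change log does not state how many times or how often the lineage harvester retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fetch: Callable[[], T], attempts: int = 3, delay_seconds: float = 1.0) -> T:
    """Call fetch(); on a network-level error, wait and try again.

    The defaults are illustrative, not the harvester's real settings.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            # Re-raise once the last allowed attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Only ConnectionError and TimeoutError are retried here, so other failures, such as an invalid response, surface immediately instead of being masked by the retry loop.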
2022.06
  • When synchronizing Power BI, the last sync time is now correctly shown in the Sources tab page.
  • Fixed an issue that was causing the processing of harvested metadata batches to run without coming to completion.
  • When ingesting Power BI, if there are Oracle data sources, the Oracle service name is now used, instead of the database name.
  • When processing Tableau metadata, the Collibra Data Lineage servers no longer replace ">>" by "<}", which was resulting in parsing errors.
  • Fixed an [SQLITE_ERROR] issue that was breaking the technical lineage when attempting to synchronize a data source.
  • When processing Power BI metadata, SQL statements are now in upper case.
  • When creating a technical lineage for Tableau, any unnecessary brackets “][“ in the names of schemas are now removed.
  • When integrating Power BI, you can now ingest measures without DAX. They are shown as attribute type Role in Report on Power BI Column asset pages.
2022.05

Warning The lineage harvester 2022.05 includes an internal format change to the password manager pwd.conf file. This means that if you use the lineage harvester 2022.05, you can no longer use the pwd.conf file with an older harvester.

  • You can now integrate Power BI in Data Catalog via the lineage harvester, meaning you no longer need to use the Power BI harvester. Additional benefits include the following:
    • Support for Power BI Data Flows.
    • Descriptions of Power BI Reports.
    • Statuses of Power BI Workspaces.
    • Filtering and domain mapping.
    Note The new Power BI integration method is specifically for new integrations. For those who have been ingesting Power BI via the Power BI harvester, we will soon release a migration script.
  • Collibra Data Lineage now also supports the following BI integrations:
    • MicroStrategy
    • SQL Server Reporting Services and Power BI Report Server.
  • You can now use token-based authentication when creating a technical lineage for Matillion.
    Warning This enhancement is not backwards compatible. You must update your configuration file. If you use the lineage harvester 2022.05, you can no longer use the pwd.conf file with an older harvester.
  • The useCollibraSystemName property is now solely used for the configuration of the system name.
  • If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but don't define the system name in the Tableau <source ID> configuration file, the system name in the Tableau technical lineage shows DEFAULT as the system name.
  • If using a Tableau <source ID> configuration file:
    • You can now use wildcards throughout the file.
    • The hostName and connectorUrl properties are no longer case-sensitive.
  • The PostgreSQL JDBC driver is now upgraded from 42.3.2 to 42.3.3.
  • The Apache Hive JDBC driver is now upgraded from 2.6.17.1020 to 2.6.19.2022.
  • The lineage harvester no longer hangs when harvesting metadata from certain data sources.
  • The lineage harvester automatically refreshes Tableau tokens.
  • You can now use the optional concurrencyLevel property in the lineage harvester configuration file to specify the internal sizing, meaning the number of tasks that can be executed at the same time.
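The concurrencyLevel property in the 2022.05 list above caps how many tasks run at the same time. The following sketch shows the general idea of such a cap with a bounded worker pool. It is an analogy only: how the lineage harvester schedules its work internally is not described on this page, and the run_with_concurrency function and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_concurrency(tasks: Iterable[T], worker: Callable[[T], R], concurrency_level: int) -> List[R]:
    """Run worker over all tasks, with at most concurrency_level in flight.

    Results are returned in the same order as the input tasks.
    """
    if concurrency_level < 1:
        raise ValueError("concurrency_level must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency_level) as pool:
        return list(pool.map(worker, tasks))
```

A higher level can shorten the total run time when tasks mostly wait on the network, at the cost of more memory and more simultaneous connections to the source.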
2022.04
  • You can now use the databaseMapping property in your Tableau <source ID> configuration file, to map a Tableau technical database name to the real database name.
  • When providing connection definitions for Informatica PowerCenter, the dbname property is no longer case-sensitive.
  • When integrating Informatica PowerCenter data sources, Collibra Data Lineage now correctly creates a technical lineage when useCollibraSystemName is set to true.
2022.03
  • By default, the lineage harvester no longer harvests images. If you want to include images, include the optional excludeImages property in your configuration file and set the value to false.
  • When ingesting Tableau metadata, you can now leave the collibraSystemName property in your configuration file empty, even if the useCollibraSystemName property is set to true.
  • The lineage harvester now correctly shows the help overview when you run the --help command.
  • The Hive source now skips harvesting the DDL of exclusively locked tables.
  • When you change the domain reference ID in the lineage harvester configuration file, Tableau assets are now successfully deleted from the previous domain and recreated in the new domain.
  • You no longer see a Fiber Failed error while running the lineage harvester.
  • Protobuf is upgraded to version 3.19.3.
  • Fixed an issue that was causing incomplete technical lineage and stitching issues when using custom SQL in Tableau.
  • Fixed an issue that resulted in a TableauHarvesterError when ingesting Tableau metadata via the lineage harvester.
  • Fixed a NullPointerException when no column data type is harvested.
  • Fixed an issue that was causing the ingestion of Looker metadata to fail.
  • Fixed an issue that was causing a JsonParseError when ingesting Tableau metadata.
2022.02
1.4.4

The lineage harvester now supports:

  • Technical lineage for Matillion. Redshift and Snowflake projects in Matillion are supported.
  • Snowflake syntax for the CONNECT BY clause.