Supported data sources for technical lineage

Collibra Data Intelligence Cloud supports many data sources and metadata sources, including JDBC data sources, ETL tools and BI tools, for which you can create a technical lineage. You use these data sources when you prepare the configuration file and Data Catalog's physical data layer.

Note: A data source version older than the ones listed here might not work as expected. Newer versions are not expected to cause problems.

JDBC data sources

The following table shows the supported JDBC data sources and driver versions that have been tested. You can connect to them via a JDBC driver or by creating a folder.

| JDBC data source type | Supported versions | Connection type | Scope |
|---|---|---|---|
| Amazon Redshift | 1.2.34.1058 and newer | JDBC, Folder | SQL based input without stored procedures. |
| Azure SQL server | Newest version | JDBC, Folder | SQL based input and stored procedures. |
| Azure SQL Data Warehouse | Newest version | JDBC, Folder | SQL based input and stored procedures. |
| Azure Synapse Analytics | Newest version | JDBC, Folder | SQL based input and stored procedures. |
| Google BigQuery | Newest version | JDBC, Folder | SQL based input without stored procedures. |
| Greenplum | 6.10 and newer | JDBC, Folder | SQL based input. |
| HiveQL (SQL-like statements) | 2.3.5 and newer | JDBC, Folder | SQL based input and connection via an AWS host. |
| IBM DB2 | 11.5 and newer | JDBC, Folder | SQL based input without stored procedures. |
| Oracle | 11g, 12c and newer | JDBC, Folder | SQL based input and stored procedures. |
| PostgreSQL | 9.4, 9.5 and newer | JDBC, Folder | SQL based input without stored procedures. |
| Microsoft SQL Server | 2014, 2016 and newer | JDBC, Folder | SQL based input and stored procedures. |
| MySQL | 5.7, 8 and newer | JDBC, Folder | SQL based input without stored procedures. |
| Netezza | 7.2.1.0 and newer | JDBC, Folder | SQL based input without stored procedures. |
| SAP Hana | 2.00.40 and newer | JDBC, Folder | SQL based input and SAP HANA Information views, which includes attributes, analytic views and calculation views from database table or view data sources. Script-based calculation views and stored procedures are out of scope. |
| Snowflake | Newest version | JDBC, Folder | SQL based input without stored procedures. |
| Spark SQL | 2.4.3 and newer | JDBC, Folder | SQL based input and connection via an AWS host. |
| Sybase Adaptive Server Enterprise | 16.0 SP02 and newer | JDBC, Folder | SQL based input without stored procedures. |
| Teradata | 15.0, 16.20.07.01 and newer | JDBC, Folder | SQL based input, including BTEQ scripts. |
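
When checking a data source against the "and newer" entries in the table above, note that version numbers have to be compared part by part, not as text (as text, "6.9" would sort after "6.10"). The following is a minimal sketch of such a check. The names `MIN_TESTED`, `parse_version` and `is_tested_version` are hypothetical helpers for illustration and are not part of any Collibra tool. Only rows with a purely numeric minimum version are included.

```python
# Minimum tested versions copied from the JDBC table above.
# Rows such as "Newest version" or "11g, 12c and newer" are left out
# because they cannot be compared numerically.
MIN_TESTED = {
    "Amazon Redshift": "1.2.34.1058",
    "Greenplum": "6.10",
    "HiveQL": "2.3.5",
    "IBM DB2": "11.5",
    "Netezza": "7.2.1.0",
    "Spark SQL": "2.4.3",
}


def parse_version(text):
    """Turn '6.10' into (6, 10) so that each part compares as a number."""
    return tuple(int(part) for part in text.split("."))


def is_tested_version(source, version):
    """Return True if `version` is at or above the minimum tested version."""
    have = parse_version(version)
    need = parse_version(MIN_TESTED[source])
    # Pad the shorter tuple with zeros so that '7.2.1' equals '7.2.1.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, `is_tested_version("Greenplum", "6.9")` is false because 9 is lower than 10, even though the string "6.9" would sort after "6.10".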

ETL tools

The following table shows the supported ETL tools and the versions that have been tested. You can connect to them via an API or by creating a folder.

| ETL tool | Supported versions | Connection type | Scope |
|---|---|---|---|
| AWS Glue script annotations (beta) | N/A | Folder | Only script annotations including transformation details. |
| IBM InfoSphere DataStage | 11.5 and newer | Folder | Commonly used DataStage ETL components including SQL overrides and transformation details. Collibra Data Lineage supports IBM InfoSphere DataStage transformation logic. You have to prepare a folder with all data objects that you want to process. |
| Informatica Intelligent Cloud Services, specifically Cloud Data Integration. Tip: Data Integration is one of the Informatica Intelligent Cloud services. | Cloud, newest only | API | Commonly used transformations in Informatica Intelligent Cloud Services: Data Integration, including SQL overrides. Supported data sources are locally stored flat files and databases. |
| Informatica PowerCenter | 9.6 and newer | Folder | Commonly used transformations in Informatica PowerCenter, including SQL overrides. You have to prepare a folder with all data objects that you want to process. |
| Matillion | Newest version | API | SQL based input without stored procedures. The lineage harvester can only access Redshift and Snowflake projects. |
| SQL Server Integration Services (SSIS) | 2012 and newer. Package format version 6 or newer. | Folder | All commonly used transformations in SSIS, data flows and mappings, including SQL overrides. Important: SQL statements from Excel are not supported. You have to prepare a folder with all data objects that you want to process. |
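
The connection type column above decides how you prepare each ETL tool: tools listed with Folder need a prepared folder of data objects, tools listed with API are reached directly. The following is a minimal sketch that groups the tools accordingly. The dictionary and function names are hypothetical and only restate the table; they are not part of the lineage harvester.

```python
# Connection type per ETL tool, copied from the table above.
ETL_CONNECTION_TYPE = {
    "AWS Glue script annotations (beta)": "Folder",
    "IBM InfoSphere DataStage": "Folder",
    "Informatica Intelligent Cloud Services": "API",
    "Informatica PowerCenter": "Folder",
    "Matillion": "API",
    "SQL Server Integration Services (SSIS)": "Folder",
}


def tools_by_connection_type(connection_type):
    """Return the ETL tools that use the given connection type, sorted by name."""
    return sorted(
        tool
        for tool, kind in ETL_CONNECTION_TYPE.items()
        if kind == connection_type
    )
```

Calling `tools_by_connection_type("Folder")` lists the tools for which a folder has to be prepared before processing.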

BI tools

The following table shows the supported BI tools.

| BI tool | Tested versions | Connection type |
|---|---|---|
| Tableau | Newest | Tableau. You have to prepare: |
| Power BI | Newest | Existing lineage. You have to run the Power BI harvester and the lineage harvester to ingest Power BI metadata. |
| Power BI (NEW) | Newest | Power BI. The new Power BI integration includes many enhancements, including consolidated harvesters, meaning you no longer need the Power BI harvester. You only need to prepare: |
| Looker | Newest | Looker. Collibra Data Lineage automatically creates a technical lineage, but stitching is not available. You have to prepare a lineage harvester configuration file for Looker ingestion. |
| SQL Server Reporting Services or Power BI Report Server | 2019 and newer | SSRS-PBRS. You have to prepare: |
| MicroStrategy | Newest | MicroStrategy. Collibra Data Lineage automatically creates a technical lineage, but stitching is not available and the technical lineage does not show the relations to columns. You have to prepare a lineage harvester configuration file for MicroStrategy ingestion. You can access any local or remote PostgreSQL database. The MicroStrategy Intelligence Server has an embedded PostgreSQL repository, as its default repository. For complete information on the default, embedded repository, see the MicroStrategy repository documentation. |

Tip: For complete information on ingesting metadata from the following BI tools and creating a technical lineage, see the dedicated sections:

Custom technical lineage

You can create a custom technical lineage to include metadata of unsupported data sources. See Custom technical lineage.

Authentication

Technical lineage supports the following means of authentication:
  • For all data sources, except for external directories: username and password.
  • Tableau: username and password or token-based authentication.
  • Google BigQuery data sources: username and password or a service account key file. For more information, see the Google BigQuery documentation.

No other authentication methods are supported.
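
The authentication rules above can be restated as a small lookup. The following is a minimal sketch. The function names are hypothetical, and the sketch assumes that external directories need no credentials at all, which the list above implies but does not state outright.

```python
USERNAME_PASSWORD = "username and password"

# Additional methods for specific sources, as listed above.
EXTRA_METHODS = {
    "Tableau": {"token"},
    "Google BigQuery": {"service account key file"},
}


def supported_auth_methods(source, external_directory=False):
    """Return the set of authentication methods allowed for a data source."""
    if external_directory:
        # Assumption: external directories (folders) use no credentials.
        return set()
    return {USERNAME_PASSWORD} | EXTRA_METHODS.get(source, set())


def is_auth_supported(source, method, external_directory=False):
    """Return True if `method` is one of the supported methods for `source`."""
    return method in supported_auth_methods(source, external_directory)
```

Any method that is not returned by `supported_auth_methods` falls under "no other authentication methods are supported".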