Supported data sources for technical lineage

Collibra Data Intelligence Cloud supports many data sources and metadata sources, including JDBC data sources, ETL tools and BI tools, for which you can create a technical lineage.

For a complete list of required permissions per supported data source type, see the Requirements and permissions section in Prepare the lineage harvester configuration file.

Note Versions older than those listed might not work as expected; however, we don't expect problems if you use a newer version.
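The "and newer" entries in the Supported versions columns below are best read numerically, component by component, rather than as text; for example, Greenplum 6.10 is newer than 6.9. The following Python sketch illustrates such a check. It is not part of Collibra Data Lineage: the function names and the MINIMUMS dictionary are hypothetical, and it only handles purely numeric dotted versions (not labels such as 11g or 16.0 SP02).

```python
from itertools import zip_longest


def parse_version(text):
    """Split a purely numeric dotted version such as '6.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    # Pad the shorter version with zeros so that 7.2.1 equals 7.2.1.0.
    pairs = zip_longest(parse_version(installed), parse_version(minimum), fillvalue=0)
    for have, need in pairs:
        if have != need:
            return have > need
    return True


# Hypothetical lookup; the values are copied from the "Supported versions" column.
MINIMUMS = {
    "Amazon Redshift": "1.2.34.1058",
    "Greenplum": "6.10",
    "Netezza": "7.2.1.0",
}

if __name__ == "__main__":
    for source, installed in [("Greenplum", "6.9"), ("Netezza", "7.2.1")]:
        ok = meets_minimum(installed, MINIMUMS[source])
        print(source, installed, "meets minimum" if ok else "is older than the minimum")
```

A plain string comparison would rank 6.9 above 6.10, which is why the sketch compares integer components instead.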

JDBC data sources

The following tables show the supported JDBC data sources.

The following table lists the supported JDBC data sources and connection types you can use when you add capabilities for different data sources. The Shared Storage connection is equivalent to the folder connection type when you use the lineage harvester.

JDBC data source type

Supported versions

Connection type

Scope

Steps to create technical lineage

Amazon Redshift

1.2.34.1058 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for Amazon Redshift on Edge.

Azure SQL Data Warehouse

Newest version

JDBC connection,

Shared Storage connection

SQL-based input and stored procedures.

Create technical lineage for Azure SQL Data Warehouse on Edge.

Azure SQL server

Newest version

JDBC connection,

Shared Storage connection

SQL-based input and stored procedures.

Create technical lineage for Azure SQL server on Edge.

Azure Synapse Analytics

Newest version

JDBC connection,

Shared Storage connection

SQL-based input and stored procedures.

Create technical lineage for Azure Synapse Analytics on Edge.

Google BigQuery

Newest version

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for Google BigQuery on Edge.

Greenplum

6.10 and newer

JDBC connection,

Shared Storage connection

SQL-based input.

Create technical lineage for Greenplum on Edge.

HiveQL (SQL-like statements)

2.3.5 and newer

JDBC connection,

Shared Storage connection

SQL-based input and connection via an AWS host.

Create technical lineage for HiveQL on Edge.

IBM Db2

11.5 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for IBM Db2 on Edge.

Oracle

11g, 12c and newer

JDBC connection,

Shared Storage connection

SQL-based input and stored procedures.

Create technical lineage for Oracle on Edge.

PostgreSQL

9.4, 9.5 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for PostgreSQL on Edge.

Microsoft SQL Server

2014, 2016 and newer

JDBC connection,

Shared Storage connection

SQL-based input and stored procedures.

Create technical lineage for Microsoft SQL Server on Edge.

MySQL

5.7, 8 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for MySQL on Edge.

Netezza

7.2.1.0 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for Netezza on Edge.

SAP HANA

2.00.40 and newer

JDBC connection,

Shared Storage connection

SQL-based input and SAP HANA Information views, which include attributes, analytic views and calculation views from database table or view data sources.

Script-based calculation views and stored procedures are out of scope.

Create technical lineage for SAP HANA on Edge.

Snowflake

Newest version

JDBC connection,

Shared Storage connection

  • SQL-based input without stored procedures.
  • SQL-API-based input with stored procedures.

For more information, go to Technical lineage for Snowflake ingestion methods.

Create technical lineage for Snowflake on Edge.

Spark SQL

2.4.3 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures and connection via an AWS host.

For the Spark SQL data source, we recommend using the folder connection type to connect to the directory with your SQL queries.

Create technical lineage for Spark SQL on Edge.

Sybase Adaptive Server Enterprise

16.0 SP02 and newer

JDBC connection,

Shared Storage connection

SQL-based input without stored procedures.

Create technical lineage for Sybase Adaptive Server Enterprise on Edge.

Teradata

15.0, 16.20.07.01 and newer

JDBC connection,

Shared Storage connection

SQL-based input, including BTEQ scripts.

Create technical lineage for Teradata on Edge.

The following table shows the supported JDBC data sources and driver versions that have been tested. You can connect to them via a JDBC driver or by creating a folder.

JDBC data source type

Supported versions

Connection type

Scope

Amazon Redshift

1.2.34.1058 and newer

JDBC, Folder

SQL-based input without stored procedures.

Azure SQL server

Newest version

JDBC, Folder

SQL-based input and stored procedures.

Azure SQL Data Warehouse

Newest version

JDBC, Folder

SQL-based input and stored procedures.

Azure Synapse Analytics

Newest version

JDBC, Folder

SQL-based input and stored procedures.

Google BigQuery

Newest version

JDBC, Folder

SQL-based input without stored procedures.

Greenplum

6.10 and newer

JDBC, Folder

SQL-based input without stored procedures.

HiveQL (SQL-like statements)

2.3.5 and newer

JDBC, Folder

SQL-based input and connection via an AWS host. Stored procedures are not supported.

IBM Db2

11.5 and newer

JDBC, Folder

SQL-based input without stored procedures.

Oracle

11g, 12c and newer

JDBC, Folder

SQL-based input and stored procedures.

PostgreSQL

9.4, 9.5 and newer

JDBC, Folder

SQL-based input without stored procedures.

Microsoft SQL Server

2014, 2016 and newer

JDBC, Folder

SQL-based input and stored procedures.

MySQL

5.7, 8 and newer

JDBC, Folder

SQL-based input without stored procedures.

Netezza

7.2.1.0 and newer

JDBC, Folder

SQL-based input without stored procedures.

SAP HANA

2.00.40 and newer

JDBC, Folder

SQL-based input and SAP HANA Information views, which include attributes, analytic views and calculation views from database table or view data sources.

Script-based calculation views and stored procedures are out of scope.

Important Collibra Data Lineage supports SQL-based input and SAP HANA Information views for SAP HANA on-premises. However, calculation views are not supported for SAP HANA Cloud.

Snowflake

Newest version

JDBC, Folder

  • SQL-based input without stored procedures.
  • SQL-API-based input with stored procedures.

For more information, go to Technical lineage for Snowflake ingestion methods.

Spark SQL

2.4.3 and newer

JDBC, Folder

SQL-based input and connection via an AWS host. Stored procedures are not supported.

For the Spark SQL data source, we recommend using the folder connection type to connect to the directory with your SQL queries.

Sybase Adaptive Server Enterprise

16.0 SP02 and newer

JDBC, Folder

SQL-based input without stored procedures.

Teradata

15.0, 16.20.07.01 and newer

JDBC, Folder

SQL-based input and stored procedures, including BTEQ scripts.

ETL tools

The following tables show the supported ETL tools.

The following table lists the supported ETL data sources and connection types you can use when you add capabilities for different data sources. The Shared Storage connection is equivalent to the folder connection type when you use the lineage harvester. On Edge, the API connection type is not yet supported for Informatica Intelligent Cloud Services (IICS) and Matillion; you can use Shared Storage connections when you create the technical lineage for IICS and Matillion on Edge.

ETL tool

Supported versions

Connection type

Scope

Steps to create technical lineage

Azure Data Factory

2 and newer

API

Commonly supported transformations and activities in Azure Data Factory. For details, go to Supported transformation details.

Create technical lineage for Azure Data Factory on Edge.

dbt Cloud (Beta)

1.4 or newer

API

Commonly supported model types in dbt. For details, go to Supported transformation details.

Create technical lineage for dbt Cloud on Edge.

IBM InfoSphere DataStage

11.5 and newer

Shared Storage connection

Commonly used DataStage ETL components including SQL overrides and transformation details.

Collibra Data Lineage supports IBM InfoSphere DataStage transformation logic.

You have to prepare a folder with all data objects that you want to process.

Create technical lineage for DataStage on Edge.

Informatica Intelligent Cloud Services, specifically Cloud Data Integration

Tip Data Integration is one of the Informatica Intelligent Cloud Services.

Cloud, newest only

Informatica Intelligent Cloud Services (IICS) connection

Note Collibra Data Intelligence Cloud 2023.03 or newer is required to use the Informatica Intelligent Cloud Services (IICS) connection.

Commonly used transformations in Informatica Intelligent Cloud Services: Data Integration, including SQL overrides.

Supported data sources are locally stored flat files and databases.

Create technical lineage for IICS on Edge.

Informatica PowerCenter

9.6 and newer

Shared Storage connection

Commonly used transformations in Informatica PowerCenter, including SQL overrides.

You have to prepare a folder with all data objects that you want to process.

Create technical lineage for Informatica PowerCenter on Edge.

Matillion

Newest version

Matillion connection

Note Collibra Data Intelligence Cloud 2023.03 or newer is required to use the Matillion connection.

SQL-based input without stored procedures.

Technical lineage via Edge can only access Redshift and Snowflake projects.

Create technical lineage for Matillion on Edge.

SQL Server Integration Services (SSIS)

2012 and newer

Package format version 6 or newer.

Shared Storage connection

All commonly used transformations in SSIS, data flows and mappings, including SQL overrides.

Important SQL statements from Excel are not supported.

You have to prepare a folder with all data objects that you want to process.

Create technical lineage for SQL Server Integration Services on Edge.

The following table shows the supported ETL tools and driver versions that have been tested. You can connect to them via an API or by creating a folder.

ETL tool

Supported versions

Connection type

Scope

Azure Data Factory

2 and newer

API

Commonly supported transformations and activities in Azure Data Factory. For details, go to Supported transformation details.

dbt (Beta)

1.4 or newer

API for dbt Cloud

Folder for dbt Core

Commonly supported model types in dbt. For details, go to Supported transformation details.

IBM InfoSphere DataStage

11.5 and newer

Folder

Commonly used DataStage ETL components including SQL overrides and transformation details.

Collibra Data Lineage supports IBM InfoSphere DataStage transformation logic.

You have to prepare a folder with all data objects that you want to process.

Informatica Intelligent Cloud Services, specifically Cloud Data Integration

Tip Data Integration is one of the Informatica Intelligent Cloud Services.

Cloud, newest only

API

Commonly used transformations in Informatica Intelligent Cloud Services: Data Integration, including SQL overrides.

Supported data sources are locally stored flat files and databases.

Informatica PowerCenter

9.6 and newer

Folder

Commonly used transformations in Informatica PowerCenter, including SQL overrides.

You have to prepare a folder with all data objects that you want to process.

Matillion

Newest version

API

SQL-based input without stored procedures.

The lineage harvester can only access Redshift and Snowflake projects.

SQL Server Integration Services (SSIS)

2012 and newer

Package format version 6 or newer.

Folder

All commonly used transformations in SSIS, data flows and mappings, including SQL overrides.

Important SQL statements from Excel are not supported.

You have to prepare a folder with all data objects that you want to process.

BI tools

The following tables show the supported BI tools.

The following table lists the supported BI data sources and connection types you can use when you add capabilities for different data sources.

BI tool

Tested versions

Connection type

Capability

Steps to create technical lineage

Tableau

Newest

API

Technical Lineage for Tableau

Create technical lineage for Tableau on Edge.

Power BI

Newest

API

Technical Lineage for Power BI

Create technical lineage for Power BI on Edge.

MicroStrategy

Newest

API

Technical Lineage for MicroStrategy

Create technical lineage for MicroStrategy on Edge.

The following table shows the supported BI tools.

BI tool

Tested versions

Connection type

Tableau

Newest

API.

You have to prepare:

Power BI

Newest

API.

The new Power BI integration includes many enhancements, such as consolidated harvesters, which means you no longer need the Power BI harvester. You only need to prepare:

Looker

Newest

API.

Collibra Data Lineage automatically creates a technical lineage, but stitching is not available.

You have to prepare a lineage harvester configuration file for Looker ingestion.

SQL Server Reporting Services (SSRS) or Power BI Report Server (PBRS)

  • SSRS: 2017 and newer
    Note Due to a bug in 2017 that is resolved by the newer APIs, we recommend using SQL Server 2019 or newer Reporting Services.
  • PBRS: 2019 and newer

API.

You have to prepare:

MicroStrategy

Newest

Direct connection to the repository.

Stitching is not available and there is no true technical lineage. There is only a diagram view that you can access via a Column or Table asset, but not via MicroStrategy assets.

You have to prepare a lineage harvester configuration file for MicroStrategy ingestion.

You can access:

  • Microsoft SQL Server repository.
  • Any local or remote PostgreSQL database. The MicroStrategy Intelligence Server has an embedded PostgreSQL repository as its default repository. For complete information on the default, embedded repository, see the MicroStrategy repository documentation.

MicroStrategy (NEW)

Newest

You have to prepare a lineage harvester configuration file for MicroStrategy ingestion.

Benefits of the new integration method include:
  • Support for the latest MicroStrategy APIs.
  • Support for technical lineage and stitching.
  • New operating model.
  • No longer dependent on a direct connection to the repository.

Custom technical lineage

You can create a custom technical lineage to include data objects from data sources that are not listed above.

For information on creating a custom technical lineage via Edge, go to Create technical lineage via Edge for custom technical lineage.

For information on creating a custom technical lineage by using the lineage harvester, go to Custom technical lineage via the lineage harvester.