Supported data sources for technical lineage
Collibra Data Intelligence Cloud supports technical lineage for many data sources and metadata sources, including JDBC data sources, ETL tools and BI tools.
For a complete list of required permissions per supported data source type, see the Requirements and permissions section in Prepare the lineage harvester configuration file.
JDBC data sources
The following tables show the supported JDBC data sources for the lineage harvester and for technical lineage via Edge (beta).
For the lineage harvester, the following table shows the supported JDBC data sources and the driver versions that have been tested. You can connect to them via a JDBC driver or by creating a folder.
| JDBC data source type | Supported versions | Connection type | Scope |
|---|---|---|---|
| Amazon Redshift | 1.2.34.1058 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| Azure SQL Server | Newest version | JDBC, Folder | SQL-based input and stored procedures. |
| Azure SQL Data Warehouse | Newest version | JDBC, Folder | SQL-based input and stored procedures. |
| Azure Synapse Analytics | Newest version | JDBC, Folder | SQL-based input and stored procedures. |
| Google BigQuery | Newest version | JDBC, Folder | SQL-based input without stored procedures. |
| Greenplum | 6.10 and newer | JDBC, Folder | SQL-based input. |
| HiveQL (SQL-like statements) | 2.3.5 and newer | JDBC, Folder | SQL-based input and connection via an AWS host. |
| IBM Db2 | 11.5 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| Oracle | 11g, 12c and newer | JDBC, Folder | SQL-based input and stored procedures. |
| PostgreSQL | 9.4, 9.5 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| Microsoft SQL Server | 2014, 2016 and newer | JDBC, Folder | SQL-based input and stored procedures. |
| MySQL | 5.7, 8 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| Netezza | 7.2.1.0 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| SAP HANA | 2.00.40 and newer | JDBC, Folder | SQL-based input and SAP HANA information views, which include attribute views, analytic views and calculation views from database table or view data sources. Script-based calculation views and stored procedures are out of scope. Important: Collibra Data Lineage supports SQL-based input and SAP HANA information views for SAP HANA on-premises. However, calculation views are not supported for SAP HANA Cloud. |
| Snowflake | Newest version | JDBC, Folder | For more information, go to Technical lineage for Snowflake ingestion methods. |
| Spark SQL | 2.4.3 and newer | JDBC, Folder | SQL-based input and connection via an AWS host. For Spark SQL, we recommend using the folder connection type to connect to the directory that contains your SQL queries. |
| Sybase Adaptive Server Enterprise | 16.0 SP02 and newer | JDBC, Folder | SQL-based input without stored procedures. |
| Teradata | 15.0, 16.20.07.01 and newer | JDBC, Folder | SQL-based input, including BTEQ scripts. |
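Several data sources above (for example Amazon Redshift, PostgreSQL and MySQL) support SQL-based input without stored procedures. If you use the folder connection type for such a source, it can help to check the folder before you harvest it. The following Python sketch is a hypothetical pre-check, not part of the lineage harvester: it assumes your queries are stored as `.sql` files and uses a simple regular expression (not a SQL parser) to flag files that appear to define stored procedures, which fall outside the supported scope for those sources. The function names are illustrative only.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical pre-check, not part of the lineage harvester.
# Matches CREATE PROCEDURE, CREATE PROC, CREATE OR REPLACE PROCEDURE and
# CREATE OR ALTER PROCEDURE. This is a heuristic: it does not parse SQL, so
# matching text inside comments or string literals is also flagged.
PROCEDURE_PATTERN = re.compile(
    r"\bCREATE\s+(?:OR\s+(?:REPLACE|ALTER)\s+)?PROC(?:EDURE)?\b",
    re.IGNORECASE,
)


def defines_stored_procedure(sql_text: str) -> bool:
    """Return True if the SQL text appears to define a stored procedure."""
    return PROCEDURE_PATTERN.search(sql_text) is not None


def find_out_of_scope_files(folder: str) -> List[str]:
    """Return the .sql files under `folder` (searched recursively) that
    appear to define stored procedures."""
    flagged = []
    for path in sorted(Path(folder).rglob("*.sql")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if defines_stored_procedure(text):
            flagged.append(str(path))
    return flagged
```

For example, calling `find_out_of_scope_files("sql_queries")` (a placeholder folder name) lists the files you may want to review before you point the lineage harvester at that folder. Because the check is pattern-based, treat its output as a hint rather than a definitive scope decision.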
For technical lineage via Edge (beta), the following table lists the supported JDBC data sources and the connection types you can use when you add capabilities for different data sources. The Shared Storage connection is equivalent to the folder connection type of the lineage harvester.
| JDBC data source type | Supported versions | Connection type | Scope |
|---|---|---|---|
| Amazon Redshift | 1.2.34.1058 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Azure SQL Server | Newest version | JDBC connection, Shared Storage connection | SQL-based input and stored procedures. |
| Azure SQL Data Warehouse | Newest version | JDBC connection, Shared Storage connection | SQL-based input and stored procedures. |
| Azure Synapse Analytics | Newest version | JDBC connection, Shared Storage connection | SQL-based input and stored procedures. |
| Google BigQuery | Newest version | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Greenplum | 6.10 and newer | JDBC connection, Shared Storage connection | SQL-based input. |
| HiveQL (SQL-like statements) | 2.3.5 and newer | JDBC connection, Shared Storage connection | SQL-based input and connection via an AWS host. |
| IBM Db2 | 11.5 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Oracle | 11g, 12c and newer | JDBC connection, Shared Storage connection | SQL-based input and stored procedures. |
| PostgreSQL | 9.4, 9.5 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Microsoft SQL Server | 2014, 2016 and newer | JDBC connection, Shared Storage connection | SQL-based input and stored procedures. |
| MySQL | 5.7, 8 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Netezza | 7.2.1.0 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| SAP HANA | 2.00.40 and newer | JDBC connection, Shared Storage connection | SQL-based input and SAP HANA information views, which include attribute views, analytic views and calculation views from database table or view data sources. Script-based calculation views and stored procedures are out of scope. |
| Snowflake | Newest version | JDBC connection, Shared Storage connection | For more information, go to Technical lineage for Snowflake ingestion methods. |
| Spark SQL | 2.4.3 and newer | JDBC connection, Shared Storage connection | SQL-based input and connection via an AWS host. For Spark SQL, we recommend using the Shared Storage connection to connect to the directory that contains your SQL queries. |
| Sybase Adaptive Server Enterprise | 16.0 SP02 and newer | JDBC connection, Shared Storage connection | SQL-based input without stored procedures. |
| Teradata | 15.0, 16.20.07.01 and newer | JDBC connection, Shared Storage connection | SQL-based input, including BTEQ scripts. |
ETL tools
The following tables show the supported ETL tools for the lineage harvester and for technical lineage via Edge (beta).
For the lineage harvester, the following table shows the supported ETL tools and the versions that have been tested. You can connect to them via an API or by creating a folder.
| ETL tool | Supported versions | Connection type | Scope |
|---|---|---|---|
| Azure Data Factory (beta) | 2 | API | Commonly supported transformations and activities in Azure Data Factory. For details, go to Supported transformation details. |
| IBM InfoSphere DataStage | 11.5 and newer | Folder | Commonly used DataStage ETL components, including SQL overrides and transformation details. Collibra Data Lineage supports IBM InfoSphere DataStage transformation logic. You have to prepare a folder with all data objects that you want to process. |
| Informatica Intelligent Cloud Services, specifically Cloud Data Integration (Tip: Data Integration is one of the Informatica Intelligent Cloud Services.) | Cloud, newest only | API | Commonly used transformations in Informatica Intelligent Cloud Services: Data Integration, including SQL overrides. Supported data sources are locally stored flat files and databases. |
| Informatica PowerCenter | 9.6 and newer | Folder | Commonly used transformations in Informatica PowerCenter, including SQL overrides. You have to prepare a folder with all data objects that you want to process. |
| Matillion | Newest version | API | SQL-based input without stored procedures. The lineage harvester can only access Redshift and Snowflake projects. |
| SQL Server Integration Services (SSIS) | 2012 and newer (package format version 6 or newer) | Folder | All commonly used transformations in SSIS, data flows and mappings, including SQL overrides. Important: SQL statements from Excel are not supported. You have to prepare a folder with all data objects that you want to process. |
For technical lineage via Edge (beta), the following table lists the supported ETL tools and the connection types you can use when you add capabilities for different data sources. The Shared Storage connection is equivalent to the folder connection type of the lineage harvester. The API connection type is not yet supported on Edge for Informatica Intelligent Cloud Services (IICS) and Matillion. You can use Shared Storage connections when you create the technical lineage for IICS and Matillion on Edge.
| ETL tool | Supported versions | Connection type | Scope |
|---|---|---|---|
| IBM InfoSphere DataStage | 11.5 and newer | Shared Storage connection | Commonly used DataStage ETL components, including SQL overrides and transformation details. Collibra Data Lineage supports IBM InfoSphere DataStage transformation logic. You have to prepare a folder with all data objects that you want to process. |
| Informatica Intelligent Cloud Services, specifically Cloud Data Integration (Tip: Data Integration is one of the Informatica Intelligent Cloud Services.) | Cloud, newest only | Informatica Intelligent Cloud Services (IICS) connection. Note: Collibra Data Intelligence Cloud 2023.03 or newer is required to use the IICS connection. | Commonly used transformations in Informatica Intelligent Cloud Services: Data Integration, including SQL overrides. Supported data sources are locally stored flat files and databases. |
| Informatica PowerCenter | 9.6 and newer | Shared Storage connection | Commonly used transformations in Informatica PowerCenter, including SQL overrides. You have to prepare a folder with all data objects that you want to process. |
| Matillion | Newest version | Matillion connection. Note: Collibra Data Intelligence Cloud 2023.03 or newer is required to use the Matillion connection. | SQL-based input without stored procedures. Technical lineage via Edge can only access Redshift and Snowflake projects. |
| SQL Server Integration Services (SSIS) | 2012 and newer (package format version 6 or newer) | Shared Storage connection | All commonly used transformations in SSIS, data flows and mappings, including SQL overrides. Important: SQL statements from Excel are not supported. You have to prepare a folder with all data objects that you want to process. |
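The JDBC and ETL tables above describe how lineage harvester connection types correspond to Edge connection types: JDBC maps to the JDBC connection, Folder maps to the Shared Storage connection, and the API-based IICS and Matillion integrations map to their own Edge connections. The following Python sketch encodes that correspondence as a hypothetical lookup, which you might use when planning a move from the lineage harvester to Edge. It is not a Collibra API, it covers only the JDBC and ETL tables, and it follows the Edge ETL table for IICS and Matillion. Azure Data Factory has no row in the Edge ETL table, so the sketch returns no equivalent for it.

```python
from typing import Optional

# Hypothetical lookup built from the JDBC and ETL tables above;
# it is not a Collibra API and does not cover BI tools.
GENERIC_EQUIVALENTS = {
    "JDBC": "JDBC connection",
    "Folder": "Shared Storage connection",
}

# ETL tools whose API connection maps to a dedicated Edge connection.
# Both connections require Collibra Data Intelligence Cloud 2023.03 or newer.
EDGE_API_EQUIVALENTS = {
    "Informatica Intelligent Cloud Services": "Informatica Intelligent Cloud Services (IICS) connection",
    "Matillion": "Matillion connection",
}


def edge_connection_type(source: str, harvester_connection: str) -> Optional[str]:
    """Return the Edge connection type that corresponds to a lineage harvester
    connection type, or None if the tables list no Edge equivalent."""
    if harvester_connection == "API":
        # Only some API-based ETL tools have an Edge equivalent.
        return EDGE_API_EQUIVALENTS.get(source)
    return GENERIC_EQUIVALENTS.get(harvester_connection)
```

For example, `edge_connection_type("PostgreSQL", "Folder")` returns `"Shared Storage connection"`, while `edge_connection_type("Azure Data Factory", "API")` returns `None`.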
BI tools
The following tables show the supported BI tools for the lineage harvester and for technical lineage via Edge (beta).
For the lineage harvester, the following table shows the supported BI tools.
| BI tool | Tested versions | Connection type |
|---|---|---|
| Tableau | Newest | API. You have to prepare: |
| Power BI (deprecated) | Newest | API and XMLA endpoints. You have to run the Power BI harvester and the lineage harvester to ingest Power BI metadata. Note: Integration via the Power BI harvester is deprecated. We will continue to fix issues, but the development of new features and improvements is discontinued. |
| Power BI | Newest | API. The new Power BI integration includes many enhancements, such as consolidated harvesters, so you no longer need the Power BI harvester. You only need to prepare: |
| Looker | Newest | API. Collibra Data Lineage automatically creates a technical lineage, but stitching is not available. You have to prepare a lineage harvester configuration file for Looker ingestion. |
| SQL Server Reporting Services (SSRS) or Power BI Report Server (PBRS) | | API. You have to prepare: |
| MicroStrategy | Newest | Direct connection to the repository. Stitching is not available and there is no true technical lineage. There is only a diagram view that you can access via a Column or Table asset, but not via MicroStrategy assets. You have to prepare a lineage harvester configuration file for MicroStrategy ingestion. You can access: |
For information on ingesting metadata from the following BI tools and creating a technical lineage
For information on creating custom technical lineage by using the lineage harvester, go to Working with custom technical lineage.
For technical lineage via Edge (beta), the following table lists the supported BI tools and the connection types you can use when you add capabilities for different data sources.
| BI tool | Tested versions | Connection type | Capability |
|---|---|---|---|
| Tableau | Newest | API | Technical Lineage for Tableau (Beta) |
| Power BI | Newest | API | Technical Lineage for Power BI (Beta) |
Custom technical lineage
You can create a custom technical lineage to include data objects from data sources that are not listed above.
To create a custom technical lineage on Edge, add the Technical Lineage for Custom Lineage capability and synchronize the technical lineage.
For information about creating a custom technical lineage by using the lineage harvester, go to Working with a custom technical lineage.
Authentication
- For all data sources, except for external directories: username and password.
- Google BigQuery data sources: username and password or a service account key file. For more information, see the Google BigQuery documentation.
- Power BI: username and password or service principal.
- Snowflake: username and password or key pair authentication.
- Tableau: username and password or token-based authentication.
- No other authentication methods are supported.
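The authentication rules above can be read as a small lookup: username and password is the default, a few data sources accept one additional method, and nothing else is supported. The following Python sketch is a hypothetical pre-flight check that encodes this list, for example to validate your own connection settings before you configure a harvest. It is not a Collibra API, and the method labels are illustrative strings. The list above names no method for external directories, so the sketch returns an empty set for them.

```python
from typing import FrozenSet

# Hypothetical lookup built from the authentication list above; not a Collibra API.
DEFAULT_METHODS = frozenset({"username and password"})

SUPPORTED_AUTH = {
    "Google BigQuery": DEFAULT_METHODS | {"service account key file"},
    "Power BI": DEFAULT_METHODS | {"service principal"},
    "Snowflake": DEFAULT_METHODS | {"key pair"},
    "Tableau": DEFAULT_METHODS | {"token"},
    # External directories are excluded from the username and password rule,
    # and the list names no other method for them.
    "external directory": frozenset(),
}


def supported_auth_methods(source: str) -> FrozenSet[str]:
    """Return the authentication methods listed for a data source.
    Sources without a specific entry get username and password only."""
    return SUPPORTED_AUTH.get(source, DEFAULT_METHODS)


def is_supported(source: str, method: str) -> bool:
    """Return True if the method is listed for the data source."""
    return method in supported_auth_methods(source)
```

For example, `is_supported("Snowflake", "key pair")` returns `True`, while `is_supported("Oracle", "key pair")` returns `False` because Oracle falls under the default username and password rule.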