Supported Connections
This page lists the supported data source connection types. A supported data source is one that ships with the Collibra DQ images or standalone bundles and is therefore eligible for support from the Collibra DQ team.
Note Any data source that is compatible with the Java version and server to which you are connected can be used. However, if an issue occurs with an unsupported data source, we cannot guarantee support.
Production
The following is a list of drivers certified for production use.
Connections - Currently Supported
Connection | Certified | Tested | Packaged | Optionally Packaged | Pushdown | Estimate job | Filtergram | Analyze Data | Schedule | Spark Agent | Yarn Agent | Parallel JDBC | Session State | Kerberos Password | Kerberos Password Manager | Kerberos Keytab | Kerberos TGT | Standalone (non-Livy) | JDK8 Driver Compatibility | JDK11 Driver Compatibility |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Athena | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Athena CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
BigQuery | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
BigQuery CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Databricks JDBC | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Databricks CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
DB2 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Dremio | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Hive | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Hive CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Impala | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Impala CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Microsoft SQL | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
MYSQL | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Oracle | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Postgres | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Presto | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Redshift | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Snowflake | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Sybase | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Teradata | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Tip A connection listed as Tested is one for which the Collibra DQ team maintains a test environment and which is included in regular regression testing.
Note The Dremio connection is compatible with JDK11 if you add the following to owlmanage.sh as a JVM option for the web and Spark instances: `-Dcdjd.io.netty.tryReflectionSetAccessible=true`
Remote Connections - Currently Supported
Connection | Certified | Tested | Packaged | Optionally packaged | Pushdown | Estimate job | Filtergram | Analyze data | Spark agent | Yarn agent |
---|---|---|---|---|---|---|---|---|---|---|
Azure Data Lake (Gen2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Google Cloud Storage | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
HDFS | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
S3 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Under Evaluation
The following is a list of drivers that are under evaluation and not yet certified for production use. These connections are currently ineligible for escalated support services.
Connections - Tech Preview
Connection | Certified | Tested | Packaged | Optional packaging | Pushdown | Estimate job | Filtergram | Analyze data | Schedule | Spark agent | Yarn agent | Parallel JDBC | Session state | Kerberos password | Kerberos password manager | Kerberos keytab | Kerberos TGT | Standalone (non-Livy) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cassandra | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
MongoDB | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
MongoDB CDATA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
SAP HANA | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Solr | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Streaming - Tech Preview
Connection | Certified | Tested | Packaged | Optional packaging | Pushdown | Estimate job | Filtergram | Analyze data | Schedule | Spark agent | Yarn agent | Parallel JDBC | Session state | Kerberos password | Kerberos password manager | Kerberos TGT | CRDB metastore | Standalone (non-Livy) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kafka | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Files
File type | Supported |
---|---|
CSV (and all delimiters) | ![]() |
Parquet | ![]() |
AVRO | ![]() |
JSON | ![]() |
DELTA | ![]() |
Limitations
Authentication
- DQ Jobs that require Kerberos TGT are not supported on Spark Standalone or Local deployments.
- It is recommended to submit these jobs via Yarn or K8s instead.
File Limitations
File Sizes
- Files with more than 250 columns are not supported in File Explorer, unless you have Livy enabled.
- Files larger than 5 GB are not supported in File Explorer, unless you have Livy enabled.
- Smaller file sizes allow for skip scanning and more efficient processing.
- Advanced features like replay, scheduling, and historical lookbacks require a date signature in the folder or file path, as shown in the example below.
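For example (illustrative paths only), a daily dataset could be laid out as `/data/sales/2024-01-15/sales.csv` or `s3://my-bucket/sales/date=2024-01-15/`, so that each scheduled or replayed run can resolve the folder that matches its run date.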
S3
- Ensure there are no spaces in the S3 connection name.
- Remember to select the 'Save Credentials' checkbox when establishing the connection.
- Point to the root bucket, not to subfolders.
Local Files
- Local files can only be run using the NO_AGENT default.
- This is for quick testing, smaller files, and demonstration purposes.
- Local file scanning is not intended for large-scale production use.
Livy
- Livy is only supported for K8s environments
Spark Engine Support
- MapR is EOL, and the MapR Spark engine is not supported for running Collibra DQ jobs.
Databricks
Please refer to this page for more details on Databricks support.
The only supported Databricks spark-submit option is to use a notebook to initiate the job (Scala and PySpark options). This is intended for pipeline developers and users familiar with Databricks and notebooks. This form factor is ideal for incorporating data quality within existing Spark ETL data flows. The results are still available for business users to consume, but the configuration is not intended for business users to implement. There are three ways that Databricks users can run DQ jobs, using either a Databricks cluster or a JDBC connection.
1. Notebook
Users can open a notebook directly, upload the Collibra DQ JARs, and run a DQ job on a Databricks cluster. The full steps are explained on the page below, and a sketch of the pattern follows. Collibra DQ supports this flow in production.
https://dq-docs.collibra.com/apis-1/notebook/cdq-+-databricks
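The following is a minimal Scala sketch of that notebook pattern. It assumes the Collibra DQ core and common JARs are already attached to the cluster; the class names, option fields, and connection details shown here (OwlOptions, OwlUtils, the DQ metastore host/port, and the sample table) are illustrative and should be verified against the notebook documentation linked above for your DQ version.

```scala
// Illustrative sketch only - verify class names and option fields against
// the Collibra DQ notebook documentation for your DQ version.
import com.owl.common.options.OwlOptions
import com.owl.core.util.OwlUtils

// Describe the DQ job and point it at the DQ metastore (placeholder values).
val opt = new OwlOptions()
opt.dataset = "databricks_example_dataset"    // DQ dataset name
opt.runId = "2022-05-01"                      // run date for this job
opt.host = "<dq-metastore-host>"              // DQ metastore host
opt.port = "5432/dev?currentSchema=public"    // DQ metastore port/database

// Any Spark DataFrame available in the notebook can be scanned.
val df = spark.sql("SELECT * FROM example_db.example_table")

// Register the dataset and run the DQ check from the notebook.
val owl = OwlUtils.OwlContext(df, opt)
owl.register(opt)
owl.owlCheck()
```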
2. Spark-Submit
There are two ways to run a spark-submit job on a Databricks cluster: through the Databricks UI, or by invoking the Databricks REST API. We have tested both approaches against different Databricks cluster versions (see the table below). The full documentation demonstrating these paths is available at https://dq-docs.collibra.com/apis-1/notebook/cdq-+-databricks/dq-databricks-submit
Please note that these are only examples to demonstrate how to achieve a DQ spark-submit to a Databricks cluster. These paths are not supported in production, and the Collibra DQ team does not provide bug fixes, professional services, or support for customer questions related to these flows.
3. JDBC
Collibra DQ users can create JDBC connections in the CDQ UI and connect to their Databricks database. This is scheduled for the 2022.05 release.
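As an illustration of what such a connection string might look like, the URL below uses the general format of the Simba-based SparkJDBC41 driver referenced in the warning that follows; the workspace host, org ID, cluster ID, and personal access token are placeholders, and the exact properties depend on your Databricks workspace and driver version.

```
jdbc:spark://<workspace-host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<org-id>/<cluster-id>;AuthMech=3;UID=token;PWD=<personal-access-token>
```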
Warning Delta Lake and JDBC connectivity have been validated against the Spark 3.0.1 Collibra DQ package, Databricks 7.3 LTS, and SparkJDBC41.jar. This is available as a Preview. No other combinations have been certified at this time.
Warning Spark submit using the Databricks Spark master URL is not supported.