About creating and managing a data contract

The data contract is a key component of a data product. It defines the agreement between the data product owner and the data consumers. The data contract specifies the structure, format, service level, quality, and terms of use. It includes a high-level overview of the agreement and a data contract manifest. A data contract manifest is a YAML file that contains the definitions and contents of a data contract. Multiple versions of the manifest can exist. For more information, go to About data contracts.

Available tools to manage data contracts

To manage data contracts and their details, Collibra offers a set of tools. These tools are optimized for the Open Data Contract Standard (ODCS), an open-source framework that describes what is expected in a data contract manifest file.

Collibra provides:

Available actions to manage data contracts

Collibra offers multiple options to manage data contracts. You can:

Data Contract mappings

SLA information mapping

The following out-of-the-box SLA (Service Level Agreement) asset attributes are mapped to the manifest file. These attributes are used to generate a manifest file and are updated with manifest data when you apply a manifest file.

Collibra asset attribute    Manifest property
Backup Frequency            backupFrequency
Latency                     latency
Most Recent Record Date     mostRecentRecordDate
Processing Frequency        processingFrequency
Processing Method           processingMethod
Recency                     recency
Recovery Point              recoveryPoint
Recovery Time               recoveryTime
Response Time               responseTime
Retention Period            retentionPeriod
Support Availability        supportAvailability
Unlimited Retention         isRetentionUnlimited
Uptime Percentage           uptimePercentage

When you apply a manifest file to Collibra, only manifest properties whose names are written in camelCase, exactly as listed in the table, are mapped to Data Contract asset attributes. If the manifest property includes a unit, the value and the unit are combined in the Data Contract attribute.

Example: For the following manifest data, the resulting Retention Period attribute value in Collibra is 5 months.
slaProperties:
  - property: retentionPeriod
    value: 5
    unit: months

Variations of the property name in the manifest, such as retentionperiod or retention_period, aren't mapped.
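The mapping rule can be illustrated with a short Python sketch. It is not Collibra's implementation: the function name `map_sla_properties` and the plain-dictionary input are assumptions made for illustration, and the sketch only mirrors the behavior described above (exact camelCase match, value and unit joined with a space).

```python
from typing import Any

# Manifest property -> Collibra attribute name, taken from the SLA mapping table.
SLA_ATTRIBUTES = {
    "backupFrequency": "Backup Frequency",
    "latency": "Latency",
    "mostRecentRecordDate": "Most Recent Record Date",
    "processingFrequency": "Processing Frequency",
    "processingMethod": "Processing Method",
    "recency": "Recency",
    "recoveryPoint": "Recovery Point",
    "recoveryTime": "Recovery Time",
    "responseTime": "Response Time",
    "retentionPeriod": "Retention Period",
    "supportAvailability": "Support Availability",
    "isRetentionUnlimited": "Unlimited Retention",
    "uptimePercentage": "Uptime Percentage",
}


def map_sla_properties(sla_properties: list[dict[str, Any]]) -> dict[str, str]:
    """Return {Collibra attribute name: value} for the entries that would be mapped.

    Hypothetical helper: the input is the parsed slaProperties list of a manifest.
    """
    attributes: dict[str, str] = {}
    for entry in sla_properties:
        # Exact, case-sensitive lookup: "retentionperiod" and "retention_period" are skipped.
        attribute = SLA_ATTRIBUTES.get(entry.get("property"))
        if attribute is None:
            continue
        value = str(entry.get("value"))
        unit = entry.get("unit")
        # Combine value and unit when a unit is present.
        attributes[attribute] = f"{value} {unit}" if unit else value
    return attributes


example = [
    {"property": "retentionPeriod", "value": 5, "unit": "months"},
    {"property": "retention_period", "value": 7, "unit": "years"},  # not mapped
    {"property": "latency", "value": 4},
]
print(map_sla_properties(example))
# {'Retention Period': '5 months', 'Latency': '4'}
```

A lookup table keyed on the exact property name keeps the case-sensitivity rule explicit: anything that isn't spelled as in the table simply has no entry.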

Servers information mapping from Collibra to the Data Contract manifest

When you generate a data contract manifest file, Collibra includes a Servers section for the following data sources: Athena, BigQuery, Databricks, MySQL, Oracle, PostgreSQL, Redshift, Snowflake, SQL Server.

The Server information includes fields such as server, type, database, and schema. By default, this information is retrieved from the Edge Connection string if the JDBC data source is registered through Edge. If this isn't available, it is collected from the Database and Schema assets in Collibra.

Generic fields

The manifest properties use the values of the following Collibra asset attributes.

Manifest property    Collibra asset attribute
id                   Database asset UUID
server               Edge Connection name, or Database asset displayName
type                 This value is calculated by Collibra and refers to the data source, for example snowflake.
description          Edge Connection description, or Database asset Description attribute
host                 Edge Connection string, properties (host, server), or Database asset Location attribute
port                 Edge Connection string, properties (port), or Database asset Location attribute
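The fallback from Edge Connection data to asset data can be sketched in Python as follows. This is only an illustration under stated assumptions: `edge` and `database_asset` are hypothetical dictionaries standing in for the Edge Connection and the Database asset, their keys are invented for the example, and the Location attribute is assumed to be already split into host and port, because how Collibra parses it isn't described here.

```python
from typing import Optional


def first_available(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor blank."""
    for candidate in candidates:
        if candidate is not None and candidate.strip():
            return candidate
    return None


def build_generic_server_fields(edge: dict, database_asset: dict, server_type: str) -> dict:
    """Build the generic server fields, preferring Edge Connection data over asset data."""
    properties = edge.get("properties", {})          # hypothetical: parsed connection properties
    location = database_asset.get("location", {})    # hypothetical: pre-parsed Location attribute
    fields = {
        "id": database_asset.get("uuid"),
        "server": first_available(edge.get("name"), database_asset.get("displayName")),
        "type": server_type,  # calculated by Collibra in the real product
        "description": first_available(edge.get("description"), database_asset.get("description")),
        "host": first_available(properties.get("host"), properties.get("server"), location.get("host")),
        "port": first_available(properties.get("port"), location.get("port")),
    }
    # Drop fields for which no source had a value.
    return {key: value for key, value in fields.items() if value is not None}


edge = {"name": "snowflake-prod", "description": "", "properties": {"host": "acme.example.com"}}
database_asset = {
    "uuid": "0000-example",
    "displayName": "SALES_DB",
    "description": "Sales database",
    "location": {"host": "fallback.example.com", "port": "443"},
}
print(build_generic_server_fields(edge, database_asset, "snowflake"))
```

In this example the server name and host come from the Edge Connection, while the description and port fall back to the Database asset because the connection doesn't provide them.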

Specific data source fields

The manifest properties use the values of the following Collibra asset attributes.

Data source   Manifest property   Collibra asset attribute
Athena        catalog             Edge Connection string, properties (catalog), or Database asset displayName, or Database asset Location attribute
              schema              Schema asset displayName
              regionName          Edge Connection string, properties (region, awsregion), or Database asset Location attribute
              stagingDir          Edge Connection string, properties (s3stagingdirectory, outputlocation, s3outputlocation), or Database asset Location attribute
BigQuery      project             Edge Connection string, properties (projectid), or Database asset Location attribute, or Database asset displayName
              dataset             Edge Connection string, properties (projectid), or Database asset Location attribute, or Schema asset displayName
Databricks    catalog             Edge Connection string, properties (catalog), or Database asset displayName, or Database asset Location attribute
              schema              Schema asset displayName
MySQL         database            Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
              schema              Schema asset displayName
Oracle        serviceName         Edge Connection string, properties (servicename), or Database asset Location attribute, or Database asset displayName
              schema              Schema asset displayName
PostgreSQL    database            Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
              schema              Schema asset displayName
Redshift      database            Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
              schema              Schema asset displayName
              region              Edge Connection string, properties (region), or Database asset Location attribute
              account             Edge Connection string, properties (account), or Database asset Location attribute
Snowflake     database            Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
              account             Edge Connection string, properties (account), or Database asset Location attribute
              warehouse           Edge Connection string, properties (database), or Database asset Location attribute
              schema              Schema asset displayName
SQL Server    database            Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
              schema              Schema asset displayName
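As a rough aid for reviewing a generated Servers section, the table can be turned into a lookup. The Python sketch below is an assumption-laden illustration: the helper `missing_properties` is hypothetical, and the table doesn't state whether every listed property is always present in a generated manifest, so an absent property is something to inspect rather than an error.

```python
# Data source -> data-source-specific manifest properties, as listed in the table.
SPECIFIC_PROPERTIES = {
    "Athena": ["catalog", "schema", "regionName", "stagingDir"],
    "BigQuery": ["project", "dataset"],
    "Databricks": ["catalog", "schema"],
    "MySQL": ["database", "schema"],
    "Oracle": ["serviceName", "schema"],
    "PostgreSQL": ["database", "schema"],
    "Redshift": ["database", "schema", "region", "account"],
    "Snowflake": ["database", "account", "warehouse", "schema"],
    "SQL Server": ["database", "schema"],
}


def missing_properties(data_source: str, server_entry: dict) -> list[str]:
    """List the table's properties for a data source that are absent from a server entry.

    Hypothetical helper; `server_entry` is one parsed item of the manifest's Servers section.
    """
    expected = SPECIFIC_PROPERTIES[data_source]
    return [name for name in expected if name not in server_entry]


entry = {"server": "snowflake-prod", "type": "snowflake", "database": "SALES_DB", "schema": "PUBLIC"}
print(missing_properties("Snowflake", entry))
# ['account', 'warehouse']
```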

Relations mapping from the Data Contract manifest to Collibra

When you apply a manifest file to Collibra, Collibra can create and update relations between Data Product Port and Table assets. To do this, Collibra collects data from the manifest file and finds matching assets in Collibra. To find a matching asset, the process determines the schema and database name from the following sources, in priority order. The process uses the first source in the manifest file in which the required information is found. If the information isn't available in one source, the process moves to the next.

  1. physicalName
     If the physicalName contains 3 parts (database, schema, and table), the process uses this information to create the relation. If the physicalName contains only the table name, the process moves to the custom properties to resolve the schema and database name.
  2. Custom properties
     If the physicalName doesn't contain the schema or database name, the process uses the custom properties schemaName and databaseName. Both must be defined for the relation to be created. If only one is defined, the relation isn't created.
  3. Servers section
     If neither the physicalName nor the custom properties contain the schema or database name, Collibra uses the Servers section, if available.

The following fields are supported in the Servers section:

  • Schema: Defined as schema or dataset. schema takes priority over dataset.
  • Database: Defined as database, catalog, or project. database takes priority over catalog, which takes priority over project.
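The priority order can be sketched in Python as follows. This is one reading of the rules, not Collibra's code. Assumptions: the three parts of physicalName are separated by dots, custom properties are passed as a flat dictionary, `server` is a single parsed entry of the Servers section, and a manifest that defines only one of schemaName and databaseName yields no relation instead of falling through to the Servers section.

```python
from typing import Optional


def resolve_database_and_schema(
    physical_name: Optional[str],
    custom_properties: dict,
    server: Optional[dict],
) -> Optional[tuple[str, str]]:
    """Return (database, schema) from the first source that provides them, else None."""
    # 1. physicalName with three parts: database, schema, table (dot separator assumed).
    if physical_name:
        parts = physical_name.split(".")
        if len(parts) == 3 and all(parts):
            database, schema, _table = parts
            return database, schema

    # 2. Custom properties: both schemaName and databaseName must be defined.
    database = custom_properties.get("databaseName")
    schema = custom_properties.get("schemaName")
    if database and schema:
        return database, schema
    if database or schema:
        # Only one is defined: read here as "the relation isn't created".
        return None

    # 3. Servers section, if available.
    if server:
        # schema takes priority over dataset.
        schema = server.get("schema") or server.get("dataset")
        # database takes priority over catalog, which takes priority over project.
        database = server.get("database") or server.get("catalog") or server.get("project")
        if database and schema:
            return database, schema
    return None


print(resolve_database_and_schema("SALES_DB.PUBLIC.ORDERS", {}, None))
# ('SALES_DB', 'PUBLIC')
print(resolve_database_and_schema("ORDERS", {}, {"catalog": "main", "dataset": "sales"}))
# ('main', 'sales')
```

Returning `None` makes the "no relation created" outcome explicit, so a caller can skip the table without raising an error.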