About creating and managing a data contract

The data contract is a key component of a data product. It defines the agreement between the data product owner and the data consumers. The data contract specifies the structure, format, service level, quality, and terms of use. It includes a high-level overview of the agreement and a data contract manifest. A data contract manifest is a YAML file that contains the definitions and contents of a data contract. Multiple versions of the manifest can exist. For more information, go to About data contracts.

Available tools to manage data contracts

To manage data contracts and their details, Collibra offers a set of tools. These tools are optimized for the Open Data Contract Standard (ODCS), an open-source framework that describes what is expected in a data contract manifest file.

Collibra provides:

  • In-product features, available from the user interface (UI), to create, manage, and view the data contracts and manifest files.

  • CLIs and API calls to create, manage, and integrate data contracts and manifest files.

Actions to manage data contracts

Collibra offers multiple options to manage data contracts. You can:

  • Create and update a Data Contract.
  • Generate a manifest file based on the Collibra data.
    If you generate a manifest file, the process creates an Open Data Contract Standard manifest file based on Collibra knowledge graph information. For information on the mapping, go to Data Contract mappings.
  • Upload a new version of the manifest file.
  • Download a manifest file.
  • Set a manifest file as the active one.
  • Delete a manifest file.
  • Apply a manifest file to Collibra data.
    This action is available from the UI and API.
    If you apply a data contract manifest version to Collibra, the process:
    • Creates and updates relations between Data Product Port and Table assets based on the information in the active manifest version. For information on the mapping, go to Relations mappings.
    • Updates out-of-the-box SLA (Service Level Agreement) attributes on the Data Contract asset in Collibra directly based on the information in the active manifest version. For information on the mapping, go to SLA mappings.

Data Contract mappings

SLA mappings

The following out-of-the-box SLA (Service Level Agreement) asset attributes are mapped to the manifest file. These attributes are used to generate a manifest file and are updated with manifest data when you apply a manifest file.

Collibra asset attribute    Manifest property
Backup Frequency            backupFrequency
Latency                     latency
Most Recent Record Date     mostRecentRecordDate
Processing Frequency        processingFrequency
Processing Method           processingMethod
Recency                     recency
Recovery Point              recoveryPoint
Recovery Time               recoveryTime
Response Time               responseTime
Retention Period            retentionPeriod
Support Availability        supportAvailability
Unlimited Retention         isRetentionUnlimited
Uptime Percentage           uptimePercentage

When manifest file data is mapped to Collibra, only properties whose camelCase names exactly match the manifest property names listed above are mapped to Data Contract asset attributes. If the manifest property includes a unit, the value and the unit are combined in the Data Contract attribute.

Example: For the following manifest data, the resulting Retention Period attribute value in Collibra is 5 months.
slaProperties:
  - property: retentionPeriod
    value: 5
    unit: months

Variations in the manifest, such as retentionperiod or retention_period, aren't mapped.
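The mapping rule above can be sketched in a few lines of Python. This is an illustration only, not Collibra's implementation: the function name, the dictionary, and the input shape (a list of dicts mirroring slaProperties) are assumptions made for the example. Joining the value and unit with a space follows the "5 months" example; the exact formatting for other values is assumed.

```python
# Hypothetical lookup built from the SLA mapping table (subset shown).
SLA_PROPERTY_TO_ATTRIBUTE = {
    "backupFrequency": "Backup Frequency",
    "latency": "Latency",
    "retentionPeriod": "Retention Period",
    "isRetentionUnlimited": "Unlimited Retention",
    "uptimePercentage": "Uptime Percentage",
}


def map_sla_properties(sla_properties):
    """Return {attribute name: value} for entries with an exact camelCase match."""
    attributes = {}
    for entry in sla_properties:
        attribute = SLA_PROPERTY_TO_ATTRIBUTE.get(entry.get("property"))
        if attribute is None:
            continue  # for example retentionperiod or retention_period: not mapped
        value = str(entry.get("value"))
        unit = entry.get("unit")
        # Combine value and unit when a unit is present (assumed separator: a space).
        attributes[attribute] = f"{value} {unit}" if unit else value
    return attributes


manifest_sla = [
    {"property": "retentionPeriod", "value": 5, "unit": "months"},
    {"property": "retention_period", "value": 7, "unit": "years"},
    {"property": "uptimePercentage", "value": 99.9},
]
result = map_sla_properties(manifest_sla)
print(result)  # {'Retention Period': '5 months', 'Uptime Percentage': '99.9'}
```

The second entry is skipped because its property name isn't an exact camelCase match.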

Servers information mapping from Collibra to the Data Contract manifest

When you generate a data contract manifest file, Collibra includes a Servers section for the following data sources: BigQuery, PostgreSQL, Oracle, and Snowflake. The Server information includes fields such as server, type, database, and schema. By default, this information is retrieved from the Edge Connection string if the JDBC data source is registered through Edge. If this isn't available, it is collected from the Database and Schema assets in Collibra.

Generic fields

Manifest property    Collibra asset attribute
id                   Database asset UUID
server               Edge Connection name, or Database asset displayName
type                 Calculated by Collibra; refers to the data source, for example snowflake
description          Edge Connection description, or Database asset Description attribute
host                 Edge Connection string, properties (host, server), or Database asset Location attribute
port                 Edge Connection string, properties (port), or Database asset Location attribute

Specific data source fields

Data source    Manifest property    Collibra asset attribute
PostgreSQL     database             Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
               schema               Schema asset displayName
Oracle         serviceName          Edge Connection string, properties (servicename), or Database asset Location attribute, or Database asset displayName
               schema               Schema asset displayName
Snowflake      database             Edge Connection string, properties (database), or Database asset Location attribute, or Database asset displayName
               account              Edge Connection string, properties (account), or Database asset Location attribute
               warehouse            Edge Connection string, properties (database), or Database asset Location attribute
               schema               Schema asset displayName
BigQuery       project              Edge Connection string, properties (projectid), or Database asset Location attribute, or Database asset displayName
               dataset              Edge Connection string, properties (projectid), or Database asset Location attribute, or Schema asset displayName
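The "Edge first, then asset data" fallback described in this section can be pictured with a small Python sketch for the PostgreSQL fields. It isn't Collibra code: the function names and the plain-dict inputs are hypothetical, and the locationDatabase key stands in for whatever value Collibra derives from the Database asset Location attribute, which this page doesn't specify.

```python
def first_available(*candidates):
    """Return the first non-empty candidate, or None if all are empty."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def build_postgres_server(edge_connection, database_asset, schema_asset):
    """Assemble Servers fields for PostgreSQL using the documented fallbacks.

    All three arguments are hypothetical plain dicts, not Collibra API objects.
    """
    edge = edge_connection or {}  # None when the source isn't registered through Edge
    props = edge.get("properties", {})
    return {
        "id": database_asset["uuid"],
        "server": first_available(edge.get("name"), database_asset["displayName"]),
        "description": first_available(
            edge.get("description"), database_asset.get("description")
        ),
        "database": first_available(
            props.get("database"),
            database_asset.get("locationDatabase"),  # assumed pre-parsed from Location
            database_asset["displayName"],
        ),
        "schema": schema_asset["displayName"],
    }


database_asset = {
    "uuid": "hypothetical-uuid",
    "displayName": "sales_db",
    "description": "Sales data",
}
schema_asset = {"displayName": "public"}

with_edge = build_postgres_server(
    {"name": "pg-prod", "properties": {"database": "sales"}},
    database_asset,
    schema_asset,
)
without_edge = build_postgres_server(None, database_asset, schema_asset)
print(with_edge["server"], with_edge["database"])        # pg-prod sales
print(without_edge["server"], without_edge["database"])  # sales_db sales_db
```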

Relations mapping from the Data Contract manifest to Collibra

When you apply a manifest file to Collibra, Collibra can create and update relations between Data Product Port and Table assets. To do this, Collibra collects data from the manifest file and finds matching assets in Collibra. To find a matching asset, the process determines the schema and database name using the following priority order:

  • physicalName
  • Custom properties
  • Servers section

The process uses the first source in the manifest file in which the required information is found. If the information isn't available in one source, the process moves to the next.

physicalName
If the physicalName contains three parts (database, schema, and table), the process uses this information to create the relation. If the physicalName contains only the table name, the process moves to the custom properties to resolve the schema and database name.
Custom properties
If the physicalName doesn't contain the schema or database name, the process uses the custom properties schemaName and databaseName. Both must be defined for the relation to be created. If only one is defined, the relation isn't created.
Servers section
If neither the physicalName nor the custom properties contain the schema or database name, Collibra uses the Servers section, if available.

The following fields are supported:

  • Schema: Defined as schema or dataset. schema takes priority over dataset.
  • Database: Defined as database, catalog, or project. database takes priority over catalog, which takes priority over project.
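The priority order above can be sketched as follows. This is a reading of the rules on this page, not Collibra's implementation. Assumptions made for the example: the three parts of physicalName are separated by dots, the table name is the last part, custom properties and the Servers entry are passed as plain dicts, and a single defined custom property stops the process rather than falling through to the Servers section.

```python
def resolve_table_location(physical_name, custom_properties=None, server=None):
    """Return (database, schema, table), or None when they can't be resolved."""
    custom_properties = custom_properties or {}
    server = server or {}

    parts = physical_name.split(".")  # assumed separator: a dot
    table = parts[-1]

    # 1. physicalName that carries database, schema, and table.
    if len(parts) == 3:
        return tuple(parts)

    # 2. Custom properties: both schemaName and databaseName are required.
    schema = custom_properties.get("schemaName")
    database = custom_properties.get("databaseName")
    if schema and database:
        return (database, schema, table)
    if schema or database:
        return None  # only one is defined: the relation isn't created

    # 3. Servers section: schema before dataset; database before catalog before project.
    schema = server.get("schema") or server.get("dataset")
    database = server.get("database") or server.get("catalog") or server.get("project")
    if schema and database:
        return (database, schema, table)
    return None


print(resolve_table_location("sales.public.orders"))
# ('sales', 'public', 'orders')
print(resolve_table_location("orders", {"schemaName": "public", "databaseName": "sales"}))
# ('sales', 'public', 'orders')
print(resolve_table_location("orders", server={"dataset": "ds", "project": "proj"}))
# ('proj', 'ds', 'orders')
```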
Note 
  • If a relation to a Table asset already exists and the database, schema, and table combination is found more than once in Collibra, the existing relation remains unchanged.
  • If no relation exists and the database, schema, and table combination is found more than once, the relation isn't created.
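The two rules in the note amount to a small decision function. The sketch below is hypothetical: the function name and the returned labels are invented for illustration, and the behavior for exactly one match or for no match is inferred from the rest of this section rather than stated in the note.

```python
def relation_action(relation_exists, match_count):
    """Decide what happens to a Data Product Port to Table relation.

    match_count is the number of Table assets in Collibra that match the
    database, schema, and table combination from the manifest.
    """
    if match_count > 1:
        # Ambiguous match: never change anything.
        return "keep existing relation" if relation_exists else "no relation created"
    if match_count == 1:
        # Inferred: a unique match lets Collibra create or update the relation.
        return "create or update relation"
    # Assumption: with no matching Table asset there is nothing to relate.
    return "no relation created"


print(relation_action(relation_exists=True, match_count=2))   # keep existing relation
print(relation_action(relation_exists=False, match_count=2))  # no relation created
print(relation_action(relation_exists=False, match_count=1))  # create or update relation
```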
