About creating and managing a data contract
The data contract is a key component of a data product. It defines the agreement between the data product owner and the data consumers. The data contract specifies the structure, format, service level, quality, and terms of use. It includes a high-level overview of the agreement and a data contract manifest. A data contract manifest is a YAML file that contains the definitions and contents of a data contract. Multiple versions of the manifest can exist. For more information, go to About data contracts.
Available tools to manage data contracts
To manage data contracts and their details, Collibra offers a set of tools. These tools are optimized for the Open Data Contract Standard (ODCS), an open-source framework that describes what is expected in a data contract manifest file.
Collibra provides:
- In-product features, available from the user interface (UI), to create, manage, and view the data contracts and manifest files.
- CLIs and API calls to create, manage, and integrate data contracts and manifest files.
Actions to manage data contracts
Collibra offers multiple options to manage data contracts. You can:
- Create and update a Data Contract.
- Generate a manifest file based on the Collibra data.
  If you generate a manifest file, the process creates an Open Data Contract Standard manifest file based on Collibra knowledge graph information. For information on the mapping, go to Data Contract mappings.
- Upload a new version of the manifest file.
- Download a manifest file.
- Set a manifest file as the active one.
- Delete a manifest file.
- Apply a manifest file to Collibra data.
  This action is available from the UI and API.
  If you apply a data contract manifest version to Collibra, the process:
  - Creates and updates relations between Data Product Port and Table assets based on the information in the active manifest version. For information on the mapping, go to Relations mappings.
  - Updates out-of-the-box SLA (Service Level Agreement) attributes on the Data Contract asset in Collibra directly based on the information in the active manifest version. For information on the mapping, go to SLA mappings.
Data Contract mappings
SLA mappings
The following out-of-the-box SLA (Service Level Agreement) asset attributes are mapped to the manifest file. These attributes are used to generate a manifest file and are updated with manifest data when you apply a manifest file.
| Collibra asset attribute | Manifest property |
|---|---|
| Backup Frequency | backupFrequency |
| Latency | latency |
| Most Recent Record Date | mostRecentRecordDate |
| Processing Frequency | processingFrequency |
| Processing Method | processingMethod |
| Recency | recency |
| Recovery Point | recoveryPoint |
| Recovery Time | recoveryTime |
| Response Time | responseTime |
| Retention Period | retentionPeriod |
| Support Availability | supportAvailability |
| Unlimited Retention | isRetentionUnlimited |
| Uptime Percentage | uptimePercentage |
When manifest file data is mapped to Collibra, only properties whose names are written in camelCase, exactly as listed in the table above, are mapped to Data Contract asset attributes. If the manifest property includes a unit, the value and the unit are combined in the Data Contract attribute.
    slaProperties:
      - property: retentionPeriod
        value: 5
        unit: months
Variations of the property name in the manifest, such as retentionperiod or retention_period, aren't mapped.
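The matching rule above can be illustrated with a minimal Python sketch. It is not Collibra's implementation: the function name `map_sla_properties` and the input shape (a list of dictionaries parsed from the `slaProperties` section) are hypothetical, and joining the value and the unit with a space is an assumption, because the exact combined format isn't stated here.

```python
# Manifest property -> Collibra attribute name, copied from the SLA mappings table.
SLA_ATTRIBUTES = {
    "backupFrequency": "Backup Frequency",
    "latency": "Latency",
    "mostRecentRecordDate": "Most Recent Record Date",
    "processingFrequency": "Processing Frequency",
    "processingMethod": "Processing Method",
    "recency": "Recency",
    "recoveryPoint": "Recovery Point",
    "recoveryTime": "Recovery Time",
    "responseTime": "Response Time",
    "retentionPeriod": "Retention Period",
    "supportAvailability": "Support Availability",
    "isRetentionUnlimited": "Unlimited Retention",
    "uptimePercentage": "Uptime Percentage",
}


def map_sla_properties(sla_properties):
    """Return {Collibra attribute name: value} for recognized camelCase properties."""
    attributes = {}
    for entry in sla_properties:
        name = entry.get("property")
        # Exact, case-sensitive match: retentionperiod or retention_period is skipped.
        if name not in SLA_ATTRIBUTES:
            continue
        value = str(entry.get("value"))
        unit = entry.get("unit")
        # Assumption: value and unit are joined with a single space.
        attributes[SLA_ATTRIBUTES[name]] = f"{value} {unit}" if unit else value
    return attributes


manifest_sla = [
    {"property": "retentionPeriod", "value": 5, "unit": "months"},
    {"property": "retention_period", "value": 7, "unit": "years"},  # not mapped
    {"property": "latency", "value": 4, "unit": "d"},
]
print(map_sla_properties(manifest_sla))
```

In this sketch, only the retentionPeriod and latency entries produce an attribute; the retention_period entry is ignored because its name doesn't match a listed property exactly.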
Servers information mapping from Collibra to the Data Contract manifest
When you generate a data contract manifest file, Collibra includes a Servers section for the following data sources: BigQuery, PostgreSQL, Oracle, and Snowflake. The Server information includes fields such as server, type, database, and schema. By default, this information is retrieved from the Edge Connection string if the JDBC data source is registered through Edge. If this isn't available, it is collected from the Database and Schema assets in Collibra.
Generic fields
| Manifest property | Collibra asset attribute |
|---|---|
| id | Database asset UUID |
| server | Edge Connection name or Database asset displayName |
| type | This value is calculated by Collibra and refers to the data source, for example snowflake. |
| description | Edge Connection description |
| host | Edge Connection string, properties (host, server) |
| port | Edge Connection string, properties (port) |
Specific data source fields
| Data source | Manifest property | Collibra asset attribute |
|---|---|---|
| PostgreSQL | database | Edge Connection string, properties (database) |
| PostgreSQL | schema | Schema asset displayName |
| Oracle | serviceName | Edge Connection string, properties (servicename) |
| Oracle | schema | Schema asset displayName |
| Snowflake | database | Edge Connection string, properties (database) |
| Snowflake | account | Edge Connection string, properties (account) |
| Snowflake | warehouse | Edge Connection string, properties (database) |
| Snowflake | schema | Schema asset displayName |
| BigQuery | project | Edge Connection string, properties (projectid) |
| BigQuery | dataset | Edge Connection string, properties (projectid) |
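As an illustration of how the generic and PostgreSQL-specific fields could be combined into one Servers entry, here is a minimal Python sketch. It is not Collibra's generation logic. The function name `build_postgres_server`, the dictionary shapes for the Edge connection and the assets, the `postgresql` type value, and the fallback of `database` to the Database asset name when no Edge property exists are all assumptions made for the example.

```python
def build_postgres_server(database_asset, schema_asset, edge_connection=None):
    """Assemble one Servers entry for a PostgreSQL source (hypothetical input shapes)."""
    edge = edge_connection or {}
    props = edge.get("properties", {})
    entry = {
        "id": database_asset["uuid"],
        # Edge Connection name, otherwise the Database asset displayName.
        "server": edge.get("name") or database_asset["displayName"],
        "type": "postgresql",  # assumed type value for this data source
        "description": edge.get("description"),
        # The generic fields table lists two candidate properties for host.
        "host": props.get("host") or props.get("server"),
        "port": props.get("port"),
        # Assumption: fall back to the Database asset when Edge has no database property.
        "database": props.get("database") or database_asset["displayName"],
        "schema": schema_asset["displayName"],
    }
    # Leave out fields for which no source was available.
    return {key: value for key, value in entry.items() if value is not None}


database_asset = {"uuid": "0000-example-uuid", "displayName": "sales_db"}
schema_asset = {"displayName": "public"}
edge_connection = {
    "name": "pg-prod",
    "description": "Production PostgreSQL",
    "properties": {"host": "db.example.com", "port": "5432", "database": "sales"},
}

print(build_postgres_server(database_asset, schema_asset, edge_connection))
print(build_postgres_server(database_asset, schema_asset))
```

The second call shows the fallback case: without an Edge connection, the entry only contains what the Database and Schema assets can supply.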
Relations mapping from the Data Contract manifest to Collibra
When you apply a manifest file to Collibra, Collibra can create and update relations between Data Product Port and Table assets. To do this, Collibra collects data from the manifest file and finds matching assets in Collibra. To find a matching asset, the process determines the schema and database name using the following priority order:
- physicalName
- Custom properties
- Servers section
The process uses the first source in the manifest file in which the required information is found. If the information isn't available in one source, the process moves to the next.
- physicalName
  - If the physicalName contains 3 parts (database, schema, and table), the process uses this information to create the relation. If the physicalName contains only the table name, the process moves to the custom properties to resolve the schema and database name.
- Custom properties
  - If the physicalName doesn't contain the schema or database name, the process uses the custom properties schemaName and databaseName. Both must be defined for the relation to be created. If only one is defined, the relation isn't created.
- Servers section
  - If neither the physicalName nor the custom properties contain the schema or database name, Collibra uses the Servers section, if available.
  - The following fields are supported:
    - Schema: Defined as schema or dataset. The schema field takes priority over dataset.
    - Database: Defined as database, catalog, or project. The database field takes priority over catalog, which takes priority over project.
- If a relation to a Table asset already exists and the database, schema, and table combination is found more than once in Collibra, the existing relation remains unchanged.
- If no relation exists and the database, schema, and table combination is found more than once, the relation isn't created.
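The resolution order described above can be sketched in Python as follows. This is an illustration only, not Collibra's code. The function name `resolve_table_location` is hypothetical, and several details are assumptions: the physicalName parts are separated by dots, custom properties are a list of `property`/`value` pairs, a single defined custom property stops the process without consulting the Servers section, and the first server entry that supplies both names is used.

```python
def resolve_table_location(schema_object, servers=()):
    """Return (database, schema, table), or None if the names can't be resolved."""
    parts = schema_object.get("physicalName", "").split(".")

    # 1. physicalName with three parts: database, schema, and table.
    if len(parts) == 3 and all(parts):
        database, schema, table = parts
        return database, schema, table

    table = parts[-1]
    if not table:
        return None

    # 2. Custom properties schemaName and databaseName; both must be defined.
    custom = {
        item.get("property"): item.get("value")
        for item in schema_object.get("customProperties", [])
    }
    schema, database = custom.get("schemaName"), custom.get("databaseName")
    if schema and database:
        return database, schema, table
    if schema or database:
        # Only one of the two is defined: the relation isn't created.
        return None

    # 3. Servers section: schema before dataset; database before catalog before project.
    for server in servers:
        schema = server.get("schema") or server.get("dataset")
        database = server.get("database") or server.get("catalog") or server.get("project")
        if schema and database:
            return database, schema, table
    return None


print(resolve_table_location({"physicalName": "sales.public.orders"}))
print(resolve_table_location({"physicalName": "orders"},
                             servers=[{"dataset": "analytics", "project": "acme"}]))
```

The returned triple is what a lookup for the matching Table asset would use. The handling of combinations that are found more than once, as described in the last two bullets, would follow that lookup and isn't part of this sketch.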
Related topics
- Steps overview: Creating and managing data contracts from Collibra
- Creating and maintaining data contracts and their manifest files through CLI and API
- Viewing all data contracts via the Data Contract Registry
- About data products
- Data product asset types and operating model
Helpful resources
- Video: What is a data contract?
- Collibra University: Scaling trust with Data Contracts
- Blog post: Accelerating trusted data product delivery with data contracts