About data products and data contracts
About data products
A data product is a reusable package that provides data to answer a business question or solve a specific business problem. It includes everything you need to understand, access, and use the data, which makes the data actionable and ready to support business decisions. A data product is secure, easy to use, and designed for anyone, including those who are not domain experts.
A data product includes not only the data elements but also context about the data and how to access it. It consists of 4 main components: Context, Data, Controls, and Access.
| Component | Description |
| --- | --- |
| Context | The context includes background information, such as why the data product was created, who owns it, and details related to quality and privacy. |
| Data | The data can refer to a table, view, or a business asset such as a report or a model. |
| Controls | The controls are the policies and quality checks related to the data. |
| Access | The access information includes details on how to access the data and the policies that govern access. |
In Collibra, data products are assets with asset type Data Product. The asset provides information about the 4 main components.
For more information, go to Using data products.
Example HR data is collected in various systems, such as a company's Human Resource system, employee surveys, and learning management systems. Analysts can struggle to locate the data they need for analysis and decision-making because the data is spread across multiple sources. By creating an HR Analytics data product, all relevant HR data points, such as turnover rates, employee satisfaction, and hiring efficiency, are consolidated into a single entry point. The data product helps analysts understand what data is included, the purpose of the metrics, and how to access the data. As a result, the data product simplifies the analysis process and supports data-driven decision-making in HR.
Data products are created and built based on specific asset types, relations, and attributes. Community workflows are also available to support the request, creation, and building of data products.
For more information about the out-of-the-box model, go to Data product asset types and operating model. For information about available workflows, go to Configuring and building data products.
About data contracts and manifests
An essential element of data products is the data contract. A data contract describes the structure, format, service level, quality, and terms of use of the data involved. Data engineers can create multiple versions of a data contract, which are referred to as data contract manifests.
- A data contract is a stable, governed asset in Collibra representing the formal agreement on the structure and semantics of data exchanged between systems.
- A data contract manifest is a precise YAML file that is used to define and store the details of that data contract. In essence, the manifest is a component that helps bring the data contract to life, allowing for its creation, validation, and deployment. Collibra offers multiple ways to generate and maintain data contract manifests.
The following overview describes the differences between a data contract and a data contract manifest in more detail.
- Data contract
  - The data contract is the governed entity within Collibra that represents the formal commitment.
  - It is an asset type that defines the commitments the data product owner makes to consumers regarding the structure, format, service level, and quality.
  - It is centrally registered in the Data Product Catalog domain and linked to a Data Product Port.
  - The Data Contract asset maintains its history, allowing users to explore historical versions of the contract manifest via the dedicated Contract Manifest tab. It is also tracked in the Data Contract Registry, acting as a centralized source of truth for all contracts.
  - When a consumer views the Data Product asset, they access the contents of the Data Contract through a dedicated tab in the output ports viewer widget.
- Data contract manifest
  - The Data Contract Manifest is the physical, machine-readable file that contains the instructions and specifications for the data contract.
  - It is the YAML file that outlines the detailed sections, such as the fundamentals (description, status, version), schema validation (tables, columns), data quality information, and service level agreements (SLAs).
  - It is a file that Data Engineers or Data Producers use in their development workflows. They can use CLI or API calls to initialize the Data Contract asset and upload a new manifest to register a new version of the Data Contract.
  - If the manifest adheres to the Open Data Contract Standard (ODCS), it contains a unique id (the Manifest ID). This ID allows the platform to automatically identify which Data Contract asset should be updated when a new manifest file is pushed.
  - The format of the uploaded manifest defines how the contract appears in Collibra. If it adheres to the Open Data Contract Standard, the information is shown in a structured way; otherwise, it is shown as code only.
  - Collibra can generate a manifest file based on the available Data Product data.
Example The contract can outline service-level objectives (SLOs) related to system uptime and latency for a data product. It can also include details about pipelines or data delivery mechanisms and provide information about the data, such as schema and expected quality metrics.
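As a rough illustration of how a Manifest ID could tie successive manifest versions to a single Data Contract asset, the following Python sketch represents a parsed manifest as a dictionary. It is a simplified model under stated assumptions, not Collibra's implementation: the key names follow the Open Data Contract Standard as understood here and should be checked against the specification, the example values are made up, and `DataContractAsset`, `is_odcs_like`, and `push_manifest` are hypothetical names that are not part of any Collibra CLI or API.

```python
from dataclasses import dataclass, field

# Hypothetical manifest content, shown as the dict a YAML parser would produce.
# Key names are assumed from ODCS; verify them against the specification.
manifest_v1 = {
    "apiVersion": "v3.0.0",
    "kind": "DataContract",
    "id": "hr-analytics-contract",  # the Manifest ID (made-up value)
    "version": "1.0.0",
    "status": "active",
    "description": {"purpose": "HR turnover and satisfaction metrics"},
    "schema": [
        {
            "name": "employee_turnover",
            "properties": [
                {"name": "department", "logicalType": "string"},
                {"name": "turnover_rate", "logicalType": "number"},
            ],
        }
    ],
    "slaProperties": [{"property": "latency", "value": 4, "unit": "d"}],
}


@dataclass
class DataContractAsset:
    """Hypothetical stand-in for a governed Data Contract asset."""
    manifest_id: str
    versions: list = field(default_factory=list)  # manifests, oldest first


def is_odcs_like(manifest: dict) -> bool:
    # Simplified check (assumption): an ODCS manifest declares its kind and an id.
    return manifest.get("kind") == "DataContract" and "id" in manifest


def push_manifest(registry: dict, manifest: dict) -> DataContractAsset:
    """Attach the manifest to the asset that shares its Manifest ID."""
    if not is_odcs_like(manifest):
        # Assumption: without a Manifest ID the target cannot be resolved automatically.
        raise ValueError("no Manifest ID; the target contract must be named explicitly")
    asset = registry.setdefault(manifest["id"], DataContractAsset(manifest["id"]))
    asset.versions.append(manifest)
    return asset


registry = {}
push_manifest(registry, manifest_v1)
manifest_v2 = {**manifest_v1, "version": "1.1.0"}  # same id, new version
asset = push_manifest(registry, manifest_v2)
print([m["version"] for m in asset.versions])  # ['1.0.0', '1.1.0']
```

In this sketch, pushing a second manifest with the same `id` adds a version to the existing asset instead of creating a new one, which is the idea behind resolving the target contract from the Manifest ID.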
For more information about creating and managing data contracts, go to About data contract creation and maintenance. For information about using data contracts as a data consumer, go to Using data products.
Related topics
- Data product asset types and operating model
- Set up your Collibra environment for data products and data contracts
- Configuring and building data products
- About data contract creation and maintenance
- Using data products
Helpful resources
- Video: What's a data product?
- Video: What is a data contract?
- Data Product learning path on Collibra University