Data product basics
What is a data product?
A data product is a reusable package that provides data to answer a business question or solve a specific business problem. It includes everything you need to understand, access, and use the data. This makes it actionable and ready to support business decisions. A data product is secure, easy to use, and designed for anyone, including those who are not domain experts.
A data product includes not only the data elements but also context about the data and how to access it. It consists of 3 main components: Context, Data, and Access.
| Component | Description |
| --- | --- |
| Context | The context includes background information, such as why the data product was created, who owns it, and details related to quality and privacy. |
| Data | The data can refer to a table, view, or a business asset such as a report or a model. |
| Access | The access information includes details on how to access the data and the policies that govern access. |
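As an illustration only, the three components can be pictured as one record with a context part, a data part, and an access part. The following Python sketch is not a Collibra API; the class names, field names, and example values are all hypothetical and only mirror the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    """Background information (hypothetical fields)."""
    purpose: str             # why the data product was created
    owner: str               # who owns it
    quality_notes: str = ""  # details related to quality
    privacy_notes: str = ""  # details related to privacy


@dataclass
class Access:
    """How to reach the data and which policies govern access."""
    instructions: str
    policies: List[str] = field(default_factory=list)


@dataclass
class DataProductDescriptor:
    """One record that bundles Context, Data, and Access."""
    name: str
    context: Context
    data: List[str]  # names of tables, views, reports, or models
    access: Access

    def is_complete(self) -> bool:
        # Treat the descriptor as usable only when all three components are filled in.
        return bool(
            self.context.purpose
            and self.context.owner
            and self.data
            and self.access.instructions
        )


# Hypothetical example values, for illustration only.
example = DataProductDescriptor(
    name="Customer churn overview",
    context=Context(purpose="Track monthly churn", owner="Customer analytics team"),
    data=["churn_summary_view"],
    access=Access(instructions="Request access from the owner", policies=["Internal use only"]),
)
```

Calling `example.is_complete()` checks that none of the three components was left empty, which reflects the idea that a data product is only actionable when context, data, and access are all present.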
What's needed to create a data product?
Data product asset types and operating model
Data products use specific operating model asset types, relations, and attributes.
Tip: Using the specific data product operating model allows you to scale as Collibra data product features grow over time.
- Asset types
  - Data Product
    A data product is a reusable package that provides data to answer a business question or solve a specific business problem.
  - Data Product Output Port
    A data product output port represents the way the output of a data product is exposed to a data consumer. A data product can have multiple output ports, each representing a different way of exposing the data.
  - Data Contract
    A data contract defines the commitments a data product owner makes to data consumers. The data contract documents the structure, format, service level, quality, and terms of use.
    Example: The contract can outline service-level objectives (SLOs) related to system uptime and latency for a data product. It can also include details about pipelines or data delivery mechanisms and provide information about the data, such as schema and expected quality metrics.
  - Data Product Input Port
    A data product input port is a functional representation of the source data that is used by a data product. A data product can use multiple data sources, each represented by an input port. One input port can be related to multiple assets in the data source, such as multiple tables in the same database.
- Asset type groups
  - Data Product Input: The physical input for a data product.
  - Data Product Output: The physical output that is generated by a data product.
- Relations
  - Data Product exposes data as / is output port for Data Product Output Port.
  - Data Product Output Port is implemented as / implements Data Product Output.
  - Data Contract governs functioning of / should operate according to Data Product Output Port.
  - Data Contract information to be provided / is mentioned in the terms of Data Attribute.
  - Data Product Input Port is input port for / consumes data through Data Product.
  - Data Product Input implements / is implemented as Data Product Input Port.
- Data attributes
  Some specific attributes have been defined, such as:
  - Data product category: Defines whether a data product is targeting business users or more technical users. The out-of-the-box values are Derived (for business users) and Foundational (for more technical users).
    This attribute is available for Data Product assets.
  - Target delivery date: Indicates the date when the requested resource should be made available.
    This attribute is available for Data Product assets.
  - Access method: Indicates the method that can be used to access the data.
    This attribute is available for Data Product Output Port assets.
    Tip: This attribute can be used in workflows to create output port assets per access method.
  - Access instructions: Provides instructions on how to access the data.
    This attribute is available for Data Product Output Port assets.
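To make the shape of the operating model easier to see, the relations and attributes listed above can be written down as plain data. This is a minimal sketch, not Collibra's API or export format: the `RelationType` class, the lookup helpers, and the variable names are hypothetical, and only the asset type, role, and attribute names are taken from the lists above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RelationType:
    """One relation, read as: head <role> tail, or tail <co_role> head."""
    head: str
    role: str
    co_role: str
    tail: str


# The six relations listed above, transcribed as (head, role, co-role, tail).
RELATION_TYPES = [
    RelationType("Data Product", "exposes data as", "is output port for", "Data Product Output Port"),
    RelationType("Data Product Output Port", "is implemented as", "implements", "Data Product Output"),
    RelationType("Data Contract", "governs functioning of", "should operate according to", "Data Product Output Port"),
    RelationType("Data Contract", "information to be provided", "is mentioned in the terms of", "Data Attribute"),
    RelationType("Data Product Input Port", "is input port for", "consumes data through", "Data Product"),
    RelationType("Data Product Input", "implements", "is implemented as", "Data Product Input Port"),
]

# Which asset type each listed attribute is available for.
ATTRIBUTE_ASSET_TYPES = {
    "Data product category": "Data Product",
    "Target delivery date": "Data Product",
    "Access method": "Data Product Output Port",
    "Access instructions": "Data Product Output Port",
}


def relations_for(asset_type: str) -> List[RelationType]:
    """Return every relation in which the asset type is the head or the tail."""
    return [r for r in RELATION_TYPES if asset_type in (r.head, r.tail)]


def attributes_for(asset_type: str) -> List[str]:
    """Return the listed attributes that are available for the asset type."""
    return [name for name, owner in ATTRIBUTE_ASSET_TYPES.items() if owner == asset_type]
```

For example, `relations_for("Data Product Output Port")` returns the three relations that connect an output port to its data product, its physical output, and its data contract, which shows why the output port is the central asset type in this model.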
Data product life cycle and workflows
The detailed life cycle of a data product can look like this:
- Data products can be created beforehand or requested by consumers. (Request)
- The data product is built and published.
  - The requirements are defined. (Discover)
  - The output port details are refined. (Discover)
  - The sources are prepared. (Build)
  - The data product is published and made available as curated data. (Publish)
- The requestor or any data consumer can find and use the data product. (Access)
- The data product is then monitored and updated when needed. (Monitor)
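The phase labels in parentheses above (Request, Discover, Build, Publish, Access, Monitor) can be treated as an ordered sequence. The sketch below is one possible way to model that order in Python. It assumes a strictly linear progression, which the list suggests but does not state, and the `Phase` enum and `next_phase` helper are hypothetical names, not part of Collibra or its community workflows.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Life cycle phases, declared in the order they appear in the list above."""
    REQUEST = "Request"
    DISCOVER = "Discover"
    BUILD = "Build"
    PUBLISH = "Publish"
    ACCESS = "Access"
    MONITOR = "Monitor"


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the last phase.

    Assumption: phases advance one step at a time in declaration order.
    """
    phases = list(Phase)  # Enum iteration preserves declaration order
    index = phases.index(current)
    if index + 1 < len(phases):
        return phases[index + 1]
    return None
```

A workflow could use such a helper to decide which step to offer next. Because Monitor is described as "updated when needed", a real implementation might loop back to an earlier phase instead of ending there.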
Workflows can help with requesting a data product, creating it, and requesting access to it.
Tip: You can find some data product-specific workflows in Collibra Marketplace.
These workflows are published as community offerings and include dedicated documentation.
The following example is based on the community workflows: