Reference data lifecycle

Reference data is relatively easy to govern because its values are predictable. Code sets are often related to assets in the Business Glossary application.

The process of managing reference data in Collibra Platform generally involves the following phases.

1. Create

Gather all existing reference data content, analyze it, and enter the relevant parts into Collibra Platform as Code Set and Code Value assets. We recommend that you use a separate Codelist domain for each code set.

Tip: You can create the assets manually, but it is usually easier to use the import functionality, which lets you create thousands of assets at once.
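If you use the import functionality, you first prepare a file with one row per asset. The following Python sketch builds such a CSV file for one code set. The column names (Name, Asset Type, Domain, Status, Parent Code Set) and the example domain name are hypothetical; the actual layout depends on the import template that your Collibra Platform version expects.

```python
from __future__ import annotations

import csv
import io

# Hypothetical column layout; adapt it to the import template of your
# Collibra Platform version.
FIELDS = ["Name", "Asset Type", "Domain", "Status", "Parent Code Set"]


def build_import_rows(code_set: str, domain: str, values: list[str]) -> list[dict]:
    """Return one row for the Code Set asset and one row per Code Value asset.

    All rows start with the Candidate status. Each Code Value row names its
    parent code set, from which the "is part of" relation would be created.
    """
    rows = [{"Name": code_set, "Asset Type": "Code Set", "Domain": domain,
             "Status": "Candidate", "Parent Code Set": ""}]
    for value in values:
        rows.append({"Name": value, "Asset Type": "Code Value", "Domain": domain,
                     "Status": "Candidate", "Parent Code Set": code_set})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = build_import_rows("Country Codes", "Country Codelist", ["BE", "FR", "NL"])
    print(to_csv(rows))
```

Generating the file from a script keeps the Code Set row and its Code Value rows consistent, which is harder to guarantee when thousands of rows are edited by hand.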

To describe the code set completely, you can create relations between the Code Set and Code Value assets, as well as other relevant assets.

  • Code Value is part of / contains Code Set (head: Code Value assets; tail: Code Set asset). Relations of this type link the Code Value assets to the corresponding Code Set asset.
  • Business Term has code / is code for Code Value (head: Business Term asset; tail: Code Value asset). Relations of this type link Business Term assets to Code Value assets to provide more information about the meaning of the Code Value asset.
  • Data Element allowed value set / applies to Code Set (head: Column asset; tail: Code Set asset). Relations of this type describe which code set restricts the possible values of a column.
  • Data Element allowed value / allowed value for Code Value (head: Column asset; tail: Code Value asset). Relations of this type describe the actual code values that are used in a column.

The outcome consists of Code Set and Code Value assets, organized in separate Codelist domains. At this stage, the assets can already have relations to other assets, but they still have the Candidate status.
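The relation types between columns, code sets, and code values also make a simple consistency check possible. As an illustration, the following sketch models only the "contains" relation and the two "allowed value" relations with plain Python classes, and reports any allowed value of a column that is not part of the column's allowed value set. The classes and the check are assumptions made for this example, not part of Collibra Platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeSet:
    name: str
    # Code Set "contains" Code Value: names of the member code values.
    values: set[str] = field(default_factory=set)


@dataclass
class Column:
    name: str
    # Data Element "allowed value set" Code Set.
    allowed_value_set: CodeSet | None = None
    # Data Element "allowed value" Code Value.
    allowed_values: set[str] = field(default_factory=set)


def values_outside_set(column: Column) -> set[str]:
    """Return the column's allowed values that are not in its allowed value set.

    If no allowed value set is linked, every allowed value is reported.
    """
    if column.allowed_value_set is None:
        return set(column.allowed_values)
    return column.allowed_values - column.allowed_value_set.values


if __name__ == "__main__":
    countries = CodeSet("Country Codes", {"BE", "FR", "NL"})
    column = Column("CUSTOMER.COUNTRY", countries, {"BE", "FR", "XX"})
    print(values_outside_set(column))  # {'XX'}
```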

2. Complete

Create responsibilities by assigning users or user groups to roles for the respective Codelist domains:

  • Data Stewards refine the bulk-imported assets and prepare them for review. They also hold the ultimate decision-making authority in the approval process.
  • Subject Matter Experts review the correctness of the assets.
  • Stakeholders comment on the assets and validate their correctness.

Use the Approval and Simple Approval workflows to update and approve the Code Set and Code Value assets.

The outcome consists of Code Set and Code Value assets with the Approved status.
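The division of responsibilities can be summarized as a simplified decision rule. In the following sketch, the Subject Matter Expert reviews only produce a recommendation, and the final status is set by the Data Steward's decision; until that decision is made, the asset stays a Candidate. Stakeholder comments are not modeled. The Rejected status and the rule itself are hypothetical simplifications; the actual Approval and Simple Approval workflows define their own steps.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    CANDIDATE = "Candidate"
    APPROVED = "Approved"
    REJECTED = "Rejected"  # hypothetical outcome status for this sketch


def sme_recommendation(reviews: list[bool]) -> bool | None:
    """Recommend approval only if every Subject Matter Expert confirms correctness.

    Returns None when no reviews have been submitted yet.
    """
    if not reviews:
        return None
    return all(reviews)


def decide(status: Status, steward_decision: bool | None) -> Status:
    """Apply the Data Steward's decision, which is final."""
    if status is not Status.CANDIDATE:
        raise ValueError(f"only Candidate assets enter approval, got {status.value}")
    if steward_decision is None:
        # No decision yet: the asset remains in review.
        return Status.CANDIDATE
    return Status.APPROVED if steward_decision else Status.REJECTED


if __name__ == "__main__":
    recommendation = sme_recommendation([True, True, False])
    print(recommendation)  # False
    # Here the Data Steward follows the recommendation; they could also overrule it.
    print(decide(Status.CANDIDATE, recommendation))  # Status.REJECTED
```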

3. Map

Data Stewards map corresponding code values by creating Crosswalk assets between the related Code Value assets. A Crosswalk asset can have additional attributes that describe the transformation logic, which is often hidden or implicit.

Like the Code Set and Code Value assets, Crosswalk assets start with the Candidate status, so they should also be reviewed and approved by using the Approval and Simple Approval workflows.
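A crosswalk can be thought of as a lookup from a code value in one code set to the corresponding code value in another. The following sketch assumes a hypothetical Crosswalk record with a free-text logic attribute, maps ISO 3166-1 alpha-2 country codes to alpha-3 codes, and applies only crosswalks that have the Approved status.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Crosswalk:
    source: str            # Code Value in the source code set
    target: str            # corresponding Code Value in the target code set
    logic: str = ""        # hypothetical attribute documenting the transformation
    status: str = "Candidate"


def translate(value: str, crosswalks: list[Crosswalk]) -> str | None:
    """Return the target code for value, using only Approved crosswalks."""
    for crosswalk in crosswalks:
        if crosswalk.source == value and crosswalk.status == "Approved":
            return crosswalk.target
    return None


if __name__ == "__main__":
    rule = "ISO 3166-1 alpha-2 to alpha-3"
    crosswalks = [
        Crosswalk("BE", "BEL", rule, status="Approved"),
        Crosswalk("FR", "FRA", rule),  # still Candidate, so not applied yet
    ]
    print(translate("BE", crosswalks))  # BEL
    print(translate("FR", crosswalks))  # None
```

Ignoring Candidate crosswalks mirrors the review step above: a mapping is only used downstream after it has been approved.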

4. Publish and trace

After you have created the required assets and added the required relations, you can use diagrams to trace the lineage.

The approved code values can also be provided to the business users in different ways:

  • You can export them to an XLSX or CSV file by using Collibra workflows. However, the file is attached to a community or domain in Collibra, so business users must retrieve it from there.
  • You can use the Collibra APIs to pull information from Collibra via external orchestrators, ETL tools, or programming languages.
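As an example of pulling code values through the Collibra APIs, the following Python sketch pages through the assets of one Codelist domain and yields their names and statuses. It assumes the Core REST API v2 endpoint /rest/2.0/assets with the domainId, offset, and limit query parameters, a JSON response with results and total fields, and that HTTP basic authentication is enabled; verify all of these against the REST API documentation of your Collibra Platform version. The HTTP call is passed in as a function so that you can substitute your own authentication.

```python
from __future__ import annotations

import base64
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def assets_url(base_url: str, domain_id: str, offset: int, limit: int) -> str:
    """Build the URL for one page of assets in a domain (assumed endpoint)."""
    query = urlencode({"domainId": domain_id, "offset": offset, "limit": limit})
    return f"{base_url.rstrip('/')}/rest/2.0/assets?{query}"


def basic_auth_fetcher(username: str, password: str) -> Callable[[str], dict]:
    """Return a GET function that sends HTTP basic authentication (assumed enabled)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()

    def get_json(url: str) -> dict:
        request = Request(url, headers={"Authorization": f"Basic {token}",
                                        "Accept": "application/json"})
        with urlopen(request) as response:
            return json.load(response)

    return get_json


def iter_code_values(base_url: str, domain_id: str,
                     get_json: Callable[[str], dict],
                     limit: int = 100) -> Iterator[tuple[str, str]]:
    """Yield (name, status name) for every asset in the domain, page by page."""
    offset = 0
    while True:
        page = get_json(assets_url(base_url, domain_id, offset, limit))
        results = page.get("results", [])
        for asset in results:
            yield asset["name"], asset.get("status", {}).get("name", "")
        offset += len(results)
        # Stop on an empty page or once all assets have been read.
        if not results or offset >= page.get("total", 0):
            break
```

Filtering the yielded pairs on the Approved status returns only the approved code values, which an orchestrator or ETL job can then load into the target system.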

To indicate that code sets are published, you can create a custom status, for example, Published, and assign it to the published Code Set assets.

5. Use and maintain

Finally, the business users use the published code sets in their own applications, for example, in reporting software.

Over time, users typically find inconsistencies or gaps in the code sets. They can report these issues, which starts a workflow to resolve them.
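On the consumer side, inconsistencies often surface as values that do not appear in the published code set. The following sketch, with hypothetical function and field names, compares observed values against a published code set and summarizes the unknown values so that they can be reported as issues. How you submit the issues, and which workflow they start, depends on your Collibra setup.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def find_code_set_issues(observed: Iterable[str], published: set[str]) -> list[dict]:
    """Summarize observed values that are missing from a published code set.

    Returns one hypothetical issue record per unknown value, most frequent first.
    """
    unknown = Counter(value for value in observed if value not in published)
    return [
        {"value": value, "occurrences": count,
         "summary": f"Value '{value}' is not in the published code set"}
        for value, count in unknown.most_common()
    ]


if __name__ == "__main__":
    published = {"BE", "FR", "NL"}
    for issue in find_code_set_issues(["BE", "XX", "FR", "XX", "YY"], published):
        print(issue["summary"], f"({issue['occurrences']}x)")
```

Grouping and counting the unknown values first means that one issue is reported per missing code rather than one per affected record.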