Reference data lifecycle

Reference data is relatively easy to govern because it consists of predictable data. Very often, the code sets are related to assets in the Business Glossary application.

Typically, managing reference data in Collibra Data Intelligence Cloud consists of the following phases:

Phase

Description

1. Creation

Gather all existing reference data content, analyze it and enter the relevant parts in Collibra Data Intelligence Cloud as Code Set and Code Value assets. We recommend that you use a specific Codelist domain for each code set.

Tip: You can create the assets manually, but it is usually easier to use the import functionality to enter thousands of assets in one go.
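As an illustration of preparing such a bulk import, the following minimal sketch builds a CSV file with one row for the Code Set asset and one row per Code Value asset. The column names, the example code set and the domain name are assumptions made for this example; adapt them to the headers that your import mapping expects.

```python
import csv
import io

# Hypothetical column headers; they are not prescribed by the import functionality.
FIELDS = ["Name", "Asset Type", "Domain", "Status"]


def build_import_rows(code_set, code_values, domain):
    """Return one row for the code set, followed by one row per code value."""
    rows = [{"Name": code_set, "Asset Type": "Code Set",
             "Domain": domain, "Status": "Candidate"}]
    for value in code_values:
        rows.append({"Name": value, "Asset Type": "Code Value",
                     "Domain": domain, "Status": "Candidate"})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example data: a hypothetical country code set in its own Codelist domain.
rows = build_import_rows("ISO Country Codes", ["BE", "FR", "US"], "Country Codelist")
csv_text = to_csv(rows)
print(csv_text)
```

All rows start with the Candidate status, which matches the outcome of the creation phase.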

To fully describe the code set, you can create relations between the Code Set and Code Value assets on the one hand, and other relevant assets on the other:

Code Value is part of / contains Code Set

  • Head assets: Code Value assets
  • Tail assets: Code Set asset
  • Description: Relations of this type link the Code Value assets to the corresponding Code Set asset.

Business Term has code / is code for Code Value

  • Head assets: Business Term asset
  • Tail assets: Code Value asset
  • Description: Relations of this type link Business Term assets to Code Value assets to provide more information about the meaning of the Code Value asset.

Data Element allowed value set / applies to Code Set

  • Head assets: Column asset
  • Tail assets: Code Set asset
  • Description: Relations of this type describe which code set is used to restrict the possible values of a column.

Data Element allowed value / allowed value for Code Value

  • Head assets: Column asset
  • Tail assets: Code Value asset
  • Description: Relations of this type describe the actual code values that are used in a column.
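The four relation types above can be thought of as (head, role, tail) triples. The following sketch models them as plain data and checks whether a proposed relation matches one of them. It is only an illustration of the structure: the class and function names are hypothetical, and the code does not use or represent the Collibra data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    head_type: str  # asset type at the head of the relation
    role: str       # role, read from head to tail
    tail_type: str  # asset type at the tail of the relation


# The relation types described above, using the listed head and tail assets.
RELATION_TYPES = [
    RelationType("Code Value", "is part of", "Code Set"),
    RelationType("Business Term", "has code", "Code Value"),
    RelationType("Column", "allowed value set", "Code Set"),
    RelationType("Column", "allowed value", "Code Value"),
]


def is_valid_relation(head_type, role, tail_type):
    """Return True if the triple matches one of the known relation types."""
    return RelationType(head_type, role, tail_type) in RELATION_TYPES


print(is_valid_relation("Code Value", "is part of", "Code Set"))  # head and tail in the right order
print(is_valid_relation("Code Set", "is part of", "Code Value"))  # head and tail swapped
```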

The outcome consists of Code Set and Code Value assets, organized in different Codelist domains. The assets can have relations to other assets and still have the Candidate status.

2. Completion

Create responsibilities by assigning users or user groups to roles for the respective Codelist domains:

  • The Data Stewards improve the bulk import and make it ready for review. They also have the final say in the approval process.
  • The Subject Matter Experts review the correctness of the assets.
  • The Stakeholders comment on the assets and validate their correctness.

Use the Approval and Simple Approval workflows to update and approve the Code Set and Code Value assets.

The outcome consists of Code Set and Code Value assets with the Approved status.

3. Mapping

The Data Stewards map code values by creating crosswalks between corresponding Code Value assets. A Crosswalk asset may have additional attributes to describe the transformation logic. Often, this transformation logic is hidden or implicit.

The Crosswalk assets initially have the Candidate status as well, so they should also be reviewed and approved via the Approval and Simple Approval workflows.
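To make the idea of a crosswalk concrete, the following sketch represents one as a simple lookup from the code values of one code set to the corresponding code values of another. The two-letter and three-letter country codes are example data, and the function name is hypothetical.

```python
# Example crosswalk between two hypothetical code sets:
# two-letter country codes mapped to three-letter country codes.
CROSSWALK = {
    "BE": "BEL",
    "FR": "FRA",
    "US": "USA",
}


def translate(code, crosswalk):
    """Map a source code value to its target code value.

    Raises KeyError with a descriptive message when no mapping exists,
    so that gaps in the crosswalk become visible instead of being hidden.
    """
    if code not in crosswalk:
        raise KeyError(f"No crosswalk defined for code value {code!r}")
    return crosswalk[code]


print(translate("BE", CROSSWALK))
```

Writing the mapping down explicitly like this is what makes otherwise implicit transformation logic reviewable.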

4. Publication and traceability

Once you have created the required assets and added the required relations, you can use diagrams to trace the lineage.

The approved code values can also be provided to the business users in different ways:

  • You can export them in XLSX or CSV format.
  • The Collibra API also offers ways to push approved assets to external applications. You can configure this to take place regularly via custom workflows.
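As a sketch of what preparing such a push could look like, the following example selects the approved code values from a list and serializes them as JSON for an external application. It deliberately does not call the Collibra API. The input records, the field names and the payload shape are all assumptions made for illustration.

```python
import json


def build_payload(assets):
    """Return a JSON document that contains only the approved code values."""
    approved = [a for a in assets if a["status"] == "Approved"]
    payload = {
        "codeValues": [
            {"codeSet": a["code_set"], "value": a["name"]} for a in approved
        ]
    }
    return json.dumps(payload, indent=2)


# Hypothetical asset records, as they might look after an export.
assets = [
    {"name": "BE", "code_set": "ISO Country Codes", "status": "Approved"},
    {"name": "FR", "code_set": "ISO Country Codes", "status": "Candidate"},
    {"name": "US", "code_set": "ISO Country Codes", "status": "Approved"},
]

print(build_payload(assets))
```

Filtering on the status before sending anything ensures that assets that are still Candidate never reach the consuming application.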

To indicate that the code sets are published, you can create a new status, for example Published.
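The status progression described in these phases can be summarized as a small state machine. This is a minimal sketch that assumes a strictly linear path from Candidate to Approved to the example status Published. Real workflows may allow other transitions, for example sending an asset back for rework.

```python
# Assumed linear lifecycle; "Published" is the example custom status.
TRANSITIONS = {
    "Candidate": {"Approved"},
    "Approved": {"Published"},
    "Published": set(),
}


def change_status(current, target):
    """Return the new status, or raise ValueError if the step is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move from {current!r} to {target!r}")
    return target


status = "Candidate"
status = change_status(status, "Approved")   # after the approval workflow
status = change_status(status, "Published")  # after publication
print(status)
```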

5. Use and maintain

Finally, the business users use the published code sets in their own applications, for example in reporting software.

Typically, the code sets will contain inconsistencies or gaps. Users can report these issues, which starts a workflow to fix them.
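One way to detect such issues before reporting them is to compare the values that actually occur in a column with the published code set. The following sketch assumes that both are available as plain Python collections. The function name and the example data are hypothetical.

```python
def find_issues(observed_values, code_set):
    """Compare the values found in a column against the published code set."""
    observed = set(observed_values)
    return {
        # Values used in the data that the code set does not define.
        "unknown": sorted(observed - code_set),
        # Values defined in the code set that never occur in the data.
        "unused": sorted(code_set - observed),
    }


published = {"BE", "FR", "US"}
column_values = ["BE", "FR", "XX", "BE"]

print(find_issues(column_values, published))
```

Unknown values usually point to a missing code value or a data quality problem, while unused values may simply not have occurred yet.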