Reference Data basics
In Collibra 2024.05, we launched a new user interface (UI) for Collibra Platform! You can learn more about this latest UI in the UI overview.
Reference data is data used to classify, categorize, structure, or constrain other data. Typically, it is static or changes slowly over time. Examples include units of measurement and country codes.
Reference data often includes a known code set, which consists of code values that rarely change. As the name suggests, reference data is designed to be referenced by various other data to create a standard vocabulary and structure across diverse systems and data sources. Some examples of reference data are country codes, language codes, product codes, and account identifiers.
The Reference Data product provides a systematic approach to managing reference data, including code sets and code values. For example, you can define relations between Code Set assets and the Column assets for which they provide the allowed values, or between Code Value assets and the Business Assets that they represent. Additionally, you can define complex mappings between them to enable crosswalks from one information system to another, accounting for differences in the code sets over time.
With all the reference data gathered in a single place, you can build an organization-wide understanding of how your data is organized, classified, and collected.
In the following diagram, you can see that the Customer Information table contains the Address Type column, which can only contain code values from the Address Type Values code set.
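The constraint shown in the diagram can be illustrated outside Collibra with a minimal Python sketch. The code values (`HOME`, `WORK`, `BILLING`), the sample rows, and the `find_invalid_rows` helper are hypothetical, and the snippet does not use any Collibra API. It only shows what it means for a column to be restricted to the values of a code set.

```python
# Hypothetical code values for the "Address Type Values" code set.
ADDRESS_TYPE_VALUES = frozenset({"HOME", "WORK", "BILLING"})

# Hypothetical rows from the "Customer Information" table.
customer_information = [
    {"customer_id": 1, "address_type": "HOME"},
    {"customer_id": 2, "address_type": "WORK"},
    {"customer_id": 3, "address_type": "OFFICE"},  # not in the code set
]


def find_invalid_rows(rows, column, code_set):
    """Return the rows whose value in `column` is not an allowed code value."""
    return [row for row in rows if row[column] not in code_set]


invalid = find_invalid_rows(customer_information, "address_type", ADDRESS_TYPE_VALUES)
for row in invalid:
    print(f"Customer {row['customer_id']}: unexpected address type {row['address_type']!r}")
```

Rows flagged this way are the kind of inconsistency that a governed code set is meant to surface.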
How to open Reference Data
You can open Reference Data if you have the Product Rights > Reference Data Manager global permission. To open Reference Data, on the main toolbar, click → Reference Data.
Reference Data tabs
Reference Data contains the following tabs.
Tab | Description |
---|---|
Code Value / Sets | A table with all Code Value and Code Set assets. |
Metrics | A variety of statistics related to how the assets are used. |
Hierarchies | A table with all Hierarchies domains. |
Reference Data lifecycle
Reference data is relatively easy to govern because it concerns predictable data. Often, the code sets are related to the assets in Business Glossary. The process of managing reference data in Collibra Platform generally involves the following phases.
Phase | Description |
---|---|
1. Create | Gather all existing reference data content, analyze it, and enter the relevant parts in Collibra Platform as Code Set and Code Value assets. We recommend that you use a specific Codelist domain for each code set. Tip: You can create the assets manually, but it is usually easier to use the import functionality to enter thousands of assets at the same time. To describe the code set completely, you can create relations between the Code Set and Code Value assets, as well as other relevant assets. The outcome consists of Code Set and Code Value assets, organized in different Codelist domains. The assets can have relations to other assets and still have the Candidate status. |
2. Complete | Create responsibilities by assigning users or user groups to roles for the respective Codelist domains. Use the Approval and Simple Approval workflows to update and approve the Code Set and Code Value assets. The outcome consists of Code Set and Code Value assets with the Approved status. |
3. Map | Data Stewards map code values and define crosswalks between corresponding Code Value assets. A Crosswalk asset may have additional attributes to describe the transformation logic. Often, this transformation logic is hidden or implicit. The Crosswalk assets initially also have the Candidate status. Therefore, they should also be reviewed and approved via the Approval and Simple Approval workflows. |
4. Publish and trace | After you have created the required assets and added the required relations, you can use diagrams to trace the lineage. The approved code values can also be provided to the business users in different ways. To indicate that the code sets are published, you can create a status, for example, Published. |
5. Use and maintain | Finally, the business users use the published code sets in their own applications, for example, in reporting software. Typically, the code sets contain some inconsistencies or gaps. These issues can be reported, which starts a workflow to fix them. |
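
The Map phase can be pictured as a lookup between two code sets. The following Python sketch is illustrative only: the two systems, the `CRM_TO_BILLING` mapping, and the `translate` helper are hypothetical and are not part of Collibra. It models a crosswalk as a dictionary from a source code value to a target code value and lists the source values that have no mapping.

```python
# Hypothetical crosswalk between the country codes of two systems:
# a CRM that stores ISO 3166-1 alpha-2 codes and a billing system
# that stores ISO 3166-1 alpha-3 codes.
CRM_TO_BILLING = {
    "BE": "BEL",
    "FR": "FRA",
    "US": "USA",
}


def translate(code, crosswalk):
    """Return the target code value, or None when the crosswalk has no mapping."""
    return crosswalk.get(code)


source_values = ["BE", "US", "XX"]
translated = {code: translate(code, CRM_TO_BILLING) for code in source_values}

# Source values without a target are gaps in the crosswalk.
unmapped = [code for code, target in translated.items() if target is None]
print("Unmapped source values:", unmapped)
```

Unmapped values such as `XX` are the sort of gap that would typically be reported and fixed during the Use and maintain phase.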