Reference data

Reference data is data that is used to structure and constrain other data. It is typically stable information drawn from a known code set, whose code values rarely change. As the name suggests, reference data is designed to be referenced by a wide variety of other data, which creates a standard vocabulary and structure across diverse systems and data sources.

Example 
  • country codes
  • language codes
  • product codes
  • account identifiers
  • ...

However, not all systems use the same version of a code set for the same type of information. This leads to problems when these systems exchange information.

Example 

The same organization could use the two-character ISO country codes (ISO 3166-1 alpha-2) for its Customer Relationship Management (CRM) system, but the three-character ISO country codes (ISO 3166-1 alpha-3) for its Enterprise Resource Planning (ERP) system.
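
To make the mismatch concrete, the following is a minimal sketch of a crosswalk between two-character and three-character ISO country codes. The dictionary holds only a handful of real ISO 3166-1 pairs for illustration, and the function name crm_to_erp_country is hypothetical; it is not part of any Collibra API.

```python
# Illustrative crosswalk: a few real ISO 3166-1 alpha-2 -> alpha-3 pairs.
# A production code set would contain every country and be versioned.
ALPHA2_TO_ALPHA3 = {
    "BE": "BEL",
    "DE": "DEU",
    "FR": "FRA",
    "US": "USA",
}


def crm_to_erp_country(code: str) -> str:
    """Translate a two-character CRM country code into the ERP's three-character code.

    Hypothetical helper: raises ValueError when the code has no crosswalk entry,
    so that unmapped values are reported instead of silently passed through.
    """
    normalized = code.strip().upper()
    if normalized not in ALPHA2_TO_ALPHA3:
        raise ValueError(f"No crosswalk entry for country code {code!r}")
    return ALPHA2_TO_ALPHA3[normalized]


if __name__ == "__main__":
    print(crm_to_erp_country("be"))  # prints BEL
```

Raising an error for an unknown code, rather than returning the input unchanged, keeps unmapped values visible when the two systems exchange records.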

Besides technical problems, business users may have the following questions:

  • What version of the ISO country codes is used in each database?
  • How does last year's version of the ISO country codes differ from the version currently in use internally?
  • If I cannot find a code for a specific account or project, whom should I report it to?

Reference Data in Collibra

The Collibra Reference Data application provides a systematic approach to managing reference data, including code sets and code values. For example, you can define relations between Code Set assets and the Column assets whose allowed values they define, or between Code Value assets and the Business Assets that they represent. Additionally, you can define complex mappings between code sets and code values to enable crosswalks from one information system to another, taking into account how the code sets differ over time.
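
The idea of a crosswalk that accounts for changes over time can be sketched as follows. This is an illustration only, not how Collibra stores mappings: the Mapping class, the crosswalk function, and all code values and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One crosswalk entry with a validity period (hypothetical structure)."""
    source: str                       # code value in the source system
    target: str                       # code value in the target system
    valid_from: date                  # first day the mapping applies
    valid_to: Optional[date] = None   # last day it applies; None means still valid


# Hypothetical data: the target code for "H" changed at the start of 2023.
MAPPINGS = [
    Mapping("H", "HOME", date(2020, 1, 1), date(2022, 12, 31)),
    Mapping("H", "RESIDENTIAL", date(2023, 1, 1)),
    Mapping("W", "WORK", date(2020, 1, 1)),
]


def crosswalk(source: str, on: date) -> Optional[str]:
    """Return the target code valid for `source` on the given date, or None."""
    for m in MAPPINGS:
        in_period = m.valid_from <= on and (m.valid_to is None or on <= m.valid_to)
        if m.source == source and in_period:
            return m.target
    return None


if __name__ == "__main__":
    print(crosswalk("H", date(2021, 6, 1)))  # prints HOME
    print(crosswalk("H", date(2024, 6, 1)))  # prints RESIDENTIAL
```

Attaching a validity period to each mapping lets the same source code resolve to different target codes depending on the date of the record being exchanged.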

Example 

In the following diagram, you can see that the Customer Information table contains the Address Type column, which can only contain code values from the Address Type Values code set.
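
The same constraint can be expressed as a simple check, sketched here under stated assumptions: the code values in ADDRESS_TYPE_VALUES, the sample rows, and the invalid_rows helper are all hypothetical and only illustrate how a code set restricts the values of a column.

```python
# Hypothetical content of the Address Type Values code set.
ADDRESS_TYPE_VALUES = frozenset({"BILLING", "SHIPPING", "HOME"})

# Hypothetical rows of the Customer Information table.
customer_information = [
    {"customer_id": 1, "address_type": "BILLING"},
    {"customer_id": 2, "address_type": "SHIPPING"},
    {"customer_id": 3, "address_type": "OFFICE"},  # not in the code set
]


def invalid_rows(rows, column, code_set):
    """Return the rows whose value in `column` is not an allowed code value."""
    return [row for row in rows if row.get(column) not in code_set]


if __name__ == "__main__":
    for row in invalid_rows(customer_information, "address_type", ADDRESS_TYPE_VALUES):
        print(row)  # prints the row for customer 3
```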