About out-of-the-box data classes in the Unified Data Classification method
Out-of-the-box data classes are example data classes created by Collibra. Once an out-of-the-box data class has been imported, it is treated as a regular data class: you can edit it and change its classification rules. You decide which out-of-the-box data classes, if any, you want to use, and you can adapt them to your needs. This way, you keep only the data classes you are interested in and reduce the risk of similar, overlapping data classes.
Important: Use the out-of-the-box data classes as a starting point and adapt them to your needs to increase the classification accuracy.
Example: Out of the box, Collibra provides the data classes Credit card and Credit card: Visa. They overlap because a Visa credit card is also a credit card. Decide which data class to use depending on the granularity you need, and update its classification rules if needed.
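To see why such data classes overlap, the following Python sketch checks a value against two simplified regular-expression rules. The patterns and the `matching_classes` helper are hypothetical and only illustrate the idea; they are not the classification rules that Collibra ships.

```python
import re

# Hypothetical, simplified rules for illustration only; they are NOT the
# classification rules of Collibra's out-of-the-box data classes.
RULES = {
    "Credit card": re.compile(r"\d{13,19}"),                # any 13 to 19 digits
    "Credit card: Visa": re.compile(r"4\d{12}(?:\d{3})?"),  # 13 or 16 digits starting with 4
}


def matching_classes(value: str) -> list[str]:
    """Return the names of all data classes whose rule matches the whole value."""
    return [name for name, rule in RULES.items() if rule.fullmatch(value)]


print(matching_classes("4012000033330026"))  # ['Credit card', 'Credit card: Visa']
print(matching_classes("534960971837932"))   # ['Credit card']
```

Every value that matches the Visa rule also matches the generic rule, so keeping both data classes produces duplicate classifications unless you actually need that level of granularity.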
Import process
When you import out-of-the-box data classes, we detect whether you already have data classes with the same name. We inform you about this and indicate whether their classification rules are different. The following statuses are available:
Status | Description |
---|---|
New | This data class is not yet available in your environment. You can import it without any risk of overwriting existing data. |
Exists (no changes) | A data class with the same name is already available in your environment, and its classification rules are the same as those of the out-of-the-box data class. |
Exists (changed) | A data class with the same name but with different classification rules is already available in your environment. |
- Global properties, such as the data class description, confidence score threshold, and examples, are not taken into account in this comparison. If you import the out-of-the-box data class, these properties are not updated.
- Classification rule descriptions are not taken into account in this comparison. However, if you import the out-of-the-box data class, the classification rules, including their descriptions, are updated.
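The comparison behind these statuses can be sketched in Python as follows. The `Rule` and `DataClass` types and the `import_status` function are a hypothetical data model, not a Collibra API. The sketch matches data classes by name, compares only the rule definitions, ignores rule descriptions and global properties, and assumes that the order of the classification rules does not matter.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    definition: str        # hypothetical: for example, a regular expression
    description: str = ""  # ignored when comparing


@dataclass
class DataClass:
    name: str
    rules: list[Rule]
    # Global properties: ignored when comparing.
    description: str = ""
    confidence_threshold: float = 0.0
    examples: list[str] = field(default_factory=list)


def import_status(ootb: DataClass, existing: dict[str, DataClass]) -> str:
    """Return the import status of an out-of-the-box data class."""
    current = existing.get(ootb.name)  # data classes are matched by name
    if current is None:
        return "New"
    # Compare only rule definitions, as a multiset (order assumed irrelevant).
    same_rules = Counter(r.definition for r in current.rules) == Counter(
        r.definition for r in ootb.rules
    )
    return "Exists (no changes)" if same_rules else "Exists (changed)"


existing = {
    "Credit card: Visa": DataClass(
        "Credit card: Visa",
        [Rule(r"4\d{15}", "My own rule description")],
        description="Edited description",
    )
}
print(import_status(DataClass("City", [Rule("list of cities")]), existing))
print(import_status(DataClass("Credit card: Visa", [Rule(r"4\d{15}")]), existing))
print(import_status(DataClass("Credit card: Visa", [Rule(r"4\d{12}")]), existing))
```

The three calls print `New`, `Exists (no changes)`, and `Exists (changed)`. The second case shows that an edited description does not change the status.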
Best practices
- Only import the data classes you need.
Many out-of-the-box data classes have overlapping definitions: their rules are so similar that the same column is likely to be classified with more than one of them.
Example: The following out-of-the-box data classes are available: Credit card and Credit card: Visa. They overlap because a Visa credit card is also a credit card. You can decide which data class you want to use depending on the granularity you need.
- Adapt the data classes after they are imported, based on your needs.
Many out-of-the-box data classes have a good and precise definition. Others are meant to be expanded. This is especially true for data classes defined as a list of values. Use these data classes as a starting point and update them to your needs.
- If you changed an imported data class, rename the data class.
If you import an out-of-the-box data class and a data class with the same name already exists, the import process will replace the existing definition with the out-of-the-box definition.
If you want to restart the data class configuration from the base definition or if the out-of-the-box definition was updated, you will want to replace the existing definition. But if you have customized the definition heavily, you will want to keep your existing version. So, rename an imported data class if it differs significantly from the original definition after you have customized it.
Renaming is also useful if you want to import it again without erasing your existing version, either to start a new variation or simply to compare your definition with the out-of-the-box one.
Tip: You can disable a data class if you just want to keep it for reference.
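The following Python sketch models why renaming protects a customized data class. It assumes a hypothetical catalog that maps data class names to lists of values; `import_ootb` and `rename` are illustrative helpers, not Collibra functions, and the country values are made up.

```python
# Hypothetical model: each data class name maps to its list of values.
Catalog = dict[str, list[str]]


def import_ootb(catalog: Catalog, name: str, ootb_rules: list[str]) -> None:
    """Import an out-of-the-box data class, replacing any data class with the same name."""
    catalog[name] = list(ootb_rules)


def rename(catalog: Catalog, old: str, new: str) -> None:
    """Rename a data class so that a later import no longer targets it."""
    if new in catalog:
        raise ValueError(f"a data class named {new!r} already exists")
    catalog[new] = catalog.pop(old)


ootb_country = ["Belgium", "France", "Germany"]  # hypothetical list of values
catalog: Catalog = {}
import_ootb(catalog, "Country", ootb_country)
catalog["Country"].append("Luxembourg")          # customize the imported class
rename(catalog, "Country", "Country (custom)")   # protect the customization
import_ootb(catalog, "Country", ootb_country)    # import the original again
print(sorted(catalog))  # ['Country', 'Country (custom)']
```

Without the `rename` call, the second import would silently replace the customized list with the out-of-the-box one.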
Available out-of-the-box data classes
Important: Use the out-of-the-box data classes as a starting point and adapt them to your needs to increase the classification accuracy.
Out-of-the-box data class | Description | Example |
---|---|---|
ABA | An American Bankers Association (ABA) routing number. | 058327451 |
ATIN (US) | An Adoption Taxpayer Identification Number (ATIN) from the US Internal Revenue Service. | 930-93-3562 |
BIC/SWIFT code | A Business Identifier Code (BIC), also called a SWIFT code, as defined by the ISO 9362 standard. | MLCOUS3GCAR |
Browser | An internet browser. | Mozilla |
City | A city. | New York |
Country | A country. | Belgium |
Country code | A country code. | USA |
Credit card | A credit card number. | 534960971837932 |
Credit card CVV | A credit card Card Verification Value (CVV). | 058 |
Credit card: American Express | An American Express credit card number. | 371449635398431 |
Credit card: Discover | A Discover credit card number. | 6011111111111117 |
Credit card: Mastercard | A Mastercard credit card number. | 2223000048400011 |
Credit card: Visa | A Visa credit card number. | 4012000033330026 |
Currency code | An ISO currency code. | USD |
CUSIP | A Committee on Uniform Securities Identification Procedures (CUSIP) number. The length is 9 characters. Characters 1, 2, 3 are digits. Characters 4, 5, 6, 7, 8 are either letters or digits. Characters 6, 7, 8 can also be *, @, #. Character 9 is a check digit. | 037833100 |
CUSIP (International) | A Committee on Uniform Securities Identification Procedures (CUSIP) number that matches many international CUSIPs. The length is 9 characters. Characters 4, 5, 6, 7, 8 are either letters or digits. Characters 6, 7, 8 can also be *, @, #. Character 9 is a check digit. | 12345*A29 |
Date: date | A date, in various formats. | 24 January 2004 |
Date: date and time | A date and time, in various formats. | 2018-08-29 20:25:25.0 |
Date: HHMMSS time | A time in format: HHMMSS. | 11:04:33, 8:30:00 |
Date: MM/DD/YY | A date in format: MM/DD/YY. | 01/22/19, 1/22/19 |
Date: MM/DD/YYYY | A date in format: MM/DD/YYYY. | 1/22/2019 |
Date: month | A month in text format. | September |
Date: numeric, no format | A date that is numerical without any formatting. | 20190123 |
Date: time | A time, in various formats. | 8:52 AM |
Date: weekday | A weekday in text format. | Fri |
Date: YYYY-MM-DD | A date in format: YYYY-MM-DD. | 2021-01-22 |
DEA number | A Drug Enforcement Administration (DEA) registration number. | AA1234567 |
Driver's license (UK) | A UK driver's license. | wklrS604032zb31785 |
Driver's license (US) | A US driver's license. | QP080580F |
Education level | An education level. | post-secondary |
EIN | An Employer Identification Number (EIN), also known as a Federal Tax Identification Number. | 94-4349283 |
Email | An email address. | |
Email (personal) | A personal email address. | |
Email domains | A common email domain. | gmail.com |
Employee ID (US) | An employee ID in the US. | 91-0675223 |
Employment status | An employment status. | office holder |
Ethnicity | An ethnicity. | |
FDA NDC: billing | A National Drug Code from the US Food and Drug Administration for billing. | 53407-0155-12 |
FDA NDC: package | A National Drug Code from the US Food and Drug Administration for a package. | 54868-4742-1 |
FDA NDC: product | A National Drug Code from the US Food and Drug Administration for a product. | 58118-0623 |
File path | A file path. | E:\x9xOL\VB2ER_2E\ |
Fiscal code (Italy) | An Italian fiscal code (codice fiscale italiano). | MRCNRR91L18H501H |
Four last digits | Four digits, typically used when we store only the last digits of a long identification number for verification. | 0486 |
Gender | Female/Male gender definition. | F |
GUID | A Globally Unique Identifier (GUID). | 9bb2d37d-b686-4fc1-898a-70085a070890 |
IBAN | An International Bank Account Number (IBAN). | FR29 5218 3745 58B7 GH7N FYGZ Q50 |
IBAN (Italy) | An Italian International Bank Account Number (codice IBAN italiano). | IT60X0542811101000000123456 |
ICD10 diagnosis | An ICD-10 diagnosis code. | T37.0x1A |
ICD10 procedure | An ICD-10 procedure code. | B9261ZZ |
ICD9 diagnosis | An ICD-9 diagnosis code. | E8345 |
ICD9 procedure | An ICD-9 procedure code. | 123 |
Identity card (Italy) | An Italian identity card number (Numero di carta d'identità italiana). | AA1234567 |
IMEI | An International Mobile Equipment Identity (IMEI). | 996116726508880 |
Indian aadhar | A unique identity number that can be obtained voluntarily by the citizens of India. Also called UIDAI ID or UIDAI Number. | 2161 6729 3627 |
Indian PAN tax card | A Permanent Account Number (PAN) from the Indian Income Tax Department. | BAJPC4350M |
International passport | An international passport number. | 3BCNFILV35GTS5882493F6360606RYI5V8JQRQACLZ69 |
IP Address | An Internet Protocol (IP) address. | 228.203.28.137 |
ISBN | An International Standard Book Number (ISBN). | 106115687-7 |
ISIN | An International Securities Identification Number (ISIN). | US0378331005 |
ITIN (US) | An Individual Taxpayer Identification Number (ITIN) from the US Internal Revenue Service. | 915-78-5757 |
Language | A language. | Deccan |
Language code | A language code. | HAU |
Latitude | A latitude (LAT) geocode coordinate. | 40.707088 |
Licence plate (US) | A US licence plate. | 0HB8609 |
License plate (Italy) | An Italian car license plate (targa automobilistica italiana). | AB123CD |
Longitude | A longitude (LNG) geocode coordinate. | -74.012817 |
Mac address | A Media Access Control (MAC) address. | 4E-A0-23-78-53-50 |
Marital status | A marital status. | unmarried |
Medical condition or treatment (UK NHS) | A medical condition or treatment listed by the National Health Service (NHS). | Arthritis |
Medical treatment | A medical treatment. | Oxygen therapy |
Medicine (UK NHS) | A medicine listed by the National Health Service (NHS). | Aspirin |
MIC | A Market Identifier Code (MIC). | NYSE |
Money | A monetary amount that includes a dollar or euro symbol. | $23.06 |
NHS number (UK) | A National Health Service (NHS) number. | 928358542 |
NPI number | A National Provider Identifier (NPI) number. | 1234567890 |
Occupation | A professional occupation. | transit coach operator |
OPCS-4 | An OPCS-4 code. | A12.3 |
Password (medium) | A password that contains at least one number, one letter, and one uppercase letter. | Bob@12fg |
Percentage | A percentage. | 23% |
Person's name | A first name, last name, or full name. | John smith |
Person's name: first name | A first name. | John |
Person's name: last name | A last name. | Smith |
Person's name: suffix | A suffix for names. | Jr |
Person's title | A person's title. | Ms |
Phone number | A phone number. | 212-555-0107 |
Phone number: Area code (US) | A US area code for phone numbers. | 52 |
Phone number: Country code | A country calling code for phone numbers. | 32 |
PIN code | A Personal Identification Number (PIN). | 781589 |
Postal code (Canada) | A Canadian postal code. | P2H 8L6 |
Postal code (Italy) | An Italian postal code (CAP, Codice di avviamento postale). | 123 |
Postal code (UK) | A UK postal code. | |
PTIN (US) | A Preparer Tax Identification Number (PTIN) from the US Internal Revenue Service. | P12345678 |
Religion | A religion. | Buddhist |
RIC | A Reuters Instrument Code (RIC). | JPM.N |
RTN | A bank Routing Transit Number (RTN). | 044000037 |
SEDOL | A Stock Exchange Daily Official List (SEDOL) ID. | B1XH2C0 |
Shirt size | A T-shirt size. | M |
SSN | A Social Security Number (SSN). | |
State code (US) | A 2-letter US state code. | NY |
State name (US) | A US state name. | New York |
Stock symbol | A stock exchange symbol. | GOOG |
Street address | A street address. | 4032 Maple Street |
Temperature (Celsius) | A temperature in degrees Celsius. | 10°c |
Temperature (Fahrenheit) | A temperature in degrees Fahrenheit. | 10°f |
Time zone offset (list) | A UTC time zone offset based on a list of values. | UTC+05:45 |
Time zone offset (regex) | A UTC time zone offset based on a regular expression. | -04:30 |
URL | A website URL. | https://google.com |
URL domain | A URL domain. | time.com |
UUID | A Universally Unique Identifier (UUID). | 00000000-0000-0000-0000-000000031108 |
VAT number (Italy) | An Italian Value Added Tax (VAT) number (partita IVA italiana). | 12345678901 |
VIN | A Vehicle Identification Number (VIN). | 5F308DpWFZAqpyAnn |
VIN (US) | A US Vehicle Identification Number (VIN). | 1GNEK13Z93R293940 |
Zip code | A numeric zip code. | 20001 |
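Several data classes above, such as CUSIP and the credit card classes, describe values that carry a check digit. The following Python sketch implements two standard check-digit algorithms: the Luhn algorithm used by payment card numbers and the CUSIP modulus-10 "double-add-double" check. The function names are illustrative. This page does not state whether the out-of-the-box classification rules apply these checks, and the sketch validates only the length and the check digit, not the per-position character rules listed in the CUSIP row.

```python
def luhn_valid(number: str) -> bool:
    """Check a payment card number with the Luhn algorithm (spaces and hyphens allowed)."""
    digits = number.replace(" ", "").replace("-", "")
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0


_CUSIP_SPECIAL = {"*": 36, "@": 37, "#": 38}


def _cusip_value(char: str) -> int:
    """Map a CUSIP character to its value: 0-9, A=10 ... Z=35, *=36, @=37, #=38."""
    if "0" <= char <= "9":
        return int(char)
    if "A" <= char <= "Z":
        return ord(char) - ord("A") + 10
    if char in _CUSIP_SPECIAL:
        return _CUSIP_SPECIAL[char]
    raise ValueError(f"invalid CUSIP character: {char!r}")


def cusip_check_digit(base: str) -> int:
    """Compute the 9th (check) character of a CUSIP from its first 8 characters."""
    if len(base) != 8:
        raise ValueError("expected exactly 8 characters")
    total = 0
    for position, char in enumerate(base.upper()):
        value = _cusip_value(char)
        if position % 2 == 1:  # double the 2nd, 4th, 6th, and 8th characters
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10


def is_valid_cusip(code: str) -> bool:
    """Return True if the code has 9 characters and a correct check digit."""
    code = code.strip().upper()
    if len(code) != 9 or not ("0" <= code[8] <= "9"):
        return False
    try:
        return cusip_check_digit(code[:8]) == int(code[8])
    except ValueError:
        return False


print(luhn_valid("4012000033330026"))  # True
print(is_valid_cusip("037833100"))     # True
```

A valid check digit only shows that a value is well formed; it does not prove that the card or the security actually exists.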