Bank Loans

It is common for banks to lend money in return for monthly payments with interest. However, to do so a bank must make sure that the applications are valid and well formed to begin the underwriting and approval process. The following list comprises some basic lending concepts to Collibra DQ.

  1. Credit Score Validation
  2. SSN Validation
  3. Loan to Value Validation
  4. Interest Rate Validation
  5. Duplicate Loan Applications
  6. Loan Amount Validation
  7. Loan Completeness Validation

1. Credit Score

Business Check Collibra DQ Feature Manual vs Auto
Is the credit score a whole number? BEHAVIOR AUTO

Is the credit score within a valid range?

(between 300 - 850)

RULE credit_score between 300 and 850
Is the credit score NULL or Missing? BEHAVIOR AUTO

2. SSN Validation

Business Check Collibra DQ Feature Text
Is the SSN format valid? RULE AUTO-SSN detection
SSN is PII. SENSITIVITY AUTO-SSN labeled
Is the SSN NULL or Missing? BEHAVIOR AUTO
Does the SSN belong to the Applicant? PATTERN SSN -> first_name, last_name

3. Loan to Value

Business Check Collibra DQ Feature Text

Are loan amount and asset value (home or auto) valid numbers?

BEHAVIOR AUTO
95% loan to value ratio to approve? RULE loan / asset_value < .95

4. Interest Rate

Business Check Collibra DQ Feature Text

Interest rate between min and max allowable range for the loans credit rating.

RULE COMPLEX

loan l join rates r on l.credit_rating = r.credit_rating

where l.rate between r.min_rate and r.max_rate

5. Duplicate Loan Applications

Business Check Collibra DQ Feature Manual vs Auto
Ensure we don't issue the same loan twice. DUPE first_n, last_n, SSN, Address

6. Loan Amount

Business Check Collibra DQ Feature Manual vs Auto
Loan amount within lendable range OUTLIER AUTO

Loan amount within lendable range.

Only lend money between 50K and 3M.

RULE loan_amount between 50000 and 3000000

Resulting DQ Check

Copy
-lib "/home/opt/owl/drivers/postgres" \
-cxn postgres-gcp \
-q "select * from public.loan_risk_grade where last_pymnt_d = '2019-04-01'" \
-key member_id -alias loan_risk \
-ds public.loan \
-rd "2019-04-01" \
-dl -loglevel INFO \
-h 10.142.0.29:5432/owltrunk \
-numexecutors 10 -executormemory 1g -drivermemory 4g \
-master yarn -deploymode cluster \
-sparkprinc [email protected] \
-sparkkeytab /tmp/user2.keytab -tbin MONTH \
-dupe -dupeinc purpose -fpgon -fpgkey grade \
-fpginc grade,sub_grade -fpglb 365 -fpgdc last_pymnt_d \
-record member_id -dupecutoff 60 -dupepermatchupperlimit 99 

Which components did we use?

We made use of Profiles, Duplicates, Outliers and Rules in this example. The experiments were automatically cataloged and put on a job scheduler. The next time a loan issue arises, we will be able to take remediation action using the workflow Q. Over time we can see how the bank loan program is running via the report section.

Files that can be used to replicate this example

interest_rates.csv
Binary

Owl Dataset (2).csv
Binary