- define key business objectives around scoring credit for existing and new customers
- assess what data, if any, you might be missing
- consider how AI and ML can help you build a credit scoring model that meets your business objectives
To that end I’ll provide insights on:
- how much data is enough data to build a credit scoring model
- how you can build credit scores for thin-file and no-file consumers
- the importance of artificial intelligence and machine learning
- credit scoring model validation
Check out part 1, where I review aspects of what comprises:
- the right data
- useful credit scores
- the importance of defining success to your credit score model algorithm
How much data is enough data to build a credit scoring model?
Traditional credit scores are being used outside the scope for which they were originally designed. Faced with limited options, landlords, auto and home insurers, cell phone and utility providers, employers and other subscription service providers have had to rely on credit scores to make non-financial decisions. These are all forms of non-traditional credit, because payment is usually made after the service has been provided. Not surprisingly, there is little evidence to support using traditional scores for these purposes. Until now, there have been few alternatives.
It seems unfair, then, that in this non-traditional credit system consumers are rarely rewarded for timely payments but are usually penalized for late or missed ones.
This highlights how incomplete the data used to develop traditional credit scores is. Credit scores should include payment records for services such as electricity, telephone, cable television, rent and other subscriptions, digital or otherwise.
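As a minimal sketch of how such payment records could become model inputs, the code below summarizes a consumer's bill history into a few features. The `PaymentRecord` schema, the `payment_features` function and the grace-period policy are hypothetical illustrations, not Trust Science's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PaymentRecord:
    """One bill from a non-traditional source (hypothetical schema)."""
    category: str          # e.g. "rent", "electricity", "telephone"
    due: date
    paid: Optional[date]   # None means the bill was never paid


def payment_features(records, grace_days=0):
    """Summarize payment behaviour across alternative payment records.

    A payment counts as on time if it arrives no more than `grace_days`
    after the due date (an assumed, adjustable policy).
    """
    if not records:
        # No alternative payment history at all: return empty features.
        return {"n_payments": 0, "n_categories": 0,
                "on_time_rate": None, "n_late": 0, "n_missed": 0}
    on_time = late = missed = 0
    for r in records:
        if r.paid is None:
            missed += 1
        elif (r.paid - r.due).days <= grace_days:
            on_time += 1
        else:
            late += 1
    return {
        "n_payments": len(records),
        "n_categories": len({r.category for r in records}),
        "on_time_rate": on_time / len(records),
        "n_late": late,
        "n_missed": missed,
    }


# Usage with made-up records for one consumer.
records = [
    PaymentRecord("rent", date(2024, 1, 1), date(2024, 1, 1)),
    PaymentRecord("electricity", date(2024, 1, 15), date(2024, 1, 20)),
    PaymentRecord("telephone", date(2024, 1, 20), None),
    PaymentRecord("rent", date(2024, 2, 1), date(2024, 1, 30)),
]
features = payment_features(records)
```

Here two of four bills were paid on time, so `on_time_rate` is 0.5; a record of this kind rewards the timely rent payments that a traditional bureau file would never see.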
You should use the appropriate contextual data to deliver a credit scoring algorithm that is aligned with the consumer’s loan type.
In addition to Trust Science proprietary data and the alternative consumer data provided, clients can supply additional data capturing the Cs of credit: the borrower’s character (their willingness to repay), their capacity to repay, and the initial capital the consumer puts up as a measure of commitment. All of these provide contextual data, the right data, to build a credit scoring model.
How do you get credit scores for thin-file or no-file consumers?
More than half of Americans are either credit invisible or have poor credit. A credit-invisible consumer is someone with no established credit history at a major credit bureau.
About 70 million adult Americans lack enough credit history, or recent enough credit history, to be given a credit score. In other words, current traditional credit scores cannot predict or indicate much about consumers with no tradeline data. Ethnic minorities, lower-income consumers, divorcees, single parents, the young and seniors are more likely to fall into these categories. The lack of adequate information about these individuals leads to them being subjectively classified as high risk, yet a high-risk classification should be reserved for consumers with a proven record of irresponsible credit habits.
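To make the no-file, unscorable and thin-file categories concrete, here is a minimal sketch that classifies a bureau file by its tradelines. The `Tradeline` schema and every threshold (minimum account age, maximum reporting staleness, the thin-file cutoff) are illustrative assumptions, not any bureau's published criteria.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tradeline:
    """A credit account as reported to a bureau (hypothetical, minimal schema)."""
    opened: date
    last_reported: date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def file_status(tradelines, as_of, min_age_months=6,
                max_staleness_months=6, thin_threshold=3):
    """Classify a consumer's bureau file. All thresholds are illustrative assumptions."""
    if not tradelines:
        return "no file"  # credit invisible: nothing on record
    seasoned = any(months_between(t.opened, as_of) >= min_age_months
                   for t in tradelines)
    recent = any(months_between(t.last_reported, as_of) <= max_staleness_months
                 for t in tradelines)
    if not (seasoned and recent):
        return "unscorable"  # history too new or too stale
    if len(tradelines) < thin_threshold:
        return "thin file"
    return "scorable"


as_of = date(2024, 6, 1)
old_active = Tradeline(date(2020, 1, 1), date(2024, 5, 1))
print(file_status([], as_of))                                              # no file
print(file_status([Tradeline(date(2024, 3, 1), date(2024, 5, 1))], as_of))  # unscorable
print(file_status([old_active], as_of))                                    # thin file
print(file_status([old_active] * 3, as_of))                                # scorable
```

Only the last case gives a traditional model enough to work with; the other three are where alternative data has to carry the assessment.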
The reason it’s important to consider credit scoring models that use multiple data sources is that traditional bureau scores can result in unfair decisions. Poor credit scores from major credit bureaus strongly correlate with race, gender and other demographic data.
Over 50% of American credit users have subprime credit scores, and one in three has a score lower than 620. Black and Hispanic Americans have much higher rates of subprime scores than white Americans, and are more likely than white individuals to be credit invisible or unscorable.
This inaccurate, traditional approach to credit scoring reinforces the historical biases captured in the data, and credit bureaus use that same data to develop their credit reports.
There is a need to use data that has been debiased or certified free of discrimination when building a credit or trust system that determines who gets out of the poverty cycle and who has access to credit at an affordable rate, term and capacity. The data required to identify and remove discrimination includes:
- gender identity
- household composition
- marital status
- national origin
- recipient of public assistance
- sexual orientation.
We work with clients to acquire these protected attributes for their data so that we deliver solutions that are debiased and certified to be free of disparate impact or treatment. We’re able to demonstrate the value in our AI and machine learning offering through performance reporting.
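One widely used first screen for disparate impact is the adverse impact ratio: each group's approval rate divided by a reference group's rate, checked against the "four-fifths" (80%) rule of thumb from the US EEOC Uniform Guidelines, which is sometimes applied in fair-lending analysis as well. The sketch below computes it from labelled decisions. It is not Trust Science's certification process, which this post does not describe; group labels and outcomes are made up, and a ratio below 0.8 calls for investigation rather than proving discrimination.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group_label, approved) pairs -> approval rate per group."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += int(approved)
    return {g: approvals[g] / totals[g] for g in totals}


def adverse_impact_ratios(decisions, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(decisions)
    reference_rate = rates[reference_group]
    return {g: rate / reference_rate for g, rate in rates.items()}


def four_fifths_flags(decisions, reference_group, threshold=0.8):
    """True for groups whose ratio falls below the threshold (possible disparate impact)."""
    ratios = adverse_impact_ratios(decisions, reference_group)
    return {g: r < threshold for g, r in ratios.items() if g != reference_group}


# Made-up outcomes: group A approved 80/100, B 56/100, C 70/100.
decisions = ([("A", True)] * 80 + [("A", False)] * 20
             + [("B", True)] * 56 + [("B", False)] * 44
             + [("C", True)] * 70 + [("C", False)] * 30)
print(adverse_impact_ratios(decisions, "A"))
print(four_fifths_flags(decisions, "A"))  # {'B': True, 'C': False}
```

Group B's ratio is 0.56 / 0.80 = 0.70, below the 0.8 line, so it is flagged; group C's 0.875 is not. This is why the protected attributes listed above are needed: without a group label per applicant, the ratio cannot be computed at all.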
Why AI/ML gets you the right credit scores
Trust Science knows that transitioning from traditional credit underwriting rules to AI/ML-driven solutions is often met with skepticism. Clients with decades of industry experience using status quo business rules to underwrite risk often express trust issues about deploying automated “black box” algorithms. It is our responsibility at Trust Science to demonstrate the usefulness of our solution and its improvements over the business rules clients are used to. Some clients also want our AI/ML solutions to include the attributes they use in their business rules, for ease of interpretation. We require clients to share the relevant data they use in their underwriting decisions so that we can ensure our AI/ML solution supplements their underwriting rules, quantify the return on their investment, and demonstrate how our solution compares with their underwriting rules. We also provide qualifiers and reason codes that explain how the attributes used in the model influence the modeled business objectives. Access to a client’s underwriting rules helps us frame the reason codes and qualifiers in language their credit analysts and underwriters are familiar with.
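For a points-based scorecard, one common convention for generating reason codes is "points below maximum": for each attribute, compare the points the applicant earned with the best points available, and report the attributes with the largest shortfall. The sketch below shows that convention only; the post does not say how Trust Science derives its reason codes, and ML models typically use attribution methods (for example SHAP values) instead. The attributes, bins, points and reason text are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical points-based scorecard. For each attribute, `edges` splits the
# value range into bins and `points` gives the score awarded for each bin
# (len(points) == len(edges) + 1).
SCORECARD = {
    "on_time_rate": {"edges": [0.7, 0.9], "points": [10, 35, 60]},
    "months_at_address": {"edges": [12, 36], "points": [5, 20, 30]},
    "debt_to_income": {"edges": [0.2, 0.4], "points": [50, 30, 5]},
}

# Hypothetical reason text, phrased the way an underwriter might describe it.
REASON_TEXT = {
    "on_time_rate": "History of late or missed bill payments",
    "months_at_address": "Short time at current address",
    "debt_to_income": "High debt relative to income",
}


def score_with_reasons(applicant, scorecard=SCORECARD, top_n=2):
    """Return (total score, reason codes) using the points-below-maximum method."""
    total = 0
    shortfall = {}
    for attr, card in scorecard.items():
        # bisect_right maps the value to its bin; a value equal to an edge goes up a bin.
        earned = card["points"][bisect_right(card["edges"], applicant[attr])]
        total += earned
        shortfall[attr] = max(card["points"]) - earned
    # Attributes that cost the most points relative to their best bin come first.
    adverse = sorted((a for a in shortfall if shortfall[a] > 0),
                     key=lambda a: shortfall[a], reverse=True)
    return total, [REASON_TEXT[a] for a in adverse[:top_n]]


applicant = {"on_time_rate": 0.8, "months_at_address": 20, "debt_to_income": 0.45}
score, reasons = score_with_reasons(applicant)
print(score)    # 60
print(reasons)  # ['High debt relative to income', 'History of late or missed bill payments']
```

Because the reason text is a plain lookup table, it can be worded in the same terms a client's credit analysts already use in their underwriting rules.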
We believe that clients have a stake in making sure the credit models developed at Trust Science meet their expectations. Therefore, we coordinate data-understanding efforts with clients. This coordination begins with defining success and clarifying business objectives and priorities, and extends to access to the client’s underwriting rules, the related data or scores they use in underwriting decisions, consented (tradeline and non-tradeline) attribute data points on each customer, and reject data.
After the POV (proof of value) phase of our modeling, we like to know all of the data clients plan to use for decisioning. This is essential for running the algorithm smoothly in production and for monitoring the deployed model for drift in the clients’ data.
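A standard way to monitor a deployed credit model for drift is the Population Stability Index (PSI), which compares the binned distribution of a score or input attribute at development time with a recent production sample. The post does not say which drift metric Trust Science uses; the sketch below is one minimal version, the bin edges and data are made up, and the 0.1 / 0.25 bands are widely quoted rules of thumb rather than a regulatory standard.

```python
import math
from bisect import bisect_right


def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each bin defined by `edges`; floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline, current, edges):
    """PSI = sum over bins of (current% - baseline%) * ln(current% / baseline%)."""
    b = bin_proportions(baseline, edges)
    c = bin_proportions(current, edges)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def drift_status(psi):
    """Commonly quoted rule-of-thumb bands; tune them to your own portfolio."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate shift"
    return "significant shift"


# Made-up example: scores at development time vs. a later production batch.
baseline = list(range(100))               # scores 0..99
current = [v + 30 for v in range(100)]    # same shape, shifted upward by 30
edges = [25, 50, 75]                      # 4 bins: <25, 25-49, 50-74, >=75
print(drift_status(population_stability_index(baseline, baseline, edges)))  # stable
print(drift_status(population_stability_index(baseline, current, edges)))   # significant shift
```

The same function can be run per input attribute as well as on the final score, which is one reason it helps to know every data field a client will feed the model in production.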