In the first two posts of our four-part blog series, we covered structured and unstructured data in underwriting and the importance of ontologies, or agreements on what the data mean. This brings us to the next topic: time, and its impact on analyzing loan data. Time series modeling is a popular and powerful way to make predictions from historical data, and in loan data analysis roughly 80% of people work with time series data. In this post we discuss the following:
Assumptions made regarding time
Time series data
Time and compliance
Assumptions of Time-stamps
As we covered in part one of this blog series, unstructured data must be converted to structured data before a machine learning credit underwriting model can learn from it. To structure the data, programmers often have to make assumptions about it.
One notorious type of assumption relates to time. Time is especially important in unstructured data, because it supplies the time-stamps used for loan data analysis and predictions. Everyone who interprets a time-stamp must share the same understanding of what it means.
In the real world, “seven o’clock” means something different depending on where you are on the planet. Human time and programmer time also tend to be two different things: humans treat time as unstructured data, while the programmer has to interpret it as structured data.
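To make the “seven o’clock” point concrete, here is a minimal Python sketch using only the standard library. It takes one naive time-stamp and attaches two different time-zone assumptions, New York and London, using the fixed offsets in effect in mid-October 2020 (EDT, UTC-4, and BST, UTC+1). The cities and the date are illustrative choices; a production system would more likely use named zones from `zoneinfo` so that daylight saving changes are handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets in effect on 15 October 2020 (both cities on daylight saving time).
EDT = timezone(timedelta(hours=-4), "EDT")  # New York
BST = timezone(timedelta(hours=1), "BST")   # London

# A naive time-stamp: "7 o'clock" with no agreed-upon time zone.
naive = datetime(2020, 10, 15, 7, 0)

# The same wall-clock reading under two different assumptions.
in_new_york = naive.replace(tzinfo=EDT)
in_london = naive.replace(tzinfo=BST)

# Normalizing to UTC shows they are different instants, five hours apart.
print(in_new_york.astimezone(timezone.utc))  # 2020-10-15 11:00:00+00:00
print(in_london.astimezone(timezone.utc))    # 2020-10-15 06:00:00+00:00
print(in_new_york - in_london)               # 5:00:00
```

Storing time-stamps in UTC, or with an explicit offset, is one way to record the agreement on what a time-stamp means inside the data itself.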
Time Series Data
In a typical machine learning dataset, each row records an observation. For example, a lender collects each borrower’s loan amount, as shown below.
| Loan Amount (USD) |
|---|
With a time series dataset, however, each observation carries a time-stamp that puts the observations in order. A time dimension is present, and each observation is taken at a specific time. Suppose you are the same lender and you also record the date on which each loan was originated, as shown below.
| Date | Loan Amount (USD) |
|---|---|
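As a rough illustration of what the date column adds, the sketch below holds a few hypothetical loans (the dates and amounts are invented for the example) as `(date, amount)` pairs. Without the dates, the amounts are just an unordered collection; with them, the observations can be sorted and queried by time.

```python
from datetime import date

# Hypothetical loans: (origination date, loan amount in USD).
loans = [
    (date(2020, 3, 2), 12_000),
    (date(2020, 1, 15), 5_000),
    (date(2020, 2, 9), 8_500),
]

# Without the date column, the amounts carry no order.
amounts_only = [amount for _, amount in loans]

# With the date column, observations can be put in time order
# (tuples sort by their first element, the date).
time_series = sorted(loans)

# ...which makes questions like "what was originated before February?" answerable.
before_feb = [amount for when, amount in time_series if when < date(2020, 2, 1)]

print(time_series[0])  # (datetime.date(2020, 1, 15), 5000)
print(before_feb)      # [5000]
```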
Time Series Data in Lending
Time series data is ubiquitous in credit scoring and lending decision models. The history of a loan is itself time series data: it records the events that occur over a period of time, which lets us interpret the loan data more accurately. Below is an example of a borrower’s loan history.
| # | Date | Event |
|---|---|---|
| 1 | 10/15/2020 | Loan payment made for $200 |
| 2 | 11/11/2020 | Interest accrued; loan balance increased by $0.50 |
| 3 | 12/3/2020 | Loan payment made for $200, two weeks late |
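Turning this history into structured data forces exactly the kind of time assumption discussed earlier. The dates are written month/day/year, and a parser has to be told so: the string 12/3/2020 is 3 December under that convention but 12 March under a day-first one. The sketch below assumes the US format; the short event labels are paraphrases of the table rows, and the variable names are illustrative.

```python
from datetime import datetime

# The loan history above as (date string, event) pairs; event labels are paraphrased.
raw_history = [
    ("10/15/2020", "payment of $200"),
    ("11/11/2020", "interest added: +$0.50"),
    ("12/3/2020", "payment of $200 (two weeks late)"),
]

# Assumption: every date string is US-style month/day/year.
US_FORMAT = "%m/%d/%Y"

loan_history = [
    (datetime.strptime(text, US_FORMAT).date(), event) for text, event in raw_history
]

# Print the events in chronological order with unambiguous ISO 8601 dates.
for when, event in sorted(loan_history):
    print(when.isoformat(), event)

# The same string read day-first is a different date entirely.
print(datetime.strptime("12/3/2020", "%d/%m/%Y").date())  # 2020-03-12
```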
Each lender may take a different approach to categorizing and updating a borrower’s loan information. For example, lenders often have a column called late 30 (or late 60, or late 90) that counts how many times the borrower was 30 (or 60, or 90) days late on a payment. A borrower, for example, could have been more than 30 days late on a payment three times, giving a late 30 value of 3.
The challenge with this field is subtle. If a borrower is two months late, having missed two payments, does the lender count that as 30 days late once, or twice? If the borrower then pays one of the two missed payments, are they now zero days late, or still 30 days late? This is another instance where we need to clearly define what the loan data mean and apply ontologies so that the data, models, and scores are accurate.
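One way to see why the definition matters is to compute late 30 under two competing conventions. The sketch below assumes, hypothetically, that the lender stores a monthly days-past-due (DPD) snapshot for each borrower; both counting rules and the sample history are illustrations, not an industry standard.

```python
# Hypothetical monthly days-past-due (DPD) snapshots for one borrower:
# current, one missed payment (30 DPD), a second missed payment (60 DPD), caught up.
dpd_history = [0, 30, 60, 0]


def late_30_per_month(history):
    """Convention A: count every monthly snapshot at 30 or more days past due."""
    return sum(1 for dpd in history if dpd >= 30)


def late_30_per_episode(history):
    """Convention B: count each unbroken run of 30+ DPD as a single late event."""
    count, in_episode = 0, False
    for dpd in history:
        if dpd >= 30 and not in_episode:
            count += 1  # a new delinquency episode starts here
        in_episode = dpd >= 30
    return count


print(late_30_per_month(dpd_history))    # 2
print(late_30_per_episode(dpd_history))  # 1
```

The same history yields a late 30 of 2 under one rule and 1 under the other, which is why the ontology has to pin the rule down before the field is used in a model.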
Time and Compliance
Finally, time-stamping matters from a compliance standpoint. Time-stamps ensure that the correct data, meaning the data available at the time of the decision, is used for credit scoring. Without them, a credit scoring approach risks being unfair and non-compliant.
Consider a repeat borrower who applies for a loan at two different times: once in January and again later, in June. To predict the customer’s credit score for each request, we must use only the information that was relevant at the time that request was made; this is what fairness criteria require. Without time-stamps, there is a risk of using information from June to determine the credit score in January.
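A common safeguard is a point-in-time (“as-of”) filter: when scoring an application, keep only records time-stamped before the decision date. The sketch below applies that idea to a repeat borrower who applies first in January and later in June, with hypothetical dates and events; the `as_of` helper is illustrative, not part of any particular library.

```python
from datetime import date

# Hypothetical time-stamped events for one repeat borrower.
borrower_events = [
    (date(2020, 11, 3), "payment made on time"),
    (date(2021, 3, 22), "payment missed"),
]

# Hypothetical application dates: January first, June later.
january_application = date(2021, 1, 10)
june_application = date(2021, 6, 5)


def as_of(records, decision_date):
    """Return only the records time-stamped strictly before the decision date."""
    return [(when, event) for when, event in records if when < decision_date]


# The January decision must not see the missed payment from March.
print(as_of(borrower_events, january_application))
# [(datetime.date(2020, 11, 3), 'payment made on time')]
print(len(as_of(borrower_events, june_application)))  # 2
```

Using a strict `<` comparison is a deliberate choice in this sketch: an event stamped on the decision date itself is excluded, since it may not have been known when the decision was made.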
For loan underwriting, time-stamps and what they mean give us a crucial understanding of loans and help ensure that fairness and compliance are upheld.
Stay tuned: the final post in our four-part blog series on structured and unstructured data in underwriting covers the common data mistakes that hinder your ability to adopt AI and ML.
In the meantime, read our previous blog on the importance of ontologies in data, or learn how lenders can take advantage of alternative data, loan automation, machine learning and more in our on-demand webinar: 2020 Underwriting Trends: What to Expect.