Creating an end-to-end solution for a leading Indian NBFC to assess customer risk using SMS and bureau data, and to measure how much predictive power SMS data adds over and above bureau data.
A customer’s bureau data provides a significant amount of information about that customer’s risk. However, alternative data sources such as SMS provide a lot of information about a customer’s lifestyle and behaviour. Such unconventional data can help identify low-risk and high-risk customers through analysis of payment behaviour. It also provides current and forward-looking information for the qualitative assessment of borrowers. Alternative data technology can be used to mitigate challenges such as the low coverage of credit bureaus and the manual verification of information.
To provide the client with a complete end-to-end solution that uses SMS and bureau data to predict customer risk, the following steps were adopted:
Data cleaning and treatment
Four segments of customers were defined (Bureau thick & SMS thick, Bureau thick & SMS thin, Bureau thin & SMS thick, Bureau thin & SMS thin). Customers were assigned to a segment according to how much bureau data and SMS data was available for them.
Useful independent variables that quantify the lifestyle and financial behaviour of the customers were derived from the SMS data. More than 10,000 variables were generated from the combined bureau and SMS data.
Separate models were built for each segment using 1) only SMS data, 2) only bureau data, and 3) both bureau and SMS data. The predictive power of each model was calculated and compared to check for improvement due to SMS data.
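The segmentation step above can be sketched in a few lines of Python. This is only an illustration: the case study does not state how "thick" and "thin" were defined, so the cut-offs `MIN_BUREAU_TRADELINES` and `MIN_SMS_MESSAGES`, the `Customer` fields, and the sample customers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the case study does not say how thick/thin was decided.
MIN_BUREAU_TRADELINES = 3
MIN_SMS_MESSAGES = 100


@dataclass
class Customer:
    customer_id: str
    bureau_tradelines: int  # amount of bureau data available (assumed measure)
    sms_messages: int       # amount of usable SMS data available (assumed measure)


def assign_segment(c: Customer) -> str:
    """Label a customer with one of the four data-availability segments."""
    bureau = "thick" if c.bureau_tradelines >= MIN_BUREAU_TRADELINES else "thin"
    sms = "thick" if c.sms_messages >= MIN_SMS_MESSAGES else "thin"
    return f"Bureau {bureau} & SMS {sms}"


# Made-up customers, one per segment.
customers = [
    Customer("C1", bureau_tradelines=8, sms_messages=450),
    Customer("C2", bureau_tradelines=5, sms_messages=12),
    Customer("C3", bureau_tradelines=0, sms_messages=300),
    Customer("C4", bureau_tradelines=1, sms_messages=20),
]
segments = {c.customer_id: assign_segment(c) for c in customers}
for customer_id, segment in segments.items():
    print(customer_id, "->", segment)
```

Once every customer carries a segment label, the three model variants (SMS only, bureau only, both) can be trained and compared within each segment separately.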
Information Extracted from SMS Data
Creating meaningful independent variables is the biggest task when working with unstructured data like SMS. The information carried by SMS data is diverse in nature.
There has been a steep rise in customer-generated digital footprints. Lenders are using this type of alternative data to add to the pool of information and to gain micro-insights into a customer’s risk profile. Models built on it give lenders an opportunity to expand their target audience to previously unreachable population segments.
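To illustrate the kind of variable creation described in this section, the sketch below turns a handful of raw messages into numeric features. It is not the method used in the engagement: the message formats, the keyword rules, the `AMOUNT` pattern, and the feature names are all assumptions made for the example.

```python
import re
from statistics import mean

# Assumed pattern for rupee amounts such as "Rs. 25,000.00" or "INR 1,200".
AMOUNT = re.compile(r"\b(?:rs\.?|inr)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def extract_amount(text):
    """Return the first rupee amount found in the message, or None."""
    match = AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def sms_features(messages):
    """Aggregate simple financial-behaviour features from raw SMS text."""
    credits, debits, bounces = [], [], 0
    for text in messages:
        lower = text.lower()
        amount = extract_amount(text)
        if "bounce" in lower or "insufficient" in lower:
            bounces += 1  # failed or bounced payment (keyword rule is an assumption)
        elif amount is not None and "credited" in lower:
            credits.append(amount)
        elif amount is not None and "debited" in lower:
            debits.append(amount)
    return {
        "n_credits": len(credits),
        "n_debits": len(debits),
        "total_credit": sum(credits),
        "total_debit": sum(debits),
        "avg_debit": mean(debits) if debits else 0.0,
        "n_bounces": bounces,
    }


# Made-up messages in an assumed format.
sample = [
    "Your a/c XX1234 is credited with Rs. 25,000.00 on 01-Jan.",
    "INR 1,200 debited from a/c XX1234 for card purchase.",
    "Rs 800.50 debited from a/c XX1234 via UPI.",
    "Payment of Rs 5000 failed due to insufficient balance.",
]
features = sms_features(sample)
print(features)
```

Repeating this kind of aggregation across message types, time windows, and statistics is one plausible way a variable count could grow into the thousands.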
The predictive power of the model built on each segment was calculated, and the improvement due to SMS data was assessed. The evaluation shows a significant improvement in the SMS-thick segments, especially in the segment where bureau data is thin or limited.
|Segment||Only Bureau data||Only SMS data||Bureau & SMS data|
|Bureau thick & SMS thick||52%||38%||59%|
|Bureau thick & SMS thin||50%||28%||53%|
|Bureau thin & SMS thick||35%||41%||47%|
|Bureau thin & SMS thin||32%||27%||36%|
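The improvement attributable to SMS data can be read from the table by subtracting the bureau-only figure from the combined figure in each segment. The short sketch below does that arithmetic on the reported percentages. The dictionary keys and variable names are illustrative, and the source does not state which metric the percentages represent.

```python
# Predictive power (%) as reported in the table above.
predictive_power = {
    "Bureau thick & SMS thick": {"bureau": 52, "sms": 38, "both": 59},
    "Bureau thick & SMS thin": {"bureau": 50, "sms": 28, "both": 53},
    "Bureau thin & SMS thick": {"bureau": 35, "sms": 41, "both": 47},
    # Last table row, taken to be the "Bureau thin & SMS thin" segment.
    "Bureau thin & SMS thin": {"bureau": 32, "sms": 27, "both": 36},
}

# Absolute uplift: percentage points gained by adding SMS data to bureau data.
uplift_points = {
    segment: scores["both"] - scores["bureau"]
    for segment, scores in predictive_power.items()
}

# Relative uplift: the same gain as a share of the bureau-only figure.
uplift_relative = {
    segment: round(100 * uplift_points[segment] / scores["bureau"], 1)
    for segment, scores in predictive_power.items()
}

for segment in predictive_power:
    print(f"{segment}: +{uplift_points[segment]} points "
          f"({uplift_relative[segment]}% relative)")
```

The gains are 7, 3, 12 and 4 percentage points respectively, so the largest uplift is in the Bureau thin & SMS thick segment, which is also the only segment where the SMS-only model outperforms the bureau-only model.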