Customer Risk Assessment Using Alternate Data (SMS Data)

The Challenges

The challenge was to build an end-to-end solution for a leading Indian NBFC that assesses customer risk using SMS and bureau data, and to measure how much predictive power SMS data adds over and above bureau data.

A customer’s bureau data provides a significant amount of information about that customer’s risk. However, alternate data sources such as SMS reveal a lot about a customer’s lifestyle and behaviour. Such unconventional data can help distinguish low-risk from high-risk customers through analysis of payment behaviour. It also provides current and forward-looking information for the qualitative assessment of borrowers. Alternate data can further mitigate challenges such as low credit bureau coverage and manual verification of information.

The Solution

To give the client a complete end-to-end solution that uses SMS and bureau data to predict customer risk, the following steps were adopted:


Data cleaning and treatment


Defining segments

Customers were divided into four segments according to how much bureau and SMS data was available for each customer: Bureau thick & SMS thick, Bureau thick & SMS thin, Bureau thin & SMS thick, and Bureau thin & SMS thin.
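The four-way segmentation can be sketched in a few lines of Python. This is a minimal illustration only: the source does not say how "thick" and "thin" were defined, so the cut-offs `MIN_BUREAU_TRADELINES` and `MIN_SMS_MESSAGES`, the `Customer` fields, and the function name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the source does not state how "thick" and "thin"
# were defined, so these values are placeholders.
MIN_BUREAU_TRADELINES = 3
MIN_SMS_MESSAGES = 100


@dataclass
class Customer:
    customer_id: str
    bureau_tradelines: int  # assumed measure of bureau data availability
    sms_messages: int       # assumed measure of SMS data availability


def assign_segment(customer: Customer) -> str:
    """Return one of the four segment labels for a customer."""
    bureau = "thick" if customer.bureau_tradelines >= MIN_BUREAU_TRADELINES else "thin"
    sms = "thick" if customer.sms_messages >= MIN_SMS_MESSAGES else "thin"
    return f"Bureau {bureau} & SMS {sms}"
```

Keeping the cut-offs as named constants makes it easy to test how sensitive the segment sizes are to the chosen definition of "thin".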


Variable Creation

Useful independent variables that quantify the lifestyle and financial behaviour of the customers were derived from the SMS data. More than 10,000 variables were generated from the combined bureau and SMS data.
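To make the idea of deriving variables from raw SMS text concrete, here is a small sketch that turns one customer's messages into a handful of numeric variables. The message wording, the keywords, and the variable names are assumptions for illustration; the actual variable definitions used in the project are not described in the source.

```python
import re

# Matches amounts such as "Rs. 1,250.00" or "INR 500" (assumed message style).
AMOUNT_RE = re.compile(r"(?:rs\.?|inr)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def sms_features(messages):
    """Aggregate a customer's SMS texts into a few illustrative variables."""
    feats = {
        "n_messages": 0,
        "n_debits": 0,
        "total_debited": 0.0,
        "n_bounce_alerts": 0,
    }
    for text in messages:
        feats["n_messages"] += 1
        lower = text.lower()
        # Negative indicator: bounced or dishonoured cheque alerts.
        if "bounce" in lower or "dishonour" in lower:
            feats["n_bounce_alerts"] += 1
        # Spending: count debit alerts and sum the debited amounts.
        if "debited" in lower:
            match = AMOUNT_RE.search(text)
            if match:
                feats["n_debits"] += 1
                feats["total_debited"] += float(match.group(1).replace(",", ""))
    return feats
```

Repeating the same aggregations over different time windows and message categories is one plausible way a variable count can grow into the thousands.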


Model Development

Separate models were built for each segment using 1) only SMS data, 2) only bureau data, and 3) both. The predictive power of each model was calculated and compared to check for improvement due to SMS data.
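The comparison step can be sketched as scoring the same customers with each model and computing one metric per model. The source does not name the metric behind "predictive power"; the Gini coefficient (2 × AUC − 1), a common choice in credit scoring, is assumed here. The function names and the `scores_by_model` input are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random defaulter (label 1) is scored higher
    than a random non-defaulter (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC (assumed metric, see lead-in)."""
    return 2 * auc(labels, scores) - 1


def compare_models(labels, scores_by_model):
    """Return the Gini of each model, keyed by model name.

    scores_by_model maps a name such as "Only SMS data" to that model's
    risk scores for the same customers, in the same order as labels.
    """
    return {name: gini(labels, scores) for name, scores in scores_by_model.items()}
```

The pairwise AUC here is quadratic in the number of customers, which is fine for a sketch; a rank-based implementation would be preferable on a full portfolio.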

Information Extracted from SMS Data

Creating meaningful independent variables is the biggest task when working with unstructured data like SMS. SMS data carries diverse kinds of information:


Includes lifestyle event related information, data from apps, time of activity


Updates regarding utility bills, DTH and telecom payments, etc.

Negative indicators

Information like default, late payments, bounced cheques, etc.

Consumption behaviour

This reflects the customer’s consumption of goods and services like groceries, movies, travel & accommodations, etc.

Credit behaviour and payment regularity

Information like the number of bank products, consistency of investment and insurance premium payments

Spending pattern

Information like payment wallets and apps used by the customer, bank statement and transaction history
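As a rough illustration of how individual messages might be routed into categories like those listed above, the sketch below tags a message with every category whose keywords it contains. The keyword lists are hypothetical and deliberately tiny; a production system would need far richer rules or a trained classifier.

```python
import re

# Hypothetical keyword lists, one per information category named above.
CATEGORY_KEYWORDS = {
    "Negative indicators": ("bounced", "overdue", "default", "late payment"),
    "Consumption behaviour": ("grocery", "movie", "flight", "hotel"),
    "Credit behaviour and payment regularity": ("premium", "emi", "sip"),
    "Spending pattern": ("wallet", "upi", "debited"),
}

# Whole-word patterns, so that "emi" does not match inside "premium".
CATEGORY_PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for category, keywords in CATEGORY_KEYWORDS.items()
}


def categorize(text):
    """Return every category whose keywords appear in the message."""
    return [cat for cat, pattern in CATEGORY_PATTERNS.items() if pattern.search(text)]
```

A message can legitimately fall into more than one category, which is why the function returns a list rather than a single label.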


There has been a steep rise in customers’ digital footprints. Lenders are using this type of alternate data to add to the pool of information and to gain micro-insights into a customer’s risk profile. Models built on it give lenders an opportunity to expand their target audience to previously unreachable population segments.

The predictive power of the model built on each segment was calculated, and the improvement due to SMS data was assessed. The evaluation shows significant enhancement in the SMS-thick segments, especially where bureau data is thin or limited.

Segment                    Only Bureau data   Only SMS data   Bureau & SMS data
Bureau thick & SMS thick   52%                38%             59%
Bureau thick & SMS thin    50%                28%             53%
Bureau thin & SMS thick    35%                41%             47%
Bureau thin & SMS thin     32%                27%             36%
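
The improvement due to SMS data can be read off the table as the difference between the combined model and the bureau-only model. The short sketch below does that arithmetic; the last row is taken to be the Bureau thin & SMS thin segment, and the function name is illustrative.

```python
# Predictive power (%) per segment, copied from the table above:
# (only bureau data, only SMS data, bureau & SMS data).
# The last row is assumed to be the "Bureau thin & SMS thin" segment.
RESULTS = {
    "Bureau thick & SMS thick": (52, 38, 59),
    "Bureau thick & SMS thin": (50, 28, 53),
    "Bureau thin & SMS thick": (35, 41, 47),
    "Bureau thin & SMS thin": (32, 27, 36),
}


def sms_uplift(results):
    """Percentage-point gain of the combined model over bureau-only."""
    return {
        segment: combined - bureau
        for segment, (bureau, _sms_only, combined) in results.items()
    }


uplift = sms_uplift(RESULTS)
best_segment = max(uplift, key=uplift.get)
```

The gains work out to 7, 3, 12 and 4 percentage points respectively, so the largest uplift is in the Bureau thin & SMS thick segment.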