Krishna Venkatraman: In the case of understanding credit-worthiness, for example, the signals that will allow us to make good decisions reside in many different places. The first is information you get from credit bureaus. That will tell you something about how businesses have been discharging their past obligations. How did they behave when they had past loans? We also get information about the industry in which they operate.
There are multiple data providers that can give us that. We also work a lot with transaction data and bank data, which give us an idea of cash flow. How is the business performing at the current point? What are the trends we see in the business's revenue, income, and expenses? Are they consistent with what we think would be the norm for a business in that industry? It's important for us to be able to do all of this in as automated a fashion as possible, and also be prudent about where we need exception handling. That's the foundation we've built our system on.
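The multi-source approach described above can be sketched in a few lines. This is a toy illustration only, not OnDeck's model: the feature names, weights, and scoring rule are all invented for the example.

```python
# Toy sketch of multi-source credit scoring. All feature names, weights,
# and numbers are illustrative assumptions, not OnDeck's actual model.

def bureau_score(record):
    """Score a bureau record: penalize missed payments, credit open lines."""
    score = 600
    score -= 40 * record.get("missed_payments", 0)
    score += 10 * record.get("open_lines", 0)
    return score

def combine_scores(scores, weights):
    """Blend per-source scores with a weighted average."""
    total = sum(weights[s] for s in scores)
    return sum(scores[s] * weights[s] for s in scores) / total

# A business with one missed payment and three open credit lines,
# plus a separately modeled cash-flow score.
record = {"missed_payments": 1, "open_lines": 3}
scores = {"bureau": bureau_score(record), "cash_flow": 640}
weights = {"bureau": 0.6, "cash_flow": 0.4}
print(round(combine_scores(scores, weights)))  # prints 610
```

In practice each per-source model would be trained rather than hand-written, but the structure is the point: one model per data source, then a principled combination.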
Sramana Mitra: Can you do this in a more granular way? I would like our audience to get a visceral feel for how you are treating data. Pick three industry segments that are particularly interesting and walk us through what data you are taking, how you are normalizing that data, and how you are correlating that data.
Krishna Venkatraman: Let’s take a restaurant. We have partnerships with a lot of credit bureaus, and we get data from them. They’ll tell us something about the lines that business has had, past loans it had, and the number of lines that are open. For each of those different data sources, we build a model that will tell us something about credit-worthiness. For instance, if there’s a consistent history of missed payments, that should tell us something about what we should expect.
Sramana Mitra: This is obvious. That’s not so difficult to correlate or draw conclusions. I’m more interested in the more esoteric stuff that you do that brings in better nuance.
Krishna Venkatraman: What I would say is that it may be obvious, but doing it at scale and automating the process is not. Every data source has its own format, which may change over time. You have to tune your models and do that in a very systematic fashion so that, for each data source you’re working with, you’re getting the best information possible.
If you have a business and you have only one of three data sources, what should you do? Is that telling you something about the fact that two are missing? Does that mean that business has weak credit history? Is that particular industry better represented in that data source? It seems obvious, but putting machinery in place and actually building systems that can accommodate those types of variations is harder than it seems.
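One way to handle the missing-source problem raised above is to make absence itself a feature, so a downstream model can learn whether a missing source is informative. A minimal sketch, with invented source names:

```python
# Sketch: encode which data sources are present as explicit indicator
# features instead of silently dropping incomplete records.
# Source names and scores are illustrative assumptions.

SOURCES = ["bureau", "bank_transactions", "industry_benchmark"]

def build_features(available):
    """Map {source: score} to model features plus missingness indicators,
    so a model can learn, e.g., whether a missing bureau file signals a
    thin credit history or just poor coverage of that industry."""
    features = {}
    for source in SOURCES:
        features[f"has_{source}"] = source in available
        features[f"{source}_score"] = available.get(source)  # None if absent
    return features

# A business for which only bank transaction data is available.
feats = build_features({"bank_transactions": 655})
print(feats["has_bureau"], feats["has_bank_transactions"])  # prints False True
```

The indicator columns let one system accommodate the variations Venkatraman describes, rather than requiring a separate model for every pattern of available sources.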
Part of machine learning is having the discipline to do those things very well so that you can automate the mundane. That’s the first order of business. Of course, there are other data sources we look at. Cash flow and granular transaction data have their individual nuances as well. Bank transaction data will not only tell you about the current state of the business but may also reveal something more persistent.
Are there trends that you should be looking at that you need to be aware of when you’re making a decision? That’s where you also have to rely not just on the particular business but also on businesses like it. We built a database of records so we can track macro aspects of a particular industry, like seasonality and other factors that may not show up in any one individual record.
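The seasonality point can be made concrete with a small sketch: divide each month's revenue by an industry-wide seasonal factor (estimated from a pooled database of similar businesses) before judging a trend. The index values and revenue figures below are made up for illustration:

```python
# Sketch: seasonally adjust revenue using an industry index so that a
# restaurant's slow January isn't mistaken for a real decline.
# Index values (1.0 = an average month) and revenues are invented.

SEASONAL_INDEX = {"Dec": 1.25, "Jan": 0.80, "Feb": 0.85}

def deseasonalize(revenue_by_month):
    """Divide each month's revenue by its industry seasonal factor."""
    return {m: rev / SEASONAL_INDEX[m] for m, rev in revenue_by_month.items()}

raw = {"Dec": 50000, "Jan": 33000, "Feb": 34000}
adjusted = deseasonalize(raw)

# Raw revenue drops 34% from Dec to Jan, but the seasonally adjusted
# figures are roughly flat: the "decline" is mostly industry seasonality.
print({m: round(v) for m, v in adjusted.items()})
```

The adjustment comes out near 40,000 for every month, which is exactly the kind of signal that only shows up when you can pool records across businesses in the same industry.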
I would say machine learning is all about extracting information in the most efficient manner possible. A lot of people may talk about using social data in some novel way, but I’d say a lot of the data sources already exist, and they haven’t been used in the most efficient way possible yet.
This segment is part 2 in the series: Thought Leaders in Big Data: Krishna Venkatraman, Senior Vice President of Analytics at OnDeck