Sramana Mitra: What is one of the most significant representative use cases outside of finance?
Sandy Steier: We will use retail, where there are a couple of interesting use cases. The first is an internal kind of data analysis. Retailers produce a fair amount of data – as we all know – every time a cash register beeps, that information goes somewhere. For a reasonably large retailer, that can be tens of billions of items sold over a period of two years, for example.
Then there is the inventory, which is even more information, and all sorts of other information. So, there is a lot of data in a retailer. A reasonably sophisticated question – and we are only touching the tip of the iceberg here – has to do with affinity analysis. Affinity analysis looks at the likelihood of various products being bought together. If a customer buys a two-liter bottle of Pepsi or Coke, what is the likelihood that that customer buys certain other products? What is the most likely other thing he or she buys? To do that analysis, you really need detailed information. You can’t use summary information. You need to know for each shopping cart what items were in it and then do a comparison.
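To make the affinity question concrete, here is a minimal Python sketch. It is not 1010data's method: the function name `affinity`, the item names, and the four sample baskets are all hypothetical. It estimates, from per-basket detail, how likely each other item is to appear in a basket that contains a given anchor item.

```python
from collections import Counter


def affinity(baskets, anchor):
    """Estimate P(item in basket | anchor in basket) for every other item.

    `baskets` is a list of item lists, one per shopping cart (hypothetical
    input format). Returns an empty dict if no basket contains the anchor.
    """
    anchor_baskets = [set(b) for b in baskets if anchor in b]
    if not anchor_baskets:
        return {}
    counts = Counter()
    for basket in anchor_baskets:
        # Count each co-purchased item once per basket.
        counts.update(basket - {anchor})
    n = len(anchor_baskets)
    return {item: count / n for item, count in counts.items()}


# Illustrative data only; a real retailer would have billions of line items.
baskets = [
    ["cola_2l", "chips", "salsa"],
    ["cola_2l", "chips"],
    ["cola_2l", "paper_towels"],
    ["chips", "salsa"],
]
likelihoods = affinity(baskets, "cola_2l")
most_likely = max(likelihoods, key=likelihoods.get)
```

The sketch needs the contents of every individual basket, which is why summary totals per item are not enough for this kind of question.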
A very practical business use case, where the dollars and cents are a bit more obvious, is just to know the following: suppose I put an item on sale. The question is, when people come into the store to buy that item, are they buying anything else, and if so, what else are they buying? Is it something that is not on sale? Because if people come into the store and just buy the sale item, that is a loss situation. If you get people to come into the store to buy the item that is on sale and then they go and buy a bunch of other items – hopefully expensive items – then it is a win. Just asking that question – what else do the people who buy the sale item buy? – is important.
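The sale-item question can be sketched the same way. The Python below is only an illustration under assumed inputs: each basket is a hypothetical mapping of item to price paid, and `sale_item_summary` is a made-up helper, not part of any real product. It reports how many sale-item baskets contained nothing else and how much was spent, on average, on the other items.

```python
def sale_item_summary(baskets, sale_item):
    """Summarize what shoppers who bought `sale_item` bought alongside it.

    `baskets` is a list of dicts mapping item name to price paid
    (hypothetical format). Returns None if nobody bought the sale item.
    """
    with_sale = [b for b in baskets if sale_item in b]
    if not with_sale:
        return None
    # Baskets where the sale item was the only purchase (the "loss" case).
    only_sale = sum(1 for b in with_sale if len(b) == 1)
    # Spend on everything except the sale item, per basket.
    other_spend = [
        sum(price for item, price in b.items() if item != sale_item)
        for b in with_sale
    ]
    return {
        "baskets_with_sale_item": len(with_sale),
        "share_buying_only_sale_item": only_sale / len(with_sale),
        "avg_other_spend": sum(other_spend) / len(with_sale),
    }


# Illustrative data only.
baskets = [
    {"cola_2l": 0.99},
    {"cola_2l": 0.99, "chips": 3.50, "salsa": 2.50},
    {"cola_2l": 0.99, "detergent": 8.00},
    {"chips": 3.50},
]
summary = sale_item_summary(baskets, "cola_2l")
```

A high share of sale-only baskets points toward the loss situation described above, while a high average spend on other items points toward the win.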
There are also cross-promotional questions. There are questions about where to locate items within stores. What do you put on the ends of the aisles, which are most visible to people? How should the shelves be arranged? What items should be near other items? These are all sorts of questions from a marketing perspective or from a store operation perspective.
Then there is category management in terms of “What items should I stock in this store?” What items are selling well, which are not selling well? Why is that happening? What kinds of customers buy which items? Are they customers that come in very often? They may buy one set of items, but customers who just come in occasionally buy a different set of items. It is important to know how that works so you can cater to the right group of customers. There is an endless stream of questions one might ask. The standard use case for a retailer is that we will be the analysis platform where the retailers send us all their data and they will do their analysis on 1010data. That is the value that we supply. In the case of some retailers – and I will use a specific example just to make it more real – Dollar General originally brought us in to do this kind of analysis. Then they began to use us more and more, to the point where we became their standard platform for all their operational reporting and analysis. In other words, we became their enterprise data warehouse, and they shut down their old one. We are now the enterprise data warehouse for all operational reporting and analysis for Dollar General – and there are others that are on the same path.
I would like to talk about an even more interesting use case that in a way has similarities to the example I talked about earlier with mortgage-backed securities. The thing about retailers is that the data they have – their sales data – is a window into the market. It really is the market. For a company like Dollar General, which has more than 10,000 stores, the items it sells are a very good window into the market. Who is buying what? How much are they buying? It isn’t just the retailer that finds value in that data, but also the suppliers: Procter & Gamble, Pepsi, Coca-Cola, etc. They provide products to the retailers, and those companies tend to be even more analytically sophisticated than retailers. Procter & Gamble, for instance, is known to have many PhDs on its staff. They are experts at analysis. But many other so-called CPGs (consumer packaged goods companies) are like that as well, and that makes sense because they have to develop new products. When you have to be in R&D mode to develop new products, sell them, and distinguish them from others, that requires a lot of analysis.
The CPG companies tend to be more sophisticated and are therefore in a position to help retailers sell more products and give them advice on how to do so. If Procter & Gamble can tell Dollar General how to sell more Bounty paper towels, then Dollar General is very happy to get that advice. In order for Procter & Gamble to give Dollar General the best possible advice, they should have access to Dollar General’s data and as much other data as they can possibly want. Historically, the way it has worked is that those companies have gotten samples. They have gotten various aggregates from companies like Nielsen and IRI. In some cases the CPG companies actually got detailed information, but it was only for a tiny sample. They got information for a few products, for a few stores, and for a few days. That is what they had to work with.
But what if Procter & Gamble can get all the information? Every item that Dollar General sold over a two- or three-year period, every single cash register beep, no restrictions, and in every store. They can do amazing things with it. The advice they can give to Dollar General is far better than if they didn’t have that information.
So, a company like Dollar General has incentives to share its data with a company like Procter & Gamble. That is in fact what is happening. Dollar General has opened its database to hundreds of CPG companies, and all these CPG companies are able to analyze data and give Dollar General better advice. It is a symbiotic relationship. Here we have another case where one company has the data and another company is using it. That is an interesting point, because what it really is, is a combination of big data and the cloud – the cloud supplies the neutral ground where the data can exist so that everybody can get to it.
This segment is part 5 in the series: Thought Leaders in Big Data: Interview with Sandy Steier, CEO of 1010data