Sramana Mitra: Are there any other verticals?
Todd Goldman: Telecommunications is another one. This is a never-ending topic. When I started my career at HP years ago in network management, I sold to the telecom market, and they were trying to solve the problem of churn. They are still trying to solve it, to the extent that they are looking at your behavior, dropped calls, and other issues you might have when using their services. Their goal is to anticipate potential churn and keep you from leaving them – both by the service they are providing you at a technical level and by making you an offer before you decide to drop them and move to another carrier.
SM: For all of the areas you are talking about, we have also talked with companies that provide various platforms and specific big data applications for certain areas within these spectra. We have been doing this series for more than six months, and we have had a lot of people discuss different perspectives and how they are tackling big data in different industry segments.
TG: The issue we see horizontally across all of them is that 80% of the effort involved in these big data initiatives isn’t analytics. Eighty percent is about getting the data in and cleaning it up. Just because you have big data doesn’t mean it is good data. Aligning the multiple data sources you have so you can actually do the analytics, and then finally doing the analytics, [is what is important].
We just had a Hadoop summit. There was a lot of talk about how companies underestimate the amount of effort involved in data pipelining and data refining before they actually get to the data analytics and visualization. That is a problem Informatica is directly involved in solving, because we believe that it is not going to be solved by training more data scientists. It is going to be solved by simplifying that work, so somebody doesn’t have to have all this knowledge about Hadoop, MapReduce, etc. They don’t have to know all that just to get data ready.
SM: What would you say has been the big transformation in the past few years? From where I sit, I see that in the past, when all the Hadoop infrastructure and other things were not available, the analytics job was happening on samples. Today analytics is happening on actual data. Is that a reasonable conclusion?
TG: Yes. That is what I was referring to as the long tail before. Before, we cut off the tail: we would have to take a sample. Either you would take a sample or you would decide which attributes are interesting, keep those, and get rid of the rest. Either way you are sampling the data and making an educated guess about what is important and what is not. Now we don’t have to do that as much.
This segment is part 3 in the series: Thought Leaders in Big Data: Interview with Todd Goldman, VP and GM for Enterprise Data Integration at Informatica