Sramana Mitra: If you were to present the industry layers of infrastructure that are going into making that transition from analyzing samples to analyzing actual data, what does that landscape look like?
Todd Goldman: There is the raw data warehouse, which today is evolving to be the Hadoop layer. I don’t think the legacy appliances will go away – they are still used for reporting. But for pure analytics, we have the raw data warehouse and the Hadoop layer.
The next layer is the data integration and data refining layer. I have my raw sewage of data, and I need to clean it up, filter it, and get it ready for analytics and the analytics engines. All the verticals we discussed are investing in creating different kinds of analytics engines for specific kinds of problems, and they might have foreknowledge of the kinds of data they are analyzing, whether it is pharma, telecom, financial services, etc. On top of that you have the visualization layer – how to make it graphically appealing and easily understandable by a business person who has to make a decision.
SM: What do you see in terms of interesting companies or startups in each of those layers?
TG: There are a bunch of companies at the Hadoop layer. Frankly, there are too many, so there has to be some shakeout at that layer before things get interesting.
SM: The Hadoop layer is pure infrastructure. That doesn’t have any domain specifics or heuristics.
TG: That is correct, it is purely horizontal. The value all those companies are adding on top of Hadoop is still pure infrastructure, but it is all around the management of the Hadoop cluster. The next layer up is the layer Informatica plays in. Other companies in this layer are Revolution Analytics, which focuses on big data analytics using open source R on Hadoop; Ayasdi, which focuses on high-dimensional big data analytics; InsightsOne, which focuses on customer analytics; Apixio, which focuses on big data analytics for healthcare; and DataSift, which focuses on social data access and analytics. All those are very interesting. There is a lot of venture capital going into that layer.
SM: Let’s switch gears and point out problems where you see areas that do not have this kind of over-investment in and over-infusion of startups.
TG: The interesting space is not so much the horizontal capability. If you are a new startup and you are saying, “Let me introduce a horizontal value at the visualization layer,” I don’t think that is going to be interesting. What is going to be interesting is taking the knowledge you have about a particular space and applying it.
Let’s say you are an expert on the recent “Obamacare” law [the Patient Protection and Affordable Care Act, most provisions of which will be phased in by January 1, 2014] and you have knowledge of what that is going to mean in terms of how insurance companies have to analyze that data in order to offer competitive rates. Take that knowledge and build out a combination of the activities you have to have for specific applications in the healthcare market. What kinds of analytics do you need that pick data from multiple sources in that market, and then how would you present that and put a package together in a more vertical way? Don’t think that you are just going to build the analytics layer. You have to build out the whole vertical slice.
What is missing in the whole big data space right now is that level of expertise, practically applied. What is happening right now is that companies themselves are buying all the horizontal layers and adding their vertical expertise to solve a particular problem. What will happen next in the industry is what typically happens. Some smart person will say, “Why am I building this verticalized capability for my chemical or pharmaceutical company? I can get venture backing and sell this over and over again.” I think we are still a bit early, so the time is still right for seed funding or A round funding, but I think it is a bit late if you are trying to build a better horizontal mousetrap. That ship has sailed.
This segment is part 4 in the series: Thought Leaders in Big Data: Interview with Todd Goldman, VP and GM for Enterprise Data Integration at Informatica