Sramana Mitra: I would like you to take off the HP hat and wear more of an industry thought leader hat. Give me some pointers to where you see open problems.
Robert Youngjohns: I think the issue of unstructured data is one of the big problems facing the industry. It is really hard science to do this well. You can conceptualize it easily – sometimes Autonomy may have been guilty of oversimplifying that. When you get into it and say, “I have a trillion emails in this archive and I am looking for evidence of insider trading,” how do you fold that together? What kind of search do you do to find that? No one is going to write an email saying “Hey, don’t pass this email on, because it could be considered as insider trading.” So, a keyword search is not going to get this. We have to be able to analyze the content to try and understand what these emails are really about. That is tough. It is tough on two scales. First, there is the question of the meaning of these documents and how to distill that. The second thing is doing that on a massive scale. Either of those problems is hard. If you add the two together, you have a significantly more difficult problem. It is very hard to predict exactly what pattern you are going to see in these things. I think that is a big issue the industry is looking at and one we spend a lot of time thinking about.
SM: Help me bridge the gap between where Autonomy’s capabilities max out and where you are saying there are still lots of technology and solution gaps. Autonomy, from the time I remember tracking the company, has always positioned itself as being the company to go to for unstructured data analysis.
RY: That is correct. What is stressing us – and where we put most of our development effort – are those two areas. The first is taking categorization and meaning to the next level. The second is doing that on a big scale, which is where big data comes in. If you give me a repository of 1,000 emails, it is relatively easy. If you give me a trillion emails, I don’t have that capability anymore. I have to rely on machine intelligence far more to do the categorization and delivery of value. That is what we focus most of our development efforts on – making the actual analysis algorithms much more sophisticated, and also making sure they can operate at truly vast scale. We look at the data growth in enterprises.
Seventy to eighty percent of all data in enterprises is unstructured. When you throw video and audio into that mix, you have a vastly more complicated problem. So, we are sharpening up, not just to have better analytics tools, but to make sure they can operate on a vast scale. I was talking to a call center manager recently. They are logging one million calls per day in their call centers. They are looking for our software to help them categorize those calls, work out where the call center representative may have gone beyond the brief, and make sure that if there are subsequent complaints about the calls, we can go back, track them and derive what people were talking about. Wherever you are in this industry, that is the challenge you see – an explosion of unstructured data sources and an increasing demand to get meaning out of them.
This segment is part 3 in the series: Thought Leaders in Big Data: Interview with Robert Youngjohns, SVP and GM at HP Autonomy