Sramana Mitra: We have done a story on Clarabridge, and I met the founders of Attensity a long time ago, when they were just starting up. Can you talk about the technological point of view? Who is doing what, and how would you rate those approaches?
Rick Kieser: I mentored a firm that was a pioneer in text analytics. That company sold its technology to Inxight, which was acquired by Business Objects, which was then acquired by SAP. We use the SAP technology in our NLP engine. This is technology that our predecessor firm created back in 1998. In essence, it is the same thing. They have made some tweaks to it, but it is not dramatically different from our original core technology.
Clarabridge has its own independent technology that it developed. They say they have a super-granular sentiment analysis engine. It probably has the smallest window with regard to the 2×2 matrix I mentioned. It is not ideal in all circumstances.
Semantria is a small firm, and they have a simple plug-in that goes into Excel. It is a commodity-based text analytics provider. Then there is a company called Lexalytics that has a text analytics engine. The biggest difference between us and Clarabridge and Attensity is that we have been doing this since 1999, and we have classified more than two billion comments. We have professional-user-grade experience in managing these things at a granular level.
But again, we use a blended approach. We are using SAP’s engine, and we have also licensed some technology from the Italian National Institute of Technology, which is the basis for our machine-based approach. We have taken the approach of identifying best-in-class for our customers and then baking it into our platform. We are constantly looking for best-in-class, and we are leveraging that in our environment. It is about how to be most effective for our customers by delivering the best technologies as opposed to just having one technology.
SM: You said in the context of language processing that the nature of language has changed. The language used in Twitter streams or in SMS-style messages, which social media users often adopt, is not “full” language. “See you soon” becomes “cu soon,” for example. Traditional NLP engines don’t really work well there. Can you elaborate on how you are dealing with that?
RK: That is exactly right. You have to create specialized rules. I come back to this 2×2 matrix once again. In the upper right-hand corner, social media is very difficult to interpret because the characteristics you just mentioned prevent the use of full sentences.
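[Editor's illustration: a minimal sketch of what "specialized rules" for SMS-style text could look like, using the "cu soon" example from the question. The rule table and the `normalize` function are hypothetical and are not drawn from Ascribe's engine; a production rule set would be far larger and context-sensitive.]

```python
import re

# Hypothetical rule table mapping shorthand tokens to their full forms.
SHORTHAND_RULES = {
    "cu": "see you",
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
}

def normalize(text: str) -> str:
    """Lowercase the text, split it into word and punctuation tokens,
    and expand any token that appears in the rule table."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(SHORTHAND_RULES.get(tok, tok) for tok in tokens)

print(normalize("cu soon"))
```

[A lookup table like this only handles unambiguous shorthand; tokens whose meaning depends on context would need additional rules or a learned model.]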
Our approach, since we use blended technology, depends on our customer’s channel of feedback. We can cut that a number of different ways. If it is ideally suited for NLP, we do it. We tend to use NLP more as an exploratory and a quick categorization tool, but not as a classifying engine. Our machine-based learning is more of a classifying engine with semi-automated technologies.
As an example, one of our large market research firms had a significant consumer packaged goods client that wanted to process vast amounts of social media data related to one of its products. We were being fed batches of 50,000 Twitter feeds or Facebook comments on this product. Our approach was to look at it and quickly curate it. We searched it and identified brand names and relevant ideas, because we knew there was a large amount of garbage and few really high-quality comments. So we quickly searched it (we used a multistep process to be able to do this) and identified it. Once we narrowed it down to maybe 10,000 comments, we ran it through our text analytics engine to quickly come up with “what is the sentiment as it relates to it and what are the topics being discussed.” From there, we curated it again, got it down to about 5,000 comments, took a look at it manually, quickly identified and cleaned up a code book, and then we created a machine-based learning technology to post it.
Whenever more of those comments would come in, we could use the model we had already created (because of this multistep process that we use with NLP) to refine it and then train the semi-automated learning technology to go through and continue the process automatically. This is how we are different at Ascribe. One technology is not right given the vast variety of channels of feedback. You need to be able to apply the best of these technologies to get what the customer is looking for.
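[Editor's illustration: a rough sketch of the multistep process described above, under stated assumptions. It shows a keyword curation pass followed by a classifier trained on manually coded comments and reused on later batches. The brand name "AcmeSoap", the codes, and all function and class names are hypothetical. A small naive Bayes classifier stands in for the machine-learning step, which the interview does not specify, and the intermediate NLP sentiment and topic pass is omitted.]

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def curate(comments, brand_terms):
    """Curation pass: keep only comments that mention a brand term."""
    terms = {t.lower() for t in brand_terms}
    return [c for c in comments if terms & set(tokenize(c))]

class CodebookClassifier:
    """Multinomial naive Bayes trained on manually coded comments."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> token counts
        self.label_counts = Counter()            # code -> number of comments
        self.vocab = set()

    def train(self, coded_comments):
        """coded_comments: iterable of (text, code) pairs from the code book."""
        for text, code in coded_comments:
            self.label_counts[code] += 1
            for tok in tokenize(text):
                self.word_counts[code][tok] += 1
                self.vocab.add(tok)

    def classify(self, text):
        """Return the most probable code, using add-one smoothing."""
        total = sum(self.label_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[code][tok] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code

# A small batch in which most comments are irrelevant to the product.
batch = [
    "Love the new AcmeSoap scent",
    "lol random spam link",
    "AcmeSoap dried out my skin",
    "weather is nice",
]
curated = curate(batch, ["AcmeSoap"])

# Manually coded examples standing in for the cleaned-up code book.
coded = [
    ("love the acmesoap scent", "fragrance"),
    ("acmesoap smells great", "fragrance"),
    ("acmesoap dried out my skin", "skin_feel"),
    ("acmesoap left my hands dry", "skin_feel"),
]
model = CodebookClassifier()
model.train(coded)

# Later batches reuse the trained model instead of being coded by hand.
codes = [model.classify(c) for c in curated]
```

[The point of the design is the ordering: cheap filtering shrinks the batch before any human review, and the human-built code book is what the classifier learns from, so new batches can be coded automatically.]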
This segment is part 4 in the series: Thought Leaders in Big Data: Interview with Rick Kieser, CEO of Ascribe