

Thought Leaders in Big Data: Interview with Ron Bodkin, CEO of Think Big Analytics (Part 2)

Posted on Friday, May 3rd 2013

Sramana Mitra: What problem doesn’t Hadoop solve?

Ron Bodkin: That is a good question. At any given time a technology has a certain set of capabilities, and it also has a vector along which new capabilities are being added. I would say that today Hadoop is not well suited for real-time use cases. If you want to bring a set of data into an environment and respond to streams of data with low latency, Hadoop doesn't do a good job of handling that. We typically see it integrated into a larger architecture: you use it for archival, long-term storage and processing, and for simulating and scoring predictive models, and then you push that information out into a real-time environment where decisions can be made. Examples of such decisions are recommendations for a consumer or assessments of the risk of a device failing. These kinds of decisions tend to be served at least by NoSQL databases, and often by real-time processing frameworks as well.

SM: We have been talking to some of the companies you mentioned. You are bringing up a very interesting viewpoint, because you are working across different platforms. Many of these platform vendors come and talk to us. Since you are working with various platforms, when you are faced with a real-time use case, which vendors do you bring into those engagements?

RB: There tends to be a maturity process. Companies often start out with the ability to do basic real-time responses off a pre-computed model. That is, they need to be able to compute a predictive model that then supports a near real-time response, such as a recommendation, a bid on an ad in an exchange, or a forecast of the likelihood that a device will fail.

Often the first step is simply to have a NoSQL database that runs across multiple data centers to deal with a large stream of data. There are other use cases where the need is for search, using technologies from a number of vendors that can do a near real-time look-up of an item. This means being able to quickly find the relevant part or item through a service interface, pull up information about it, and then act on it. More sophisticated use cases get into streaming analytics, where you want to be able to distribute the processing. A lot can be done with more traditional load-balanced application servers, but there are cases where scaling the logic that way doesn't work so well. That is where technologies like Storm start becoming interesting.
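To make the "pre-computed model" pattern concrete, here is a minimal Python sketch. It assumes a simple item co-occurrence recommender as the model, a stand-in chosen for illustration rather than anything Think Big Analytics has described. The batch function build_model plays the role of the Hadoop job, and a plain dict named STORE stands in for a NoSQL database such as HBase or Cassandra; HISTORY, build_model, STORE and recommend are all hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories. In the architecture described above, this
# batch step would run over archived data (e.g. in Hadoop), not an in-memory dict.
HISTORY = {
    "alice": ["camera", "tripod", "sd_card"],
    "bob": ["camera", "sd_card"],
    "carol": ["tripod", "light"],
}


def build_model(history, top_n=2):
    """Batch step: count item co-occurrences and precompute top-N picks per user."""
    co_counts = defaultdict(Counter)
    for items in history.values():
        for a, b in combinations(set(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    recs = {}
    for user, items in history.items():
        owned = set(items)
        scores = Counter()
        for item in owned:
            for other, n in co_counts[item].items():
                if other not in owned:
                    scores[other] += n
        # Sort by score descending, then by item name so ties are deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[user] = [item for item, _ in ranked[:top_n]]
    return recs


# Stand-in for a NoSQL store: a key (user id) mapped to a precomputed value.
STORE = build_model(HISTORY)


def recommend(user):
    """Online step: a single key lookup, with no model computation at request time."""
    return STORE.get(user, [])


if __name__ == "__main__":
    for user in ("alice", "bob", "carol", "dave"):
        print(user, recommend(user))
```

The design point is the split: all the expensive work happens offline in build_model, so the request path in recommend is a constant-time lookup, which is the kind of low-latency response Hadoop alone does not provide. An unknown user such as "dave" simply gets an empty list.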

SM: What is, in your opinion, the number one real-time framework out there right now?

RB: I think there are a number of frameworks. Generally, we are believers in open source technology. We think that a lot of what is exciting about big data is the ability to have things based on standards. We see Storm emerging for real-time processing, and we see a handful of NoSQL databases as a fundamental underpinning capability to enable storage and response in a reasonable way. Those would be technologies like HBase, Cassandra or MongoDB, as well as some of the other leading NoSQL databases that fit in with that environment.

This segment is part 2 in the series: Thought Leaders in Big Data: Interview with Ron Bodkin, CEO of Think Big Analytics
