Thought Leaders in Big Data: Interview with Sandy Steier, CEO of 1010data (Part 4)

Posted on Wednesday, Mar 27th 2013

Sramana Mitra: You made a comment that some of this data [consists of] regular large data repositories, but some of it falls in the domain of big data. Would you elaborate on that? What do you define as big data?

Sandy Steier: I think defining big data is a serious challenge, because I don’t think there is a good definition of big data and everyone has their own definition, which really means there isn’t one. What I meant in that sentence was big data in the volume sense, because there are billions and billions of records. That is not a Google- or an Amazon-sized problem, but to a trader in a hedge fund or an investment bank, that is a lot of data for them to deal with, and the kind of analysis they want to do is extremely sophisticated. They try to be innovative, so they try to do an analysis that nobody else has ever thought of. That means they need access to all of the data in its rawest forms so that they can work with it effectively, and they need to have complete flexibility over the kinds of analyses they want to do. It is a big data problem in the sense that it is a lot of data that requires innovative kinds of analysis.

SM: And what infrastructure do you use? In terms of technology stacks, what are your decisions in providing this solution to your customers?

SS: It is a proprietary technology. It is a technology that has its roots in the technical work we were doing back on Wall Street. Both Joel and I worked on the business side – I in mortgage-backed securities and Joel in equity trading – but Joel ran what I believe was the first high-frequency trading desk at Morgan Stanley. At one point Joel’s group did 2% of the volume of the NYSE. So, one out of 50 shares that was traded on the NYSE was traded by Joel’s group, because the computers did it automatically. The point is that in both our experiences on Wall Street, we used and developed technologies to help our business along and to be competitive against all the other firms out there.

Both Joel and I relied on and perfected certain technologies which allowed us as business users to really excel at the technology side as well. Those techniques were what we brought when we launched 1010data. So it is a proprietary technology that uses a series of techniques to get its speed and power. It is not a hardware solution. We typically run on very little hardware – inexpensive and very little of that hardware – in contrast to a Hadoop installation, where you might have thousands of servers. We wouldn’t know what to do with thousands of servers. We can handle many billions of records on a few servers. An example of that technology is the notion of a column database. Column databases are now becoming popular. Over the last three or four years, column databases have come to be considered the standard way to do scalable and quick data analysis. That is very odd to me because I have been using column databases since the late 1970s. We have perfected that technology to a degree beyond what anyone else has achieved so far, because we have been using it for a very long time. But there are many other examples like that, which all come together to make it very powerful.
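
[Editor's illustration] The column database idea mentioned above can be sketched in a few lines of Python. This is a generic toy example, not 1010data's proprietary implementation, which the interview does not describe. The trade records, field names, and function names are all hypothetical, invented only to show why keeping each column together helps analytic queries that read a few fields across many records.

```python
from array import array

# Hypothetical trade records, invented for illustration only.
rows = [
    {"symbol": "AAA", "price": 10.0, "shares": 100},
    {"symbol": "BBB", "price": 20.5, "shares": 250},
    {"symbol": "AAA", "price": 10.2, "shares": 50},
]


def to_columns(records):
    """Pivot a list of row dicts into one sequence per column."""
    return {
        "symbol": [r["symbol"] for r in records],
        "price": array("d", (r["price"] for r in records)),   # packed doubles
        "shares": array("q", (r["shares"] for r in records)),  # packed 64-bit ints
    }


columns = to_columns(rows)


def total_shares_row_store(records):
    # Row layout: every whole record is visited just to read one field.
    return sum(r["shares"] for r in records)


def total_shares_column_store(cols):
    # Column layout: only the "shares" column is scanned.
    return sum(cols["shares"])


def shares_for_symbol(cols, symbol):
    # A filtered aggregate reads two columns and never touches "price".
    return sum(
        s for sym, s in zip(cols["symbol"], cols["shares"]) if sym == symbol
    )
```

Both layouts give the same totals. The difference is in how much data a query has to read: with billions of records and many fields per record, scanning one packed column is far cheaper than walking every full row.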

The only thing I would add is that this approach is required. If you took any other database technology and layered spreadsheet-like flexibility on top of it, a naïve user, that is, a non-technical user who knows nothing about the database, its technical structure, how it is architected, or its metadata, could also do very complicated things, sometimes by mistake and sometimes intentionally. If you put that kind of power in the hands of a naïve user who is running on a standard database, that database would not support that kind of load. It could fail outright, collapsing and going down, or it would take impossibly long for any query to run. Databases need to anticipate what is going to happen down the road, and with a spreadsheet the user could come up with all sorts of new things over time, so the database could not anticipate them. To support that kind of flexibility, we had to develop our own technology. And that is what we did.

This segment is part 4 in the series: Thought Leaders in Big Data: Interview with Sandy Steier, CEO of 1010data