Sramana Mitra: What do you see around your space that are open problems?
Harry Glaser: One of the fun things about our space is that every problem is an open problem. One of the biggest open problems we see, and one we are working hard to solve, is that there is so much going on and it can all be a little bit disconnected. You will have data scientists training production models.
On one part of the team, you’ll have analysts turning around reports on historical performance. On another part, you’ll have engineers integrating new data sources. Of course, new data from new sources impacts the reporting, and that in turn impacts the quality of the models you’re supplying. If the team is not collaborating well, you will have engineers spending effort on integrating new sources without generating business value at the other end.
Each segment of this team is iterating so fast, and the market is changing so fast around them, that it’s hard for them to collaborate and stay on the same page. We hope to provide a platform where they can collaborate. When something changes on the integration front, you’ll be able to see and model its effect on reports all the way at the other end of the pipeline. Keeping everybody on the same page and collaborating well is a key challenge that we see.
Sramana Mitra: Answer this question now from the perspective of a problem that you’re not solving. What are the open problems out there that, if you were starting a company today, you would be considering digging into?
Harry Glaser: Data integration and data storage at scale is a very interesting problem. Data shapes are changing rapidly. You may need to store a large volume of data but not query it very often, or you may need a lot of compute for querying but not a whole ton of storage. This model where we buy it computer by computer is really breaking down.
Every scaled-out enterprise I’ve talked to about this has multiple data storage systems with a lot of overlap, both on-premise and in the cloud. Some are cold storage. Some are warm. Keeping it all managed together is a big pain. I haven’t seen a good solution for data storage and integration yet.
Sramana Mitra: Any other open problems that you want to point to?
Harry Glaser: There’s nothing but open problems. Another thing that I think is interesting is the way that communication happens around data. You see business folks requesting reports from the data team. That opens up a line of communication that goes on for a while. You’ll have follow-up questions, projects that come out of that report, and follow-up analyses requested for those projects. I see data teams use simple ticketing systems for this. This is an area that could use some specialized workflow software.
Sramana Mitra: Good. Thank you for your time.
This segment is part 3 in the series: Thought Leaders in Big Data: Periscope CEO, Harry Glaser