KL: Yes. You said you covered a lot of unstructured data, and you have talked to a lot of companies like Autonomy, for example. I would venture to say that this field is not well understood. Because of that misunderstanding, practically all of the implementations at large enterprises are [set up] to fail. This snippet of wisdom is critical and has been missed by practically all of the vendors. I have not heard any of them mention this, and many of the installations are getting burned by this oversight.
This is the oversight: I mentioned to you all of the applications of the storage management, compliance management, e-discovery, litigation support, and record-keeping. The issue here is that all of these applications came in across the last 12 years. They came in as silos, or silo applications that produced silo data. The larger vendors like Autonomy, EMC, Symantec or IBM, seeing the management of unstructured data as a hot space, started acquiring these vendors.
However, if you lift the cover, they are still siloed data and siloed applications. That spells trouble ahead, because the volume of data we are talking about is humongous – one client of ours, for example, was ingesting 50 million documents a day at one point. That compares to the entire storehouse of books in the U.S. Library of Congress, with 22 million documents. Anything that can break will break.
That leads to the second issue, which is that if you have to put together these siloed applications, one typically tries to do that with integration of processes, such as using APIs. Now here is the problem: APIs fail when you talk about Google-type volumes. APIs were never meant for that. If you have APIs that handle Google-type volumes, they will crash and burn. The only way to handle this kind of problem is to have one unified platform that has one data schema for all of the applications. If you make a change in a document in e-discovery, that is it. That is all you do, because all other applications will see that change instantaneously. Instead, if it is cobbled together through APIs, you make the change on one document, and you now have to find its duplicates across the entire enterprise, across all silos, and then try to apply the same update to each one. That is doomed to failure.
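To make the contrast concrete, here is a minimal Python sketch of the two update models described above. It is an illustration only, not ZL Technologies' implementation: the `Silo` and `UnifiedStore` classes, the `legal_hold` field, and the idea of counting one API call per silo are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Silo:
    """A hypothetical siloed application holding its own copy of each document."""
    name: str
    docs: dict[str, dict] = field(default_factory=dict)


def update_across_silos(silos: list[Silo], doc_id: str, change: dict) -> int:
    """Siloed model: the change must be pushed to every copy, one call per silo."""
    calls = 0
    for silo in silos:
        calls += 1  # stands in for one cross-application API call (assumed cost model)
        if doc_id in silo.docs:
            silo.docs[doc_id].update(change)
    return calls


class UnifiedStore:
    """Unified model: one record per document, shared by every application."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def update(self, doc_id: str, change: dict) -> int:
        self.docs[doc_id].update(change)
        return 1  # a single write; every application reads the same record


# Three silos, each with its own copy of the same document.
silos = [
    Silo(name, {"doc-1": {"legal_hold": False}})
    for name in ("ediscovery", "storage", "compliance")
]
silo_calls = update_across_silos(silos, "doc-1", {"legal_hold": True})

# One shared record in the unified store.
store = UnifiedStore()
store.docs["doc-1"] = {"legal_hold": False}
unified_writes = store.update("doc-1", {"legal_hold": True})
```

In the siloed model the number of calls grows with the number of copies, and any call that fails leaves the copies out of sync; in the unified model there is only one record to change.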
SM: What is your solution to that? How do you get data from all these different parts of an organization into one repository that performs at scale?
KL: That is exactly what we say is the long-term approach, and we think it is the only approach that works – to have all of the data production systems keep producing transactional data. Once the data is fixed, it is thrown into one single, unified repository so that all parts of the enterprise can see it and make use of it. That is the vision that is missing from the economies of the world.
SM: Your main differentiation is in the architectural approach of how you handle unstructured data?
KL: That is the first differentiation.
SM: What other ones do you bring to the table?
KL: The other differentiation is a bit more subtle, but it requires you to really think twice about what you are doing. The coherence of the data is critical and it addresses more the topic of what you are trying to do with the data. When we do a search across the enterprise for e-discovery, what are we trying to do?
Here are the three sins of the silos that most of the enterprises are facing today. The first one is the proliferation of duplicates, because no one can do single-instancing across silos; it is practically impossible. Number two is that you have inconsistent search, because each silo has its own search engine. Therefore the search depends on the vagaries of the search in each particular application. For example, the search engine used in e-discovery, the search engine used in storage and the search engine used in compliance are all different. The same search command acting on the same data set will yield different results. That is absolutely unacceptable in e-discovery, for example. The search engines of the silos are incompatible.
The third is disjointed retention. When you cobble together three applications, each one has a different capability for retention. Some will do it based on chronology, some will do it based on the role of the employee, and some will do it based on the content within the document. They are all different. Therefore, the same document in different silos will have a different life. If you put these three together, the proliferation of duplicates, inconsistent search, and disjointed retention, you have lost control of your data. Therefore, the unified approach, where you have one data schema for all of the above, is essential. It is not a choice. It is one version of the truth.
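As a rough illustration of how one schema can address all three problems at once, the following Python sketch keeps a single record per unique document, one search routine, and one retention rule. Everything here is hypothetical: the `UnifiedRepository` class, exact-match de-duplication by SHA-256 content hash, substring search, and purely chronological retention are simplifying assumptions for the example, not a description of any vendor's product.

```python
from __future__ import annotations

import hashlib
from datetime import date, timedelta


class UnifiedRepository:
    """Hypothetical single store: one copy, one search, one retention rule."""

    def __init__(self, retention_days: int) -> None:
        self.retention = timedelta(days=retention_days)
        self.records: dict[str, dict] = {}

    def ingest(self, text: str, received: date) -> str:
        # Single-instancing: identical content maps to the same key,
        # so a duplicate never creates a second record.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.records.setdefault(key, {"text": text, "received": received})
        return key

    def search(self, term: str) -> list[str]:
        # One search routine for every application, so the same query
        # over the same data always returns the same result.
        needle = term.lower()
        return sorted(k for k, r in self.records.items() if needle in r["text"].lower())

    def expired(self, today: date) -> list[str]:
        # One retention rule (chronological here, as an assumption), so a
        # document has a single lifetime rather than one per silo.
        return sorted(
            k for k, r in self.records.items() if today - r["received"] > self.retention
        )


repo = UnifiedRepository(retention_days=365)
first = repo.ingest("Quarterly contract review", date(2020, 1, 1))
duplicate = repo.ingest("Quarterly contract review", date(2020, 6, 1))
other = repo.ingest("Lunch menu", date(2020, 6, 1))

hits = repo.search("CONTRACT")
due = repo.expired(date(2021, 3, 1))
```

Retention based on employee role or document content, which the interview also mentions, would replace the date comparison in `expired` but would still be evaluated in one place against one record.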
This segment is part 3 in the series: Thought Leaders in Big Data: Interview with Kon Leong, CEO of ZL Technologies