Atanas Kiryakov: If you wanted to solve this massive-data problem with a relational database, you’d probably need to hire several hundred people. Currently, we have just two employees maintaining this data warehouse in our offices. Even at this scale, the ratio between the effort we need and the effort you would need with mainstream technologies is about 1:100. That results in massive savings. This was the biggest driver for pharmaceutical companies to become early adopters of this technology. They started using it actively seven to eight years ago.
They also use the other part of semantic technology, which is text mining. As you probably know, pharmaceutical companies are obliged by regulation to answer queries from the FDA about potential risks of their drugs. They get a substantial number of these inquiries. They are obliged to do their best to answer them and to make sure that they have traced every piece of relevant information they have in-house. This is extensive work. What we did for them is develop a technology that starts off with a data warehouse of public databases covering things like chemical compounds and drugs. For each specific client, we integrate another five to ten databases that are proprietary to that enterprise. We integrate them with the public ones to end up with even bigger and more comprehensive data sets. We use these data sets as a big body of knowledge that we draw on when we analyze the text.
In the very first phase of text analysis, you always try to make sure that you can recognize in the text all the concepts that you already know in your knowledge base. We have linguistic heuristics that are used to recognize unseen concepts or variations. But at the very least, you always have to start with the simple task of recognizing all the objects that you already know. You start with this massive database integrated from public and private sources. On top of this, we analyze text so that we can recognize various concepts within it and annotate the text with links back to the database.
Whenever we see aspartame or asthma in clinical trial reports for specific drugs, we don’t annotate this by just saying, “This is a chemical compound” or “That’s a disease.” We annotate it with an identifier, a fully qualified link back to the database. We link the text to a very specific object or concept in the database. We call this semantic annotation. We analyze all sorts of documents that the pharmaceutical companies have that are relevant to drug development and clinical trials. Then we link them to this massive body of data.
The final result looks simple. You just do a semantic search, which allows you to search for concepts. You can also search for clinical trial reports that contain relevant information, such as drugs that shouldn’t be used for patients with asthma. The simple search interface rests on a very complex and precise analysis of the text.
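To make the idea of semantic annotation and concept search concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the system described in the interview: the knowledge base is a tiny in-memory dictionary, the `example.org` URIs and the document IDs are invented placeholders, and recognition is plain case-insensitive dictionary matching with none of the linguistic heuristics mentioned above.

```python
import re
from dataclasses import dataclass

# Hypothetical knowledge base: known label -> (concept URI, concept type).
# The URIs are invented placeholders, not identifiers from any real database.
KNOWLEDGE_BASE = {
    "aspartame": ("http://example.org/kb/compound/aspartame", "ChemicalCompound"),
    "asthma": ("http://example.org/kb/disease/asthma", "Disease"),
}


@dataclass(frozen=True)
class Annotation:
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    text: str           # the mention exactly as it appears in the document
    uri: str            # fully qualified link back to the knowledge base
    concept_type: str   # coarse type, kept alongside the specific link


def annotate(text):
    """Find mentions of known concepts and link each one to its URI."""
    annotations = []
    for label, (uri, concept_type) in KNOWLEDGE_BASE.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), uri, concept_type)
            )
    return sorted(annotations, key=lambda a: a.start)


def search_by_concept(documents, uri):
    """Return the IDs of documents that mention the concept with this URI."""
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(a.uri == uri for a in annotate(text))
    ]


# Hypothetical clinical trial snippets, invented for this example.
DOCUMENTS = {
    "trial-001": "Patients with asthma were excluded from the study.",
    "trial-002": "The formulation is sweetened with Aspartame.",
    "trial-003": "No adverse events were reported.",
}

if __name__ == "__main__":
    for doc_id, text in DOCUMENTS.items():
        for a in annotate(text):
            print(doc_id, repr(a.text), "->", a.uri)
    print(search_by_concept(DOCUMENTS, "http://example.org/kb/disease/asthma"))
```

The search works on concept identifiers rather than on strings, so a query for the asthma concept matches any document whose annotations link to that URI, however the mention is spelled or capitalized in the text.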
Sramana Mitra: Who are the users of this?
Atanas Kiryakov: The biggest user of this is AstraZeneca.
Sramana Mitra: Who within AstraZeneca uses this and for what purpose?
Atanas Kiryakov: The division that is tasked with answering regulatory inquiries.
This segment is part 3 in the series: Thought Leaders in Big Data: Atanas Kiryakov, CEO of OntoText