Sramana Mitra: How easy is it to access the sanctions lists in the structured category?
Charlie Delingpole: That is easy because the US Treasury makes it publicly available. The real challenge is name matching. In Latin alphabets, it’s easy to name match because you can do simple heuristics.
If my name has 20 characters in it, and there is another name in which three characters are different, then I can match on that. It’s different in Chinese. A name may have only three characters, but each one is drawn from roughly 50,000 possible characters rather than 26 letters.
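The heuristic described here, treating two names as a match when only a few characters differ, can be sketched with Levenshtein edit distance. This is an illustrative sketch, not ComplyAdvantage's method: the function names and the `max_edits=3` threshold (taken from the "three characters" remark) are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def names_match(a: str, b: str, max_edits: int = 3) -> bool:
    # max_edits is a hypothetical threshold chosen for illustration only.
    return edit_distance(a.casefold(), b.casefold()) <= max_edits
```

With this threshold, `names_match("Vladimir Putin", "Vladimir Puten")` is true, which is the desired behavior for a long Latin-alphabet name. But `names_match("王小明", "李大华")` is also true, even though the two (made-up) three-character names share nothing: a tolerance that is harmless over 20 letters swallows an entire short name, which is why a fixed character-difference rule does not carry over to Chinese.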
You need a strategy for name matching in Arabic and Cyrillic scripts. You need to be able to transliterate between Vladimir Puten and Putin and resolve that this is the same person. We have 270 people right now in our company. We’ve raised $100 million and spent roughly six years building this company.
We think that we are still at the beginning of what we are building because what we are fundamentally trying to do is to constantly expand the depth, breadth, and accuracy of the risk information graph that we are building.
We are constantly adding new sources and connections. If you miss a single connection or fact, then that would completely alter your perspective of the counterparty. It has to be as close to perfect as possible because that could be life or death.
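The transliteration step mentioned in this answer, resolving variant spellings such as Puten and Putin to one person, might be sketched as follows. The romanization table is a simplified, hypothetical mapping for Russian Cyrillic rather than any official standard, and `romanize`, `likely_same_name`, and the 0.85 threshold are assumptions made for illustration. Similarity is scored with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Simplified, illustrative Russian-to-Latin table (an assumption, not a standard).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def romanize(name: str) -> str:
    """Lowercase the name and replace Cyrillic letters with Latin ones;
    characters not in the table (spaces, Latin letters) pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in name.casefold())


def likely_same_name(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare two names after romanization; threshold is hypothetical."""
    ratio = SequenceMatcher(None, romanize(a), romanize(b)).ratio()
    return ratio >= threshold
```

Here `romanize("Владимир Путин")` yields `"vladimir putin"`, so the Cyrillic original and the Latin variant "Vladimir Puten" can be compared in one alphabet and flagged as likely the same name. A real system would need a table per script and language, since the same Cyrillic or Arabic name romanizes differently depending on the convention used.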
Sramana Mitra: In terms of the AI that you are using to achieve these outcomes, natural language processing is one of them. What other AI techniques are you applying to achieve your goal?
Charlie Delingpole: We build a vast data pipeline, and AI is used at different steps of it: data extraction, ingestion, entity resolution, and multi-level classification. The huge expense for us is training data for classifying text in different languages. We have to build stemming libraries for each language.
We have to run multiple entity resolution algorithms. It’s not AI, but there is a lot of work around things like the underlying ontology and knowledge graph, because you have to structure the data correctly to know which field maps to which.
There’s lots of AI, but there is also a lot of technology around the scalability of data ingestion. There are lots of streaming data sets and cloud data sets. In terms of the weapons in our arsenal, I would say that AI is the most powerful one.
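Entity resolution, one of the pipeline steps named in this answer, means deciding which records refer to the same real-world entity. A minimal rule-based sketch is shown below, assuming records are dictionaries with hypothetical `name`, `dob`, and `passport` fields; it links records that share a passport number or a name-plus-birth-date pair and merges them with a union-find structure. The interview does not say which algorithms the company uses, so this is one simple illustration only.

```python
from collections import defaultdict


def resolve_entities(records):
    """Return clusters of record indices that share at least one key.
    Field names (name, dob, passport) are hypothetical."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # identifying key -> index of first record carrying it
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("passport"):
            keys.append(("passport", rec["passport"]))
        if rec.get("name") and rec.get("dob"):
            keys.append(("name_dob", rec["name"].casefold(), rec["dob"]))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Because the merge is transitive, a record with only a passport number and a record with only a name and birth date end up in the same cluster when a third record carries both. That also shows why a single wrong link is costly: it merges two whole clusters, not just two records.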
Sramana Mitra: What data structure do you use? It sounds like it is a data structure problem. If you can organize the data in a structure that is scalable, you can then feed it and run AI algorithms on it. What is the data structure strategy?
Charlie Delingpole: In my house, I have seven different Alexas. We have some people that have been hired from the Alexa team who are specialists in ontology. The big part of it is defining your ontology and mapping out the fields. You are taking all these disparate data sets and you want to ensure that each field is defined accurately. You know how data lakes expose data so that you can then find connections.
That is an obvious win for most companies. Similarly, you want to be able to turn the internet into a data lake where you can find connections. You can then find hidden linkage patterns. We can then deepen, broaden, and extend our own proprietary risk information graph.
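The field-mapping idea in this answer, bringing disparate data sets under one ontology so that connections become visible, can be sketched as below. The source names, field names, and canonical schema are all hypothetical, invented for illustration; the interview does not describe the actual schema.

```python
# Hypothetical mapping from each source's field names to one canonical schema.
FIELD_MAP = {
    "sanctions_list": {
        "full_name": "name",
        "birth_date": "date_of_birth",
        "addr": "address",
    },
    "company_registry": {
        "director": "name",
        "dob": "date_of_birth",
        "registered_office": "address",
    },
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename a record's fields to the canonical schema,
    dropping fields the mapping does not cover."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def shared_fields(a: dict, b: dict) -> list:
    """List canonical fields on which two records carry the same value."""
    return sorted(f for f in a.keys() & b.keys() if a[f] == b[f])
```

Once a sanctions entry with `full_name` and a registry entry with `director` are both expressed as `name`, comparing them is a plain dictionary operation. Without the mapping, the two records share no field names at all and the link between them stays hidden.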
This segment is part 3 in the series: Thought Leaders in Artificial Intelligence: Charlie Delingpole, CEO of ComplyAdvantage