
Thought Leaders in Artificial Intelligence: Francisco Webber, CEO of Cortical.io (Part 2)

Posted on Tuesday, Jun 22nd 2021

Sramana Mitra: Let me interrupt you for a second and try to understand the technology better. I completely understand the statistical approach versus more of the semantic modeling approach. How do you start? Let’s say you get a dataset. How does this begin?

Francisco Webber: One big problem that we have in statistical representation is that we lack something like semantic grounding. There are no fundamental ground truths to which we can tie the features and properties of what we want to represent. In our approach, we model what the brain does by the continuous saving of all the experiences.

Since your moment of birth, every experience you have has been stored in the neocortex through an associative process, so that you end up with a large weighted pattern that constitutes your model of reality. We were able to synthetically produce that semantic space by ingesting reference information. Contrary to the statistical method, which trains a model on the actual data you want to work with, we train the model on the same material humans learn from. We take textbooks, encyclopedias, and reference works. They get compiled into the language-level ground truth for a specific use case.

For example, say you want to create a system that understands medical prescriptions. We don't train on medical prescriptions; we train on medical textbooks. That is the first step. Once we have that world knowledge in the system and have made it the standard for all representations, we generate, for every word, sentence, or paragraph, a binary representation that corresponds to the distribution of topics according to our ground truth map. That happens automatically. The arrangement of that map is the only machine learning part at that level.
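The idea of deriving a binary representation from a map of reference contexts can be sketched roughly as follows. This is a toy illustration of the general semantic-fingerprint concept, not Cortical.io's actual implementation: the map layout, word lists, and fingerprint size are invented for the example.

```python
# Toy semantic map: each cell stands for a context (e.g., a snippet from
# a reference text), identified here by the words occurring in it.
# In practice the map would be built from large reference corpora.
contexts = [
    {"engine", "wheel", "speed", "car"},   # cell 0
    {"ferrari", "speed", "race", "car"},   # cell 1
    {"doctor", "prescription", "dose"},    # cell 2
    {"ferrari", "engine", "luxury"},       # cell 3
]

def fingerprint(word):
    """A word's fingerprint is the set of map cells whose context
    contains the word -- a sparse binary vector over the map."""
    return {i for i, ctx in enumerate(contexts) if word in ctx}

print(fingerprint("ferrari"))  # -> {1, 3}
print(fingerprint("car"))      # -> {0, 1}
```

Each word thus maps to a set of active positions on the shared map, so words that appear in similar reference contexts get overlapping fingerprints.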

For example, if you render the fingerprints for the word "sports car" and the word "Ferrari", you will see that 60% of their features overlap. That large overlap indicates that these two words are similar, at least in the contexts where they overlap. This is done by a simple overlap measure, which is about the fastest thing you could possibly do on a microprocessor.

As you can imagine, that is very fast. I can do half a million of those comparisons per second on my laptop. It is an efficient way to do at least medium-depth semantics on texts. Initially, when you go into machine learning, especially with texts, it is all about accuracy, precision, recall, and those measures. Early on, we said that we didn't want to engage in academic benchmarking and comparisons with different approaches.

We were set up to be a company, and we said, "We need to prove the meaningfulness of our technology with actual use cases. We have to go out there and try to fix someone's problems with our technology." We were lucky enough to attract considerable interest from a number of early adopters. That was in 2016. A very large New York bank came to us and said, "We have this set of use cases, and we have tried all of the big players in the sector, and none of them could provide us with a good way of doing this."

Good work is when you can deal with large-scale text with little human intervention, because human effort is what takes time and costs money. For example, one use case was to ingest some 100,000 credit agreements and find out which of them were not up to date with the compliance requirements. Without a machine, you have to get a couple of hundred attorneys to read through the documents and tag the sections where there might be an issue.

The goal was to build a system that could be easily trained to do this and then applied, with a small footprint, to the collection of 100,000 documents. That is what we did. We took advantage of our semantic representation, which carries a much higher semantic payload than a simple statistical sampling. It turns out that if we train with the fingerprint representation, we need orders of magnitude less training data.

This segment is part 2 in the series : Thought Leaders in Artificial Intelligence: Francisco Webber, CEO of Cortical.io
