Thought Leaders in Artificial Intelligence: David Talby, CTO of John Snow Labs (Part 4)

Posted on Thursday, Mar 18th 2021

Sramana Mitra: Talk about the healthcare and life science domain-specific products. Is that all internal or is some of that also open source? 

David Talby: That’s our product. That’s a licensed product. We have two licensed products: Spark NLP for Healthcare and Spark OCR. Spark NLP for Healthcare is an extension of the open-source library, but it uses a separate code base and a separate set of models.

Clinical and biomedical NLP are two areas where the problems are fundamentally different from general NLP. Even in the academic space, there are separate conferences and workshops because the problem is different. 

Sramana Mitra: What is the difference?

David Talby: Medical language has its own vocabulary. It’s not English. That’s the first mistake that people make. The vocabulary is several times the size of the English vocabulary. It has its own grammar, and very often when you read medical records, none of the sentences are valid English.

In general, if you don’t have a medical background, you cannot read them. This is why people go to medical school. It’s specialty-specific. If you are a radiologist, you are just not going to understand your dental X-ray, rheumatology results, or your post-op inspections.

On top of that, there are a few things that we only see in medicine. There are micro-languages, which is interesting academically. Here is what happens. You have a hospital with internal medicine A on the sixth floor and internal medicine B on the seventh floor.

What you see is that they develop different languages in which they speak and write. You have seven doctors and twenty nurses for example and they’ve worked there for 10 years. Naturally, they develop their own language. What happens is that what people write and don’t write is different because they have specific assumptions already.

One of the unique challenges, beyond the fact that the whole language is different, is that you need to learn those micro-languages. Another challenge in healthcare is that very often you are working with small data. There aren’t billions of cancer patients, and even if you have access to all of Mayo Clinic’s data, when you get to a specific type of cancer at a specific stage, you don’t have a lot of data.

For example, if you are talking about stage four cancer, you are maybe talking about a thousand people. It’s not big data. It’s very specific and specialized. That’s another thing that you see in healthcare. Another example is if you go to your dentist. Half of what they do is the simple stuff like cleaning or the occasional root canal.

If you look at the other half, there are probably 200 to 300 procedures that they do. Some of those procedures they don’t even do every year. This happens everywhere in healthcare. The thing is, you cannot ignore a patient just because they are only 0.1% of the population. You cannot drop them.

You cannot do that with healthcare. Not only is there a lot of variety, but you are also dealing with small data in high dimensionality. It’s also hard to get each one right. There are hundreds of variables and they are all connected.   

This segment is part 4 in the series: Thought Leaders in Artificial Intelligence: David Talby, CTO of John Snow Labs
