Thought Leaders in Artificial Intelligence: David Talby, CTO of John Snow Labs (Part 3)

Posted on Wednesday, Mar 17th 2021

Sramana Mitra: If I understand this correctly, you have the NLP engine, which is fairly horizontal; and you are applying various domain-specific heuristics and workflows on top of that to create solutions for different use cases in different industry segments.

Although they are all data science users, the workflow is different and the domain is different. The oncology knowledge is different from the clinical trial identification.  

David Talby: Here is how we do it. First of all, we have Spark NLP, which is an open-source project. It is released under the Apache license and it’s completely free. We have over 1,000 models that are published on top of it. Our goal here is to provide state-of-the-art NLP to the open-source community as inclusively as possible worldwide.

Sramana Mitra: Is that purely horizontal or does that also have healthcare-related nuances?

David Talby: It’s purely horizontal. That, by itself, had a 9x growth in downloads in 2020. According to several industry reports, it’s the most widely used NLP library in the enterprise outside of research.

A lot of the focus of the library has been on productizing the latest papers that come out by providing production-grade implementations that are scalable. It’s called Spark NLP. Whenever you get to a domain-specific use case, you do need to train or tune custom models for the problem.

Sramana Mitra: How many users do you have for the pure horizontal open-source library?

David Talby: That is a good question. Nobody knows for sure, but we crossed 3 million downloads about a month ago. Now, we have 500,000 downloads per month.

Sramana Mitra: There is a large community of open source developers who are using the NLP library?

David Talby: Definitely, yes. We support it and we keep expanding it. In the recent release, we’ve added question answering, translation, and summarization. We are also doing a lot of work supporting global languages.

Historically in NLP, there has been a lot of work in English and Mandarin, but now we are proud to say that we have state-of-the-art pre-trained models for Bengali, Amharic, Farsi, Russian, and Portuguese. Right now, we support over 200 languages out of the box. A wonderful goal, to make AI more inclusive, is to make sure that state-of-the-art deep learning and transfer learning NLP libraries are available as open source for all the major spoken and written languages of the world.

Sramana Mitra: This is a sizable effort to support this kind of open-source initiative.

David Talby: It is.

Sramana Mitra: How many people are working on this? 

David Talby: We have about 20 people, and then we have the external community.

Sramana Mitra: How big is the external community contributing to advancing the technology? 

David Talby: I’m not sure. It’s quite a few people. The question becomes what you count as helping. You help if you contribute your own model, but you also help if you fix the documentation or if you provide an example. You also help if you go on the Slack channel and help other people on the board and answer questions. You can just be a solid community member. All are welcome and all help.

This segment is part 3 in the series: Thought Leaders in Artificial Intelligence: David Talby, CTO of John Snow Labs