Sramana Mitra: What is your business model regarding these data scientists who are building these algorithms for you?
Divyabh Mishra: Let’s say the community did not exist and I had subcontracted the work to someone else. They get paid for the work they do. The data scientists are getting paid for building the models. Once we pay them, that model becomes ours. We then deploy and maintain those algorithms and we charge the client for access to those models.
Sramana Mitra: How many data scientists are part of this community?
Divyabh Mishra: The community has about 25,000 data scientists across 50-plus countries. In the community, we define active as those who contribute at least once a month to a competition. That would be more like 30% of the community.
It’s like a funnel. Many are students. They want to come in and participate and work on real-life problems. Eventually, some of them start contributing as well. We keep them engaged and at some point, they will be of value to us.
Sramana Mitra: What do these people need to know by way of related technology? Do they need to know programming languages, systems, and data lake technology? What do you expect them to know?
Divyabh Mishra: Most of the coding happens in Python now. It started in multiple languages, but it’s all Python-based. The competitions are of three types. Kaggle failed because they failed to realize that you need to do a lot to convert a business problem into a data challenge.
As you are alluding to, they can’t know everything, so we need to figure out what their core skill is and present the problem in that format. There are three types of contests that we have. A problem can be turned into a forecasting problem, where we are predicting a time series. It can be turned into a predictive modeling problem, where we are predicting a single value.
The probability of something happening is an example. Credit risks and scores are an example of that. The third is deep learning algorithms, where the data could be in the form of audio, video, or images and you are looking for certain patterns. We break the problem down most of the time. 99% of the problems can be converted into one of these three.
Once we have done that, they don’t get access to data lakes; the structuring and preparation of the data that allow these contests to happen are handled on our side. Usually, models are built on samples of data, so even if the clients have terabytes of data, we would extract the samples.
Sometimes the client does not want to expose all of the data, so we use simulated data to get the models from the competitors and then we optimize them internally. We also have about 70 people in-house who are trying to lead those efforts.
Sramana Mitra: These 70 people are in Texas?
Divyabh Mishra: They are in Bangalore and Texas.
This segment is part 3 in the series: Thought Leaders in Artificial Intelligence: Divyabh Mishra, CEO of CrowdANALYTIX