Divyabh Mishra: We do about 10 million API calls per day for one of our customers. For many others, it’s not that huge, but it is still pretty large. We also work with a lot of distributors. We have a lot of B2B clients like electrical or industrial distributors. They have a huge catalog.
For anyone who has a huge catalog and cannot structure this content by hand, there is no alternative. It’s expensive to do it manually. Our approach of 80% to 85% automation with the rest done manually is driving the growth. Those are our current use cases.
Every client you interact with has their own taxonomy, their own way of defining how to structure the information. You usually customize these algorithms for each client to make them effective. We have a precision of 90% or higher. For every client, we are able to quickly come up with hundreds of algorithms that do the extraction. This is far more cost-effective than a traditional team. As a result, we are succeeding where others are still struggling.
Sramana Mitra: The e-commerce catalog tagging is the primary use case. I would like to double-click down on the technology side of this use case, but I also want to understand what you have done with the community that you started with. What is the current situation with the community? What is your strategy going forward with that community? It reminds me of Kaggle.
Divyabh Mishra: If you want to structure the catalog for Walmart, you need to build thousands of algorithms, one for each attribute of the products in the catalog. Every product has unique attributes like color, size, and material.
Each of these requires an algorithm to search through documents and images and extract the values for each of those attributes. It’s extremely expensive to hire a traditional team of data scientists to build these algorithms and then deploy them.
Walmart tried to do this themselves. They used to take 16 to 18 weeks to build one model, when they needed 10,000 of these models. That is where the community came in. We were able to build all of these algorithms in parallel. The community is still there. It’s like an R&D outfit for us: we use them to build algorithms, and then the internal team deploys them. That is how the community is used.
How is it different from Kaggle? Kaggle is more of a self-serve platform. They expected people to come on their own. They exposed the platform to the client; we don’t do that. They wanted the client to come on, host their own contests, create their own data, and anonymize it.
That didn’t work as well because there are so many hurdles to creating a contest that ends up producing useful models. Most people started using Kaggle as a hackathon platform – to test and find data scientists to hire. This is why they had to sell quickly to Google, as opposed to us. We are still valued at $100 million.
We recently had a partial acquisition by Macnica, a Japanese company. The difference is that the community is used as an efficient R&D outfit. I don’t sell access to the community; I sell access to the algorithms that the community builds, on a subscription basis. Every client relationship is a subscription to algorithms customized for that client. They are hosted as APIs on Google Cloud. It is that API access that clients subscribe to.
This segment is part 2 in the series: Thought Leaders in Artificial Intelligence: Divyabh Mishra, CEO of CrowdANALYTIX