
Bootstrapping First, then Raising Money to Build a $10M+ Generative AI Startup: Anthony Scodary, Co-Founder of Gridspace (Part 7)

Posted on Wednesday, May 22nd 2024

Sramana Mitra: I have one last question on the engineering stack. To what extent were you able to leverage existing models and components that are out there? To what extent did you have to kind of do things from scratch?

Anthony Scodary: That’s a good question. In general, we try to build everything in Gridspace from scratch, but I’ll try to answer your question in as much detail as I can.

Initially, before we had our own speech recognition model, we had tried using one or two from SRI, essentially the models that Siri uses. For a while we were using the open-source platform Kaldi. Eventually, as we needed to push the envelope more in terms of neural networks, we had to use general machine learning frameworks like TensorFlow and PyTorch. Now, all of our stuff is built in JAX, and we just build the models from scratch.

Today, we’re a larger organization, and our models tend to be a little more bleeding edge than what you could find publicly online anyway. You’re not going to find a call center speech synthesis model on Hugging Face, right?

If there’s a paper or an idea or a way that we want to improve an aspect of the system, we might re-implement several of those papers from scratch, do a trade study, and then pick what direction we want to go. For instance, right now we’re trying to further improve the latency of Grace. When you talk to most voice systems today, especially if people are cobbling together a speech recognition API, GPT-4, and a speech synthesis API, you have tons of latency. It doesn’t feel like a person’s there at all. Because we want something that’s very responsive, within sub-500 milliseconds, we’ve tried to build everything monolithically.
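The latency argument can be made concrete with a back-of-the-envelope budget. The per-stage numbers below are hypothetical, illustrative figures, not measurements of any particular API; they just show how quickly a cobbled-together pipeline blows past a sub-500 ms target:

```python
# Hypothetical per-stage latencies (ms) for a voice pipeline stitched
# together from separate hosted services. These numbers are illustrative.
pipeline_ms = {
    "speech recognition API": 300,
    "LLM response generation": 700,
    "speech synthesis API": 250,
    "network round trips": 150,
}

total_ms = sum(pipeline_ms.values())
over_budget = total_ms > 500  # a responsive system needs to stay under ~500 ms

print(total_ms, over_budget)  # 1400 True
```

A monolithic system avoids the serialized API hops and network round trips entirely, which is the design choice described above.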

Once you leave the yellow brick road of the standard problem that’s being researched in speech recognition, which is just generally word error rate on standard datasets, there’s not that much work on it. So, we have to either look in other fields like signal processing, or we just have to come up with our own ideas. Then we’ll do a trade study where we’ll try all of them. That’s where open-source models or public models can sometimes be very useful because there might be something online that is good to validate an idea, even if it’s not exactly what we’re intending to build. Generally, when we’re getting to the point where we’re trying to build something for a trade study or to get it built into the product, we’ll reimplement it typically in some sort of a scientific computing platform from scratch.

Sramana Mitra: It’s actually not a bad stage to have to do that, right? You’re $10M plus in revenue. You’re established to a large degree. You have large customers. You have the ability to raise money. So it’s not such a big problem.

In your first 18 months when you were trying to get to some validation and so on, were you able to leverage more of existing components?  

Anthony Scodary: A little bit. In our early days, we first used a speech recognition model from SRI. That was really the main one. Otherwise, we mostly just used the machine learning platforms. In the early days of deep learning, the first real deep learning framework was Theano, which came out of Canada.

We were using Theano, and then later on François Chollet built out the Keras platform, which ran on top of Theano and later TensorFlow. He’s at Google now. We really benefited from those open-source libraries, as did almost everyone working in deep learning, but what we’re doing is so specialized that we couldn’t go very long without having to build our own models.

I really think machine learning is just another way of building software. For many years, you just had to write out procedures and sequences of steps. Machine learning replaces that with two other interfaces. One is your loss function, your mathematical definition of what you’re trying to optimize. The other is your data: not just the data itself, but how it was labeled. We collect and label a lot of our own data ourselves. We run our own crowdsourcing platform called Gridspace Mixer, where people do simulated call center calls all day and label that data on the platform. We’ve been doing this for a long time.
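The two interfaces described here, a loss function and labeled data, can be sketched minimally. This is an illustrative toy, not Gridspace code: a logistic model trained by gradient descent on a tiny hand-labeled dataset in plain NumPy.

```python
import numpy as np

# Interface 1: the loss function -- a mathematical definition of success.
def binary_cross_entropy(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid predictions
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Interface 2: the data and its labels -- here a toy separable set
# where the label simply follows the second feature.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Everything else is generic optimization: gradient descent on the loss.
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    w -= 1.0 * grad

preds = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(float)
```

Changing either interface (a different loss, or differently labeled data) changes the resulting "program" without touching the training loop, which is the point being made.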

Let me tell you a quick story. A couple of years ago, a friend of mine who runs AILA asked me at the last minute to help with an environmental machine learning hackathon. I don’t really like hackathons very much, but I agreed to do it to be nice. Out there, they had all decided they wanted to build a convnet that could identify whether a piece of trash could be recycled and whether it was a bottle or a can. I don’t know if that’s useful. They were just a couple of engineers who’d done a little bit of machine learning, and they were searching for a public dataset of trash, going on Google image search looking for pictures of bottles.

I said, “If you want to build this model, the model is very simple. It’s just a convnet. You can build it in ten minutes. The data is everything. That is the main lever you have when you build this.”

They said, “How are we going to get tens of thousands of images of bottles and cans?”

I walked them two blocks away to a recycling center near where this was happening. I said, “This is already sorted by category. Each of you, get out your iPhones and take videos of bottles from as many angles and light sources as possible. We’re going to then use FFmpeg to extract all the frames.” We got back, and the model trained great, because all they needed was data.
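The frame-extraction step can be scripted. A plausible sketch, assuming FFmpeg is installed and using hypothetical file names, builds the standard FFmpeg invocation that samples a video into numbered JPEG frames:

```python
import subprocess
from pathlib import Path

def frame_extraction_cmd(video_path, out_dir, fps=2):
    """Build an FFmpeg command that samples `fps` frames per second
    from a phone video into numbered JPEG files in `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps}",              # sample at a fixed frame rate
        str(out_dir / "frame_%05d.jpg"),  # numbered output frames
    ]

# Hypothetical input video; run with subprocess.run(cmd, check=True)
cmd = frame_extraction_cmd("bottles.mp4", "frames/bottles", fps=2)
```

Sampling at a few frames per second, rather than every frame, keeps near-duplicate images from dominating the training set.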

Sramana Mitra: The model’s already there, and you just supply the data. This is where I am after thinking about this a fair amount.

Where you are is that you can do a lot with existing components. If you’re building a pre-seed or seed stage AI company, get it to a level where you have a problem and you can figure out product-market fit. Then, if you want to go to the next level as you have, you can build as much of your own modeling as you want once you have product-market fit.

Well, I loved your story and we look forward to covering it.

This segment is part 7 in the series : Bootstrapping First, then Raising Money to Build a $10M+ Generative AI Startup: Anthony Scodary, Co-Founder of Gridspace
