
Building a Venture Scale AI Developer Tool Company: Moderne CEO Jonathan Schneider (Part 5)

Posted on Sunday, Apr 20th 2025

Sramana Mitra: So now, tell me more about the product and especially where is AI in this story?

Jonathan Schneider: We think of Moderne as large-scale automated code change. That could be for application modernization, security vulnerability repair, or code quality.

The initial use cases were the ones I’d been working on at Netflix and the ones we had heard from our Pivotal customers. They were framework migrations: Spring Boot 1 to 2, Java 8 to 17, those kinds of things. Over time, it became increasingly clear to us that this was one special case of a larger effort we call tech stack liquidity, which is simply trying to move from A to B. That could be a version update or an end-of-life piece of software.

But it also could be, “I’m trying to kill off Oracle and move to PostgreSQL” or, “Broadcom just bought VMware and Tanzu just had a 6x price increase. I’m trying to move from Tanzu to OpenShift now.” It’s all sorts of things, including anytime a company is trying to consolidate tech stacks: “I bought something in M&A and I’m trying to align it with the rest of my tech stack.”

So, the solution is general purpose: anytime I have to make a horizontal change across potentially hundreds of millions of lines of code, that’s where we come in. This sort of large-scale change is what we’ve been doing.

AI comes around, and obviously code change is one of the most compelling uses of GenAI in its first couple of years. However, text-to-code alone is insufficient to make predictable or accurate changes. The strongest use cases of AI for code are in authorship experiences; they’re in generating new code.

The reason is that those tools are able to connect to the IDE, which already has a rich understanding of the code’s structure. The tool is able to collaborate with the IDE to get data and make decisions on a more accurate basis.

You’ve seen GitHub Copilot doing completions in the IDE. Then, you have Cursor and Windsurf. All these solutions are attacking code authorship because of the relative technical ease of doing so.

I would suppose that more economic value is attached to the four billion lines of code that you don’t have open in the IDE right now, when you’re trying to move 3,000 applications from WebSphere to something else.

That’s where we sit. Even to this day, I think there’s a bit of a bloodbath in code assistants and new code authorship. A ton of money is being thrown there.

We’re sitting somewhat alone in the space of, what about everything else? What about every other repository in the business unit that you don’t have open in the IDE? It’s hard for those tools to compete there because they just don’t have the data to back those kinds of solutions. Ultimately, it’s been additive to us.

Sramana Mitra: Double-click down and be a bit more technical. I’m a computer scientist from MIT, so I will understand what you’re talking about. Go deep into how you do it.

What data are you training on and how are you managing to find the heuristics with which to do this?

Jonathan Schneider: The data that we need in order to work is what we call the lossless semantic tree. It’s kind of an invented phrase, but it’s basically the abstract syntax tree of the code plus everything the compiler infers about its symbols, including all the transitive dependencies.

Compilation normally goes through a few phases: tokenization of the text, production of an abstract syntax tree, then a pass over that syntax tree to do symbol solving. For example, where is this method defined? Where did it come from?
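To make those phases concrete, here is a minimal sketch of a compiler front end’s pipeline. The type names (Token, Ast, AttributedAst) are illustrative placeholders, not Moderne’s actual API.

```java
import java.util.List;

// Hypothetical sketch of the compiler front-end phases described above.
interface FrontEnd {
    // Phase 1: tokenization -- split raw source text into tokens.
    List<Token> tokenize(String source);

    // Phase 2: parsing -- arrange tokens into an abstract syntax tree.
    Ast parse(List<Token> tokens);

    // Phase 3: symbol solving -- answer "where is this method defined?"
    // by attributing each AST node with its resolved symbol.
    AttributedAst resolveSymbols(Ast ast);

    record Token(String text, int line, int column) {}
    record Ast(Object root) {}
    record AttributedAst(Ast ast, Object symbolTable) {}
}
```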

We capture all of that deeply. We operate on the representation. Then we have programs called recipes that make mutations on that tree, and we materialize it back to source code.
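A recipe, as described here, is essentially a pure transformation over that tree. The sketch below is hypothetical, with made-up type and method names, but it illustrates the shape of the idea: a function from tree to tree, applied mechanically across every repository before the result is printed back to source.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class RecipeSketch {
    // A recipe is just a transformation over the semantic tree.
    interface Recipe extends UnaryOperator<Node> {}

    // Minimal stand-in for a tree node: a node kind, a name, and children.
    record Node(String kind, String name, List<Node> children) {}

    // Illustrative recipe: rename every method invocation named `from` to `to`.
    static Recipe renameMethod(String from, String to) {
        return node -> new Node(
                node.kind(),
                "MethodInvocation".equals(node.kind()) && from.equals(node.name()) ? to : node.name(),
                node.children().stream().map(renameMethod(from, to)).toList());
    }
}
```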

When a compiler is tokenizing and producing an AST, it’s shedding all the original formatting of the code, like the whitespace and comments and things that don’t matter anymore, because after all it’s marching towards producing machine code or an intermediate representation at the end, where those things are not relevant.

We have to guide a compiler through the first few phases to get to the point where it has solved all the symbols, and then go back and bolt all the original whitespace and everything onto this representation. That’s a cyclic representation, unfortunately, because types can be self-referential, so we have to cut cycles in it. This whole kind of messy tree is what we store.
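The following sketch shows what “lossless” could mean in practice, assuming a hypothetical node type that keeps the whitespace and comments a compiler would normally discard alongside the resolved type information. Storing the type as a flat string here sidesteps the cyclic type graph he mentions, which the real representation has to handle by cutting cycles.

```java
import java.util.List;

// Hypothetical "lossless" node: keeps the formatting a compiler would normally
// throw away, alongside what symbol solving inferred. Not Moderne's real tree.
record LosslessNode(
        String prefix,          // whitespace and comments that preceded this element
        String text,            // the element's own source text, e.g. an identifier
        String resolvedType,    // what symbol solving inferred, e.g. "java.util.List<java.lang.String>"
        List<LosslessNode> children) {

    // Printing concatenates prefixes and text in order, so the output is
    // byte-for-byte the original source unless a recipe changed something.
    String print() {
        StringBuilder out = new StringBuilder(prefix).append(text);
        for (LosslessNode child : children) {
            out.append(child.print());
        }
        return out.toString();
    }
}
```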

When we go into an enterprise, we want to iterate over every repository that they have and produce an artifact, which is the lossless semantic tree.

The reason that’s so brutally difficult is that they have tens of thousands of repositories with different build tools, language versions, and configurations. We don’t want to rely on an enterprise manually configuring each repository to fold it into the platform. So, we’ve developed hundreds of heuristics for looking at a repo, understanding its build tool configuration, language version, and everything inside, injecting ourselves into it, ripping all this data out as an artifact, and moving on to the next one.
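As an illustration of that kind of heuristic, the sketch below guesses a repository’s build tool from common marker files. The file names are standard conventions, and the real detection logic is presumably far more extensive.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical per-repository heuristic: peek at marker files to guess the
// build tool before ingesting the repo.
public class BuildToolDetector {
    enum BuildTool { MAVEN, GRADLE, BAZEL, UNKNOWN }

    static BuildTool detect(Path repoRoot) {
        if (Files.exists(repoRoot.resolve("pom.xml"))) {
            return BuildTool.MAVEN;
        }
        if (Files.exists(repoRoot.resolve("build.gradle"))
                || Files.exists(repoRoot.resolve("build.gradle.kts"))) {
            return BuildTool.GRADLE;
        }
        if (Files.exists(repoRoot.resolve("WORKSPACE"))
                || Files.exists(repoRoot.resolve("MODULE.bazel"))) {
            return BuildTool.BAZEL;
        }
        return BuildTool.UNKNOWN;
    }
}
```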

So, when we go on site to a new customer, we can just ask them to list their 10,000 repos. We start a process. We go to lunch. We come back. We’ve got all the data we need and can start working.

This segment is part 5 in the series : Building a Venture Scale AI Developer Tool Company: Moderne CEO Jonathan Schneider
