Scaling a Fat Startup: MongoDB CEO Max Schireson (Part 3)

Posted on Friday, May 16th 2014

Sramana: Tell me a bit more about the founding of MarkLogic. What was going on and how did you get hooked up with those people?

Max Schireson: The founders were Paul Pedersen and Chris Lindblad. They had been search engine engineers, and they realized that they could apply some of the technologies of search engines to databases to help manage semi-structured and unstructured information. That is what they were doing, and I had been working on applications in that space. I got introduced to them through a friend of a friend. Not long before that conversation, my long-time boss at Oracle had left and I was starting to think that it was time to do something new.

I started talking to them and we hit it off. At one point, I had thought I wanted to be a math professor. Paul Pedersen had been a math professor at Cornell and a computer science professor at UCLA. We connected well, so I decided to try it.

Sramana: Tell me a bit more about the actual computer science underneath. You mentioned this has search engine DNA. Our audience is highly technical and they can grasp core computer science concepts.

Max Schireson: A search engine is an inverted file index where you have a bunch of posting lists which describe, for any given word, which documents that word occurs in and where. Sometimes the entries would be word pairs. The words “blue car” could appear as a phrase, so you could work out where longer phrases occur by knowing that whenever “the blue car” appears, “blue car” also appears. A match in the index is not necessarily a real match 100% of the time. The document could say “the blue motorcycle and the red car”, but at least it is worth checking out.
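As a rough illustration of the posting-list idea described above, here is a minimal Python sketch. It assumes a tiny in-memory index keyed on single words; the function names (`build_index`, `candidate_docs`, `phrase_docs`) and the sample documents are hypothetical, and this is not a description of how any particular search engine is implemented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to a posting list of (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def candidate_docs(index, phrase):
    """Documents containing every word of the phrase, in any order."""
    words = phrase.lower().split()
    doc_sets = [{doc_id for doc_id, _ in index.get(w, [])} for w in words]
    return set.intersection(*doc_sets) if doc_sets else set()

def phrase_docs(index, phrase):
    """Documents where the words occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return set()
    postings = [set(index.get(w, [])) for w in words]
    return {doc_id for doc_id, pos in postings[0]
            if all((doc_id, pos + i) in postings[i]
                   for i in range(1, len(words)))}

# Hypothetical sample data, borrowing the interview's example sentences.
docs = {
    1: "the blue car",
    2: "the blue motorcycle and the red car",
}
index = build_index(docs)
```

With this sample data, `candidate_docs(index, "the blue car")` returns both documents, because each contains all three words, while `phrase_docs(index, "the blue car")` keeps only document 1. That is the "worth checking out" step: the index narrows the search cheaply, and the position check confirms the phrase.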

What the founders of MarkLogic realized is that with XML documents, you can treat a path expression like a phrase, with slashes in place of the spaces. If you have an XML document and you have a path where you are looking for an ‘a’ inside of a ‘b’ inside of a ‘c’, then you could index the paths as though they were phrases. You could index an ‘ab’ and a ‘bc’, and if there was an ‘ab’ and a ‘bc’, then you could verify whether they aligned to make an ‘abc’ or a ‘dbc’. That was the basic insight that they had. The same way you validate searches for phrases could also validate searches for paths in an XML document. The same indexing technology could handle both cases.
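The same idea can be sketched for XML paths. The Python example below is only an illustration under stated assumptions: it indexes parent/child tag pairs the way a phrase index might store word pairs, then verifies candidates against the actual tree. The helper names and sample documents are hypothetical, it handles only paths of two or more steps, and it is not MarkLogic's actual implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def element_paths(root):
    """Yield the tag path from the root to every element, as a tuple."""
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        yield path
        for child in node:
            stack.append((child, path + (child.tag,)))

def build_pair_index(docs):
    """Map each parent/child tag pair to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for path in element_paths(ET.fromstring(xml_text)):
            if len(path) >= 2:
                index[path[-2:]].add(doc_id)
    return index

def candidates(index, steps):
    """Documents containing every adjacent pair of the query path."""
    pairs = list(zip(steps, steps[1:]))
    sets = [index.get(p, set()) for p in pairs]
    return set.intersection(*sets) if sets else set()

def verify(xml_text, steps):
    """Check that the steps really occur as one contiguous path."""
    n = len(steps)
    return any(path[-n:] == tuple(steps)
               for path in element_paths(ET.fromstring(xml_text)))

# Hypothetical documents: both contain the pairs c/b and b/a,
# but only document 1 contains the full path c/b/a.
docs = {
    1: "<c><b><a/></b></c>",
    2: "<r><c><b/></c><b><a/></b></r>",
}
index = build_pair_index(docs)
steps = ["c", "b", "a"]
hits = {d for d in candidates(index, steps) if verify(docs[d], steps)}
```

Here `candidates` returns both documents, since each contains a `c/b` pair and a `b/a` pair somewhere, and the verification step keeps only document 1, where the two pairs line up into a single `c/b/a` path.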

You could also extend that to whatever combination of path expression and textual phrase you wanted. This would be a powerful way to query documents, not just for content but also for structure, and for where in the structure the content occurred. I had been working on proposal management and knowledge management and felt that this type of technology would be very helpful in those use cases.

This segment is part 3 in the series: Scaling a Fat Startup: MongoDB CEO Max Schireson