
Thought Leaders in Big Data: Interview with Jeremy Howard, President and Chief Scientist of Kaggle (Part 3)

Posted on Monday, Mar 11th 2013

Sramana Mitra: What about the data repositories on which these algorithms are being run? Does that still fit inside corporate data warehouses, with R software plugging into them?

Jeremy Howard: The hard thing generally is training the algorithms, not so much running them. Training the algorithms, to oversimplify, is finding the coefficients in an algorithm. Once you have figured out what they are, what you are left with is a very simple mathematical formula.
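To put that point in concrete terms, here is a generic sketch (not Kaggle's code or any particular competitor's method): fitting a simple linear model means searching for a handful of coefficients, and once they are found, making a prediction is just evaluating a short formula.

```python
import numpy as np

# Generic illustration: "training" a linear model is finding its coefficients.
# Toy data: 1,000 examples, 3 input features each, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_coefs = np.array([2.0, -1.0, 0.5])
y = X @ true_coefs + rng.normal(scale=0.1, size=1000)

# Training: the comparatively expensive step is solving for the coefficients.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction: once the coefficients are known, it is a simple weighted sum.
new_example = np.array([1.0, 0.0, -2.0])
prediction = new_example @ coefs
print(coefs, prediction)
```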

SM: How do you do the simulation? Do you run experiments to see what is working and what is not? If you are trying to train the algorithm, it needs to run on large data sets to see whether what you are trying to do is working or not. Is that correct?

JH: We would think of them as large data sets, but they fit easily onto a laptop computer. Training these machine learning models does not generally improve if you use larger computers.

SM: So, the sample set that you need to work with in order to test its efficacy is not too large – whatever space you need is available on a laptop.

JH: That is right. We are still talking about millions of [pieces of information], which you would initially consider large, but it doesn’t require server [space].

SM: And corporations are providing those sample data sets to the contestants?

JH: There are two types of competitions – private and public. In both cases the sponsoring organization provides the data set necessary to train and test the models. In the case of public competitions, the data set is made available to everybody on the Internet to download. In the case of private competitions, a subset of 10 or 15 of the most successful Kaggle competitors will be invited to compete in secret. They will have to sign a non-disclosure agreement before being given access to that private data.

SM: Let’s talk about a few different metrics. How many public competitions on average are you running these days?

JH: We generally try to run five or six competitions at any one time. We find that to be the perfect number in terms of maximizing engagement and interest.

SM: Can you give examples of the types of public competitions you are running right now or have run recently?

JH: Our two largest competitions are running right now. Our largest one in terms of prize money is called the Heritage Health Prize. The prize money for that is $3 million for the winner. The data they provide is anonymized health records from Californian patients, covering their claims history, their lab results, and their prescriptions. The goal of the competition is to predict which of these patients will be hospitalized. This matters because it is estimated that $40 billion in America is wasted on unnecessary hospitalization. The idea is to come up with an algorithm that identifies the patients who most urgently need a higher level of care, and to keep people out of hospitals in cases where different, earlier healthcare could have helped them.

The next-largest competition in terms of prize money, and the largest in number of people signed up for it, is the GE Flight Quest. That one is sponsored by GE, and it is looking at improving the ability to estimate which flights will be delayed. Competitors have a complete picture of U.S. airspace over a two-month period, and they have to make predictions for every flight: when each one will land and when each one will arrive at the gate.
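Both competitions are, at their core, supervised prediction problems on tabular data. As a rough sketch of how a competitor might frame the hospitalization task (the file name, column names, and model choice here are hypothetical illustrations, not the actual Heritage Health Prize data or any winning approach):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical tabular data: one row per patient, with claims/lab features
# and a 0/1 label for whether the patient was hospitalized the following year.
df = pd.read_csv("patients.csv")  # placeholder file name
features = ["num_claims", "num_prescriptions", "age", "prior_lab_abnormal"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["hospitalized_next_year"], test_size=0.2, random_state=0
)

# Fit a standard off-the-shelf classifier and check how well it ranks patients.
model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```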

This segment is part 3 in the series: Thought Leaders in Big Data: Interview with Jeremy Howard, President and Chief Scientist of Kaggle
