Sramana Mitra: Let’s get down to the specifics of what you are selling to these energy traders.
John Plavan: We sell subscriptions to a software-as-a-service, web-delivered product that enables our clients to identify the risk of extreme heat or extreme cold events. They use that information as an input in their energy trading strategies. The current state-of-the-art forecast for a temperature event is valid a week to ten days out. Traditional weather models simulate the atmosphere, and there is too much chaos in the system, so they tend to break down after about a week to ten days. Beyond that, they just don’t give accurate forecasts.
On the other hand, using big data methods to match an existing weather pattern, which we have quantified all over the globe, against a precursor to an event that might happen 30 days later does have statistical validity. Then energy traders can say, “Well, the rest of the market only knows what is going to happen seven to ten days out. With the EarthRisk data, I have insight into the probability that an extreme heat or extreme cold event is going to happen 30 days from now. The energy market is mispriced relative to the actual risk of a weather event. So if I can identify a mispriced market using this software, I can make a profitable energy trade.”
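To make the mispricing argument concrete, here is a minimal sketch in Python. It assumes a deliberately simplified, hypothetical contract that costs `price` and pays `payout` only if the extreme event occurs; real energy trades are more complex, and none of these function names come from EarthRisk.

```python
def implied_probability(price: float, payout: float) -> float:
    """Event probability implied by the market price of the hypothetical contract."""
    return price / payout


def expected_profit(model_prob: float, price: float, payout: float) -> float:
    """Expected profit of buying the contract if the model probability is right."""
    return model_prob * payout - price


def is_mispriced(model_prob: float, price: float, payout: float,
                 min_edge: float = 0.05) -> bool:
    """True when the model sees the event as more likely than the market does,
    by at least `min_edge` (an arbitrary illustrative threshold)."""
    return model_prob - implied_probability(price, payout) >= min_edge
```

For example, if a 30-day pattern signal puts the event probability at 0.25 while the market prices it at 0.10, the trader has a positive expected profit under these assumptions.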
SM: I see. Talk a little bit about the architecture of what happens in your system.
JP: There are two main steps. First, all the research is done, and patterns are defined and quantified in terms of what center deviation strengths they need to have to be a statistically significant precursor to an event. Once that research is done and those algorithms are written, every day we bring in data from weather observations from all over the globe. We upload that into the servers, and the servers use that data to update pattern definitions and ask, “Do the observations we are seeing now match any of the pattern definitions that we have as statistically significant precursors? If so, to what degree?” Based on that combination of patterns, the software outputs a set of probabilities at various lead times and in various regions for what the risks of an extreme cold or an extreme hot event are. Just to recap: There is the research phase that identifies all the relationships, and on a daily basis we try to match the observations with those relationships in order to output the probability of an event occurring.
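The daily matching step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not EarthRisk's actual method: the `Pattern` structure, the use of cosine similarity as the match measure, and the rule of keeping the highest probability among matching patterns are all hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Pattern:
    """A precursor pattern produced by the research phase (hypothetical layout)."""
    name: str
    template: list[float]   # flattened anomaly field that defines the pattern
    min_strength: float     # match strength needed to count as a precursor
    # Event probability by (region, lead time in days) when the pattern is present
    event_prob: dict[tuple[str, int], float]


def match_strength(obs: list[float], template: list[float]) -> float:
    """Cosine similarity between today's observations and a pattern template."""
    dot = sum(o * t for o, t in zip(obs, template))
    norm = sqrt(sum(o * o for o in obs)) * sqrt(sum(t * t for t in template))
    return dot / norm if norm else 0.0


def daily_risk(obs: list[float], patterns: list[Pattern],
               baseline: dict[tuple[str, int], float]) -> dict[tuple[str, int], float]:
    """Start from baseline risk and raise it wherever a pattern matches."""
    risk = dict(baseline)
    for p in patterns:
        if match_strength(obs, p.template) < p.min_strength:
            continue  # today's observations do not match this precursor
        for key, prob in p.event_prob.items():
            # Assumption: keep the strongest signal per (region, lead time)
            risk[key] = max(risk.get(key, 0.0), prob)
    return risk
```

The two functions mirror the two steps in the answer: the patterns and their probabilities come from the research phase, and `daily_risk` is the part that would run every day on new observations.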
SM: Where does the data come from? You say you accumulate data from all over the world. What kinds of sources are generating this data?
JP: Those are government data sources. The bedrock of the research and the data is a U.S. government data source called the NCEP/NCAR Reanalysis data set. There are a variety of acronyms there, but it is a free data set available from the U.S. government that takes observations from all over the globe and then runs analysis routines on them to try to clean the data. If there are any obviously erroneous readings, judged by comparing observation sites near one another, they can clean those out of the data set.
It is called a reanalysis data set. It takes a couple of days for them to do that, so the data we receive from them is very stable back to 1948, but it is a couple of days old. We do the research on the reanalysis data set, but to do the correlations we need to know exactly what the observations are today, so for that initial period we bring in forecast data from both the U.S. government and the UK. They have a forecast data set for a very short term of what the observational variables are going to be. We upload that data set. Some of it is free; some of it is not free. But they are very good government sources of data. The observational networks that were put in place around World War II have led to some good observational data sets back to around 1948, so we use those.
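One way to picture how the stable but delayed reanalysis record and the short-term forecast data fit together is the sketch below. It assumes both sources arrive as simple date-to-value mappings and that the reanalysis record is not empty; the function name and data layout are hypothetical.

```python
from datetime import date, timedelta


def fill_recent_gap(reanalysis: dict[date, float],
                    forecast: dict[date, float],
                    today: date) -> dict[date, float]:
    """Merge the two sources, always preferring reanalysis where it exists.

    Forecast values are used only for days after the last reanalysis date,
    up to and including `today`. Assumes `reanalysis` is non-empty.
    """
    merged = dict(reanalysis)
    day = max(reanalysis) + timedelta(days=1)
    while day <= today:
        if day in forecast:
            merged[day] = forecast[day]
        day += timedelta(days=1)
    return merged
```

Forecast values for days already covered by the reanalysis are ignored, which reflects the point that the reanalysis record is the cleaned, stable one.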
This segment is part 2 in the series: Thought Leaders in Big Data: Interview with John Plavan, CEO of EarthRisk