Ya Gotta Know the Territory: Data Analytics in Climate Modeling

An understanding of terrain and geography is fundamental to accurate weather and climate models. Equally important is understanding the coding terrain: the assumptions and calculations used to arrive at a given conclusion. By definition, a model is never going to be perfect. As the statistician George Box once put it, “All models are wrong, but some are useful.” By understanding the assumptions and simplifications that your model makes, you can anticipate where its predictions are going to be strained.

Refining Resolution without Exploding Expense

One of the main ways that simplification sneaks into climate models is through resolution. Most of the current generation of climate models use grid cells about 100 km on a side, each roughly half the size of New Jersey. That might not sound like high resolution, but it means that there are about five million cells covering the earth. This is very useful for determining how climate changes for the earth as a whole, but far less useful if you want to compare how New York and Trenton can expect to deal with hurricane risk.

It is of course possible to increase the resolution of that grid by using a high-resolution physics model. However, that can be very expensive. Going to grid cells 50 km on a side quadruples the number of cells that you need to run; going to 25 km on a side raises the number of cells by a factor of 16, and the time your model takes rises by a similar factor.
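The arithmetic behind those factors can be sketched in a few lines, reading the figures above as cell edge lengths (100 km, 50 km, 25 km). The function name `cell_count_factor` is made up for this illustration, and the sketch counts horizontal cells only.

```python
def cell_count_factor(base_edge_km: float, target_edge_km: float) -> float:
    """Factor by which the number of horizontal grid cells grows
    when the cell edge shrinks from base_edge_km to target_edge_km."""
    # Halving the edge length halves the cell size in both horizontal
    # directions, so the count grows with the square of the edge ratio.
    return (base_edge_km / target_edge_km) ** 2


for edge_km in (100, 50, 25):
    print(f"{edge_km} km cells: {cell_count_factor(100, edge_km):.0f}x the cells")
```

The count grows with the square of the refinement, which is why even modest gains in resolution get expensive quickly.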

To address this limitation, many users move to statistical downscaling: finding a statistical relationship between the model and observational data of the place of interest. Statistical methods are much faster and cheaper to run than climate models (depending on the size of their datasets, many users can get away with running on non-HPC systems) and have an effective resolution that is only limited by the resolution of your observations. Statistical downscaling is also used as a post-processing step in climate model generation, in an attempt to address the biases of the model.
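To make the idea of “finding a statistical relationship” concrete, here is a minimal sketch of one common family of downscaling and bias-correction methods, empirical quantile mapping. It is an illustration under stated assumptions, not necessarily a method used in the NOAA work: the function names are hypothetical, and the “observations” and “model output” are synthetic random numbers standing in for real data.

```python
import numpy as np


def fit_quantile_map(model_hist, obs_hist, n_quantiles=101):
    """Return matching quantiles of the model and observed historical series."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(model_hist, probs), np.quantile(obs_hist, probs)


def apply_quantile_map(values, model_q, obs_q):
    """Map model values onto the observed distribution at the same quantile.

    np.interp clamps inputs outside the historical model range to the
    observed extremes, so values beyond that range are not extrapolated.
    """
    return np.interp(values, model_q, obs_q)


# Synthetic stand-ins for real data (illustrative only).
rng = np.random.default_rng(0)
obs_hist = rng.normal(loc=15.0, scale=5.0, size=5000)    # "station observations"
model_hist = rng.normal(loc=18.0, scale=3.0, size=5000)  # "coarse model": too warm, too narrow

model_q, obs_q = fit_quantile_map(model_hist, obs_hist)
corrected = apply_quantile_map(model_hist, model_q, obs_q)

print("mean bias before:", round(float(model_hist.mean() - obs_hist.mean()), 2))
print("mean bias after: ", round(float(corrected.mean() - obs_hist.mean()), 2))
```

Note that the whole fit runs in well under a second on a laptop, which is the appeal. Note also the clamping behavior in `apply_quantile_map`: what a method does with values outside its training range is exactly the kind of assumption worth knowing about.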

Where Did My Model Go Wrong?

So why isn’t downscaling used more often by scientists? Part of the reason is that stepping away from explicit modeling of the physics behind weather opens up the possibility of unphysical behavior. For example, a model showing temperatures 10 degrees C higher across Florida in the future does not necessarily mean that Miami can expect temperatures of over 50 C (122 F): the interior of Florida might get that hot, but it’s unlikely that the ocean, which moderates Miami’s temperatures, will get that warm. In addition, the statistical methods that we use rely on relationships between past observations and model simulations of the past, and those relationships may not hold up as the climate changes. Both problems are surmountable, but they mean that it is harder to know when and why a statistical model is wrong. And because statistical downscaling is so easy to run, many people using it don’t know that they should be on the lookout for physics-defying behavior in the first place.
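One simple guard against this kind of surprise is to flag any future model value that falls outside the range the statistical relationship was trained on, since that is where the relationship is being extrapolated rather than applied. The sketch below is a hypothetical helper with made-up temperatures, not part of any particular downscaling package.

```python
import numpy as np


def outside_training_range(train_values, new_values):
    """Boolean mask: True where a new value lies outside the range
    spanned by the values the statistical relationship was fitted on."""
    low, high = np.min(train_values), np.max(train_values)
    new_values = np.asarray(new_values, dtype=float)
    return (new_values < low) | (new_values > high)


# Made-up temperatures in degrees C, for illustration only.
historical_model = np.array([22.0, 28.5, 31.0, 35.0])
future_model = np.array([30.0, 36.0, 42.0])

mask = outside_training_range(historical_model, future_model)
print(mask.tolist())  # [False, True, True]
```

A flagged value is not necessarily wrong, but it is a value the method has never been checked against, and it deserves a second look before anyone acts on it.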

Given the rapid proliferation of statistical downscaling (both in how often it is used and in the number of methods available), it is important to get a better idea of where its biases lie. NOAA is working on evaluating downscaling methods to determine their strengths and weaknesses in climate prediction by comparing them against high-resolution physics predictions of the future. We have a much better idea of which methods are appropriate for which kinds of conditions than we did at the start of the project. For instance, some methods well-suited for predicting coastal temperatures work poorly when predicting temperatures in the mountains. We are now expanding the roster of methods that we can run and compare.
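A comparison of this kind can be sketched as an error score computed separately for each region, treating the high-resolution physics output as the reference. Everything below is an assumption made for illustration: the function name, the choice of root-mean-square error as the score, and the numbers, which are invented and are not results from the project.

```python
import numpy as np


def rmse_by_region(reference, downscaled, regions):
    """Root-mean-square error of downscaled values against a reference,
    computed separately for each region label."""
    reference = np.asarray(reference, dtype=float)
    downscaled = np.asarray(downscaled, dtype=float)
    regions = np.asarray(regions)
    scores = {}
    for name in np.unique(regions):
        in_region = regions == name
        error = downscaled[in_region] - reference[in_region]
        scores[str(name)] = float(np.sqrt(np.mean(error ** 2)))
    return scores


# Invented temperatures in degrees C, for illustration only.
regions = ["coast", "coast", "mountain", "mountain"]
reference = [20.0, 22.0, 5.0, 3.0]  # stand-in for high-resolution physics output
method_a = [21.0, 21.0, 9.0, 7.0]   # stand-in for one downscaling method

print(rmse_by_region(reference, method_a, regions))
# {'coast': 1.0, 'mountain': 4.0}
```

Splitting the score by region is what exposes a method that looks fine on average but struggles in one kind of terrain.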

The biggest challenges that the project currently faces relate to adaptability and extensibility. As we expand to work with a wider range of input data, we’ve had to re-examine some assumptions, both about how data generated outside of NOAA is structured and about the kinds of operations that we want to perform on our data. Understanding the specific use cases and limitations of a technique is as important as its output: it allows us to be judicious about how the technique is applied and about how much confidence to place in the climate data it generates. Without knowing how we arrive at an insight, we quickly move from science into fiction.

Hire a Guide Who Knows the Territory

One of Engility’s main roles in this work is providing technical support: development, documentation and maintenance for the downscaling infrastructure that NOAA uses. Engility’s experience with the current state of the code enables us to add new features to the workflow, and we can use our understanding of the assumptions made about the input data to head off a variety of issues with new datasets. We enable researchers to look behind the curtain and gain confidence in their conclusions, a crucial element in HPC statistical downscaling and in data analytics in general. We help analysts know the territory.

Posted by Carolyn Whitlock

I have been with Engility for over 3 years and possess a B.S. in Environmental Studies. I am currently pursuing an M.S. in Computer Science and have been serving on NOAA’s downscaling initiative during my entire career at Engility.