Let’s be honest, nothing happens in isolation…a butterfly flaps its wings in South America, and I get a rainstorm in Mississippi. Someone runs an aerosol-physics model in Ohio and my weather model (which was sharing resources with it) slows down. But maybe the wave-height model coupled with the atmospheric model improves the forecasted track of the next hurricane? That’s why scientists are constantly trying to combine models: to get a better read on something complex, like the weather.
They couple models because the underlying physics is coupled in nature. Weather affects climate over the long term; the weather and ocean affect each other at the ocean surface; “wave” models also work at that surface and need to be coupled to weather and ocean models; and space/solar “weather” models interact with Earth’s weather at the very top of the atmosphere. We have pushed the physics and resolution of each of these individual models nearly as far as they can go to make each as independently accurate as possible, so the next big gains in accuracy will come from coupling them so they behave more like Earth’s true system.
And that takes a lot of high performance computing (HPC) power and lots of data input/output (I/O).
Why Do We Care About I/O?

Let’s say you have many millions of data points (imagine all the readings from myriad ocean buoys) flowing into an HPC system. The computer crunches that information (input) and spits out trends and patterns (output). I/O refers to the hardware and software used to access data during a simulation and to how that data is then analyzed and retained. I might have an analysis that eats up thousands of gigabytes. Is it quicker to send that analysis to a colleague in another lab or to have them re-run the original data? And that’s just one aspect of the I/O balancing act. We need to figure out how to best use our limited computing resources for the maximum scientific insight.
I/O and Weather Forecasts

So let’s see some I/O in action. I will refer back to weather forecasts. “Good” I/O (I/O that performs well) can easily halve the time of a model run. The longest pole in the tent is often “bad” I/O, and it’s very common. So we typically fix the I/O first, which can quickly improve a model’s runtime by 5-10x. After that, there are usually some computational things to improve. Even then, the good I/O that remains is still slow compared to computation, so we turn to asynchronous I/O.
Enter Asynchronous I/O

Asynchronous I/O is simply doing I/O at the same time that you do computation. I/O is the slowest part of the system, so if you can hide it behind computation, you can realize great speedups. It’s just tricky to orchestrate all of the computation and I/O happening at the same time to make sure they don’t step on each other. It’s like the difference between a separate washer and dryer versus a combo unit. The separate washer and dryer will be (almost) twice as fast when doing serial loads, but someone has to deal with moving the clothes between them.
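To make the idea concrete, here is a minimal sketch of the pattern in Python, using a background thread to write one step’s output while the next step computes. This is an illustration only, not the method from the paper mentioned below: real weather codes do this with MPI-based I/O libraries, and the function names (`compute_step`, `write_output`) are made up for the example. Note the `join()` before reusing the writer, which is the “don’t step on each other” orchestration.

```python
import os
import tempfile
import threading

def write_output(path, data):
    # The slow I/O phase: write this step's results to disk.
    with open(path, "w") as f:
        f.write(",".join(str(x) for x in data))

def compute_step(step):
    # The compute phase: produce this step's "model state" (stand-in math).
    return [step * i for i in range(1000)]

def run_model(n_steps, out_dir):
    writer = None  # handle to the in-flight background write, if any
    for step in range(n_steps):
        data = compute_step(step)  # computation overlaps the previous write
        if writer is not None:
            writer.join()          # wait before starting the next write
        path = os.path.join(out_dir, "step_%d.csv" % step)
        # Each step builds a fresh buffer, so the thread can safely keep
        # writing the old one while the next step computes.
        writer = threading.Thread(target=write_output, args=(path, data))
        writer.start()             # kick off the I/O asynchronously
    if writer is not None:
        writer.join()              # flush the final write before exiting

out_dir = tempfile.mkdtemp()
run_model(3, out_dir)
print(sorted(os.listdir(out_dir)))  # → ['step_0.csv', 'step_1.csv', 'step_2.csv']
```

In the washer/dryer analogy, `compute_step` is the washer, `write_output` is the dryer, and the `join()` calls are the person making sure one load is out of the dryer before the next goes in.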
In my case, that someone is me and my fellow computational scientists working at Engility. You can read more about how we are applying these methods to DoD mission challenges, in conjunction with the DoD’s HPC Modernization Program, in Matt Turner’s upcoming SC18 paper. We’ll share a link when it goes live.