Computers excite me. Archimedes said, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.” I say, give me a computer with the right code, and I’ll do the same. This summer I was given that opportunity, although I might have modeled the world instead of moving it.
Last year I worked on simulation data analysis using a cluster for the IceCube Neutrino Observatory collaboration. I became fascinated with supercomputing, and when the chance to work with a supercomputer presented itself again this year, I leapt at it, joining Engility’s team at NOAA’s Geophysical Fluid Dynamics Laboratory (GFDL).
My Summer

This summer, I have been using supercomputers to analyze climate change model data. I write code that lends itself to parallelization, that is, running on many processors at the same time, and I then speed up those calculations using graphics processing units (GPUs), a specialized type of processor. More specifically, I have been performing principal component analysis both over the time dimension of a single climate simulation run and across the ensemble dimension of a set of runs with slightly different initial conditions. I also wrote a hurricane tracker to compare runs with each other using storm metrics like storm count, wind speed, pressure, and track.
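Principal component analysis of this kind can be sketched in a few lines of NumPy. The function below is an illustrative example, not GFDL’s actual pipeline: the variable names, data shapes, and synthetic data are my own assumptions. The same function works whether the leading axis is time (one run, many time steps) or ensemble members (many runs, one time).

```python
import numpy as np

def principal_components(field, n_modes=3):
    """PCA over the leading (time or ensemble) axis of a 2-D array
    shaped (samples, grid_points).

    Returns the leading spatial modes (EOFs), the per-sample
    coefficients of each mode, and the fraction of total variance
    each mode explains.
    """
    # Subtract the mean along the sample axis so we analyze anomalies.
    anomalies = field - field.mean(axis=0)
    # Economy-size SVD: anomalies = u @ diag(s) @ vt.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    modes = vt[:n_modes]                    # spatial patterns
    coeffs = u[:, :n_modes] * s[:n_modes]   # per-sample amplitudes
    return modes, coeffs, variance_fraction[:n_modes]

# Tiny synthetic example: 40 "time steps" over 100 "grid points",
# a single sinusoidal pattern plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 40)
pattern = np.sin(np.linspace(0, np.pi, 100))
data = np.outer(np.sin(t), pattern) + 0.05 * rng.standard_normal((40, 100))
modes, coeffs, var = principal_components(data, n_modes=2)
```

Because the synthetic data is essentially rank one, the first mode should capture nearly all of the variance; on real climate fields, the first few modes typically capture the dominant patterns of variability.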
Classroom Theory Meets Internship Practice

My classwork directly prepared me for this job. In physics classes, I learned Python and scientific data analysis libraries like NumPy, SciPy, and Matplotlib, which I have used almost daily here. My probability and statistics class gave me the grounding for the statistical tests I am performing on these datasets. I am using matrix manipulation and decomposition algorithms I was first introduced to in linear algebra. My computer science classes taught me C++ and how to navigate a POSIX environment with tools like awk, bash, make, and git, all of which I use often now.
I also learned lessons of the cautionary tale variety. Some software engineering problems are simply not fun. I spent three days in GDB, a debugger, trying to find why my program was crashing with an “illegal instruction,” only to realize that the library I was using had been compiled for an instruction set the cluster’s processors don’t support. Sometimes libraries are so poorly documented that even simple tasks become cumbersome; when that happens, I often have to try things until something works. That try-stuff-until-it-works lesson applies outside of supercomputing, too.
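One way to catch this kind of mismatch before the crash is to compare the instruction-set extensions the CPU advertises against what a library was compiled for. Here is a small sketch for Linux, where each core’s supported extensions appear on the “flags” line of /proc/cpuinfo; the required flags and the sample text are made-up examples.

```python
def missing_cpu_flags(cpuinfo_text, required_flags):
    """Return the required instruction-set flags the CPU does not advertise.

    cpuinfo_text is the contents of /proc/cpuinfo (Linux), where each
    core's "flags" line lists its supported instruction-set extensions.
    """
    supported = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            supported.update(line.split(":", 1)[1].split())
    return sorted(set(required_flags) - supported)

# Hypothetical check: does this CPU support AVX2 and FMA?
sample = "flags\t\t: fpu sse sse2 avx avx2"
print(missing_cpu_flags(sample, {"avx2", "fma"}))  # prints ['fma']
```

In practice you would read the real file with `open("/proc/cpuinfo").read()`; any flag returned means a library built for that extension will die with an illegal instruction on that machine.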
Data Addiction

I am approaching the next term reinvigorated. I have been able to use what I learned in class to solve real-world problems, and that makes me eager to learn more. The most surprising thing for me has been the sheer amount of data that people produce, store, and work with on the file system. People easily rack up petabytes in their directories, more than ten years’ worth of high-definition video, and I could accumulate a couple hundred terabytes in a week if I wished. The size of these datasets is hard to wrap your head around. Each model run tracks tens of variables, with time steps as small as five minutes in a model that runs for 100 simulated years, and each variable is defined on a fine grid across the globe (five million points) with several vertical bins as well. It makes sense that you need some of the fastest computers in the world to produce and make sense of it all.
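The back-of-the-envelope arithmetic is easy to check using the rough figures above; the only added assumption is 4-byte single-precision values.

```python
# Rough dataset size for ONE variable at ONE vertical level,
# using the figures quoted in the paragraph above.
grid_points = 5_000_000            # points on the global grid
steps_per_year = 365 * 24 * 12     # 5-minute time steps
years = 100
bytes_per_value = 4                # single-precision float (assumed)

total_bytes = grid_points * steps_per_year * years * bytes_per_value
print(f"{total_bytes / 1e12:.0f} TB")  # prints "210 TB"
```

Multiply that by tens of variables and several vertical levels, and the petabyte-scale directories stop sounding so surprising.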
In a few weeks, I will be back in class, storing up knowledge for the next time I get to “move the world.” I am thankful to Engility for giving me the opportunity to serve NOAA at GFDL.
Interested in learning more about career opportunities at Engility? Visit www.engility.com/careers.
Top image: Example climate model output from GFDL. Source: NOAA.