COMPUTATIONAL COSMOLOGY
Mapping the UNIVERSE to the Beginning of TIME
The cosmic microwave background, radiation left over from the Big Bang, provides scientists with clues about the early Universe and the parameters of cosmology in general. The 2006 Nobel Prize in Physics was awarded for experiments resulting in data that support the Big Bang model. Further work in this field promises to yield more discoveries about the fundamental nature of our Universe.

Understanding Our Universe
One of the biggest questions since the dawn of humankind, both literally and figuratively, is: What is our place in the Universe? This desire to understand our place has driven scientific discovery throughout history, from the ancient Greeks and Chinese to the Aztecs and Mayans, to Copernicus and Galileo, to Sputnik and the Hubble Space Telescope. In the past century, this quest has become more focused, leading to a standard model of cosmology in which the Universe comes into existence with a Big Bang, which creates space and starts time. Detailed measurements of this cosmos suggest that the matter we can see—planets, stars, and galaxies—makes up less than 5% of the Universe. Up to one quarter of the Universe is now thought to be the invisible dark matter. The remainder is posited to be dark energy, the force driving the expansion of the Universe.
The fundamental questions in modern cosmology concern the nature of the dark Universe (dark matter, and especially dark energy). What is it? How has it evolved? What does it tell us about our future? Through increasingly precise observations and experiments, we can find evidence with which to address these questions. One of the keys to understanding our Universe is the relic radiation from the Big Bang, known as the cosmic microwave background (CMB; sidebar "The Cosmic Microwave Background," p16). Last scattered some 400,000 years after the Big Bang (about 14 billion years ago), the CMB provides the earliest possible image of the Universe, and encoded within this image are signatures of the fundamental parameters of cosmology. Since the CMB was first detected in 1965, a number of ground-based, balloon-borne, and satellite-borne experiments using increasingly precise instrumentation have been conducted to measure and characterize the CMB. Such experiments scan the sky with incredibly sensitive thermometers tuned to microwave frequencies for as long as possible (figure 1).
Figure 1. Planck's scan of the sky. The routine observations of Planck are planned to last 15 months, allowing two complete surveys of the sky.

First Evidence of Anisotropies
The first satellite-borne CMB experiment was COBE, the Cosmic Background Explorer satellite launched by NASA in 1989. This was the first experiment to detect and then quantify the large-scale anisotropies (or temperature variations) in the CMB. These variations were consistent with predictions of the Big Bang model and resulted in the 2006 Nobel Prize in Physics (sidebar "A Nobel Study," p18) being awarded to Lawrence Berkeley National Laboratory's (LBNL) Dr. George Smoot, who led the analysis effort, and NASA's Dr. John Mather, who coordinated the COBE mission.
Despite COBE's success, it was still not known whether the source of the CMB anisotropy was quantum fluctuations during an inflationary epoch or knots of energy known as topological defects generated in the first moments after the Big Bang. These competing models were known to have very different signatures in the CMB, but more detailed measurements—at higher resolution and with greater detector sensitivity—were required to distinguish between them. There followed a series of ground- and balloon-based experiments culminating in the BOOMERanG and MAXIMA experiments, which detected a harmonic series of peaks and troughs in the CMB power spectrum, exactly as predicted by the inflationary models. In addition, the amplitudes of the peaks and troughs are a sensitive measure of the fundamental parameters of cosmology. For example, the location of the first peak indicates that the Universe is flat, while the ratio of the heights of the first two peaks measures the fraction of baryons (or ordinary matter) in the Universe.
At the turn of the millennium such CMB results, coupled with the accelerating expansion of the Universe deduced from observations of type Ia supernovae, led to the surprising but now widely accepted Concordance Cosmology, in which the Universe is believed to comprise around 70% dark energy, 25% dark matter, and 5% ordinary matter (sidebar "The Concordance Cosmology," p19). In this context, dark denotes not only the absence of light emission but also our lack of understanding of the component's basic nature; as things stand, cosmologists have quantified their current ignorance to be at the 95% level!

Mapping the CMB Data
The first step in deriving cosmological information from a CMB experiment is to combine all of the observations gathered by the detectors into a pixelized map of the tiny (milli- to micro-kelvin) fluctuations in the sky temperature and polarization (figure 4, p18). Such a map represents the outcome of a random process as observed from one point in the cosmos; if we could observe the cosmos from a far-distant point the particular pattern of hot and cold spots in that map would be completely different. Therefore, what we need is a statistic that captures the underlying properties of all such maps and averages out the unique features of a particular realization. For CMB maps this statistic is the angular power spectrum, a curve that registers the strength of the fluctuations on all different angular scales, and which contains information on such characteristics of the Universe as its geometry and how much matter and energy it contains. The angular power spectrum measures the average power in the fluctuations at each angular scale, regardless of the location of the observer, thereby producing a characterization of the data that would be common to all observers.
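The article takes the definition of this statistic as read; for reference, a minimal statement of it in standard CMB notation (not quoted from the text) is:

```latex
\frac{\Delta T}{T}(\hat{n}) \;=\; \sum_{\ell m} a_{\ell m}\, Y_{\ell m}(\hat{n}),
\qquad
\hat{C}_{\ell} \;=\; \frac{1}{2\ell + 1} \sum_{m=-\ell}^{\ell} \left| a_{\ell m} \right|^{2} .
```

Here the temperature fluctuation field is expanded in spherical harmonics Y_lm, each multipole ℓ corresponds to an angular scale of roughly 180°/ℓ, and the estimator Ĉ_ℓ averages over the 2ℓ+1 modes available at that scale. Its theoretical counterpart C_ℓ is the ensemble average over all possible observers, which is exactly the observer-independent characterization of the data described above.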

More Data, Computing Challenges, and Insight
Future CMB observations promise to yield even greater insight into the workings of the Universe. In particular, next year will see the launch of the joint European Space Agency (ESA)/NASA Planck satellite mission (figure 1, p15; sidebar "The Planck Satellite Mission," p20). However, as the amount of data has increased, so have the computational requirements of the scientific community studying the CMB. The National Energy Research Scientific Computing (NERSC) Center will play a critical role in the analysis of the resulting data.
Figure 3. A visualization of the history of the Universe, showing its expansion.
Planck will provide the definitive measurement of the temperature anisotropies, as well as the most detailed polarization observations to date. These results will be an essential complement to a number of follow-on experiments currently being developed to improve our understanding of dark energy (indeed, the projected Planck results are routinely assumed as given by those experiments). Beyond Planck, first sub-orbital experiments and ultimately another satellite mission will search for the faintest CMB signal, known as its B-mode polarization. In an extraordinary convergence of physical processes across scales and systems, this is expected to carry the imprint of gravitational waves generated during the inflation of the Universe following the Big Bang.
As the science goals of CMB observations have evolved, they have required ever-higher resolution observations of ever-fainter signals (figure 7, p21). Achieving sufficient signal-to-noise to extract the science has necessarily required an increase in the volume of data being gathered, and their analysis has become a significant computational challenge. Over the last decade, NERSC has become the leading provider of high-performance computing (HPC) resources for CMB experiments worldwide (sidebar "CMB Data Analysis at NERSC," p24).
Figure 4. The detailed, all-sky picture of the infant Universe from three years of WMAP data. The image reveals 13.7-billion-year-old temperature fluctuations (shown as color differences) that correspond to the seeds that grew to become the galaxies.
 
Correlation is Key, but Computationally Intractable
The key feature of CMB data is that they are correlated, meaning that individual observations are not independent of one another. Moreover, each of the three major components of an observation—the instrument noise, the foreground signal, and the CMB signal—is correlated in a different domain. Detector noise is correlated in the time-domain; foreground signals, such as the emission from dust in the Galactic plane, are spatially correlated; and the CMB signal itself is correlated on different angular scales—indeed it is precisely the spectrum of these angular correlations that scientists want to determine.
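As a concrete illustration of the first of these domains, the sketch below draws a detector-noise timestream whose power spectral density is white at high frequencies and rises as 1/f below a knee frequency, a commonly used simplified model; the function, parameter names, and default values are illustrative only, not taken from any particular experiment's pipeline.

```python
import numpy as np

def simulate_detector_noise(n_samples, f_sample, sigma=1.0, f_knee=0.1, alpha=1.0, seed=0):
    """Draw one realization of 1/f-plus-white detector noise by shaping
    white noise in the frequency domain (a simplified, illustrative model;
    the overall normalization is schematic)."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / f_sample)
    safe_f = np.maximum(freqs, freqs[1])      # avoid dividing by zero at f = 0
    # Model PSD: flat (white) at high frequencies, rising as 1/f^alpha below f_knee.
    psd = sigma**2 * (1.0 + (f_knee / safe_f) ** alpha)
    shaped = np.fft.rfft(rng.standard_normal(n_samples)) * np.sqrt(psd)
    return np.fft.irfft(shaped, n=n_samples)

# Example: ten minutes of data sampled at 100 Hz with a 0.1 Hz knee frequency.
noise = simulate_detector_noise(n_samples=60_000, f_sample=100.0)
```

Because every sample of such a timestream is correlated with its neighbours, the data cannot simply be chopped into independent pieces and processed separately, which is the root of the difficulty described next.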
These correlations preclude the kind of divide-and-conquer embarrassingly parallel approaches used in other data-intensive fields, such as accelerator physics. Instead, any CMB analysis has to be able to manipulate an entire dataset simultaneously and coherently, ideally keeping track not just of the data but also of all of the details of these correlations as they are reduced. In practice, tracking the correlations exactly has become computationally intractable, and instead a range of approximate methods have evolved with the degree of approximation needed being set by the size of the dataset and the available computing resources.

Scaling
In determining the suitability of a particular approach for the analysis of a dataset, an important concept is that of scaling. Scaling tells us the computational cost of a calculation for a particular data size, and typically consists of a function of the data size (often a power law) and a numerical prefactor. For example, an algorithm with an N^3 power law means that doubling the data size results in an eight-fold increase in computing cost, while the cost of a calculation scaling as 10 N^3 would be ten times that of one scaling as N^3 for the same data size. As a rule of thumb, the power law depends on the algorithm while the prefactor depends on the implementation.
Often scaling depends on several parameters that describe different dimensions of the dataset, which are typically distinguished by their subscripts. In addition there may be several steps in a complete analysis, each of which has its own scaling behavior. In this case we often simply cite the most computationally costly step, known as the dominant scaling, since it will determine the overall tractability of the approach (and also identify the area where efforts to improve performance should be focused).
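Written out (this simply restates the rules of thumb above, with α the implementation-dependent prefactor and β the algorithm-dependent power):

```latex
C(N) \;=\; \alpha\, N^{\beta},
\qquad
\frac{C(2N)}{C(N)} \;=\; 2^{\beta} \;=\; 8 \ \text{ for } \beta = 3,
\qquad
\frac{10\,\alpha\, N^{3}}{\alpha\, N^{3}} \;=\; 10 .
```

For a pipeline with several steps, each scaling as its own power of its own data-size parameter, the dominant scaling is simply whichever term is largest for the dataset and machine at hand.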

CMB Scaling Limits
For CMB data analysis, the key parameters are the number of observation times, N_t, and the number of map pixels, N_p. The exact maximum likelihood analysis of a CMB dataset—that is, writing down a Gaussian likelihood function and maximizing it—scales as the cube of the number of pixels, N_p^3. This approach was adequate for the datasets gathered by suborbital experiments in the last decade—including the seminal BOOMERanG and MAXIMA balloon missions, the first to be analyzed at the DOE's NERSC Center. However, for more recent experiments—both suborbital and satellite missions like WMAP and Planck—it has become completely intractable except on small subsets of the data.
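Concretely, for a map d of N_p pixels the exact method maximizes a multivariate Gaussian likelihood of the standard form (written here for reference rather than quoted from the article):

```latex
-2 \ln \mathcal{L}\left( d \,\middle|\, C_{\ell} \right)
\;=\;
d^{\,T} \left[ S(C_{\ell}) + N \right]^{-1} d
\;+\;
\ln \det \left[ S(C_{\ell}) + N \right]
\;+\;
N_{p} \ln 2\pi ,
```

where S(C_ℓ) is the dense pixel-pixel signal covariance implied by the angular power spectrum being sought and N is the pixel noise covariance. Inverting these dense N_p × N_p matrices and evaluating their determinants is what drives the N_p^3 cost.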
The first approximation used to address this problem has been to decouple the two phases of the analysis—map-making and power spectrum estimation—and calculate maps without attempting to keep track of the correlations between the noise in each pair of pixels. Using an iterative preconditioned conjugate gradient (PCG) method to solve for the most likely map given the observations and their noise properties scales linearly with the number of observations N_t, with a prefactor of around 100 that includes the number of iterations required for the algorithm to converge. Making reasonable assumptions about growth in the available computing power, this approach should remain viable—at least on the largest supercomputers—for all foreseeable generations of CMB experiments.
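In this decoupled approach the map m is the generalized least-squares solution of (A^T N^-1 A) m = A^T N^-1 d, where d is the time-ordered data, A is the sparse pointing matrix, and N is the time-domain noise covariance. The sketch below shows the structure of such a PCG solve; purely for brevity it assumes uncorrelated (white) noise and temperature-only data, whereas the production codes handle correlated noise and polarization, and all of the names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

def make_map(tod, pixels, n_pix, noise_var):
    """Solve (A^T N^-1 A) m = A^T N^-1 d by preconditioned conjugate
    gradients, assuming white noise of variance `noise_var` per sample.
    `pixels[i]` is the sky pixel observed at sample i."""
    n_samp = tod.size
    # Sparse pointing matrix: one nonzero per row, mapping sample -> pixel.
    A = sp.csr_matrix(
        (np.ones(n_samp), (np.arange(n_samp), pixels)), shape=(n_samp, n_pix)
    )
    n_inv = 1.0 / noise_var
    rhs = A.T @ (n_inv * tod)                              # A^T N^-1 d
    lhs = LinearOperator((n_pix, n_pix),
                         matvec=lambda m: A.T @ (n_inv * (A @ m)))
    # Jacobi preconditioner: inverse of the noise-weighted hit count per pixel.
    hits = np.asarray(A.sum(axis=0)).ravel()
    precond = LinearOperator((n_pix, n_pix),
                             matvec=lambda r: r / np.maximum(hits * n_inv, 1e-12))
    m, _ = cg(lhs, rhs, M=precond, maxiter=100)
    return m
```

Each PCG iteration costs one pass over the time-ordered data (the applications of A and A^T), which is where the linear scaling in N_t, and the prefactor set by the iteration count, come from.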
Deriving the angular power spectrum from this map is now complicated by the absence of information about the pixel correlations. Various more-or-less approximate approaches exist to handle this, including Monte Carlo methods that involve generating and analyzing large numbers of simulated datasets with the same properties as the real data (at least as far as they are known) and using these to estimate the uncertainties imposed by the approximation. Since we typically expect uncertainties to fall as the inverse square root of the number of realizations, achieving the desired percent-level accuracy using this method requires ten thousand realizations, each of which requires simulating and then mapping the data. This approach has been tractable to date, being widely used in the analysis of the WMAP data, but will begin to push the limits of our capabilities with the Planck data and beyond.
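The structure of such a Monte Carlo is simple even if its cost is not. The toy sketch below leans on the healpy package and, purely to stay short, adds white noise directly in the map domain instead of simulating and mapping full timestreams as the real pipelines must; the 1/sqrt(N_MC) shrinkage of the scatter is what drives the ten-thousand-realization requirement.

```python
import numpy as np
import healpy as hp

def mc_spectrum_errors(cl_theory, nside, noise_sigma, n_mc=100, seed=0):
    """Monte Carlo error bars for a simple spectrum estimator: simulate
    signal-plus-noise maps with a known input spectrum, re-estimate the
    spectrum of each realization, and take the spread across realizations."""
    rng = np.random.default_rng(seed)
    npix = hp.nside2npix(nside)
    spectra = []
    for _ in range(n_mc):
        signal = hp.synfast(cl_theory, nside)              # CMB realization
        noise = noise_sigma * rng.standard_normal(npix)    # white pixel noise
        spectra.append(hp.anafast(signal + noise))
    spectra = np.array(spectra)
    return spectra.mean(axis=0), spectra.std(axis=0)
```

In the real analyses each "simulate" and "map" step involves the full time-ordered data volume, which is why ten thousand realizations is a serious computational undertaking.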
A number of other approximate methods, including Gibbs and Hamiltonian sampling techniques and a hierarchical maximum likelihood approach, have been proposed, although none has yet been successfully applied to real experimental data. The analysis of the coming generation of experiments remains an open challenge that will likely be met by a hybrid approach using multiple methods to try to control the errors introduced by the approximations inherent in each.

Data Proliferation
Since the publication of the BOOMERanG and MAXIMA results in early 2000, the use of NERSC resources by CMB experiments has increased very significantly. In any allocation year, there are now about a dozen experiments and 100 data analysts (many on multiple teams) sharing an allocation of upwards of one million CPU-hours.
One challenge presented by this proliferation came from the fact that every experiment adopts its own way (and often more than one way) of storing its data, in terms of both the data distribution within and between files, and the format or formats of those files. To an analysis applications programmer who wants to be able to apply his or her code to any data, these details are at best an irrelevance and at worst an encumbrance. To circumvent this we have developed a data abstraction layer, the M3 library, which allows the application programmer to request any particular interval or subset of any particular type of CMB data (for example, sample data, pixel data, and so on) without needing to know what files and formats those data are stored in. The details of the data files and formats for a particular analysis are held in an XML run configuration file which is parsed by the M3 library at the outset and referenced at the lowest level whenever actual input/output (I/O) is performed. Extending support to any new file format then simply requires the addition of one new function able to handle that file type. As well as providing transparency to the applications programmer, the use of the M3 layer has enabled CMB researchers to analyze the combined data from multiple experiments despite their different formats, and provided instant upgrades to all applications whenever a new data-access optimization or functionality is incorporated.
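The actual M3 interface is not reproduced here; the sketch below is a hypothetical illustration of the pattern just described, with a run configuration parsed once up front and one registered reader per on-disk format behind a single, format-agnostic request call. All of the names are invented for illustration.

```python
import xml.etree.ElementTree as ET
import numpy as np

READERS = {}

def register_format(name):
    """Register a reader function for one on-disk file format."""
    def wrap(fn):
        READERS[name] = fn
        return fn
    return wrap

@register_format("numpy")
def _read_numpy(path, start, stop):
    # Memory-map the file and return only the requested sample range.
    return np.load(path, mmap_mode="r")[start:stop]

class DataLayer:
    """Format-agnostic access to named data streams, in the spirit of M3."""

    def __init__(self, run_config_xml):
        # Hypothetical run configuration, e.g.:
        #   <run>
        #     <data name="det01_signal" file="det01.npy" format="numpy"/>
        #   </run>
        self._entries = {}
        for node in ET.parse(run_config_xml).getroot().iter("data"):
            self._entries[node.get("name")] = (node.get("file"), node.get("format"))

    def read(self, name, start, stop):
        """Return samples [start, stop) of the named stream, whatever file
        and format it happens to be stored in."""
        path, fmt = self._entries[name]
        return READERS[fmt](path, start, stop)
```

Supporting a new file format then amounts to registering one new reader function, which matches the extension model described above.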
Figure 7. Part of the CMB sky, as measured by the COBE, WMAP, and Planck satellite experiments, illustrates the improved resolution offered by each generation of satellite. In the upper left corner are COBE data at a resolution of seven degrees. Next is the sky as seen at 94 GHz by WMAP, with a resolution of about 15 arcminutes. Next is a strip in which WMAP data from all bands are combined and smoothed (the so-called Internal Linear Combination), controlling foreground emission and noise at the expense of angular resolution. Next are simulated Planck temperature data at their anticipated resolution of around five arcminutes. Finally, simulated Planck polarization data smoothed to 15 arcminutes are shown superimposed on the temperature anisotropies. Planck's sensitivity, angular resolution, and frequency coverage will allow it to extract essentially all available information on primary CMB temperature anisotropies, and make a dramatic advance in measurements of CMB polarization.
Despite the very important CMB science that these experiments are doing, for many years now the Planck satellite mission has been setting the bar in terms of the computational challenge. With a dataset that is orders of magnitude larger than anything that has gone before, and with science goals that require the most precise control of errors of any CMB analysis to date, Planck represents the rising tide that floats all ships.
As with any mission before its deployment, Planck work has necessarily involved the generation and analysis of simulated data. Far from being a stopgap, this is an essential exercise for a number of reasons:
  • Simulations are used to inform mission design, by assessing the impact of particular choices (or, more usually, trade-offs) on the final science results

  • Simulations provide qualitative and quantitative tests of our data-processing capabilities

  • As outlined above, simulations are themselves an essential part of the analysis pipeline when Monte Carlo methods are employed
In addition, Planck-scale CMB data analysis has proved to be a very useful scientific application for high-level system benchmarking at NERSC (sidebar "MADbench Software for Realistic Benchmarking," p23), and a symbiotic relationship has emerged in which new NERSC capabilities are regularly demonstrated by performing Planck calculations at some previously intractable scale while simultaneously enabling unprecedented Planck science with the results.
Figure 8. Maps of the CMB+foreground sky temperature at each of the nine Planck frequencies, made from the first ever simulation of an entire year's data from all 74 detectors in the Planck full focal plane, comprising 800 billion samples over 300 million pixels. Galactic foreground contamination is visible in all of the maps (which are plotted in ecliptic coordinates) and dominates the whole sky at the highest frequencies, which will be used primarily to try to control for the contamination at the lower frequencies.
In the summer of 2003, NERSC doubled the size of its SP3 to 6,000 compute processors. Unfortunately, the system's standard MPI implementation included a hard limit of 4,096 tasks, meaning that any code wanting to exploit the new capabilities of the system had to be an MPI/OpenMP hybrid. Late in 2004, a new MPI implementation was received that supported MPI across the entire system. Given the opportunity to test this with a scientific application, the LBNL team performed the first map-making analysis of the anticipated data from all 12 of the detectors at a single Planck frequency. With 75 billion simulated samples being mapped to 150 million CMB temperature and polarization pixels, the run required 2.5 TB of disk space to store the input data, and ran on all 6,000 Seaborg processors for about two hours—the first calculation to utilize the entire system.
The resulting maps not only demonstrated a critical data-processing capability for Planck, but their subsequent spectral analysis also confirmed that, contrary to some claims, Planck would indeed be able to recover the large-angular-scale (low-multipole) signals despite the presence of long-range correlations in its instrument noise.
Scaling to this concurrency presented a number of problems, most notably with the I/O subsystem. With 6,000 processors simultaneously attempting to access 2.5 TB of data, some of it unique to each processor and some common to all, I/O alone took about 50% of the analysis run time. To reduce the I/O load we looked at ways to compress the data, keeping in mind that, given the subtlety of the signals we wanted to extract, any compression would have to produce a dataset that yielded exactly the same results as the uncompressed data. A closer look at the input dataset revealed that only about 15% of it was signal from the detectors; the remainder was the so-called pointing solution, recording where on the sky each detector was pointing at the instant each sample was taken. These pointing data offered two dimensions for effectively lossless compression:
  • Rather than store the specific pointing for each detector, since the position of the detectors in the focal plane is fixed we can just store the generalized pointing for the telescope's boresight and rotate this to any particular detector as it is being read


  • Rather than store the full boresight pointing at every sample (up to 200 Hz for Planck), since the scans are at least piecewise smooth we can sparse-sample it, storing only the compressed pointing and interpolating it as it is read
The trade-off is then between the disk space and I/O time required for the full pointing and the calculation time needed to reconstruct it on the fly from the generalized compressed data. Quantitatively this trade-off is obviously system-specific; on the SP3 the Generalized Compressed Pointing (GCP) library halved the time to load the data, saving about 25% of the run time. The dominant cost in the reconstruction comes from trigonometric functions, and systems with vector math libraries show much better performance than those without. More significantly though, the use of the GCP library reduced the disk space and memory requirements by up to 85%—a critical saving for a dataset of this size. Indeed, as a consequence the Planck team was able to increase the size of the dataset being analyzed from 12 detectors at a single frequency to 32 detectors at three frequencies, with a corresponding increase in the quality of the results. Finally, because the GCP library was accessed through the M3 data abstraction layer discussed above, the new capabilities it offered were immediately and transparently available to all of the analysis codes using M3.
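To make the two compressions concrete, the sketch below shows the simplest possible version: the boresight pointing is stored at a reduced rate and linearly interpolated back on read, and each detector's pointing is recovered by applying a fixed focal-plane offset. The real GCP library and the Planck pointing model are considerably more involved (quaternions, polarization angles, proper spherical interpolation); everything here is illustrative.

```python
import numpy as np

def compress_boresight(theta, phi, stride):
    """Keep only every `stride`-th sample of the boresight pointing angles."""
    return theta[::stride], phi[::stride]

def reconstruct_pointing(theta_s, phi_s, stride, n_samples, dtheta, dphi):
    """Rebuild one detector's pointing on the fly: linearly interpolate the
    sparse boresight samples back to the full sampling rate, then apply the
    detector's fixed offset (dtheta, dphi) in the focal plane."""
    t_sparse = np.arange(theta_s.size) * stride
    t_full = np.arange(n_samples)
    theta = np.interp(t_full, t_sparse, theta_s) + dtheta
    # Unwrap phi before interpolating so smooth scans do not jump at 2*pi.
    phi = np.interp(t_full, t_sparse, np.unwrap(phi_s)) + dphi
    return theta, np.mod(phi, 2.0 * np.pi)
```

The trade-off described in the text is visible directly: the sparse arrays are a factor of `stride` smaller on disk and in memory, at the price of the interpolation (and, in the real library, trigonometric rotation) performed every time the pointing is read.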
Ultimately, Planck will gather data using 74 detectors at nine frequencies. The obvious next milestone would therefore be to perform an analysis of data simulated for the full mission duration across the full focal plane. Until very recently such a calculation has been beyond the capabilities of the NERSC systems, but with the recent arrival of Franklin—a Cray XT4 with 20,000 cores each having 2GB of memory and capable of 5.2 gigaflops—it entered the realm of the possible.
Once again, the LBNL team was invited to submit a previously intractable calculation to test the limits of a new computational capability, and once again we took the opportunity also to attempt an unprecedented scientific calculation for Planck. To add to the realism of the analysis, this time we included not only the CMB signal and detector noise but also many of the foreground signal components that will contaminate the real Planck data—including point-source, synchrotron, free-free, and dust emission, as well as the Sunyaev-Zel'dovich effect (a spectral distortion due to the up-scattering of CMB photons by the hot plasma within dense galaxy clusters). To maximize the science opportunities for their post-processing, these data were simulated in two sets—one containing the CMB and noise and the other all of the foregrounds. In addition, the data were stored in separate files, each holding a single day's observations, as they will be downloaded from the satellite. Overall this simulation comprised 800 billion samples stored in 50,000 files occupying 3 TB of disk space.
With this dataset in hand we first produced maps of the microwave sky at each individual frequency (figure 8). These served both as a check on the integrity of the data and as the inputs expected by the Planck component-separation working group. The most challenging calculation, however, was the previously impossible analysis of the entire focal plane data volume simultaneously, which, amongst other things, supports component separation within the time-ordered data analysis rather than at the map level. This approach has the potential to offer a significantly cleaner separation, and a comparative analysis of the outputs from the two approaches will provide the first insight into their respective strengths and weaknesses. We were able to perform this calculation with the capabilities offered by Franklin.

Next-Generation Experiments
For all that the Planck data will provide, the fine details, in particular of the CMB B-mode polarization, will require a new generation of experiments. Having reduced the detector noise to the limits of quantum efficiency and shot noise, the only way to achieve the signal-to-noise necessary for the B-spectrum measurement is through a dramatic increase in the data volume. Increasing the observation time would be one way to achieve this, although this is not always practical. The alternative is to increase the number of detectors fielded by an experiment. Despite the challenges that this presents, suborbital experiments multiplexing thousands, and ultimately tens of thousands, of detectors are planned, and their first incarnations are currently being prototyped. Ultimately another satellite mission will be required to map the entire microwave sky with such detector sets, and both NASA and ESA have such missions in their long-range plans.

What the Future Holds
For the CMB data analysts the future therefore holds the prospect of a 100-fold increase in the raw data volume, to the order of one petabyte and beyond. This will undoubtedly present a new set of challenges. Two examples of areas that we have begun to work on in preparation for this are automated data management and on-the-fly simulation. With limited spinning disk space and multiple real and simulated datasets, it will not be possible to keep all of the data on disk continuously. However, any particular analysis requires all of its data to be accessible simultaneously, so we will need to automate the management of limited disk resources to meet the needs of the spectrum of analysis jobs as efficiently as possible. Given this disk space constraint, we will also need to couple simulations and their analyses much more tightly, replacing the current model, in which a simulation is performed once and its data are written to disk for subsequent analysis, with one in which the data are simulated and identically re-simulated on-the-fly (and entirely in-core) whenever an analysis requests them.
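A minimal sketch of the on-the-fly idea, again illustrative rather than any existing interface: if each chunk of simulated data is generated from a seed derived deterministically from its identity, any analysis can regenerate exactly the same samples on demand, entirely in core, instead of reading them back from disk.

```python
import zlib
import numpy as np

def simulated_chunk(detector, chunk, chunk_len, sigma=1.0):
    """Deterministically (re)generate one chunk of simulated detector noise.
    The seed depends only on (detector, chunk), so every analysis that asks
    for these samples gets bit-identical data with no disk I/O at all."""
    seed = zlib.crc32(f"{detector}:{chunk}".encode())
    rng = np.random.default_rng(seed)
    return sigma * rng.standard_normal(chunk_len)

# Two independent requests for the same chunk return identical samples.
assert np.array_equal(simulated_chunk("det01", 42, 1000),
                      simulated_chunk("det01", 42, 1000))
```

A production version would of course generate sky signal and correlated noise rather than white noise, but the principle of replacing storage with deterministic recomputation is the same.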
Figure 10. Artist's conception of the WMAP spacecraft.
Undoubtedly other challenges will arise as these data start to flow. However, equipped with world-leading high-performance computing capacity and capability, we remain confident that we will continue to be able to meet the data analysis needs of current and future generations of CMB experiments, and so help to realize the extraordinary scientific potential of this last echo of the Big Bang.

Contributors: This work has been the result of a number of collaborations with the CMB and HPC communities. The author, Dr. Julian Borrill, would like to thank members of the BOOMERanG and MAXIMA experiments, the U.S. Planck team and Planck Working Groups 2 and 3, as well as colleagues in the LBNL Computational Research Division and University of California-Berkeley Computer Science Division and Space Sciences Laboratory. Particular recognition goes to Christopher Cantalupo, Andrew Jaffe, Theodore Kisner, and Radoslaw Stompor.
Further Reading
"Cosmology and Computation: An Interview with George Smoot," p10