Bridging the Gap Between CLIMATE and WEATHER
The distinction between climate and weather was expressed most succinctly by science fiction writer Robert A. Heinlein: "Climate is what we expect; weather is what we get." But as global warming produces more noticeable changes on a planetary scale, how do we even know what to expect in a particular region?

Climate change studies are increasingly focused on understanding and predicting regional changes of daily weather statistics. But to predict the next century's statistical trends with confidence, researchers have to demonstrate that their forecasting tools can successfully recreate the conditions of the past century. That requires a detailed set of historical atmospheric circulation data—not just monthly averages, but statistics for at least every six hours, so that phenomena like severe storms can be analyzed.
Figure 1. Historic weather map for 8 a.m. on January 28, 1922, the day the deadly Knickerbocker Storm hit Washington, DC.
Although there is scant atmospheric data from weather balloons and none from satellites for the first half of the 20th century, there is an enormous amount of observational data collected at the Earth's surface by a variety of sources, from meteorologists and military personnel to volunteer observers and ships' crews. Until recently, these two-dimensional data were widely available only on hand-drawn weather maps (figure 1; sidebar "Recreating the Knickerbocker Storm of 1922," p52). Despite many errors, these maps are indispensable to researchers, and extensive efforts are being made to put them into a digital format and make them available on the Web.
Now, using the latest data integration and atmospheric modeling tools and a 2007 Innovative and Novel Computational Impact on Theory and Experiment (INCITE) award of 2 million supercomputing hours at the National Energy Research Scientific Computing (NERSC) Center, scientists from the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Lab and the Cooperative Institute for Research in Environmental Sciences (CIRES) are building the first complete database of three-dimensional global weather maps of the 20th century.
Called the 20th Century Reanalysis Project, the new dataset will double the number of years for which a complete record of three-dimensional atmospheric climate data is available, extending the usable digital dataset from 1948 back to 1892. The team expects to complete the dataset within two years, including observations currently being digitized around the world. The final maps will depict weather conditions every six hours from the Earth's surface to the level of the jet stream (about 11 km, or 36,000 feet, above the surface), and will allow researchers to compare the patterns, magnitudes, means, and extremes of recent and projected climate changes with past changes.
"We expect the reanalysis of a century's worth of data will enable climate researchers to better address issues such as the range of natural variability of extreme events including floods, droughts, hurricanes, extratropical cyclones, and cold waves," said principal investigator Dr. Gil Compo of CIRES. Other team members are Dr. Jeff Whitaker of the NOAA Earth System Research Lab and Dr. Prashant Sardeshmukh, also of CIRES, a joint institute of NOAA and the University of Colorado.
"Climate change may alter a region's weather and its dominant weather patterns," Dr. Compo said. "We need to know if we can understand and simulate the variations in weather and weather patterns over the past 100 years to have confidence in our projections of changes in the future. The alternative—to wait for another 50 years of observations—is less appealing."

Reanalysis: Reconstructing Complete Climate Data
Traditionally the Earth's climate has been studied by statistical analysis of weather data such as temperature, wind direction and speed, and precipitation, with the results expressed in terms of long-term averages and variability. But statistical summaries by themselves are inadequate for studies of climate changes; for one thing, many important atmospheric events happen too quickly to be captured in the averages. The ideal historical dataset would provide continuous, three-dimensional weather data for the entire globe, collected using consistent methods for a period of at least a century, and longer if possible. In reality, weather records are incomplete both spatially and temporally, skewed by changing methods of collecting data, and sprinkled with inaccuracies.
Reanalysis is a technique for reconstructing complete, continuous, and physically consistent long-term climate data. It integrates quality-controlled data obtained from disparate observing systems, then feeds these data into a numerical weather forecasting model to produce short-term forecasts. The output from these forecasts fills in the gaps in the recorded observations both in time and space, resulting in high-resolution, three-dimensional datasets.
Figure 3. Comparison of 500 hPa geopotential height analyses for 0000 UTC 20 December 2001: (top left) the full NCEP-NCAR reanalysis, which used all available observations at all levels (more than 150,000), and three parallel assimilation experiments using a simulated 1895 network of only 308 surface pressure observations. (Top right) EnsClim shows a root-mean-square (rms) difference of 95.7 meters from the full NCEP-NCAR reanalysis; (bottom left) EnsFilt shows an rms difference of 49.2 meters; and (bottom right) CDAS-SFC shows an rms difference of 96.0 meters. Blue dots indicate the locations of the surface pressure observations used to make the experimental analyses. The 5,500-meter line is thickened, and the contour interval is 50 meters.
Over the past decade, reanalysis datasets have been used in a wide range of climate applications and have provided a more detailed and comprehensive understanding of the dynamics of the Earth's atmosphere, especially over regions where the data are sparse, such as the poles and the Southern oceans. Reanalysis has also alleviated the impact of changing observation systems and reduced the uncertainty of climate modeling by providing consistent and reliable datasets for the development and validation of models.
A reanalysis workshop held in January 2005 at the University of Reading in the United Kingdom identified a number of climate research needs that could benefit from a dedicated dynamical reanalysis, including a more in-depth understanding of the general circulation of the atmosphere, and a more reliable assessment of climate trends, the hydrological cycle, and the calculation of energy fluxes over the oceans.
The Earth's atmosphere and oceans transport heat from the tropics to the polar regions. In forecast models, the processes of heating and transport are often represented by parameterizations—approximations of processes that are too small-scale or complex to be represented explicitly in the model, such as transient eddies within storm tracks that transport heat and moisture to higher latitudes. This lack of precision in representing mutually interacting processes sometimes yields flawed results, with heating and flow fields that are inconsistent with each other. Reanalysis data could provide the foundation for more accurate parameterizations.
"...it means that we can use the surface pressure measurements to get a very good picture of the weather back to the19th century."

DR. GIL COMPO
CIRES
Uncertainty regarding tropospheric temperature trends has resulted from inconsistency among observations from satellites, radiosondes (the detector/transmitters in weather balloons), and surface instruments. Even changes in the types of radiosondes have raised questions about the consistency of the data they produce. Real-time bias correction during the data assimilation phase of reanalysis can help reduce these uncertainties.
While the first generation of reanalyses focused only on the atmospheric component, recently there has been rapid progress toward creating reanalyses for the ocean, land surface, and the coupled climate system. For example, modeling of the precipitation/evaporation cycle is affected by parameterization of small-scale features such as convective systems, which produce a large part of the total precipitation in many parts of the world. Reanalyses that include precipitation data may improve the modeling of the hydrological cycle. And the use of coupled model data assimilation in reanalyses may improve the calculation of energy fluxes over the oceans.

From Two to Three Dimensions
Dr. Compo, Dr. Whitaker, and Dr. Sardeshmukh have discovered that using only surface air pressure data, it is possible to recreate a snapshot of other variables, such as winds and temperatures, throughout the troposphere—from the ground or sea level to the jet stream. This discovery makes it possible to extend two-dimensional weather maps into three dimensions. "This was a bit unexpected," Dr. Compo said, "but it means that we can use the surface pressure measurements to get a very good picture of the weather back to the 19th century."
The computer code used to combine the data and reconstruct the third dimension has two components. The forecast model is the atmospheric component of the Climate Forecast System, which is used by the National Weather Service's National Centers for Environmental Prediction (NCEP) to make operational climate forecasts. The data assimilation component is the Ensemble Kalman Filter.
Data assimilation is the process by which raw data such as temperature and atmospheric pressure observations are incorporated into the physics-based equations that make up numerical weather models. This process provides the initial values used in the equations to predict how atmospheric conditions will evolve. Data assimilation takes place in a series of analysis cycles. In each analysis cycle, observational data are combined with the forecast results from the mathematical model to produce the best estimate of the current state of the system, balancing the uncertainty in the data and in the forecast. The model then advances several hours, and the results become the forecast for the next analysis cycle.
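As a concrete illustration, the analysis cycle can be reduced to a few lines of Python. This is a deliberately simplified, single-point sketch of the forecast/blend/advance loop described above, not the project's actual code (the real forecast step is the NCEP Climate Forecast System); the functions, numbers, and the crude relaxation "model" are all illustrative assumptions.

def forecast_model(pressure_hpa, climatology=1013.0):
    # Stand-in for the atmospheric model: relax the state toward climatology.
    # The real step is a full global forecast model run forward six hours.
    return climatology + 0.95 * (pressure_hpa - climatology)

def analysis_step(forecast, forecast_var, obs, obs_var):
    # Blend forecast and observation, weighting each by its uncertainty; this
    # is the scalar version of the update a Kalman filter performs.
    gain = forecast_var / (forecast_var + obs_var)        # weight given to the observation
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1.0 - gain) * forecast_var            # uncertainty shrinks after the update
    return analysis, analysis_var

# One day of 6-hourly surface pressure observations (hPa) at a single point.
observations = [1008.2, 1006.9, 1007.5, 1009.1]
state, state_var = 1010.0, 4.0
for obs in observations:
    state = forecast_model(state)                          # short-term forecast
    state_var += 1.0                                       # model error inflates the uncertainty
    state, state_var = analysis_step(state, state_var, obs, obs_var=1.0)
    print(f"analysis: {state:7.2f} hPa   variance: {state_var:4.2f}")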
Figure 4. Effect of a single surface pressure observation (gray dot) on the final analysis. Filled contours show the first-guess field of geopotential height at 300 hPa (top) and 1,000 hPa (bottom) from CDAS-SFC (left) and the Ensemble Filter (right). Line contours show the analysis increment after assimilating a single pressure observation that is 1 hPa larger than the first-guess pressure field at the indicated location (gray dots); the two gray dots represent two separate experiments. First-guess (filled) contour interval is 100 m. Analysis increment (line) contour interval is 2 m, starting at 1 m. Positive (negative) increments are indicated with black (red) contours.
The Ensemble Kalman Filter is one of the most sophisticated tools available for data assimilation. Generically, a Kalman filter is a recursive algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements. Kalman filters are used in a wide range of engineering applications, from radar to computer vision to aircraft and spacecraft navigation. A familiar everyday relative is the phase-locked loop, which enables radios, video equipment, and other communications devices to recover a signal from a noisy communication channel. Kalman filtering has only recently been applied to weather and climate problems, but the initial results have been so good that the Meteorological Service of Canada has incorporated it into its forecasting code. The 20th Century Reanalysis Project uses the Ensemble Kalman Filter to remove errors in the observations and to fill in the blanks where information is missing, creating a complete weather map of the troposphere.
Rather than making a single estimate of atmospheric conditions at each time step, the Ensemble Kalman Filter quantifies uncertainty by sampling a range of possible states. It produces 56 estimated weather maps—the ensemble—each slightly different from the others. The mean of the ensemble is the best estimate, and the variance within the ensemble indicates the degree of uncertainty, with less variance indicating higher certainty. The filter blends the forecasts with the observations, giving more weight to the observations when they are high-quality, or to the forecasts when the observations are noisy. The NCEP forecasting system then takes the 56 blended weather maps and runs them forward six hours to produce the next forecast. Processing one month of global weather data takes about a day of computing, with each map running on its own processor. The Kalman filter is flexible enough to change continuously, adapting to the location and number of observations as well as to meteorological conditions, enabling the model to correct itself in each analysis cycle.
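The core update can be sketched in a few lines of Python. This is a toy, perturbed-observation ensemble Kalman filter update for a single surface pressure observation, not the project's assimilation code; the numbers, variable names, and the three-level state vector are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_members = 56                       # ensemble size used by the project
state_labels = ["surface pressure (hPa)", "500 hPa height (m)", "300 hPa height (m)"]

# Forecast ("first guess") ensemble: a mean state plus perturbations that are
# correlated across levels, so surface pressure co-varies with heights aloft.
mean = np.array([1008.0, 5520.0, 9100.0])
shared = rng.standard_normal((n_members, 1)) * np.array([3.0, 40.0, 60.0])
noise = rng.standard_normal((n_members, 3)) * np.array([0.5, 5.0, 8.0])
ensemble = mean + shared + noise

obs, obs_err = 1012.0, 1.0           # one surface pressure observation and its error (std dev)

# Kalman gain estimated from the ensemble: covariance between the observed
# quantity (column 0) and every state variable, divided by the total variance.
hx = ensemble[:, 0]
anomalies = ensemble - ensemble.mean(axis=0)
cov = anomalies.T @ (hx - hx.mean()) / (n_members - 1)
gain = cov / (hx.var(ddof=1) + obs_err**2)

# Perturbed-observation update: nudge every member toward its own noisy copy
# of the observation, weighted by the gain.
perturbed_obs = obs + rng.normal(0.0, obs_err, n_members)
analysis = ensemble + np.outer(perturbed_obs - hx, gain)

for name, m, s in zip(state_labels, analysis.mean(axis=0), analysis.std(axis=0, ddof=1)):
    print(f"{name:24s}  best estimate {m:8.1f}  spread {s:5.2f}")

Because the gain is built from ensemble cross-covariances, the single surface observation updates the heights aloft as well, which is the mechanism that lets two-dimensional data constrain the full troposphere.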
"What we have shown is that the map for the entire troposphere is very good, even though we have only used the surface pressure observations," said Dr. Compo. He estimates that the error for the 3D weather maps will be comparable to the error of modern two- to three-day weather forecasts.

Four-Dimensional Reanalysis Using Only Surface Pressure Data
Over the past several years, Dr. Compo, Dr. Whitaker, and Dr. Sardeshmukh have developed a unique capability to produce high-quality six-hourly reanalyses for the troposphere from surface pressure observations alone using a data assimilation system based on the Ensemble Kalman Filter. Before the 20th Century Reanalysis Project began, they conducted a series of pilot reanalyses to establish the feasibility of producing a reanalysis dataset from the 1890s—before observational data for the upper atmosphere were available—to the present.
In one study, they chose three data assimilation systems to make their assessment. One system was a three-dimensional variational data assimilation (3DVAR) scheme very similar to that used for the NCEP-National Center for Atmospheric Research (NCAR) reanalysis, which allowed them to test a system that had been extensively used and studied. This system was modified for surface pressure only and was referred to as CDAS-SFC in the study. The second system was the Ensemble Kalman Filter (EnsFilt), representing the potential for advanced data assimilation systems to improve upon older 3DVAR systems. As a baseline measure, they used a third system—a climatologically based statistical interpolation scheme (EnsClim) with no dynamical model to advance information to the next analysis time step. This baseline enabled them to quantify the importance of propagating information with a dynamical model.
The researchers used modern data but reduced the observational network to resemble historical networks from four representative five-year periods, centered on 1895, 1905, 1915, and 1935. A representative example of a 500 hPa geopotential (that is, gravity-adjusted) height analysis using the simulated 1895 network at 0000 UTC (coordinated universal time) on 20 December 2001 is shown in figure 3 (p53). Analyses produced with the three data assimilation systems using only 308 surface pressure observations were compared with the full NCEP-NCAR reanalysis, which used all available observations at all levels (more than 150,000). In this example, the EnsClim analysis depicts many of the large-scale barotropic features present at this time, including a substantial block over the North Atlantic and deep troughs over Europe and the North Pacific, but misses the smaller synoptic-scale features. In contrast, the CDAS-SFC analysis has many small-scale features, but they are positioned incorrectly, resulting in an error comparable to that of EnsClim. The Ensemble Filter was able to represent not only the large-scale features but also many of the synoptic-scale features, and had a smaller overall error both for this case and throughout the month tested.
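For reference, the rms differences quoted above are latitude-weighted root-mean-square differences between gridded height analyses. The short Python sketch below shows that calculation; the grid spacing and the synthetic fields standing in for the real analyses are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
lats = np.linspace(-88.75, 88.75, 72)            # a nominal 2.5-degree global grid
lons = np.arange(0.0, 360.0, 2.5)

# Synthetic 500 hPa height fields (meters): a "truth" (the full reanalysis) and
# an experimental analysis that differs from it by random errors.
full_reanalysis = 5500.0 + 50.0 * rng.standard_normal((lats.size, lons.size))
experiment = full_reanalysis + 95.0 * rng.standard_normal((lats.size, lons.size))

# Weight each gridpoint by cos(latitude), since grid boxes shrink toward the poles.
weights = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], full_reanalysis.shape)
rms = np.sqrt(np.average((experiment - full_reanalysis) ** 2, weights=weights))
print(f"area-weighted rms difference: {rms:.1f} m")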
To demonstrate the influence of a single surface pressure observation on the resulting analysis in the Ensemble Filter and CDAS-SFC systems, the researchers conducted four single-observation assimilation experiments. Two "observations" were assimilated separately by each system. Each observation was prescribed to have a value 1 hPa larger than the surface pressure forecast for 0600 UTC 25 December 2001 from the previous assimilation using a 1905 network. The results of the separate experiments are plotted together in figure 4. The filled contours show the first-guess geopotential height field at 1,000 hPa (bottom) and 300 hPa (top) from CDAS-SFC (left) and the Ensemble Filter (right). The line contours show the analysis increment, the difference between the analysis after assimilating the indicated observation and the first-guess field.
For the CDAS-SFC system, the analysis increments for the two experiments are identical—each is centered on the observation location, is largest at the surface, and decreases with height. In contrast, the right panels illustrate the ability of the Ensemble Filter to create spatially inhomogeneous background-error covariances that change with the flow and the observational density. The analysis increments produced by the two observations are quite different, reflecting the larger expected uncertainty in the observation-poor central Pacific and the smaller uncertainty in the observation-rich continental North America; the larger uncertainty translates into a larger analysis increment. Over the mid-Pacific, the uncertainty in the first guess (the background-error covariance) turns an observation 1 hPa above the background into a general weakening of the nearby trough at 1,000 hPa directly to the east of the observation, and an even greater weakening of the upper-level trough at 300 hPa to the southeast of the observation. In this case, a single surface pressure observation produces an analysis increment that is tilted with height and has maximum amplitude in the upper troposphere. The Ensemble Filter also changes the sign of the increment to the northwest of the observation. Even over the interior of the continent, the effect of the single observation, though smaller, still has maximum amplitude in the upper troposphere and is tilted with height. This example illustrates how the Ensemble Kalman Filter can reconstruct three-dimensional conditions from two-dimensional data.
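In the generic notation of ensemble data assimilation (this notation is not taken from the article itself), the increment produced by a single observation can be written as

\[
\Delta \mathbf{x} = \mathbf{K}\,(y^{o} - H\mathbf{x}^{b}),
\qquad
\mathbf{K} = \frac{\mathrm{cov}(\mathbf{x}^{b},\, H\mathbf{x}^{b})}{\mathrm{var}(H\mathbf{x}^{b}) + r},
\]

where x^b is the first-guess state, Hx^b is the first-guess surface pressure at the observation point, y^o is the observed value, and r is the observation-error variance. Because the covariances in K are estimated from the 56-member ensemble, a surface pressure innovation updates the heights at 300 hPa wherever the ensemble indicates the two quantities vary together, which is why the increment can tilt with height and differ between the data-rich continent and the data-poor Pacific. In CDAS-SFC, by contrast, the equivalent covariances are fixed in advance, so every observation produces the same surface-peaked increment.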
Figure 5 (p55) illustrates the degree to which the principal mid-tropospheric features of an extreme event, the famous post-Christmas snowstorm of December 1947, are present in a reanalysis map produced using only surface pressure observations and the Ensemble Kalman Filter (left panel). The reanalysis features are compared with those seen in maps that used all available surface, radiosonde, and other upper-tropospheric observations: a hand-drawn real-time map from the Air Weather Service (middle panel) and a map from a reanalysis using the full NCEP assimilation system (right panel). It is remarkable that the Ensemble Filter reanalysis, using only the surface pressure observations and a lower-resolution model, is able to replicate many of the features seen in the hand-drawn mid-tropospheric analysis produced at the time, arguably better than the higher-resolution NCEP system. Most likely, this advantage arises from the Kalman gain, which adjusts for both the meteorological conditions and the observational network. The NCEP system used in the right-hand panel does not have this flexibility and could probably be improved by altering it to account for the sparse observations.
Figure 5. The 500 hPa geopotential height analysis for 0600 GMT 27 December 1947 from the Ensemble Filter data assimilation system using only surface pressure observations (left) and from the experimental NCEP T254 analysis using all available surface and upper-air observations (right). An Air Weather Service map drawn in near-real time is also shown (middle). Colored arrows point to the same features in all three maps. Note that the middle and right panels used all available surface, radiosonde, and other upper-air observations.
Figure 6 provides additional evidence that the upper-tropospheric circulation fields produced by the Ensemble Kalman Filter will reflect actual atmospheric variations. The black dots show newly digitized daily-averaged radiosonde observations of 500 hPa (approximately 5,500-meter altitude) temperature at Ilmala, Finland (62.2°N, 24.92°E) for the period November 1943 to October 1944. The variability produced by the reanalysis (red stars) at this high-latitude location appears consistent with the direct measurements even on a case-by-case basis, suggesting that the Ensemble Kalman Filter will be able to reconstruct the upper-air variability of both weather and climate throughout the 20th century.
Figure 6. Daily averaged temperatures at 500 hPa (approximately 5,500 meter altitude) from radiosonde measurements taken at Ilmala, Finland (black dots) from 1943-1944 compared with estimates from the Ensemble Filter using only surface pressure observations (red stars). The Ensemble Filter estimates are able to represent most of the variability seen in the observations throughout the year with a correlation coefficient of 0.94.
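The 0.94 quoted in the caption is the standard Pearson correlation between the two daily time series. A minimal Python sketch of such a comparison, with synthetic series standing in for the Ilmala radiosonde record and the reanalysis estimates, might look like this:

import numpy as np

rng = np.random.default_rng(2)
days = np.arange(365)

# Synthetic daily 500 hPa temperatures (deg C): a seasonal cycle plus slowly
# varying synoptic anomalies shared by both series, with independent errors.
seasonal = -25.0 + 10.0 * np.cos(2.0 * np.pi * (days - 200) / 365.25)
synoptic = np.cumsum(rng.standard_normal(days.size)) * 0.8

radiosonde = seasonal + synoptic + rng.normal(0.0, 1.0, days.size)   # station observations
reanalysis = seasonal + synoptic + rng.normal(0.0, 1.5, days.size)   # Ensemble Filter estimates

r = np.corrcoef(radiosonde, reanalysis)[0, 1]
print(f"correlation coefficient: {r:.2f}")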
Overall, the feasibility studies of Dr. Compo and his collaborators suggest that the extratropical, upper-tropospheric Northern Hemispheric height errors obtained from Ensemble Kalman Filter-based analyses will be comparable to current two- to three-day weather prediction forecast errors.

Filling In and Correcting the Historical Record
With the 2007 INCITE allocation run on NERSC's Bassi and Jacquard systems, the researchers reconstructed weather maps for the years 1918 to 1949. In 2008, they plan to extend the dataset back to 1892 and forward to 2007, spanning the 20th century. In the future, they hope to run the model at higher resolution on more powerful computers, and perhaps extend the global dataset back to 1850.
Figure 7. A dust storm approaching Stratford, TX on April 18, 1935. The 20th Century Reanalysis Project will provide missing information about the conditions in which early-century extreme climate events occurred, such as the prolonged drought that led to the Dust Bowl of the 1930s.
One of the first results of the INCITE award is that more historical data are being made available to the international research community. This project will provide climate modelers with surface pressure observations never before released from Australia, Canada, Croatia, the United States, Hong Kong, Italy, Spain, and 11 West African nations. When the researchers see gaps in the data, they contact the country's weather service for more information, and the prospect of contributing to a global database has motivated some countries to increase the quality and quantity of their observational data.
The team also aims to reduce inconsistencies in the atmospheric climate record, which stem from differences in how and where atmospheric conditions are observed. Until the 1940s, for example, weather and climate observations were mainly taken from the Earth's surface. Later, weather balloons were added. Since the 1970s, extensive satellite observations have become the norm. Discrepancies in data resulting from these different observing platforms have caused otherwise similar climate datasets to perform poorly in determining the variability of storm tracks or of tropical and Antarctic climate trends. In some cases, flawed datasets have produced spurious long-term trends.
The new 3D atmospheric dataset will provide missing information about the conditions in which early-century extreme climate events occurred, such as the Dust Bowl of the 1930s (figure 7) and the arctic warming of the 1920s to 1940s. It will also help to explain climate variations that may have misinformed early-century policy decisions, such as the prolonged wet period in central North America that led to overestimates of expected future precipitation and over-allocation of water resources in the Colorado River basin.
But the most important use of weather data from the past will be the validation of climate model simulations and projections into the future. "This dataset will provide an important validation check on the climate models being used to make 21st century climate projections in the recently released Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC)," Dr. Compo said. "Our dataset will also help improve the climate models that will contribute to the IPCC's Fifth Assessment Report."
Contributors: Dr. Gil Compo, CIRES and the University of Colorado-Boulder; Dr. Jeff Whitaker, NOAA Earth System Research Lab; Dr. Prashant Sardeshmukh, CIRES and the University of Colorado-Boulder
Further Reading

G. P. Compo, J. S. Whitaker, and P. D. Sardeshmukh. 2006. Feasibility of a 100-year reanalysis using only surface pressure data. Bull. Am. Meteor. Soc., 87 (2): 175-190.

L. Bengtsson et al. 2007. The need for a dynamical climate reanalysis. Bull. Am. Meteor. Soc., 88 (4): 495-501.