(16 November 1995)



Since the atmosphere is a chaotic dynamical system, any small error in the initial condition will lead to growing errors in the forecast, eventually leading to a total loss of any predictive information. This would be true even if our models were perfect (Lorenz, 1969). The rate of this error growth, and hence the lead time at which predictability is lost, depend on factors such as the circulation regime, season, and geographical domain. It is possible to obtain information on the inherent predictability of a given case by running the model from a number of initial conditions that lie within the estimated cloud of uncertainty that surrounds the control analysis (which is our best estimate of the true state of the atmosphere).

In the extratropics numerical weather prediction models are good enough that, to a first approximation, forecast error growth can be attributed solely to the process of growing instabilities in a chaotic system that result from initial condition uncertainties. In our current ensemble approach, therefore, we assume that our models are "perfect" and introduce perturbations only to the analysis (rather than, or in addition to, perturbations in model physics, for example).

The above-described role of chaos is evident on all spatial and temporal scales in the atmosphere and ocean. So, ideally, all forecasts should be made from an ensemble of initial conditions. In fact, in any nonlinear dynamical system this approach offers the best possible forecast with the maximum information content. Averaging the ensemble members provides, in a statistical sense, a forecast more reliable than any of the single forecasts, including that started from the control analysis (Leith, 1974). Additionally, from the spread of the ensemble we can assess the reliability of the predictions and, for a sufficiently large number of realizations, any forecast quantity can be expressed in terms of probabilities. These probabilities convey all the information available regarding future weather. Note that each individual model run is deterministic, i.e., uniquely defined by the initial conditions, but collectively the ensemble of forecasts from the set of slightly different analyses portrays the chaotic nature of the atmosphere. Since a priori any single ensemble member is no more or less likely than any other, forecasting can be viewed as deterministic only to the extent that the envelope of solutions is sufficiently small that the differences between forecasts are inconsequential to the user. Otherwise, the variety of solutions and the implied uncertainty reflect the more general stochastic nature of the forecast process, with the range and distribution of possible outcomes providing information on the relative likelihood of various scenarios.
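The statistical advantage of averaging quoted from Leith (1974) can be illustrated with a toy calculation. This is a sketch under the assumption of independent, unbiased member errors about a known "truth"; it is not a property of any real forecast model:

```python
import random
import statistics

# Toy illustration: with independent, unbiased member errors (an assumed
# idealization), the ensemble mean is on average more accurate than any
# single member.
random.seed(0)
n_members, n_cases = 10, 2000
single_err, ensmean_err = 0.0, 0.0
for _ in range(n_cases):
    # "Truth" is 0; each member is truth plus an independent error.
    members = [random.gauss(0.0, 1.0) for _ in range(n_members)]
    single_err += abs(members[0]) / n_cases
    ensmean_err += abs(statistics.fmean(members)) / n_cases

# For uncorrelated errors, the ensemble-mean error is roughly
# 1/sqrt(n_members) of a single member's error.
print(single_err, ensmean_err)
```

In practice member errors are neither independent nor unbiased, so the gain is smaller, but the averaging still filters the components on which the members disagree.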

At NCEP the ensemble approach has been applied operationally for the medium- and extended-range (Tracton and Kalnay, 1993; Toth and Kalnay, 1993), using the Environmental Modeling Center's (EMC) Medium-Range Forecast Model (MRF), which has a global domain. This document contains information regarding the operational global ensemble forecasts. However, short-range ensemble forecasts are also created at NCEP on an experimental basis, using the Eta and Regional Spectral Models (Brooks et al., 1995; Hamill and Colucci, 1996). Planning is also underway to run the coupled ocean-atmosphere model of EMC in an ensemble mode (Toth and Kalnay, 1995).



There will always be a trade-off between the resolution at which the forecasts are made and the number of ensemble forecast members, due to limited computational resources. Since the impact of using a higher resolution model is not detectable with traditional skill scores beyond a few days (Tracton and Kalnay, 1993), at NCEP we truncate the resolution of the nominal MRF and AVN runs (T126 truncation, ~100 km) to T62 (~200 km) at lead times of 7 and 3 days at 00Z and 12Z, respectively. At 00Z there is also a "control" run made entirely at T62 resolution. In addition to this control forecast, 10 forecasts with T62 resolution are run from 00Z starting from slightly perturbed initial conditions. At 12Z four additional forecasts are generated from perturbed initial analyses. Hence, there is a total of 17 individual global predictions generated daily. All forecasts are run to 16 days with the latest version of the EMC MRF global model (Kanamitsu et al., 1993). Evaluations indicate that for daily forecasts over the first week or so, the set of 17 ensemble members is sufficient. For the currently operational 6-10 day mean outlooks (and future "week2" forecasts), additional information is added by including the runs from up to 48 hours prior to "today's" 00Z set in the time-averaged (or lagged-average) sense (Hoffman and Kalnay, 198*). In this application the total number of forecasts (ensemble size) equals 46.
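The bookkeeping behind the member counts above can be spelled out explicitly (a sketch of the arithmetic only, not operational code):

```python
# Counting the operational ensemble as described above.
per_00z = 1 + 1 + 10   # nominal MRF run, T62 control, 10 perturbed T62 runs
per_12z = 1 + 4        # nominal AVN run, 4 perturbed runs
daily_total = per_00z + per_12z
print(daily_total)     # 17 global predictions per day

# Lagged-average ensemble for the 6-10 day outlooks: today's 00Z set plus
# every run from the four preceding cycles (12Z and 00Z of the previous
# two days, i.e., up to 48 hours before today's 00Z set).
lagged_total = per_00z + 2 * (per_00z + per_12z)
print(lagged_total)    # ensemble size of 46
```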


There has been considerable effort directed at the question of optimal initial perturbations. There are two major considerations. The first is to estimate the analysis error in the probabilistic sense, and the second is to create an adequate sampling of perturbations given this statistical estimate of initial uncertainty. Since not all errors in the analysis are likely to grow, adequate here encompasses not just the question of representativeness. It includes also the notion of economy in identifying only those initial uncertainties that result in rapidly diverging solutions (e.g., errors in analysis of a baroclinic zone versus those in a broad ridge).

If the analysis error had a white noise distribution (i.e., all possible analysis errors occurred with the same probability), the best sampling strategy would be the use of the singular vectors (SVs) of the linear tangent version of the nonlinear model (Buizza and Palmer, 199*; Ehrendorfer and Tribbia, 1995). This is because the leading SVs span those directions in the phase space that are capable of maximum error growth. If we miss those directions, "truth" could lie outside the ensemble envelope.

The analysis error distribution, however, is far from being white noise (Kalnay and Toth, 1994): Consider the analysis/forecast cycle of the data assimilation system as running a nonlinear perturbation model. The error in the first guess (short-range forecast) is the perturbation, which is periodically "rescaled" at each analysis time by blending observations with the guess. Since observations are generally sparse, they cannot eliminate all errors from the short-range forecast that is subsequently used as the first guess for the next analysis. Obviously, any error that grew in the previous short-range forecast will have a larger chance of remaining (at least partially) in the latest analysis than errors that had decayed. These growing errors will then start amplifying quickly again in the next short-range forecast.

It follows that the analysis contains fast growing errors that are dynamically created by the repetitive use of the model to create the first guess fields. This is what we refer to as the "breeding cycle" or Breeding of Growing Modes (BGM). These fast growing errors are above and beyond the traditionally recognized random errors that result from errors in observations. Those errors generally do not grow rapidly since they are not organized dynamically. It turns out that the growing errors in the analysis are related to the local Lyapunov vectors of the atmosphere (which are mathematical phase space directions that can grow fastest in a sustainable manner). Indeed, these vectors are what is estimated by the breeding method (Toth and Kalnay, 1993, 1995).

At NCEP we use 7 independent breeding cycles to generate the 14 initial ensemble perturbations. The initiation of each breeding cycle begins with an analysis/forecast cycle which differs from the others only in the initially prescribed random distribution ("seed") of analysis errors. These initially random perturbations are added to and subtracted from the control analysis, so that each breeding cycle generates a pair of perturbed analyses (14 in all). From this point on each breeding cycle evolves independently to produce its own set of perturbations. The perturbations are just the differences between the short-term (24-hour) forecast initiated from the last perturbed analysis and the "control" analysis, rescaled to the magnitude of the seed perturbation. Since these short-term forecasts are just the early part of the extended range ensemble predictions, generation of the perturbations is basically cost free with respect to the analysis system (unlike the singular vector approach of ECMWF). The cycling of the perturbations continues, and within a few days the perturbations reach their maximum growth.
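The run-rescale-rerun cycle just described can be sketched with a one-dimensional chaotic map standing in for the forecast model. Everything here (the logistic map, the step counts, the amplitude) is an illustrative assumption, not the NCEP implementation, but the mechanism is the same: perturbations that project on growing directions amplify from cycle to cycle.

```python
import statistics

# Toy breeding cycle: the logistic map stands in for the forecast model.
def model(x, steps):
    for _ in range(steps):
        x = 3.9 * x * (1.0 - x)   # chaotic regime of the logistic map
    return x

amplitude = 1e-4                  # prescribed size of the seed perturbation
control = 0.4                     # stands in for the control analysis
perturbed = control + amplitude   # control plus the random "seed"

growth = []
for _ in range(20):               # one pass ~ one analysis/forecast cycle
    f_ctrl = model(control, steps=5)     # "24-hour" control forecast
    f_pert = model(perturbed, steps=5)   # perturbed forecast
    diff = f_pert - f_ctrl               # the bred perturbation
    growth.append(abs(diff) / amplitude)
    # Rescale the bred perturbation back to the seed amplitude and add it
    # to the new control state for the next cycle.
    control = f_ctrl
    perturbed = f_ctrl + (amplitude if diff >= 0 else -amplitude)

# In a chaotic system the bred perturbation grows, on average, each cycle.
print(statistics.fmean(growth))
```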

Note that once the initial perturbations are introduced, the perturbation patterns evolve freely in the breeding cycle except that their size is kept within a certain amplitude range. Also note the similarity in the manner errors grow in the analysis vs. breeding cycles. The only difference is that in the breeding cycle, the stochastic elements introduced into the analysis through the use of observations containing random noise are eliminated by the use of deterministic rescaling. The seven quasi-orthogonal bred vectors from the breeding cycles span a subspace of the atmospheric attractor that represents the highest sustainable growth in the modeled atmosphere, at the given perturbation amplitude.

The breeding method has one free parameter, which is perturbation amplitude. We use a perturbation amplitude which is on average on the order of 12% of the total climatological rms variance (~10 m at 500 hPa height). The sensitivity to the choice of this amplitude (for example as a function of season) is under investigation. Regarding the spatial distribution of estimated analysis errors, we use a geographical mask (Toth and Kalnay, 1995) to which perturbations are rescaled every day. As a result, in data void regions such as the ocean basins the perturbations are three times or so larger than over data rich continents.
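The masked rescaling can be sketched as follows. The 3:1 ocean-to-land ratio and the ~10 m base amplitude follow the text above; the region names and the one-number "perturbation" are illustrative assumptions, not the operational mask:

```python
# Sketch of rescaling a bred perturbation to a geographically masked
# target amplitude (illustrative values, not the operational mask).
BASE_AMPLITUDE = 10.0                                 # ~10 m at 500 hPa
MASK = {"data_rich_land": 1.0, "data_void_ocean": 3.0}  # relative factors

def rescale(perturbation, region):
    """Scale a raw bred perturbation to the masked target amplitude."""
    target = BASE_AMPLITUDE * MASK[region]
    return perturbation * (target / abs(perturbation))

over_ocean = rescale(42.0, "data_void_ocean")   # target amplitude 30 m
over_land = rescale(-5.0, "data_rich_land")     # target amplitude 10 m
print(over_ocean, over_land)
```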

Finally, keep in mind that there is no guarantee that the above methodology "finds" all the possible growing modes or, equivalently, the ensemble will reliably encompass all possible outcomes in every situation: we cannot run enough perturbed forecasts (with, for example, different initial perturbation sizes) to populate the whole forecast distribution all the time. Moreover, remember that the forecast model is not "perfect", and model error, as well as initial condition uncertainty, will contribute to the distribution of predictions within the ensemble (especially systematic errors which may drive all the solutions in the same - wrong - direction). Overall, however, verifications indicate that the ensemble system as now constructed does provide enhanced skill through ensemble averaging and usefully reliable probability estimates.


One of the most challenging aspects of ensemble prediction is condensing the vast amounts of model output and information into an operationally relevant and useful form. One could, of course, display each of the standard maps and products for each individual forecast, but this very quickly becomes extremely cumbersome and difficult to digest and comprehend. Hence, we have invested considerable effort to convey and display the essential information from ensembles as compactly as possible. The following briefly describes the nature and use of the products available currently or under development. Keep in mind that ensemble prediction, while the acknowledged wave of the future in operational NWP, is rather new. And, especially with regard to existing or future operational products, we encourage feedback - we're on the learning curve together!

1) Ensemble mean:

The ensemble mean as now constructed is the weighted average of "today's" set of 12 00Z runs and the 5 predictions from the previous 12Z cycle (all verifying at the same time). The weighting, based on a continually updated several-week "training" period, allows for a somewhat greater influence of the more skillful high resolution MRF (in the first few days only) and a somewhat lesser influence of the 12 hour old forecasts. In theory and practice (as demonstrated by verification statistics), the ensemble mean on average is more skillful than any individual ensemble member - more so in winter, least (if at all) in summer, and somewhere in between during the transition seasons. So, given nothing else, the ensemble mean is the way to go. The ensemble mean will usually be "smoother" in appearance than any of the individual forecasts because the averaging filters the "unpredictable components", where unpredictable here means inconsistencies amongst ensemble members. Conceptually, considering anything with more detail than contained in the ensemble mean overspecifies the inherent predictability; note, however, that if most of the ensemble members are similar in the amplitude and phase of even smaller-scale features, these will be retained in the ensemble mean.
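At a single grid point the weighted mean reduces to a few lines. The member values and weights below are made-up illustrations; the operational weights come from the training period described above:

```python
def weighted_ensemble_mean(values, weights):
    """Weighted average of ensemble members at one grid point."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical 500 mb heights (m): the high-resolution MRF first (given a
# somewhat larger weight in the early days), then other members, with a
# 12-hour-old run down-weighted.  All numbers are illustrative.
values = [5530.0, 5524.0, 5519.0, 5536.0]
weights = [1.5, 1.0, 1.0, 0.8]
print(weighted_ensemble_mean(values, weights))
```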

2) Ensemble Spread:

The ensemble mean is just the first order advantage of ensemble prediction. Its more significant use is in providing information on uncertainties and/or confidence. The most basic product addressing this is the ensemble spread, which here is simply the standard deviation of the ensemble members about the ensemble mean. It reflects the overall degree of variability amongst the ensemble members - the larger values indicating areas of substantial disagreement and hence less confidence in any individual prediction (or ensemble mean), and vice versa. The maps of spread thus provide an evolving measure of the relative confidence geographically and with respect to individual weather systems.
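At one grid point the spread is just the standard deviation of the members (the member values below are hypothetical):

```python
import statistics

# Minimal sketch of the spread product at a single grid point.
members = [5520.0, 5534.0, 5541.0, 5518.0, 5529.0]  # 500 mb heights (m)
ens_mean = statistics.fmean(members)
spread = statistics.pstdev(members)  # std deviation about the ensemble mean
print(ens_mean, spread)
```

Mapped over all grid points, large values of `spread` mark regions where the members disagree most, i.e., where confidence is lowest.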

3) Clustering:

Clustering here refers to grouping together ensemble members that are similar in some respect. The approach used here is based on simple correlation analysis. First, the two predictions least similar (smallest anomaly correlation, < 0.6) are determined. Ensemble members similar (AC > 0.6) to each of these extremes (if any) are found, and the cluster mean for each formed. Unless all forecasts are similar to one another, there will always be at least two clusters (C1, C2), possibly consisting of only one forecast each, which correspond to the range of solutions sampled by the ensemble. Second, of the remaining forecasts, the two most similar are found, and members similar to them grouped (averaged) to form the next cluster. The process iterates until there is no longer any set of at least two forecasts that are similar (a maximum of 6 clusters is allowed). The cluster means effectively reduce the number of degrees of freedom relative to considering the complete set of individual forecasts, but not as much as the full ensemble mean (except if all the forecasts are alike). Ideally, and for the most part as indicated by verification scores, the more populated a cluster, the more skillful the cluster mean.
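The first step of this scheme can be sketched in a few lines. The tiny four-point "fields", the helper names, and the omission of the later iterations are all simplifications for illustration:

```python
import math
from itertools import combinations

def anomaly_correlation(a, b):
    """Centered pattern correlation between two forecast fields."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

def first_two_clusters(forecasts, threshold=0.6):
    # Step 1 described above: find the two least-similar members, then
    # attach to each any member correlated above the threshold.  (The
    # operational scheme then iterates on the remaining members to form
    # further clusters; that part is omitted in this sketch.)
    i0, j0 = min(combinations(range(len(forecasts)), 2),
                 key=lambda ij: anomaly_correlation(forecasts[ij[0]],
                                                    forecasts[ij[1]]))
    c1 = [i0] + [k for k in range(len(forecasts)) if k not in (i0, j0)
                 and anomaly_correlation(forecasts[k], forecasts[i0]) > threshold]
    c2 = [j0] + [k for k in range(len(forecasts)) if k not in (i0, j0)
                 and k not in c1
                 and anomaly_correlation(forecasts[k], forecasts[j0]) > threshold]
    return c1, c2

forecasts = [[1.0, 2.0, 3.0, 4.0],   # three tiny "height fields"
             [1.1, 2.1, 2.9, 4.2],   # similar to the first
             [4.0, 3.0, 2.0, 1.0]]   # the opposite pattern
print(first_two_clusters(forecasts))
```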

Currently, the only two fields clustered are the 1000 and 500 mb height fields. Clustering is relative to the similarities amongst forecasts averaged over North America and its environs. The specifics of the domain are given in the detailed documentation. In the future we expect to perform the clustering for smaller subregions and for other fields (ultimately, we hope, to be selected interactively by the forecaster). Also, at present the clustering is done independently by level and time, so that the cluster membership at a given time may be different for 1000 and 500 mb, and for either level different from one forecast time to the next. An alternative approach (available in the near future) is to force the cluster membership for all times and for each level to that determined for 500mb at day 5. While this will allow tracing the evolution of each cluster over time with assurance that membership is unchanging, it may or may not be a better approach; forecasts similar at, for example, day 3 may quite naturally diverge and by day 6 be similar to members with which they had little in common earlier. (Clustering relative to mean conditions, e.g., days 3-5, has not proved satisfactory at ECMWF.)

4) "Spaghetti" diagrams:

These are simply composite charts of a selected contour (e.g., the 5400 m height contour) from each ensemble member plotted on the same chart. The obvious purpose is to convey the information content of EACH ensemble member in a sufficiently compact form to enable ready visualization and interpretation. These charts show explicitly the evolution from almost identical solutions initially to virtual "spaghetti" after some time. In the process they provide information on the relative predictability as a function of forecast lead time and space (high where and when solutions are close, and vice versa). Note too this is also a form of "graphical clustering" in that one can visually weigh the non-uniform distribution of solutions (if any) and thereby judge the relative likelihood of specific outcomes in terms of the number of forecasts pointing in that direction.

In addition to contour plots for 500 mb height, charts for various other parameters (at varying intervals) are available, some for very specific usage. For example, the spaghetti diagrams for 1000-500mb thickness and for 850mb temperature are intended primarily for assessing the uncertainty in predicting the boundary between frozen and non-frozen precipitation. The contour plots for SLP relate to the position and (w.r.t. choice of contour value) the intensity of high and low pressure systems. An adjunct to these displays is a set of charts depicting just the positions of the "L" and "H" centers, from which one can follow over time the extent of disagreement in the occurrence and tracks of developing systems (e.g., east coast storms). Isotach composite charts convey information about jet systems, and relative humidity diagrams about the potential for precipitation. Plots with actual isohyets (QPF) relate more directly to this problem and will be added as soon as possible.

The above list is clearly not exhaustive nor possibly designed and presented in an optimum way; again, feedback from users is most welcome!

5) Probability forecasts:

The products described thus far lend themselves primarily to qualitative statements about the relative likelihood of different outcomes. An integral aspect of ensemble forecasting, though, is that it also provides quantitative estimates of probabilities. Probability estimates here are defined simply as the percentage of predictions out of the total (17) that satisfy the specified criterion. Probability charts for the 500mb height exceeding, for example, 5580m define quantitatively the probability envelope about the corresponding spaghetti diagram. Beyond the envelope the interpretation is that there is near certainty that the actual height will be either less or greater than 5580m. The varying width of the envelope over time and space conveys quantitatively the degree of uncertainty. The same interpretation applies to the spaghetti plots for thickness and 850 temperature less than 5400m and 0-deg C, respectively, with regard to the zone of uncertainty in the rain versus snow problem (these, as with many criteria, are somewhat arbitrary or skewed toward our east coast bias - but a sufficient amount of yelling and screaming from elsewhere on this choice will be heard!).
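The probability definition above is a simple counting exercise at each grid point. The 17 member heights below are hypothetical, chosen only to show the calculation:

```python
# Sketch of the probability product: the percentage of members that
# satisfy a criterion, e.g. 500 mb height exceeding 5580 m at one point.
def ensemble_probability(members, criterion):
    """Probability (%) = fraction of members meeting the criterion."""
    hits = sum(1 for m in members if criterion(m))
    return 100.0 * hits / len(members)

heights = [5560.0, 5575.0, 5583.0, 5590.0, 5601.0, 5588.0, 5570.0,
           5595.0, 5579.0, 5586.0, 5592.0, 5565.0, 5584.0, 5599.0,
           5572.0, 5587.0, 5591.0]          # 17 hypothetical members
p = ensemble_probability(heights, lambda h: h > 5580.0)
print(round(p, 1))                          # 11 of 17 members exceed 5580 m
```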

Other probability products available include that for the thickness anomaly greater and less than 60m (about 3-deg C in layer-mean temp) as a guide for appreciating the confidence in predicted anomalously cold and warm regions. The probability of 700mb RH > 70% can be considered a proxy for precipitation chances, which will be added explicitly in the near future. The predicted odds for the 12hr 500 height tendency > 30m chart is an indicator of the degree of consistency amongst the members in depicting short-wave activity; consistency in treatment of the smaller-scale systems (because they are intrinsically less predictable) is a signal to take the implications seriously. In principle probability information on any direct or derived model parameter can be output - so let your requests come forward (how about vertical stability or indices such as the PNA?). Another form of probability chart, currently being developed, is to express the confidence in percentages of, for example, the 850mb temperature anomaly being within 5-deg of the ensemble mean value (or any other base for comparison).

A final few words on ensemble derived probabilities. Many forecasters have come to know and use (and love?) probability statements for precipitation and temperature. These generally are the direct or somewhat modified POP's and POT's generated statistically by TDL via the MOS (or similar) approach. They basically describe the probability distributions given the parameters (or, more generally, the synoptic features) from a single (i.e., deterministic) prediction (NGM or MRF). The actual uncertainty consists of two components: that associated with the non-unique distribution of precipitation or temperature given a particular synoptic scenario AND that intrinsic to there being an array of alternative scenarios. MOS now accounts only for the first component, while the probabilities derived from the direct model output (as described above) from ensembles include both. Of course, one could derive and combine precipitation and temperature probability distributions from each ensemble member, and TDL is actively pursuing this approach.