# ENSEMBLE FORECASTING AT NCEP

## INTRODUCTION

Since the atmosphere is a chaotic dynamical system, any small error in the initial condition will lead to growing errors in the forecast, eventually leading to a total loss of any predictive information. This would be so even if our models were perfect (Lorenz, 1969). The rate of this error growth, and hence the lead time at which predictability is lost, depends on factors such as the circulation regime, season, and geographical domain. It is possible to obtain information on the inherent predictability of a given case by running the model from a number of initial conditions that lie within the estimated cloud of uncertainty surrounding the control analysis (which is our best estimate of the true state of the atmosphere).

In the extratropics, numerical weather prediction models are good enough that, to a first approximation, forecast error growth can be attributed solely to growing instabilities in a chaotic system that result from initial condition uncertainties. In our current ensemble approach, therefore, we assume that our models are "perfect" and introduce perturbations only to the analysis (rather than, or in addition to, perturbations in model physics, for example).

The above-described role of chaos is evident on all spatial and temporal scales in the atmosphere and ocean. So, ideally, all forecasts should be made from an ensemble of initial conditions. In fact, in any nonlinear dynamical system this approach offers the best possible forecast with the maximum information content. Averaging the ensemble members provides, in a statistical sense, a forecast more reliable than any of the single forecasts, including the one started from the control analysis (Leith, 1974). Additionally, from the spread of the ensemble we can assess the reliability of the predictions and, for a sufficiently large number of realizations, any forecast quantity can be expressed in terms of probabilities. These probabilities convey all the information available regarding future weather. Note that each individual model run is deterministic, i.e., uniquely defined by the initial conditions, but collectively the ensemble of forecasts from the set of slightly different analyses portrays the chaotic nature of the atmosphere. Since a priori any single ensemble member is no more or less likely than any other, forecasting can be viewed as deterministic only to the extent that the envelope of solutions is sufficiently small that the differences between forecasts are inconsequential to the user. Otherwise, the variety of solutions and the implied uncertainty reflect the more general stochastic nature of the forecast process, with the range and distribution of possible outcomes providing information on the relative likelihood of various scenarios.
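The two basic ensemble quantities described above - the ensemble mean and a threshold probability - can be sketched in a few lines. This is a toy illustration with made-up heights for a single grid point, not an operational product:

```python
# Hypothetical 500 hPa heights (m) at one grid point from a small ensemble
members = [5510.0, 5535.0, 5560.0, 5580.0, 5605.0]

# Ensemble mean: statistically more reliable than any single member (Leith, 1974)
ens_mean = sum(members) / len(members)

# Probability of exceeding a threshold = fraction of members that exceed it
threshold = 5550.0
prob_exceed = sum(1 for m in members if m > threshold) / len(members)

print(ens_mean)      # 5558.0
print(prob_exceed)   # 0.6
```

With a larger set of realizations, the same counting operation applied at every grid point and threshold yields full probability charts.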

At NCEP the ensemble approach has been applied operationally for the medium and extended ranges (Tracton and Kalnay, 1993; Toth and Kalnay, 1993), using the Environmental Modeling Center's (EMC) Medium-Range Forecast Model (MRF), which has a global domain. This document contains information regarding the operational global ensemble forecasts. However, short-range ensemble forecasts are also created at NCEP on an experimental basis, using the ETA and Regional Spectral Models (Brooks et al., 1995; Hamill and Colucci, 1996). Planning is also underway to run the coupled ocean-atmosphere model of EMC in an ensemble mode (Toth and Kalnay, 1995).

## THE ENSEMBLE FORECAST SYSTEM

### INITIAL PERTURBATIONS

There has been considerable effort directed at the question of optimal initial perturbations. There are two major considerations. The first is to estimate the analysis error in the probabilistic sense, and the second is to create an adequate sampling of perturbations given this statistical estimate of initial uncertainty. Since not all errors in the analysis are likely to grow, "adequate" here encompasses not just the question of representativeness; it also includes the notion of economy in identifying only those initial uncertainties that result in rapidly diverging solutions (e.g., errors in analysis of a baroclinic zone versus those in a broad ridge).

If the analysis error had a white noise distribution (i.e., all possible analysis errors occurred with the same probability), the best sampling strategy would be the use of the singular vectors (SVs) of the linear tangent version of the nonlinear model (Buizza and Palmer, 199*; Ehrendorfer and Tribbia, 1995). This is because the leading SVs span those directions in the phase space that are capable of maximum error growth. If we miss those directions, "truth" could lie outside the ensemble envelope.
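To illustrate why the leading SVs matter, the following sketch brute-forces the direction of maximum one-step amplification for a hypothetical 2x2 tangent-linear propagator. That maximizing direction is the leading singular vector, and its amplification the leading singular value; the matrix and resolution are invented for illustration:

```python
import math

# Toy tangent-linear propagator M over one forecast interval (hypothetical):
# a perturbation v evolves to M v.
M = [[2.0, 1.0],
     [0.0, 0.5]]

def grow(theta):
    """Amplification |M v| of a unit perturbation v = (cos theta, sin theta)."""
    v = (math.cos(theta), math.sin(theta))
    w = (M[0][0] * v[0] + M[0][1] * v[1],
         M[1][0] * v[0] + M[1][1] * v[1])
    return math.hypot(w[0], w[1])

# Brute-force search over directions (0.1-degree steps suffice in 2-D).
# The maximizer approximates the leading singular vector of M, and the
# maximum amplification its leading singular value.
thetas = [i * math.pi / 1800 for i in range(1800)]
best = max(thetas, key=grow)
print(round(grow(best), 3))   # ~2.248, the largest singular value of M
```

In practice the SVs of a full model are of course found by iterative linear algebra on the tangent-linear and adjoint models, not by search; the point here is only that the leading SV is, by definition, the initial perturbation with the greatest growth over the chosen interval.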

The analysis error distribution, however, is far from white noise (Kalnay and Toth, 1994). Consider the analysis/forecast cycle of the data assimilation system as running a nonlinear perturbation model: the error in the first guess (short-range forecast) is the perturbation, which is periodically "rescaled" at each analysis time by blending observations with the guess. Since observations are generally sparse, they cannot eliminate all errors from the short-range forecast that subsequently serves as the first guess for the next analysis. Obviously, any error that grew in the previous short-range forecast will have a larger chance of remaining (at least partially) in the latest analysis than errors that decayed. These growing errors will then start amplifying quickly again in the next short-range forecast.

It follows that the analysis contains fast growing errors that are dynamically created by the repetitive use of the model to create the first guess fields. This is what we refer to as the "breeding cycle" or Breeding of Growing Modes (BGM). These fast growing errors are above and beyond the traditionally recognized random errors that result from errors in observations. Those errors generally do not grow rapidly since they are not organized dynamically. It turns out that the growing errors in the analysis are related to the local Lyapunov vectors of the atmosphere (which are mathematical phase space directions that can grow fastest in a sustainable manner). Indeed, these vectors are what is estimated by the breeding method (Toth and Kalnay, 1993, 1995).

At NCEP we use 7 independent breeding cycles to generate the 14 initial ensemble perturbations. The initiation of each breeding cycle begins with an analysis/forecast cycle which differs from the others only in the initially prescribed random distribution ("seed") of analysis errors. These initially random perturbations are added to and subtracted from the control analysis, so that each breeding cycle generates a pair of perturbed analyses (14 in all). From this point on, each breeding cycle evolves independently to produce its own set of perturbations. The perturbations are simply the differences between the short-term (24-hour) forecast initiated from the last perturbed analysis and the "control" analysis, rescaled to the magnitude of the seed perturbation. Since these short-term forecasts are just the early part of the extended-range ensemble predictions, generation of the perturbations is essentially cost-free with respect to the analysis system (unlike the singular vector approach of ECMWF). The cycling of the perturbations continues, and within a few days the perturbations reach their maximum growth.

Note that once the initial perturbations are introduced, the perturbation patterns evolve freely in the breeding cycle, except that their size is kept within a certain amplitude range. Also note the similarity in the manner errors grow in the analysis and breeding cycles. The only difference is that in the breeding cycle, the stochastic elements that are introduced into the analysis through the use of observations containing random noise are eliminated by the use of deterministic rescaling. The seven quasi-orthogonal bred vectors from the breeding cycles span a subspace of the atmospheric attractor that represents the highest sustainable growth in the modeled atmosphere, at the given perturbation amplitude.

The breeding method has one free parameter: the perturbation amplitude. We use a perturbation amplitude which is on average on the order of 12% of the total climatological rms variance (~10 m at 500 hPa height). The sensitivity to the choice of this amplitude (for example, as a function of season) is under investigation. Regarding the spatial distribution of estimated analysis errors, we use a geographical mask (Toth and Kalnay, 1995) to which the perturbations are rescaled every day. As a result, in data-void regions such as the ocean basins the perturbations are roughly three times larger than over data-rich continents.
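The essence of the breeding cycle - forecast the control and the perturbed state, difference them, rescale the difference to the seed amplitude, and add it back to the next control analysis - can be sketched with a toy nonlinear model. Here one step of the Henon map stands in for a 24-hour forecast; the map, the amplitude, and all names are illustrative assumptions, not the operational MRF configuration:

```python
import math

# Toy nonlinear "model": one step of the Henon map (stand-in for a 24-h forecast)
def model_step(x, y):
    return 1.0 - 1.4 * x * x + y, 0.3 * x

def rescale(dx, dy, amplitude):
    """Rescale the bred perturbation back to the prescribed amplitude."""
    size = math.hypot(dx, dy)
    return amplitude * dx / size, amplitude * dy / size

AMP = 1e-4                 # fixed perturbation amplitude (the one free parameter)
cx, cy = 0.1, 0.1          # "control analysis"
px, py = cx + AMP, cy      # perturbed analysis: control plus the random seed

for cycle in range(100):
    cx, cy = model_step(cx, cy)           # control forecast
    px, py = model_step(px, py)           # perturbed forecast
    dx, dy = px - cx, py - cy             # bred perturbation (forecast difference)
    growth = math.hypot(dx, dy) / AMP     # diagnostic: one-cycle amplification
    dx, dy = rescale(dx, dy, AMP)         # deterministic rescaling
    px, py = cx + dx, cy + dy             # next perturbed "analysis"

# After spin-up, the mean log-growth per cycle approaches the leading Lyapunov
# exponent of the map, and the perturbation aligns with its leading local
# Lyapunov vector; the rescaled perturbation size is always AMP.
print(math.hypot(px - cx, py - cy))
```

The free evolution between rescalings is what lets the perturbation rotate into the fastest sustainably growing direction, mirroring how growing errors survive successive analysis cycles.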

Finally, keep in mind that there is no guarantee that the above methodology "finds" all the possible growing modes or, equivalently, that the ensemble will reliably encompass all possible outcomes in every situation: we cannot run enough perturbed forecasts (with, for example, different initial perturbation sizes) to populate the whole forecast distribution all the time. Moreover, remember that the forecast model is not "perfect", and model error, as well as initial condition uncertainty, will contribute to the distribution of predictions within the ensemble (especially systematic errors, which may drive all the solutions in the same - wrong - direction). Overall, however, verifications indicate that the ensemble system as now constructed does provide enhanced skill through ensemble averaging and usefully reliable probability estimates.

## ENSEMBLE PRODUCTS - GENERIC DESCRIPTION

### 3) Clustering:

Clustering here refers to grouping together ensemble members that are similar in some respect. The approach used here is based on simple correlation analysis. First, the two least similar predictions (smallest anomaly correlation, provided it is < 0.6) are determined. Ensemble members similar (AC > 0.6) to each of these extremes (if any) are found, and the cluster mean for each is formed. Provided at least two forecasts are dissimilar, there will always be at least two clusters (C1, C2), possibly consisting of only one forecast each, which correspond to the range of solutions sampled by the ensemble. Second, of the remaining forecasts, the two most similar are found, and members similar to them are grouped (averaged) to form the next cluster. The process iterates until there is no longer any set of at least two similar forecasts (a maximum of 6 clusters is allowed). The cluster means effectively reduce the number of degrees of freedom relative to considering the complete set of individual forecasts, but not as much as the full ensemble mean (except when all the forecasts are alike). Ideally, and for the most part as indicated by verification scores, the more populated a cluster, the more skillful the cluster mean.
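The first step of the procedure - find the two least similar forecasts and build a cluster of similar members around each - can be sketched as follows. This is a simplified illustration (the operational algorithm iterates to form further clusters and caps the count at six); the function names and the toy anomaly fields are invented:

```python
def correlation(a, b):
    """Pearson correlation between two anomaly fields given as flat lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def first_two_clusters(fields, threshold=0.6):
    """Find the least-similar pair of members; group members similar to each."""
    n = len(fields)
    pairs = [(correlation(fields[i], fields[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    ac, i, j = min(pairs)      # the two most dissimilar forecasts
    if ac >= threshold:
        return None            # all forecasts alike: no distinct clusters
    c1 = [k for k in range(n) if correlation(fields[k], fields[i]) > threshold]
    c2 = [k for k in range(n) if k not in c1
          and correlation(fields[k], fields[j]) > threshold]
    return c1, c2

# Hypothetical anomaly "fields" for four members forming two opposing pairs:
fields = [[1, 2, 3, 4], [1.1, 2, 3, 3.9], [4, 3, 2, 1], [4.1, 3, 2, 0.9]]
print(first_two_clusters(fields))   # ([0, 1], [2, 3])
```

Averaging the fields within each returned group then gives the cluster means C1 and C2.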

Currently, the only two fields clustered are the 1000 and 500 mb height fields. Clustering is based on the similarities among forecasts over North America and its environs. The specifics of the domain are given in the detailed documentation. In the future we expect to perform the clustering for smaller subregions and for other fields (ultimately, we hope, to be selected interactively by the forecaster). Also, at present the clustering is done independently by level and time, so that the cluster membership at a given time may be different for 1000 and 500 mb, and for either level different from one forecast time to the next. An alternative approach (available in the near future) is to force the cluster membership for all times and for each level to that determined for 500 mb at day 5. While this will allow tracing the evolution of each cluster over time with assurance that membership is unchanging, it may or may not be a better approach; forecasts similar at, for example, day 3 may quite naturally diverge and by day 6 be similar to members with which they had little in common earlier. (Clustering relative to mean conditions, e.g., days 3-5, has not proved satisfactory at ECMWF.)

### 4) "Spaghetti" diagrams:

These are simply composite charts of a selected contour (e.g., Z = 5400 m) from each ensemble member plotted on the same chart. The obvious purpose is to convey the information content of EACH ensemble member in a sufficiently compact form to enable ready visualization and interpretation. These charts show explicitly the evolution from nearly identical solutions initially to virtual "spaghetti" after some time. In the process they provide information on the relative predictability as a function of forecast lead time and space (high where and when solutions are close, and vice versa). Note too that this is a form of "graphical clustering", in that one can visually weigh the non-uniform distribution of solutions (if any) and thereby judge the relative likelihood of specific outcomes in terms of the number of forecasts pointing in that direction.

In addition to contour plots for 500 mb height, charts for various other parameters (at varying intervals) are available, some for very specific usage. For example, the spaghetti diagrams for 1000-500 mb thickness and for 850 mb temperature are intended primarily for assessing the uncertainty in predicting the boundary between frozen and non-frozen precipitation. The contour plots for SLP relate to the position and (with respect to the choice of contour value) the intensity of high and low pressure systems. An adjunct to these displays is a set of charts which depict just the positions of the "L" and "H" centers, from which one can follow over time the extent of disagreement in the occurrence and tracks of developing systems (e.g., east coast storms). Isotach composite charts convey information about jet systems, and relative humidity diagrams about the potential for precipitation. Plots with actual isohyets (QPF) relate more directly to this problem and will be added as soon as possible.

The above list is clearly not exhaustive, nor is it necessarily designed and presented in an optimal way; again, feedback from users is most welcome!

### 5) Probability forecasts:

The products described thus far lend themselves primarily to qualitative statements about the relative likelihood of different outcomes. An integral aspect of ensemble forecasting, though, is that it also provides quantitative estimates of probabilities. Probability estimates here are defined simply as the percentage of predictions out of the total (17) that satisfy the specified criterion. Probability charts for the 500 mb height exceeding, for example, 5580 m define quantitatively the probability envelope about the corresponding spaghetti diagram. Beyond the envelope the interpretation is that there is near certainty that the actual height will be either less or greater than 5580 m. The varying width of the envelope over time and space conveys quantitatively the degree of uncertainty. The same interpretation applies to the spaghetti plots for thickness less than 5400 m and 850 mb temperature less than 0 deg C, respectively, with regard to the zone of uncertainty in the rain versus snow problem (these, as with many criteria, are somewhat arbitrary or skewed toward our east coast bias - but a sufficient amount of yelling and screaming from elsewhere on this choice will be heard!).

Other probability products available include that for the thickness anomaly greater or less than 60 m (about 3 deg C in layer-mean temperature) as a guide for appreciating the confidence in predicted anomalously cold and warm regions. The probability of 700 mb RH > 70% can be considered a proxy for precipitation chances, which will be added explicitly in the near future. The predicted odds of the 12-hr 500 mb height tendency exceeding 30 m are an indicator of the degree of consistency amongst the members in depicting short-wave activity; consistency in the treatment of the smaller-scale systems (because they are intrinsically less predictable) is a signal to take the implications seriously. In principle, probability information on any direct or derived model parameter can be output - so let your requests come forward (how about vertical stability, or indices such as the PNA?). Another form of probability chart, currently being developed, expresses the confidence as the percentage of members with, for example, the 850 mb temperature anomaly within 5 deg of the ensemble mean value (or any other base for comparison).
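The confidence product mentioned last reduces to the same member-counting arithmetic as the other probability charts. A minimal sketch at a single grid point, with invented anomaly values for eight hypothetical members:

```python
# Hypothetical 850 mb temperature anomalies (deg C), one per ensemble member
anomalies = [-1.0, 0.5, 2.0, 3.5, 9.0, -6.0, 1.0, 0.0]

# Confidence = percentage of members whose anomaly lies within 5 deg C
# of the ensemble-mean anomaly (the "base for comparison" could be anything)
mean_anom = sum(anomalies) / len(anomalies)
within = sum(1 for a in anomalies if abs(a - mean_anom) <= 5.0)
confidence_pct = 100.0 * within / len(anomalies)

print(confidence_pct)   # 75.0
```

A high percentage flags regions where the members agree closely on the anomaly; a low one flags regions where the ensemble-mean anomaly should be treated with caution.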

A final few words on ensemble-derived probabilities. Many forecasters have come to know and use (and love?) probability statements for precipitation and temperature. These generally are the direct or somewhat modified POPs and POTs generated statistically by TDL via the MOS (or similar) approach. They basically describe the probability distributions given the parameters (or, more generally, the synoptic features) from a single (i.e., deterministic) prediction (NGM or MRF). The actual uncertainty consists of two components: that associated with the non-unique distribution of precipitation or temperature given a particular synoptic scenario, AND that intrinsic to there being an array of alternative scenarios. MOS now accounts only for the first component, while the probabilities derived from the direct model output of ensembles (as described above) include both. Of course, one could derive and combine precipitation and temperature probability distributions from each ensemble member, and TDL is actively pursuing this approach.