Zoltan Toth(1), Eugenia Kalnay, Steven M. Tracton,

Richard Wobus(1) and Joseph Irwin

National Centers for Environmental Prediction

Submitted to Weather and Forecasting

Revised version (August 29, 1996)

(1) General Sciences Corporation (Laurel, MD) at NCEP

Corresponding author's address: Zoltan Toth, NCEP, Environmental Modeling Center, 5200 Auth Rd., Room 204, Camp Springs, MD 20746. e-mail: wd20zt@sun1.wwb.noaa.gov


Ensemble forecasting has been operational at NCEP (formerly NMC) since December 1992. In March 1994, more ensemble forecast members were added. In the new configuration, 17 forecasts with the NCEP global model are run every day, out to 16 days lead time. Beyond the 3 control forecasts (a T126 and a T62 resolution control at 00UTC and a T126 control at 12UTC), 14 perturbed forecasts are made at the reduced T62 resolution. Global products from the ensemble forecasts are available from NCEP via anonymous ftp.

The initial perturbation vectors are derived from seven independent breeding cycles, where the fast growing nonlinear perturbations grow freely, apart from the periodic rescaling that keeps their magnitude compatible with the estimated uncertainty within the control analysis. The breeding process is an integral part of the extended-range forecasts, and the generation of the initial perturbations for the ensemble is done at no computational cost beyond that of running the forecasts.

A number of graphical forecast products derived from the ensemble are available to the users, including forecasters at the Hydrometeorological Prediction Center and the Climate Prediction Center of NCEP. The products include the ensemble and cluster means, standard deviations, and probabilities of different events. One of the most widely used products is the "spaghetti" diagram, where a single map contains all 17 ensemble forecasts, as depicted by a selected contour level of a field, e.g. 5520 m at 500 hPa height or 50 m/s wind speed at the jet level.

With the aid of the above graphical displays and also by objective verification we have established that the ensemble can provide valuable information for both the short and extended range. In particular, the ensemble can indicate potential problems with the high resolution control that occur on rare occasions in the short range. Most of the time, the "cloud" of the ensemble encompasses the verification, thus providing a set of alternate possible scenarios beyond that of the control. Moreover, the ensemble provides a more consistent outlook for the future. While consecutive control forecasts verifying on a particular date may often display large "jumps" from one day to the next, the ensemble changes much less, and its envelope of solutions typically remains unchanged. In addition, the ensemble extends the practical limit of weather forecasting by about a day. For example, significant new weather systems (blocking, extratropical cyclones, etc.) are usually detected by some ensemble members a day earlier than by the high resolution control. Similarly, the ensemble mean improves forecast skill by a day or more in the medium to extended range, with respect to the skill of the control. The ensemble is also useful in pointing out areas and times where the spread within the ensemble is high and consequently low skill can be expected, and conversely, those cases in which forecasters can make a confident extended range forecast because the low ensemble spread indicates high predictability. Another possible application of the ensemble is identifying potential model errors. A case of low ensemble spread with all forecasts verifying poorly may be an indication of model bias. The advantage of the ensemble approach is that it can potentially indicate a systematic bias even for a single case while studies using only a control forecast need to average many cases.


Since the advent of weather forecasting it has been evident to both the forecasters and the public that the forecasts have only a limited success, i.e. their skill drops off with lead time and varies from case to case. Though it was clear that often there was not enough data to adequately estimate the initial state of the atmosphere, much of the blame for forecast failures was attributed to the numerical weather prediction (NWP) models and methods for data assimilation. This resulted in a successful effort to improve these forecast tools.

Model errors, however, are not the only source of errors in the forecasts. Recent studies (e.g., Reynolds et al., 1994; Zhu et al., 1996) indicate that most of the synoptic scale errors in the extratropics in global NWP models are not due primarily to model deficiencies. This observation has important consequences for medium- and extended-range weather forecasting. Even though model errors can occasionally play a role, we have to look for other sources of errors on these time scales.

The work of Lorenz (1963, 1965, 1969) and others has provided an understanding of what can make forecasts inaccurate. The atmosphere is a chaotic system, so even if a perfect atmospheric model is used, the smallest errors in the initial conditions can make the forecasts so inaccurate as to be useless within two weeks or so. This is because instabilities in the atmosphere will amplify even infinitesimally small initial differences between two model runs (or between a "perfect" model run and the true evolution of the atmosphere) until the two forecast states are as far apart as two randomly chosen states in the atmosphere.

In the early years of NWP, forecast errors due to simplified model formulations dominated the total error growth. The traditional perception that forecast errors are primarily due to model errors dates back to those early years. By now, however, models have become much more sophisticated and it is the errors that arise due to instabilities in the atmosphere (even in case of small initial errors) that dominate forecast errors. The recognition of this situation requires a major shift in the perception of NWP. The models and the data assimilation can still be improved but that alone will not solve all problems. The forecasts, no matter how good our models may become, will always fail due to the inherent finite time limit of predictability of the atmosphere (which may become a few days longer as atmospheric analyses and models are improved, see Lorenz, 1989.) However, with ensemble forecasting we can learn a lot about when, where and how the skill is lost in the forecasts. Since the rate at which forecasts lose their skill varies substantially from case to case, this is very important information that can expand significantly the usefulness of the forecasts.

Though there have been other attempts at predicting forecast skill based on the large scale or persistence characteristics of the flow (see, e.g., Wobus and Kalnay, 1995), ensemble forecasting is the most general and practical way to learn about the skill of the forecasts in advance. Ensemble forecasting was first introduced into meteorology by Epstein (1969) and Leith (1974). It consists of (1) estimating the probability distribution of the true state of the atmosphere around the control analysis, then (2) sampling that probability distribution, and (3) running the forecast model from all the sample points (perturbed analyses). If everything is done right, we end up with a number of possible forecast scenarios that are equally likely (except that the control forecast, because it starts from our best analysis, will represent developments that are somewhat more likely; see Zhu et al., 1996) and that will encompass the true state of the atmosphere. If, on a particular day, the ensemble members have a large spread at, say, 3- or 5-day lead time, this should indicate that the forecasts are less reliable, and hence the forecasts should be worded differently or even alternate scenarios should be mentioned.

The ensemble strategy will work only if the models are good enough that model-related errors do not dominate the final error fields. This was not so in the 1960s and 1970s, but today's sophisticated models and the availability of enough computer power have made it both possible and practical to run multiple forecasts. The first two centers that entered this new area of NWP were the National Centers for Environmental Prediction (NCEP, formerly NMC; Tracton and Kalnay, 1993) and ECMWF (Buizza et al., 1993; Molteni et al., 1996). The two centers use almost the same horizontal model resolution for the perturbed ensemble forecasts; however, the estimation (and sampling) of the initial probability distribution of the atmosphere is different. At ECMWF, the singular vector (SV) approach is used (Buizza and Palmer, 1995), in which the fastest-growing perturbations are determined for a 2-day period at the beginning of the forecast. At NCEP, the breeding method is used (Toth and Kalnay, 1993, 1996a, hereafter referred to as TK93, TK96). In this technique (see next section) perturbations that grew very fast during the analysis cycles leading to the initial time, and thus are likely to dominate analysis errors, are used as initial ensemble perturbations.

After NCEP and ECMWF started ensemble forecasting at the end of 1992, other centers have also considered implementing it. At the US Navy's Fleet Numerical Meteorological and Oceanographic Center, an ensemble forecast system based on the breeding method has been quasi-operational since 1995 (M-A. Rennick, personal communication, 1995). At the South African Weather Service, an ensemble system using the breeding method has been implemented quasi-operationally (Tennant, 1996, personal communication). At the Canadian Meteorological Centre, an approach based on multiple data assimilation cycles, an extension of the breeding method, is being prepared for quasi-operational implementation in 1996 (Houtekamer et al., 1996). Work is also under way at the National Center for Medium Range Weather Forecasting in India to implement a breeding based ensemble system (Iyengar, 1996, personal communication). At the UK Meteorological Office, ensemble experiments have been performed by introducing ECMWF initial perturbations on either the UK Meteorological Office or ECMWF analysis (Harrison et al., 1995). At the Japan Meteorological Agency, an SV-based ensemble is run for monthly prediction while the application of a breeding-based ensemble is considered for the short to medium range (Takano, 1996, personal communication).

In section 2 we describe the implementation of the breeding method at NCEP using as an example the Blizzard of 1993 (Caplan, 1995). A short description of the post-processed information that is made available at NCEP and is distributed through anonymous ftp every day to the NWS forecasters and to the general user community is given in section 3. Next we present several synoptic examples to illustrate the benefits of using the ensemble forecasts instead of relying only on a control forecast (section 4). Section 5 is a short discussion of questions and plans for the use of the NCEP ensemble.


Since March 1994, the NCEP ensemble has consisted of 17 forecasts every day, including at 00UTC a T126 (MRF) and a T62 (started from truncated T126 analysis) resolution control forecast, and at 12UTC a T126 control (AVN) forecast (see Fig. 1). Note that the high resolution control forecasts (MRF and AVN) are truncated to T62 after 7 and 3 days respectively, to save computer time, since Tracton and Kalnay (1993) showed that most of the advantages of high resolution are derived from just the first few days of the forecast. The rest of the ensemble consists of 10 perturbed forecasts at 00UTC and 4 at 12UTC, all done at a T62 resolution.

From theoretical considerations (Leith, 1974) we know that the initial perturbations should be constructed to be representative of possible analysis errors. Since forecast errors grow fast from the beginning (e. g., Lorenz, 1989), analysis fields must contain fast growing errors. These errors get into the analysis through a dynamical process, due to the use of model-generated first guess fields (TK93). This is because the analysis in data void regions relies heavily on the first guess and therefore contains errors similar to short-range forecast errors. Beyond these dynamically conditioned, fast growing errors, there are random errors that are a result of stochastic errors in observations and/or in analysis procedures. The random errors, however, generally will not dominate forecast error, since they typically do not grow (and initially may even decay) until they become dynamically organized into growing patterns a few days later.

In the breeding method we attempt to capture the fastest-growing analysis errors that are most likely to be responsible for the error in the control forecast. Therefore the breeding cycle intends to mimic how the fastest growing errors in short range forecasts develop. In the operational ensemble system, the difference field between a positively and negatively perturbed 24-hour forecast from the previous day, valid at the time of the latest analysis, is used as an initial perturbation for today's ensemble (Fig. 2). Before adding (and subtracting) that perturbation onto the latest control analysis, it is first "scaled down" to a perturbation amplitude that is representative of actual analysis errors in a given season (estimated as differences between independent analyses valid at the same time, see Kalnay and Toth, 1994; Iyengar et al., 1996). So, for example, initial perturbations over the Pacific ocean where rawinsonde observations are sparse, have amplitudes three times larger than over the continental US, which is relatively well observed. Since the same perturbations are added with both positive and negative sign onto the control analysis, one breeding cycle (Fig. 2) generates perturbations for a pair of ensemble forecasts. Operationally, there are 7 independently run breeding cycles that differ only in that the very first initial perturbations (used to make "cold" starts) in each cycle are independent.

To give a hypothetical example, let us assume that the 500 hPa height analysis on a particular day at Seattle, WA is 5640 m and that the positively and negatively perturbed 24-hour forecasts valid at the same time have values of 5660 and 5630 m, respectively, and that the initial perturbation size is 10 m. The rescaling factor for Seattle will then be R = 10 / (5660 - 5630) = 1/3. The new initial ensemble perturbation will be the difference between the two 24-hour forecasts from the previous day, multiplied by R (which is kept the same for all model variables). This perturbation field will then be both added to and subtracted from the latest control analysis, so the perturbed 500 hPa height initial conditions at Seattle will be 5650 and 5630 m, respectively. (In practice, instead of height, kinetic energy is used as a rescaling variable.)
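
The arithmetic of the Seattle example can be checked with a few lines of Python. This is only a single-point sketch: the function name `rescale` and its arguments are illustrative assumptions, and the operational system rescales whole fields using kinetic energy rather than one height value.

```python
def rescale(forecast_plus, forecast_minus, target_amplitude):
    """Return (R, perturbation) for one grid point.

    R shrinks the difference between the two perturbed forecasts to the
    chosen initial perturbation size. Hypothetical helper, not NCEP code.
    """
    difference = forecast_plus - forecast_minus
    factor = target_amplitude / abs(difference)
    return factor, factor * difference


# Values from the Seattle example in the text (500 hPa height, metres).
analysis = 5640.0
factor, perturbation = rescale(5660.0, 5630.0, 10.0)
perturbed_high = analysis + perturbation  # positively perturbed initial condition
perturbed_low = analysis - perturbation   # negatively perturbed initial condition
```

With the numbers above, the 30 m forecast difference is reduced to a 10 m perturbation, which is then applied with both signs to the 5640 m analysis.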

Since the bred perturbations are computed from the differences of the previous day's ensemble forecasts (at one day lead time), breeding is done globally at no extra computational cost. Over the course of 3-4 days after initiating the ensemble, the perturbations converge to a subspace of perturbations that grow fastest in a sustainable manner, which provides a good estimate of the subspace of theoretically optimal ensemble perturbations (Toth et al., 1996.) The bred perturbations are closely related to the leading Lyapunov vectors of the atmosphere, which represent the fastest possible linear growth on the attractor (Legras and Vautard, 1995); they can be considered as a nonlinear extension of the linear Lyapunov vectors. For further discussion of the perturbation methodology, see TK96.
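
The breeding cycle described above can be illustrated on a toy system. The sketch below is not the NCEP implementation: it assumes the three-variable Lorenz (1963) equations in place of the global model, lets the unperturbed run stand in for each new control analysis, and uses a Euclidean norm (not kinetic energy) for rescaling. The cycle length, amplitude, and all function names are hypothetical choices.

```python
import math


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz (1963) system, a stand-in for the forecast model."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(slope, factor):
        return tuple(s + factor * k for s, k in zip(state, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(k1, dt / 2))
    k3 = lorenz63(shifted(k2, dt / 2))
    k4 = lorenz63(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def forecast(state, steps, dt):
    """Run the toy model forward for a number of time steps."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


def norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def breed(cycles=400, steps_per_cycle=8, dt=0.01, amplitude=0.1):
    """Run one breeding cycle chain; return the last bred vector and growth factors."""
    control = forecast((1.0, 1.0, 1.0), 1000, dt)  # spin up onto the attractor
    perturbation = (amplitude, 0.0, 0.0)  # arbitrary "cold start" perturbation
    growth = []
    for _ in range(cycles):
        start_plus = tuple(c + p for c, p in zip(control, perturbation))
        start_minus = tuple(c - p for c, p in zip(control, perturbation))
        plus = forecast(start_plus, steps_per_cycle, dt)
        minus = forecast(start_minus, steps_per_cycle, dt)
        # The unperturbed run stands in for the next control analysis.
        control = forecast(control, steps_per_cycle, dt)
        difference = tuple(p - m for p, m in zip(plus, minus))
        size = norm(difference)
        growth.append(size / (2.0 * amplitude))  # the pair started 2 * amplitude apart
        # Rescale the difference back down to the prescribed initial size.
        perturbation = tuple(amplitude * d / size for d in difference)
    return perturbation, growth


bred_vector, growth_factors = breed()
mean_log_growth = sum(math.log(g) for g in growth_factors) / len(growth_factors)
```

In this toy setting the average logarithmic growth per cycle is positive, that is, the rescaled perturbations keep growing from one cycle to the next, and the perturbed runs needed for breeding are the same runs that would serve as the ensemble pair.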

In regions where fast developments (e. g., cyclogenesis) take place and instability is high, the perturbations from different breeding cycles are typically similar to each other. As the cyclone decays, the perturbations in the region considered become less correlated again in the different breeding cycles, until another system with strong instabilities develops. We will document this behavior through the example of the "storm of the century", a massive and very fast-developing storm that hit much of the eastern US in March 1993. In Fig. 3 (and Fig 3x) we present a series of 500 hPa streamfunction fields, leading to the development of the storm. One can see that there are only relatively small changes in the large scale flow configuration and with conventional analysis techniques no precursor of the cyclone can be seen earlier than 24-36 hrs before it actually appears on the 13th of March.

In Figs. 4a (and 4a2) and 4b (and 4b2), we show the corresponding bred perturbations from two independently run breeding cycles (differences between control and perturbed nonlinear forecasts). We can see in Fig. 4a that there is a packet of instability (larger amplitude perturbations with a negative-positive-negative wave triplet) on the 8th of March in the eastern part of the Pacific (top panel, a selected contour is highlighted). This wave packet then travels primarily to the east and by the 12th of March (5th panel from top) it appears over the western Atlantic. The speed of the packet of instability is larger than the actual wind speed. In fact, if we follow the evolution of the perturbations in time we can notice that there is a very strong downstream development (and upstream decay) that makes the fast speed possible. The same can be seen in Fig. 4b.

One can also notice in Figs 4a and 4b that there is a second surge of perturbation energy that reaches the eastern half of the Pacific on the 10th of March. This wave packet, however, does not follow the previous one that traveled to the east, but rather takes a more southerly course into the Gulf of Mexico, where the actual storm was formed. In Fig. 5 we zoom into the perturbations in time and space, just before the development of the cyclone. As we can see, the perturbation amplitude more than tripled in a 12-hour period ending 06Z on the 13th of March in the Gulf of Mexico, where the storm actually developed. This shows that the bred perturbations can grow extremely fast in the presence of strong instability. In addition, these perturbations are often related to actual forecast errors. In this case, for example, MRF control forecasts verified on March 13, 12UTC with a lead time of 12, 36, ..., 132 hours (not shown) all had a similar error pattern with a maximum near the location of the negative perturbation center in the Gulf of Mexico in Fig. 5 (12UTC panel). It is important to note that the same error was found even at 12 hour lead time (Fig. 6), suggesting that a 6-hour forecast valid at the same time, used in the analysis as a first guess, would also have a similar error pattern. And since in that area there are not enough observations to "correct" the first guess error, the analysis at 12UTC should have had a similar error pattern.

The fact that the bred perturbations may appear as errors in the first guess and the analysis gives the justification for their use in ensemble forecasting. A successful ensemble scheme requires perturbations that are possible analysis errors. An objective comparison of the bred perturbations and analysis/first guess fields confirms the presence of bred perturbations within the analysis errors (Kalnay and Toth, 1994).


An ensemble of 17 forecasts out to 16 days is a lot of data to examine and to archive. To reduce the volume, each day at 00UTC and 12UTC we create files that contain forecast fields (all lead times and all forecast runs) for 17 different selected variables. The variables (Table 1) were chosen with different forecast applications in mind, including extratropical prediction, aviation forecasts, tropical predictions, hurricane forecasting, marine prediction, etc. The data are available in grib format at an anonymous ftp site (nic.fb4.noaa.gov, /pub/ens).

In addition to these gridded fields, postprocessed graphical information is also provided in GEMPAK metafile format (desJardins et al., 1995) to the general user community, along with a user's manual (nic.fb4.noaa.gov, /pub/nadata/meta/model/ens). More recently, some charts are also available on the EMC experimental web page (http://sgi62.wwb.noaa.gov:8080/ens/enshome.html). These same products are used in the operational practice at the Hydrometeorological Prediction Center (HPC) and the Climate Prediction Center (CPC) of NCEP for the preparation of medium- and extended-range forecast guidance. Of all the graphical products available to the forecasters, the most heavily used is the "spaghetti" diagram (Fig. 7), where a single selected contour of a variable is plotted for each ensemble member. This, in fact, is probably the simplest way one can display information directly from all individual forecasts on one panel. The advantage, from a synoptic point of view, is that the position of smaller scale features can be seen in each ensemble run. In the example of Fig. 7, we can see a large uncertainty at 4.5 day lead time regarding the position and amplitude of a trough over the eastern part of the US. As one can see, the control forecast is not necessarily in the middle of the pack; in this case it was on the side of the distribution that did not verify well. The most frequently used variable is the 500 hPa height, but other fields such as 850 hPa temperature, 1000/500 hPa thickness, mean sea level pressure or wind speeds at 850 and 250 hPa heights are also used. One should be aware that in areas of small gradients (like over a flat ridge) large differences among the contour lines do not necessarily mean substantial differences among the solutions. To avoid such problems we find it best if the spaghetti diagram is viewed alongside a full field map for the control forecast (or ensemble mean).

The ensemble mean field is also available to the forecasters. This field offers a forecast that is, on average, better than the control (or any other ensemble member) forecast in terms of RMS errors. In the operational medium-range practice at HPC (Danaher, 1996), the forecasters use the ensemble to adjust the amplitude and speed of smaller-scale waves toward the median of the ensemble. At longer lead times, the small-scale features whose positions are uncertain (but which nevertheless will most likely still appear somewhere in the verification) are naturally filtered out from the ensemble mean, which is used as guidance mostly for the extended range (days 6-8 and beyond).
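
As a minimal sketch of how an ensemble mean and its RMS error can be computed, the following uses a handful of made-up height values; the numbers, the four-point "grid", and the function names are illustrative assumptions, not NCEP data or code.

```python
import math


def ensemble_mean(members):
    """Average the members grid point by grid point."""
    return [sum(values) / len(values) for values in zip(*members)]


def rmse(forecast, verification):
    """Root-mean-square difference between a forecast and the verifying field."""
    squared = [(f - v) ** 2 for f, v in zip(forecast, verification)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 500 hPa heights (m) at four grid points; not real forecast data.
verification = [5520.0, 5560.0, 5600.0, 5640.0]
members = [
    [5500.0, 5570.0, 5620.0, 5650.0],
    [5540.0, 5545.0, 5590.0, 5625.0],
    [5530.0, 5580.0, 5585.0, 5660.0],
]

mean_field = ensemble_mean(members)
mean_rmse = rmse(mean_field, verification)
member_rmse = [rmse(m, verification) for m in members]
# Quadratic mean of the individual member errors.
typical_member_rmse = math.sqrt(sum(e * e for e in member_rmse) / len(member_rmse))
```

The RMS error of the mean can never exceed the quadratic mean of the member errors, because errors of opposite sign partly cancel in the average. In a single case it is not guaranteed to beat every individual member, although it does for the made-up values used here.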

For the medium range, subsets of the ensemble (clusters, that are formed of ensemble members with similar solutions based on an objective algorithm; see Tracton and Kalnay, 1993) are more applicable than the grand ensemble mean (Fig. 8). The two leading clusters shown in the example of Fig. 8 clearly identify two distinct scenarios. Note that in this case none of the three controls was classified in cluster 1, which verified very well over the eastern US. The spread of the ensemble around the ensemble mean (Fig. 9) also indicated to the forecasters the large uncertainty in the position of the trough over the Eastern US. Such spread charts can also be normalized by the climatological variance or spread averaged over the preceding months (not shown), to highlight uncertainty with respect to background expectations.
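
The spread and its normalized form can be sketched as follows. The member values and the reference spread are hypothetical, and the population standard deviation is an assumed choice; the text does not specify which estimator is used operationally.

```python
import statistics


def ensemble_spread(members):
    """Standard deviation of the members around the ensemble mean, per grid point."""
    return [statistics.pstdev(values) for values in zip(*members)]


def normalized_spread(members, background_spread):
    """Spread divided by a reference spread (for example, a seasonal average)."""
    return [s / b for s, b in zip(ensemble_spread(members), background_spread)]


# Hypothetical heights (m) at three grid points for four members.
members = [
    [5520.0, 5600.0, 5560.0],
    [5520.0, 5640.0, 5580.0],
    [5520.0, 5560.0, 5540.0],
    [5520.0, 5600.0, 5560.0],
]
background = [20.0, 20.0, 20.0]  # assumed reference spread, also hypothetical

spread = ensemble_spread(members)
relative = normalized_spread(members, background)
```

Values of the normalized spread above 1 flag grid points where the members disagree more than the reference level, and values below 1 flag points where they agree more than usual.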

The forecasters are traditionally used to considering (and selecting among) different forecast scenarios, offered by the control forecasts of different meteorological centers (such as ECMWF or the UK Meteorological Office). This process is now assisted by the ensemble: if a particular scenario is supported by an ensemble cluster, it will be considered with more care. And though the current worded forecasts do not explicitly address the possibility of alternate scenarios, based on the ensemble the forecasters, in their internal products, do acknowledge the possibility of weather events that are less likely than that suggested by the official forecast.

As a tool for developing new, non-traditional forecast products that are more consistent with the amount of uncertainty present in medium- and extended-range forecasts, the ensemble offers another important guidance, the map of ensemble-based probability. For example, the probability that the 850 hPa temperature will be below (or above) a certain contour level can be displayed in a map format, by counting how many of the 17-member ensemble runs had such a temperature (not shown). Similarly, the ensemble-based probability that the 500/1000 hPa thickness anomaly will be above 60 m is displayed (Fig. 10). For the selected case, high probabilities are given for the Western part of the country and low over the Northeast but in between the probability values reflect the uncertainty in the position of the trough over the Eastern US. Height tendencies can also be viewed in a probabilistic manner (Fig. 11), indicating a strong likelihood of height falls around the Ohio valley. Though the probabilistic forecasts shown above, derived directly from the ensemble, are generally overconfident, they can be easily calibrated, resulting in probabilistic forecasts with excellent reliability, at all (short- through extended-range) lead times (Zhu et al., 1996.)
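
The counting procedure behind such probability maps is simple enough to sketch directly. The member values below are invented for illustration, and the function name is hypothetical; the result is the raw (uncalibrated) ensemble frequency.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members above the threshold at each grid point."""
    count = len(members)
    return [
        sum(value > threshold for value in values) / count
        for values in zip(*members)
    ]


# Hypothetical thickness anomalies (m) at three grid points for 17 members.
members = [[80.0, 30.0 + 5.0 * i, -20.0] for i in range(17)]
probability = exceedance_probability(members, 60.0)
```

At the first point every member exceeds 60 m, at the third none does, and at the second point 10 of the 17 members do, so the raw probability there is 10/17.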

The set of products available from NCEP keeps growing through the interaction with the forecasters: Recently we added maps in which the high and low surface pressure centers are marked for each ensemble member. In addition, probabilistic quantitative precipitation forecasts, probabilistic "meteograms" (weather parameters at certain locations), and other products will also become available in the near future.


In this section, we show examples of the everyday use of the ensemble, pointing out the potential benefits the ensemble can offer (in addition to using only a control forecast), through a few selected synoptic examples. Perhaps the most important benefit the ensemble can offer is that it helps to distinguish between cases when a meaningful forecast can or cannot be made with confidence (see, e.g., TK96). This becomes obvious if we compare, for example, Fig. 12 and Fig. 13a. These two 10.5 day forecasts are from cases only three weeks apart, but their skill levels are very different. The case from April 25 (low skill, 0.129 pattern anomaly correlation for the MRF control) is a good example of what chaos can produce due to high levels of instability, while the April 3 (high skill) ensemble is much more orderly, suggesting to the forecaster a relatively high degree of predictability even at this extended lead time. A useful relationship between low spread and high skill (and vice versa) in the temporal domain has been found both in experimental and operational forecasts (Toth and Kalnay, 1996, 1995). The ensemble will be heavily used in the preparation of the 8-14 day HPC forecast guidance that will have a new, probabilistic format (O'Lenic, 1995, personal communication). In cases where the ensemble has a large spread and consequently suggests a lack of confidence in the forecasts, climatological forecast probabilities will be provided in lieu of a real forecast.

Note also that in both cases the verification line, for most of the domain, lies within the cloud of the ensemble. This is true in general: at all lead times, only in about 15% of the cases does the ensemble fail to give reliable information in this respect (Zhu et al., 1996), so that the forecaster can be fairly confident that the verification will fall within (or close to) the envelope of the ensemble. With initial perturbation amplitudes that are representative of analysis uncertainties, our ensemble encompasses truth most of the time, confirming our earlier assertion that the synoptic scale extratropical forecast errors are due primarily to initial value uncertainty and not to model deficiencies. This point is further corroborated by the subjective observation that in most cases when the ECMWF operational control forecast and the NCEP control have largely different solutions, the NCEP ensemble encompasses the ECMWF control as well. In the example of Fig. 14, the ECMWF control forecast had a trough over the West Coast of the US that was positioned substantially further west with a different tilt than that of the MRF control. The NCEP ensemble, however, incorporated not only the ECMWF control but also the verification line.
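
One simple way to quantify how often the verification falls within the envelope of the ensemble is sketched below. This is an assumed, simplified definition (lowest to highest member at each point); the verification method of Zhu et al. (1996) may differ, and the numbers are invented.

```python
def inside_envelope(members, verification):
    """True where the verifying value lies between the lowest and highest member."""
    return [
        min(values) <= truth <= max(values)
        for values, truth in zip(zip(*members), verification)
    ]


def coverage(members, verification):
    """Fraction of grid points at which the ensemble encompasses the verification."""
    flags = inside_envelope(members, verification)
    return sum(flags) / len(flags)


# Hypothetical heights (m) at four grid points for three members.
members = [
    [5500.0, 5560.0, 5600.0, 5650.0],
    [5520.0, 5580.0, 5590.0, 5660.0],
    [5540.0, 5570.0, 5610.0, 5640.0],
]
verification = [5530.0, 5590.0, 5600.0, 5645.0]

flags = inside_envelope(members, verification)
fraction_covered = coverage(members, verification)
```

Here the verification lies outside the envelope only at the second point, so three of the four points are covered.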

The fact that the verification typically falls within the ensemble also implies that the ensemble can be used to indicate the level of skill not only in the temporal but also in the spatial domain (see also Toth and Kalnay, 1996.) In Fig. 13a, for example, one can see that in areas of small spread (West Pacific, mid-Atlantic) the control forecast (and most of the perturbed forecasts) is almost perfect while in areas of large disagreement (e. g., East Pacific) the control forecast can have larger errors.

Another feature of the ensemble welcomed by synopticians, and which can have great impact on how forecasts are made, is the fact that the ensemble provides consistent information in time. If we compare today's ensemble valid for the weekend with yesterday's ensemble valid at the same time, chances are we will find that they are quite similar, even if the corresponding control forecasts were very different (Zhu et al., 1996). This is natural since the ensemble can provide a probability distribution of possible forecast outcomes (while a control forecast basically represents a single scenario out of that distribution). Typically, the spread in consecutive ensemble forecasts valid at the same time will be similar and so will the distribution of different possible flow scenarios, though their respective forecast probabilities may change from one day to the next. Figs. 13a, 13b and 13c give an example of time consistency in the ensemble, showing 10.5-, 6.5- and 3.5-day ensemble forecasts that are remarkably similar, despite substantial shifts in the behavior of the control forecast. The change in the ensemble with decreasing lead time is primarily a reduction of the spread and not a shift in the position of the envelope of the ensemble.

Since temporal consistency in the forecasts is important to weather forecasters, we present another example where the control forecasts from consecutive days suggested largely different scenarios over the east Pacific and western US (Fig. 15). The ensemble in this critical situation, again, displayed a much desired consistency in time. Note that when the controls lie north of the verification (Fig. 15a, 9.5 days lead time) the ensemble indicates that a deep trough in the SW is a possibility, whereas next day (Fig. 15b, 8.5 days lead time), when the controls take on this solution, much of the ensemble lies NE of the control, with the envelope basically unchanged. If the spread is large in the ensemble (as in our example here) the ensemble, of course, does not tell the forecaster which is the solution that will verify best. But at least it gives a range of solutions that will most likely encompass truth. Situations like this, indeed, call for some kind of probabilistic approach in the forecast presentation, a goal that the ensemble can greatly help to achieve.

Looking at our last example (Fig. 15b) one may notice that in areas where most ensemble members were on one side of the control (e.g., NE of the control over SW US or south of it over the Eastern US) the verification also tends to lie on the same side, i.e., on the side where the majority of the ensemble members are. So on average, adjusting the control somewhat into the direction of the bulk of the ensemble, a process often followed in the operational practice of HPC, is a worthwhile exercise. This is because, as mentioned earlier, the mean or the median of the ensemble offers a statistically better forecast than the control (Zhu et al., 1996). There is a caveat there, though. As described in section 2, all perturbed forecasts within the ensemble are made at a resolution that is half of the resolution of the operational control forecasts (MRF or AVN). So in cases where the resolution may have a strong impact (like in certain synoptic situations around mountains) the low-resolution ensemble may be handicapped with respect to the MRF or AVN controls. In the example of Fig. 16, the T62 resolution control (dashed line) is dramatically different from the MRF scenario (dotted line) over the SW and there is only one ensemble member that has a solution similar to the MRF. Note also that the AVN control (from 1200 UTC the previous day, solid line closest to the MRF over the SW), that is run at the high T126 resolution only for the first three days (after which it is truncated to T62), is still similar to the MRF in the SW. This and similar examples suggest that increasing the resolution of the ensemble for the first 3 days or so of the integration may enhance the utility of the ensemble in these situations.
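
The idea of moving the control toward the bulk of the ensemble can be sketched numerically. The weight of 0.5, the function names, and the height values are all hypothetical; in the practice described above the adjustment is made subjectively by forecasters, not by a fixed formula.

```python
import statistics


def fraction_above_control(members, control):
    """Share of perturbed members with a larger value than the control, per grid point."""
    return [
        sum(value > c for value in values) / len(values)
        for values, c in zip(zip(*members), control)
    ]


def nudge_toward_median(members, control, weight=0.5):
    """Move the control part of the way toward the ensemble median.

    The weight is a hypothetical tuning choice, not a value from the text.
    """
    medians = [statistics.median(values) for values in zip(*members)]
    return [c + weight * (m - c) for c, m in zip(control, medians)]


# Hypothetical heights (m) at two grid points for a control and five members.
control = [5600.0, 5560.0]
members = [
    [5620.0, 5550.0],
    [5630.0, 5555.0],
    [5610.0, 5570.0],
    [5590.0, 5540.0],
    [5640.0, 5545.0],
]

above = fraction_above_control(members, control)
adjusted = nudge_toward_median(members, control)
```

At the first point most members lie above the control, so the adjusted value moves up; at the second point most lie below it, so the adjusted value moves down.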

As mentioned earlier and documented elsewhere (TK93, TK96) the ensemble mean, on average, verifies considerably better than the control forecast in the medium and extended ranges. This amounts to extending the limit of useful forecast skill by a day or so. The advantage of the ensemble in this respect is not restricted to the ensemble mean, though. In difficult forecast situations like the blocking example of Fig. 17 below, where the control forecast completely misses the truth, we have noticed that some ensemble members often suggest the right solution a day in advance of the control forecast. Some NWP models have become quite good at reproducing blocking frequencies in a climatological sense (Anderson, 1993; Brankovic and Molteni, 1996). The prediction of individual blocking events, though, is still a challenge. This is partly because blocking may occur through the nonlinear interaction of several dynamic developments that need to be analyzed and predicted fairly well. In the example of Fig. 17a, the 5.5 day lead time control missed the blocking in the NW Atlantic. There was one ensemble member, however, that indicated the possibility of this development. The control forecast initiated the next day (Fig. 17b) was quite accurate, as were many members of the ensemble. Due to the often highly nonlinear nature of blocking development, a much larger ensemble would be needed to reliably indicate possible blocking development at longer lead times.

So far our examples came from the medium and extended range. But on occasion the global ensemble can give the forecaster crucial information in the short range as well. In Fig. 18b we show an example where the ensemble indicated an unusually high uncertainty in the 36 hour forecast. The control forecast obviously did not capture the fast developing storm that appeared over Canada while some of the ensemble members were close to the verification. The same storm had a strong impact over the NE US two days later (Fig. 18c). The control gave a similarly poor forecast at this lead time, which is not surprising given its earlier failure. The ensemble captured not only the NE storm but also a second storm affecting the western part of Canada. Concerning the NE storm, note how strongly nonlinear the distribution of the ensemble members is with respect to the control. It should be remembered that at the initial time the perturbations are symmetrically distributed around the control (half on one side of it and half on the other). At three and a half days lead time (Fig. 18c) all perturbations are south of the control, indicating that if there is an error in the control initial condition, it will most likely result in a (possibly large) underforecast of the storm in question. Given the high degree of nonlinearity and the large uncertainty in the ensemble, the forecaster could have included in his/her prediction the possibility of a strong storm development over the extreme NE. Another lesson from this example is that a disappointing failure of the control forecast should not necessarily be blamed on the model. Small initial uncertainties, in areas of large instabilities, can amplify even within a couple of days to differences within the ensemble that can explain why the forecast fails.

Earlier in this paper we emphasized that our global models have become reliable enough that the forecast quality generally does not hinge on model deficiencies. Nevertheless the models are still far from perfect. Interestingly, potential problems with the model can be identified through the use of the ensemble in a much more efficient way than using only a control forecast. Through the traditional approach, a large number of control forecasts issued in similar synoptic situations would be needed to identify systematic errors with the model. We need many cases because the "signal", i.e., systematic model error, is often smaller than the errors that are due to fast developing instabilities. Running an ensemble can actually cut down the cost of this process. If in areas of small ensemble spread we see that the verification lies far outside the ensemble cloud at short lead times, we can suspect that there may be a model problem. In these areas there is little sensitivity to initial conditions yet all forecasts fail. So it is probably not the analysis but rather the model that is responsible for these errors. A possible example of this situation can be found in Fig. 18c. Over the West Coast of America the control and all ensemble members are far away from the verification. Note that the error here does not seem to be connected to a developing synoptic feature like the two waves farther east discussed earlier, but rather appears as a larger scale bias. Ensembles started on adjacent days exhibit a very similar behavior over the same region, suggesting that it may be connected to a systematic error in the model, due perhaps to some inaccurate parametrization.
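The diagnostic described above (small spread, yet the verification falls outside the ensemble cloud) can be written as a simple check at one grid point. The Python sketch below is only an illustration: the function name, the spread threshold and the sample values are hypothetical, and a single flagged point is a hint to look more closely, not proof of a model error.

```python
from statistics import pstdev

def suspect_model_error(members, verification, spread_limit):
    """Flag a point where spread is small yet every forecast misses.

    members: forecast values at one grid point (control plus perturbed runs).
    verification: verifying analysis value at the same point.
    spread_limit: hypothetical threshold below which spread counts as small.
    """
    spread = pstdev(members)  # ensemble spread as a standard deviation
    outside = verification < min(members) or verification > max(members)
    return spread < spread_limit and outside

# Hypothetical 500 hPa heights (m): tight ensemble, verification far away.
tight = [5600.0, 5605.0, 5610.0, 5608.0, 5602.0]
flag_bias = suspect_model_error(tight, 5540.0, spread_limit=20.0)

# Large spread: a miss here is more likely due to initial uncertainty.
wide = [5500.0, 5600.0, 5700.0]
flag_wide = suspect_model_error(wide, 5800.0, spread_limit=20.0)
print(flag_bias, flag_wide)
```

Repeating the check for ensembles started on adjacent days, as in the West Coast example, would help separate a persistent bias from a one-time coincidence.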

The ensemble approach can not only help to improve the model by identifying potential systematic errors but can also point to areas where additional observations may help to improve the analysis. In Fig. 18a we see a very short range ensemble forecast. When comparing the 12-hour lead time ensemble (Fig. 18a) to the 36-hour ensemble (Fig. 18b), it is obvious that the large uncertainty appearing at 36-hour lead time came from an area upstream, north of Lake Winnipeg. Fig. 18a actually suggests that the analysis at 12UTC 1995/03/14 would be quite uncertain in this area because of the huge uncertainty in the first guess short range forecast. And since the ensemble forecasts indicate that at later lead times (see Fig. 18c) the forecast uncertainty becomes even greater, in a situation like this it would be ideal to send deployable observing platforms (such as manned or unmanned aircraft) to take additional observations in the area of high spread indicated in Fig. 18a. If such extra observations had been used for the 12UTC analysis, they would certainly have reduced the uncertainty in the initial condition, and the forecast started at 12UTC would have been more reliable. The strategy described here is called "targeted observations" (see, e.g., Snyder, 1996) and it has the advantage that it can pinpoint areas where extra observations can make a critical contribution to increasing the skill of our forecasts (Kalnay et al., 1996).
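In its simplest form, the selection of a target area amounts to locating the maximum of the short range ensemble spread. The Python sketch below illustrates only this step; the function name, the grid-point labels and the sample values are hypothetical, and an actual targeting strategy would also have to consider how the uncertainty propagates downstream.

```python
from statistics import pstdev

def target_location(forecasts):
    """Return (point, spread) for the grid point with the largest spread.

    forecasts: dict mapping a grid-point label to the list of ensemble
    values forecast there at a short lead time (for example 12 hours).
    """
    spreads = {point: pstdev(values) for point, values in forecasts.items()}
    best = max(spreads, key=spreads.get)
    return best, spreads[best]

# Hypothetical 12-hour 500 hPa heights (m) from five members at three points.
short_range = {
    "55N 100W": [5310.0, 5370.0, 5280.0, 5400.0, 5340.0],
    "45N 100W": [5520.0, 5525.0, 5515.0, 5530.0, 5522.0],
    "45N 80W": [5480.0, 5470.0, 5490.0, 5475.0, 5485.0],
}
point, spread = target_location(short_range)
print(point, round(spread, 1))
```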


Most of our discussion in this paper has focused on medium-range predictions. We argued that NWP forecasts on synoptic scales are good enough that the benefit of an ensemble approach is unquestionable. The concept of ensemble forecasting, however, applies to any temporal and spatial scale. If the model used has major deficiencies, though, a careful compromise has to be made between the benefits of running a more sophisticated (and hence computationally more expensive) model that offers a better single solution and those of computing an ensemble of solutions. At NCEP, for example, a lower resolution (T62) global ensemble accompanies a higher resolution (T126) control forecast (Tracton and Kalnay, 1993). At NCEP, experiments are also going on in the area of regional ensemble forecasting (Brooks et al., 1995). A successful short range ensemble would be an essential component in the National Weather Service's efforts toward initiating probabilistic quantitative precipitation forecasting. Plans are also being prepared for running the coupled ocean-atmosphere model in an ensemble mode, using the breeding method (Toth and Kalnay, 1996b), to make the probabilistic seasonal forecasts more accurate and reliable.

Regarding the use of the NCEP global ensemble, forecasters at the Hydrometeorological Prediction Center, the Climate Prediction Center and many NWS field offices have already adopted the ensemble as a very useful tool in their preparation of medium and extended range predictions. Nevertheless, the ensemble system can be further improved in several ways. Currently we are fine tuning the way we set the initial perturbation amplitudes in the ensemble, which have been perhaps too large in the summer. We also plan to increase the horizontal resolution for at least some of the perturbed forecasts for the first three days. Work is also underway to include more postprocessed products, many of them in probabilistic terms, related more closely to sensible weather parameters like precipitation, low-level temperature and wind. The potential merit of using ensembles from other numerical forecast centers along with the NCEP ensemble is also being explored. In the use of the ensemble, the most difficult task will continue to be that of the forecasters, who need to master a new, more probabilistic perspective on weather forecasting and be able to interpret a larger amount of information than ever before. We truly believe, and indications are already clear, that this will lead to a genuine increase in the overall usefulness of weather forecasts.


We would like to thank all of our colleagues at NCEP who, at different stages of the development process, helped us in the operational implementation of the global ensemble forecasting system. We received valuable comments on an earlier version of this manuscript from Edwin Danaher and Edward O'Lenic of NCEP.


Anderson, J. L., 1993: The Climatology of Blocking in a Numerical Forecast Model. J. Climate, 6, 1041-1056.

Brooks, H. E., M. S. Tracton, D. J. Stensrud, G. DiMego, and Z. Toth, 1995: Short-Range Ensemble Forecasting (SREF): Report from a workshop. Bull. Amer. Meteorol. Soc., 76, 1617-1624.

Buizza, R., 1994: Sensitivity of optimal unstable structures. Q. J. R. Meteorol. Soc., 120, 429-451.

Buizza, R., and T. Palmer, 1995: The singular vector structure of the atmospheric general circulation. J. Atmos. Sci., 52, 1434-1456.

Buizza, R., J. Tribbia, F. Molteni, and T. Palmer, 1993: Computation of optimal unstable structures for a numerical weather prediction model. Tellus, 45A, 388-407.

Caplan, P. M., 1995: The 12-14 March 1993 superstorm: Performance of the NMC global medium-range model. Bull. Amer. Meteorol. Soc., 76, 201-212.

Danaher, E. J., 1996: Ensemble forecasting - Early impressions and future prospects. In: Critical Path, Winter 1995-96, National Weather Service, Silver Spring, Md.

desJardins, M. L., K. F. Brill, S. Jacobs, S. S. Schotz, P. Bruehl, R. Schneider, B. Colman, and D. W. Plummer, 1995: N-AWIPS User's Guide, NOAA/NWS/NCEP. [Available from NCEP, 5200 Auth Rd., Camp Springs, MD 20746.]

Epstein, E. S., 1969: Stochastic dynamic prediction. Tellus, 21, 739-759.

Harrison, M. S. J., D. S. Richardson, K. Robertson, and A. Woodcock, 1995: Medium-range ensembles using both the ECMWF T63 and Unified models - An initial report. Technical Report No. 153, UK Met. Office. [Available from: Forecasting Research Division, Meteorological Office, London Road, Bracknell, Berkshire RG12 2SZ, UK.]

Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225-1242.

Iyengar, G., Z. Toth, E. Kalnay, and J. Woollen, 1996: Are the bred vectors representative of analysis errors? Preprints of the 11th AMS Conference on Numerical Weather Prediction, 19-23 August 1996, Norfolk, Virginia, p. J64-J66.

Kalnay, E., 1995: Numerical weather prediction. Computers in Physics, 9, 488-495.

Kalnay, E., and Z. Toth, 1994: Removing growing errors in the analysis. Proceedings of the Tenth Conference on Numerical Weather Prediction, July 18-22, 1994, Portland, OR. AMS, p. 212-215.

Kalnay, E., Z. Toth, Z-X. Pu and S. Lord, 1996: Targeting weather observations to locations where they are most needed. WGNE Research Activities in Atmospheric and Oceanic Modeling, WMO-CAS/WCRP-JSC publication.

Legras, B., and R. Vautard, 1996: A guide to Lyapunov vectors. Proceedings of the ECMWF Seminar on Predictability. September 4-8, 1995, Reading, England, Vol. I, 143-156.

Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409-418.

Lorenz, E. N., 1963: Deterministic non-periodic flow. J. Atmos. Sci., 20, 130-141.

Lorenz, E. N., 1965: A study of the predictability of a 28-variable atmospheric model. Tellus, 17, 321-333.

Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289-307.

Lorenz, E. N., 1989: Effects of analysis and model errors on routine weather forecasts. In: Ten years of medium-range weather forecasting. 4-8 September 1989, ECMWF Seminar Proceedings, Vol. I, 115-128. [Available from: ECMWF, Shinfield Park, Reading, RG2 9AX, UK.]

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble system: Methodology and validation. Q. J. R. Meteorol. Soc., 122, 73-119.

Reynolds, C. A., P. J. Webster, and E. Kalnay, 1994: Random error growth in NMC's global forecasts. Mon. Wea. Rev., 122, 1281-1305.

Snyder, C., 1996: Summary of an Informal Workshop on Adaptive Observations and FASTEX. Bull. Amer. Meteorol. Soc., in press.

Toth, Z., and E. Kalnay, 1993: Ensemble Forecasting at the NMC: The generation of perturbations. Bull. Amer. Meteorol. Soc., 74, 2317-2330.

Toth, Z., and E. Kalnay, 1995: Ensemble forecasting at NCEP. Proceedings of the ECMWF Seminar on Predictability. September 4-8, 1995, Reading, England, 39-60.

Toth, Z., and E. Kalnay, 1996a: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., under review. Also available as NMC Office Note 407. [Available from NCEP, 5200 Auth Rd., Camp Springs, MD 20746.]

Toth, Z., and E. Kalnay, 1996b: Climate ensemble forecasts: How to create them? Idojaras, in press.

Toth, Z., I. Szunyogh, and E. Kalnay, 1996: Singular, Lyapunov and bred vectors in ensemble forecasting. Preprints of the 11th AMS Conference on Numerical Weather Prediction, 19-23 August 1996, Norfolk, Virginia, p. 53-55.

Tracton, M. S. and E. Kalnay, 1993: Ensemble forecasting at NMC: Operational implementation. Wea. Forecasting, 8, 379-398.

Wobus, R. L., and E. Kalnay, 1995: Three years of operational prediction of forecast skill at NMC. Mon. Wea. Rev., 123, 2132-2148.

Zhu, Y., G. Iyengar, Z. Toth, M. S. Tracton, and T. Marchok, 1996: Objective evaluation of the NCEP global ensemble forecasting system. Preprints of the 15th AMS Conference on Weather Analysis and Forecasting, 19-23 August 1996, Norfolk, Virginia, p. J79-J82.