GLOBAL ENSEMBLE FORECASTING AT NCEP

 

 

Zoltan Toth1

Environmental Modeling Center, NCEP


 












Acknowledgements:

Yuejian Zhu1 EMC
 
 
 
 
 
 

http://sgi62.wwb.noaa.gov:8080/ens/enshome.html


PERSONNEL: ensemble (+targeted obs)

NAME               PERIOD   TIME(%)   YRS

Eugenia Kalnay     91-96    20        1

Zoltan Toth        91-00    80        7 +2

Yuejian Zhu        95-00    40        2.5

Richard Wobus      96-00    50        2.5

Timothy Marchok    96-98    20        0.5

Istvan Szunyogh    99-00    50        1 +2

Steve Tracton      92-96    20        1

TOTAL: 15.5 person-years; 1.5 persons over 10 yrs

OUTSIDE COLLABORATORS

Gopal Iyengar New Delhi, NCMRWF

Stephane Vannitsem Brussels, RMI

Jon Ahlquist Tallahassee, FSU

Leonard Smith Oxford, Univ. Oxford

Jeff Whitaker Boulder, NOAA/CDC - CIRES

Warren Tennant Pretoria, SAWB

Craig Bishop State College, PSU

Ron Gelaro Monterey, NRL

Kerry Emanuel Boston, MIT

Chris Snyder Boulder, NCAR

Marty Ralph Boulder, NOAA/ETL

SUPPORTED DEVELOPMENT OF ENSEMBLE FCSTING IN:

Canada, FNMOC, India, South Africa, Japan

PUBLICATIONS: 20+ refereed, 60+ other

7 NATL/INTERNTL MEETINGS ORGANIZED ON PREDICTABILITY


HOW TO CONSTRUCT PROB FCSTS?

1) Statistical methods (analogs, etc) - for long range

2) Based on single control forecast
+ past verification statistics

+ MOS or other techniques

3) Based on ensemble of NWP fcsts
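As an illustration of option 3, a minimal sketch (array shapes, threshold, and synthetic data are assumptions, not operational code): event probabilities estimated as the fraction of ensemble members forecasting the event.

```python
import numpy as np

def ensemble_event_probability(members, threshold):
    """Estimate P(event) as the fraction of ensemble members
    forecasting the event (value >= threshold) at each point.
    members: array of shape (n_members, n_points)."""
    return (members >= threshold).mean(axis=0)

# Hypothetical example: 14 perturbed members, 24-h precip (mm) at 5 points
rng = np.random.default_rng(0)
members = rng.gamma(shape=2.0, scale=5.0, size=(14, 5))
print(ensemble_event_probability(members, threshold=10.0))
```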


TWO ISSUES IN PERTURBATION GENERATION

1) ESTIMATING ANALYSIS UNCERTAINTY
2) Sampling analysis errors

ESTIMATING ANALYSIS UNCERTAINTY:
In future -
Advanced assimilation techniques (eg, Kalman filter) =>

With perfect 4DVAR,
errors are mostly the leading LLVs (bred vectors)
Pires, Vautard & Talagrand, 1995

At present -
FAST GROWING errors can be estimated via breeding
Toth and Kalnay, 1993

Is rest of spectrum relevant? - Houtekamer


In breeding, perturbed forecasts and rescaling
mimic the impact of observations on analysis -

Dynamically fastest growing perturbations in analysis cycle
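A minimal sketch of a breeding cycle, assuming a toy model (Lorenz 1963) in place of the NWP model and a simple global RMS rescaling; the operational scheme (Toth and Kalnay, 1993) breeds on the analysis cycle with regionally varying rescaling.

```python
import numpy as np

def model(x, dt=0.01, steps=600):
    """Toy nonlinear model (Lorenz 1963) standing in for the NWP model."""
    for _ in range(steps):
        dx = 10.0 * (x[1] - x[0])
        dy = x[0] * (28.0 - x[2]) - x[1]
        dz = x[0] * x[1] - (8.0 / 3.0) * x[2]
        x = x + dt * np.array([dx, dy, dz])
    return x

size0 = 0.1                                 # assumed analysis-error amplitude
control = np.array([1.0, 1.0, 20.0])        # stands in for the analysis cycle
bred = control + size0 * np.random.default_rng(1).standard_normal(3)

for cycle in range(20):
    control = model(control)                # control trajectory
    bred = model(bred)                      # perturbed forecast
    diff = bred - control                   # grown perturbation
    diff *= size0 / np.linalg.norm(diff)    # rescale, mimicking effect of obs
    bred = control + diff                   # bred perturbation for next cycle
```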


ANALYSIS ERRORS VS. BRED VECTORS (Iyengar et al, Poster)

1) Correlate well in a statistical sense (0.8 in vertical)

2) Sizable projection on each other on daily basis

Bred vectors are present in analysis errors (60,000 obs used)
- But there are other errors as well =>
How to sample analysis errors?
 

INTRODUCTION - 2

DEFINITIONS: Lyapunov and Singular Vectors (SV)

INTRODUCTION - 3

2) BVs - Theoretically less understood

Extension of Lyapunov concept into nonlinear domain - Boffetta

Combination of naturally fastest growing nonlinear perturbations

Characteristics:

Cheap to generate

No optimization - Not optimal for any specific problem -

Can be used to give general solution?

Is nonmodal behavior important?

Are decaying/trailing LLVs important?

Norm not critical in definition

More representative of physical system?

Nonlinear features - No need for tangent linear model/approach

Can they be uniquely defined?

For complex systems, not even linear LLVs may be unique

3) Arguments from both sides -

WHICH APPROACH IS MORE APPLICABLE IN

ENSEMBLE FCSTING AND TARGETING?

TWO PROBLEMS IN PERTURBATION GENERATION

1) Estimating analysis uncertainty

2) SAMPLING ANALYSIS ERRORS

SAMPLING ANALYSIS ERRORS:
 

1) BREEDING (NCEP, FNMOC, JMA)
Random sampling in subspace of fastest growing possible anal. errors

2) MULTIPLE ANALYSIS CYCLES (CMC)
Random sampling in FULL space of possible analysis errors (including neutral/decaying errors) =>
Serious overrepresentation of selected decaying errors ("noise") - Bishop, 2000

3) SINGULAR VECTORS (ECMWF):

Theoretically good approach. Must use analysis error covariance info

Potential problems:

a) Current applications do not constrain solutions by analysis error covariance info (instead use Total Energy, inappropriate)

b) Very expensive. If using anal. cov. info, order of magnitude more CPU needed than for running ensemble fcsts themselves at same resol.

c) Solve separate problem for each application?

Separate ensembles for day 3, 4, 5,...? Not practical

d) How much do we gain compared to random sampling (breeding)?

Possibly not much Yoden et al., 1999

CONFIGURATION SINCE 28 JUNE 2000


INCREASED MEMBERSHIP:

Add 6 perturbed forecasts at 1200 UTC

INCREASED HORIZONTAL RESOLUTION:

From T62 to T126 for first 60 hrs for all members

3-fold CPU increase well within 5-fold increase in capabilities -

Ensembles are ideal for parallel computing!

Acknowledgements to: M. Brown, D. Michaud, J. Irwin, others

Need more CPU resources to remain competitive

            NCEP                              ECMWF

1994        14 members, T62                   32 members, T62

2000        20 members, T126 for 60 hrs       50 members, T159

2001, plan  40 members, T126 for 180 hrs      50 members, T255

FORECAST PRODUCTS


1) EXPECTED FCST VALUE:

a) Ensemble mean

Flow dependent filtering (time mean not needed now)

b) Ensemble mode -

Most likely scenario
 
 

2) FORECAST RELIABILITY:


a) Spaghetti diagrams

Show possible scenarios

b) Ensemble spread -

Degree of agreement

c) Normalized spread

Influence of Season/Region/Lead time removed

d) Relative measure of predictability

Shows higher/lower than average predictability

(influence of climate distribution also removed)

e) Probabilistic forecasts

PQPF, etc
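A minimal sketch of products 1a, 2b, and 2c above (synthetic data; the climatological spread used for normalization is an assumed stand-in):

```python
import numpy as np

# Hypothetical ensemble of 500 hPa height forecasts: (n_members, n_points)
rng = np.random.default_rng(2)
members = 5500.0 + 50.0 * rng.standard_normal((20, 100))

ens_mean = members.mean(axis=0)     # 1a) expected value (flow dependent filter)
spread = members.std(axis=0)        # 2b) degree of agreement among members

# 2c) normalized spread: divide by an (assumed) climatology of spread for
# the same season/region/lead time, removing those influences
clim_spread = np.full(100, 45.0)
norm_spread = spread / clim_spread
```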

DISTRIBUTION CHANNELS

1) RAW DATA:

OSO, NCEP ftp servers
 
 

2) GRAPHICAL PRODUCTS:

a) Gif (Web):

http://sgi62.wwb.noaa.gov:8080/ens/enshome.html

b) GEMPAK: NCEP centers?

c) Redbook: AWIPS

USER BASE

1) NCEP Centers:

HPC, CPC, TPC, others?

2) NWS fcst offices:

WFOs in different regions -

What's needed for more systematic use? AWIPS???

3) Other agencies:

USAF

4) Private companies:

Energy, insurance, etc. industries

5) International:

Public and private users from Central and South America, Africa, Europe, New Zealand

VERIFICATION OF PROB FCSTS


1) TRADITIONAL MEASURES:

Categorical fcsts: Average hit rate for the mode

Point fcsts: Error in expected value, median, etc
(RMS, Pattern Anomaly Correlation)
 
 

2) DISTRIBUTIONAL MEASURES:

a) Talagrand (Verification Rank) diagram
- measures consistency only

b) Reliability diagrams

c) Brier Skill Score

d) Ranked Probability Skill Score

e) Relative Operating Characteristics

f) Information content

g) Economic Value
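Minimal sketches of two of the measures above (Brier score and the Talagrand diagram), under assumed array shapes:

```python
import numpy as np

def brier_score(prob_fcsts, outcomes):
    """Brier score for probabilistic forecasts of a binary event
    (0 = perfect; skill is usually judged relative to climatology)."""
    return np.mean((prob_fcsts - outcomes) ** 2)

def talagrand_counts(members, verif):
    """Talagrand (verification rank) diagram: the rank of the verifying
    value among the N ordered members defines N+1 bins that are equally
    likely if the ensemble is statistically consistent.
    members: (n_members, n_cases); verif: (n_cases,)"""
    ranks = np.sum(members < verif[None, :], axis=0)  # 0..N for each case
    return np.bincount(ranks, minlength=members.shape[0] + 1)
```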

MAIN CHARACTERISTICS OF PROB FCSTS

 
 

1) Must be "sharp", have lots of information wrt climatology

(called RESOLUTION)

A N D
 

2) Must be consistent with observations
(called RELIABILITY)


HOW TO COMPARE CONTROL VS ENSEMBLE?

1) ACCURACY: Compare control with ensemble mean

2) PROBABILITIES: In framework of 10 climate bins

CONTROL: Yes or No fcst for an event
ENSEMBLE: Full probability distribution

"UPGRADE" control: Based on past verification, can be calibrated (like ens.)
"DOWNGRADE" ensemble: Take probability at mode (Pm), distribute (1-Pm) over 9 other bins
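A minimal sketch of the two conversions (the function names and the even redistribution over non-modal bins are assumptions based on the description above):

```python
import numpy as np

def downgrade_ensemble(mode_bin, p_mode, n_bins=10):
    """'Downgrade' the ensemble: keep probability Pm in the modal climate
    bin and distribute (1 - Pm) evenly over the other n_bins - 1 bins."""
    p = np.full(n_bins, (1.0 - p_mode) / (n_bins - 1))
    p[mode_bin] = p_mode
    return p

def upgrade_control(fcst_bin, hit_rate, n_bins=10):
    """'Upgrade' the yes/no control: calibrate with past verification by
    assigning the historical hit rate to the forecast bin."""
    return downgrade_ensemble(fcst_bin, hit_rate, n_bins)
```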


RESOLUTION OF ENSEMBLE BASED PROB. FCSTS

QUESTION:

What are the typical variations in foreseeable forecast uncertainty?

What variations in predictability can the ensemble resolve?

METHOD:

Ensemble mode value to distinguish high/low predictability cases

Stratify cases according to ensemble mode value -

Use 10-15% of cases when ensemble mode value is highest/lowest

DATA:

NCEP 500 hPa NH extratropical ensemble fcsts for March-May 1997

14 perturbed fcsts and high resolution control

VERIFICATION:

Hit rate for ensemble mode and hires control fcst
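A minimal sketch of the stratification step (the 12.5% fraction and input arrays are assumptions):

```python
import numpy as np

def hit_rates_by_mode(mode_probs, hits, frac=0.125):
    """Stratify cases by ensemble mode value and compare hit rates in
    the ~10-15% least vs. most confident cases.
    mode_probs: (n_cases,) ensemble mode values
    hits: (n_cases,) 1 if the mode forecast verified, else 0"""
    order = np.argsort(mode_probs)
    k = max(1, int(frac * len(order)))
    return hits[order[:k]].mean(), hits[order[-k:]].mean()
```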


SEPARATING HIGH VS. LOW UNCERTAINTY FCSTS


THE UNCERTAINTY OF FCSTS CAN BE QUANTIFIED IN ADVANCE

HIT RATES FOR 1-DAY FCSTS

CAN BE AS LOW AS 36%, OR AS HIGH AS 92%

10-15% OF THE TIME A 12-DAY FCST CAN BE AS GOOD, OR A 1-DAY FCST CAN BE AS POOR, AS AN AVERAGE 4-DAY FCST

1-2% OF ALL DAYS THE 12-DAY FCST CAN BE MADE WITH MORE CONFIDENCE THAN THE 1-DAY FCST

AVERAGE HIT RATE FOR EXTENDED-RANGE FCSTS IS LOW - VALUE IS IN KNOWING WHEN FCST IS RELIABLE


WHAT MAKES FCSTS BETTER / MORE USEFUL?

1) More / better quality data - within 25 years:
10% 2-day error reduction, 6-hr gain

2) Improved analysis schemes

- within 6 years:
10% 5-day AC improvement, 12-hr gain

3) Better fcst models
 

4) Use of ensembles: 25-30% 5-day Brier score improvement, 24-hour gain

CONTROL: Yes or No fcst for an event
ENSEMBLE: Full probability distribution

IDENTIFYING AREAS OF FLOW DEPENDENT

SYSTEMATIC MODEL ERROR


Z. Toth & Y. Zhu

Fcst errors are due to errors in:

Initial condition

Boundary condition - Missing/poor model for boundary

Imperfect model - Resol., approx. in physics, etc.

Attribution of error is very difficult (e.g., snow storm of 20000125)

- ambitious goal

Ensemble fcsts as diagnostic tool to identify systematic model error

Setup: Use consecutive ensembles

Mark areas where cloud of all fcsts misses verifying analysis

Measure normalized distance of ens mean from analysis
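A minimal sketch of this setup (assumed array shapes; the operational normalization may differ):

```python
import numpy as np

def outlier_mask(members, analysis):
    """Mark gridpoints where the verifying analysis falls outside the
    cloud of all ensemble forecasts.
    members: (n_members, n_points); analysis: (n_points,)"""
    return (analysis > members.max(axis=0)) | (analysis < members.min(axis=0))

def normalized_distance(members, analysis):
    """Distance of the ensemble mean from the analysis, normalized by
    the ensemble spread at each gridpoint."""
    return np.abs(members.mean(axis=0) - analysis) / members.std(axis=0)
```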


OUTLIER STATISTICS

Assumption 1:

Model (and ensemble formation) is perfect =>

Each member equally likely

Verifying analysis is indistinguishable from members =>

Each interval between ordered members equally likely,

including open ended extreme intervals


N-member ensemble defines N+1 intervals -

Chance of verif. analysis falling outside of ensemble: 2/(N+1)

Probability that four 10-member independent ensembles miss verific.:

P=(2/11)*(1/11)**3 = 0.0001366026

But analysis errors may be temporally correlated -

Assumption 2:

Analysis errors are independent only every 24 hrs (every 4th)

Probability that two 20-member independent ensembles miss verific.:

P=(2/21)*(1/21)=0.0045

NH extratropics: K=3456 gridpoints:

Expected number of outliers for 4 consecutive ensembles: K*P = ~16

8 consecutive ensembles: P=(2/21)*(1/21)**3, K*P=0.035

If significantly more outliers =>

PERFECT MODEL ASSUMPTION DOES NOT HOLD
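The arithmetic above can be reproduced directly (a sketch; the pooling of consecutive dependent ensembles into independent 20-member sets follows the text's assumption 2):

```python
K = 3456                          # NH extratropical gridpoints

# Assumption 1: four independent 10-member ensembles (11 intervals each;
# 2/11 for the first miss, 1/11 for each further miss on the same side)
p4 = (2 / 11) * (1 / 11) ** 3
print(p4)                         # ~0.000137

# Assumption 2: independence only every 24 hrs -> 4 consecutive ensembles
# act as two independent 20-member ensembles (21 intervals)
p_day = (2 / 21) * (1 / 21)
print(p_day, K * p_day)           # ~0.0045, ~16 expected outliers

# 8 consecutive ensembles -> four independent 20-member sets
p8 = (2 / 21) * (1 / 21) ** 3
print(p8, K * p8)                 # ~1.0e-5, ~0.035 expected outliers
```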

OUTLIER RESULTS

1) Number of outliers much larger than expected

2) Areas where 8-12-16 consecutive ensembles fail

CAN THESE ERRORS BE INITIAL VALUE RELATED? NO:

a) These errors develop and travel slowly with synoptic system

(Unlike initial value related error that travels faster)

b) Errors in initial condit. for a system with various fcst lead times

are from widely different geographical areas =>

VERY UNLIKELY THAT SAME (pos. or neg.) ANALYSIS ERROR IS

PRESENT AT NUMEROUS TIMES AND PLACES

FURTHER ISSUES

Qualitative results - only areas defined (though highly significant)

Can error estimates be quantified?

Comparison with HPC subjective model bias evaluation?

(Initial & model errors mixed?)

Vis5D application? What variable/level problem shows up first?

20000125 snow storm - Strong indication of model error (T62 ens)

FLOW DEPENDENT SYSTEMATIC MODEL ERRORS

POSSIBLE USE/SIGNIFICANCE IN:

Model development - Identify problem areas

Test model on significant cases in ens mode

Ensemble development -

Perturb model during integration to represent

remaining model related uncertainty

Analysis - Compare ens. cloud at various lead times w. obs.

to recognize areas of model error?


12-hr fcsts can have MORE UNCERTAINTY than 9-day fcsts

Daily extended range weather prediction:

Possible in LOW UNCERTAINTY cases

If variations in hit rates due to model error could be foreseen:

RESOLUTION WOULD INCREASE

ATMOSPHERIC OBSERVATIONS

CURRENT PRACTICE:

Most observations are taken at fixed times or as opportunities arise

ADAPTIVE APPROACH:

Some observations taken adaptively to maximize anal./fcst impact
 

TARGETED OBSERVATIONS

IMPROVE PARTICULAR FCST FEATURE:
Eg, 3-day precip fcst over ne US

1) How to select fcst feature?
a) Uncertainty/information content in fcst
b) Societal impact: Is uncertainty tolerable?
 

2) How to identify sensitive area to be observed?
(i) Adjoint sensitivity calculations
(ii) Ensemble transform technique
 

3) How to take observations?
(i) Dropsondes released from manned aircraft
(ii) Unmanned aircraft
(iii) Balloons
(iv) Satellite

ENSEMBLE TRANSFORM TECHNIQUE

GOAL:
Try to reduce expected fcst error at time t2, location V(erif)
PROBLEM:
Locate sens area where extra obsv. at t1 best achieve goal
METHOD:
Based on nonlinear ensemble - Bishop and Toth, 1996

Bishop and Majumdar, 2000
Variance = uncertainty under standard observational network
TRANSFORM ENSEMBLE to see effect of extra observations
Variance = uncertainty with extra obs. added at location X
MOVE X to see if variance at t2 optimally reduced at Verif area
TRANSFORMATION:
Linear combination of ensemble perturbations -
SVD in vector space of ensemble perturbations at t1 and t2
With 1st SV: maximize VAR(V,t2) per unit VAR(GLOBTARG,t1) - VERY EFFICIENT
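A minimal sketch of the transformation step, assuming P1 and P2 hold ensemble perturbations restricted to the candidate observation area at t1 and the verification area at t2 (the norms and array shapes are assumptions; see Bishop and Toth, and Bishop and Majumdar, for the full method):

```python
import numpy as np

def leading_transform(P1, P2):
    """Leading linear combination of ensemble perturbations maximizing
    verification-area variance at t2 per unit variance at t1.
    P1: (n_points_t1, n_members) perturbations at t1 (obs-area norm)
    P2: (n_points_t2, n_members) perturbations at t2 (verif-area norm)
    Returns member weights and the variance ratio achieved."""
    # Whiten with respect to the t1 norm, then an SVD picks the direction
    # of maximum variance growth into the verification area.
    U1, s1, V1t = np.linalg.svd(P1, full_matrices=False)
    W = V1t.T / s1                  # whitened coords -> member weights
    _, s2, V2t = np.linalg.svd(P2 @ W, full_matrices=False)
    return W @ V2t[0], s2[0] ** 2
```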

WINTER STORM RECONNAISSANCE PROGRAM

January 16 - February 16, 2000

BASED ON: WSR1999 & other field programs

NEW ELEMENT: Transition toward operations

EACH MISSION:

Requested by field/HPC forecasters to support critical weather fcsts
Operational needs

Among predesigned flight tracks, best is selected objectively
SDM training

Dropsonde flight missions carried out by AOC & USAF Reserve

COLLABORATIVE EFFORT:
Regions => HPC => EMC/SDM => AOC/USAF Reserve

Forecast feature => Sensitive area => Aircraft operations

TOTAL OF 12 MISSIONS, 300 DROPSONDES:
5 NOAA G-IV (from Anchorage) and
10 USAF C-130 (from Honolulu) flights

ALL DATA USED OPERATIONALLY

DATA IMPACT EVALUATION:
Near real time parallel assimilation/fcst cycle with dropsonde data excluded:

http://sgi62.wwb.noaa.gov:8080/ens/target/wsr2000.html

DAILY DECISION PROCESS FOR TAKING
TARGETED OBSERVATIONS

HPC: List of significant fcst events: Time, Lat/Lon

PRIORITIZE

Objective guidance can be developed based on ensemble

EMC/SDM: Sensitivity computations for each event:

General guidance
Best flight tracks
Expected data impact

Based on results and priority of each event and available resources,

DECIDE WHETHER TO FLY, AND WITH WHICH PLANE(S)

Can be fully automated for other observing systems


Eleventh AMS Conference on Numerical Weather Prediction (August 19-23, 1996, Norfolk, VA) Preprint Volume Back Cover: Another application of ensemble prediction is to identify where and when additional observations can most effectively improve forecasts. Observing system platforms, such as aircraft-released dropwinsondes, can then be "targeted" to specific regions where they will have maximum impact. Such a system is under active investigation as a component in the North American Observing System (NAOS) program (McPherson, invited talk; Lord, this volume), and in the North Atlantic field experiment (FASTEX).
In the illustration, the NCEP MRF 4.5-day 500 hPa height forecast, valid 1200 UTC 15 April 1996 (in blue, top left), displays an intense trough over the eastern third of the U.S. The ensemble spread valid at this time (top right) indicates considerable uncertainty and potential for large errors with this system (which, in fact, occurred; see error field in orange, top left panel). The evolution of this uncertainty can be traced back to, for example, 24 hours after the initial time by an inexpensive singular value decomposition procedure applied to the set of nonlinear ensemble forecasts (Bishop and Toth, this volume). In this case the procedure identifies a region in the east-central Pacific (minimum in bottom right figure) which is associated with a trough in the 24-hour MRF prediction (bottom left). It is in this area that targeted observations at that time are expected to most improve the ensuing 3.5-day forecast over the eastern U.S.

Fourth Symposium on Integrated Observing Systems (January 10-14, 2000, Long Beach, CA) Preprint Volume Back Cover: Over the past few years targeted observations, where data are taken in order to improve particular forecast features, have been tested and become part of the observing system. During the 15 missions of the quasi-operational Winter Storm Reconnaissance program in January-February 1999, dropsonde observations were adaptively collected over the northeast Pacific (top left panel, red dots) in areas that were found sensitive to forecast errors developing downstream over the continental US (shades of blue). Analysis/forecast cycles were run both with and without the dropsonde data. The contour lines in the top panel show the average difference between analyses with and without the use of the targeted data for the surface pressure. The middle right panel shows the average location of the verification regions (at 48-hour lead time) that were selected in real time by operational forecasters, expecting the possible occurrence of severe weather associated with potentially large forecast errors (dashed blue ellipsoid). The contour lines in the middle panel show the actual average surface pressure forecast error, indicating that the forecasters identified the forecast problem areas well in advance. Also shown in this panel is the average rms forecast error reduction (shades of red), indicating that the impact of the data is where it was intended to be, within the predefined verification region, over the area of maximum forecast errors. The bottom left panel shows a scatterplot of rms wind errors (1000-250 hPa, measured against observations within the predefined verification regions) for 25 forecasts with and without the use of the targeted data. Combined verification statistics using wind, surface pressure and accumulated precipitation indicate that in most (18 out of 25) cases the forecasts improved where and when intended due to the use of the adaptive observations. The 10-20% regional rms error reduction in the operationally most critical weather situations is comparable to the error reduction in Northern Hemisphere extratropical rms errors due to general improvements in data quality and quantity over the past 25 years. (For further details, see Toth et al., this volume.)

Eleventh AMS Conference on Numerical Weather Prediction (August 19-23, 1996, Norfolk, VA) Preprint Volume Front Cover: Over the past few years ensemble forecasting has become an important component of numerical weather prediction and operational forecasting (Tracton et al., this volume). As an illustration, the 5640 m single contour ("Spaghetti") diagram of the 500 hPa height is shown at 4.5 days lead time (valid at 1200 UTC on October 20, 1995, top left), displaying all 17 members of the NCEP global ensemble (Kalnay and Toth, this volume). The yellow dotted and solid green lines represent the high resolution (T126) control forecasts (started on the 16th and 15th at 0000 UTC and 1200 UTC, respectively), while the red and blue lines, respectively, are the perturbed forecasts about the two controls. The verifying analysis is shown as a heavy black line.
The central schematic illustrates the divergence of solutions as a result of analysis uncertainties. Out of the 17 members, two dominant clusters of 8 (top right) and 7 (bottom right, including the two controls) forecasts were formed in this case, indicating the possibility of two distinctly different flow patterns at day 4.5. The verification (heavy black line) falls within the first cluster, which indicated a deeper and slower developing trough than the controls alone would suggest. For additional synoptic examples and other products derived from the ensemble, see Wobus et al. (this volume).
Probabilistic forecasts from the NCEP ensemble have demonstrated useful resolution and sharpness, and are very reliable, as displayed for the 4.5-day 500 hPa NH extratropical height forecasts (bottom left). Probabilistic forecasts (abscissa) are made for 10 climatologically equally likely bins, and the relative occurrence of the verifying analysis in each bin is then accumulated conditional on the forecast probability (ordinate). The ensemble-based probabilistic forecasts for February 1996 were calibrated using independent verification data from January 1996 (Zhu et al., this volume).


1. General Sciences Corporation (Laurel, MD) at NCEP; Zoltan.Toth@noaa.gov; (301) 763-8000 ext. 7268