Quantitative Precipitation Forecast Verification Documentation

Michael Baldwin

NCEP/EMC - General Sciences Corp.

INTRODUCTION

This documentation describes how model quantitative precipitation forecast (QPF) verification is performed at NCEP. Because changes to this system are imminent, both the current and the new methods are described here.

One point that is important to mention is that we treat the model QPF as an areal average of the precipitation over an entire grid box. As a result, verifications should be made using analyses of observations that also represent an areal average. Given the high spatial variability of observed precipitation, verification of forecasts interpolated to station locations against single point observations should be avoided if possible. A single observation of precipitation is not representative of rainfall for the area surrounding the observation. It is for this reason that we limit the domain of verification to the regions of the U.S. that contain high-resolution precipitation data.

Model QPF should be provided on a grid that is the same as, or very similar to, that used in the forecast integration. Interpolation should be avoided whenever possible, since it tends to smooth out details and may result in a loss of information. Under the proposed changes to the verification system, further remapping will be performed at NCEP in a manner that spatially integrates the precipitation, and all models will be remapped to the same verification grid to allow direct comparison of scores.

OBSERVED DATA

The observed precipitation data used in the verification are based on data provided by the National Weather Service's River Forecast Centers (RFCs). The data consist of reports of 24h accumulated precipitation ending at 1200 UTC each day. A map showing the station locations can be found in Figure 1. This network has good spatial coverage and high resolution east of the Rockies, but has many gaps in coverage in the mountainous West. There are approximately 10,000 stations in this network. The data received by EMC do not contain reports of zero rainfall; only non-zero precipitation is found in the RFC data currently in use.

Figure 1. Station distribution for the RFC 24h accumulated precipitation data.

In order to fill in some of the gaps in the raingage data, caused either by missing reports or by the absence of reporting stations, these data are augmented by "bogus" radar reports. These "bogus" radar observations are generated by comparing Manually Digitized Radar (MDR) data to nearby raingage observations. The MDR data are reported hourly on a ~40km grid (1/4 LFM grid) and represent the maximum intensity level of the radar data within each grid box at the time of the observation. The intensity levels vary from 1 to 6. The intensity levels are summed for each grid box over the 24h period ending at 1200 UTC. This sum is compared to an analysis of the RFC raingage data, in which a simple average of all non-zero raingage observations within each MDR grid box is computed. At a grid box where the raingage analysis shows zero but the 24h MDR intensity sum is greater than zero, a weight factor is computed. This factor is as follows:

factor = [ sum(i=1,n) G(i) ] / [ sum(i=1,n) M(i) ]

where i=1,n represent all of the grid boxes immediately adjacent to the desired grid box in which both the raingage analysis G(i) and the 24h MDR intensity sum M(i) are non-zero. This weight factor is multiplied by the 24h MDR intensity sum at the desired box to create a "bogus" observation. This estimate is used as a raingage observation located at the center of the MDR grid box for the subsequent verification analysis.

Figure 2 shows an example of this procedure. In this figure, the center MDR grid box contains no RFC observations, presumably due to missing data or a lack of stations in the vicinity. The sum of the surrounding RFC analysis grid boxes is equal to 175.83 in this case, and the sum of MDR intensities is 352, therefore the factor is approximately 0.5. The resulting "bogus" observation would then be 17, and would be treated as a raingage observation at the center of the MDR box for all subsequent verification analyses.

Figure 2. Schematic for the MDR "bogus" procedure.

Note that this "bogus" procedure is used only when no raingage data are found within a ~40km MDR grid box and the MDR intensity indicates that some precipitation probably occurred. Also, at least two neighboring grid boxes containing both non-zero raingage and MDR reports are required in order for the comparison between gage and radar to be made.
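The weight-factor procedure above can be sketched as follows. This is an illustrative reconstruction, not the operational code; the function name and argument conventions are hypothetical, and the neighbor sums in the test case are taken from the Figure 2 example (gage sum 175.83, MDR sum 352, factor of roughly 0.5).

```python
def bogus_observation(rfc_sums, mdr_sums, center_mdr_sum):
    """Estimate a "bogus" gage observation for an MDR grid box with no
    raingage data.  rfc_sums and mdr_sums hold the raingage-analysis
    values and 24h MDR intensity sums for the adjacent grid boxes;
    only boxes where both are non-zero enter the comparison.
    Illustrative sketch of the procedure described in the text."""
    pairs = [(r, m) for r, m in zip(rfc_sums, mdr_sums) if r > 0 and m > 0]
    if len(pairs) < 2:  # at least two usable neighbors are required
        return None
    factor = sum(r for r, _ in pairs) / sum(m for _, m in pairs)
    # The factor scales the 24h MDR intensity sum at the center box.
    return factor * center_mdr_sum
```

With neighbor sums of 175.83 (gage) and 352 (MDR), the factor is approximately 0.5, as in the Figure 2 example.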

A quality control step is performed before all of these data (raingage and bogus) are used to create a verifying analysis. The mean and standard deviation of all reports from each RFC and from the MDR list are computed. The maximum allowable observation from each RFC (and from the MDR list) is three standard deviations greater than its mean observation. This rejects reports that are gross overestimates, but no other QC is performed on these data. This scheme has obvious problems when only a small number of reports occur within a given RFC.
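The three-standard-deviation check might look like the following minimal sketch; the function names are illustrative and not from the operational system.

```python
import statistics

def qc_max_allowed(reports):
    """Maximum allowable observation for one report list (an RFC's
    reports, or the MDR bogus list): three standard deviations above
    the mean of all reports in that list."""
    return statistics.mean(reports) + 3.0 * statistics.pstdev(reports)

def qc_filter(reports):
    """Discard reports that exceed the allowable maximum; no other QC
    is applied, so gross overestimates are the only rejections."""
    limit = qc_max_allowed(reports)
    return [r for r in reports if r <= limit]
```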

VERIFICATION GRID

The verification grid is the grid on which the model QPF is provided. The only exceptions are the Early and Meso Eta Models. The forecasts from these models are also remapped to an 80km grid (the old Early Eta 80km grid) so that their scores can be compared to results from other models with horizontal resolutions close to 80km. The remapping is performed in a manner that conserves the total precipitation to a desired level of accuracy. In addition, these models are verified on their native horizontal grids.

The verification domain is determined for each model grid, based upon a master list of stations found in the RFC raingage data. This list contains only those stations found in the lower 48 states. All model grid boxes which contain no raingages are excluded from the verification domain. For the Meso and Early Eta Models, the verification domain is also sub-divided into meteorologically significant regions (West Coast, Rockies, Southern Plains, etc., see Figure 3) so that regional skill scores and biases can be computed.

Figure 3. Regions for QPF verification.

ANALYSIS

The analysis scheme of the current system is a simple average of all non-zero observations found within each verification grid box. Since only non-zero reports are received, all grid boxes within the verification domain that have no reports are assumed to have zero observed precipitation.
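Since zero reports are not transmitted, the per-box analysis reduces to the following simple rule (an illustrative sketch; the function name is hypothetical):

```python
def analyze_box(reports):
    """Verifying analysis for one grid box in the current system:
    a simple average of all non-zero reports falling in the box.
    A box with no (non-zero) reports is assumed to have zero
    observed precipitation, since only non-zero reports arrive."""
    nonzero = [r for r in reports if r > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0
```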

REMAPPING

The remapping of the Meso (29km) and Early (48km) Eta Model QPFs to the 80km grid is done in a manner that integrates the precipitation from the original grid to the target grid. At each grid point of the original grid, a set of 16 sub-boxes (4x4) is created, each containing the precipitation amount of the original grid box. At each target grid box, all of the sub-boxes with centers located within the target grid box area are combined in an area-weighted average to obtain the target grid value.

Figure 4 shows an example of this technique. The shaded area represents the sub-boxes from the original Eta Model grid which are used to form the remapped target grid value.

Figure 4. Schematic for remapping algorithm. Thin solid lines denote origin grid boxes, dashed lines denote 4x4 sub-grid boxes. Thick solid lines denote a target grid box, shaded area denotes sub-grid boxes included in remapped target grid value.
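A one-dimensional sketch of the sub-box technique is given below. This is a simplification for illustration only: the operational scheme works in two dimensions with 4x4 sub-boxes and an area-weighted average, while this sketch uses a simple mean of the sub-cells falling in each target cell; the function name and arguments are hypothetical.

```python
import numpy as np

def remap_1d(orig_vals, orig_dx, targ_dx, nsub=4):
    """Split each origin cell into `nsub` equal sub-cells, each
    carrying the origin cell's precipitation value, then average the
    sub-cells whose centers fall inside each target cell."""
    sub_dx = orig_dx / nsub
    centers = (np.arange(len(orig_vals) * nsub) + 0.5) * sub_dx
    sub_vals = np.repeat(np.asarray(orig_vals, float), nsub)
    n_targ = int(np.ceil(len(orig_vals) * orig_dx / targ_dx))
    out = np.zeros(n_targ)
    for j in range(n_targ):
        in_cell = (centers >= j * targ_dx) & (centers < (j + 1) * targ_dx)
        out[j] = sub_vals[in_cell].mean() if in_cell.any() else 0.0
    return out
```

When the target cell is twice as wide as the origin cells, the remapped value is the mean of the two origin values, so the areal total is preserved.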

FORECAST PERIOD

Nearly all of the models are verified for the 00-24h, 12-36h, and 24-48h periods that cover the 1200-1200 UTC time frame. The MRF, MRFX, and ECMWF are verified for days 1, 2, and 3.

SCORES

NCEP has maintained an archive of statistics that can be used to compute several skill scores. For a set of eight threshold values, the following statistics are kept: F = number of forecast grid points greater than the threshold, O = number of observed points greater than the threshold, C = number of points both forecast and observed greater than the threshold, and T = total number of points within the verification domain. The eight thresholds are 0.01, 0.10, 0.25, 0.50, 0.75, 1.0, 1.5, and 2.0 inches. Usually, the equitable threat score [= (C-E)/(F+O-C-E), E=F*O/T ] and bias score [= F/O ] are computed; however, other types of skill scores can be produced from these statistics. Also, a contingency table is kept showing the number of points forecast between thresholds I and I+1 and observed between thresholds J and J+1. These matrices can be used to compute Gandin and Murphy (1992) types of scores.
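The two scores follow directly from the archived counts, as in this minimal sketch (the function name is illustrative):

```python
def equitable_threat_and_bias(F, O, C, T):
    """Equitable threat score and bias score from the archived counts:
    F forecast points, O observed points, C correctly forecast points
    (both forecast and observed), T total points in the domain.
    ETS = (C - E) / (F + O - C - E) with expected hits E = F*O/T;
    bias = F/O."""
    E = F * O / T
    ets = (C - E) / (F + O - C - E)
    bias = F / O
    return ets, bias
```

A perfect forecast (F = O = C) gives an ETS and bias of 1; over-forecasting the threshold area shows up as a bias greater than 1.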

PROPOSED CHANGES TO THE VERIFICATION

In order to simplify maintenance of the verification system, output the statistics in a standard format, and allow direct comparison of the scores from all of the models, some changes to this system are required. Changes to the ingest and storage of observed data due to the removal of the NAS-9000 are imminent (scheduled for 1 Jan 97) and will result in a change to the available raingage data. Since this change is unavoidable, it is an appropriate time to include other changes as well. The current system will be maintained as much as possible, however, to preserve continuity with the multi-year score archive that has been accumulated.

OBSERVED DATA

The observed precipitation data that will be used in the near future are provided to NCEP by the RFCs in SHEF format ("HYD" data). Currently, there are approximately 5000 stations available via this network. A plot of the station locations can be found in Figure 5. Compared to the previous data set, the areal coverage is reduced in the east, but increased in the western U.S., particularly in southern California. Also, reports of zero are included. There will NOT be any "bogus" radar reports used in the new verification system. We might augment this data set with 24h sums of 1h data ("HADS"), which may add another 1000 reporting sites which are not found in the 24h accumulated ("HYD") data.

Figure 5. Station distribution for the HYD data set as of 12/08/96.

Further into the future, we will use an hourly, high-resolution, multi-sensor (radar, gage, satellite) precipitation analysis to drive the verification. This will allow us to verify at the highest resolution in both time and space.

For us, automated quality control of precipitation data is uncharted territory. A first step will be to use a method similar to the current scheme, except that all raingage reports will be included in determining the mean and standard deviation. The maximum allowable observation from the entire list (5000+ stations) will be three standard deviations greater than the mean observation. This will reject reports that are gross overestimates. Other, yet-to-be-defined multi-sensor QC methods may be attempted, likely within the radar/gage/satellite precipitation analysis mentioned above.

VERIFICATION GRID

We will likely use several different verification grids. To start, we plan to use two: an 80km Lambert Conformal grid (AWIPS #211) and a 40km Lambert Conformal grid (AWIPS #212). The intention is for the 80km grid to measure forecast skill at scales larger than the mesoscale, and for the 40km grid to measure mesoscale skill. These grids were chosen because they are commonly produced by NCEP's operational models, several plotting packages can easily display them, and they are not currently used by any of the forecast models verified by NCEP. Therefore, all model QPFs will have to be remapped to these verification grids before verification. The remapping process will be described in a later section.

The verification domain will be determined on a daily basis via the observation analysis scheme. This will be based upon the locations of the rainfall reports for a given day, and will exclude regions where data are missing. Verification grid points that are greater than ~40 km (to be determined) from a raingage station will be excluded from the verification domain. The verification domain will be sub-divided into meteorologically significant regions (West Coast, Rockies, Southern Plains, etc.), as is currently the case with the Early and Meso Eta Models, so that regional skill and biases can be computed.

ANALYSIS

The raingage data will be analyzed to a high resolution grid (~10 km) using a Barnes-type analysis scheme (distance-weighting). The "zero line" will be adjusted based upon nearby observations so that any grid point that is surrounded by 3 observations of zero will be set to zero. This is to reduce the problem of "smearing" non-zero values that distance-weighting type analysis schemes tend to have. The high resolution analysis will be remapped to the verification grids using the same technique as the QPFs.
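A single-pass, Gaussian distance-weighted estimate with the "zero line" adjustment might look like the sketch below. This is a simplified illustration, not the operational analysis: the weighting constant kappa, the cutoff defining "nearby" observations, and the function name are all assumed values chosen for the example.

```python
import numpy as np

def barnes_point(x, y, obs_x, obs_y, obs_val, kappa=100.0, n_zero=3):
    """Barnes-type (Gaussian distance-weighted) estimate at one
    analysis grid point.  If at least `n_zero` nearby observations
    are zero, the point is set to zero ("zero line" adjustment) to
    limit the smearing of non-zero values that distance-weighting
    schemes tend to produce."""
    d2 = (obs_x - x) ** 2 + (obs_y - y) ** 2
    w = np.exp(-d2 / kappa)
    near = d2 < 9.0 * kappa  # crude definition of "surrounding" obs
    if np.count_nonzero(near & (obs_val == 0)) >= n_zero:
        return 0.0
    return float(np.sum(w * obs_val) / np.sum(w))
```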

The philosophy behind this process is as follows. The high resolution analysis is intended to estimate the values that a high resolution raingage network would provide. The remapping should then provide a reasonably good estimate of the areal-averaged precipitation on the verification grid.

REMAPPING

All grid-to-grid remapping of precipitation will be done in a manner that maintains, to a desired accuracy, the total precipitation on the original grid. We refer to it as "remapping" rather than "interpolation" to avoid confusion with interpolation schemes, such as linear interpolation, which do not have this property. The new remapping algorithm does a nearest-neighbor interpolation from the original grid to a set of points (5x5) arranged in a square box centered around each target grid point. In Figure 6, the shaded region represents sub-grid points that contain nearest-neighbor values from original grid point #1, and the unshaded region contains values from original point #2. A simple average of these target sub-grid points produces the remapped value at the target grid point.

Figure 6. Schematic of the new remapping algorithm. Origin grid boxes are denoted by solid thin lines, with numbered circles at centers. A sample target grid box is denoted by a thick, solid line, with a 5x5 set of points within.

In the limit as the discretization becomes infinitely small, the new and current remapping methods are identical. Both techniques are a redistribution of the areal-average precipitation from the original grid to the target grid using relatively small, discrete sections.
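A one-dimensional sketch of the new algorithm is given below; the operational scheme uses a 5x5 set of sub-points in each two-dimensional target box, and the function name and arguments here are illustrative.

```python
import numpy as np

def remap_nn_1d(orig_x, orig_vals, targ_x, targ_dx, nsub=5):
    """New remapping, 1-D sketch: place `nsub` evenly spaced
    sub-points across each target cell, fill each sub-point with the
    value of the nearest origin grid point, and average them to get
    the target cell value."""
    orig_x = np.asarray(orig_x, float)
    orig_vals = np.asarray(orig_vals, float)
    out = np.empty(len(targ_x))
    for j, xc in enumerate(targ_x):
        subs = xc - targ_dx / 2 + (np.arange(nsub) + 0.5) * targ_dx / nsub
        nearest = np.abs(subs[:, None] - orig_x[None, :]).argmin(axis=1)
        out[j] = orig_vals[nearest].mean()
    return out
```

As the sub-point spacing shrinks, this converges to the same areal redistribution as the current 4x4 sub-box method.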

FORECAST PERIOD

NCEP will verify all forecast periods for models providing QPF which covers the 1200-1200 UTC time frame. This will include the 0600 and 1800 UTC AVN runs, and forecast periods past 48h for models that provide such forecasts. Shorter accumulation periods (1h, 3h, 12h, etc.) will be verified using an accumulated hourly multi-sensor precipitation analysis after this becomes operational.

SCORES

NCEP will maintain an archive of statistics that can be used to compute several skill scores. For a set of threshold values, the following statistics will be kept: F = number of forecast grid points greater than the threshold, O = number of observed points greater than the threshold, C = number of points both forecast and observed greater than the threshold, and T = total number of points within the verification domain. The proposed thresholds are 0.2, 2, 5, 10, 15, 25, 35, 50, and 75 mm. The equitable threat score [= (C-E)/(F+O-C-E), E=F*O/T ] and bias score [= F/O ] will be computed; however, a large number of other skill scores can also be produced from these statistics.

Other yet-to-be-determined grid-to-grid verification statistics will also be computed. We may try to verify the mean distance error for given thresholds, which will provide information on skill that the equitable threat score does not currently provide. We may verify the volume of precipitation over river basins, to provide hydrologists with information on model QPF skill that they find most useful. Verification of the horizontal scales of precipitation systems will also be attempted.

Send your comments to: Mike.Baldwin@noaa.gov