NCEP Verification System User Guide 
Geoff DiMego, Hui-Ya Chuang, and Mary Hart


INTRODUCTION:
The NCEP Verification System, which generates the Verification 
Statistics Data Base (VSDB), is divided into three parts: 
1) the "editbufr" that thins the observation PREPBUFR files, 
2) the "prepfits" that interpolates model forecast GRIB files to 
the observation sites, and 
3) the "gridtobs" that computes and generates VSDB records.  
The VSDB is in a self-documenting straightforward ASCII format. 
The format and the database are described at 
http://www.emc.ncep.noaa.gov/mmb/papers/brill/VSDBformat.txt .
The database is a collection of flat files that can be left as 
individual files or can be concatenated together into larger files.  
These ASCII records contain the raw numbers from which many final 
statistics can be computed.  NCEP calls these numbers partial 
sums.  The format can also record final statistics for any domain 
or time period that the self-documenting format allows.
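
To illustrate how final statistics follow from partial sums, here is 
a minimal Python sketch.  It ASSUMES the five scalar partial sums 
usually associated with the SL1L2 statistic type (the means of f, o, 
f*o, f*f and o*o over N forecast/observation pairs).  The class and 
function names are hypothetical and are not part of the NCEP codes; 
the VSDB format document cited above remains the authority on the 
record layout.

```python
import math
from dataclasses import dataclass


@dataclass
class PartialSums:
    """Assumed scalar partial sums: means over n forecast/observation pairs."""
    n: int      # number of pairs
    f: float    # mean forecast
    o: float    # mean observation
    fo: float   # mean of forecast * observation
    ff: float   # mean of forecast squared
    oo: float   # mean of observation squared


def partial_sums(forecasts, observations):
    """Accumulate the assumed partial sums from matched pairs."""
    n = len(forecasts)
    if n == 0 or n != len(observations):
        raise ValueError("need equal, non-empty forecast/observation lists")
    pairs = list(zip(forecasts, observations))
    return PartialSums(
        n=n,
        f=sum(f for f, _ in pairs) / n,
        o=sum(o for _, o in pairs) / n,
        fo=sum(f * o for f, o in pairs) / n,
        ff=sum(f * f for f, _ in pairs) / n,
        oo=sum(o * o for _, o in pairs) / n,
    )


def bias(ps):
    """Mean error: mean forecast minus mean observation."""
    return ps.f - ps.o


def rmse(ps):
    """Root-mean-square error from the second-moment sums."""
    mse = ps.ff - 2.0 * ps.fo + ps.oo
    return math.sqrt(max(mse, 0.0))  # guard against tiny negative round-off
```

Because the raw sums are stored rather than the final statistics, 
other quantities can be derived later without rerunning the 
verification.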


DOWNLOAD:
The tar file "NCEPVERIF.tar", which contains the source code, run 
scripts, libraries, and parameter files needed to run the NCEP 
Verification System, is available by anonymous ftp:

 1. Ftp to the EMC public server by typing ftp
 ftpprd.ncep.noaa.gov . Use "anonymous" as your user id
 and your e-mail address as the password.
 
 2. Change the directory to /pub/emc/mmb/WRFtesting/verif/
 
Un-tarring "NCEPVERIF.tar" creates four directories: 
1) sorc/ contains the source codes for editbufr, prepfits, 
and gridtobs as well as the makefiles (named build) used to 
build corresponding executables on the IBM.  If platforms other 
than IBM are used, the makefiles will need to be modified.  
2) lib/ contains the libraries that are needed to compile 
the source codes.  Most of these libraries only work on 
big-endian computers.  Also note that the version of bufrlib
included in NCEPVERIF.tar is slightly different from the standard 
version of NCEP bufrlib because it was modified to accommodate 
prepfits. 
3) parm/ contains the parameter files, which can be modified 
by the users to control how the verification should be
performed.  More details on these control files are provided 
later in the overview of the three programs.  
4) scripts/ contains the sample scripts NCEP uses to run these 
three programs on the IBM. 
 

INSTALLATION:
The steps to run the NCEP Verification System for the first time are 
as follows:

 1. Build the libraries;

 2. Compile the source codes;

 3. Create an ASCII file called newdate containing the string 
    yyyymmdd00, where yyyymmdd is the day before the day to be 
    verified;

 4. Modify the control file;
 
 5. Obtain the PREPBUFR files;
 
 6. Modify the scripts to reflect correct paths for scripts,
    executables, parameter files, the model forecast GRIB files,
    and the PREPBUFR files, etc.  Submit the script.
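
As an illustration of step 3, the following Python sketch builds the 
newdate string for a given day to be verified.  The function names 
are hypothetical, and writing the string followed by a newline is an 
assumption about what the scripts expect.

```python
from datetime import datetime, timedelta


def newdate_string(verify_day):
    """Return yyyymmdd00 for the day before verify_day ('yyyymmdd')."""
    day = datetime.strptime(verify_day, "%Y%m%d")
    return (day - timedelta(days=1)).strftime("%Y%m%d") + "00"


def write_newdate(verify_day, path="newdate"):
    """Write the newdate file (trailing newline is an assumption)."""
    with open(path, "w") as handle:
        handle.write(newdate_string(verify_day) + "\n")
```

For example, newdate_string("20010801") gives "2001073100".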

 
NOTE:  At NCEP, the verification system is typically run once 
every 24 hours at 00Z to verify the model forecasts from the 
previous day (${DATE}).  Each time, the main running script 
(runpfit.retro.mangr) calls for simultaneous execution of a series of 
eight scripts (runfits${VH}.pll.ll, where ${VH} stands for verifying
hour) to verify the model forecasts every three hours at 00Z, 03Z, 
06Z, ..., 21Z for that whole day (${DATE}).  Our sample script, 
exfits.${VH}z.pll.sh, collects and verifies the model forecasts, 
taken at 12-hour intervals, that are all valid at ${VH}Z on ${DATE}.  
Each run of exfits.${VH}z.pll.sh executes editbufr, prepfits, and 
gridtobs in sequence.  When all eight scripts are completed, all the 
VSDB records for the same day are combined into one file.   
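
The search for forecasts with a common valid time can be sketched in 
Python as follows.  The sketch ASSUMES that the eight verifying hours 
are 00Z through 21Z every three hours and that forecast lengths run 
from 0 h to 60 h every 12 h; the names are hypothetical and the code 
is not taken from the NCEP scripts.

```python
from datetime import datetime, timedelta

# Assumed verifying hours: eight per day, every three hours.
VERIFYING_HOURS = list(range(0, 24, 3))


def cycles_valid_at(date, vh, forecast_hours=(0, 12, 24, 36, 48, 60)):
    """List (initial cycle yyyymmddhh, forecast hour) pairs that are
    all valid at hour vh on the given date ('yyyymmdd')."""
    valid = datetime.strptime(date, "%Y%m%d") + timedelta(hours=vh)
    pairs = []
    for fhr in forecast_hours:
        init = valid - timedelta(hours=fhr)
        pairs.append((init.strftime("%Y%m%d%H"), fhr))
    return pairs
```

For a valid time of 00Z on 15 August 2001, the 60-h forecast comes 
from the 2001081212 cycle.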


OVERVIEW:
The following is a description of the three components of the 
verification system and their input files.

1.  The code editbufr reads in, thins and writes out observations 
in the Operational PREPBUFR common / international standard format 
of BUFR.  As part of the process of generating a PREPBUFR file, some 
platform-specific QC is applied to the data contained in the PREPBUFR
file.  Dennis Keyser describes PREPBUFR processing in greater detail at
http://www.emc.ncep.noaa.gov/mmb/papers/keyser/data_processing .  [For 
WRF, this PREPBUFR will be the format for not only the verifying 
observations but also for the WRF 3DVAR analysis.]  We have also 
restricted the data to just those pieces of information we actually 
use in the analysis / data assimilation.  THEREFORE, we only have wind, 
temperature, height and moisture.  Each piece of data also has an 
associated observation pressure, which is used along with latitude 
and longitude to locate the ob in three dimensions in the atmosphere.  
Sea-level pressure obs are the only pressures that are not used for 
this ob location function.

The editbufr step thins the complete obs collection contained in the 
Operational PREPBUFR down to just those data to be used for 
verification, and creates a temporary output file.  The thinning saves 
time and space in the next prepfits step, where the most computer work 
is actually done.  The output file uses standard PREPBUFR format with
one difference.  NCEP's standard PREPBUFR allows for each observation 
to also have stored with it a value of the first guess or background 
(generated at the location and time of the observation).  The temporary 
output file from editbufr is identical in all respects to that of 
PREPBUFR except that it allows multiple backgrounds to be stored.  The 
backgrounds are added in the next step, and for that reason we call 
this our PREPFITS format to distinguish it from PREPBUFR format.  
Whether an observation is included in the output file is determined 
by an input control file called keeplist, which specifies the time 
window, areal extent and observation types to be retained.  A sample 
keeplist file is shown below.
-------------------------------------------------------------------
IRETGRID     - GRID NUMBER OF THE RETENTION AREA
104
YYMMDD       - DATE OR TIME WINDOW INDICATOR
-75
OBTYP        - UP TO 20 OB TYPES TO BE RETAINED
120
220
221
122
222
223
224
133
233
180
280
181
281
182
282
183
284
---------------------------------------------------------------------
As shown in the sample above, data to be retained can be controlled by
specifying: 
   line 2:  geographic location using the AWIP grid number, 
   line 4:  time window in hundredths of an hour within which you 
wish to keep the data, and 
   lines 6-22:  observation types.  
Eric Rogers shows the myriad of output grids needed from 
the Operational Meso Eta (and the RUC too) at his webpage:
http://www.emc.ncep.noaa.gov/mmb/etagrids/ .  
The definition of observation types can be found at:
http://www.emc.ncep.noaa.gov/mmb/papers/keyser/prepbufr.doc/table_4.htm 
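
To make the keeplist layout concrete, here is a minimal Python sketch 
that reads a file laid out like the sample above.  It is not the 
editbufr reader.  It ASSUMES that each section starts with a header 
line whose first word is IRETGRID, YYMMDD or OBTYP, and it treats the 
magnitude of the time value as the window in hundredths of an hour; 
the meaning of the sign is not covered here.

```python
KEYS = ("IRETGRID", "YYMMDD", "OBTYP")


def parse_keeplist(text):
    """Parse keeplist text into grid number, time window and ob types."""
    values = {key: [] for key in KEYS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed separator lines
        first = line.split()[0]
        if first in KEYS:
            current = first  # header line starts a new section
        elif current is not None:
            values[current].append(int(first))
    return {
        "grid": values["IRETGRID"][0],
        # hundredths of an hour converted to hours (sign ignored)
        "window_hours": abs(values["YYMMDD"][0]) / 100.0,
        "ob_types": values["OBTYP"],
    }
```

With the sample file, the value -75 corresponds to a window of 0.75 
hour, that is, 45 minutes.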


2.  The code prepfits reads in the observations from the temporary
output file created by editbufr, and adds background values to each 
piece of data from one or more forecasts which are valid at the time 
of the observations.  ONCE PER DAY you will be able to verify all the 
model forecasts that are valid at the time of the data collected / 
selected in the input prepfits file.  You can run this code for just 
one forecast or for many.  

The background values (model forecasts) are generated by horizontal 
and vertical bi-linear interpolation from the standard GRIB grid 
representations.  This code deals with AVN, NGM, RUC and Eta model 
fields so it was written to perform the vertical interpolation from 
standard pressure level output from those models.  In the case of the 
Eta model, this method does not introduce very much uncertainty (3-5%) 
compared to performing vertical interpolation from native model levels, 
AS LONG AS we have isobaric model data AT LEAST every 50 mb.  Vertical 
interpolation is linear in ln p for everything except specific humidity,
which is interpolated as the ln of q linear in ln p.  The moisture 
variable in PREPBUFR and PREPFITS is specific humidity q.
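
The vertical interpolation rule can be sketched in Python as follows.  
This is an illustration of the rule stated above between two 
bounding pressure levels, not the prepfits source; the function names 
are hypothetical, and the humidity routine assumes q is positive at 
both levels.

```python
import math


def interp_ln_p(p, p_lo, p_hi, v_lo, v_hi):
    """Interpolate a value to pressure p, linearly in ln p, between
    the two bounding levels p_lo and p_hi."""
    weight = (math.log(p) - math.log(p_lo)) / (math.log(p_hi) - math.log(p_lo))
    return v_lo + weight * (v_hi - v_lo)


def interp_q(p, p_lo, p_hi, q_lo, q_hi):
    """Interpolate specific humidity: ln q is linear in ln p."""
    ln_q = interp_ln_p(p, p_lo, p_hi, math.log(q_lo), math.log(q_hi))
    return math.exp(ln_q)
```

At the geometric mean of the two bounding pressures, the first 
routine returns the arithmetic mean of the two values and the second 
returns the geometric mean of the two humidities.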

Two options are available for verifying forecasts at the surface.  The 
first option is to directly compare the post-processed surface 
variables of the model (2 meter temperature & moisture and 10 meter 
winds) with the observations.  This method only involves a horizontal 
interpolation.  
There is no adjustment for discrepancies in the elevation of 
observations versus model terrain height.  AFTER ALL, forecasters don't 
do this calculation when using these fields!  The second option is to 
perform a 3-dimensional interpolation of the post-processed fields from 
the model (which ALWAYS extend to 1000 mb) to the observed pressure.  
This option performs the necessary adjustment for elevation differences 
between observation and model terrain.  It also reflects what the 
forecasters will see in the below terrain isobaric fields coming out of 
the model post-processor like 1000mb and even 850 mb fields under 
the Rockies.

The prepfits job can be run multiple times to add additional model
forecasts to the PREPFITS file.  Remember, the verifying observations
remain the same - we are simply adding different forecasts
with a common valid time.  In practice, we would do this if we were
going to archive this PREPFITS file, but we generate so much output 
that we do not have enough archive space to do that.

There are four input files for prepfits (sample files for each of them
can be found in the directory parm/): 
 1) levcat: allows specification of number of levels and data categories;
    line 1:   number of levels to read in from GRIB file,
    line 2:   logical flag to choose observation data categories to fit. 

NOTE:  There are 10 data categories: 1)surface, 2)mandatory level, 
3)significant level temperature & moisture, 4)winds by pressure, 
5)winds by height, 6)tropopause, 7)any single level for aircraft data, 
8)auxiliary, 9)not used, and 10)not used;

 2) data00: is the thinned-down BUFR file from editbufr;    
 3) prepfits.tab: is a BUFR table defining BUFR mnemonics;
 4) prepfits.in00: contains the names of the experiment (e.g., WRF),
    the model forecast GRIB file, and the index file of the model 
    forecast GRIB file. The file prepfits.in00 is read in as standard 
    input and is generated within our sample script 
    exfits.${VH}z.pll.sh. 

3.  The code gridtobs generates VSDB records containing the desired 
partial sums.  The code takes a brute force - brain dead approach to 
the problem.  We read in instructions from a control file 
(e.g., gridtobs.wrf0012) that contains essentially the bounding 
parameters over which the partial sums are to be accumulated.  This 
bounding parameter info also facilitates generation of the 9 headers 
of the VSDB record. A sample control file is shown below:
------------------------------------------------------------------------
V01   10
    1  WRF/212
    6  00
  12
  24
  36
  48
  60
    1  19
    1  ADPUPA
    1  G236
    1  SL1L2
    4  Z
  T
  RH
  VWND
   11  P1000
  P850
  P700
  P500
  P400
  P300
  P250
  P200
  P150
  P100
  P50
------------------------------------------------------------------------  
In the sample control file are specifications for:
   line 1:  version number of verification system (which is always 
version one at NCEP), and unit number for the input BUFR file. 
   line 2:  number of verifying models, and the name of the verifying 
models/grid numbers.  One can specify up to 6 different models.
   lines 3-8:  number of forecast hours (which must be less than 20) 
to read from the PREPFITS file, followed by the two-digit forecast 
hours (here 00 h to 60 h every 12 h).
   line 9:  number of verification dates, and the 10-digit yyyymmddhh.  
If only one date is verified at a time, as at NCEP, the date of the 
PREPFITS input file (output from the prepfits step) is used instead.  
In the sample control file, because only one verification date is 
used, the value of 19 is ignored.  
   line 10:  number of verifying ob types (maximum 10 types), and the  
names of the ob types.
   line 11:  number of verification areas over which the computation of 
partial sums is performed, and the grid numbers of the areas.
   line 12:  number of statistics types, and the types of statistics.
   lines 13-16:  number of variable types to verify (max 6 types), 
and the names of the variables.  In the sample control file, four 
types of variables will be verified: height, temperature, relative 
humidity, and wind.
   lines 17-27:  number of levels to verify, and names of the levels.
For example, P1000 represents 1000 mb pressure level and SFC 
represents surface.            
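
The layout described above can be read with a short Python sketch 
such as the one below.  It is not the gridtobs reader.  It ASSUMES 
that every group after line 1 starts with a count and its first item 
on the same line, with the remaining items one per line, and that the 
groups always appear in the order of the sample; the group names are 
hypothetical labels.

```python
GROUPS = ("models", "forecast_hours", "dates", "ob_types",
          "areas", "statistics", "variables", "levels")


def parse_gridtobs_control(text):
    """Parse control-file text (without the dashed separator lines)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    control = {"version": lines[0][0], "unit": int(lines[0][1])}
    pos = 1
    for name in GROUPS:
        count = int(lines[pos][0])
        # first item shares the line with the count
        items = [lines[pos][1]]
        # remaining count-1 items follow, one per line
        items += [ln[0] for ln in lines[pos + 1:pos + count]]
        if len(items) != count:
            raise ValueError("group %s is shorter than its count" % name)
        control[name] = items
        pos += count
    return control
```

Applied to the sample control file, the sketch returns six forecast 
hours, four variables and eleven levels.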

Two additional input files are regions and grid#104.  These two files 
are used to divide grid 104 into different sub-regions so that 
statistics may also be computed within each of these sub-regions.  
The file regions defines the two-digit number and three-letter 
abbreviation associated with each sub-region.  The file grid#104 
assigns to each grid point in NCEP grid 104 a sub-region number that 
is consistent with the definition in the file regions.  Currently, 
29 sub-regions are defined within NCEP grid 104.


TEST DATA:
If you wish to verify your model forecasts, one month's worth of 
PREPBUFR observation files for August 2001 is available on the EMC 
public server under /pub/emc/mmb/WRFtesting/data/ .   
The directory also contains data sets of Eta analysis and forecast 
GRIB files that can be used to initialize WRF for the same period.
There are four types of files in this directory: 

 (i)   yyyymmddhh.INPUT.tar:  analysis and forecast Eta GRIB files 
    (every 3 h from 0 h to 48 h) initialized at the yyyymmddhh cycle. 

 (ii)  yyyymmddhh.INPUT.list:  a list of the file names in 
     yyyymmddhh.INPUT.tar. 

 (iii) yyyymmdd.prepbufr.tar:  8 PREPBUFR files verifying from 
     yyyymmdd00 to yyyymmdd21 every 3 hours. 

 (iv)  yyyymmdd.prepbufr.list:  a list of the file names in 
     yyyymmdd.prepbufr.tar. 
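
As a convenience, the naming patterns above can be expanded with a 
few lines of Python.  The function names are hypothetical, and the 
sketch only builds names from the patterns; it does not check which 
files are actually present on the server.

```python
from datetime import date, timedelta


def prepbufr_tar_names(first_day, last_day):
    """Build yyyymmdd.prepbufr.tar names for each day in the range."""
    names = []
    day = first_day
    while day <= last_day:
        names.append(day.strftime("%Y%m%d") + ".prepbufr.tar")
        day += timedelta(days=1)
    return names


def verifying_times(day):
    """Build the eight yyyymmddhh times from 00 to 21 every 3 hours."""
    return [day.strftime("%Y%m%d") + "%02d" % hour for hour in range(0, 24, 3)]
```

For example, verifying_times(date(2001, 8, 1)) runs from 2001080100 
to 2001080121.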


DISPLAY OF VSDB:
In addition to the NCEP Verification System described above, NCEP 
also has the Forecast Verification System (FVS) software, which Keith 
Brill wrote to process the VSDB, accumulate the partial sums into 
final sums, compute the requested statistics and display the results.  
This software reads the control file with the user's requests, 
constructs a list of the records it needs, and scans the VSDB for 
records matching those it needs (using the basic UNIX file names given 
to the records AND their contents).  It uses some GEMLIB (GEMPAK 
library) 
entities to help with this name/record matching and to perform the 
display of the resulting final statistic.  The FVS also is capable of 
writing the final statistics out as a VSDB record.  At present, we are 
still storing a number of individual records (e.g. for each cycle's run 
for each day etc.), but this facility of FVS could be used to condense 
the numbers down into weekly, monthly, seasonal or annual statistics.
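
Condensing individual records into a longer period can be sketched in 
Python as below.  This is not the FVS code.  It ASSUMES that each 
record carries a pair count n together with partial sums stored as 
means over those n pairs, so that records combine as count-weighted 
averages; the function name is hypothetical.

```python
def combine(records):
    """Combine (n, sums) records into one record for the whole period.

    Each sums entry is a tuple of mean-type partial sums; all tuples
    must have the same length.
    """
    total_n = 0
    totals = None
    for n, sums in records:
        if totals is None:
            totals = [0.0] * len(sums)
        for i, value in enumerate(sums):
            totals[i] += n * value  # undo the mean to get a raw sum
        total_n += n
    if total_n == 0:
        raise ValueError("no data to combine")
    return total_n, tuple(total / total_n for total in totals)
```

Final statistics for the week, month or season would then be computed 
from the combined sums rather than by averaging the daily statistics.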


PRECIPITATION VERIFICATION SYSTEM:
Information on the Precipitation Verification System will be provided 
in the near future.