EMC VERIFICATION DATABASE

                         Keith F. Brill
                         Mark D. Iredell

                          22 April 1998


The EMC Verification Database (VSDB) is an ASCII text file.  The
records in the file are separated by linefeeds (X'0A').  The maximum
record size is 255 bytes.  Each record contains one or possibly more
statistic values.  The record is defined by blank-separated text fields.
The maximum field size is 24 bytes.  A field must not be null and must
not contain an embedded blank.  Fields do not have to line up in columns.
All characters are assumed to be upper case.

There will be one VSDB file for each day for each model.  The naming
convention for the file will be name_YYYYMMDD.vsdb, where name is the
model name, and YYYYMMDD is the year, month, and day.  This will include
all verifications done on the particular day YYYYMMDD for that model.
The models will be placed in separate directories.  The directory name
must agree with name_ in the .vsdb file name.
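
The naming convention above can be sketched in Python as follows.  This
is only an illustration: the function name vsdb_path and the root
directory argument are hypothetical and not part of the specification,
and the model name is used exactly as given.

```python
from datetime import date
from pathlib import PurePosixPath


def vsdb_path(root: str, model: str, day: date) -> PurePosixPath:
    """Return root/name/name_YYYYMMDD.vsdb for one model and one day."""
    # The directory name must agree with the prefix of the file name.
    return PurePosixPath(root) / model / f"{model}_{day:%Y%m%d}.vsdb"
```

For example, vsdb_path("/vsdb", "eta", date(1996, 9, 1)) yields
/vsdb/eta/eta_19960901.vsdb.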

The first set of fields consists of the header fields.  The header fields
identify the verification statistic(s).  There are usually 11 header
fields but more can be added compatibly.  The contents of the header
fields should conform to the standards below:

  Header field  1 : (char) verification database version
  Header field  2 : (char) forecast model verified
  Header field  3 : (char) forecast hour verified
  Header field  4 : (char) verifying date
  Header field  5 : (char) verifying analysis or observation type
  Header field  6 : (char) verifying grid or region
  Header field  7 : (char) statistic type
  Header field  8 : (char) parameter name
  Header field  9 : (char) level description
  Header field 10+: (char) not yet defined

Following the header fields is a separator field consisting of a single
equals sign (=).

  Separator field : (char) =

The next set of fields consists of the data fields.  The first data field
is typically the number of values used.  The following data fields are
one or more statistic values.  The statistic type header field determines
the order of the statistic values.  A missing value for the data fields
is -1.1e31.

  Data field    1 : (real) number of values used (gridpoints or obs)
  Data field    2 : (real) actual statistic value(s)
  Optional data fields.

Examples:

V01 AVNB 24 1996090100 FNL NHX ACORR(1-20) Z P500 = 3600 94.32
V01 ERL 36 1996090100 MB_PCP G211 FHO>2.5 APCP/24 SFC = 6045 .40 .50 .30
V01 ETAX 24 1996090100 MESO G211 TENDCORR SLP MSL = 10000 77.77
V01 ECM 24 1996090100 FNL NHX RMSE Z P1000 = 3600 -1.1E31
V01 AVN 12 1996090100 AIRCFT/GOOD NHX RMSE T P250-200 = 3600 1.4321E+00
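
A minimal sketch of how one record like those above might be read in
Python.  The names MISSING and parse_record are hypothetical.  The
sketch assumes the first data field is the count, which the text
describes only as typical.

```python
MISSING = -1.1e31  # missing-value sentinel given in the text


def parse_record(line: str):
    """Split one VSDB record into (header_fields, count, values)."""
    fields = line.split()        # blank-separated; columns need not align
    sep = fields.index("=")      # raises ValueError if the separator is absent
    header = fields[:sep]
    data = [float(tok) for tok in fields[sep + 1:]]
    if not data:
        raise ValueError("record has no data fields")
    count, values = data[0], data[1:]
    # Replace the sentinel with None so it cannot be averaged by accident.
    values = [None if v == MISSING else v for v in values]
    return header, count, values
```

Applied to the second example record, this returns nine header fields,
a count of 6045.0, and the three values 0.4, 0.5, and 0.3.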


HEADER FIELD STANDARDS

Header field 1: verification database version

V01
Vnn              Future versions

Header field 2: forecast model verified

AVN              Aviation forecast model
AVNX             AVN Parallel X
AVNY             AVN Parallel Y
AVNZ             AVN Parallel Z
AVNU             AVN Parallel U
AVNV             AVN Parallel V
AVN?             AVN future parallel ?
ENSnn            Ensemble member nn
ENSxx            Ensemble product xx
ETA              Early Eta model
ETAL             Eta Parallel L
ETAV             Eta Parallel V
ETAX             Eta Parallel X
ETAY             Eta Parallel Y
ETA?             Eta future parallel ?
FNL              Final GDAS
NGM              Nested-grid forecast model
MESO             Mesoscale model
MRF              Medium-range forecast model
MRFX             MRF Parallel X
MRFY             MRF Parallel Y
MRFZ             MRF Parallel Z
MRFU             MRF Parallel U
MRFV             MRF Parallel V
MRF?             MRF future parallel ?
RSM              Regional Spectral Model
RSMH             Hawaiian Regional Spectral Model
RUC              Rapid Update Cycle
TSname           Hurricane model
USRname          User-defined experiment

The model name may be followed by an optional slash and a grid
number, indicating the output grid from which the model data
were interpolated for the verification.  For example, ETA/212
implies that grid 212 was the source of the Eta model data used
in the verification.
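
Separating the optional grid number from the model name could look
like this in Python.  The function name split_model is hypothetical,
and the sketch assumes the grid number is a plain integer.

```python
def split_model(field: str):
    """Return (model_name, grid_number or None) from header field 2."""
    name, slash, grid = field.partition("/")
    # No slash means no source grid was recorded.
    return name, (int(grid) if slash else None)
```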

Header field 3: forecast hour

hhhh.d/w

   where hhhh.d is the hour in the forecast that lies at the
   midpoint of the time interval w.  w is usually the interval
   over which observations have been interpolated.  The interval
   between forecasts used in the interpolation is w/2.  If w is
   zero, then /w is omitted, and no time interpolation is implied
   (e.g., grid-to-grid verification).
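
   As an illustration, the forecast hour field could be decoded in
   Python as sketched below.  The function names are hypothetical,
   and the sample value 22.5/3 is made up for the example.

```python
def parse_forecast_hour(field: str):
    """Return (midpoint_hour, width_w) from header field 3, 'hhhh.d/w'."""
    hour, slash, width = field.partition("/")
    w = float(width) if slash else 0.0   # omitted /w means w = 0
    return float(hour), w


def interval_bounds(field: str):
    """Return (start_hour, end_hour) of the interval centered on hhhh.d."""
    mid, w = parse_forecast_hour(field)
    return mid - w / 2.0, mid + w / 2.0
```

   With these definitions, interval_bounds("22.5/3") gives the
   interval from forecast hour 21.0 to forecast hour 24.0.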

Header field 4: verifying date

yyyymmddhh.d/w/i

   where yyyymmddhh.d is the beginning, midpoint, or ending of
   a time interval of width w hours with a data increment of i,
   which gives the time interval in hours between the data times
   contributing to the stored statistical result.  If /w/i are absent,
   they are both assumed to be 0.  If w is preceded by a plus (+)
   sign, then yyyymmddhh.d is the beginning of the time interval.
   If w is preceded by a minus (-) sign, then yyyymmddhh.d is the
   ending of the time interval.  Otherwise, unsigned w indicates that
   yyyymmddhh.d is the midpoint of an interval w hours in duration.
   Note that .d is the fractional part of an hour and may be omitted
   if it is 0.
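
   The sign convention for w can be sketched in Python as follows.
   The function name parse_verifying_date is hypothetical; the sketch
   assumes w and i are given in hours and returns the start and end
   of the interval along with the increment.

```python
from datetime import datetime, timedelta


def parse_verifying_date(field: str):
    """Return (start, end, increment_hours) for 'yyyymmddhh.d/w/i'."""
    parts = field.split("/")
    stamp = parts[0]
    w_text = parts[1] if len(parts) > 1 else "0"   # absent /w means 0
    i_text = parts[2] if len(parts) > 2 else "0"   # absent /i means 0
    whole, _, frac = stamp.partition(".")
    t = datetime.strptime(whole, "%Y%m%d%H")
    if frac:
        t += timedelta(hours=float("0." + frac))   # .d is a fraction of an hour
    w = timedelta(hours=abs(float(w_text)))
    if w_text.startswith("+"):       # stamp is the beginning of the interval
        start, end = t, t + w
    elif w_text.startswith("-"):     # stamp is the ending of the interval
        start, end = t - w, t
    else:                            # unsigned w: stamp is the midpoint
        start, end = t - w / 2, t + w / 2
    return start, end, float(i_text)
```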

   To standardize certain commonly requested time searches, the
   following conventions are imposed for yyyymmddhh.d/w:

   hh.d = 12.0 and w = 24 implies entire day yyyymmdd
   ddhh.d = 1500.0 and w = 730 implies entire month yyyymm
   mmddhh.d = 021500.0 and w = 2190 implies first quarter of yyyy
   mmddhh.d = 051500.0 and w = 2190 implies second quarter of yyyy
   mmddhh.d = 081500.0 and w = 2190 implies third quarter of yyyy
   mmddhh.d = 111500.0 and w = 2190 implies last quarter of yyyy
   mmddhh.d = 040100.0 and w = 4380 implies first half of yyyy
   mmddhh.d = 100100.0 and w = 4380 implies last half of yyyy
   mmddhh.d = 070100.0 and w = 8760 implies entire year yyyy

   mmddhh.d = 011500.0 and w = 2190 implies climatological winter
                                    season for yyyy
   mmddhh.d = 041500.0 and w = 2190 implies climatological spring
                                    season for yyyy
   mmddhh.d = 071500.0 and w = 2190 implies climatological summer
                                    season for yyyy
   mmddhh.d = 101500.0 and w = 2190 implies climatological fall
                                    season for yyyy

   The search software will look for these specific criteria on
   request for daily, monthly, quarterly, semi-annually, annually,
   or seasonally tagged data.
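
   One way to recognize the standard tags listed above is a lookup
   keyed on the trailing digits of yyyymmddhh and on w, sketched here
   in Python.  The table name, the function name, and the tag labels
   are hypothetical.  The sketch assumes a tagged field has an
   unsigned w and a zero fractional hour.

```python
STANDARD_TAGS = {
    # (trailing digits of yyyymmddhh, w in hours): label
    ("12", 24): "day",
    ("1500", 730): "month",
    ("021500", 2190): "quarter 1",
    ("051500", 2190): "quarter 2",
    ("081500", 2190): "quarter 3",
    ("111500", 2190): "quarter 4",
    ("040100", 4380): "first half",
    ("100100", 4380): "last half",
    ("070100", 8760): "year",
    ("011500", 2190): "winter",
    ("041500", 2190): "spring",
    ("071500", 2190): "summer",
    ("101500", 2190): "fall",
}


def standard_tag(field: str):
    """Return the label for a standard yyyymmddhh.d/w tag, else None."""
    stamp, _, rest = field.partition("/")
    w_text = rest.split("/")[0]
    digits, _, frac = stamp.partition(".")
    if len(digits) != 10 or not digits.isdigit():
        return None
    if not w_text.isdigit() or frac.strip("0"):
        return None              # signed or absent w, or nonzero .d
    for (suffix, width), tag in STANDARD_TAGS.items():
        if int(w_text) == width and digits.endswith(suffix):
            return tag
    return None
```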

   Version 1.0 of the search software will NOT make any attempt to
   decide whether a specific yyyymmddhh.d lies within intervals
   defined in the database using w.  It will, however, be able to
   match the string yyyymmddhh.d/w/i.  It will only discriminate on
   the basis of w when daily, monthly, quarterly, semi-annually,
   annually, or seasonally tagged data is requested.

   i will be used as a search criterion only.  If it is not present
   in header field 4, it will be assumed to have a zero value.

Header field 5: verifying data source or analysis

Any name that can be used in field 2 plus:

MB_PCP           Mike Baldwin's Precipitation Analysis
ADPUPA           Conventional upper-air
ADPSFC           Conventional surface
AIRCAR           ACARS
AIRCFT           Conventional aircraft
ANYAIR           Any upper-air data source
ANYSFC           Any surface data source
ERS1DA           ERS Scatterometer data
GLBANL           Global Analysis
Knnn             Observation PREPRO type nnn
ONLYSF           Surface data verified against 2/10-m forecast data
PROFLR           Profiler
SATEMP           Satellite radiances
SATWND           Satellite winds
SFCSHP           Conventional marine
SPSSMI           SSM/I
VADWND           VAD WSR88D wind profiles

A verifying data source name may be followed by /Knnn, where nnn
is the observation type number.

A data quality flag may be entered after each verifying data type.
The following flags are standard:

        /GOOD  -- useable data
        /BAD   -- rejected data

/GOOD may be omitted when only GOOD data is used.

The data searching software must match these verbatim.
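
A sketch of how header field 5 might be split into its parts in
Python.  The function name parse_source is hypothetical.  The sketch
assumes that only /Knnn, /GOOD, and /BAD follow the source name and
rejects anything else, which may be stricter than the actual software.

```python
def parse_source(field: str):
    """Return (source, obs_type or None, quality) from header field 5."""
    parts = field.split("/")
    source, obs_type, quality = parts[0], None, "GOOD"  # /GOOD may be omitted
    for part in parts[1:]:
        if part in ("GOOD", "BAD"):
            quality = part
        elif part.startswith("K") and part[1:].isdigit():
            obs_type = int(part[1:])
        else:
            raise ValueError(f"unrecognized qualifier: /{part}")
    return source, obs_type, quality
```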

Header field 6: verifying grid or region

Bnnnnn           Buoy, where nnnnn is the buoy number
GBL              Global
NHX              Northern hemisphere extratropics (20N-80N)
SHX              Southern hemisphere extratropics (80S-20S)
TRO              Tropics (20S-20N)
Gnnn             NCEP grid GRIB type nnn
Gnnn/SUBSET      NCEP grid subset
Rnnnnn           Rawinsonde station nnnnn
Rxxnnn           Rawinsonde set xxnnn
USRname          User-defined grid

SUBSET names:

ATC              Arctic verification region
WCA              Western Canada verification region
ECA              Eastern Canada verification region
NAK              Northern Alaska verification region
SAK              Southern Alaska verification region
HWI              Hawaii verification region
NPO              Northern Pacific Ocean verification region
SPO              Southern Pacific Ocean verification region
NWC              Northern West Coast verification region
SWC              Southern West Coast verification region
NMT              Northern Mountain verification region
SMT              Southern Mountain verification region
NFR              Northern Front Range verification region
SFR              Southern Front Range verification region
NPL              Northern Plains verification region
SPL              Southern Plains verification region
NMW              Northern Midwest verification region
SMW              Southern Midwest verification region
APL              Appalachians verification region
NEC              Northern East Coast verification region
SEC              Southern East Coast verification region
NAO              Northern Atlantic Ocean verification region
SAO              Southern Atlantic Ocean verification region
PRI              Puerto Rico & Islands verification region
MEX              Mexico verification region
GLF              Gulf of Mexico verification region
CAR              Caribbean Sea verification region
CAM              Central America verification region
NSA              Northern South America verification region

These regions will be defined in a table file.

Header field 7 : statistic type

ACTIVE statistic types consist of numbers from which other
statistics can be computed by the display software.

PASSIVE statistic types consist of pre-computed numbers that
can only be found and displayed.

ACTIVE TYPES:

SAL1L2(*)        Anomaly L1 and L2 values for scalars (5 values)
SL1L2(*)         L1 and L2 values for Scalars (5 values)
VAL1L2(*)        Anomaly L1 and L2 values for vectors (7 values)
VL1L2(*)         L1 and L2 values for Vectors (7 values)

FHO<>*           F, H, and O (three values), where
                 F = Forecasted fraction above/below threshold
                 H = Correct fraction above/below threshold (hits)
                 O = Observed fraction above/below threshold

PASSIVE TYPES:

ACORR(*)         Anomaly correlation
ACORWG           Anomaly correlation for waves 1-20, 1-3, 4-9, 10-20
CORR(*)          Correlation
MAXE(*)          Maximum difference
RMDIF            RMS & MEAN differences (see below)
RMSE(*)          Root Mean Square Error
TENDCORR(*)      Tendency correlation
???(*)           User defined

B<>*             Bias above/below threshold
CSI<>*           Critical Success Index above/below threshold
ETS<>*           Equitable threat score above/below threshold
FAR<>*           False alarm rate above/below threshold
PA<>*            Postagreement above/below threshold
PF<>*            Prefigurance above/below threshold
POD<>*           Probability of detection above/below threshold
TS<>*            Threat score above/below threshold

The qualifier parenthetically enclosed following the statistic
type may be any character string.  The qualifier is optional.
The searching software must match both the statistic type and
the qualifier to find the statistic values.

The scalar anomaly L1L2 data are composed of five numbers in
addition to the data count:

MEAN [f-c], MEAN [o-c], MEAN [(f-c)*(o-c)], MEAN [(f-c)**2],
MEAN [(o-c)**2].

The scalar L1L2 data are composed of five numbers in addition to
the data count:

MEAN [f], MEAN [o], MEAN [f*o], MEAN [f**2], MEAN [o**2].

In these expressions, f are forecast values, o are observed values,
and c are climatological values.

The vector anomaly L1L2 data are composed of seven numbers in
addition to the data count:

MEAN [uf-c], MEAN [vf-c], MEAN [uo-c], MEAN [vo-c],
MEAN [(uf-c)*(uo-c)+(vf-c)*(vo-c)], MEAN [(uf-c)**2+(vf-c)**2],
MEAN [(uo-c)**2+(vo-c)**2]

The vector L1L2 data are composed of seven numbers in addition to
the data count:

MEAN [uf], MEAN [vf], MEAN [uo], MEAN [vo], MEAN [uf*uo+vf*vo],
MEAN [uf**2+vf**2], MEAN [uo**2+vo**2]

Note that the statistic type determines whether vector or scalar
treatment is appropriate in the computation of the following
statistical quantities for SAL1L2, VAL1L2, SL1L2, VL1L2:

        variance and standard deviation of forecast values
        variance and standard deviation of observed values
        root mean square error
        bias
        covariance
        correlation
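
For the scalar case, the quantities listed above follow from the five
SL1L2 means by standard moment identities.  The Python sketch below is
an illustration under that assumption; it is not taken from the
display software.  The function name scalar_stats is hypothetical, and
variances are population variances (divided by N, not N-1).

```python
import math


def scalar_stats(mf, mo, mfo, mff, moo):
    """Derive statistics from MEAN[f], MEAN[o], MEAN[f*o], MEAN[f**2], MEAN[o**2]."""
    var_f = mff - mf * mf            # variance of forecast values
    var_o = moo - mo * mo            # variance of observed values
    cov = mfo - mf * mo              # covariance of forecast and observed
    mse = mff - 2.0 * mfo + moo      # MEAN[(f-o)**2]
    prod = var_f * var_o
    corr = cov / math.sqrt(prod) if prod > 0.0 else float("nan")
    return {
        "bias": mf - mo,
        "rmse": math.sqrt(max(mse, 0.0)),   # clamp small negative round-off
        "var_f": var_f,
        "var_o": var_o,
        "std_f": math.sqrt(max(var_f, 0.0)),
        "std_o": math.sqrt(max(var_o, 0.0)),
        "covariance": cov,
        "correlation": corr,
    }
```

For forecasts (1, 3) verified against observations (2, 6), the five
means are 2, 4, 10, 5, and 20, which give a bias of -2, a mean square
error of 5, and a correlation of 1.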

In the case of thresholds, the information is not enclosed in
parentheses, but it is given as a real number preceded by either
< or >, according to whether the value is an upper bound or a lower
bound, respectively.  The statistic types followed by <>* in the
listing above must ALWAYS be accompanied by a threshold qualifier.
The searching software will be binning this kind of data on the basis
of the thresholds.
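
The qualifier and threshold forms of header field 7 could be separated
with a regular expression, as in this Python sketch.  All names are
hypothetical.  The sketch assumes a field carries either a
parenthesized qualifier or a threshold, never both.

```python
import re

# Statistic types that must always carry a threshold (the <>* entries).
THRESHOLD_TYPES = {"FHO", "B", "CSI", "ETS", "FAR", "PA", "PF", "POD", "TS"}

_STAT = re.compile(r"^([A-Z][A-Z0-9]*)(?:\((.*)\)|([<>])(\S+))?$")


def parse_stat_type(field: str):
    """Return (name, qualifier, comparison, threshold) from header field 7."""
    m = _STAT.match(field)
    if m is None:
        raise ValueError(f"cannot parse statistic type: {field}")
    name, qualifier, op, threshold = m.groups()
    if name in THRESHOLD_TYPES and op is None:
        raise ValueError(f"{name} requires a threshold")
    # "<" marks an upper bound and ">" a lower bound.
    return name, qualifier, op, (float(threshold) if op else None)
```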

Note that F, H, and O can be used to compute FAR, TS, ETS, POD,
PA, B, PF, and CSI.  Diagnostic software will be included to compute
the latter from the former, which MUST be stored under the FHO
statistic type.  The values of F, H, and O are always entered as
decimal values between 0 and 1.0.  The number of events is simply
the product of the value and the count.
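
This document does not give the formulas for the derived scores.  The
Python sketch below uses the conventional contingency-table
definitions written in terms of the fractions F, H, and O, which is an
assumption here; the diagnostic software may define them differently.
In particular, FAR is computed as the false alarm ratio (F-H)/F.  The
function names are hypothetical.

```python
def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den != 0.0 else float("nan")


def scores_from_fho(f: float, h: float, o: float) -> dict:
    """Compute categorical scores from the fractions F, H, and O."""
    chance = f * o                       # hit fraction expected by chance
    return {
        "POD": _ratio(h, o),             # probability of detection
        "FAR": _ratio(f - h, f),         # false alarm ratio
        "TS": _ratio(h, f + o - h),      # threat score
        "B": _ratio(f, o),               # bias
        "ETS": _ratio(h - chance, f + o - h - chance),
    }


def event_counts(count: float, f: float, h: float, o: float):
    """Number of events is the product of each fraction and the count."""
    return count * f, count * h, count * o
```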

ACORWG is composed of five numbers, the first being the data count.
RMDIF is composed of five numbers:  the data count, RMS (f-a),
MEAN (f-a), RMS (f-c), and MEAN (f-c), where f is forecast, a is
analysis, and c is climatology.

Header field 8 : parameter identifier

APCP/12          12-h Accumulated total precipitation
APCP/24          24-h Accumulated total precipitation
CPCP/12          12-h Convective precipitation
CPCP/24          24-h Convective precipitation
H                Height above ground level
Knnn             NCEP parameter GRIB type nnn
Kxxxxx           5-character NCEP (Russ Jones) identifier
Q                Specific humidity
RH               Relative humidity
SLP              Sea level pressure
SPCP/12          12-h Grid scale precipitation
SPCP/24          24-h Grid scale precipitation
T                Temperature (sensible)
TV               Virtual temperature
U                U wind component
V                V wind component
VWND             Vector wind
WDIR             Wind Direction
WSPD             Wind Speed
Z                Height

    Note that accumulation or averaging periods follow the
    parameter name with / as the separator.

Header field 9 : level identifier

Bx-y             Constant pressure depth boundary layer
Dx-y             Depth
Hx-y             Height above ground level
Px-y             Pressure
Sx-y             Sigma
Tx-y             Potential temperature
Zx-y             Height

ATMOS            Entire atmosphere
FRZDN            Lower freezing level
FRZUP            Upper freezing level
MSL              Mean Sea Level
MWND             Maximum wind
SFC              Surface
TROP             Tropopause

    where x-y gives the bounding values of the levels for a layer.
    If -y is not given, then a single level value is specified.
    For B, D, H, P, S, T, and Z, either x or x-y must ALWAYS be
    specified.
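
    A Python sketch of decoding the level identifier follows.  The
    names are hypothetical, and the sketch assumes x and y are
    non-negative decimal numbers.

```python
import re

NAMED_LEVELS = {"ATMOS", "FRZDN", "FRZUP", "MSL", "MWND", "SFC", "TROP"}

_LAYER = re.compile(r"^([BDHPSTZ])(\d*\.?\d+)(?:-(\d*\.?\d+))?$")


def parse_level(field: str):
    """Return (kind, x, y) from header field 9; y is None for a single level."""
    if field in NAMED_LEVELS:
        return field, None, None
    m = _LAYER.match(field)
    if m is None:
        # Covers a bare letter such as "P", which must carry x or x-y.
        raise ValueError(f"cannot parse level: {field}")
    kind, x, y = m.groups()
    return kind, float(x), (float(y) if y is not None else None)
```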


Data field : count followed by data value(s)

    For data combination, the count will always multiply the data
    value before summing.  The counts will be summed also.
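
    A sketch of this combination rule in Python.  The function name
    combine is hypothetical.  Dividing the weighted sums by the total
    count to recover a combined value is an assumption, since the text
    states only that the products and the counts are summed.  Missing
    values are not handled here.

```python
def combine(records):
    """Combine (count, values) pairs into (total_count, weighted_values)."""
    total = 0.0
    sums = None
    for count, values in records:
        if sums is None:
            sums = [0.0] * len(values)
        elif len(values) != len(sums):
            raise ValueError("records have different numbers of values")
        for k, v in enumerate(values):
            sums[k] += count * v     # the count multiplies each data value
        total += count               # the counts are summed also
    if sums is None or total == 0.0:
        raise ValueError("nothing to combine")
    return total, [s / total for s in sums]
```

    For example, combining a value of 2.0 with count 100 and a value
    of 4.0 with count 300 gives a total count of 400 and a combined
    value of 3.5.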