EMC VERIFICATION DATABASE

Keith F. Brill
Mark D. Iredell
22 April 1998

The EMC Verification Database (VSDB) is an ASCII text file. The records in the file are separated by linefeeds (X'0A'). The maximum record size is 255 bytes. Each record contains one or possibly more statistic values.

A record is defined by blank-separated text fields. The maximum field size is 24 bytes. Fields must not be null and must not contain embedded blanks. Fields do not have to line up in columns. All characters are assumed to be upper case.

There will be one VSDB file for each day for each model. The naming convention for the file is name_YYYYMMDD.vsdb, where name is the model name and YYYYMMDD is the year, month, and day. The file will include all verifications done on the particular day YYYYMMDD for that model. The models will be placed in separate directories. The directory name must agree with the name portion of the .vsdb file name.

The first set of fields consists of the header fields. The header fields identify the verification statistic(s). There are usually 11 header fields, but more can be added compatibly. The contents of the header fields should conform to the standards below:

  Header field 1  : (char) verification database version
  Header field 2  : (char) forecast model verified
  Header field 3  : (char) forecast hour verified
  Header field 4  : (char) verifying date
  Header field 5  : (char) verifying analysis or observation type
  Header field 6  : (char) verifying grid or region
  Header field 7  : (char) statistic type
  Header field 8  : (char) parameter name
  Header field 9  : (char) level description
  Header field 10 : not yet defined

Following the header fields is a separator field consisting of a single equals sign (=).

  Separator field : (char) =

The next set of fields consists of the data fields. The first data field is typically the number of values used. The following data fields are one or more statistic values. The statistic type header field implies the order of the statistic values.
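The record layout described above can be sketched as a small Python parser. This is a minimal illustration, not EMC software: the names `VsdbRecord`, `parse_record`, and `vsdb_path` are hypothetical, and the path helper assumes the directory holding each file is named exactly like the model name used in the file name.

```python
from dataclasses import dataclass
from typing import List

MAX_RECORD_BYTES = 255  # maximum record size from the spec
MAX_FIELD_BYTES = 24    # maximum field size from the spec
SEPARATOR = "="         # single equals sign between header and data fields


@dataclass
class VsdbRecord:
    header: List[str]   # header fields 1..n, in order
    data: List[float]   # data field 1 (count) followed by statistic values


def parse_record(line: str) -> VsdbRecord:
    """Split one VSDB record (one line) into header fields and data fields."""
    line = line.rstrip("\n")
    # The file is ASCII; encode() raises a ValueError subclass otherwise.
    if len(line.encode("ascii")) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds 255 bytes")
    # Fields are blank-separated and need not line up, so split on any run of blanks.
    # All characters are assumed to be upper case.
    fields = line.upper().split()
    for f in fields:
        if len(f) > MAX_FIELD_BYTES:
            raise ValueError(f"field longer than 24 bytes: {f!r}")
    if fields.count(SEPARATOR) != 1:
        raise ValueError("record must contain exactly one '=' separator field")
    k = fields.index(SEPARATOR)
    return VsdbRecord(header=fields[:k], data=[float(x) for x in fields[k + 1:]])


def vsdb_path(model: str, yyyymmdd: str) -> str:
    """Hypothetical helper: <model>/<model>_YYYYMMDD.vsdb."""
    return f"{model}/{model}_{yyyymmdd}.vsdb"
```

For example, `parse_record("V01 AVNB 24 1996090100 FNL NHX ACORR(1-20) Z P500 = 3600 94.32")` yields nine header fields and the data list `[3600.0, 94.32]`. Splitting on the separator field, rather than counting header fields, keeps the sketch compatible with records that carry additional header fields.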
A missing value for the data fields is -1.1E31.

  Data field 1 : (real) number of values used (gridpoints or obs)
  Data field 2 : (real) actual statistic value(s)
  Optional data fields

Examples:

  V01 AVNB 24 1996090100 FNL NHX ACORR(1-20) Z P500 = 3600 94.32
  V01 ERL 36 1996090100 MB_PCP G211 FHO>2.5 APCP/24 SFC = 6045 .40 .50 .30
  V01 ETAX 24 1996090100 MESO G211 TENDCORR SLP MSL = 10000 77.77
  V01 ECM 24 1996090100 FNL NHX RMSE Z P1000 = 3600 -1.1E31
  V01 AVN 12 1996090100 AIRCFT/GOOD NHX RMSE T P250-200 = 3600 1.4321E+00

HEADER FIELD STANDARDS

Header field 1: verification database version

  V01
  Vnn      Future versions

Header field 2: forecast model verified

  AVN      Aviation forecast model
  AVNX     AVN Parallel X
  AVNY     AVN Parallel Y
  AVNZ     AVN Parallel Z
  AVNU     AVN Parallel U
  AVNV     AVN Parallel V
  AVN?     AVN future parallel ?
  ENSnn    Ensemble member nn
  ENSxx    Ensemble product xx
  ETA      Early Eta model
  ETAL     Eta Parallel L
  ETAV     Eta Parallel V
  ETAX     Eta Parallel X
  ETAY     Eta Parallel Y
  ETA?     Eta future parallel ?
  FNL      Final GDAS
  NGM      Nested-grid forecast model
  MESO     Mesoscale model
  MRF      Medium-range forecast model
  MRFX     MRF Parallel X
  MRFY     MRF Parallel Y
  MRFZ     MRF Parallel Z
  MRFU     MRF Parallel U
  MRFV     MRF Parallel V
  MRF?     MRF future parallel ?
  RSM      Regional Spectral Model
  RSMH     Hawaiian Regional Spectral Model
  RUC      Rapid Update Cycle
  TSname   Hurricane model
  USRname  User-defined experiment

The model name may be followed by an optional slash and a grid number, indicating the output grid from which the model data were interpolated for the verification. For example, ETA/212 implies that grid 212 was the source of the Eta model data used in the verification.

Header field 3: forecast hour

  hhhh.d/w

where hhhh.d is the forecast hour at the midpoint of the time interval w. w is usually the interval over which observations have been interpolated. The interval between forecasts used in the interpolation is w/2.
If w is zero, then /w is omitted, and no time interpolation is implied (e.g., grid-to-grid verification).

Header field 4: verifying date

  yyyymmddhh.d/w/i

where yyyymmddhh.d is the beginning, midpoint, or end of a time interval of width w hours, and i is the data increment, which gives the time interval in hours between the data times contributing to the stored statistical result. If /w/i are absent, both are assumed to be 0. If w is preceded by a plus (+) sign, then yyyymmddhh.d is the beginning of the time interval. If w is preceded by a minus (-) sign, then yyyymmddhh.d is the end of the time interval. Otherwise, an unsigned w indicates that yyyymmddhh.d is the midpoint of an interval w hours in duration. Note that .d is the fractional part of an hour and may be omitted if it is 0.

To standardize certain commonly requested time searches, the following conventions are imposed for yyyymmddhh.d/w:

  hh.d     = 12.0     and w = 24    implies the entire day yyyymmdd
  ddhh.d   = 1500.0   and w = 730   implies the entire month yyyymm
  mmddhh.d = 021500.0 and w = 2190  implies the first quarter of yyyy
  mmddhh.d = 051500.0 and w = 2190  implies the second quarter of yyyy
  mmddhh.d = 081500.0 and w = 2190  implies the third quarter of yyyy
  mmddhh.d = 111500.0 and w = 2190  implies the last quarter of yyyy
  mmddhh.d = 040100.0 and w = 4380  implies the first half of yyyy
  mmddhh.d = 100100.0 and w = 4380  implies the last half of yyyy
  mmddhh.d = 070100.0 and w = 8760  implies the entire year yyyy
  mmddhh.d = 011500.0 and w = 2190  implies the climatological winter season for yyyy
  mmddhh.d = 041500.0 and w = 2190  implies the climatological spring season for yyyy
  mmddhh.d = 071500.0 and w = 2190  implies the climatological summer season for yyyy
  mmddhh.d = 101500.0 and w = 2190  implies the climatological fall season for yyyy

The search software will look for these specific criteria on request for daily, monthly, quarterly, semiannually, annually, or seasonally tagged data.
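The interval rules for header field 4 and the standard search conventions can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `parse_verifying_date`, `standard_tag`, and the tag names ("DAY", "MONTH", "Q1", "WINTER", and so on) are hypothetical, and the tag lookup assumes the conventions apply only to an unsigned (midpoint) w.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def parse_verifying_date(field: str) -> Tuple[datetime, datetime, float]:
    """Return (interval_start, interval_end, increment_hours) for header field 4."""
    parts = field.split("/")
    stamp = parts[0]
    w_str = parts[1] if len(parts) > 1 else "0"  # /w/i absent -> both assumed 0
    i_str = parts[2] if len(parts) > 2 else "0"
    base = datetime.strptime(stamp[:10], "%Y%m%d%H")
    frac = float(stamp[10:]) if len(stamp) > 10 else 0.0  # optional ".d" part
    anchor = base + timedelta(hours=frac)
    width = timedelta(hours=abs(float(w_str)))
    if w_str.startswith("+"):      # anchor marks the beginning of the interval
        start = anchor
    elif w_str.startswith("-"):    # anchor marks the end of the interval
        start = anchor - width
    else:                          # unsigned w: anchor marks the midpoint
        start = anchor - width / 2
    return start, start + width, float(i_str)


# Hypothetical tag names keyed on (mmddhh.d, w) for the longer-period conventions.
_PERIOD_TAGS = {
    (21500.0, 2190.0): "Q1", (51500.0, 2190.0): "Q2",
    (81500.0, 2190.0): "Q3", (111500.0, 2190.0): "Q4",
    (40100.0, 4380.0): "H1", (100100.0, 4380.0): "H2",
    (70100.0, 8760.0): "YEAR",
    (11500.0, 2190.0): "WINTER", (41500.0, 2190.0): "SPRING",
    (71500.0, 2190.0): "SUMMER", (101500.0, 2190.0): "FALL",
}


def standard_tag(field: str) -> Optional[str]:
    """Return a tag if field 4 matches one of the standard conventions, else None."""
    parts = field.split("/")
    if len(parts) < 2 or parts[1][:1] in ("", "+", "-"):
        return None  # assumption: conventions use an unsigned (midpoint) w
    stamp, w = parts[0], float(parts[1])
    if w == 24 and float(stamp[8:]) == 12.0:      # hh.d = 12.0
        return "DAY"
    if w == 730 and float(stamp[6:]) == 1500.0:   # ddhh.d = 1500.0
        return "MONTH"
    return _PERIOD_TAGS.get((float(stamp[4:]), w))  # mmddhh.d
```

For instance, `1996090112/24` spans 1996-09-01 00 UTC to 1996-09-02 00 UTC and is tagged as a day, while `1996090100/+24/12` covers the same interval with a 12-hour data increment.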
Version 1.0 of the search software will NOT attempt to decide whether a specific yyyymmddhh.d lies within intervals defined in the database using w. It will, however, be able to match the string yyyymmddhh.d/w/i. It will discriminate on the basis of w only when daily, monthly, quarterly, semiannually, annually, or seasonally tagged data are requested. The increment i will be used as a search criterion only. If it is not present in header field 4, it is assumed to be zero.

Header field 5: verifying data source or analysis

Any name that can be used in header field 2, plus:

  MB_PCP  Mike Baldwin's Precipitation Analysis
  ADPUPA  Conventional upper-air
  ADPSFC  Conventional surface
  AIRCAR  ACARS
  AIRCFT  Conventional aircraft
  ANYAIR  Any upper-air data source
  ANYSFC  Any surface data source
  ERS1DA  ERS Scatterometer data
  GLBANL  Global Analysis
  Knnn    Observation PREPRO type nnn
  ONLYSF  Surface data verified against 2/10-m forecast data
  PROFLR  Profiler
  SATEMP  Satellite radiances
  SATWND  Satellite winds
  SFCSHP  Conventional marine
  SPSSMI  SSM/I
  VADWND  VAD WSR88D wind profiles

A verifying data source name may be followed by /Knnn, where nnn is the observation type number. A data quality flag may be entered after each verifying data type. The following flags are standard:

  /GOOD  usable data
  /BAD   rejected data

/GOOD may be omitted when only GOOD data are used. The data searching software must match these verbatim.
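Header field 5 can be decomposed into a source name, an optional /Knnn observation type, and a quality flag. The Python sketch below is hypothetical (`DataSource` and `parse_data_source` are not part of the spec); it treats an omitted flag as GOOD, which is an interpretation for display or grouping only. Because searches must match the field verbatim, the raw string should still be kept for searching.

```python
from typing import List, NamedTuple, Optional, Tuple

QUALITY_FLAGS = ("GOOD", "BAD")


class DataSource(NamedTuple):
    name: str                # e.g. "ADPUPA"
    obs_type: Optional[str]  # nnn from a /Knnn suffix, if present
    quality: str             # "GOOD" or "BAD"
    extra: Tuple[str, ...]   # any other suffixes, e.g. a /grid on a field-2 model name


def parse_data_source(field: str) -> DataSource:
    """Split header field 5 on '/' into name, observation type, and quality flag."""
    name, *suffixes = field.split("/")
    obs_type: Optional[str] = None
    quality = "GOOD"  # assumption: an omitted flag means only GOOD data were used
    extra: List[str] = []
    for s in suffixes:
        if s in QUALITY_FLAGS:
            quality = s
        elif s.startswith("K") and s[1:].isdigit():
            obs_type = s[1:]
        else:
            extra.append(s)  # keep anything unrecognized rather than rejecting it
    return DataSource(name, obs_type, quality, tuple(extra))
```

For example, `AIRCFT/GOOD` parses to source AIRCFT with quality GOOD, and `ADPUPA/K120/BAD` parses to source ADPUPA, observation type 120, quality BAD.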
Header field 6: verifying grid or region

  Bnnnnn       Buoy, where nnnnn is the buoy number
  GBL          Global
  NHX          Northern Hemisphere extratropics (20N-80N)
  SHX          Southern Hemisphere extratropics (80S-20S)
  TRO          Tropics (20S-20N)
  Gnnn         NCEP grid GRIB type nnn
  Gnnn/SUBSET  NCEP grid subset
  Rnnnnn       Rawinsonde station nnnnn
  Rxxnnn       Rawinsonde set xxnnn
  USRname      User-defined grid

SUBSET names:

  ATC  Arctic verification region
  WCA  Western Canada verification region
  ECA  Eastern Canada verification region
  NAK  Northern Alaska verification region
  SAK  Southern Alaska verification region
  HWI  Hawaii verification region
  NPO  Northern Pacific Ocean verification region
  SPO  Southern Pacific Ocean verification region
  NWC  Northern West Coast verification region
  SWC  Southern West Coast verification region
  NMT  Northern Mountain verification region
  SMT  Southern Mountain verification region
  NFR  Northern Front Range verification region
  SFR  Southern Front Range verification region
  NPL  Northern Plains verification region
  SPL  Southern Plains verification region
  NMW  Northern Midwest verification region
  SMW  Southern Midwest verification region
  APL  Appalachians verification region
  NEC  Northern East Coast verification region
  SEC  Southern East Coast verification region
  NAO  Northern Atlantic Ocean verification region
  SAO  Southern Atlantic Ocean verification region
  PRI  Puerto Rico & Islands verification region
  MEX  Mexico verification region
  GLF  Gulf of Mexico verification region
  CAR  Caribbean Sea verification region
  CAM  Central America verification region
  NSA  Northern South America verification region

These regions will be defined in a table file.

Header field 7: statistic type

ACTIVE statistic types consist of numbers from which other statistics can be computed by the display software. PASSIVE statistic types consist of precomputed numbers that can only be found and displayed.
ACTIVE TYPES:

  SAL1L2(*)  Anomaly L1 and L2 values for scalars (5 values)
  SL1L2(*)   L1 and L2 values for scalars (5 values)
  VAL1L2(*)  Anomaly L1 and L2 values for vectors (7 values)
  VL1L2(*)   L1 and L2 values for vectors (7 values)
  FHO<>*     F, H, and O (three values), where
               F = Forecast fraction above/below threshold
               H = Correct fraction above/below threshold (hits)
               O = Observed fraction above/below threshold

PASSIVE TYPES:

  ACORR(*)     Anomaly correlation
  ACORWG       Anomaly correlation for waves 1-20, 1-3, 4-9, 10-20
  CORR(*)      Correlation
  MAXE(*)      Maximum difference
  RMDIF        RMS and MEAN differences (see below)
  RMSE(*)      Root Mean Square Error
  TENDCORR(*)  Tendency correlation
  ???(*)       User defined
  B<>*         Bias above/below threshold
  CSI<>*       Critical Success Index above/below threshold
  ETS<>*       Equitable threat score above/below threshold
  FAR<>*       False alarm rate above/below threshold
  PA<>*        Postagreement above/below threshold
  PF<>*        Prefigurance above/below threshold
  POD<>*       Probability of detection above/below threshold
  TS<>*        Threat score above/below threshold

The qualifier enclosed in parentheses following the statistic type may be any character string. The qualifier is optional. The searching software must match both the statistic type name and the qualifier to find the statistic values.

The scalar anomaly L1L2 data are composed of five numbers in addition to the data count:

  MEAN [f-c], MEAN [o-c], MEAN [(f-c)*(o-c)], MEAN [(f-c)**2], MEAN [(o-c)**2]

The scalar L1L2 data are composed of five numbers in addition to the data count:

  MEAN [f], MEAN [o], MEAN [f*o], MEAN [f**2], MEAN [o**2]

In these expressions, f are forecast values, o are observed values, and c are climatological values.
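To show why the L1L2 types are called ACTIVE, the Python sketch below accumulates the five SL1L2 means from paired forecast and observed values and then derives summary statistics from them using standard moment identities. The function names `sl1l2` and `scalar_stats` are hypothetical, and "bias" is assumed here to mean the mean error MEAN[f] - MEAN[o]; the spec does not fix that definition.

```python
import math
from typing import Dict, Sequence, Tuple


def sl1l2(f: Sequence[float], o: Sequence[float]) -> Tuple[float, ...]:
    """Return (count, MEAN[f], MEAN[o], MEAN[f*o], MEAN[f**2], MEAN[o**2])."""
    n = len(f)
    if n == 0 or n != len(o):
        raise ValueError("need equal-length, non-empty forecast and observation lists")

    def mean(xs: Sequence[float]) -> float:
        return sum(xs) / n

    return (float(n), mean(f), mean(o),
            mean([a * b for a, b in zip(f, o)]),
            mean([a * a for a in f]),
            mean([b * b for b in o]))


def scalar_stats(rec: Sequence[float]) -> Dict[str, float]:
    """Derive summary statistics from one SL1L2 (or SAL1L2) data-field tuple."""
    _, mf, mo, mfo, mff, moo = rec
    var_f = mff - mf * mf          # variance of forecast values
    var_o = moo - mo * mo          # variance of observed values
    cov = mfo - mf * mo            # covariance
    mse = mff - 2.0 * mfo + moo    # MEAN[(f-o)**2]
    return {
        "bias": mf - mo,           # assumed: mean error
        "rmse": math.sqrt(max(mse, 0.0)),
        "sd_f": math.sqrt(max(var_f, 0.0)),
        "sd_o": math.sqrt(max(var_o, 0.0)),
        "cov": cov,
        "corr": (cov / math.sqrt(var_f * var_o)
                 if var_f > 0 and var_o > 0 else math.nan),
    }
```

Passing anomalies (f-c and o-c) instead of raw values to `sl1l2` produces the five SAL1L2 numbers in the order listed above.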
The vector anomaly L1L2 data are composed of seven numbers in addition to the data count:

  MEAN [uf-c], MEAN [vf-c], MEAN [uo-c], MEAN [vo-c],
  MEAN [(uf-c)*(uo-c)+(vf-c)*(vo-c)], MEAN [(uf-c)**2+(vf-c)**2], MEAN [(uo-c)**2+(vo-c)**2]

where c denotes the climatological value of the corresponding wind component.

The vector L1L2 data are composed of seven numbers in addition to the data count:

  MEAN [uf], MEAN [vf], MEAN [uo], MEAN [vo],
  MEAN [uf*uo+vf*vo], MEAN [uf**2+vf**2], MEAN [uo**2+vo**2]

Note that the statistic type determines whether vector or scalar treatment is appropriate in the computation of the following statistical quantities for SAL1L2, VAL1L2, SL1L2, and VL1L2:

  variance and standard deviation of forecast values
  variance and standard deviation of observed values
  root mean square error
  bias
  covariance
  correlation

In the case of thresholds, the information is not enclosed in parentheses; instead, it is given as a real number preceded by either < or >, according to whether the value is an upper bound or a lower bound, respectively. The statistic types followed by <>* in the listing above must ALWAYS be accompanied by a threshold qualifier. The searching software will bin this kind of data on the basis of the thresholds.

Note that F, H, and O can be used to compute FAR, TS, ETS, POD, PA, B, PF, and CSI. Diagnostic software will be included to compute the latter from the former, which MUST be stored under the FHO statistic type. The values of F, H, and O are always entered as decimal values between 0 and 1.0. The number of events is simply the product of the value and the count.

ACORWG is composed of five numbers, the first being the data count.

RMDIF is composed of five numbers: the data count, RMS (f-a), MEAN (f-a), RMS (f-c), and MEAN (f-c), where f is forecast, a is analysis, and c is climatology.
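The sketch below, in Python, splits a statistic-type field into its name, parenthesized qualifier, or threshold, and then derives the threshold scores from F, H, and O. The function names are hypothetical, and the score formulas are the common textbook contingency-table definitions expressed in fractions (for example, chance hits for ETS are taken as F*O). They are assumed, not confirmed by this document, to match the diagnostic software mentioned above.

```python
import math
from typing import Dict, Optional, Tuple


def parse_stat_type(field: str) -> Tuple[str, Optional[str], Optional[str], Optional[float]]:
    """Split header field 7 into (name, qualifier, threshold_op, threshold)."""
    if field.endswith(")") and "(" in field:
        name, qualifier = field[:-1].split("(", 1)
        return name, qualifier, None, None
    for op in ("<", ">"):  # "<": upper bound, ">": lower bound
        if op in field:
            name, value = field.split(op, 1)
            return name, None, op, float(value)
    return field, None, None, None


def fho_scores(f: float, h: float, o: float) -> Dict[str, float]:
    """Derive categorical scores from FHO fractions (each between 0 and 1)."""
    def ratio(num: float, den: float) -> float:
        return num / den if den != 0 else math.nan

    h_random = f * o              # hit fraction expected by chance
    ts = ratio(h, f + o - h)      # threat score
    return {
        "POD": ratio(h, o),
        "FAR": ratio(f - h, f),
        "B": ratio(f, o),
        "TS": ts,
        "CSI": ts,                # assumed identical to TS
        "ETS": ratio(h - h_random, f + o - h - h_random),
        "PA": ratio(h, f),        # postagreement, assumed equal to 1 - FAR
        "PF": ratio(h, o),        # prefigurance, assumed equal to POD
    }
```

With F = 0.4, H = 0.3, and O = 0.5, this gives POD = 0.6, FAR = 0.25, TS = 0.5, and ETS = 0.25. Checking for a parenthesized qualifier first keeps a qualifier that happens to contain < or > from being misread as a threshold.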
Header field 8: parameter identifier

  APCP/12  12-h accumulated total precipitation
  APCP/24  24-h accumulated total precipitation
  CPCP/12  12-h convective precipitation
  CPCP/24  24-h convective precipitation
  H        Height above ground level
  Knnn     NCEP parameter GRIB type nnn
  Kxxxxx   5-character NCEP (Russ Jones) identifier
  Q        Specific humidity
  RH       Relative humidity
  SLP      Sea level pressure
  SPCP/12  12-h grid-scale precipitation
  SPCP/24  24-h grid-scale precipitation
  T        Temperature (sensible)
  TV       Virtual temperature
  U        U wind component
  V        V wind component
  VWND     Vector wind
  WDIR     Wind direction
  WSPD     Wind speed
  Z        Height

Note that accumulation or averaging periods follow the parameter name, with / as the separator.

Header field 9: level identifier

  Bx-y   Constant pressure depth boundary layer
  Dx-y   Depth
  Hx-y   Height above ground level
  Px-y   Pressure
  Sx-y   Sigma
  Tx-y   Potential temperature
  Zx-y   Height
  ATMOS  Entire atmosphere
  FRZDN  Lower freezing level
  FRZUP  Upper freezing level
  MSL    Mean sea level
  MWND   Maximum wind
  SFC    Surface
  TROP   Tropopause

where x-y gives the bounding values of the levels for a layer. If -y is not given, then a single level value is specified. For B, D, H, P, S, T, and Z, either x or x-y must ALWAYS be specified.

Data field: count followed by data value(s)

For data combination, the count will always multiply the data value before summing. The counts will be summed also.
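The combination rule can be sketched in Python as below. `combine` is a hypothetical helper; two behaviors are assumptions rather than stated rules: records containing the missing value -1.1E31 are skipped, and the count-weighted sums are divided by the summed count so that the result is again a count followed by mean values.

```python
from typing import Iterable, List, Sequence

MISSING = -1.1e31  # missing value for the data fields


def combine(records: Iterable[Sequence[float]]) -> List[float]:
    """Combine data fields [count, v1, v2, ...] from compatible records."""
    total = 0.0
    sums: List[float] = []
    for rec in records:
        count, values = rec[0], list(rec[1:])
        if MISSING in values:
            continue  # assumption: records with missing values are skipped
        if not sums:
            sums = [0.0] * len(values)
        elif len(values) != len(sums):
            raise ValueError("records carry different numbers of values")
        for k, v in enumerate(values):
            sums[k] += count * v  # the count multiplies each value before summing
        total += count            # the counts are summed as well
    if total == 0:
        raise ValueError("no usable records to combine")
    # assumption: divide by the summed count to recover count-weighted means
    return [total] + [s / total for s in sums]
```

This is exact for ACTIVE types such as SL1L2 and FHO, whose values are means or fractions over the counted points. For a PASSIVE type such as RMSE, a count-weighted average is only an approximation of the statistic over the combined sample.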