Dennis Keyser - NOAA/NWS/NCEP/EMC
(Last Revised 10/29/2008)
Please take a moment to read the Disclaimer
for this non-operational web page.
The dumping of observational data is the first step in each NCEP network production suite. At the appropriate network data cutoff time, up to three separate jobs are executed simultaneously - two dump jobs and a tropical cyclone processing job. Once the two dump jobs have completed, a separate dump post processing job is initiated.
A. Copying Files For Later Use By Analyses
In the Global Forecast System (GFS) and Global Data Assimilation System (GDAS) network runs, GRIB files containing current analyses of snow depth, ice distribution, and sea-surface temperature from NESDIS are copied from the NCEP IBM Central Computer System (IBM-CCS) /dcom database into network-specific (“/com”) directories. These fields will be read later by the Global Gridpoint Statistical Interpolation (GSI) analysis. (Note: If the current files are not available, then one-day old files are copied.)
In the North American Model (NAM) and North American Data Assimilation System (NDAS) network runs, GRIB files containing current analyses of snow depth and snow cover from NESDIS are copied from the NCEP IBM-CCS /dcom database into network-specific /com directories. These fields will be read later by the Regional Gridpoint Statistical Interpolation (GSI) analysis. (Note: If the current files are not available, then one-day old files are copied.)
In the Rapid Update Cycle (RUC) network, GRIB files containing snow cover analyses from NESDIS are copied from the NCEP IBM-CCS /dcom database into network-specific /com directories. These fields will be read later by the RUC 3DVAR-analysis. (Note: If the current files are not available, then one-day old files are copied.)
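The fallback rule in the notes above (use the current file, otherwise the one-day old file) can be sketched as follows. This is only an illustration: the file naming pattern, the function name pick_analysis_file and the directory layout are assumptions made for the example and are not taken from the operational scripts.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def pick_analysis_file(src_dir: Path, prefix: str, cycle_date: date) -> Optional[Path]:
    """Return the current-day file if it exists, else the one-day old file, else None.

    The "<prefix>.<YYYYMMDD>.grb" naming is hypothetical, used only for this sketch.
    """
    for age_days in (0, 1):  # try the current file first, then the one-day old file
        day = cycle_date - timedelta(days=age_days)
        candidate = src_dir / f"{prefix}.{day:%Y%m%d}.grb"
        if candidate.is_file():
            return candidate
    return None  # neither file is available
```

A caller would copy the returned path into the network-specific directory and would need to decide separately what to do when neither file exists, which the text above does not specify.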
The process of accessing the observational database and retrieving a select set of observational data is accomplished in several stages by a number of FORTRAN codes. This retrieval process is run in all of the operational networks many times a day to assemble “dump” data for model assimilation. The script that manages the retrieval of observations provides users with a wide range of options. These include observational date/time windows, specification of geographic regions for filtering (via either a lat/lon box, a center point lat/lon and radius, or a lat/lon grid point mask), data specification and combination, duplicate checking and bulletin “part” merging, and parallel processing.
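Two of the geographic filtering options listed above, the lat/lon box and the center point with radius, can be illustrated with a short Python sketch. The function names, the argument order and the use of a spherical Earth with a 6371 km radius are assumptions for illustration only; the operational filtering is done in the FORTRAN dump codes and may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def in_box(lat, lon, south, north, west, east):
    """True if (lat, lon) lies in the box. The box may cross the dateline (west > east)."""
    if not (south <= lat <= north):
        return False
    span = east - west
    if span >= 360:  # box covers all longitudes
        return True
    # Measure longitudes eastward from the western edge, modulo 360 degrees.
    return (lon - west) % 360 <= span % 360


def in_radius(lat, lon, center_lat, center_lon, radius_km):
    """True if the great-circle (haversine) distance to the center is within radius_km."""
    phi1, phi2 = math.radians(lat), math.radians(center_lat)
    dphi = phi2 - phi1
    dlam = math.radians(center_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)) <= radius_km
```

For example, in_box(40, -100, 20, 60, -130, -60) is true for a point over the central United States, while the same box rejects a point at longitude 0.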
The primary retrieval software performs the initial stage of all data dumping by retrieving subsets of the NCEP IBM-CCS BUFR /dcom database that contain all of the database messages valid for the data type, geographical filter and time window requested by a user. (Recall that the /dcom database is continuously updated with new data as the GTS data decoder and satellite ingest jobs run.) The retrieval software looks only at the date in Section One of the BUFR message to determine which messages to copy for a particular data type. This results in an observing set containing possibly more data than was requested, but allows the software to function very efficiently.
The second stage of the process performs a final 'winnowing' of the data to an observing set with the exact time window requested [1]. This is done within the codes which remove exact- or near-duplicate reports (the nature of which is data type dependent) and merge bulletin parts for upper-air reports.
[1] Normally, the six-hour cycle GFS, GDAS, and Climate Data Assimilation System (CDAS) network runs dump BUFR data globally over a six-hour time window centered on the analysis time. The six-hour cycle NAM and NDAS network runs normally dump data within the expanded WRF-NMM-model domain over a six-hour time window centered on the analysis time (the NDAS assimilates data and updates every three hours). The one-hour cycle upper-air RUC network runs normally dump BUFR data within the expanded WRF-NMM-model domain (a superset of the RUC domain) over a one-hour time window centered on the analysis time. The one-hour cycle surface RUC network runs normally dump BUFR data within the expanded RUC domain over a one-hour time window centered on the analysis time.
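The two retrieval stages described above can be pictured with a small Python sketch: a coarse pass that keeps whole messages based only on the message date, followed by a pass that trims reports to the exact window and drops exact duplicates. The Message and Report classes, the assumption that a message date is truncated to the hour, and the duplicate key (station, time) are all hypothetical simplifications; the real duplicate checks are data type dependent and the real codes operate on BUFR files.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Report:
    station: str
    time: datetime
    value: float


@dataclass
class Message:
    section1_time: datetime  # assumed here to be truncated to the hour
    reports: list


def coarse_select(messages, start, end):
    """Stage 1: keep whole messages whose hour bucket overlaps [start, end]."""
    hour = timedelta(hours=1)
    return [m for m in messages
            if m.section1_time <= end and m.section1_time + hour > start]


def winnow(messages, start, end):
    """Stage 2: trim to the exact window and drop exact duplicates."""
    seen = set()
    kept = []
    for m in messages:
        for r in m.reports:
            key = (r.station, r.time)  # hypothetical duplicate key
            if start <= r.time <= end and key not in seen:
                seen.add(key)
                kept.append(r)
    return kept
```

The coarse pass never opens individual reports, which is why it can return more data than requested while remaining cheap; the exact trimming is deferred to the second pass.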
The final stage of the process is the application of manual quality marks to the data extracted. The quality marks are provided by personnel in two groups: the NCEP/NCO Systems Integration Branch (SIB) and the NCEP/Ocean Prediction Center (OPC). The NCEP/NCO/SIB Senior Duty Meteorologists (SDMs) can apply quality markers to individual variables in many observational data types such as rawinsonde, dropwinsonde, PIBAL, aircraft, satellite wind, surface land, surface marine, wind profiler and Vertical Azimuth Display (VAD) wind reports. These markers either ensure that the datum marked will be assimilated by the particular analysis regardless of any subsequent quality control on it (called a "keep" flag), or ensure that it will NOT be assimilated (called a "purge" flag). The SDMs use an interactive program on the IBM-CCS which initiates the offline execution of automated quality control programs run in the subsequent PREPBUFR processing steps, and then review the programs’ decisions before making assessment decisions. The SDMs use satellite pictures, meteorological graphics, continuity of data, input from reporting stations, past station performance and horizontal data comparisons (buddy checks) to decide whether or not to override quality control flags from the automated programs. All flags are stored in an ASCII file on the IBM-SP for use during this data retrieval process.
The NCEP/NCO/SIB also maintains a list of data that should be rejected based on, among other things, monthly statistics provided by NCEP and other international centers, and feedback from data producers. All rejected data receive either a "reject" or "purge" flag here. The flags are appended to the same ASCII file used for storing the SDM quality marks.
NCEP/OPC personnel perform real-time interactive quality control of global surface marine meteorological data and sea surface temperature using a graphical interactive program called CREWSS (Collect, Review, and Edit Weather data from the Sea Surface). CREWSS provides an evaluation of the quality of the marine surface data provided by ships, buoys (drifting and moored), Coastal Marine Automated Network (CMAN) stations, and tide gauge stations by comparing the observations to GFS model first guess fields for all four synoptic periods. Data that differ from the first guess fields by more than certain amounts are then examined via techniques that involve buddy checks versus neighboring platforms, the platform’s track, and a one week history for each platform. The NCEP/OPC personnel can either mark these data according to their quality, here applying either a "keep" or "purge" flag, or they can correct obvious errors in the data, such as incorrect hemisphere, misplaced decimal, etc. (corrected data receive a "good" quality mark in the subsequent PREPBUFR processing steps). Upon completion of interactive quality control, an ASCII text file containing all quality control decisions and corrections is then uploaded to the IBM-CCS for use during this data retrieval process.
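As a rough illustration of how manual "keep", "purge" and "reject" flags read from an ASCII file might be attached to dumped observations, consider the sketch below. The layout of the real flag files is not described here, so the three-column "station variable flag" format, the function names and the dictionary-based observation records are purely hypothetical.

```python
VALID_FLAGS = {"keep", "purge", "reject"}


def parse_flags(text):
    """Parse a hypothetical 'station variable flag' text file into a lookup table."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        station, variable, flag = line.split()
        flag = flag.lower()
        if flag not in VALID_FLAGS:
            raise ValueError(f"unknown flag: {flag}")
        # Later entries win, mimicking flags being appended to the same file.
        flags[(station, variable)] = flag
    return flags


def apply_flags(observations, flags):
    """Return copies of the observations with a 'manual_flag' entry (None if unmarked)."""
    marked = []
    for ob in observations:
        copy = dict(ob)
        copy["manual_flag"] = flags.get((ob["station"], ob["variable"]))
        marked.append(copy)
    return marked
```

In this sketch the flag is only recorded alongside the datum; honoring it (assimilating or excluding the value) would be left to the later processing steps.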
Each data type selected for dumping is associated with a unique mnemonic string which represents a particular BUFR type and subtype in the /dcom database. The complete list of BUFR data types is shown in Table 1.a. This includes obsolete data types, future data types, and current data types which are not presently dumped in any network job. In order to limit the number of output dump files in the operational network jobs, like data types are grouped together and represented by sequence or group mnemonics. The data group mnemonics used to generate dump files in the various NCEP networks (including obsolete types) are read by the subsequent PREPBUFR processing steps, by the subsequent analysis codes, or by neither, depending on the network. See Table 1.b for a listing of data group mnemonic dumps read by the PREPBUFR processing steps and Table 1.c for a listing of data group mnemonic dumps read by the analysis codes.
C. Re-processing of BUFR Observational Data Dump Files
Some of the BUFR data dump files are re-processed into new BUFR files such that they can be used properly by the subsequent PREPBUFR processing or analysis programs.
1. SSM/I data - all network runs: The “reports” in the SSM/I products BUFR dump files (group mnemonics “ssmip” or “ssmipn”, see Table 1.b) consist of orbital scans, each of which contains 64 retrieval footprints of one or more products. The program PREPOBS_PREPSSMI unpacks selected products out of the scans, superobs them onto a one-degree latitude/longitude grid (optional in some network runs), then encodes them as individual “reports” in the output, re-processed, BUFR file which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier “spssmi” (see Table 1.b, key for superscript 2 in “NET” column). The GDAS, GFS and CDAS network runs superob the “operational” rainfall rate product generated at FNMOC, and the surface ocean wind speed and total column precipitable water products generated using a Neural-Net 3 algorithm (OMBNN3) developed by the Marine Modeling Branch of NCEP/EMC. The NAM and NDAS network runs superob the “operational” surface ocean wind speed and total column precipitable water products generated at FNMOC. The upper-air RUC network run processes the same products as the NAM and NDAS network runs but it does not superob the data.
2. QuikSCAT data - NAM, NDAS, GFS, GDAS and CDAS network runs: Each “report” in the QuikSCAT BUFR dump file (group mnemonic “qkscat”, see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information. The program WAVE_DCODQUIKSCAT unpacks each report checking the report date for realism, selecting the proper nudged wind vector, and excluding reports over land, reports with missing nudged wind vector, reports with missing model wind direction and speed, reports with probability of rain greater than 10%, and reports at the edges of the orbital swath. Reports passing checks are then superobed onto a one-half degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier “qkswnd” (see Table 1.b, key for superscript 1 in “NET” column).
3. TRMM TMI data - GFS, GDAS and CDAS network runs: Each “report” in the TRMM TMI BUFR dump file (group mnemonic “trmm”, see Table 1.c) is at full footprint resolution. The program BUFR_SUPERTMI unpacks each report checking the validity of the satellite id, observation date and total precipitation observation. Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file. The output filename contains the qualifier “sptrmm” (see Table 1.c, key for superscript 1 in “NET” column). The Global GSI analysis (GFS and GDAS network runs only) reads the superobed data directly from the reprocessed "sptrmm" BUFR dump file (these data do not pass through the PREPBUFR processing steps).
4. WindSat data - GFS, GDAS and CDAS network runs: Each “report” in the WindSat BUFR dump file (group mnemonic “wndsat”, see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information. The program BUFR_DCODWINDSAT unpacks each report checking the report date for realism, selecting the proper nudged wind vector, and excluding reports not explicitly over ocean, reports with missing nudged wind vector, reports with missing model wind direction and speed, and reports with a "bad" or "no retrieval" EDR quality flag. Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file which contains only those data needed for subsequent PREPBUFR processing. The output filename contains the qualifier “wdsatr” (see Table 1.b, key for superscript 5 in “NET” column).
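The superobbing step common to the four re-processing programs above can be sketched in Python as a per-cell average, grouped by satellite id. This is a simplified illustration that assumes a plain arithmetic mean of a scalar value; the actual averaging used by PREPOBS_PREPSSMI, WAVE_DCODQUIKSCAT, BUFR_SUPERTMI and BUFR_DCODWINDSAT is not described here, and for winds one would normally average vector components rather than speed and direction. The function name and the tuple layout are assumptions.

```python
import math
from collections import defaultdict


def superob(reports, cell_deg=1.0):
    """Average reports falling in the same lat/lon grid cell, separately per satellite.

    reports: iterable of (sat_id, lat, lon, value) tuples (a hypothetical layout).
    Returns {(sat_id, lat_index, lon_index): (mean_lat, mean_lon, mean_value, count)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for sat_id, lat, lon, value in reports:
        lon360 = lon % 360  # put longitudes on 0..360 so cell indices are consistent
        key = (sat_id, math.floor(lat / cell_deg), math.floor(lon360 / cell_deg))
        acc = sums[key]
        acc[0] += lat
        acc[1] += lon360
        acc[2] += value
        acc[3] += 1
    return {key: (a[0] / a[3], a[1] / a[3], a[2] / a[3], a[3])
            for key, a in sums.items()}
```

Passing cell_deg=0.5 gives the one-half degree grid mentioned for QuikSCAT, while the default corresponds to the one-degree grid used for the other data types.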
===> Dump Job 2, running simultaneously with Dump Job 1, performs the following single step:
This currently runs in only the NAM and NDAS networks. The processing is identical to that described in Dump Job 1, Step B above. The dumping of WSR-88D Level II radial wind and reflectivity data is performed in a separate job from the dumping of all other data in the NAM and NDAS networks in order to save computation time since it takes almost as long to dump Level II data here as it takes to dump all other observational data in Dump Job 1.
In the GFS, GDAS, NAM and NDAS network runs, tropical cyclone bulletins valid for the current cycle from the Joint Typhoon Warning Center (JTWC) and Fleet Numerical Meteorology and Oceanography Center (FNMOC) are read from the NCEP IBM-CCS /dcom database and merged into the proper record structure by the program SYNDAT_GETJTBUL. Next, tropical cyclone bulletins valid for the current cycle from the NCEP/Tropical Prediction Center (TPC) are read from the TPC directory on the NCEP IBM-CCS (these are already in the proper record format). Finally, manually generated tropical cyclone bulletins are read from the NCEP IBM-CCS database. The latter can be generated by the NCEP/NCO Senior Duty Meteorologist (SDM) in the event that data from other sources are not available.
Next, the program SYNDAT_QCTROPCY runs in order to merge the tropical cyclone records from the various sources and perform quality control on tropical cyclone position and intensity information. Some of the checks performed include duplicate records, appropriate date/time, proper record structure, storm name/id number, records from multiple institutions, secondary variables (e.g. central pressure), storm position and direction/speed. The emphasis is on internal consistency between the reported storm location and prior motion. The output tropical cyclone vital statistics (tcvitals) file is then copied to the network-specific /com directories in the NCEP IBM-CCS. This file is read in the next tropical cyclone relocation step in all networks and also later in the PREPBUFR processing by the program SYNDAT_SYNDATA in the NAM and NDAS networks in order to generate tropical cyclone bogus wind reports.
Post-processing of BUFR Observational Data Dump Files
The completion of the data dump job(s) triggers a job which performs post-processing on the data dump files just created. This job does not produce any output necessary to the successful completion of the analysis/forecast network [indeed it runs simultaneously with the PREPBUFR Processing Job which is also triggered by the completion of the data dump job(s)].
The first job step prepares a table of data counts for the various reports just dumped via the execution of the program BUFR_DATACOUNT. These counts are compared to the running average over the past 30 days for each report type for the particular network and cycle time. If the current dump count for a particular type is considered abnormally low (for most report types this means more than 50% below the 30 day average), a dump alert is generated. The action taken for low dump counts depends upon the report type. For those types considered "critical" to the subsequent assimilation system, a low dump count generates diagnostics and triggers a code failure and a return code of 6 in the dump alert job. For those types considered "moderately-critical" (all types that are assimilated which are not in the "critical" category), a low dump count generates diagnostics and a non-fatal return code of 5 in the dump alert job. For those types considered "non-critical" (all types that are not assimilated in the particular network), a low dump count generates diagnostics and a non-fatal return code of 4 in the dump alert job. In all cases, a complete listing of dump counts vs. the 30 day average, along with those types which are either low or high (for most report types, high means more than 200% above the 30 day average), is sent to the SDM. High dump counts do not generate non-zero return codes in the dump alert job but they do generate diagnostics. Trends in the 30 day averages vs. those for 3-, 6-, 9- and 12-months ago are also recorded for the SDM (a report type trends low vs. one of these previous averaging periods if the current 30 day average is more than 20% below the 30 day average for that period, and trends high if the current 30 day average is more than 20% above the 30 day average for that period). Currently this dump count and alert processing runs only in the NAM, GFS and GDAS networks.
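The alert logic in this paragraph can be summarized in a short Python sketch. It is not the BUFR_DATACOUNT code: the function names are invented, "more than 200% above" is read here as a count greater than three times the average, and taking the largest return code across all low report types is an assumption about how multiple alerts combine.

```python
# Return codes from the text: 6 is fatal, 5 and 4 are non-fatal.
RETURN_CODES = {"critical": 6, "moderately-critical": 5, "non-critical": 4}


def check_count(current, avg_30day, low_frac=0.5, high_frac=2.0):
    """Classify a dump count as 'low', 'high' or 'normal' vs. the 30 day average."""
    if avg_30day <= 0:  # no usable history; treat as normal in this sketch
        return "normal"
    if current < avg_30day * (1 - low_frac):   # more than 50% below average
        return "low"
    if current > avg_30day * (1 + high_frac):  # more than 200% above average
        return "high"
    return "normal"


def dump_alert(counts, averages, criticality):
    """Return (return_code, diagnostics) for one network and cycle.

    Only low counts raise the return code; high counts appear in diagnostics only.
    """
    diagnostics = {}
    return_code = 0
    for report_type, current in counts.items():
        status = check_count(current, averages[report_type])
        diagnostics[report_type] = status
        if status == "low":
            return_code = max(return_code, RETURN_CODES[criticality[report_type]])
    return return_code, diagnostics
```

Since the text says the thresholds apply to most report types, a fuller version would presumably carry per-type values for low_frac and high_frac.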
The next job step executes the program BUFR_REMOREST which removes or masks, from the appropriate dump files, certain data types that are restricted (either by the data producers themselves or by the WMO) from redistribution outside of NCEP. NCEP/NCO has created a very strict policy on who may or may not have access to restricted data. The resulting dump files, stripped of all restricted data, are given a suffix qualifier of ".nr" in the network-specific /com directories on the NCEP IBM-CCS.
The next dump post-processing job step executes the program BUFR_LISTDUMPS which generates files containing text listings of all reports in the various BUFR data dump files. These text files are then copied to the network-specific /com directories on the IBM-CCS in order to provide diagnostic information for troubleshooting problems in the data, etc. Files containing listings of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr".
The post-processing job also contains a step which generates unblocked versions of the BUFR data dump files and copies them to the /com directories (again, files containing unblocked forms of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr"). The unblocked files are then copied to servers for use by organizations outside of NCEP. (The native blocking on the IBM-SP machine is Fortran 77.) Restricted data are not copied to these servers.
Finally, in all networks, the final post-processing job of the day performs a data average processing step via the execution of the program BUFR_AVGDATA. This updates the 30 day running average for each report type dumped, for each cycle for which a dump is generated. These "current" 30 day averages are saved in text files, according to the network, in the "/com/arch/prod/avgdata" directory on the NCEP CCS. These files are used by the dump alert processing in the NAM, GFS and GDAS networks in order to generate alerts for high or low dump counts for the current dump vs. the current 30 day average (see paragraph two in this section). For the final post-processing job of a particular month, the current 30 day average for the NAM, GFS and GDAS networks is saved off in a separate file for that month in the same "/com" directory as the current 30 day average files. These past month 30 day average files are used to check for high and low trends in the current NAM, GFS or GDAS 30 day average for a particular report vs. the 30 day average for 3-, 6-, 9- and 12-months ago (again, see paragraph two in this section). Only the most recent 12 months of 30 day averages are saved here for the NAM, GFS and GDAS networks.
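A minimal Python sketch of the running average update and the trend comparison follows. It is not the BUFR_AVGDATA code: the class and function names are invented, the averages are held in memory rather than in text files, and the 20% threshold is taken from the trend definition in paragraph two of this section.

```python
from collections import deque


class RunningAverage:
    """Running average of daily dump counts over the most recent `days` days."""

    def __init__(self, days=30):
        self._counts = deque(maxlen=days)  # oldest count drops off automatically

    def update(self, count):
        """Add today's count and return the updated average."""
        self._counts.append(count)
        return sum(self._counts) / len(self._counts)


def trend(current_avg, past_avg, threshold=0.20):
    """Compare the current 30 day average with one saved past-month average."""
    if past_avg <= 0:
        return "n/a"  # no usable history for that period
    if current_avg < past_avg * (1 - threshold):
        return "low"
    if current_avg > past_avg * (1 + threshold):
        return "high"
    return "steady"


def trends_vs_history(current_avg, past_avgs):
    """past_avgs maps months ago (3, 6, 9, 12) to the saved 30 day average."""
    return {months: trend(current_avg, avg) for months, avg in past_avgs.items()}
```

One such RunningAverage would be kept per network, cycle and report type, matching the way the averages are described as being stored.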
The NCEP production suite schedule, for those networks which originate with a dump of observational data, is shown in Table 2. "DUMP" indicates the name of Dump Job 1, "DUMP2" indicates the name of Dump Job 2, "TROPCY" indicates the name of the Tropical Cyclone Processing Job (with "TROPCY1" for the relocation part only and "TROPCY2" for the q.c. part only in the NDAS network), "DPOST" indicates the name of the Dump Post-processing Job, "PREP" (and "PREP1" and "PREP2" in the CDAS network) indicates the name of the PREPBUFR Processing Job, "ANAL" indicates the name of the Analysis Job, "FCST" (and "FCSTH" and "FCSTL" in the GFS network) indicates the name of the Forecast Job, "PPOST" (and "PPOST1" and "PPOST2" in the CDAS network) indicates the name of the PREPBUFR Post-processing Job, "GESS" in the RTMA network indicates the name of the job which retrieves the first-guess and "APOST" in the RTMA network indicates the name of the Analysis Post-processing Job. The initiation of the dump jobs ("DUMP" and "DUMP2") and the tropical cyclone processing job ("TROPCY", or "TROPCY1" in the NDAS network) is triggered by the clock at the times indicated. All subsequent jobs run in sequence.