Process to add new datasets to NOMADS --

The more complete guide (20100505)

4/16/2010


When EMC would like a new model data set added to the production NOMADS high availability server system, there are four items that EMC needs to provide to the NCO Production Management Branch’s Dataflow Team. Before these items are handed off to NCO, it is assumed that EMC has verified the accuracy of all data manipulated via the GRIB Filter Script and the DODS interface (GrADS Control File Template) on the EMC NOMADS server. {It is recommended that the development servers, nomad[1,3], be used to make these tests.}


The information should be handed off to NCO by submitting the items described below to ncep.pmb.dataflow@noaa.gov. NCO/PMB will configure the new dataset on the NOMADS staging server and then ask EMC to confirm that the dataset is available for "http" download {a test that DBNet successfully delivered the data set to the NOMADS high availability server}, can be retrieved via the GRIB filter, and displays properly through DODS. Once EMC has confirmed the configuration is correct, NCO/PMB will submit RFCs to the Change Control Board to implement the changes.


  1. Listing of Input Data

When a new dataset is added to the production NOMADS system, the DBNet software needs to know what files to expect for that data set. PMB will need a list of all the GRIB2 files from /com that should be included in the new model data set.
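{As an illustration, using the Alaskan NAM example from the control file section below, the listing names every GRIB2 file in the set. The /com path shown here is an assumed example, not the authoritative location:

/com/nam/prod/nam.YYYYMMDD/nam.t00z.awak3d00.grb2.tm00
/com/nam/prod/nam.YYYYMMDD/nam.t00z.awak3d03.grb2.tm00
...
/com/nam/prod/nam.YYYYMMDD/nam.t00z.awak3d84.grb2.tm00

(29 files per cycle, forecast hours 00 through 84 at 3-hour intervals, for each cycle to be served).}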


  2. Control File Template

For the GDS/OPeNDAP/DODS interface to work properly, the NOMADS software needs access to a GrADS control file "Template" configured to work on the data in GRIB2 format. The DBNet software running on the NOMADS production system generates a current control file based on an input ‘Template’ control file.


{HOWTO generate the GrADS control file "Template":


GDS/DODS/OPeNDAP returns values from user queries using the packed binary GRIB2 files, which may be aggregated in a number of ways. A GrADS control (.ctl) file for GDS/DODS/OPeNDAP declares the description details of the GRIB2 data set and the aggregation of the individual GRIB2 files in the data set. That is, the NOMADS server uses a single data entry to aggregate multiple data files and handle them as if they were one individual file. The individual data files must be identical in all dimensions except time, and the time range of each individual file must be indicated in its filename, which is the case for almost all datasets in the NCEP suite of model products.

The template can be made manually, but a good way is to generate it first with the g2ctl tool (http://www.cpc.ncep.noaa.gov/products/wesley/g2ctl.html) from all of the GRIB2 files in the data set. For a given cycle, e.g., t00z, all the dataset component files should be present in one directory, arranged the way they will be present on the server. As an example we choose the NAM-Alaskan files, "nam.t00z.awak3d*.grb2.tm00", meaning everything from the first forecast file "nam.t00z.awak3d00.grb2.tm00" through the last forecast time "nam.t00z.awak3d84.grb2.tm00", and bring these files to a directory that has write privileges. One cannot write in /com; we use the IBM-SP, but this can also be done in any Linux environment with a late version of GrADS (at least ...a3...), g2ctl, gribmap2, and wgrib2.
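{A minimal staging sketch, assuming the IBM-SP and a hypothetical writable scratch directory; the paths are illustrative, not prescribed:

mkdir -p /ptmp/$USER/awak3d_test
cp /com/nam/prod/nam.YYYYMMDD/nam.t00z.awak3d*.grb2.tm00 /ptmp/$USER/awak3d_test/
cd /ptmp/$USER/awak3d_test
}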

Step 1: Execute the g2ctl (grib2-to-ctl) command to make a control file in the GRIB2 file directory that contains the data set; for this example there are 29 GRIB2 files per cycle, present as "nam.t00z.awak3d*.grb2.tm00". Issuing the command:

/u/wx51we/bin/g2ctl nam.t00z.awak3d%f2.grb2.tm00 > awak3d_orig.ctl

results in the control file awak3d_orig.ctl, where "%f2" indicates a forecast time template; that is, g2ctl will seek all files in the local directory with the above names and keep track of all forecast time values. Check that one can in fact make a GrADS plot from this control file (remember to run "gribmap2.pl -i awak3d_orig.ctl" first). The GrADS commands to animate, for example, the first 10 mean sea level pressure forecasts for this test would be:

--

grads

<CR>    (press Enter at the startup prompt to accept the default graphics mode)

open awak3d_orig.ctl

set t 1 10

d prmslmsl

quit

--

If the mean sea level pressure animation displays OK, then all is well; otherwise the data files have not been transferred properly. For GrADS questions see http://www.iges.org/grads/gadoc/index.html


Step 2: The resulting .ctl file is then altered (changing about a half dozen lines) in the following way:

line 1: The file name produced by g2ctl in the DSET line is altered to contain changeable directory/date/time/cycle keywords, written as "(dir)"/"(date)"/"(cdate)"/"(hour)" respectively. The forecast time template should already be present by virtue of g2ctl having found all of the files. The information is conveyed by keywords in parentheses, so the line in our example,

"dset ^nam.t00z.awak3d%f2.grb2.tm00"

generated by g2ctl is replaced with (the "^" is removed for fully qualified directory names):

"dset (dir)nam.t(hour)z.awak3d%f2.grb2.tm00"

where the WOC or DEV servers understand "(dir)" to mean the local directory name as determined by the server system administrator program (SAP); the keyword is replaced automatically by an internal sed command or equivalent, which also fills in "(hour)", the cycle hour, and the other keywords. The %f2 will aggregate over all forecast times, f00 to the end, according to the TDEF command (see below).
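{As an illustration only (the actual server-side substitution code is not shown here), a sed command filling in the keywords for the 00Z cycle and a hypothetical data directory might be:

sed -e 's|(dir)|/var/ftp/pub/nam_Alaska/|' -e 's|(hour)|00|' awak3d_template.ctl > nam.t00z.awak3d.ctl
}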


line 2: The binary index (.idx) file name in the INDEX line is made unique per cycle by including the "(hour)" keyword.

"index ^nam.t00z.awak3d.grb2.tm00.idx" produced by g2ctl is changed to

"index ^nam.t(hour)z.awak3d.grb2.idx"


line 3: The title text of the data set. The owner/creator of the dataset is responsible for writing a good one-line definition/title. For this Alaskan NAM data set, the title line produced by g2ctl, "title nam.t00z.awak3d.grb2.tm00", is changed to

" title Alaska 242 3-hour NAM fcst starting from (date), downloaded (cdate)",

where "(date)" is the GRIB2 internal file date and "(cdate)" is the time the server received the dataset; both are furnished by the SAP program and/or replaced automatically by an internal sed command or equivalent on the NOMADS development servers.


lines 10 & 11: The XDEF and YDEF lines should be checked for the lon/lat "framing" of the dataset. For a global field the framing is usually global and no change is needed, but for an unequally spaced grid or a conformal mapping the data set creator/owner needs to decide how the dataset will be "framed", or else take the g2ctl default. For the Alaskan grid example, the default framing covered the entire hemisphere, which would cause an excess of "missing values" outside the small Alaskan domain. Instead we set the "XDEF" command, whose options are #Points Type Starting_lon Increment. The lines

"xdef 1779 linear 0. 0.202360876897133"

"ydef 297 linear 30.1011804384486 0.202360876897133"

generated by g2ctl are changed, in the number of points and the starting longitude, to a reasonably selected frame area:

"xdef 775 linear 140. 0.202360876897133"

"ydef 297 linear 30.1011804384486 0.202360876897133"

(only the XDEF is changed; we agree with the g2ctl choice for the YDEF values). The new frame encompasses the entire Alaskan grid without too many outside (missing) points.
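{A quick worked check of the chosen frame: 775 points at an increment of 0.202360876897133 degrees span (775 - 1) x 0.202360876897133 ≈ 156.6 degrees of longitude, so the frame runs from 140E to about 296.6E (63.4W), comfortably containing the Alaskan domain.}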


line 12: The TDEF line, giving the number of forecast times,

"tdef 29 linear 00Z04may2010 3hr"

is changed to

"tdef 29 linear (date) 3hr"


This ends the changes to the control file generated by the g2ctl tool. This control file can be used to test OPeNDAP on the development server, nomad3, if the data are also present there. The same template can be given to NCO/WOC, who will incorporate any changes to the keywords to accomplish the same function.


We have shown the NAM aggregation using a template over forecast time above, but one may also aggregate over the i-th ensemble component if present. This can be useful for ensemble runs or ensemble climate runs, making one entry point give access across all GRIB2 file ensemble components. The requirement for this to work is that the GRIB2 files' header information correctly declares that the data (file) is an ensemble according to the GRIB2 standard documentation (http://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc.shtml). Templating over forecast time is shown in the example above using templating component commands, "%<substitution><value>" like "%f2", in conjunction with the TDEF. Templating over ensembles using the ensemble definition (EDEF) can also be done, as can other combinations; an illustrative fragment follows. See http://www.iges.org/grads/gadoc/gadocindex}
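{A hypothetical control file fragment (the file and member names are illustrative assumptions, not an NCEP product) combining ensemble and forecast-time templating, where "%e" substitutes each member name declared in the EDEF line:

"dset (dir)ge%e.t(hour)z.pgrb2f%f2"

"edef 4 names c00 p01 p02 p03"

Each member's files must be identical in all dimensions except time and ensemble for the aggregation to work.}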




  3. GRIB Filter Script

The ability to “slice and dice” model data requires a cgi-bin Perl script for each model data set. This Perl script runs the GRIB filter feature available in NOMADS, including creating the webpage that allows the users to choose the parameters, levels, and domain they want to obtain. The script needs to contain a listing of the model parameters and levels that should be made available from the GRIB2 files via NOMADS. The parameter and level names are provided in the wgrib2 output nomenclature.
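{An illustrative excerpt (not a complete list) of parameter and level names as they appear in wgrib2 output:

parameters: PRMSL, HGT, TMP, UGRD, VGRD
levels: surface, 2 m above ground, 10 m above ground, 500 mb, mean sea level
}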


{The script needed for the GRIB2 slicer and dicer, "g2subset" (ftp2u)

To test a new dataset on the development NOMADS server, one first needs to get either a static or an updating set of data onto the server to test on. There are many examples of doing this with secure shell from the IBM-SP in the /global/save/nomad/ncep_nomad directory; a secure shell file transfer or other means will work. Once the files are present on the development server, e.g., nomad3, a script can be written to access the g2subset program, and there is space available on the development servers for this sort of testing. The script set up to test g2subset, and to offer to NCO, is shown below for the GFS 0.5 degree files:


#!/usr/bin/perl -w -I/home/wd23ja/bin

# Load the shared g2subset routines that build the filter page
require ("g2subsetmod.pl");

# Directory holding the GRIB2 files for this dataset (the only line that changes)
$dir="/var/ftp/pub/gfs/rotating-0.5";

# Generate the parameter/level/subregion selection page for $dir
&g2sub_main($dir);

exit;


All the lines except the $dir assignment remain unchanged by the dataset owner; the system administration on the WOC will update the remaining paths as needed.


The $dir line: this changes to the directory for the new dataset. The $dir="/var/ftp/pub/... prefix will remain the same, but the dataset owner/creator needs to choose a directory name for the data and suggest it to NCO. Pick a "reasonable" name by examining the existing directory names on the NOMADS WOC high availability server via the "http" link; for the Alaskan data the directory was /var/ftp/pub/nam_Alaska. The NCO/WOC system administration will make the necessary changes. The above script is also used to test on the nomad3 development server; the directory and file names can be changed to match the dev environment for testing but are changed back to (left as) the operational ones when handed off.}
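{A usage sketch (the script name and host here are assumptions for illustration, not the operational names): once installed under cgi-bin, the script is reached with a URL such as

http://nomad3/cgi-bin/filter_nam_ak.pl

which returns the g2subset selection page; submitting the form then extracts the chosen parameters, levels, and subregion from the GRIB2 files under $dir.}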


  4. Dataset Description

For each dataset in NOMADS there is a short informational web page describing the dataset. This page can contain a description of the model, links to documentation, other dissemination locations for the data, and any other information the developers want to include. Provide a concise summary for this page and NCO will format it into the actual webpage entry.


{This includes the name of the dataset entry on the Welcome page of the web host. The owner/creator should check the names of directories in the data entry table at http://nomads.ncep.noaa.gov and either create a new area or use an existing category. For example, NAM(WRF-NMM) was chosen for the NAM, but a separate line in the table was given to "AK NAM(WRF-NMM)" for the Alaskan run used in the example above.}