Throughout the discussion of the BUFRLIB software, it will be helpful for the reader to visualize a BUFR message as containing one or more BUFR data subsets, each containing the data for a single report from a particular observing site at a particular time and location. In turn, each data subset (i.e. report) will typically contain, in addition to time and location information, data values for items such as pressure, temperature, wind direction and speed, humidity, etc. for that particular observation. Finally, BUFR messages themselves are typically stored in files containing many other BUFR messages of similar content. Therefore, if we were to summarize in a top-down fashion, we could say:

"A BUFR file contains one or more BUFR messages, each containing one or more BUFR data subsets, each containing one or more BUFR data values."

Keeping this hierarchy in mind will make the user interface to the BUFRLIB software more intuitive.

So, without further ado, let's dive in!

As a whole, the BUFRLIB software is a library containing more than 250 different subprograms and functions; however, a typical user will never directly call more than 10-20 of them, and the rest are lower-level routines that the software uses to accomplish various underlying tasks and which can therefore be safely treated as a "black box" from a user perspective. Do note, however, that most of the software is written in FORTRAN and is intended to be used as a library rather than as a "stand-alone" program; therefore, the user must possess at least a rudimentary knowledge of FORTRAN in order to be able to construct an application program which calls the proper BUFRLIB subroutines in the proper order. Several example application programs are provided later on within the documentation and can be used as references; however, these only begin to touch on the many applications in which the software can be used.

A typical user's first encounter with the BUFRLIB software will most likely be as follows:

	CALL OPENBF ( LUBFR, CIO, LUNDX )

	Input arguments:
	    LUBFR	INTEGER		Logical unit for BUFR file
	    CIO		CHAR*(*)	'IN' or 'OUT' (or 'APX', 'NUL', 'NODX',
					'SEC3' or 'QUIET')
	    LUNDX	INTEGER		Logical unit for BUFR tables

This subroutine identifies to the BUFRLIB software a BUFR file that is connected to logical unit LUBFR. The argument CIO is a character string describing how the file will be used, e.g. 'IN' is used to access an existing file of BUFR messages for input (i.e. reading/decoding BUFR), and 'OUT' is used to access a new file for output (i.e. writing/encoding BUFR). An option 'APX' is also available which behaves like 'OUT', except that output is then appended to an existing BUFR file rather than creating a new one from scratch, and there are also some additional options 'NUL' and 'NODX' which can likewise be used instead of 'OUT' for some very special cases as described later on in the documentation. There is also an option 'SEC3' which can be used in place of 'IN' for certain cases when the user is attempting to read BUFR messages whose content and descriptor layout are unknown in advance. However, all of these additional options behave enough like 'IN' and 'OUT' that, except where otherwise noted, it will be sufficient to consider only the 'IN' and 'OUT' cases for the purposes of this introductory discussion. The only other possible option is 'QUIET', which is a special case not involving the connection of any actual logical unit to the BUFRLIB software and which will be covered later.

The third and final call argument LUNDX identifies the logical unit which contains the definition of the DX BUFR tables to be associated with unit LUBFR. Except when CIO='SEC3', every BUFR file that is presented to the BUFRLIB software must have DX BUFR tables associated with it. These tables may be defined within a separate ASCII text file (see Description and Format of DX BUFR Tables for more information) or, in the case of an existing BUFR file, may be embedded within the first few BUFR messages of the file itself, in which case the user can indicate this to the subroutine by setting LUNDX to the same value as LUBFR. In any case, note that LUBFR and LUNDX are logical unit numbers; therefore, the user's application program must already have associated these logical unit numbers with actual filenames on the local system, typically via a FORTRAN "OPEN" statement.

When an existing BUFR file is accessed for input (CIO='IN'), the associated DX BUFR tables are stored internally within the BUFRLIB software and are referenced during all subsequent processing of that file. Likewise, when a file is accessed for output (CIO='OUT'), the associated DX BUFR tables are still stored internally for subsequent reference; however, the file itself is also initialized by writing the BUFR table information (as one or more BUFR messages) to the beginning of the file. The exception is CIO='NODX', in which case the writing of these additional messages is suppressed.
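As a minimal sketch of the steps described so far, the following free-form Fortran program connects an existing BUFR file for input, assuming that its DX BUFR tables are embedded in the file itself. The unit number 11, the file name `input.bufr`, and the use of FORM='UNFORMATTED' in the OPEN statement are illustrative assumptions rather than requirements stated here, and the program must be linked against the BUFRLIB library.

```fortran
PROGRAM OPEN_INPUT
  IMPLICIT NONE
  ! Hypothetical unit number, chosen for illustration only.
  INTEGER, PARAMETER :: LUBFR = 11

  ! OPENBF does not open the file itself, so the application
  ! must first associate the unit number with a file name.
  ! The file name and FORM= choice are assumptions.
  OPEN ( UNIT = LUBFR, FILE = 'input.bufr', FORM = 'UNFORMATTED', STATUS = 'OLD' )

  ! Passing LUBFR as the third argument indicates that the DX BUFR
  ! tables are embedded within the first messages of the file.
  CALL OPENBF ( LUBFR, 'IN', LUBFR )

  ! ... message-level and subset-level processing would go here ...

  ! Sever the connection between LUBFR and the BUFRLIB software.
  CALL CLOSBF ( LUBFR )
END PROGRAM OPEN_INPUT
```

If the tables were instead held in a separate ASCII file, that file would be opened on its own unit and that unit number passed as the third argument.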

At this point, a brief mention of the CIO='SEC3' option is in order. As noted above, this is the only value of CIO (other than 'QUIET') where it is not necessary to provide pre-defined DX BUFR tables via LUNDX. Instead, this option instructs the BUFRLIB software to unpack the data description section (Section 3) from each BUFR message it reads and then decode the contents accordingly. In this case, it is necessary to provide a set of BUFR master tables containing listings of all possible BUFR descriptors (see Description and Format of BUFR Master Tables for more information), but otherwise no prior knowledge is required of the contents of the messages to be decoded.

Finally, CIO='QUIET' is a special case which does not involve the actual connection of any logical unit to the BUFRLIB software. In this case, the value of the input argument LUBFR is ignored, and LUNDX is an integer from the list below indicating the level of verbosity of error messages and diagnostics to be written by the BUFRLIB software. The default value is 0, but this can be modified as shown to suit the user's preference, and the new value then remains in effect for all logical units for the life of the application program, or until a subsequent call is made to this subroutine with CIO='QUIET' and a new verbosity level in LUNDX:

	-1	No printout except for catastrophic messages
	 0	Limited printout (default)
	 1	Warning messages are printed
	 2	Warning and informational messages are printed

In any case, messages are normally printed to standard output unless the user provides a special inline version of subroutine ERRWRT, as described later on within the documentation.
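For illustration, a call that raises the verbosity so that warning messages are also printed might look like the sketch below. The value 0 passed as the first argument is only a placeholder, since LUBFR is ignored when CIO='QUIET'; the program must be linked against the BUFRLIB library.

```fortran
PROGRAM SET_VERBOSITY
  IMPLICIT NONE

  ! With CIO='QUIET', the first argument is ignored (0 is a placeholder)
  ! and the third argument is the requested verbosity level:
  !   -1 = catastrophic only, 0 = limited (default),
  !    1 = warnings, 2 = warnings and informational messages.
  CALL OPENBF ( 0, 'QUIET', 1 )

  ! ... subsequent OPENBF calls for actual BUFR files would go here ...
END PROGRAM SET_VERBOSITY
```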

Currently, as many as 32 BUFR files can be simultaneously connected to the BUFRLIB software for processing. Of course, each one must have a unique LUBFR number and be defined to the software via a separate call to subroutine OPENBF.

Since OPENBF is used to initiate access to a BUFR file, it stands to reason that CLOSBF would be used to terminate this access:

	CALL CLOSBF ( LUBFR )

	Input argument:
	    LUBFR	INTEGER		Logical unit for BUFR file

This subroutine severs the connection between logical unit LUBFR and the BUFRLIB software. It is a good idea to call CLOSBF for every LUBFR that was identified via OPENBF; however, it is especially important when writing/encoding a BUFR file in order to ensure that all output is properly flushed to LUBFR. It is also worth noting that CLOSBF will, before returning, actually execute a FORTRAN "CLOSE" on logical unit LUBFR, whereas it was previously noted that subroutine OPENBF did not itself handle the FORTRAN "OPEN" of the same LUBFR.

Now that we have covered the library routines that operate on the BUFR file level, and recalling the hierarchy that was previously discussed, it is time to continue on to the BUFR message level:

When LUBFR points to a BUFR file for input:

	CALL READMG ( LUBFR, CSUBSET, IDATE, IRET )
	IRET = IREADMG ( LUBFR, CSUBSET, IDATE )

	Input argument:
	    LUBFR	INTEGER		Logical unit for BUFR file

	Output arguments:
	    CSUBSET	CHAR*(*)	Table A mnemonic for BUFR message
	    IDATE	INTEGER		Section 1 date-time for BUFR message
	    IRET	INTEGER		Return code:
					  0 = normal return
					 -1 = no more BUFR messages in LUBFR

Subroutine READMG reads the next BUFR message from the BUFR file pointed to by LUBFR. The associated function IREADMG does the same thing, but returns IRET as its function value, which can then be used directly, e.g. as the controlling condition of a program loop. The choice of which to use is merely one of programming preference and/or personal style, as both have the same net effect. In either case, the next BUFR message is read into internal arrays within the BUFRLIB software (from where it can be easily manipulated or further parsed) rather than passed back to the application program directly.

If the return code IRET contains the value -1, then this indicates that there are no more BUFR messages (i.e. end-of-file) within the given BUFR file, in which case the file itself will have been automatically disconnected from the BUFRLIB software via an internal call to subroutine CLOSBF. Otherwise, if IRET returns with the value 0, then the character argument CSUBSET will contain the Table A mnemonic corresponding to the type of message that has just been read (see Description and Format of DX BUFR Tables for further information about Table A mnemonics), and the integer argument IDATE will contain the date-time that was contained within Section 1 of the message. By default IDATE is returned in format YYMMDDHH, although it is also possible to have it returned in format YYYYMMDDHH via a preceding call to subroutine DATELEN, as shown within some of the example programs.
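The message-level read loop described above can be sketched as follows. This is an illustrative outline rather than one of the documentation's own example programs: the unit number, file name, and OPEN options are assumptions, and the argument 10 passed to DATELEN requests the 10-digit YYYYMMDDHH form of IDATE.

```fortran
PROGRAM LIST_MESSAGES
  IMPLICIT NONE
  INTEGER, PARAMETER :: LUBFR = 11        ! hypothetical unit number
  CHARACTER(LEN=8) :: CSUBSET
  INTEGER :: IDATE, IRET, NMSG

  ! File name and FORM= choice are assumptions for this sketch.
  OPEN ( UNIT = LUBFR, FILE = 'input.bufr', FORM = 'UNFORMATTED', STATUS = 'OLD' )
  CALL OPENBF ( LUBFR, 'IN', LUBFR )

  ! Ask for IDATE as YYYYMMDDHH instead of the default YYMMDDHH.
  CALL DATELEN ( 10 )

  NMSG = 0
  DO
    CALL READMG ( LUBFR, CSUBSET, IDATE, IRET )
    IF ( IRET /= 0 ) EXIT                 ! -1 = no more messages
    NMSG = NMSG + 1
    PRINT '(A,I0,3A,I0)', 'Message ', NMSG, ': type ', TRIM(CSUBSET), &
                          ', date-time ', IDATE
  END DO

  ! No explicit CLOSBF here: at end-of-file the library has already
  ! disconnected LUBFR via its own internal call to CLOSBF.
  PRINT '(A,I0)', 'Total messages read: ', NMSG
END PROGRAM LIST_MESSAGES
```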

Alternatively, when LUBFR points to a BUFR file that has been opened for output, the following message-level subroutines are most commonly used:

	CALL OPENMG ( LUBFR, CSUBSET, IDATE )
	CALL OPENMB ( LUBFR, CSUBSET, IDATE )

	Input arguments:
	    LUBFR	INTEGER		Logical unit for BUFR file
	    CSUBSET	CHAR*(*)	Table A mnemonic for type of BUFR
					message to be opened
	    IDATE	INTEGER		Date-time to be stored within
					Section 1 of BUFR message

Both of these subroutines are similar in that they open and initialize, within internal arrays, a new BUFR message for eventual output to LUBFR, using the arguments CSUBSET and IDATE to indicate the type of message to be opened. The difference is that subroutine OPENMG will always open and initialize a new internal message, even if the CSUBSET and IDATE arguments have not changed since the previous call to OPENMG, whereas OPENMB will only open a new message if either CSUBSET or IDATE has changed, and otherwise will simply return while leaving the existing internal message unchanged, so that subsequent data subsets can be stored within the same internal message. For this reason, OPENMB is much more widely used, since it allows for the storage of an increased number of data subsets within each BUFR message and therefore improves overall encoding efficiency. Regardless, in the case of either subroutine, whenever a new BUFR message is opened and initialized, the existing internal BUFR message (if any) will be automatically closed and written to output via an internal call to the following subroutine:

	CALL CLOSMG ( LUBFR )

	Input argument:
	    LUBFR	INTEGER		Logical unit for BUFR file

thereby relieving the user of having to do so directly within his or her application program. Furthermore, since, in the case of a BUFR file that was opened for input, each subsequent call to subroutine READMG will likewise automatically clear an existing message from the internal arrays before reading in the new one, it is rare to ever see subroutine CLOSMG called directly from within an application program!

Continuing on within our top-down hierarchy to the BUFR data subset (i.e. report) level, things begin to get a little more complicated, because the order in which routines at this level are called with respect to routines at the data values level depends on whether the underlying BUFR file was opened for input or output access.

More specifically, if the BUFR file was opened for input access, and a successful call was subsequently made to subroutine READMG (or function IREADMG) in order to read a BUFR message into the internal arrays, then the next step is to do the following in order to read a subset from that internal message:

	CALL READSB ( LUBFR, IRET )
	IRET = IREADSB ( LUBFR )

	Input argument:
	    LUBFR	INTEGER		Logical unit for BUFR file

	Output argument:
	    IRET	INTEGER		Return code:
					  0 = normal return
					 -1 = no more BUFR data subsets in
					      current BUFR message

As was the case previously with READMG, subroutine READSB has its own function equivalent IREADSB which returns IRET as its function value. Either way, a return code value of -1 within IRET indicates that there are no more data subsets within the given BUFR message and that, therefore, a new call to READMG (or IREADMG) is required in order to read the next BUFR message from the associated BUFR file before another subset can be read. At any rate, once a subset has been successfully read (as before, this reading is done into internal arrays!), then we are ready to call the values-level subroutines in order to retrieve actual data values from this subset.
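Because IREADMG and IREADSB return the status as their function values, the file/message/subset hierarchy maps naturally onto two nested loops. The sketch below simply counts messages and subsets; the unit number, file name, and OPEN options are assumptions made for illustration, and the program must be linked against the BUFRLIB library.

```fortran
PROGRAM COUNT_SUBSETS
  IMPLICIT NONE
  INTEGER, PARAMETER :: LUBFR = 11        ! hypothetical unit number
  INTEGER, EXTERNAL  :: IREADMG, IREADSB  ! BUFRLIB integer functions
  CHARACTER(LEN=8) :: CSUBSET
  INTEGER :: IDATE, NMSG, NSUB

  ! File name and FORM= choice are assumptions for this sketch.
  OPEN ( UNIT = LUBFR, FILE = 'input.bufr', FORM = 'UNFORMATTED', STATUS = 'OLD' )
  CALL OPENBF ( LUBFR, 'IN', LUBFR )

  NMSG = 0
  NSUB = 0

  ! Outer loop: one pass per BUFR message in the file.
  DO WHILE ( IREADMG ( LUBFR, CSUBSET, IDATE ) == 0 )
    NMSG = NMSG + 1

    ! Inner loop: one pass per data subset in the current message.
    DO WHILE ( IREADSB ( LUBFR ) == 0 )
      NSUB = NSUB + 1
      ! ... values-level calls for this subset would go here ...
    END DO
  END DO

  PRINT '(A,I0,A,I0,A)', 'Read ', NSUB, ' subsets from ', NMSG, ' messages'
END PROGRAM COUNT_SUBSETS
```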

If, on the other hand, the BUFR file was opened for output access, then the appropriate values-level subroutines must be called before calling the relevant subset-level routine WRITSB, which makes sense once we recognize that the function of this routine is to encode all of the data values that have been stored for the current subset and then pack that entire subset into the current message within the internal arrays. Put another way, we must store the data values to be contained within a subset before we can store the subset itself! Here is the routine:

	CALL WRITSB ( LUBFR )

	Input argument:
	    LUBFR	INTEGER		Logical unit for BUFR file

Again, this subroutine is called to indicate to the BUFRLIB software that all necessary data values for this subset have been stored and thus that the subset is ready to be encoded and packed into the current message for the BUFR file associated with logical unit LUBFR. However, we should note here that the BUFRLIB software will not allow any single BUFR message to grow larger than a certain size (usually 10000 bytes, although this can be increased via a call to subroutine MAXOUT as described later on in the documentation); therefore, it can happen that an attempt to pack a subset within the current message will not be possible due to a lack of remaining available space! If this occurs, then WRITSB will automatically flush the current message to logical unit LUBFR, open and initialize a new message using the same CSUBSET and IDATE values as were specified in the previous call to OPENMG or OPENMB, and then encode and pack the subset into that new message, all without any additional effort or worry on the part of the user's application program!

(As a side note, the default BUFR message size limit of 10000 bytes within the BUFRLIB software is a practical one based upon the current specifications of certain meteorological telecommunications networks and is not a limit imposed by the BUFR code form itself. Theoretically, at least according to the official WMO Manual #306, the only limit to the size of a BUFR message is the constraint that such a size (in bytes) must be representable as an integer of 24 bits or less so that it can be encoded within bytes 5-7 of Section 0 of the message. Since many applications may prefer (or even require?) a BUFR output message size larger than 10000 bytes, subroutine MAXOUT can be used in such cases, as shown later on in the documentation.)
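To make the 24-bit constraint concrete, the largest length that fits in bytes 5-7 of Section 0 is 2**24 - 1. The standalone snippet below (it does not call BUFRLIB) simply evaluates that bound, which works out to 16777215 bytes, i.e. just under 16 MiB.

```fortran
PROGRAM MAX_MESSAGE_SIZE
  IMPLICIT NONE
  ! Number of bits available for the total message length
  ! (bytes 5-7 of Section 0 hold a 3-byte unsigned integer).
  INTEGER, PARAMETER :: NBITS = 24
  INTEGER :: MAXLEN

  ! Largest unsigned value representable in NBITS bits.
  MAXLEN = 2**NBITS - 1

  PRINT '(A,I0,A)', 'Largest encodable BUFR message length: ', MAXLEN, ' bytes'
END PROGRAM MAX_MESSAGE_SIZE
```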

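At last, we have reached the proverbial "meat and potatoes" part of the discussion, where we now discuss the subroutines that are used to write/read actual data values to/from a data subset:

	CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
	CALL UFBREP ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
	CALL UFBSEQ ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )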

	Input arguments:
	    LUBFR	INTEGER		Logical unit for BUFR file
	    CMNSTR	CHAR*(*)	String of blank-separated mnemonics
					associated with R8ARR
	    MXMN	INTEGER		Size of first dimension of R8ARR
	    MXLV	INTEGER		Size of second dimension of R8ARR
					OR number of levels of data values
					to be written to data subset

	Input or output argument (depending on context of LUBFR):
	    R8ARR(*,*)	REAL*8		Data values written/read to/from
					data subset

	Output argument:
	    NLV		INTEGER		Number of levels of data values
					written/read to/from data subset

All three of these routines are similar, but there are some important distinctions. We'll focus first on the similarities, the most significant of which is basic functionality: each routine writes or reads specified values to or from the current BUFR data subset within the internal arrays, with the direction of the data transfer being determined by the context of LUBFR. If LUBFR points to a BUFR file that is open for input, then data values are read from the internal data subset; otherwise, data values are written to the internal data subset.

The actual data transfer occurs through the use of the two-dimensional REAL*8 array R8ARR, which must be declared and dimensioned by the user within his or her application program, and whose actual first dimension MXMN must always be passed in as a call argument to each of the above routines. The call argument MXLV, on the other hand, contains the actual second dimension of R8ARR only when LUBFR points to a BUFR file that is open for input (i.e. reading/decoding BUFR); otherwise, whenever LUBFR points to a BUFR file that is open for output (i.e. writing/encoding BUFR), MXLV instead contains the actual number of levels of data values that are to be written to the data subset (where this number obviously must be less than or equal to the actual second dimension of R8ARR!).

In either case, the input character string CMNSTR always contains a blank-separated list of "mnemonics" (see Description and Format of DX BUFR Tables) which correspond to the REAL*8 values contained within the first dimension of R8ARR, and the output argument NLV always denotes the actual number of levels of those values that were written/read to/from the second dimension of R8ARR, where each such level represents a repetition of the mnemonics within CMNSTR. Note that, when LUBFR points to a BUFR file that is open for output (i.e. writing/encoding BUFR), we would certainly expect that the output value NLV is equal to the value of MXLV that was input, and indeed this is the case unless some type of error occurred in storing one or more of the data levels.

At this point we should mention that, except in the case of subroutine UFBSEQ, the correspondence between CMNSTR and the REAL*8 values listed within the first dimension of R8ARR is one-to-one, meaning that the mnemonics listed within CMNSTR are Table B mnemonics and correspond positionally to the values in the first dimension of R8ARR. In a call to UFBSEQ, on the other hand, CMNSTR may contain a Table A or Table D sequence mnemonic, in which case the values in R8ARR then correspond to the sequence of Table B mnemonics which constitute that Table A or Table D mnemonic.

One important thing to note about all three of the above subroutines is that all data transfer is done via the use of the REAL*8 array R8ARR. Therefore, any data that are desired to be encoded into BUFR as character values (or, more officially, "CCITT IA5", which is basically just a fancy name for ASCII) must be converted from character into REAL*8 within the application program before storing such values into array R8ARR. Conversely, when LUBFR points to an input file, any data values read from R8ARR which correspond to character data must be converted by the application program from REAL*8 back into character format. In either direction, the conversion between REAL*8 and character (i.e. CCITT IA5) values is most easily accomplished in FORTRAN via an EQUIVALENCE between an array of each type, as shown within some of the example programs provided with the BUFRLIB documentation.
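A minimal standalone sketch of the EQUIVALENCE technique is shown below. It does not call BUFRLIB; the one-element array R8ARR merely stands in for the array that would be passed to a values-level subroutine, and the station identifier 'KDCA' is an arbitrary illustrative value. Mixing character and REAL*8 objects in an EQUIVALENCE is a long-standing compiler extension rather than strictly standard Fortran, so this assumes a compiler that accepts it (most do by default).

```fortran
PROGRAM CHAR_VIA_REAL8
  IMPLICIT NONE
  REAL*8           :: R8ARR(1,1)   ! stands in for the UFBINT data array
  REAL*8           :: R8VAL
  CHARACTER(LEN=8) :: CVAL
  ! R8VAL and CVAL share the same 8 bytes of storage.
  EQUIVALENCE ( R8VAL, CVAL )

  ! Encoding direction: place the characters into the shared storage,
  ! then copy the resulting 8 bytes into the REAL*8 array.
  CVAL = 'KDCA'                    ! blank-padded to 8 characters
  R8ARR(1,1) = R8VAL

  ! Decoding direction: copy the REAL*8 element back into the shared
  ! storage and read it through the character name.
  CVAL  = ' '
  R8VAL = R8ARR(1,1)
  PRINT '(3A)', 'Recovered characters: "', CVAL, '"'
END PROGRAM CHAR_VIA_REAL8
```

Where a strictly standard-conforming alternative is preferred, the TRANSFER intrinsic can copy the same 8 bytes between a CHARACTER(LEN=8) value and a REAL*8 value without an EQUIVALENCE.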

Another important thing to note is that all numeric (i.e. non-character) data values within R8ARR are in the exact units specified for the corresponding mnemonic within the appropriate BUFR table, without any scale or reference values applied. Specifically, this means that, when writing/encoding data values into a BUFR subset, the user needs only to store each respective value into R8ARR using the units specified within the BUFR table, and the BUFRLIB software itself will take care of any necessary scaling or referencing of the value before it is actually stored within the subset. Conversely, when reading data values from a BUFR input subset, the values returned in R8ARR are already de-scaled and de-referenced and, thus, are already in the exact units that were defined for the corresponding mnemonics within the relevant BUFR table. However, when a returned data value within R8ARR contains the value 10.0E10 (= 10.0 X 10**10 = 1.0 X 10**11 ), this indicates that the value for the corresponding mnemonic was "missing" (i.e. all bits set to 1) within the BUFR subset.
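Putting the output-side pieces together, the sketch below writes three subsets, storing values directly in table units and leaving all scaling and referencing to the library. Everything specific in it is an assumption made for illustration: the unit numbers and file names, the Table A mnemonic 'NC002001', the date-time, and a DX tables file that defines WDIR (in degrees) and WSPD (in m/s) as part of that subset. The program must be linked against the BUFRLIB library.

```fortran
PROGRAM WRITE_WINDS
  IMPLICIT NONE
  INTEGER, PARAMETER :: LUBFR = 51, LUNDX = 52   ! hypothetical unit numbers
  INTEGER, PARAMETER :: MXMN = 2                 ! two mnemonics per level
  REAL*8  :: R8ARR(MXMN,1)
  INTEGER :: NLV, IRPT

  ! The application opens both files itself (names are assumptions).
  OPEN ( UNIT = LUNDX, FILE = 'dxtable.txt', STATUS = 'OLD' )
  OPEN ( UNIT = LUBFR, FILE = 'output.bufr', FORM = 'UNFORMATTED' )
  CALL OPENBF ( LUBFR, 'OUT', LUNDX )

  DO IRPT = 1, 3
    ! OPENMB starts a new message only if the type or date-time changed,
    ! so all three subsets can share one message.
    CALL OPENMB ( LUBFR, 'NC002001', 2024010100 )

    ! Store values in the units given in the table; no manual scaling.
    R8ARR(1,1) = 270.0D0            ! WDIR (assumed degrees)
    R8ARR(2,1) = 12.5D0 + IRPT      ! WSPD (assumed m/s)

    ! On output, the fourth argument is the number of levels to write.
    CALL UFBINT ( LUBFR, R8ARR, MXMN, 1, NLV, 'WDIR WSPD' )
    IF ( NLV /= 1 ) PRINT '(A,I0)', 'Values not stored for report ', IRPT

    ! Values first, then the subset itself.
    CALL WRITSB ( LUBFR )
  END DO

  ! Flushes the final message and closes LUBFR.
  CALL CLOSBF ( LUBFR )
END PROGRAM WRITE_WINDS
```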

Now that we've covered the similarities between the above three subroutines, let's take note of the differences, which relate mainly to the situational context within which each one may be used. Specifically, UFBINT is used for writing/reading data values corresponding to mnemonics which are part of a delayed-replication sequence, or for which there is no replication at all. As such, it is the most commonly used of the three subroutines and is sufficient in and of itself for many basic applications. UFBREP, on the other hand, must be used for mnemonics which are part of a regular (i.e. non-delayed) replication sequence or for those which are replicated via being directly listed more than once within an overall subset definition rather than by being included within a replication sequence. For example, consider the following cases, where the notation used is formally explained within Description and Format of DX BUFR Tables but will be covered in enough detail here in order to illustrate the concept:

To begin with, suppose that the BUFR tables file for a particular type of data contains the following definitions:

| WHTSEQ   | 303011 | WINDS-BY-HEIGHT SEQUENCE                                 |

| GPOT     | 007003 | GEOPOTENTIAL                                             |
| VSIG     | 008001 | VERTICAL SOUNDING SIGNIFICANCE                           |
| WDIR     | 011001 | WIND DIRECTION                                           |
| WSPD     | 011002 | WIND SPEED                                               |

| WHTSEQ   | GPOT  VSIG  WDIR  WSPD                                            |

The above defines a Table D sequence mnemonic with name "WHTSEQ" and which is composed of the Table B mnemonics "GPOT", "VSIG", "WDIR", and "WSPD" (in that order!). One can imagine such a sequence being utilized, for example, in the representation of winds-by-height within the subset definition for a rawinsonde report, where these four Table B mnemonics are all replicated at some number of sounding levels within the atmosphere above a particular reporting site. In determining which of the above three values-level subroutines to use, the key is in seeing how exactly the replication is defined within the actual subset definition! For example, if the subset definition contained:

	{WHTSEQ}

then this would indicate that the replication is being done via 8-bit delayed replication, in which case we could, assuming that we were reading/decoding such data from an input BUFR file, use subroutine UFBINT as follows:

	CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )

in which case the return value NLV would indicate how many such sounding levels were available. These would be returned one per row within the array R8ARR, while the first four columns of R8ARR would correspond to the values of the four Table B mnemonics, respectively, at each sounding level! (Here a "row" is one index of the second dimension of R8ARR, and a "column" is one index of its first dimension.) Suppose, on the other hand, that the winds-by-height had been included in the subset definition as:

	"WHTSEQ"100

which would indicate that the replication is being done via regular (i.e. non-delayed) replication with a fixed replication factor of 100. In that case, we would have to use:

	CALL UFBREP ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )

in order to read the data, and in this case we would always get a return value of NLV = 100, even if there were not 100 actual sounding levels' worth of data available for this particular reporting site (in which case the unavailable levels would be filled out with the aforementioned "missing" value of 10.0E10).
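The delayed-replication read described above might be sketched as follows, with levels that contain any "missing" value skipped. Several details are assumptions: the unit number and file name, the second dimension of 255 (the largest count an 8-bit delayed replication factor can hold), and the comparison against slightly less than 10.0E10, which is a defensive choice made here rather than something prescribed by the documentation. For the regular-replication definition, the UFBINT call would be replaced by UFBREP with the second dimension at least 100.

```fortran
PROGRAM READ_WINDS
  IMPLICIT NONE
  INTEGER, PARAMETER :: LUBFR = 11               ! hypothetical unit number
  INTEGER, PARAMETER :: MXMN = 4, MXLV = 255     ! 4 mnemonics, up to 255 levels
  REAL*8,  PARAMETER :: BMISS = 10.0D10          ! "missing" indicator
  INTEGER, EXTERNAL  :: IREADMG, IREADSB
  REAL*8  :: R8ARR(MXMN,MXLV)
  CHARACTER(LEN=8) :: CSUBSET
  INTEGER :: IDATE, NLV, LV

  ! File name and FORM= choice are assumptions for this sketch.
  OPEN ( UNIT = LUBFR, FILE = 'input.bufr', FORM = 'UNFORMATTED', STATUS = 'OLD' )
  CALL OPENBF ( LUBFR, 'IN', LUBFR )

  DO WHILE ( IREADMG ( LUBFR, CSUBSET, IDATE ) == 0 )
    DO WHILE ( IREADSB ( LUBFR ) == 0 )
      ! First index of R8ARR follows the mnemonic order in the string;
      ! second index runs over the NLV sounding levels returned.
      CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )

      DO LV = 1, NLV
        ! Skip levels where any of the four values is missing
        ! (tolerance on the comparison is an assumption).
        IF ( ANY ( R8ARR(1:MXMN,LV) >= 0.99D0 * BMISS ) ) CYCLE
        PRINT '(A,I4,4(1X,F10.1))', 'Level', LV, R8ARR(1:MXMN,LV)
      END DO
    END DO
  END DO
END PROGRAM READ_WINDS
```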

Now, suppose further that, at a later point in this subset definition for a rawinsonde report, the mnemonic "GPOT" was re-used in order to redefine the geopotential level to that of, say, the lowest cloud seen, followed by that of the highest cloud seen. This might look like:

{WHTSEQ}  ...  GPOT  (low_cloud_information)  ...  GPOT  (high_cloud_information)  ...

where ... represents any collection of zero or more intervening mnemonics. In this example, we are back to using delayed replication for the winds-by-height sequence itself; however, we now have the mnemonic "GPOT" being further replicated by being directly listed outside of a replication sequence. Therefore, the use of UFBREP is also required in this case in order to be able to retrieve all of the "GPOT" values within the data subset. The return value NLV would still give a count of the total number of rows that were filled within R8ARR, but this count would now be two higher than it was previously, and the two additional "GPOT" values (i.e. for the low cloud and high cloud information, respectively) would be returned within the last two rows of R8ARR, since that is the order in which they were listed with respect to the other "GPOT" values (i.e. the ones occurring as part of the replication of "WHTSEQ") within the overall subset definition.

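As for UFBSEQ, we have already touched on the use of this subroutine, but we will do a quick review here by noting that we could have replaced:

	CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )

with:

	CALL UFBSEQ ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'WHTSEQ' )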

in the previous example and gotten the exact same output in return, since UFBSEQ will itself determine which Table B mnemonics constitute the Table D sequence mnemonic "WHTSEQ" and then return all of the corresponding values within separate columns of R8ARR! Of course, in either case (and in the case of UFBREP as well!), the user must be certain that R8ARR is dimensioned large enough within his or her application program in order to be able to hold all of the values that can possibly be returned.

As a final note, there's one more important distinction between UFBINT and UFBSEQ which comes into play when LUBFR points to a BUFR file that is open for output; specifically, in such cases any call to the latter subroutine must be preceded by a call to subroutine DRFINI, as described separately within the documentation for that subroutine.