Model Transition Team (MTT) Meeting Minutes
September 21, 2012
- R. Reddy gave a presentation on 'Controlling Process/Thread Placement on Zeus'.
- R. Reddy's info is for SGI.
- We should find equivalent syntax for Gaea and WCOSS.
- D. Behringer is willing to try R. Reddy's placement on Gaea.
- G. Vandenberghe is verifying bufrlib.
- J. Woollen is testing a C interface for BUFR.
- The NCEP library status table is now available on the main page of the MTT website.
- HPSS requires a smaller ulimit. The hpsstar application was changed to use a smaller limit.
- G. Vandenberghe is running account_params once an hour and accumulating the output into a log file (for groups and allocations). Usage cannot currently be tracked per user.
- No xlib on Eddy. There are binaries, but no libraries or includes.
- G. Vandenberghe is unable to build static versions of ImageMagick because of this.
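The hourly accounting log described above could be accumulated along the following lines. This is only a sketch: the function name `append_snapshot`, the log layout, and the idea of passing the command as a list are assumptions made for illustration, and the minutes do not say how account_params is actually invoked or scheduled.

```python
import datetime
import subprocess


def append_snapshot(command, log_path):
    """Run `command` once and append its output to `log_path` under a timestamp header.

    `command` is a list such as ["account_params"]; any arguments are an
    assumption, since the minutes do not record how the tool is called.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        # One header line per snapshot so later runs can be told apart.
        log.write(f"=== {stamp} (exit {result.returncode}) ===\n")
        log.write(result.stdout)
        if not result.stdout.endswith("\n"):
            log.write("\n")
    return result.returncode
```

Calling this once per hour (for example from a scheduler entry) appends one block per run, so the log file grows into a history of group and allocation usage.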
August 24, 2012
- Moab has been upgraded.
- Default number of threads (OMP_NUM_THREADS) is now 1 instead of 12.
- IDL may attempt to launch multiple threads, so users may need to specify the number of threads used.
- HWRF group cannot access Zeus and Jet for real-time data transfers from EMC servers (since the IP addresses and firewalls have been changed).
- There has been no update on tracking which users run jobs under which accounts (CPU hours).
- Support wants a list of experienced users who will implement cron jobs first, until it is known that cron runs without problems.
- VSDB package now works.
- Eddy is planned to be identical to Tide, which will be available sometime in September.
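Because the default OMP_NUM_THREADS is now 1, a job that relied on the old default of 12 has to request its thread count explicitly. A minimal sketch of doing so from a Python driver is shown here; the helper names `env_with_threads` and `run_with_threads` are hypothetical, and the sketch covers only OpenMP programs that honor OMP_NUM_THREADS (IDL has its own thread controls, which are not shown).

```python
import os
import subprocess


def env_with_threads(num_threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set explicitly."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_with_threads(command, num_threads):
    """Launch `command` (a list of arguments) with an explicit OpenMP thread count."""
    return subprocess.run(command, env=env_with_threads(num_threads), check=False)
```

Setting the value per launch, rather than relying on the system default, keeps a job's behavior unchanged if the default is altered again.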
July 13, 2012
- /ptmp and /stmp scrubber starts July 16.
- Links and directories will not be removed at this time.
- Scrubbing is based on creation time.
- rstprod is still over quota.
- Scripts may need to be modified to remove this problem.
- Significant performance degradation because MPI rates have been cut by a factor of 2 to 3 (communication between tasks is slow).
- Lustre issues are still under investigation (R. Vasic is collecting job IDs and the nodes used).
- All single jobs will be run on one rack and will not share with parallel jobs.
- No official build for nceplibs yet, but progress is being made.
- The ImageMagick version is far too old (12 years). We should move to the most recent version (3 months old).
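Users who want to see which of their files are likely to be scrubbed could list old files ahead of time. The sketch below is an approximation under stated assumptions: it uses modification time as a stand-in, because file creation time is not reliably available through `os.stat` on Linux, and the retention period `max_age_days` is a hypothetical parameter that the minutes do not specify. Symbolic links and directories are skipped, matching the note that they will not be removed at this time.

```python
import os
import time


def files_older_than(root, max_age_days, now=None):
    """List regular files under `root` whose mtime is older than `max_age_days`.

    Modification time is used as a proxy for creation time (an assumption).
    Symbolic links are skipped and directories are never reported.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            if os.stat(full).st_mtime < cutoff:
                old.append(full)
    return sorted(old)
```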
June 15, 2012
- Zeus rstprod:
- Be sure to change the group when using tag_rstprod (-g option).
- autotag_rstprod is a script that tags all files. It is being tested and will be passed on to Zeus admins.
- G. Vandenberghe will generate a report of the biggest 'offenders' with rstprod.
- It was suggested that this information be propagated to newer Zeus users after the initial users have had a chance to test everything.
- The ptmp and stmp scrubber is tentatively set to start June 18.
- There is NO support for MPMD scripts on Gaea.
- G. Vandenberghe feels that, at a minimum, more LDTN nodes may be needed.
- S. Barry will forward this request at the next HPC RAC meeting.
- Symbolic links on the root directory of Zeus will soon be removed.
- S. Barry requested any information regarding new work that is being done on the systems so he can add to his report for HPC RAC.
June 1, 2012
- There will be a presentation about a workflow manager sometime in July.
- Available dates in July are 11, 13, 17, 20, and 27.
- Those interested in attending should contact R. Reddy with their availability.
- Intel FFT:
- E. Mirvis installed a library based on wrapper and will provide a short presentation at the next meeting after some testing is completed.
- Nceplibs wiki page will be made available through the MTT website.
- All Zeus users need to have group read permission on files in /save if they want G. Vandenberghe to provide a backup.
- Anything with an NCEP root will be saved.
- G. Vandenberghe will send directives regarding backing up files on /save.
- S. Barry will get in contact with those involved with scrubbing of /ptmp and /stmp.
- We need a date for when scrubbing will start to properly notify users.
- When files are moved from /ptmp or /stmp to /noscrub or /save, their group stays ptmp or stmp and does not change to the group of the directory they were moved into.
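Since backups of /save depend on group read permission, it may help to check a directory tree before asking for a backup. The following is a minimal sketch; the function names are hypothetical, and it only reports files, leaving any permission change to the owner.

```python
import os
import stat


def lacks_group_read(path):
    """Return True if `path` does not have the group-read permission bit set."""
    mode = os.stat(path).st_mode
    return not (mode & stat.S_IRGRP)


def files_lacking_group_read(root):
    """Walk `root` and list regular files that are missing group read permission."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and lacks_group_read(full):
                missing.append(full)
    return sorted(missing)
```

An empty result for a directory under /save would suggest that every file in it can be read by the group for backup purposes.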
May 18, 2012
- Sam Trahan gave a presentation about restricted data protection on Zeus through access control lists and introduced several scripts to set files to the rstprod access control list group (see presentation on main page under Meeting Documents).
- Raghu Reddy gave a presentation and example of how to use Tau (see presentation on main page under Meeting Documents).
May 4, 2012
- Current 1 and 2:
- Mpfort should be used.
- Convgrib using options 31 and 32 did not work on Zeus for R. Vasic, but it does work for him on WCOSS.
- S. Trahan has HWRF compiled and is now working on its scripts.
- No update yet on the scrubber. G. Vandenberghe will follow up with Craig Tierney.
- No update on rstprod. Keep an eye on it until quota issues are resolved.
- Real-time data transfers have been put on the back burner because of the move to WCOSS. A cron job may need to be set up in the meantime.
- S. Trahan sent Zeus admins all IP addresses from CCS to regenerate keys for class 1 and 2 nodes that were changed without notice by CCS.
- P. Shafran's verification is running.
- GFS has a threading issue whenever it runs with 3 threads. R. Reddy is diagnosing the problem. Application problem?
- There are still a lot of CPU hours available.
- The coupled GFS is now working, but kicking it off from the parallel still is not.
- MTT meetings will now be every 2 weeks. The next one will be May 18.