Monitoring the NAF and its subsystems
The NAF is a complex system with many subsystems, so the question "How is the NAF?" has no simple "good/bad" answer. We are currently working to set up and improve monitoring for the NAF systems. Below is a list of the tools that already exist, along with a preview of what is planned.
The Batch System
- The website http://bird.desy.de/status/ shows plots of historical usage.
- qstat and qacct, launched from a NAF Work Group Server (WGS), show detailed information on the batch system or on particular jobs at a given moment. Note, however, that these tools put load on the batch server, so do not call them from scripts at a high rate or at times of known batch server problems.
- Details on finished jobs, taken from the SGE accounting file, can be displayed by running the following command from the WGS:
bird-jobdetail
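If a script does need batch information on a regular basis, one way to respect the caution about qstat and qacct is to cache the output and re-run the command only after a minimum interval has passed. The following is a minimal sketch of that idea, not an official NAF tool: the class name ThrottledCommand, its parameters, and the 300-second interval are assumptions chosen for illustration.

```python
import subprocess
import time


class ThrottledCommand:
    """Run a command at most once per `min_interval` seconds and cache its output.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """

    def __init__(self, argv, min_interval=300.0, runner=None, clock=time.monotonic):
        self.argv = list(argv)
        self.min_interval = min_interval
        self._runner = runner or self._run
        self._clock = clock
        self._last_time = None    # time of the most recent attempt
        self._last_output = None  # output of the most recent successful run

    @staticmethod
    def _run(argv):
        # Raises subprocess.CalledProcessError on a non-zero exit status.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout

    def output(self):
        """Return cached output, re-running the command only when it is stale."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self.min_interval:
            # Record the attempt before running, so that a failing batch
            # server is not queried again until the interval has passed.
            self._last_time = now
            self._last_output = self._runner(self.argv)
        return self._last_output
```

A script could then create `ThrottledCommand(["qstat", "-u", "myuser"])` once (the user name is a placeholder) and call `output()` as often as it likes; the batch server is contacted at most once per interval. If a run fails, the exception propagates and later calls within the interval return the last successful output, which may be None.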
Work Group Server
- Currently, only tools such as top, ps, and sar are available to general users.
- We are working on providing plots that give an overview of several parameters, such as CPU load and memory consumption.
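Until such plots are available, a user can take a quick snapshot of CPU load and memory on the WGS they are logged in to. The sketch below assumes a Linux host that provides /proc/meminfo; the function names parse_meminfo and snapshot are hypothetical, and this is not the monitoring the NAF team is preparing.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def snapshot(meminfo_path="/proc/meminfo"):
    """Return the load averages and basic memory figures of the local host."""
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    load1, load5, load15 = os.getloadavg()  # 1-, 5- and 15-minute averages
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "mem_total_kb": mem.get("MemTotal"),
        # MemAvailable is missing on older kernels; .get() then yields None.
        "mem_available_kb": mem.get("MemAvailable"),
    }
```

Calling `snapshot()` reads only local files and kernel counters, so it puts no load on the batch server.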