Monitoring the NAF and its subsystems

The NAF is a complex system with many subsystems, so a simple "good/bad" answer to the question "How is the NAF?" is not possible. We are currently working to set up and improve monitoring for the NAF systems. Below is a list of the tools that already exist, along with a preview of what is planned.

The Batch System

  • The web site features some plots of historic usage.
  • qstat and qacct, launched from a NAF WGS, show detailed information on the batch system or on particular jobs at a given moment. Note, however, that these tools put load on the batch server, so refrain from calling them from scripts at a high rate or at times of known batch server problems.
  • Some details on finished jobs, taken from the SGE accounting file, can be viewed with the command bird-jobdetail from a WGS.
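
If a script really does need batch information repeatedly, one way to keep the load on the batch server low is to cache the command output and re-run the command only after a minimum interval has passed. The following is a minimal sketch of that idea, not an official NAF tool: the class name ThrottledCommand, the 120-second interval, and the plain qstat call without options are assumptions made for illustration.

```python
import subprocess
import time


class ThrottledCommand:
    """Run a command at most once per `min_interval` seconds.

    Calls made before the interval has elapsed return the cached output
    instead of contacting the server again.
    """

    def __init__(self, argv, min_interval=60.0, runner=None, clock=time.monotonic):
        self.argv = list(argv)
        self.min_interval = min_interval
        # `runner` and `clock` can be replaced, e.g. for testing without a batch system.
        self._runner = runner or self._run_subprocess
        self._clock = clock
        self._last_time = None
        self._last_output = None

    @staticmethod
    def _run_subprocess(argv):
        # Raises CalledProcessError on a non-zero exit and TimeoutExpired after 30 s.
        result = subprocess.run(
            argv, capture_output=True, text=True, check=True, timeout=30
        )
        return result.stdout

    def output(self):
        """Return fresh output if the interval has elapsed, else the cached output."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self.min_interval:
            self._last_output = self._runner(self.argv)
            self._last_time = now
        return self._last_output


if __name__ == "__main__":
    # Hypothetical usage on a WGS: query the batch server at most every two minutes.
    qstat = ThrottledCommand(["qstat"], min_interval=120)
    print(qstat.output())
```

Because a failed call raises an exception before the cache is updated, the wrapper itself never retries in a tight loop; at times of known batch server problems the calling script should still back off or stop polling altogether.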

Work Group Server

  • Currently, only tools such as top, ps, sar and the like are available to general users.
  • We are working on presenting plots that give a view of several parameters, such as CPU load and memory consumption.
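
Until such plots are available, the same kind of numbers that top and sar report can be sampled by hand. The sketch below is only an illustration and assumes a Linux WGS that provides /proc/meminfo; the function names parse_meminfo and sample_host are made up for this example and are not part of any NAF tooling.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample_host(meminfo_path="/proc/meminfo"):
    """Take one sample of the load averages and the fraction of memory in use."""
    load1, load5, load15 = os.getloadavg()
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    total = mem.get("MemTotal", 0)
    # Older kernels do not report MemAvailable; fall back to MemFree in that case.
    available = mem.get("MemAvailable", mem.get("MemFree", 0))
    used_fraction = (total - available) / total if total else 0.0
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "mem_used_fraction": used_fraction,
    }


if __name__ == "__main__":
    print(sample_host())
```

Reading these values only touches the local machine, so unlike qstat it puts no load on the batch server.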

dCache storage systems

SONAS storage system

AFS storage system