
Monitoring your job(s)

Query Job(s) Status

You can monitor the status of your job(s) with the qstat command:

hydra% qstat

Job status   Meaning
qw           pending (waiting in queue)
r            running
t            in transfer (typically from qw to r)
Eqw          error and waiting in queue
d            marked for deletion
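
If you process qstat output in a script, a small lookup can translate the state codes listed above. The following is a minimal sketch for a POSIX shell (sh or bash, rather than the csh-style shell suggested by the hydra% prompt). The name describe_state is a hypothetical helper, not part of Grid Engine, and it only knows the codes from the table above.

```shell
# Translate a job state code into the meaning given in the table above.
# describe_state is a hypothetical helper name, not a Grid Engine command.
describe_state() {
    case "$1" in
        qw)  echo "pending (waiting in queue)" ;;
        r)   echo "running" ;;
        t)   echo "in transfer (typically from qw to r)" ;;
        Eqw) echo "error and waiting in queue" ;;
        d)   echo "marked for deletion" ;;
        *)   echo "unknown state: $1" ;;   # codes not listed in the table
    esac
}
```

For example, describe_state Eqw prints the meaning of the Eqw state.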

Worth pointing out (these options can be combined):

qstat -u $user shows only your jobs.
qstat -u '*' shows everybody's jobs.
qstat -s r shows only running jobs.
qstat -s p shows only pending jobs.
qstat -r also shows the requested resources and the full job name.
qstat -s r -u $user -g t shows master/slave info for parallel jobs.
qstat -s r -u $user -g d shows the task ID for array jobs.
qstat -j 4615585 produces a more detailed output, for a specific job ID.
qstat -explain E -j 4615585 produces the explanation for the error state of a specific job, specified by its job ID.

Use man qstat to get more details on the command.
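
To get a quick summary rather than a full listing, the qstat output can be piped through a short filter. This is a sketch under stated assumptions: it is written for a POSIX shell (sh or bash), it assumes the default qstat layout with two header lines and the job state in the fifth column, and count_states is a hypothetical helper name.

```shell
# Count jobs per state from default qstat output read on standard input.
# Assumes two header lines and the state code in column 5.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

A possible use is qstat -u "$USER" | count_states, where $USER is the POSIX-shell counterpart of the csh variable $user used above. Each output line holds a state code followed by the number of jobs in that state.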

Deleting/Killing a Job

Use the qdel command to kill (delete) a job from the queue, identified by its job ID, e.g.:

hydra% qdel 4615585

You can kill all your jobs with:

hydra% qdel -u $user

More details on qdel with man qdel.
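
Between deleting a single job and deleting all of them, you may want to delete only the jobs in a given state, for instance those stuck in Eqw. The sketch below only selects the job IDs and leaves the actual deletion to you. It assumes a POSIX shell (sh or bash) and the default qstat layout (two header lines, job ID in column 1, state in column 5); job_ids_in_state is a hypothetical helper name.

```shell
# Print the IDs of the jobs in a given state, one per line, reading default
# qstat output on standard input. Repeated IDs (e.g. array job tasks) are
# printed only once.
job_ids_in_state() {
    awk -v want="$1" 'NR > 2 && $5 == want && !seen[$1]++ { print $1 }'
}
```

You could first review the list with qstat -u "$USER" | job_ids_in_state Eqw, and then pass each ID to qdel, for example in a for loop.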

Modifying a Job's Resources

While a job is in the queue, you can modify its requested resources with qalter. For example,

qalter -o junk 1234567
will alter the stdout file for job 1234567 to be the file junk. The system should reply with a message similar to the following:
modified stdout path list of job 1234567

Other useful options:

-e filename        change the pathname for stderr to filename
-m b|a|e|n         send mail at the beginning (b), abort (a) or end (e) of the job, or never (n); see qsub
-M address         change the address to which mail is sent to address
-N name            change the name of the job to name
-l h_rt=HH:MM:SS   change the runtime limit to HH:MM:SS

Use man qalter to get more details on the command.
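
Since a mistyped runtime is easy to submit, it can help to check the HH:MM:SS format before calling qalter. The following is a minimal dry-run sketch for a POSIX shell (sh or bash): it prints the qalter command instead of running it. The names is_hms and show_runtime_change are hypothetical helpers, and the job ID is the one used in the example above.

```shell
# Succeed if the argument looks like HH:MM:SS (at least two hour digits).
is_hms() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{2,}:[0-5][0-9]:[0-5][0-9]$'
}

# Print (do not run) the qalter command that would change a job's runtime.
# Usage: show_runtime_change JOBID HH:MM:SS
show_runtime_change() {
    job_id=$1
    h_rt=$2
    if ! is_hms "$h_rt"; then
        echo "invalid runtime '$h_rt', expected HH:MM:SS" >&2
        return 1
    fi
    echo "qalter -l h_rt=$h_rt $job_id"
}
```

For example, show_runtime_change 1234567 12:00:00 prints the command to review before you run it yourself.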

Job Resource Usage

Once a job has completed, you can get its resource usage (CPU, memory, exit status, etc.) with the qacct command, e.g.:

hydra% qacct -j 4615585

Again, more details on qacct with man qacct. Note that qacct will only work on the head node (hydra-2.si.edu), and not on the login nodes.
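
The qacct output is a list of field names, each followed by its value, so a single value can be pulled out with a one-line filter. This sketch assumes a POSIX shell (sh or bash) and field names such as exit_status, maxvmem and ru_wallclock as printed by qacct; qacct_field is a hypothetical helper name.

```shell
# Print the value of one field from `qacct -j JOBID` output read on
# standard input. If the output holds several records (e.g. one per task
# of an array job), one value is printed per record.
qacct_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}
```

A possible use is qacct -j 4615585 | qacct_field exit_status, run on a host where qacct works.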

Cluster Status

You can monitor the cluster load with:

hydra% qstat -g c

Compute Nodes Status

You can check the status of the cluster's compute nodes (load, memory use, etc.) with the qhost command:

hydra% qhost

Again, more details on qhost with man qhost.
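
On a large cluster the qhost listing is long, so you may want to show only the busy nodes. The sketch below assumes a POSIX shell (sh or bash) and a qhost header line that contains a column named LOAD; the column is located by name because the number of columns can differ between Grid Engine versions. hosts_above_load is a hypothetical helper name.

```shell
# List the hosts whose load exceeds a threshold, reading qhost output on
# standard input. Prints the host name and its load, one host per line.
hosts_above_load() {
    awk -v limit="$1" '
        # Header line: find which column is named LOAD.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "LOAD") col = i; next }
        # Skip the dashed separator and the "global" pseudo-host.
        /^-+$/ || $1 == "global" { next }
        # Hosts that report "-" (no load value) are ignored.
        col && $col != "-" && $col + 0 > limit + 0 { print $1, $col }
    '
}
```

A possible use is qhost | hosts_above_load 4, where 4 is an arbitrary threshold chosen for illustration.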

Grid Engine Configuration

While you cannot change the grid engine configuration, you can query the current configuration with qconf, using the show (-s) options.

Here is a partial list of -s options:

You can                                                   with the command        comments
get the list of defined (existing) queues                 qconf -sql              show queue list
query the properties of a given queue                     qconf -sq test.q        show the properties of the queue test.q
get the list of defined (existing) parallel environments  qconf -spl              show pe list
query the properties of a given parallel environment      qconf -sp given_pe      show the properties of the PE given_pe
get the list of defined (existing) resource quotas        qconf -srqsl
query the implemented resource quota                      qconf -srqs [rqname]    rqname is optional
query the complex attributes                              qconf -sc               show complex
get the list of host groups                               qconf -shgrpl           show host group list
query the value of a host group                           qconf -shgrp @thehgrp   show the @thehgrp host group

Again, more details on qconf with man qconf (focus on the -s options).
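
The list and show options can be combined in a loop, for instance to compare one attribute across all queues. This is a minimal sketch for a POSIX shell (sh or bash), using only qconf -sql and qconf -sq from the list above; queue_attr is a hypothetical helper name, and h_rt is used as the example attribute. It prints only the first word of the value, so attributes whose value spans several words or lines are truncated.

```shell
# For every queue listed by `qconf -sql`, print the queue name followed by
# the value of one attribute taken from `qconf -sq QUEUE`.
# Usage: queue_attr ATTRIBUTE
queue_attr() {
    attr=$1
    qconf -sql | while read -r queue; do
        # </dev/null keeps the inner command from consuming the queue list.
        value=$(qconf -sq "$queue" </dev/null |
                awk -v key="$attr" '$1 == key { print $2 }')
        echo "$queue $value"
    done
}
```

For example, queue_attr h_rt prints one line per queue with its runtime limit.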

GUI Interface qmon and ganglia

  • You can use the qmon command on hydra (with X11 tunneling set up) to access most of this functionality via a GUI.

  • As of Mar 2012, ganglia runs on hydra-2, not on hydra.

-- SylvainKorzennikHPCAnalyst - 20 Jan 2012

Topic revision: r8 - 2013-10-29 - SylvainKorzennikHPCAnalyst
 