Sun Grid Engine (SGE) on ICA Bench

Running Jobs in a Bench SGE Cluster

Once a cluster is started, the cluster manager can be accessed from the workspace node.

Job resources

Every cluster member has a fixed capacity, which is determined by the Resource model selected for that member.

The following complex values have been added to the SGE cluster environment and can be requested when submitting a job.

  • static_cores (default: 1)

  • static_mem (default: 2G)

These values are used to avoid oversubscription of a node, which can result in out-of-memory errors or an unresponsive node. You need to ensure that your jobs do not exceed the cores and memory they request.

To ensure stability of the system, some headroom is deducted from the total node capacity.
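One way to keep a job within its request is to declare the request in the job script itself, next to the setting that controls how many threads the job starts. The sketch below assumes the default SGE directive prefix (#$) is in effect; the job name, the values 2 and 4G, and the THREADS variable are hypothetical.

```shell
#!/bin/bash
# Resource requests embedded in the job script (read by qsub at submission).
#$ -N bench_example
#$ -cwd
#$ -l static_cores=2
#$ -l static_mem=4G

# Hypothetical variable: keep the thread count equal to the requested static_cores
# so the job does not use more cores than it asked for.
THREADS=2

echo "Running on $(uname -n) with ${THREADS} threads"
```

Requests given on the qsub command line take precedence over the ones embedded in the script, so the same script can still be submitted with different values.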

Scaling

The SGE auto scaler uses these two values (static_cores and static_mem) when running in dynamic mode. It totals the resources requested by all pending jobs to determine the scale up/down operation within the defined range.

Cluster members will remain in the cluster for at least 300 seconds. The auto scaler executes only one scale up/down operation at a time and waits until the cluster has stabilised before starting a new operation.
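The arithmetic behind such a scaling decision can be illustrated roughly as follows. This is only a sketch of the idea and not the auto scaler's actual algorithm: it considers cores only, and every figure in it (usable cores per member, number of pending jobs, the range limits) is a hypothetical value.

```shell
#!/bin/bash
# Hypothetical figures, for illustration only.
node_cores=8        # assumed usable static_cores per member, after headroom
pending_jobs=20     # assumed number of pending jobs
cores_per_job=2     # static_cores requested by each pending job
min_members=1       # assumed lower bound of the configured range
max_members=4       # assumed upper bound of the configured range

# Total cores requested by all pending jobs.
total_cores=$(( pending_jobs * cores_per_job ))

# Ceiling division: members needed to run all pending jobs at once.
members_needed=$(( (total_cores + node_cores - 1) / node_cores ))

# Clamp the result to the configured range.
target=$members_needed
[ "$target" -lt "$min_members" ] && target=$min_members
[ "$target" -gt "$max_members" ] && target=$max_members

echo "requested cores: ${total_cores}, members needed: ${members_needed}, target: ${target}"
```

With these figures the pending jobs would fill five members, but the range caps the cluster at four, so some jobs stay pending until running jobs finish.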

The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log
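A minimal way to inspect the log from the workspace node is sketched below. It only assumes the log path given above; the number of lines shown is arbitrary.

```shell
#!/bin/bash
LOG=/data/logs/sge-scaler.log

# Print the most recent entries if the log is readable on this host.
if [ -r "$LOG" ]; then
  tail -n 20 "$LOG"
else
  echo "scaler log not found: $LOG" >&2
fi
```

To follow scaling decisions as they happen, replace tail -n 20 with tail -f and stop it with Ctrl-C.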

Submitting jobs

Submitting a single job

qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh
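For scripted submissions it is often useful to capture the job ID for later qstat, qdel or qacct calls. The sketch below wraps the command above in a hypothetical helper named submit_job and assumes that the installed SGE version supports the qsub -terse option, which prints only the job ID.

```shell
#!/bin/bash
# Hypothetical helper: submit a script with the given resource requests.
submit_job() {
  # $1 = static_mem, $2 = static_cores, $3 = script path
  qsub -terse -l "static_mem=$1" -l "static_cores=$2" "$3"
}

# Only submit when qsub is actually available on this host.
if command -v qsub >/dev/null 2>&1; then
  job_id=$(submit_job 1G 1 /data/myscript.sh)
  echo "submitted job ${job_id}"
else
  echo "qsub not available on this host"
fi
```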

Submitting a job array

qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh

Do not limit the number of concurrently running tasks of a job array (for example, with the qsub -tc option), as this will leave cluster members unused.
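Within a job array, SGE sets the SGE_TASK_ID environment variable to the index of the current task, which lets one script handle a different input per task. The following is a hypothetical body for /data/myscript.sh; the input directory and file naming are assumptions made for illustration.

```shell
#!/bin/bash
# SGE sets SGE_TASK_ID to the task index (1 to 100 for -t 1-100).
# Default to 1 so the script can also be tried outside the scheduler.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical input layout: one file per array task.
input="/data/inputs/sample_${task_id}.txt"

echo "task ${task_id} would process ${input}"
```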

Monitoring members

Listing all members of the cluster

qhost

Managing running/pending jobs

Listing all jobs in the cluster

qstat -f

Showing the details of a job

qstat -f -j <jobId>

Deleting a job

qdel <jobId>

Managing executed jobs

Showing the details of an executed job

qacct -j <jobId>
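The commands above can be combined to wait for a job and then print its accounting record. The helper name wait_and_report is hypothetical, and the sketch assumes that qstat -j exits with a non-zero status once the job is no longer pending or running.

```shell
#!/bin/bash
# Hypothetical helper: poll until the job has left the queue, then show its record.
wait_and_report() {
  # $1 = job ID, $2 = polling interval in seconds (default 30)
  while qstat -j "$1" >/dev/null 2>&1; do
    sleep "${2:-30}"
  done
  qacct -j "$1"
}
```

Call it as wait_and_report <jobId>. The accounting record may be written shortly after the job ends, so a first qacct call right after completion can still fail and may need to be repeated.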

SGE Reference documentation

SGE command line options and configuration details are described in the SGE reference documentation.
