Sun Grid Engine (SGE) on ICA Bench
Running Jobs in a Bench SGE Cluster
Once a cluster is started, the cluster manager can be accessed from the workspace node.
Job resources
Every cluster member has a fixed capacity, which is determined by the Resource model selected for that member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscribing a node, which can lead to out-of-memory errors or an unresponsive node. You need to ensure that your jobs do not exceed the values they request.
To ensure stability of the system, some headroom is deducted from the total node capacity.
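As a rough illustration of how these requests bound what a member can run at once, the sketch below estimates how many identical jobs fit on one member. It assumes static_cores and static_mem are consumed per job and that you already know the usable capacity after headroom; the helper name jobs_per_node and the capacity figures are hypothetical, not values taken from ICA.

```shell
#!/bin/sh
# jobs_per_node USABLE_CORES USABLE_MEM_GB JOB_CORES JOB_MEM_GB
# Prints how many identical jobs fit on one member at the same time.
# All arguments are whole numbers. The usable figures are hypothetical
# and should already have the headroom subtracted.
jobs_per_node() {
  local usable_cores="$1" usable_mem_gb="$2" job_cores="$3" job_mem_gb="$4"
  local by_cores=$(( usable_cores / job_cores ))
  local by_mem=$(( usable_mem_gb / job_mem_gb ))
  # The scarcer resource decides how many jobs fit.
  if [ "$by_cores" -lt "$by_mem" ]; then
    echo "$by_cores"
  else
    echo "$by_mem"
  fi
}

# Hypothetical member with 7 usable cores and 24 GB of usable memory,
# running jobs that request static_cores=1 and static_mem=8G.
jobs_per_node 7 24 1 8   # prints 3 (memory is the limiting resource)
```

In this example memory runs out before cores do, so requesting less static_mem per job (if the job really needs less) would let more jobs share the member.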
Scaling
The static_cores and static_mem values are also used by the SGE auto scaler when the cluster runs in dynamic mode. The auto scaler sums the resources requested by all pending jobs to determine whether to scale up or down within the defined range.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler executes only one scale up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model are ignored by the auto scaler and will remain pending indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log
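A minimal sketch for inspecting that log from the workspace node follows. The helper name show_scaler_log is hypothetical, and nothing is assumed about the format of the log lines; it simply prints the most recent entries.

```shell
#!/bin/sh
# show_scaler_log [LINES] [LOG_FILE]
# Prints the last LINES lines (default 20) of the auto scaler log.
# LOG_FILE defaults to the path given in this document.
show_scaler_log() {
  local lines="${1:-20}"
  local log_file="${2:-/data/logs/sge-scaler.log}"
  if [ ! -r "$log_file" ]; then
    echo "Cannot read $log_file" >&2
    return 1
  fi
  tail -n "$lines" "$log_file"
}

# Usage on the workspace node:
#   show_scaler_log        # last 20 lines
#   show_scaler_log 100    # last 100 lines
# To follow the log while the cluster scales:
#   tail -f /data/logs/sge-scaler.log
```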
Submitting jobs
Submitting a single job
qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh
Submitting a job array
qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
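Within a job array, SGE sets the SGE_TASK_ID environment variable to the index of the current task, which a script can use to select its own input. The sketch below shows one possible body for a script such as /data/myscript.sh; the helper pick_input and the list file /data/inputs.txt are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Hypothetical body for a job array script such as /data/myscript.sh.
# With "-t 1-100", SGE runs this script once per task and sets
# SGE_TASK_ID to a value from 1 to 100.

# pick_input TASK_ID LIST_FILE
# Prints line number TASK_ID of LIST_FILE (one input path per line).
pick_input() {
  local task_id="$1" list_file="$2"
  sed -n "${task_id}p" "$list_file"
}

# Only run the task body when SGE_TASK_ID is a number, which is the case
# for array tasks. Outside a job array it is unset or not numeric.
case "${SGE_TASK_ID:-}" in
  ''|*[!0-9]*)
    : # not an array task, nothing to do
    ;;
  *)
    input=$(pick_input "$SGE_TASK_ID" /data/inputs.txt)  # hypothetical list file
    echo "Task $SGE_TASK_ID processing $input"
    ;;
esac
```

Keeping one input per line means the range passed to -t only has to match the number of lines in the list file.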
Monitoring members
Listing all members of the cluster
qhost
Managing running/pending jobs
Listing all jobs in the cluster
qstat -f
Showing the details of a job
qstat -f -j <jobId>
Deleting a job
qdel <jobId>
Managing executed jobs
Showing the details of an executed job
qacct -j <jobId>
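The accounting record can help check whether a finished job stayed within the static_mem it requested. The sketch below pulls a single field out of the qacct output. It assumes the usual layout of one "name value" pair per line and field names such as exit_status and maxvmem; the helper name qacct_field is hypothetical.

```shell
#!/bin/sh
# qacct_field FIELD
# Reads "qacct -j" output on stdin and prints the value of FIELD.
# Assumes each record line has the form "<name> <value...>".
# For job arrays, one value is printed per task record.
qacct_field() {
  awk -v field="$1" '$1 == field {
    $1 = ""                 # drop the field name
    sub(/^[ \t]+/, "")      # trim the separator left behind
    print
  }'
}

# Usage on the workspace node:
#   qacct -j <jobId> | qacct_field exit_status
#   qacct -j <jobId> | qacct_field maxvmem
```

Comparing the reported peak memory with the requested static_mem is one way to tune requests for later submissions.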
SGE Reference documentation
SGE command line options and configuration details are described in the SGE reference documentation.