Submit to a Compute Cluster via PBS

Some algorithms are well suited to massively parallel compute clusters. BCL Conversion is one of them, and it is the basis for this application example.

The concepts illustrated here are not limited to BCL Conversion and may be applied in other scenarios. For instance, the example PBS script below uses syntax for Illumina's CASAVA tool, but could easily be repurposed for the bcl2fastq tool.

In this example, Portable Batch System (PBS) is the job submission mechanism for the compute cluster, which has read/write access to the storage system holding the data to be converted.
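As a rough illustration of the submission step, the sketch below hands a PBS file to the standard `qsub` command and returns the job identifier that `qsub` prints. It is not the content of ClusterBCL.py; the function name `submit_pbs_job` and the injectable `run` argument are assumptions made here so the logic can be tried without a cluster.

```python
import subprocess
from typing import Callable, Optional, Sequence


def submit_pbs_job(
    pbs_file: str,
    run: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Submit pbs_file with qsub and return the job identifier it prints.

    `run` is a hypothetical hook: it receives the command as a list and
    returns the command's standard output. It exists only so the function
    can be exercised on a machine without PBS installed.
    """
    if run is None:
        # Default runner: call the real qsub binary and capture its stdout.
        def run(cmd: Sequence[str]) -> str:
            return subprocess.run(
                cmd, check=True, capture_output=True, text=True
            ).stdout

    output = run(["qsub", pbs_file])
    # qsub writes the identifier of the new job to standard output.
    return output.strip()
```

On a host where `qsub` is on the PATH, `submit_pbs_job("run_casava.pbs")` would submit the file; the file name is illustrative only.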

Example PBS file

For illustrative purposes, an example PBS file is included at the end of this page. Because there are many ways to configure PBS, the content of your PBS files will likely differ from the example provided.
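Because the run folder, output folder and sample sheet change from run to run, the top of a PBS file is usually generated rather than written by hand. The sketch below shows one way to do that in Python. The function name `render_pbs_header` and its parameters are assumptions for illustration; the directive and variable names follow the CASAVA example on this page.

```python
import shlex


def render_pbs_header(
    job_name: str,
    queue: str,
    nodes: int,
    ppn: int,
    run_dir: str,
    output_dir: str,
    sample_sheet: str,
) -> str:
    """Return the directive and export lines that open a PBS file."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -q {queue}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        # shlex.quote leaves simple paths unchanged and quotes anything
        # containing spaces or shell metacharacters.
        f"export RUN_DIR={shlex.quote(run_dir)}",
        f"export OUTPUT_DIR={shlex.quote(output_dir)}",
        f"export SAMPLE_SHEET={shlex.quote(sample_sheet)}",
    ]
    return "\n".join(lines) + "\n"
```

The rest of the PBS file (module loading and the conversion commands) can then be appended unchanged, since it refers only to the exported variables.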

Solution

Process configuration

In this example, the BCL Conversion process is configured to:

- Accept a ResultFile input.
- Produce at least two ResultFile outputs.

The process is configured with the following process-level UDFs:

The syntax for the external program parameter is as follows:

Parameters

User Interaction and Results

The user runs the BCL Conversion process on the output of the Illumina Sequencing process. The sequencing process knows the Run ID because it is stored as a process-level user-defined field (UDF).
The user supplies the following information, which is stored as process-level UDFs on the BCL Conversion process:
- The name of the folder in which the converted data should be stored.
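The Run ID and the folder name together are enough to work out where the conversion reads from and writes to. The sketch below is one possible mapping, not the behaviour of ClusterBCL.py. The storage roots are copied from the example PBS file, and treating `processed_data.1.8.2` as the user-supplied folder name is an assumption, as are the names `resolve_paths`, `INSTRUMENT_ROOT` and `PROCESSED_ROOT`.

```python
import posixpath
from typing import Dict

# Hypothetical storage roots, taken from the example PBS file.
INSTRUMENT_ROOT = "/data/instrument_data"
PROCESSED_ROOT = "/data/processed_data"


def resolve_paths(run_id: str, output_folder: str) -> Dict[str, str]:
    """Map the two UDF values to the run and output directories."""
    for value in (run_id, output_folder):
        # Reject empty values and anything that could escape the roots.
        if not value or "/" in value or value in (".", ".."):
            raise ValueError(f"unsafe path component: {value!r}")
    return {
        "run_dir": posixpath.join(INSTRUMENT_ROOT, run_id),
        "output_dir": posixpath.join(PROCESSED_ROOT, output_folder, run_id),
    }
```

Validating the two values before building paths matters here because both come from user-editable fields and end up in a script that runs on the cluster.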

Assumptions and Notes

- Portable Batch System (PBS) is used as the job submission mechanism to the compute cluster.
- The compute cluster has read/write access to the storage system holding the data to be converted.
- There is an EPP node running on the Clarity LIMS server.
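The read/write assumption can be checked before a job is submitted. The sketch below is a minimal pre-flight check, assuming it runs on a host that mounts the same storage as the compute nodes; the function name `has_read_write_access` is hypothetical.

```python
import os


def has_read_write_access(path: str) -> bool:
    """Return True if path is an existing directory we can read and write."""
    # os.access reports permissions for the user running this script, which
    # may differ from the account the PBS job eventually runs under.
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)
```

A submitting script could call this on the run and output locations and stop with a clear message instead of letting the job fail later on the cluster.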

Attachments

ClusterBCL.py:

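Example PBS script:

#!/bin/bash
#PBS -N run_casava
#PBS -q himem
#PBS -l nodes=1:ppn=20

# Input run folder, output folder, and sample sheet for this conversion.
export RUN_DIR=/data/instrument_data/120210_SN1026_0092_BXXXXXXXXX
export OUTPUT_DIR=/data/processed_data/processed_data.1.8.2/120210_SN1026_0092_BXXXXXXXXX
export SAMPLE_SHEET=/data/SampleSheets/samplesheet.csv

# Start in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR"

# Make the CASAVA tools available through environment modules.
source /etc/profile.d/modules.sh
module load casava-1.8.2

export TMPDIR=/scratch/

# Use every core that PBS allocated to the job.
export NUM_PROCESSORS=$((PBS_NUM_NODES*PBS_NUM_PPN))

# Generate the makefiles for the conversion. The bases mask is quoted so
# the shell does not expand the asterisks.
configureBclToFastq.pl --input-dir "$RUN_DIR/Data/Intensities/BaseCalls" --output-dir "$OUTPUT_DIR" \
 --sample-sheet "$SAMPLE_SHEET" --force --ignore-missing-bcl --ignore-missing-stats \
 --use-bases-mask 'y*,I6,y*' --mismatches 1

# Run the conversion in parallel.
cd "$OUTPUT_DIR"
make -j "$NUM_PROCESSORS"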
