Submit to a Compute Cluster via PBS

Some algorithms work well on massively parallel compute clusters. BCL Conversion is one such algorithm, and it is the basis for this application example.

The concepts illustrated here are not limited to BCL Conversion and can be applied in other scenarios. For instance, the example PBS script below uses the syntax of Illumina's CASAVA tool, but it could easily be repurposed for the bcl2fastq tool.
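To give a rough idea of what repurposing involves, here is a hypothetical Python 3 sketch that builds the conversion commands for each tool from the same inputs. It is not part of the attached script, and the function names are made up for this illustration. The CASAVA options come from the example PBS script. The bcl2fastq options are those of bcl2fastq2 (v2.x) and should be checked against the version installed on your cluster. The --force and --ignore-missing-* options from the example are left out for brevity.

```python
from typing import List


def casava_commands(run_dir: str, output_dir: str, sample_sheet: str,
                    bases_mask: str, mismatches: int, cpus: int) -> List[List[str]]:
    """CASAVA 1.8: generate a Makefile, then run make in the output folder."""
    configure = [
        "configureBclToFastq.pl",
        "--input-dir", run_dir + "/Data/Intensities/BaseCalls",
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
        "--use-bases-mask", bases_mask,
        "--mismatches", str(mismatches),
    ]
    # "make -C DIR" is equivalent to the example's "cd DIR" followed by "make"
    build = ["make", "-C", output_dir, "-j", str(cpus)]
    return [configure, build]


def bcl2fastq_commands(run_dir: str, output_dir: str, sample_sheet: str,
                       bases_mask: str, mismatches: int, cpus: int) -> List[List[str]]:
    """bcl2fastq2: a single command that reads the run folder directly."""
    return [[
        "bcl2fastq",
        "--runfolder-dir", run_dir,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
        "--use-bases-mask", bases_mask,
        "--barcode-mismatches", str(mismatches),
        "--processing-threads", str(cpus),
    ]]
```

The main difference is that bcl2fastq2 replaces CASAVA's configure-then-make pair with one command. A PBS script built around bcl2fastq2 would therefore drop the module load for CASAVA and the make step.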

Also, in this example, Portable Batch System (PBS) is used as the job submission mechanism to the compute cluster, which has read/write access to the storage system holding the data to be converted.
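From the LIMS server's side, submission is a single qsub call on a PBS file. The following minimal Python 3 sketch writes a job description to a file and submits it. It assumes the PBS client tools are installed so that qsub is on the PATH. The submit_pbs function and its dry_run switch are hypothetical names used only for illustration.

```python
import subprocess


def submit_pbs(pbs_text: str, pbs_path: str, dry_run: bool = False) -> str:
    """Write a job description to pbs_path and submit it with qsub.

    Returns qsub's standard output, which is normally the new job ID.
    With dry_run=True, returns the command instead of running it.
    """
    with open(pbs_path, "w") as handle:
        handle.write(pbs_text)
    command = ["qsub", pbs_path]
    if dry_run:
        # Show what would be run without contacting the PBS server
        return " ".join(command)
    # A non-zero exit status from qsub raises subprocess.CalledProcessError
    return subprocess.check_output(command, universal_newlines=True).strip()
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths.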

Example PBS file

When the 'qsub' command is invoked, it is passed a PBS file that contains the job description and parameters. An example PBS file is shown here for illustrative purposes. Because PBS can be configured in many ways, the content of your PBS file(s) will likely differ from this example.

    #!/bin/bash
    #PBS -N run_casava
    #PBS -q himem
    #PBS -l nodes=1:ppn=20
    # Job name, queue, and resources: 1 node with 20 processors per node

    export RUN_DIR=/data/instrument_data/120210_SN1026_0092_BXXXXXXXXX
    export OUTPUT_DIR=/data/processed_data/processed_data.1.8.2/120210_SN1026_0092_BXXXXXXXXX
    export SAMPLE_SHEET=/data/SampleSheets/samplesheet.csv

    # Start in the directory from which qsub was run
    cd "$PBS_O_WORKDIR"

    # Load CASAVA 1.8.2 through environment modules
    source /etc/profile.d/modules.sh
    module load casava-1.8.2

    export TMPDIR=/scratch/
    # Total processors allocated to the job (nodes x processors per node)
    export NUM_PROCESSORS=$((PBS_NUM_NODES*PBS_NUM_PPN))

    # Generate the Makefile for BCL-to-FASTQ conversion; the bases mask is
    # quoted so that the shell does not treat it as a filename pattern
    configureBclToFastq.pl --input-dir "$RUN_DIR/Data/Intensities/BaseCalls" --output-dir "$OUTPUT_DIR" \
      --sample-sheet "$SAMPLE_SHEET" --force --ignore-missing-bcl --ignore-missing-stats \
      --use-bases-mask 'y*,I6,y*' --mismatches 1

    # Run the conversion in parallel on all allocated processors
    cd "$OUTPUT_DIR"
    make -j "$NUM_PROCESSORS"

Solution

Process configuration

In this example, the BCL Conversion process is configured to:

  • Accept a ResultFile input.

  • Produce at least two ResultFile outputs.

The process is configured with process level UDFs, including the following, which are referenced by {udf:...} tokens in the external program parameter:

  • Bases mask

  • Number of mismatches

  • Number of Cores

  • Run Name

The syntax for the external program parameter is as follows:

    python /opt/gls/clarity/customextensions/ClusterBCL.py -l {processLuid} -u {username} -p {password} -c {udf:Number of Cores} -m {udf:Number of mismatches} -b "{udf:Bases mask}" -a {compoundOutputFileLuid0}.txt -e {compoundOutputFileLuid1}.txt -r "{udf:Run Name}"

Parameters

    Option  Description                                              Token
    -l      The LIMS ID of the process invoking the code (required)  {processLuid}
    -u      The username of the current user (required)              {username}
    -p      The password of the current user (required)              {password}
    -c      The number of CPUs to dedicate to the run (required)     {udf:Number of Cores}
    -m      The number of mismatches (required)                      {udf:Number of mismatches}
    -b      The bases mask to be used (required)                     {udf:Bases mask}
    -a      The name of the first output file produced (required)    {compoundOutputFileLuid0}
    -e      The name of the second output file produced (required)   {compoundOutputFileLuid1}
    -r      The name of the run (required)                           {udf:Run Name}

User Interaction and Results

  1. The user runs the BCL Conversion process on the output of the Illumina Sequencing process. The sequencing process is aware of the Run ID, as this information is stored as a process level user-defined field (UDF).

  2. The user supplies the following information, which is stored as process level UDFs on the BCL Conversion process:

    • The name of the folder in which the converted data should be stored.

    • The bases mask to be used.

    • The number of mismatches.

    • The number of CPUs to dedicate to the job.

  3. The BCL Conversion process launches a script (via the EPP node on the Clarity LIMS server), which does the following:

    • Builds the PBS file based on the user's input.

    • Submits the job by invoking the 'qsub' command with the PBS file.

Assumptions and Notes

  • Portable Batch System (PBS) is used as the job submission mechanism to the compute cluster.

  • The compute cluster has read/write access to the storage system holding the data to be converted.

  • There is an EPP node running on the Clarity LIMS server.

  • The PBS client tools have been installed and configured on the Clarity LIMS server, so that the 'qsub' command can be launched directly from the server.

  • The script was written in Python (version 2.7) and relies on the GLSRestApiUtil.py module. The script, ClusterBCL.py, is attached below; the GLSRestApiUtil.py module is available for download at Obtain and Use the REST API Utility Classes.

  • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

ClusterBCL.py (6KB)
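ClusterBCL.py itself is not reproduced on this page. As a rough, hypothetical outline of how such a script could turn the external program options into a PBS file, the Python 3 sketch below parses the -l, -u, -p, -c, -m, -b, -a, -e and -r options with argparse and fills in a CASAVA job template. ClusterBCL.py is Python 2.7, so this is not its actual code. The directory layout, sample sheet location, and template contents are assumptions modelled on the example PBS file. The REST API calls that would use -l, -u and -p, and the handling of the -a and -e output files, are omitted.

```python
import argparse

# Hypothetical job template modelled on the example PBS file
PBS_TEMPLATE = """#!/bin/bash
#PBS -N run_casava
#PBS -l nodes=1:ppn={cpus}
source /etc/profile.d/modules.sh
module load casava-1.8.2
configureBclToFastq.pl --input-dir {run_dir}/Data/Intensities/BaseCalls \\
  --output-dir {output_dir} --sample-sheet {sample_sheet} \\
  --use-bases-mask '{bases_mask}' --mismatches {mismatches}
cd {output_dir}
make -j {cpus}
"""


def parse_args(argv=None):
    """Parse the options passed by the external program parameter."""
    parser = argparse.ArgumentParser(description="Build a PBS file for BCL conversion")
    parser.add_argument("-l", dest="process_luid", required=True)
    parser.add_argument("-u", dest="username", required=True)
    parser.add_argument("-p", dest="password", required=True)
    parser.add_argument("-c", dest="cpus", type=int, required=True)
    parser.add_argument("-m", dest="mismatches", type=int, required=True)
    parser.add_argument("-b", dest="bases_mask", required=True)
    parser.add_argument("-a", dest="output_file1", required=True)
    parser.add_argument("-e", dest="output_file2", required=True)
    parser.add_argument("-r", dest="run_name", required=True)
    return parser.parse_args(argv)


def render_pbs(args, data_root="/data"):
    """Fill in the template; the folder layout under data_root is assumed."""
    return PBS_TEMPLATE.format(
        cpus=args.cpus,
        run_dir=f"{data_root}/instrument_data/{args.run_name}",
        output_dir=f"{data_root}/processed_data/{args.run_name}",
        sample_sheet=f"{data_root}/SampleSheets/{args.run_name}.csv",
        bases_mask=args.bases_mask,
        mismatches=args.mismatches,
    )
```

Called with the values the LIMS substitutes for the tokens, render_pbs returns the job script text. That text could then be written to a file and passed to qsub.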