Miscellaneous Scripts

  • Illumina LIMS Integration
  • Generating a Hierarchical Sample History
  • Protocol-based Permissions
  • Self-Incremental Counters
  • Generic CSV Parser Template (Python)
  • Renaming Samples to Add an Internal ID
  • Creating Custom Sample Sheets
  • Copying Output UDFs to Submitted Samples
  • Parsing Sequencing Meta-Data into Clarity LIMS
  • Submit to a Compute Cluster via PBS
  • Downloading a File and PDF Image Extraction

Illumina LIMS Integration

This combination of configuration and Python script can be used to set up an integration between Clarity LIMS and Illumina LIMS. There are two main parts to this integration:

  1. Generating a sample manifest from Clarity LIMS to import the samples into Illumina LIMS.

  2. Once the analysis is completed, automatically parsing in the results from Illumina LIMS into Clarity LIMS.

Disclaimer: This application example is provided as is, with the assumption that anyone deploying it to their LIMS server owns the testing and customization of the configuration and scripts provided.

Installation

Protocol and Workflow Import

Using the config-slicer tool, import the attached configuration file (IlluminaLIMSIntegration.xml) as the glsjboss user with the following command:

java -jar /opt/gls/clarity/tools/config-slicer/config-slicer-3.<x>.jar -o import -k IlluminaLIMSIntegration.xml -u <user> -p <password> -a https://<hostname>/api

Script and Configuration Setup

  1. As the glsjboss user on the BaseSpace Clarity LIMS server, copy the Illumina LIMS manifest template file (IlluminaLIMS_Manifest_Template.csv), attached below, to the following folder: /opt/gls/clarity/customextensions/IlluminaLIMS

  2. On the Illumina LIMS Windows/Linux workstation, create a folder called Clarity_gtc_Parser and do the following:

    • Copy the clarity_gtc_parser_v2.py file into this folder and update the following configuration parameters:

      Parameter | Description
      USERNAME = <APIUSERNAME> | Clarity user with API access
      PASSWORD = <APIPASSWORD> | Password for that user
      uri = 'https://<DOMAINNAME>/api/v2/artifacts' | URI of the artifact API endpoint on Clarity
      path = '/<PATH>/IlluminaLIMS/gtc_folder_v3/' | Path to the gtc files
      gtcVersion = 3 | gtc file version

      NOTE: This script supports the current LIMS gtc version, 3, and will be compatible with version 5 when available.

    • Download and copy IlluminaBeadArrayFiles.py to the same folder (also available online). Edit the file variables (API URI, gtc file path, and username/password for the Clarity API) for the relevant server.

    • Create an empty file called processed_gtc.txt in the gtc files directory.

    • Set up a scheduled task (Windows) or cron job (Linux) to run the Python script every 10 minutes, as in the example below. (This assumes Python 2.7.1 is installed and available on the workstation.)
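For example, a crontab entry along the following lines (installation paths are placeholders) runs the parser every 10 minutes and appends its output to a log:

*/10 * * * * /usr/bin/python /opt/Clarity_gtc_Parser/clarity_gtc_parser_v2.py >> /opt/Clarity_gtc_Parser/parser.log 2>&1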

Workflow

The configuration attached to this page contains an example protocol with two steps.

Prerequisites:

Samples have been accessioned into Clarity LIMS with the following sample metadata as Submitted Sample UDFs:

  • Is Control
  • Institute Sample Label
  • Species
  • Sex
  • Volume (ul)
  • Conc (ng/ul)
  • Extraction Method
  • Parent 1
  • Parent 2
  • Replicate(s)
  • WGA Method (if Applicable)
  • Mass of DNA used in WGA
  • Tissue Source

Protocol Step: IlluminaLIMS Sample Prep

This manual step is meant to be merged into the last step of a Sample Prep protocol. It is configured to generate a derived sample with the LIMSID in its name, so that the name is unique and can be used by the data parser to match data back to the sample at the next step.

Protocol Step: IlluminaLIMS Manifest and Analysis

This requires the user to perform the following steps:

  1. Generate the Illumina LIMS manifest using the provided "Generate Illumina LIMS Manifest" button.

  2. Download the manifest and import it into the Illumina LIMS Project Manager under the correct institution.

  3. Run the appropriate lab workflow in Illumina LIMS.

  4. After the Illumina LIMS analysis is complete, allow 10 minutes, then return to Clarity LIMS to find the step in progress and ensure the following derived sample UDFs are populated:

    • Autocall Version
    • Call Rate
    • Cluster File
    • GC 10
    • GC 50
    • Gender
    • Imaging Date
    • LogR dev
    • Number of Calls
    • Number of Intensity Only Calls
    • Number of No Calls
    • SNP Manifest
    • Sample Plate
    • Sample Well
    • 50th Percentiles in X
    • 50th Percentiles in Y
    • 5th Percentiles in X
    • 5th Percentiles in Y
    • 95th Percentiles in X
    • 95th Percentiles in Y

Attachments

  • IlluminaBeadArrayFiles.py (36 KB)
  • IlluminaLIMSIntegration.xml (25 KB)
  • IlluminaLIMS_Manifest_Template.csv (1 KB)
  • clarity_gtc_parser_v2.py (6 KB)


    Generating a Hierarchical Sample History

Often, workflows do not have a linear configuration. Even when they do, samples progressing through them may be re-queued, replicated, or submitted into a number of parallel workflows.

Attempts to align downstream results with submitted samples can be hindered by the need to account for sample replicates and for the dynamic decisions made in the lab.

A visual representation of a sample's complete history, presented in a clear hierarchical format, provides a digestible report of the work done on the sample, with at-a-glance understanding of any branching or re-queuing.

    Solution

This example describes a Python script which, given an artifact, recursively finds all the processes for which that artifact was an input, then finds all the associated output artifacts of each of those processes. This continues for all processes, all the way down to the most downstream artifacts.

A clear visual representation of the entire history of work on a sample, similar to what was available in the Operations Interface, allows a user to see all the processes and derivations of a sample. This is especially useful for troublesome samples that have branched into numerous downstream replicates, which may end up in the same or different sequencing protocols.
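To make the traversal concrete, here is a minimal sketch of the recursive walk (not the attached script): it assumes the requests package, placeholder hostname and credentials, and the standard Clarity REST resources, including the inputartifactlimsid filter on the processes endpoint. Pooling steps can cause duplicate visits, which the attached script handles more carefully.

import requests
from xml.etree import ElementTree as ET

BASE = "https://<hostname>/api/v2/"   # placeholder server
AUTH = ("<user>", "<password>")       # placeholder credentials

def get_xml(uri, **params):
    # GET a REST resource and parse it into an ElementTree element
    return ET.fromstring(requests.get(uri, params=params, auth=AUTH).content)

def walk(artifact_limsid, depth=0):
    # Print this artifact, then every process that used it as an input,
    # then recurse into the outputs of each of those processes.
    print("\t" * depth + artifact_limsid)
    listing = get_xml(BASE + "processes", inputartifactlimsid=artifact_limsid)
    for proc in listing.findall("process"):
        process = get_xml(proc.get("uri"))
        print("\t" * (depth + 1) + process.get("limsid"))
        for iom in process.findall("input-output-map"):
            inp, out = iom.find("input"), iom.find("output")
            if out is not None and inp.get("limsid") == artifact_limsid:
                walk(out.get("limsid"), depth + 2)

walk("DAN2A4PA1")   # the artifact LIMSID passed via -a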

Parameters

The script accepts the following parameters:

  -a | The LIMSID of the artifact (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)
  -s | The API step URI, the {stepURI:v2} token (Required)

An example of the full syntax to invoke the script is as follows:

bash -l -c "/usr/bin/python /opt/gls/clarity/customextensions/sample_history.py -a DAN2A4PA1 -s {stepURI:v2} -u {username} -p {password}"

Script output example:
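An illustrative rendering, consistent with the explanation below (names in angle brackets are placeholders):

CRA201A1PA1
    <QC process 1>
    <QC process 2>
    <QC process 3>
    <Fragment DNA process>
        <fragmented analyte>
            122-1650 Library Pooling (MiSeq) 5.0
                2-4110
                2-4111
                    24-1952 <subsequent process>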

Sibling artifacts appear vertically aligned with the same indentation. In the above example, 122-1650 Library Pooling (MiSeq) 5.0 created two replicate Analytes, 2-4110 and 2-4111. Analyte 2-4111 was the input to the subsequent step (24-1952), and no additional work was performed on 2-4110.

Processes performed on an artifact appear underneath it with a tab indentation. In the above example, the first four processes (three QC processes and Fragment DNA) all use the root Analyte (CRA201A1PA1) as an input.

Adding Colours with termcolor

Install the termcolor package for colour printing support. Entity colours can be configured within the script. To turn colours off globally, change the variable use_colours to False (line 16).
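For instance (a minimal sketch; the helper name is illustrative, not from the attached script):

from termcolor import colored

use_colours = True   # line 16 of the attached script; set to False to disable colours

def paint(text, colour):
    # Wrap termcolor.colored() so colouring can be switched off globally
    return colored(text, colour) if use_colours else text

print(paint("2-4111  Analyte", "green"))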

    Assumptions and Notes

• Your configuration conforms to the script's requirements, as documented in the Solution above.

    • You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.

• The glsapiutil.py file is placed in the working directory.

• The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • sample_history_colours.py (4 KB)
  • sample_history.py (3 KB)


    Renaming Samples to Add an Internal ID

When a lab takes in samples, they arrive named by the lab scientists who supplied them. As such, samples may be named in any way imaginable. Conversely, the research facilities processing the samples often have strict naming conventions.

When these two situations collide, the sample must be renamed to follow the strict nomenclature of the processing lab. However, for the final data to be meaningful to the scientist who supplied the samples, the original name must also be retained.

    Solution

    In this example, the attached script may be used to rename samples, while retaining their original names in a separate field.

• The original name of the sample is saved in a user-defined field (UDF) called Customer's Sample Name.

• The Sample Name for the submitted sample is overwritten, using the specific naming convention of the processing lab.

    It is recommended that the script be launched by a process/protocol step as early as possible in the sample lifecycle.

Parameters

The script is invoked with just three parameters:

  -l | The limsid of the process invoking the script (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/renameSamples.py -l 2-1234 -u admin -p securepassword

About the Code

Once the command-line parameters have been harvested and the API object is set up to handle API requests, the renameSamples() method is called.

This method implements the following pseudo code:

step 1: get the inputs to this process

for each input:

   step 2: get the URI of the submitted sample associated with the input

   step 3: get the sample

   step 4: update the sample

   step 5: save the sample
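A bare-bones sketch of those five steps, using requests and ElementTree directly rather than the attached script's glsapiutil helpers (hostname, credentials, and the 'LAB-' naming scheme are placeholder assumptions):

import requests
from xml.etree import ElementTree as ET

BASE = "https://<hostname>/api/v2/"
AUTH = ("<user>", "<password>")
UDF = "{http://genologics.com/ri/userdefined}field"

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).content)

def renameSamples(process_limsid):
    process = get_xml(BASE + "processes/" + process_limsid)
    seen = set()
    for iom in process.findall("input-output-map"):            # step 1: inputs
        artifact = get_xml(iom.find("input").get("uri"))
        sample_uri = artifact.find("sample").get("uri")        # step 2: sample URI
        if sample_uri in seen:
            continue
        seen.add(sample_uri)
        sample = get_xml(sample_uri)                           # step 3: get the sample
        name = sample.find("name")                             # step 4: update it
        udf = ET.SubElement(sample, UDF,
                            {"name": "Customer's Sample Name", "type": "String"})
        udf.text = name.text                                   # retain the original name
        name.text = "LAB-" + sample.get("limsid")              # placeholder convention
        requests.put(sample_uri, data=ET.tostring(sample),     # step 5: save it
                     auth=AUTH, headers={"Content-Type": "application/xml"})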

Assumptions and Notes

• Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

• A user-defined field named Customer's Sample Name has been created at the analyte (sample) level.

• The Customer's Sample Name field is visible within the LabLink Collaborations Interface.

• You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.

• You will need to implement your own logic to apply the new sample name in Step 4 of the script.

• The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • renameSamples.py (2 KB)

    Protocol-based Permissions

Laboratories may want to limit which steps Researchers can start. At the time of writing, BaseSpace Clarity LIMS does not natively support protocol-based permissions. However, with an EPP script at the beginning of a step, we can check whether the technician/researcher starting the step has been approved to do so, and halt the step if they do not have permission. There are several ways this can be done, but special consideration must be given to how these permissions are administered.

    Solution

To allow an administrator to easily maintain permissions, we assign users to groups in a config file, and the EPP script consumes this information. One parameter of the EPP is the list of groups permitted to run the step. When the script is triggered at the start of the step, it looks for the name of the technician starting the step in the config file and determines whether the technician is:

• Included in the config file, and

• Assigned to a group that is permitted to run the step.

Remember that when a script exits with a negative number, the EPP fails and the user cannot move forward in the step. We take advantage of this behaviour: if the technician/researcher is part of a permitted group, the step starts as expected; if they are not, entry into the step is halted and an error box displays the script's last print message. The core check looks like this:

try:
   # Is the technician's name a key in the dictionary created from the config file?
   # If so, find the groups the technician has been assigned in the config.
   config_groups = configDict[first, last].split(",")
   step_approved = [y.strip() for y in args["groups"].split(",")]
   if not set(config_groups) & set(step_approved):
      # Fail the script, stop the user from moving forward, and have the last
      # print statement appear in the message box.
      print "Nice try %s %s, but you have not been approved to run this step" % (first, last)
      exit(-1)
except KeyError:
   print "This technician's name has not been included in the config file"
   exit(-1)

Parameters

The EPP command is configured to pass the following parameters:

  -u | The username of the current user (Required)
  -p | The password of the current user (Required)
  -s | The URI of the step that launches the script - the {stepURI:v2:http} token (Required)
  -g | The names of the permitted groups, separated by commas and passed as one string enclosed in double quotes

An example of the full syntax to invoke the script is as follows:

python /opt/gls/clarity/customextensions/Group_Permissions.py -u {username} -p {password} -s {stepURI:v2} -g "GroupD, GroupE"

    User Interaction

    • The config file can reside in any directory that the EPP script will have access to.

• The config file used in this example has tab-delimited columns of Last Name, First Name, and Groups. The permitted groups must be separated by commas (see the attached example config file and the sample rows below). The script can easily be modified if a different format is desired for the config file.

• The EPP should be configured as "automatically initiated" at "the beginning of the step".

If the user is not allowed to move forward, a message box appears and the step is aborted.
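For illustration, two hypothetical config.txt rows in that format (columns separated by tabs) might look like:

Smith	Jane	GroupA,GroupD
Doe	John	GroupE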

    Assumptions and Notes

    • You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.

    • Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

    • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • Group_Permissions.py (3 KB)
  • config.txt (140 B)


    Creating Custom Sample Sheets

Clarity LIMS can create Illumina-based MiSeq and HiSeq 'flavoured' sample sheets. However, if you are using algorithms or indexes outside those suggested by Illumina, you may need to produce your own 'custom' sample sheet.

    This example script provides an algorithm that harvests the contents of a flow cell (or any container) that may contain pooled samples, and uses the resulting information to output a custom sample sheet.

    Solution

    The attached script uses aggressive caching in order to execute as quickly as possible. When extreme levels of multiplexing are involved, the cache size could consume considerable quantities of memory, which may be counter-productive.

The algorithm has been tested on the following: unpooled analytes; pooled analytes; and 'pools of pools', in which multiple homogeneous or heterogeneous pools are themselves combined to produce a new pool.

In these tests, the algorithm behaved as expected. If you find this is not the case, please contact the Illumina Support team.

1. The algorithm uses recursion to determine the individual analytes (samples), and their indexes, that are located on the flow cell lane(s).

2. To determine whether an analyte constitutes a pool, the script looks at the number of submitted samples with which the analyte is associated.

  • If the answer is 1, the analyte is not a pool.
  • If the answer is greater than 1, the analyte is considered to be a pool.

3. If a pooled analyte is discovered, the inputs of the process that produced the pooled analyte are gathered, and the same test is used to see if they themselves are pools.

4. This gathering of ancestor analytes continues until the contents of each pool have been resolved, at which point the script produces some example output. Note that while it is expected that you will augment this section of the script with the fields you need for your custom sample sheet, the logic to recursively identify analytes that are not themselves pools should be applicable to all.
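A condensed sketch of that recursion (not the attached flowcellContents.py; hostname, credentials, and the example LIMSID are placeholders):

import requests
from xml.etree import ElementTree as ET

BASE = "https://<hostname>/api/v2/"
AUTH = ("<user>", "<password>")

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).content)

def resolve(artifact_uri, leaves):
    # Recurse through pools until only non-pooled analytes remain
    artifact = get_xml(artifact_uri)
    if len(artifact.findall("sample")) == 1:
        leaves.append(artifact.get("limsid"))      # one submitted sample: not a pool
        return
    # A pool: fetch the process that produced it and test that process's inputs
    producer = get_xml(artifact.find("parent-process").get("uri"))
    for iom in producer.findall("input-output-map"):
        out = iom.find("output")
        if out is not None and out.get("limsid") == artifact.get("limsid"):
            resolve(iom.find("input").get("uri"), leaves)

leaves = []
resolve(BASE + "artifacts/2-4111", leaves)   # e.g. one lane's pooled analyte
print(leaves)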

Parameters

The script is invoked with just three parameters:

  -l | The luid of the flow cell / container of interest (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/flowcellContents.py -l 27-1234 -u admin -p securepassword

    Assumptions and Notes

    • Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

    • You will need to implement your own logic to gather the fields required for your specific sample sheet.

• You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.

• The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • flowcellContents.py (3 KB)


    Copying Output UDFs to Submitted Samples

    It may at times be desirable to take key data derived during a workflow and copy it to the submitted sample. There are several reasons why this could be useful:

    • All key data is combined with all of the submitted sample's data, and becomes available on a single object.

    • Key data determined during a workflow can be made immediately available to external collaborators via the LabLink Collaborations Interface, since these users have access to their submitted samples.

    • Searching for data becomes easier as the data is not spread over several entities.

    This example provides a script to allow the copying to occur, and describes how the script can be triggered.

    Solution

    To illustrate the script, we will copy a user-defined field (UDF) that is collected on the outputs of a QC type protocol step.

    • This UDF is named Concentration, and it is stored on the individual ResultFile entities associated with the analytes that went through the QC protocol step.

    • Once the QC protocol step has completed, the Concentration UDF values are copied to a UDF on the submitted Samples, which is called Sample Conc.

    • The QC protocol step is configured to invoke the script from a button on the step's Record Details screen.

Parameters

The EPP command is configured to pass the following parameters:

  -l | The limsid of the process invoking the script (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)
  -v | The name of the UDF to be copied from (Required). To copy multiple UDF values from outputs to submitted samples, a list of comma-separated values may be provided.
  -f | The UDF on the submitted sample to be copied to (Required). To copy multiple UDF values from outputs to submitted samples, a list of comma-separated values may be provided.
  -t | The type of output that has the UDF to be copied from (Required). For example, Analyte, ResultFile, Pool, etc.

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/setUDFonSample.py -l MCL-SA1-131211-24-6813 -u admin -p securepassword -f "Sample Conc., Units" -t ResultFile -v "Concentration, Conc.Units"

    User Interaction

Once the script has copied the UDF values from the output to the submitted samples, the values are visible in the Submitted Samples view of the Operations Interface (screenshot: Copying_output_UDFs_to_submitted_samples_Op_Interface.png).

Similarly, assuming that the Sample Conc. UDF is set to be visible within the LabLink Collaborations Interface, collaborators are able to see these values in their interface (screenshot: Copying_output_UDFs_to_submitted_samples_LabLink.png).

    About the Code

    The main method of interest is setUDFs(). This method carries out several operations:

1. It harvests just enough information that the subsequent code can retrieve the required artifacts using the 'batch' API operations. This involves using some additional code to build and manage the cache of artifacts retrieved in the batch operations, namely:

  • cacheArtifact()
  • prepareCache()
  • getArtifact()

2. The cached artifacts are then accessed, and for each one:

  • The corresponding sample is retrieved via the API.
  • The sample XML is updated: the UDF value is obtained from the artifact by calling api.getUDF(), and stored on the sample by calling api.setUDF().
  • The sample XML is saved by calling api.updateObject().

3. Finally, a meaningful message is reported back to the user via the contents of the successMsg and/or failMsg variables.
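A self-contained sketch of the copy in step 2, using requests and ElementTree in place of the glsapiutil helpers named above (hostname, credentials, and the example LIMSID are placeholders):

import requests
from xml.etree import ElementTree as ET

BASE = "https://<hostname>/api/v2/"
AUTH = ("<user>", "<password>")
UDF = "{http://genologics.com/ri/userdefined}field"

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).content)

def copy_udf(artifact_limsids, source_udf="Concentration", target_udf="Sample Conc."):
    for limsid in artifact_limsids:   # in the real script these come from the batch cache
        artifact = get_xml(BASE + "artifacts/" + limsid)
        value = next(f.text for f in artifact.findall(UDF) if f.get("name") == source_udf)
        sample_uri = artifact.find("sample").get("uri")
        sample = get_xml(sample_uri)
        field = next((f for f in sample.findall(UDF) if f.get("name") == target_udf), None)
        if field is None:
            field = ET.SubElement(sample, UDF, {"name": target_udf, "type": "Numeric"})
        field.text = value
        requests.put(sample_uri, data=ET.tostring(sample), auth=AUTH,
                     headers={"Content-Type": "application/xml"})

copy_udf(["92-20553"])   # placeholder ResultFile LIMSIDs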

    Assumptions and Notes

    • Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

    • You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.

    • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • setUDFonSample.py (4 KB)

    Parsing Sequencing Meta-Data into Clarity LIMS

    Once a sequencing run has occurred, there is often a requirement to store the locations of the FASTQ / BAM files in Clarity LIMS.

    For paired-end sequencing, it is likely that the meta-data file that describes the locations of these files will contain two rows for each sample sequenced: one for the first read, and another for the second read.

    Such a file is illustrated below:

[Image: Parsing_sequencing_meta_data_ExSeq4.png]

Column 2 of the file, Sample ID, contains the LIMS IDs of the artifacts against which we want to store the FASTQ file values listed in column 3 (Fastq File).

    This example discusses the strategy for parsing and storing data against process inputs, when that data is represented by multiple lines in a data file.

    Solution

    The attached script will parse a data file containing multiple lines of FASTQ file values, and will store the locations of those FASTQ files in user-defined fields.

    Process configuration

    In this example, the process is configured to have a single shared ResultFile output.

Parameters

The EPP command is configured to pass the following parameters:

  -l | The limsid of the process invoking the script (Required) - the {processLuid} token
  -u | The username of the current user (Required) - the {username} token
  -p | The password of the current user (Required) - the {password} token
  -s | The URI of the step that launches the script (Required) - the {stepURI:v2:http} token
  -i | The limsid of the shared ResultFile output (Required) - the {compoundOutputFileLuid0} token

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/parseMetadata.py -l 24-9953 -u admin -p securepassword -s http://192.168.9.123:8080/api/v2/steps/24-9953 -i 92-20553

    User Interaction

The user interaction comprises the following steps:

    Step 1

The user runs the process up to the Record Details screen (image: Parsing_sequencing_meta_data_Fig2.png). Note that initially:

    • The Sequencing meta-data file is still to be uploaded.

    • The values for the R1 Filename and R2 Filename fields are empty.

    Step 2

The user clicks Upload file and attaches the meta-data file. Once attached, the user's screen will resemble Parsing_sequencing_meta_data_Fig3.png.

    Step 3

    Now that the meta-data file is attached, the user clicks Parse Meta-data File. This invokes the parsing script.

If parsing was successful, the user's screen will resemble Figure 4 (Parsing_sequencing_meta_data_Fig4.png).

Note that the values for the R1 Filename and R2 Filename fields have been parsed from the file and will be stored in Clarity LIMS.

    About the code

    The key methods of interest are main(), parseFile() and fetchFile(). The main() method calls parseFile(), which in turn calls fetchFile().

    fetchFile() method

    The fetchFile() method relies upon the fact that the script is running on the Clarity LIMS server, and as such has access to the local file system in which the file (uploaded in Step 2) now resides.

    Thus, fetchFile() can use the API to:

    1. Convert the LIMSID of the file to the location on disk.

    2. Copy the file to the local working directory, ready to be parsed by parseFile().

    parseFile() method

1. The parseFile() method creates two data structures that are used in the subsequent code within the script:

  • The COLS dictionary has the column names from the first line of the file as its keys, and the index of each column as the value.

  • The DATA array contains each subsequent line of the file as a single element. Note that this parsing logic is overly simplistic and would need to be supplemented in a production environment. For example, if the CSV file being parsed does not have the column names in the first row, exceptions would likely occur. Similarly, we assume the file being parsed is CSV, so any data elements that themselves contain commas would likely cause a problem. For the sake of clarity, such exception handling has been omitted from the script.

2. Once parseFile() has executed successfully, the inputs to the process that invoked the script are gathered, and 'batch' functionality is used to retrieve all of the artifacts in a single batch-retrieve transaction.

3. All that remains is to step through the elements within the DATA array and, for each line, gather the values of the Fastq File and Sample ID columns. For each Sample ID value:

  • The corresponding artifact is retrieved.
  • Depending on whether the value of the Fastq File column represents the filename for the first or second read, either the R1 Filename or the R2 Filename user-defined field is updated.

4. Once the modified artifacts have been saved, the values display in the Clarity LIMS Web Interface.
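As a concrete illustration of the COLS and DATA structures described in item 1, a stripped-down parseFile() might read as follows (CSV with a header row assumed, per the caveats above; the filename is a placeholder):

def parseFile(path):
    # Build COLS (column name -> index) from the header row, and DATA
    # (one raw line per element) from the remainder of the file.
    with open(path) as fh:
        lines = [line.rstrip("\n") for line in fh if line.strip()]
    COLS = dict((name, index) for index, name in enumerate(lines[0].split(",")))
    DATA = lines[1:]
    return COLS, DATA

COLS, DATA = parseFile("metadata.csv")
for line in DATA:
    tokens = line.split(",")
    print(tokens[COLS["Sample ID"]] + " " + tokens[COLS["Fastq File"]])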

    Assumptions and Notes

    • Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

    • You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.

    • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • parseMetadata.py (4 KB)

    Downloading a File and PDF Image Extraction

    Compatibility: API version 2 revision 21

A lab user can attach a PDF file containing multiple images to a result file placeholder. The script extracts the images, which are then automatically attached to the corresponding samples as individual result files.

Images within a PDF may be in a number of formats, usually .ppm or .jpeg. The example script includes additional code to convert .ppm images to .jpeg.

    Prerequisites

• You have the 'poppler' package (Linux) installed.

    • You have defined a process with analytes (samples) as inputs, and outputs that generate the following:

      • A single shared result file output.

      • A result file output per input.

    • You have added samples to Clarity LIMS.

    • You have uploaded the Results PDF to Clarity LIMS during 'Step Setup'.

• Optionally, if you wish to convert other file types to .jpeg, install ImageMagick (Linux package).

    Code Example

    How it works:

1. The lab scientist runs a process/protocol step and attaches the PDF in Clarity LIMS.

2. When run, the script uses the API and the 'requests' package available in Python to locate and retrieve the PDF.

    3. The script generates a file for each image.

    Additionally, this script converts the images to JPEG format for compatibility with other LIMS features.

    Step 1. Create the script

    Part 1 - Downloading a file using the API

The script finds and gets the content of the PDF through two separate GET requests:

1. Following the artifact URI, using the {compoundOutputFileLuid0} token, to identify the LUID of the PDF file.

    2. Using the ~/api/v2/files/{luid}/download endpoint to save the file to the temporary working directory.

The PDF is written to the temporary directory:

def dl_pdf( artifactluid_ofpdf ):
    # Find the file LUID from the artifact LUID of the PDF
    artif_URI = BASE_URI + "artifacts/" + artifactluid_ofpdf
    artGET = requests.get(artif_URI, auth=(args["username"], args["password"]))
    root = ET.fromstring(artGET.content)
    for id in root.findall("{http://genologics.com/ri/file}file"):
        fileLUID = id.get("limsid")
    # Download the file content via the files endpoint and stream it to disk
    file_URI = BASE_URI + "files/" + fileLUID + "/download"
    fileGET = requests.get(file_URI, auth=(args["username"], args["password"]))
    with open("frag.pdf", 'wb') as fd:
        for chunk in fileGET.iter_content():
            fd.write(chunk)

    The script performs a batch retrieval of the artifact XML for all samples. Subsequently a python dictionary is created defining which LIMS id corresponds to a given well location.

Part 2 - Extracting images as individual result files

The script uses the pdfimages function to extract the images from the PDF. This function comes from a Linux package and can be called using the os.system() function.

This example script extracts an image from each page, beginning with page 10. Files are named with LUIDs and well location. The file names must begin with the {outputFileLuids} for automatic attachment.

page = 10
for each in range(len(wells)):
    well_loci = wells[each]
    if well_loci in well_map.keys():
        limsid = well_map[well_loci]
        filename = limsid + "_" + well_loci
        # Options precede the PDF name; -f/-l restrict extraction to a single page
        command = 'pdfimages -j -f ' + str(page) + ' -l ' + str(page) + ' frag.pdf ' + filename
        os.system(command)
        page += 1  # move to the next page for the next sample

Additionally, the example script converts the image files to JPEG for compatibility with other features in Clarity LIMS. The script uses 'convert', a function from the Linux package ImageMagick. Like the 'pdfimages' function, 'convert' can be called in a Python script through the os.system() function.
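For instance, a follow-up conversion for one extracted image could look like this (the '-000.ppm' suffix assumes pdfimages fell back to PPM output for that page):

# Hypothetical follow-up to the loop above: convert one extracted image to JPEG
command = 'convert ' + filename + '-000.ppm ' + filename + '.jpg'
os.system(command)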

    Step 2. Configure the Process

The steps required to configure a process to run EPP are described in Process Execution with EPP/Automation Support, namely:

    1. Configure the inputs and outputs.

    2. On the External Programs tab, select the check box to associate the process with an external program.

    Parameters

The process parameter string for the external program is as follows:

bash -c "/usr/bin/python /opt/gls/clarity/customextensions/pdfimages.py -a {compoundOutputFileLuid0} -u {username} -p {password} -f '{outputFileLuids}'"

The EPP command is configured to pass the following parameters:

  -a | The limsid of the result file placeholder where the PDF is attached (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)
  -f | A list of the output LUIDs (Required; must be in quotes "")

    Step 3. Run the Process

The Record Details page in Clarity LIMS shows:

  • The placeholder where the lab scientist can upload the PDF.
  • The external process ready to be run, with a 'Script generated' message marking the individual result file placeholder.

    Expected Output and Results

When the external program has run successfully, the individual result files are named with the artifact LUID and well location, and the images are attached to their ResultFile placeholders. The file names must begin with the {outputFileLuids} for automatic attachment.

Attachments

  • pdfimages.py (4 KB)


    The name of the UDF to be copied from (Required)

    To copy multiple UDF values from outputs to submitted samples, a list of comma-separated values may be provided.

    getArtifact()

  • The cached artifacts are then accessed, and for each one:

    • The corresponding sample is retrieved via the API.

    • The sample XML is updated such that the UDF value is obtained from the artifact by calling api.getUDF(), and stored on the sample by calling api.setUDF().

    • The sample XML is saved by calling api.updateObject().

  • Finally, a meaningful message is reported back to the user via the contents of the successMsg and/or failMsg variables.

  • -l

    The limsid of the process invoking the script (Required)

    -u

    The username of the current user (Required)

    -p

    The password of the current user (Required)

    -f

    The UDF on the submitted sample to be copied to (Required)

    To copy multiple UDF values from outputs to submitted samples, a list of comma-separated values may be provided.

    -t

    The type of output that has the UDF to be copied from (Required)

    For example, Analyte, ResultFile, Pool, etc.

    file-download
    4KB
    setUDFonSample.py
    arrow-up-right-from-squareOpen
    Copying_output_UDFs_to_submitted_samples_Op_Interface.png
    Copying_output_UDFs_to_submitted_samples_LabLink.png

    -v

  • Once parseFile() has executed successfully the inputs to the process that has invoked the script are gathered, and 'batch' functionality is used to gather all of the artifacts in a single batch-retrieve transaction.

  • All that remains is to step through the elements within the DATA array, and for each line, gather the values of the Fastq File and Sample ID columns. For each Sample ID value:

    • The corresponding artifact is retrieved.

    • Depending on whether the value of the Fastq File column represents the filename for the first or second read, either the R1 Filename or the R2 Filename user-defined field is updated.

  • Once the modified artifacts have been saved, the values will display in the Clarity LIMS Web Interface.

  • -l

    The limsid of the process invoking the script (Required)

    The {processLuid} token

    -u

    The username of the current user (Required)

    The {username} token

    -p

    The password of the current user (Required)

    The {password} token

    -s

    The URI of the step that launches the script (Required)

    The {stepURI:v2:http} token

    -i

    The limsid of the shared ResultFile output (Required)

    The {compoundOutputFileLuid0} token

    file-download
    4KB
    parseMetadata.py
    arrow-up-right-from-squareOpen
    Parsing_sequencing_meta_data_Fig2.png
    Parsing_sequencing_meta_data_Fig3.png
    Parsing_sequencing_meta_data_Fig4.png

    Submit to a Compute Cluster via PBS

Some algorithms work well on massively parallel compute clusters. BCL conversion is one such example, and is the basis for this application example.

The concepts illustrated here are not limited to BCL conversion, and may also be applied in other scenarios. For instance, the example PBS script below uses syntax for Illumina's CASAVA tool, but could easily be re-purposed for the bcl2fastq tool.

    Also, in this example, Portable Batch System (PBS) is used as the job submission mechanism to the compute cluster, which has read/write access to the storage system holding the data to be converted.

    Example PBS file

For illustrative purposes, an example PBS file is shown here. (As there are many ways to configure PBS, it is likely that the content of your PBS file(s) will differ from the example provided.)

#!/bin/bash

#PBS -N run_casava
#PBS -q himem
#PBS -l nodes=1:ppn=20

export RUN_DIR=/data/instrument_data/120210_SN1026_0092_BXXXXXXXXX
export OUTPUT_DIR=/data/processed_data/processed_data.1.8.2/120210_SN1026_0092_BXXXXXXXXX
export SAMPLE_SHEET=/data/SampleSheets/samplesheet.csv

cd $PBS_O_WORKDIR

source /etc/profile.d/modules.sh
module load casava-1.8.2

export TMPDIR=/scratch/
export NUM_PROCESSORS=$((PBS_NUM_NODES*PBS_NUM_PPN))

configureBclToFastq.pl --input-dir $RUN_DIR/Data/Intensities/BaseCalls --output-dir $OUTPUT_DIR \
  --sample-sheet $SAMPLE_SHEET --force --ignore-missing-bcl --ignore-missing-stats \
  --use-bases-mask y*,I6,y* --mismatches 1

cd $OUTPUT_DIR

make -j $NUM_PROCESSORS

    Solution

    Process configuration

    In this example, the BCL Conversion process is configured to:

    • Accept a ResultFile input.

    • Produce at least two ResultFile outputs.

The process is configured with process-level UDFs that include Bases mask, Number of mismatches, Number of Cores, and Run Name (the {udf:...} tokens referenced below).

The syntax for the external program parameter is as follows:

python /opt/gls/clarity/customextensions/ClusterBCL.py -l {processLuid} -u {username} -p {password} -c {udf:Number of Cores} -m {udf:Number of mismatches} -b "{udf:Bases mask}" -a {compoundOutputFileLuid0}.txt -e {compoundOutputFileLuid1}.txt -r "{udf:Run Name}"

Parameters

The EPP command is configured to pass the following parameters:

  -l | The limsid of the process invoking the code (Required) - the {processLuid} token
  -u | The username of the current user (Required) - the {username} token
  -p | The password of the current user (Required) - the {password} token
  -c | The number of CPUs to dedicate to the run (Required) - the {udf:Number of Cores} token
  -m | The number of mismatches (Required) - the {udf:Number of mismatches} token
  -b | The bases mask to be used (Required) - the {udf:Bases mask} token
  -a | The name of the first output file produced (Required) - the {compoundOutputFileLuid0} token
  -e | The name of the second output file produced (Required) - the {compoundOutputFileLuid1} token
  -r | The name of the run (Required) - the {udf:Run Name} token

    User Interaction and Results

    1. The user runs the BCL Conversion process on the output of the Illumina Sequencing process. The sequencing process is aware of the Run ID, as this information is stored as a process level user-defined field (UDF).

2. The user supplies the following information, which is stored as process-level UDFs on the BCL Conversion process:

  • The name of the folder in which the converted data should be stored.
  • The bases mask to be used.
  • The number of mismatches.
  • The number of CPUs to dedicate to the job.

3. The BCL Conversion process launches a script (via the EPP node on the Clarity LIMS server) which does the following (see the sketch below):

  • Builds the PBS file based on the user's input.
  • Submits the job by invoking the 'qsub' command along with the PBS file.

(Screenshot: Submit_to_compute_cluster.png)
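A minimal sketch of the submission step (the PBS file path is a placeholder for wherever the script writes it):

import subprocess

pbs_path = "/tmp/run_casava.pbs"   # hypothetical location written by the script
job_id = subprocess.check_output(["qsub", pbs_path]).strip()
print("Submitted job: %s" % job_id)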

    Assumptions and Notes

    • Portable Batch System (PBS) is used as the job submission mechanism to the compute cluster.

    • The compute cluster has read/write access to the storage system holding the data to be converted.

• There is an EPP node running on the Clarity LIMS server.

• The PBS client tools have been installed and configured on the Clarity LIMS server, such that the 'qsub' command can be launched directly from the server.

• When the 'qsub' command is invoked, a PBS file is referenced; this file contains the job description and parameters.

• The script was written in Python (version 2.7) and relies upon the GLSRestApiUtil.py module. Both files are attached below. The required Python utility is available for download at Obtain and Use the REST API Utility Classes.

• The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • ClusterBCL.py (6 KB)



    Self-Incremental Counters

Sequential numbers are sometimes needed for naming conventions, and this requires self-incrementing counters to be created and maintained. We do not recommend using the BaseSpace Clarity LIMS database for this. However, the Unix "(n)dbm" library provides an easy way to create and manage counters, by creating Dbm objects that behave like mappings (dictionaries).

    Solution

The way this works: the attached script (and the counters file it creates and manages) lives on the Clarity server, and other scripts depend upon it, using code similar to the example below whenever a sequential number is needed. While the script is written in Python and uses the dbm module, there is nothing inherently Pythonic about this code that couldn't be reimplemented in another language. More information on the Python dbm module can be found at: https://docs.python.org/2/library/dbm.html

def test1():
   cm = counterManager()
   cm.setPath( "./test" )
   if cm.setup() is True:
      print( "INFO: setup Counter Manager" )
      print( "INFO: attempting to call getNextValue for: testA ..." )
      tmp = cm.getNextValue( "testA" )
      print( "INFO: getNextValue returned: " + str(tmp) )
   else:
      print( "ERROR: Failed to setup Counter Manager" )
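For context, a counterManager could be implemented along these lines (a sketch assuming Python 2's anydbm module; the attached clarityCounters.py is the actual implementation):

import anydbm

class counterManager:
    def setPath(self, path):
        self.path = path
    def setup(self):
        try:
            # 'c' opens the database for read/write, creating the file if needed
            self.db = anydbm.open(self.path, "c")
            return True
        except Exception:
            return False
    def getNextValue(self, name):
        try:
            value = int(self.db[name]) + 1
        except KeyError:
            value = 1                  # first use of this counter
        self.db[name] = str(value)     # persisted immediately by the dbm layer
        return value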

    User Interaction

    • The counters live in a file, the path to which is defined in the cm.setPath() command. The file will be created if it doesn’t exist.

• The file can contain as many counters as you wish. (It's better to have many counters in one file than many files, each with only one counter.)

• The name of the counter is passed to the function cm.getNextValue(). If this is the first time the counter has been used, it will be created and added to the file.

• Each time you want the next value, call cm.getNextValue() for that counter and you will be given the next value.

• The counters and the file look after themselves; you don't need to explicitly update or save them. This is all handled behind the scenes.

    Assumptions and Notes

    • You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.

    • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • clarityCounters.py (2 KB)


    Generic CSV Parser Template (Python)

    Compatibility: API version 2

Many different types of CSV files are attached to BaseSpace Clarity LIMS. This example provides a template for a script that can parse a wide range of simple CSV files into Clarity LIMS. The user can change parameters to describe the format of the file to be parsed.


    The Lab Instrument Tool Kit includes the parseCSV script, which allows for parsing of CSV files. However, this tool has strict limitations in its strategy for mapping data from the file to corresponding samples in Clarity LIMS. For information on the parseCSV script, refer to the Clarity LIMS Integrations and Tool Kits documentation.

    Solution

    Protocol step configuration

CSV files are attached to a protocol step. The artifact UDFs to which data will be written must be configured, both for the artifacts and for the protocol step.

Parameters

The script accepts the following parameters:

  -r | The luid of the result file where the csv or txt file is attached (Required)
  -u | The username of the current user (Required)
  -p | The password of the current user (Required)
  -s | The URI of the step that launches the script - the {stepURI:v2:http} token (Required)

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/genericParser.py -u {username} -p {password} -s {stepURI:v2} -r {compoundOutputFileLuid0}

About the Code

The script contains an area with a number of configurable variables. This allows a FAS or bioinformatician to customize the script to parse a specific txt file. The following variables within the script are configurable:

  MAPPING MODE | What the script uses to map the measurements to the artifacts in the LIMS
  artifactUDFMap | A Python dictionary where the key is the name of a column in the txt file, and the value is a UDF in Clarity LIMS
  delim | How the file is delimited (e.g. ',' for commas or '\t' for tabs)

    MAPPING MODES

There are many attributes of samples which can be used to map the data in the text file to the corresponding derived samples in Clarity LIMS. The script should be configured such that exactly one of these modes is set to True.

Three modes are available:

  MapTo_ArtifactName | The data will be associated with the names of the output artifacts for the given step.
  MapTo_WellLocation | The data will be associated with the well locations of the output artifacts for the given step.
  MapTo_UDFValue | The data will be associated with the value of a specified UDF of the output artifacts.

For any of the three modes, a mapping column value must be explicitly given. The value is the index of the column containing the mapping data (either artifact name, well location, or UDF value).

If using the mode MapTo_UDFValue, a UDFName must also be given. This is the name of the UDF in Clarity LIMS which will be used to match the value found in the mapping column.
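For example, the configuration block for UDF-based mapping might be filled in as follows (illustrative values; the exact variable names for the mapping column and UDF name may differ in genericParser.py):

# Illustrative settings for UDF-based mapping
MapTo_ArtifactName = False
MapTo_WellLocation = False
MapTo_UDFValue     = True
mappingColumn      = 0                  # index of the column holding the mapping values
UDFName            = "Sample Barcode"   # hypothetical UDF matched against that column
delim              = ","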

    artifactUDFMap

This Python dictionary maps the names of columns in the txt file to artifact UDFs for the outputs of the step. The data from these columns in the file will be written to these UDFs for the output artifacts. The dictionary can contain an unlimited number of UDFs. The dictionary keys (left side) are the names of the columns in the txt file, and the dictionary values (right side) are the names of the UDFs as configured for the artifacts. For example:

artifactUDFMap = {
    "Concentration" : "Concentration",
    "Avg. Size" : "Average Size"
}

    Assumptions and Notes

    • You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.

    • The attached files are placed on the LIMS server, in the /opt/gls/clarity/customextensions folder.

    • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

  • genericParser.py (5 KB)
  • glsfileutil.py (2 KB)
