Workflows often do not have a linear configuration, and even when they do, samples progressing through them may be re-queued, replicated, or submitted to a number of parallel workflows.
Attempts to align downstream results with submitted samples can be hindered by the need to account for sample replicates and for the dynamic decisions made in the lab.
A visual representation of a sample's complete history, presented in a clear hierarchical format, gives a digestible report of the work done on the sample and an at-a-glance understanding of any branching or re-queuing.
This example describes a Python script which, given an artifact, recursively finds all the processes for which that artifact was an input, then finds all the associated output artifacts of those processes. This continues, process by process, down to the most downstream artifacts.
A clear visual representation of the entire history of work on a sample, similar to what was available in the Operations Interface, allows a user to see all the processes run on a sample and all of its derivations. This is especially useful for troublesome samples that have branched into numerous downstream replicates, which may end up in the same or different sequencing protocols.
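A minimal sketch of this traversal, using requests and ElementTree against the REST API (credentials, hostname, and the restriction to Analyte outputs are illustrative assumptions, not the attached script's exact behaviour):

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ("apiuser", "apipassword")            # hypothetical credentials
API = "https://clarity.example.com/api/v2"   # hypothetical hostname

def get_xml(uri):
    response = requests.get(uri, auth=AUTH)
    response.raise_for_status()
    return ET.fromstring(response.content)

def walk(artifact_uri, depth=0):
    artifact = get_xml(artifact_uri)
    print("\t" * depth + artifact.get("limsid") + "  " + artifact.find("name").text)
    # find every process that used this artifact as an input ...
    processes = get_xml(API + "/processes?inputartifactlimsid=" + artifact.get("limsid"))
    for process in processes.findall("process"):
        detail = get_xml(process.get("uri"))
        # ... then recurse into each output derived from this input
        for iomap in detail.findall("input-output-map"):
            output = iomap.find("output")
            if (iomap.find("input").get("limsid") == artifact.get("limsid")
                    and output is not None
                    and output.get("output-type") == "Analyte"):
                walk(output.get("uri"), depth + 1)
```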
The script accepts the following parameters:
-a
The LIMSID of the artifact (Required)
-u
The username of the current user (Required)
-p
The password of the current user (Required)
-s
The API step URI - the {stepURI:v2} token (Required)
An example of the full syntax to invoke the script is as follows (all values shown are placeholders):
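```bash
python sample_history.py -a 2-4110 -u apiuser -p apipassword -s https://clarity.example.com/api/v2/steps/24-1952
```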
Sibling artifacts appear vertically aligned with the same indentation. In the above example, 122-1650 Library Pooling (MiSeq) 5.0 created two replicate Analytes, 2-4110 and 2-4111. Analyte 2-4111 was the input to the subsequent step (24-1952), and no additional work was performed on 2-4110.
Processes performed on an artifact appear underneath it with a tab indentation. In the above example, the first four processes (3 QC processes and Fragment DNA) all use the Root Analyte (CRA201A1PA1) as an input.
Install the termcolor package for colour printing support. Entity colours can be configured within the script. To turn off colours globally, change the variable use_colours to False (line 16).
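For illustration, the toggle might look like this (a sketch; paint() is a hypothetical helper, not the attached script's exact code):

```python
from termcolor import colored

use_colours = True   # set to False to turn off colours globally

def paint(text, colour):
    # fall back to plain text when colours are disabled
    return colored(text, colour) if use_colours else text

print(paint("2-4111  Library Pooling (MiSeq) 5.0", "green"))
```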
Your configuration conforms to the script's requirements, as documented in the Solution section.
You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.
The glsapiutil.py file is placed in the working directory.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
sample_history_colours.py:
sample_history.py:
Laboratories may want to limit which steps Researchers can start. At the time of writing, BaseSpace Clarity LIMS does not natively support protocol-based permissions. However, with an EPP at the beginning of the step, we can check whether the technician or researcher starting the step has been given approval to start it, and halt the step from starting if they do not have permission. There are several ways this can be done, but special consideration must be given to how these permissions are administered.
To allow an administrator to easily maintain permissions, we assign users to groups in a config file, which our EPP consumes. One parameter of the EPP is the list of groups permitted to run the step. When the script is triggered at the start of the step, it looks up the name of the technician starting the step in the config file and determines whether the technician is:
Included in the config file, and
Assigned to a group that is permitted to run the step.
Remember that exiting a script with a negative number causes the EPP to fail, and the user cannot move forward in the step. We take advantage of this behaviour: if the technician or researcher is part of a permitted group, the step starts as expected; if they are not, entry into the step is halted and an error box appears containing the script's last print message.
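A minimal sketch of that check, assuming the tab-delimited config format described below (helper names and the way the technician's name is obtained are illustrative):

```python
import sys

def user_groups(config_path, last_name, first_name):
    """Return the set of groups the user belongs to (empty if not listed)."""
    with open(config_path) as config:
        for line in config:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[0] == last_name and fields[1] == first_name:
                return {group.strip() for group in fields[2].split(",")}
    return set()

def check_permission(config_path, last_name, first_name, permitted_groups):
    permitted = {group.strip() for group in permitted_groups.split(",")}
    if user_groups(config_path, last_name, first_name) & permitted:
        sys.exit(0)   # permitted: the step starts as expected
    print("You do not have permission to start this step.")  # shown in the error box
    sys.exit(-1)      # negative exit status: the EPP fails and the step is halted
```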
The EPP command is configured to pass the following parameters:
-u
The username of the current user (Required)
-p
The password of the current user (Required)
-s
The URI of the step that launches the script - the {stepURI:v2:http} token (Required)
-g
The names of the permitted groups, separated by commas and passed as one string (enclosed in double quotes)
An example of the full syntax to invoke the script is as follows (the script path and group names are placeholders):
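```bash
python /opt/gls/clarity/customextensions/Group_Permissions.py -u {username} -p {password} -s {stepURI:v2:http} -g "Lab Managers,Sequencing Team"
```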
The config file can reside in any directory that the EPP script will have access to.
The config file used in this example has tab-delimited columns of Last Name, First Name, and Groups. The permitted groups must be separated by commas (see the attached example config file). The script can easily be modified if a different format is desired for the config file.
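For illustration, a config file in this format might look like the following (names and groups are invented; the columns are separated by tabs):

```
Doe	John	Lab Managers,Sequencing Team
Smith	Jane	Sequencing Team
```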
The EPP should be configured as "automatically initiated" at "the beginning of the step".
If the user is not allowed to move forward, a message box appears and the step is aborted.
You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.
Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
Group_Permissions.py:
config.txt:
This combination of configuration and Python script can be used to set up an integration between Clarity LIMS and Illumina LIMS. There are two main parts to this integration:
Generating a sample manifest from Clarity LIMS to import the samples into Illumina LIMS.
Once the analysis is complete, automatically parsing the results from Illumina LIMS back into Clarity LIMS.
Disclaimer: This application example is provided as is, with the assumption that anyone deploying this to their LIMS server will own the testing and customization of the configuration and scripts provided.
Using the config-slicer tool, import the attached configuration file (IlluminaLIMSIntegration.xml) as the glsjboss user with the following command:
java -jar /opt/gls/clarity/tools/config-slicer/config-slicer-3.<x>.jar -o import -k IlluminaLIMSIntegration.xml -u <user> -p <password> -a https://<hostname>/api
As the glsjboss user on the BaseSpace Clarity LIMS server, copy the attached Illumina LIMS manifest template file (IlluminaLIMS_Manifest_Template.csv) to the following folder: /opt/gls/clarity/customextensions/IlluminaLIMS
On the Illumina LIMS Windows/Linux workstation, create a folder called Clarity_gtc_Parser and do the following:
Copy the clarity_gtc_parser_v2.py file into this folder and update the following configuration parameters (illustrated below):
USERNAME = <APIUSERNAME>
Clarity user with API access
PASSWORD = <APIPASSWORD>
Password for that user
uri = 'https://<DOMAINNAME>/api/v2/artifacts'
URI to the artifact API endpoint on Clarity
path = '/<PATH>/IlluminaLIMS/gtc_folder_v3/'
Path to gtc files
gtcVersion = 3
The gtc file version
NOTE: This script supports the current LIMS gtc version, 3, and will be compatible with version 5 when available.
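Once filled in, the configuration section might look like this (all values shown are placeholders):

```python
USERNAME = "apiuser"                                   # Clarity user with API access
PASSWORD = "apipassword"                               # password for that user
uri = "https://clarity.example.com/api/v2/artifacts"   # artifact API endpoint
path = "/opt/IlluminaLIMS/gtc_folder_v3/"              # directory containing the .gtc files
gtcVersion = 3                                         # gtc file version
```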
Download IlluminaBeadArrayFiles.py and copy it to the same folder. Edit the file to set the variables: the API URI, the gtc file path, and the Clarity API username/password for the relevant server.
Create an empty file called processed_gtc.txt in the gtc files directory.
Set up a scheduled task (Windows) or cron job (Linux) to run this Python script every 10 minutes. (This assumes Python 2.7.1 is installed and available on the workstation.)
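On Linux, for example, the cron entry might look like this (interpreter and script paths are placeholders):

```bash
*/10 * * * * /usr/bin/python /path/to/Clarity_gtc_Parser/clarity_gtc_parser_v2.py
```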
The configuration attached to this page contains an example protocol with two Steps.
Samples have been accessioned into Clarity LIMS with the following sample metadata as Submitted Sample UDFs:
Is Control
Institute Sample Label
Species
Sex
Comments
Volume (ul)
Conc (ng/ul)
Extraction Method
Parent 1
Parent 2
Replicate(s)
WGA Method (if Applicable)
Mass of DNA used in WGA
Tissue Source
This manual step is meant to be merged into the last step of a Sample Prep protocol. It is configured to generate a Derived Sample with the LIMSID in its name, so that the name is unique and can be used by the data parser to match data back in the next step.
This requires the user to perform the following steps:
Generate the Illumina LIMS manifest using the provided "Generate Illumina LIMS Manifest" button.
Download the manifest and import it into the Illumina LIMS Project Manager under the correct institution.
Run the appropriate lab workflow in Illumina LIMS.
After the Illumina LIMS analysis is complete, allow 10 minutes, then return to Clarity LIMS to find the step in progress and ensure the following derived sample UDFs are populated:
Autocall Version
Call Rate
Cluster File
GC 10
GC 50
Gender
Imaging Date
LogR dev
Number of Calls
Number of No Calls
SNP Manifest
Sample Plate
Sample Well
50th Percentiles in X
50th Percentiles in Y
5th Percentiles in X
5th Percentiles in Y
95th Percentiles in X
95th Percentiles in Y
Number of Intensity Only Calls
IlluminaBeadArrayFiles.py:
IlluminaLIMSIntegration.xml:
IlluminaLIMS_Manifest_Template.csv:
clarity_gtc_parser_v2.py:
Compatibility: API version 2
Many different types of CSV files are attached to BaseSpace Clarity LIMS. This example provides a template for a script that can parse a wide range of simple CSV files into Clarity LIMS. The user can change parameters to match the format of the file to be parsed.
The Lab Instrument Tool Kit includes the parseCSV script, which allows for parsing of CSV files. However, this tool has strict limitations in its strategy for mapping data from the file to corresponding samples in Clarity LIMS. For information on the parseCSV script, refer to the Clarity LIMS Integrations and Tool Kits documentation.
CSV files are attached to a protocol step. The artifact UDFs to which data will be written must be configured both for the artifacts and for the protocol step.
The script accepts the following parameters:
-r
The LIMSID of the result file to which the CSV or TXT file is attached (Required)
-u
The username of the current user (Required)
-p
The password of the current user (Required)
-s
The URI of the step that launches the script - the {stepURI:v2:http} token (Required)
An example of the full syntax to invoke the script is as follows (the result-file token index depends on your step configuration):
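```bash
python /opt/gls/clarity/customextensions/genericParser.py -r {compoundOutputFileLuid0} -u {username} -p {password} -s {stepURI:v2:http}
```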
The script contains an area with a number of configurable variables, allowing a FAS or bioinformatician to customize the script to parse their specific file. The following variables within the script are configurable (an illustrative configuration block follows the list):
What will the script use to map the measurements to the artifacts in LIMS?
A Python dictionary where the key is the name of a column in the txt file, and the value is a UDF in Clarity LIMS.
How is the file delimited? (e.g. ',' for commas or '\t' for tabs)
Many attributes of a sample can be used to map the data in the text file to the corresponding derived samples in Clarity LIMS. The script should be configured so that exactly one of these modes is set to True.
Three modes are available:
The data will be associated with the names of the output artifacts for the given step.
The data will be associated with the well locations of the output artifacts for the given step.
The data will be associated with the value of a specified UDF of the output artifacts.
For any of the three modes, a mapping column value must be explicitly given. The value is the index of the column containing the mapping data (either artifact name, well location, or UDF value).
If using the MapTo_UDFValue mode, a UDFName must also be given. This is the name of the UDF in Clarity LIMS that will be used to match the value found in the mapping column.
This Python dictionary maps the names of columns in the txt file to artifact UDFs for the outputs of the step. The data from these columns in the file will be written to these UDFs on the output artifacts. The dictionary can contain an unlimited number of UDFs. The dictionary keys (left side) are the names of the columns in the txt file, and the dictionary values (right side) are the names of the UDFs as configured for the artifacts.
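For illustration (variable names here are assumptions, not necessarily the attached script's exact identifiers):

```python
DELIMITER = ","                  # how the file is delimited: ',' or '\t'

# Exactly one of these mapping modes should be True
MapTo_ArtifactName = True        # match rows on output artifact name
MapTo_WellLocation = False       # match rows on well location (e.g. 'A:1')
MapTo_UDFValue     = False       # match rows on the value of a named UDF

MappingColumn = 0                # index of the column containing the mapping data
UDFName = "Sample Barcode"       # only used with MapTo_UDFValue (hypothetical UDF)

# column name in the file -> artifact UDF configured in Clarity LIMS
UDF_MAP = {
    "Conc. (ng/uL)": "Concentration",
    "A260/280":      "Ratio 260/280",
}
```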
You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.
The attached files are placed on the LIMS server, in the /opt/gls/clarity/customextensions folder.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
genericParser.py:
glsfileutil.py:
Sequential numbers are sometimes needed for naming conventions, and they require self-incrementing counters to be created and maintained. We do not recommend using the BaseSpace Clarity LIMS database for this. However, the Unix "(n)dbm" library provides an easy way to create and manage counters, via Dbm objects that behave like mappings (dictionaries).
The way this works: the attached script (and the counters file it creates and manages) lives on the Clarity server, and other scripts depend upon it, using code similar to the example below whenever a sequential number is needed. While the script is written in Python and uses the dbm module, there is nothing inherently Pythonic about this code that could not be reimplemented in another language. More information on the Python dbm module can be found at: https://docs.python.org/2/library/dbm.html
The counters live in a file, the path to which is defined in the cm.setPath() command. The file will be created if it doesn’t exist.
The file can contain as many counters as you wish (it’s better to have many counters in one file than many files each with only one counter)
The name of the counter is passed to the function cm.getNextValue(). If this is the first time the counter has been used, it will be created and added to the file.
Each time you want the next value just call cm.getNextValue() for that counter and you will be given the next value.
The counters and the file look after themselves; you do not need to explicitly update or save them. This is all handled behind the scenes.
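A usage sketch, assuming clarityCounters.py exposes setPath() and getNextValue() as described above (the file path and counter name are placeholders):

```python
import clarityCounters as cm

cm.setPath("/opt/gls/clarity/customextensions/counters.db")
sequence_number = cm.getNextValue("libraryName")   # counter is created on first use
library_name = "LIB-%06d" % sequence_number        # e.g. LIB-000001, LIB-000002, ...
```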
You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
clarityCounters.py:
Clarity LIMS can create Illumina-based MiSeq and HiSeq 'flavoured' sample sheets. However, if you are using algorithms or indexes outside of those suggested by Illumina, you may be required to produce your own 'custom' sample sheet.
This example script provides an algorithm that harvests the contents of a flow cell (or any container) that may contain pooled samples, and uses the resulting information to output a custom sample sheet.
The attached script uses aggressive caching in order to execute as quickly as possible. When extreme levels of multiplexing are involved, the cache size could consume considerable quantities of memory, which may be counter-productive.
The algorithm has been tested on the following: unpooled analytes; pooled analytes; and 'pools of pools', in which multiple homogeneous or heterogeneous pools are themselves combined to produce a new pool.
In these tests, the algorithm behaved as expected. If you find this is not the case, please contact the Illumina Support team.
The algorithm uses recursion to determine the individual analytes (samples) and their indexes that are located on the flow cell lane(s).
To determine whether an analyte constitutes a pool or not, the script looks at the number of submitted samples with which the analyte is associated.
If the answer is 1, the analyte is not a pool.
If the answer is greater than 1, the analyte is considered to be a pool.
If a pooled analyte is discovered, the inputs of the process that produced the pooled analyte are gathered and the same test is used to see if they themselves are pools.
This gathering of ancestor analytes continues until the contents of each pool have been resolved, at which point the script produces some example output.
Note that while you are expected to augment this section of the script with the fields you need for your custom sample sheet, the logic that recursively identifies analytes that are not themselves pools should be applicable to all cases.
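A sketch of that recursive logic, using requests and ElementTree (credentials are placeholders; the attached script additionally caches artifacts aggressively, as noted above):

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ("apiuser", "apipassword")   # hypothetical credentials

def get_xml(uri):
    response = requests.get(uri, auth=AUTH)
    response.raise_for_status()
    return ET.fromstring(response.content)

def resolve(artifact_uri, leaves):
    """Append the URIs of all non-pooled ancestor analytes to leaves."""
    artifact = get_xml(artifact_uri)
    if len(artifact.findall("sample")) <= 1:
        leaves.append(artifact_uri)   # associated with one submitted sample: not a pool
        return
    # a pool: gather the inputs of the process that produced it and re-test them
    process = get_xml(artifact.find("parent-process").get("uri"))
    for iomap in process.findall("input-output-map"):
        if iomap.find("output").get("limsid") == artifact.get("limsid"):
            resolve(iomap.find("input").get("uri"), leaves)
```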
The script is invoked with just three parameters:
An example of the full syntax to invoke the script is as follows:
Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.
You will need to implement your own logic to gather the fields required for your specific sample sheet.
You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
flowcellContents.py:
It may at times be desirable to take key data derived during a workflow and copy it to the submitted sample. There are several reasons why this could be useful:
All key data is combined with all of the submitted sample's data, and becomes available on a single object.
Key data determined during a workflow can be made immediately available to external collaborators via the LabLink Collaborations Interface, since these users have access to their submitted samples.
Searching for data becomes easier as the data is not spread over several entities.
This example provides a script to allow the copying to occur, and describes how the script can be triggered.
To illustrate the script, we will copy a user-defined field (UDF) that is collected on the outputs of a QC type protocol step.
This UDF is named Concentration, and it is stored on the individual ResultFile entities associated with the analytes that went through the QC protocol step.
Once the QC protocol step has completed, the Concentration UDF values are copied to a UDF on the submitted Samples, which is called Sample Conc.
The QC protocol step is configured to invoke the script from a button on the step's Record Details screen.
The EPP command is configured to pass the following parameters:
An example of the full syntax to invoke the script is as follows:
Once the script has copied the UDF values from the output to the submitted samples, the values are visible in the Submitted Samples view of the Operations Interface:
Similarly, assuming that the Sample Conc. UDF is set to be visible within LabLink Collaborations Interface, collaborators are able to see these values in their interface:
The main method of interest is setUDFs(). This method carries out several operations (a condensed sketch follows the list):
It harvests just enough information so that the objects required by the subsequent code can retrieve the required artifacts using the 'batch' API operations. This involves using some additional code to build and manage the cache of artifacts retrieved in the batch operations, namely:
cacheArtifact()
prepareCache()
getArtifact()
The cached artifacts are then accessed, and for each one:
The corresponding sample is retrieved via the API.
The sample XML is updated: the UDF value is read from the artifact by calling api.getUDF() and written to the sample by calling api.setUDF().
The sample XML is saved by calling api.updateObject().
Finally, a meaningful message is reported back to the user via the contents of the successMsg and/or failMsg variables.
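A standalone sketch of the same flow using requests and ElementTree; the attached script performs it through the glsapiutil helpers named above, and the credentials and UDF-creation details here are illustrative:

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ("apiuser", "apipassword")                        # hypothetical credentials
UDF_TAG = "{http://genologics.com/ri/userdefined}field"  # Clarity UDF element

def get_udf(node, name):
    for field in node.findall(UDF_TAG):
        if field.get("name") == name:
            return field.text
    return None

def set_udf(node, name, value):
    for field in node.findall(UDF_TAG):
        if field.get("name") == name:
            field.text = value
            return
    # UDF not yet present on the sample: create it (type is assumed Numeric)
    ET.SubElement(node, UDF_TAG, {"type": "Numeric", "name": name}).text = value

def copy_concentration(artifact_uri):
    artifact = ET.fromstring(requests.get(artifact_uri, auth=AUTH).content)
    sample_uri = artifact.find("sample").get("uri")
    sample = ET.fromstring(requests.get(sample_uri, auth=AUTH).content)
    set_udf(sample, "Sample Conc.", get_udf(artifact, "Concentration"))
    requests.put(sample_uri, data=ET.tostring(sample), auth=AUTH,
                 headers={"Content-Type": "application/xml"})
```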
Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.
You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
setUDFonSample.py:
Once a sequencing run has occurred, there is often a requirement to store the locations of the FASTQ / BAM files in Clarity LIMS.
For paired-end sequencing, it is likely that the meta-data file that describes the locations of these files will contain two rows for each sample sequenced: one for the first read, and another for the second read.
Such a file is illustrated below:
Column 2 of the file, Sample ID, contains the LIMS IDs of the artifacts for which we want to store the FASTQ file values listed in column 3 (Fastq File).
This example discusses the strategy for parsing and storing data against process inputs, when that data is represented by multiple lines in a data file.
The attached script will parse a data file containing multiple lines of FASTQ file values, and will store the locations of those FASTQ files in user-defined fields.
In this example, the process is configured to have a single shared ResultFile output.
The EPP command is configured to pass the following parameters:
An example of the full syntax to invoke the script is as follows:
The user interaction comprises the following steps:
The user runs the process up to the Record Details screen as shown in the following image. Note that initially:
The Sequencing meta-data file is still to be uploaded.
The values for the R1 Filename and R2 Filename fields are empty.
The user clicks Upload file and attaches the meta-data file. Once attached, the user's screen will resemble this:
Now that the meta-data file is attached, the user clicks Parse Meta-data File. This invokes the parsing script.
If parsing was successful, the user's screen will resemble Figure 4 below.
Note that the values for the R1 Filename and R2 Filename fields have been parsed from the file and will be stored in Clarity LIMS.
The key methods of interest are main(), parseFile() and fetchFile(). The main() method calls parseFile(), which in turn calls fetchFile().
The fetchFile() method relies upon the fact that the script is running on the Clarity LIMS server, and as such has access to the local file system in which the file (uploaded in Step 2) now resides.
Thus, fetchFile() can use the API to:
Convert the LIMSID of the file to the location on disk.
Copy the file to the local working directory, ready to be parsed by parseFile().
The parseFile() method creates two data structures that are used in the subsequent code within the script (a minimal sketch follows the list):
The COLS dictionary has the column names from the first line of the file as its key, and the index of the column as the value.
The DATA array contains each subsequent line of the file as a single element. Note that this parsing logic is overly simplistic and would need to be supplemented in a production environment. For example, if the CSV file being parsed does not have the column names in the first row, exceptions would likely occur. Similarly, we assume the file being parsed is CSV, so any data elements that themselves contain commas would likely cause a problem. For the sake of clarity, such exception handling has been omitted from the script.
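A minimal sketch of parseFile(), matching the simplified logic just described (a header row is assumed; there is no quoting or exception handling):

```python
def parseFile(path):
    COLS, DATA = {}, []
    with open(path) as fh:
        for index, name in enumerate(fh.readline().rstrip("\n").split(",")):
            COLS[name] = index                   # column name -> column index
        for line in fh:
            if line.strip():
                DATA.append(line.rstrip("\n"))   # each data line is one element
    return COLS, DATA
```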
Once parseFile() has executed successfully the inputs to the process that has invoked the script are gathered, and 'batch' functionality is used to gather all of the artifacts in a single batch-retrieve transaction.
All that remains is to step through the elements within the DATA array and, for each line, gather the values of the Fastq File and Sample ID columns (see the sketch after this list). For each Sample ID value:
The corresponding artifact is retrieved.
Depending on whether the value of the Fastq File column represents the filename for the first or second read, either the R1 Filename or the R2 Filename user-defined field is updated.
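Sketched below: getArtifact() and setUDF() stand in for the batch-retrieved cache and the UDF update, and the _R1_ filename convention is an assumption:

```python
# COLS and DATA come from parseFile() above
for line in DATA:
    tokens = line.split(",")
    limsid = tokens[COLS["Sample ID"]]
    fastq = tokens[COLS["Fastq File"]]
    artifact = getArtifact(limsid)       # from the batch-retrieved cache
    field = "R1 Filename" if "_R1_" in fastq else "R2 Filename"
    setUDF(artifact, field, fastq)       # update the matching user-defined field
```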
Once the modified artifacts have been saved, the values will display in the Clarity LIMS Web Interface.
Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.
You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.
The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.
parseMetadata.py: