This section provides tips and tricks to help you work efficiently with the API. For example, learn how to copy and update field values, create and rename samples, work with files and QC flags, and automate BCL conversion.
The QC flag parameter qc-flag can be set on an input or output analyte (derived sample) or on an individual result file (measurement) with a few lines of Groovy code.
In the following example, the qc-flag value of the analyte artifact is set based on the value of the bp_size variable when compared to the threshold1 and threshold2 variables.
The following code determines whether a qc-flag value is previously set, such that a flag is only set if one does not exist.
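The original examples use Groovy; the following is a minimal Python sketch of the same logic, using the requests library. The hostname, credentials, and threshold values are placeholders, and the artifact is assumed to already carry a qc-flag element (new artifacts default to UNKNOWN).

```python
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholder hostname
AUTH = ('apiuser', 'apipassword')    # placeholder credentials

def set_qc_flag(artifact_limsid, bp_size, threshold1=150, threshold2=700):
    """Set the qc-flag on an artifact from bp_size, but only if no flag is set yet."""
    uri = '%s/artifacts/%s' % (BASE, artifact_limsid)
    art = ET.fromstring(requests.get(uri, auth=AUTH).text)

    flag = art.find('qc-flag')       # defaults to UNKNOWN on new artifacts
    if flag is None or flag.text in ('PASSED', 'FAILED'):
        return                       # a flag already exists; leave it untouched

    flag.text = 'PASSED' if threshold1 <= bp_size <= threshold2 else 'FAILED'
    requests.put(uri, data=ET.tostring(art), auth=AUTH,
                 headers={'Content-Type': 'application/xml'})
```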
This section outlines several strategies for accessing, on a later step, a UDF value that was set on an earlier step.
In all cases, assume that a UDF called Batch ID was set on Step A, and that you want to access its value on Step D:
NOTE: If the samples in Step D do not have a homogeneous lineage, expect multiple values for the Batch ID.
This method involves crawling backwards from Step D to Step A.
The general form is as follows.
Examine the inputs to Step D.
Each input (I) has a parent-process element with a URI to the step that created the artifact. In this case, it is the URI to Step C.
Get the input-output maps for Step C (from the /details resource) and find the input (I') that produced output I. Each input (I') has a parent-process element with a URI to the step that created the artifact. In this case, it is the URI to Step B.
Get the input-output maps for Step B (from the /details resource) and find the input (I'') that produced the output I'. Each input (I'') has a parent-process element with a URI to the step that created the artifact. In this case, it is the URI to Step A.
Get the value of the UDF (Batch ID) from Step A: 1234.
This method is computationally slow, but it is safe. As the number of steps to crawl back through increases, so does the time the script takes to retrieve the value.
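A sketch of this crawl in Python, assuming the requests library, placeholder credentials, and that the step name uniquely identifies Step A. Each artifact's parent-process link is followed, and the step's input-output maps locate the input that produced the artifact:

```python
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')
UDF_NS = {'udf': 'http://genologics.com/ri/userdefined'}

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def crawl_back(artifact_uri, target_step='Step A'):
    """Follow parent-process links backwards until the target step is found."""
    while True:
        art = get_xml(artifact_uri.split('?')[0])    # strip any ?state= suffix
        parent = art.find('parent-process')
        if parent is None:
            return None                              # reached the root artifact
        proc = get_xml(parent.get('uri'))
        if proc.findtext('type') == target_step:
            return proc
        # find the input that produced this artifact, and continue from it
        for iomap in proc.findall('input-output-map'):
            output = iomap.find('output')
            if output is not None and output.get('limsid') == art.get('limsid'):
                artifact_uri = iomap.find('input').get('uri')
                break
        else:
            return None

# e.g. read the Batch ID UDF from Step A, starting from an input of Step D
# ('2-1234' is a placeholder artifact LIMSID):
step_a = crawl_back('https://yourserver/api/v2/artifacts/2-1234')
batch_id = step_a.findtext('udf:field[@name="Batch ID"]', namespaces=UDF_NS)
```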
This method tries to jump straight to Step A, without passing through Steps B and C.
The general form is as follows.
Examine the inputs to Step D. Each input (I) has a sample element that contains the limsid (S) of the related submitted sample.
https://<your_hostname>/api/v2/artifacts?samplelimsid=S&process-type=Step%20A
This query should give an XML response containing the URI to Step A. From there, get the value of the UDF (Batch ID): 1234.
This method makes two assumptions:
That Step A produces analytes (derived samples). Thus, if Step A is a QC process, or does not produce analyte outputs, this method fails.
That the analytes (derived samples) resulting from S passed through Step A only once. If this assumption does not hold, the query returns multiple URIs (one for each instance of Step A), and you cannot be certain which Batch ID to rely upon.
This method is computationally fast, and its duration does not increase when there are many steps between Step A and Step D.
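A sketch of the direct query in Python (placeholders as before); note that the step name in the process-type filter must be URL-encoded:

```python
import requests
from xml.etree import ElementTree as ET
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')
UDF_NS = {'udf': 'http://genologics.com/ri/userdefined'}

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def batch_id_for(sample_limsid, step_name='Step A'):
    """Jump straight to the Step A instance that processed sample S."""
    hits = get_xml('%s/artifacts?samplelimsid=%s&process-type=%s'
                   % (BASE, sample_limsid, quote(step_name))).findall('artifact')
    if len(hits) != 1:
        # mirrors the second assumption above: multiple hits are ambiguous
        raise ValueError('%d matching artifacts; Batch ID is ambiguous' % len(hits))
    art = get_xml(hits[0].get('uri').split('?')[0])
    proc = get_xml(art.find('parent-process').get('uri'))
    return proc.findtext('udf:field[@name="Batch ID"]', namespaces=UDF_NS)
```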
This method works well, but it involves making configuration changes to the steps. As such, this method is useless for legacy data resulting from samples that passed through the steps before the configuration was applied.
Its general form involves:
In Step A: Add a script that copies the value of the Batch ID UDF (1234) to every input and output of type analyte in the step.
In Step B: Add a script that copies the value of the Batch ID UDF (1234) to every output of type analyte in the step.
In Step C: Add a script that copies the value of the Batch ID UDF (1234) to every output of type analyte in the step.
In Step D: The inputs contain the value of the Batch ID.
This method relies on propagating the step UDF through Steps A, B, and C to Step D. It is safe and fast. However, if the protocol is edited and a new step is inserted between B and C, the new step must also be given the script that propagates the value, so that the chain does not break. Unlike the previous method, this one remains safe even if some of the steps are QC steps or do not produce analyte outputs.
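A sketch of the propagation script that Steps A through C would trigger, assuming the requests library, placeholder credentials, and a Batch ID UDF configured on both the step and the analytes. (Step A's version would also write the value onto its inputs.)

```python
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')
UDF = '{http://genologics.com/ri/userdefined}field'
UDF_NS = {'udf': 'http://genologics.com/ri/userdefined'}

def propagate_batch_id(process_uri):
    """Copy the step's Batch ID UDF onto every analyte output of the step."""
    proc = ET.fromstring(requests.get(process_uri, auth=AUTH).text)
    batch_id = proc.findtext('udf:field[@name="Batch ID"]', namespaces=UDF_NS)
    done = set()
    for iomap in proc.findall('input-output-map'):
        output = iomap.find('output')
        if output is None or output.get('output-type') != 'Analyte':
            continue
        uri = output.get('uri').split('?')[0]
        if uri in done:                      # outputs can repeat across maps
            continue
        done.add(uri)
        art = ET.fromstring(requests.get(uri, auth=AUTH).text)
        field = art.find('udf:field[@name="Batch ID"]', UDF_NS)
        if field is None:
            field = ET.SubElement(art, UDF, {'name': 'Batch ID'})
        field.text = batch_id
        requests.put(uri, data=ET.tostring(art), auth=AUTH,
                     headers={'Content-Type': 'application/xml'})
```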
This method is a niche solution, but it works well. It assumes that the samples from Step A proceed to Step D as an intact group, and they are joined by a control sample.
This method involves making configuration changes to the steps. As such, this method is useless for legacy data resulting from samples that passed through the steps before the configuration was applied.
In Step A: Identify the control sample for the group, then copy the value of the Batch ID to the control sample.
In Step D: Identify the control sample for the group, then retrieve the value of the Batch ID from it.
This method is the least work, but it does make several assumptions that might make it impracticable.
In a highly automated workflow, a lab gains little value from manually selecting samples into the ice bucket and then transitioning them through a step. Ideally, upon completion of one step, a following step could be automated such that the output analytes were transitioned through to the Record Details screen.
The Clarity LIMS External Program Plugin (EPP)/automation system cannot aid in this transition. The last point at which an automation can be triggered is before the step completion.
This scenario requires a stand-alone API application, which can be run by an automation at the end of a step.
Using this approach, a standalone app polls the API until each of the output analytes from the previous step is queued for the next step. After they are queued, they can be walked through to the Record Details stage.
The steps are as follows:
The EPP / automation triggers at step completion, launches the API app as a new Linux process, and then finishes. The parameter passed to the API app is the URL of the current process.
API app polls to see if each output analyte is queued.
Use the artifacts batch endpoint (api/v2/artifacts/batch/retrieve) to poll.
Check the last workflow-stage node within workflow-stages and look for status="QUEUED".
API app moves the output analytes through the step to Record Details.
Use the /api/v2/steps endpoints to start the step and then move the analytes forward.
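A sketch of the polling portion, assuming the requests library and placeholder credentials. Starting the step via the /steps endpoints is omitted, as the exact payload depends on your configuration:

```python
import time
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')

def wait_until_queued(artifact_uris, poll_seconds=30):
    """Poll the artifacts batch endpoint until every artifact is QUEUED."""
    links = ''.join('<link uri="%s" rel="artifacts"/>' % u for u in artifact_uris)
    payload = '<ri:links xmlns:ri="http://genologics.com/ri">%s</ri:links>' % links
    while True:
        r = requests.post(BASE + '/artifacts/batch/retrieve', data=payload,
                          auth=AUTH, headers={'Content-Type': 'application/xml'})
        queued = 0
        for art in ET.fromstring(r.text):
            stages = art.find('workflow-stages')
            # check the last workflow-stage node for status="QUEUED"
            if stages is not None and len(stages) and \
                    stages[-1].get('status') == 'QUEUED':
                queued += 1
        if queued == len(artifact_uris):
            return          # all queued; now start the step via /api/v2/steps
        time.sleep(poll_seconds)
```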
Within a script, you may sometimes need to know to which workflow the current sample is assigned.
However, in Clarity LIMS, the XML payload that relates to the sample does not provide information about the workflow associations of the sample.
For example, consider a sample (artifact), picked at random, from a demo system:
It is evident that this XML payload does not provide the workflow information.
The following solution shows how to use the Clarity LIMS API to determine the association of a sample with one or more workflows.
The XML payload that corresponds to each sample artifact contains a link to the related submitted sample (or samples, if it is a pooled artifact).
Follow that link to see what it yields:
The XML corresponding to the submitted sample has a link to an artifact. This artifact is special for several reasons:
It is known as the 'root artifact'.
It has an unusual LIMS ID for an artifact. Artifact LIMS IDs usually start with '2-' for Analytes and '92-' for ResultFiles. This one appears to be derived from the LIMS ID of the sample: KUZ407A145PA1
A root artifact is created 'behind the scenes' whenever a submitted sample is created in the system.
The sample history in Clarity LIMS makes it appear as if the first step in the workflow is run on the submitted sample. However, it is actually the root artifact that is the input to the first process.
When a submitted sample is assigned to the workflow, it is the root artifact that is assigned to that workflow.
Therefore, if you gather the XML payload corresponding to the root artifact, you should see the workflow assignment:
The key element is as follows.
The name of the artifact-group (Sanger Sequencing) should match the name of the workflow in which the root artifact (and by inference, artifacts derived from the root artifact) is assigned.
If you find that the artifact-group node is missing from some of the root artifacts, there are several potential reasons:
The workflow has been completed, causing the root artifact to be unassigned from the workflow.
The derived samples / artifacts have been removed from the workflow intentionally, because of a sample processing issue.
An API script has intentionally removed the derived samples / artifacts from the workflow.
The assigned workflow has been marked as 'Archived'.
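A sketch that follows this chain (artifact, submitted sample, root artifact) and reads the workflow names, assuming the requests library and placeholder credentials:

```python
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def workflow_names(artifact_uri):
    """Return the artifact-group (workflow) names of the artifact's root artifact."""
    art = get_xml(artifact_uri.split('?')[0])
    sample_uri = art.find('sample').get('uri')            # first submitted sample
    root_uri = get_xml(sample_uri).find('artifact').get('uri')
    root = get_xml(root_uri.split('?')[0])
    return [g.get('name') for g in root.findall('artifact-group')]

# e.g. workflow_names('https://yourserver/api/v2/artifacts/2-1234')
# might return ['Sanger Sequencing'], or [] for the reasons listed above
```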
This article explains how to make files that were produced by, or attached to, the LIMS in an earlier step visible in a subsequent step.
Consider a simplified workflow / protocol containing just two steps: Produce Files and Display Files.
The first step, Produce Files, will take in analytes (derived samples), and generate individual result files (one per input).
The subsequent Display Files step will allow us to view the files associated with the analytes from the previous step.
After the files have been generated by and attached to the Produce Files step, the Record Details screen of the step displays the files.
The key to displaying these files in any subsequent step involves producing a hyperlink to the file and displaying it as a user-defined field (UDF)/custom field in subsequent steps.
You may be familiar with creating and using text, numeric, and checkbox UDFs/custom fields. However, you may be less familiar with the hyperlink option. Fields of this type are used less frequently, but they are perfect for this solution.
NOTE: As of Clarity LIMS v5.0, the term user-defined field (UDF) has been replaced with custom field in the user interface. However, the API resource is still called UDF.
This solution involves a script that runs on the Record Details screen on the subsequent Display Files step and populates the fields. See the following figure.
As you can see, the structure of the hyperlink is straightforward and includes:
The IP address / hostname of the server.
The port.
A link to the LIMS ID of the file to be linked to.
To populate these fields, there are numerous methods available within an API-based script. The method discussed here works for the two-step protocol described earlier (namely that we want the files displayed in the next step of the protocol). It also works when the steps in which the files are uploaded and displayed are separated by several intermediate steps.
Assuming that the script runs just as the Record Details screen of the Display Files step is being displayed, the following pseudocode produces the hyperlinks.
For each output:
Determine the LIMS Unique ID (LUID) of the output artifact.
Determine the LUID of the submitted sample associated with the output artifact.
Determine the LUID of the resultfile artifact produced by the earlier process, derived from the common submitted sample.
Determine the LUID of the file associated with the resultfile artifact.
Update the hyperlink UDF / custom field on the output artifact (from step 1) with the specific hyperlink value.
To illustrate these pseudocode steps, XML from a demo system is provided.
From the XML representation of the Display Files process/step, we see that there are three output artifact LUIDs: 2-81806, 2-81805, and 2-81804.
By examining the XML representation of the first output artifact (2-81806), we see the LUID of the associated submitted sample is ADM1301A2:
After the common ancestor is found, ask Clarity LIMS directly for the output artifacts produced by our step of interest (Produce Files).
For example:
Yields the following XML:
The resultfile with LUID 92-81803 is associated with the current output artifact (2-81806), even though these entities may be separated by several steps.
If the process/step produces multiple resultfiles, you may need to further constrain the search using the name= parameter. For example:
By gathering the XML representation of artifact 92-81803, the associated file has LUID 40-3652:
Now that you know the LUID of the file associated with output artifact 2-81806, set the value of its hyperlink field in the following form:
When constructing the value for the hyperlink, the 40- prefix should be removed from the LUID of the file.
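The pseudocode above might be implemented as follows in Python, assuming the requests library, placeholder credentials, and a hyperlink UDF named File Link on the output analytes. The /clarity/file/ path in the link is illustrative; match it to the hyperlink form shown above (hostname, port, and file LUID with the 40- prefix removed).

```python
import requests
from xml.etree import ElementTree as ET
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')
NS = {'udf': 'http://genologics.com/ri/userdefined',
      'file': 'http://genologics.com/ri/file'}

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def link_files(process_uri, earlier_step='Produce Files', link_udf='File Link'):
    proc = get_xml(process_uri)
    out_uris = {m.find('output').get('uri').split('?')[0]
                for m in proc.findall('input-output-map')}
    for uri in out_uris:
        art = get_xml(uri)                                  # 1. output artifact
        sample = art.find('sample').get('limsid')           # 2. submitted sample
        hits = get_xml('%s/artifacts?samplelimsid=%s&process-type=%s&type=ResultFile'
                       % (BASE, sample, quote(earlier_step)))
        rf = get_xml(hits.find('artifact').get('uri'))      # 3. earlier resultfile
        file_luid = rf.find('file:file', NS).get('limsid')  # 4. e.g. 40-3652
        link = 'https://yourserver:443/clarity/file/%s' % file_luid.split('-', 1)[1]
        field = art.find('udf:field[@name="%s"]' % link_udf, NS)
        if field is None:
            field = ET.SubElement(art, '{http://genologics.com/ri/userdefined}field',
                                  {'name': link_udf})
        field.text = link                                   # 5. set the hyperlink UDF
        requests.put(uri, data=ET.tostring(art), auth=AUTH,
                     headers={'Content-Type': 'application/xml'})
```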
If a BaseSpace Clarity LIMS script is run in an automation context, it is easy to obfuscate usernames and passwords by choosing the appropriate tokens ({username} or {password}) to be passed in as run-time arguments.
However, this type of functionality is not easily available outside of automations, and it is often necessary to store various credentials on machines that need to interact with the LIMS API, database, or some other protected resource. This article explains how to use cryptography in Python to protect and obfuscate these important authentication tokens.
Many of the API Cookbook examples use a simple auth_tokens.py file that has usernames and passwords stored in plain text. This file can be byte-compiled in Python simply by importing it at a Python console:
Importing this file creates an auth_tokens.pyc file—a byte-compiled version of the source file. The source file can now be deleted, providing the first rudimentary level of security. However, the credentials can still quite easily be retrieved. Even if the permissions on this file are restricted, this solution does not present a suitable level of security for most IT administrators. It does, however, allow us to easily prototype our code, hence its use in Cookbook examples.
This solution assumes the following:
You have pycrypto installed (either through the OS package manager or pip).
You have generated a secret key of random ASCII characters (the easiest way to do this is to button-mash on a US-layout keyboard and include a lot of symbols).
You already have a plain-text auth_tokens.py file. An example is attached at the bottom of this article.
You have access to the Python or iPython command line console.
Python provides the pycrypto library that can easily be installed using the operating system's package manager, or the pip installation tool. It contains myriad different encryption algorithms and gives us a straightforward interface to wrap our own encryption objects and accessor functions.
The goal is to be able to create a flat text file containing obfuscated usernames, passwords, hostnames, and so on. To do this, use a utility class called ClarityCred that provides encryption and decryption functionality using the ARC4 cipher from pycrypto. The ClarityCred class is provided in cred.py, attached at the bottom of this article.
While the use of ARC4 is considered deprecated in favor of stronger encryption algorithms, such as AES, the ARC4 example lends itself to easier understanding. ARC4 simply requires a secret key and a salt size to be specified. The secret key can be generated at random using any preferred method and is hard-coded in cred.py, along with the salt size purely for ease of demonstration. Ideally, the secret key and salt size should be stored externally.
After applying the ARC4 encryption, the ClarityCred class wraps base64 encoding around it to obfuscate the data further.
Assume that you need to store a username, password, and hostname inside auth_tokens.py, and that this information is currently stored in plain text in another file called auth_tokens_plain.py. The usage is as follows.
Open a Python console, and import ClarityCred from cred.py.
Call the ClarityCred.encrypt() static function on the plain text username, password, and hostname strings.
Copy-paste these values into auth_tokens.py.
The following image illustrates steps 1 and 2, using an existing auth_tokens_plain.py file:
The old auth_tokens_plain.py looked like this:
username = 'testuser'
password = 'testpass'
hostname = 'https://encryptiontest.claritylims.com'
The new auth_tokens.py looks like this:
username = 'zq1AwnqIkfA=$YFY1UuO1r6edu7qPnN9/l3kMI15ZG1JAsH7IhnxnNvYulMndhYh6lxjVBfFwjN9sZEqPM0Qlx6kjq3fbht/FlRrgklDL79H7NiUP6uYM2qVltPloRA4g8SiphF3KHx4gVTE93Ku58sFCgu1rnH5u6tkCz98v0R7PsuIOW1CDMi9zSToIu+IkcYDPPYcD1b4z8ojez/7lczunaDfrmPhwopyyUiETu9BR49Bwp5fz4XSWICZFGCd9AjoEg/FTE+/X18f+0pIz0viXQyN+JjE3vJkpNsRY2Z3d72sPgQmFFZhd48m+POUtD1UXLXhaijdxp78QTcEp7AHY+TiM8hsXT7BX1Q=='
password = '9qW5BftGyXY=$6GL1t/Zl1CbSmB7Qq54uf2TJ5fI8GUlW9NdBnumkTtF/X27WLEsr1+C0ilXQX6jnLm4kzR+5pCVgnz4xz6/80/dMLMlTll6tOvCJgPU4ZkRpkUYmcPVbrp+X3azR7I024O8UjV/JeJYV869h3kvdPyWJGXRH4oJgs5NTJKI2y6URBs0wlrlgBuZ2YkO855ZGPw9J07UMM606q9xERRzQ+LT1XLRzSCuFnuSoDVEhshhYqZ/jpYWDHvA6Z5+YTYI/i099iYZ+WQdJAiU9hcgkUnWCybjcwivvHG6vAIROroLqlOefo+hrJsVFBA3uDaPS8pkgMVsKMPUGeft6vx4NgN/jaw=='
hostname = 'Q+oyq2m9Nv8=$rhgeJOMdm/M+dDNlSbBA3RCsUoo0Ts65G7lePvuajRmsLSNC5Qo5bwagRuyat0ztpeZrUmD8xTxTvhUBvZYDlM6GBLsq5drBP6PFh/lplxb6O8YiSRXrboFov8tRnu6GbaTfGR8WV7s8vBZsXhrhlPn67p7yalJLnHWb9VOKhx8AgCTtytQkkEwmpm2vbDwDha9kMdK63IrOSp2jmRaI/9X3xsd4upqaxvX7zrEJ8ruGU/szN0ITxTK1rprnowpyXfBRiOEcrI7uh1bg73oqOETn3pB/uTrGkhGETKYB2aHaewwWMccbeZTgEPT0kDmuJdpoGYy+p+gxSoR9Arh3JtREIA=='
Examples of the plain-text auth_tokens_plain.py and encrypted auth_tokens.py are attached at the bottom of this article.
Now that the new auth_tokens.py is ready to use, you can import it and create the corresponding PYC file to provide that extra level of security, as previously discussed. You can remove the PY file and ship the PYC file everywhere it is required.
It may also be a good idea to restrict the read/write/execute permissions on the file to the system user that is calling the file (usually glsai in Clarity LIMS installations).
To use the values in this file in code, use the decrypt() function in ClarityCred. Consider the simple example of initializing a glsapiutil api object. For reference, the example current directory listing looks like this:
Notice the .py source files are removed wherever possible.
Using a Python console, the normal api invocation (using a plain-text auth_tokens file) would look as follows.
import glsapiutil
import auth_tokens_plain
api = glsapiutil.glsapiutil2()
api.setHostname( auth_tokens_plain.hostname )
api.setVersion( 'v2' )
api.setup( auth_tokens_plain.username, auth_tokens_plain.password )
Now, however, with the encrypted tokens, the values are decrypted on the fly:
import glsapiutil
import auth_tokens
from cred import ClarityCred
api = glsapiutil.glsapiutil2()
api.setHostname( ClarityCred.decrypt( auth_tokens.hostname ) )
api.setVersion( 'v2' )
api.setup( ClarityCred.decrypt( auth_tokens.username ), ClarityCred.decrypt( auth_tokens.password ) )
This method provides a relatively robust solution for encrypting and obfuscating sensitive data and can be used in any Python context, not just for Clarity LIMS API initialization. By further ensuring that only the auth_tokens.pyc file is shipped and copied with restricted read/write/execute permissions, this method should help satisfy IT security requirements.
However, the matter of storing the secret key externally remains. One idea is to store the secret key in a separate file and encrypt that file using openssl or an OpenPGP key. While the problem of storing each piece of information in encrypted format likely never fully goes away, the use of multiple methods of encryption can offer better protection and peace of mind.
This section discusses methods for integrating BaseSpace Clarity LIMS with upstream sample accessioning systems.
The following illustration shows a typical architectural overview:
Required:
A sample must have a Name / ID
A sample must be associated with a Case / Patient / Study / Project
A sample must be associated with a Container (Tube / Plate etc)
Optional (but expected):
User-defined fields (UDFs)/custom fields (defined by your LIMS configuration)
Typical flowchart of actions within the broker:
The following animation illustrates the elements of an XML sample-creation message to Clarity LIMS.
Build your own:
Pro: Not too difficult
Con: Stability may become a concern as the number of messages increases
?: Whether it is maintainable over the long term
Use a commercial / open-source offering (e.g. Mirth Connect)
Pro: Quicker than build
Pro: Robust, multi-threaded support for millions of messages per day
?: May prove to be an excessive or over-complicated means to accomplish something relatively simple
Does the broker need to carry out other business logic?
For example, one customer added logic to their broker to handle medical billing. The broker could distinguish between physicians ordering duplicate tests for a subject (not reimbursable, so the duplicate sample was not submitted to Clarity LIMS) and a temporal study that was reimbursable.
The best practice is to take advantage of as many legacy systems as possible, rather than creating samples in Clarity LIMS, then reinventing business logic to remove unwanted ones.
A lab may receive samples submitted from various sources. This can pose a problem with regards to sample names. There may be duplicate sample names and/or various name formats, all of which make it hard for lab scientists to recognize a sample.
Clarity LIMS programmers often rename all incoming samples to a certain naming convention.
This section provides an example to address this problem.
When accepting a project and its samples, the receiving lab scientist runs a Clarity LIMS step named Receive Samples.
The underlying Receive Samples process type / master step is configured with analyte (sample) inputs, and no analyte outputs.
A shared result file output is configured to capture logging from the script.
The sample name could be a derivative of the Sample LIMSID, with a prefix:
Because the LIMSID is guaranteed to be unique, this approach mitigates any need to maintain an external sequence of numbers.
The Sample LIMSID is derived from the Project LIMSID, which is configurable.
The Receive Samples process is configured to trigger a script that renames the samples that are input to the process.
This trigger also passes the OriginatingProcessURI to the script. This example assumes that the original submitted sample name must be preserved, and so it is saved in a sample UDF.
The following pseudo code shows how one might implement the sample-renaming script:
Connect to the API, using the OriginatingProcessURI.
Retrieve the OriginatingProcessXML and store it in a variable.
Iterate through the input-output map of the OriginatingProcessXML, and for each InputArtifact:
GET the InputArtifactURI and store the input ArtifactXML in a variable.
From this ArtifactXML, GET the SourceSampleXML and store it in another variable.
Modify the SourceSampleXML. To do this:
Rename the SampleName to a desired name (see Recommendations section, above).
Finally, PUT the Sample XML back.
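A sketch of this pseudocode in Python, assuming the requests library, placeholder credentials, a REC prefix, and a sample UDF named Original Name for preserving the submitted name (both the prefix and the UDF name are assumptions):

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ('apiuser', 'apipassword')    # placeholder credentials
UDF = '{http://genologics.com/ri/userdefined}field'
NS = {'udf': 'http://genologics.com/ri/userdefined'}

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def rename_samples(originating_process_uri, prefix='REC'):
    proc = get_xml(originating_process_uri)
    input_uris = {m.find('input').get('uri').split('?')[0]
                  for m in proc.findall('input-output-map')}
    for input_uri in input_uris:
        art = get_xml(input_uri)                     # input ArtifactXML
        sample_uri = art.find('sample').get('uri')
        sample = get_xml(sample_uri)                 # SourceSampleXML
        name = sample.find('name')
        field = sample.find('udf:field[@name="Original Name"]', NS)
        if field is None:
            field = ET.SubElement(sample, UDF, {'name': 'Original Name'})
        field.text = name.text                       # preserve the submitted name
        name.text = prefix + sample.get('limsid')    # e.g. RECADM1301A2
        requests.put(sample_uri, data=ET.tostring(sample), auth=AUTH,
                     headers={'Content-Type': 'application/xml'})
```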
The incoming message contains the following:
Project ID or Name
Sample ID or Name
Container ID or Name
Container type (plate / tube type)
Container well position (if the sample is on a plate), e.g., G:2
Sample user-defined fields (UDFs) / custom fields
POST to https://your_server/api/v2/samples:
We receive something like the following:
POST to https://your_server/api/v2/projects
We receive something like the following:
POST to https://your_server/api/v2/containers:
We receive something like the following:
POST to https://your_server/api/v2/containers:
We receive something like the following:
POST to https://your_server/api/v2/samples:
We receive something like the following:
GET: https://your_server/api/v2/projects?name=Week%2039
If the project exists, we receive something like the following:
If the project does not exist, we receive something like the following:
GET: https://your_server/api/v2/containers?name=Example%20Container%2020140910
If the container exists, we receive something like the following:
If the container does not exist, we receive something like the following:
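These existence checks translate into a simple get-or-create pattern. A sketch for projects, assuming the requests library; the hostname, credentials, and researcher LIMS ID are placeholders:

```python
import requests
from xml.etree import ElementTree as ET
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2

BASE = 'https://your_server/api/v2'  # placeholders
AUTH = ('apiuser', 'apipassword')

def get_or_create_project(name):
    """GET the project by name; if the list is empty, POST a minimal project."""
    listing = ET.fromstring(requests.get(
        '%s/projects?name=%s' % (BASE, quote(name)), auth=AUTH).text)
    hit = listing.find('project')
    if hit is not None:
        return hit.get('uri')        # the project exists
    xml = ('<prj:project xmlns:prj="http://genologics.com/ri/project">'
           '<name>%s</name>'
           '<researcher uri="%s/researchers/1"/>'   # researcher LIMS ID assumed
           '</prj:project>') % (name, BASE)
    r = requests.post(BASE + '/projects', data=xml, auth=AUTH,
                      headers={'Content-Type': 'application/xml'})
    return ET.fromstring(r.text).get('uri')
```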
Use the API to update the preset value of a user-defined field (UDF)/custom field configured on a step.
From your test server:
GET a chosen UDF/custom field.
Do a PUT that includes the new preset.
For example, to add 'My new preset', insert a preset element (My new preset) after the last preset value in your XML:
This technique is powerful when integrating with external systems, combined with the Begin Work trigger. For example, a script initiated by the Begin Work trigger can reach out to an external source and make sure that the presets for the Step Details UDFs/custom fields are always up to date and in sync with the server, before the Record Details screen is entered.
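A sketch of the GET-modify-PUT round trip, assuming the requests library and that the field configuration lives under /configuration/udfs with preset child elements. Note that this sketch simply appends the preset at the end of the document; the schema may expect presets at a specific position.

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ('apiuser', 'apipassword')    # placeholder credentials

def add_preset(field_uri, new_preset):
    """GET a UDF/custom field configuration, append a preset, and PUT it back."""
    cfg = ET.fromstring(requests.get(field_uri, auth=AUTH).text)
    if any(p.text == new_preset for p in cfg.findall('preset')):
        return                       # already present
    ET.SubElement(cfg, 'preset').text = new_preset
    requests.put(field_uri, data=ET.tostring(cfg), auth=AUTH,
                 headers={'Content-Type': 'application/xml'})

# e.g. add_preset('https://yourserver/api/v2/configuration/udfs/123', 'My new preset')
```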
How to copy the value of a UDF/custom field from source to destination (typically from the inputs of a process/step to the outputs) is a frequently asked question.
For example, suppose a process/step takes in libraries and tracks their normalization. In such a case, the input samples have a UDF/custom field that is used to track the library ID. Since the library ID changes, it is desirable for the output samples to also have this ID.
Use the API to gather the XML for the inputs, then copy the XML node relating to the UDF/custom field to the outputs.
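A sketch of this API approach in Python, assuming the requests library and that the field exists on both the inputs and the outputs (as the copyUDFs script below also requires):

```python
import requests
from xml.etree import ElementTree as ET

AUTH = ('apiuser', 'apipassword')    # placeholder credentials
NS = {'udf': 'http://genologics.com/ri/userdefined'}
UDF = '{http://genologics.com/ri/userdefined}field'

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def copy_udf(process_uri, field_name='Library ID'):
    """Copy a UDF value from each input artifact to its mapped analyte output."""
    proc = get_xml(process_uri)
    for m in proc.findall('input-output-map'):
        output = m.find('output')
        if output is None or output.get('output-type') != 'Analyte':
            continue
        src = get_xml(m.find('input').get('uri').split('?')[0])
        value = src.findtext('udf:field[@name="%s"]' % field_name, namespaces=NS)
        if value is None:
            continue                 # nothing to copy for this input
        out_uri = output.get('uri').split('?')[0]
        dst = get_xml(out_uri)
        field = dst.find('udf:field[@name="%s"]' % field_name, NS)
        if field is None:
            field = ET.SubElement(dst, UDF, {'name': field_name})
        field.text = value
        requests.put(out_uri, data=ET.tostring(dst), auth=AUTH,
                     headers={'Content-Type': 'application/xml'})
```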
Alternatively, use the out-of-the-box copyUDFs script, which Illumina provides as part of the NextGen Sequencing configuration.
The copyUDFs script is available in the ngs-extensions.jar archive*, and can be called from the EPP / automation parameter string.
*The archive file may be named differently, depending upon the version you are running.
Usage:
The UDF / custom field values to be copied are defined in the -f portion of the syntax. These values must be present on both the inputs and outputs of a process.
For example, suppose you wanted to use this script to copy the value of a UDF called Library ID:
The Library ID field must be defined on both inputs and outputs.
The -f flag is defined as follows:
To copy multiple UDF values from source to destination, list them in comma-separated form as part of the -f flag.
To copy Library ID and Organism from source to destination, use the following example:
When running the Aggregate QC step in Clarity LIMS, the QC pass and fail flags for the samples display in the Record Details screen.
This section explains how to use the API instead to find the samples that passed or failed QC aggregation.
Query the API and filter the results list based on the qc-flag parameter value. For more on filtering, see section.
To filter the list by QC flag with a value of PASSED, use the following example:
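The following query uses a placeholder hostname, with the qc-flag filter applied to the artifacts list resource:

https://yourserver/api/v2/artifacts?qc-flag=PASSED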
To find an individual QC flag result for an individual sample, use the LIMS ID of the sample:
Then search for the value of the <qc-flag> element in the payload returned for the artifact.
The <qc-flag> element of the input analyte (sample) artifact is sent into the Aggregate QC step.
To demonstrate this detail, review the following steps:
In the API, find a single analyte artifact (derived sample) that has passed QC. The XML QC flag value is PASSED.
In Clarity LIMS, find the same sample and change the value of the QC flag from PASSED to FAILED. Save the change.
In the API, find the sample again. See that the XML QC flag value is set to FAILED.
When a sequencing run is complete, it is often desirable to pass data to CASAVA for BCL conversion automatically rather than manually. This section proposes a method to configure this automation.
NOTE: This solution is not tested end-to-end on an instrument.
The proposed approach involves adding an automation trigger to the Sequencing step, such that it invokes a script that launches the BCL Conversion step.
However, because the BCL Conversion step does not run immediately, it is launched in a dormant state until the Sequencing step is complete.
The key event here is the Run Report that is created and attached to the Sequencing step. As the last event to occur in the step, the creation of this report is used to prompt the BCL Conversion step to 'wake up' from its dormant state and begin processing.
The following pseudocode describes the work that must occur within the script:
This solution requires a script that launches the BCL Conversion step via the API. The creation of such a script is covered in . This example covers only the functionality of the script, rather than the code.
In addition to the expected processURI, username, and password parameters/tokens, the script should accept another parameter (the LIMSID of the Run Report from the Sequencing step).
For example, the script can be invoked as follows:
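The script name and flags below are hypothetical placeholders (only the -r parameter is described later in this section); substitute those of your own script:

python launchBclConversion.py -l {processURI} -u {username} -p {password} -r {Run Report LIMSID}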
Use this syntax when configuring the command line on the Sequencing process/step.
Configure the automation so the script is automatically triggered when exiting the Record Details screen.
The BCL Conversion process is configured:
To take in a ResultFile input and generate a non-shared ResultFile output
With a process parameter of 'Standard,' which initiates the actual BCL conversion.
The script is passed the value '92-3771' as the -r parameter.
This is then converted to a full URI and becomes the input element of the following XML, which is POSTed to the /processes API resource:
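The payload itself is not reproduced here; a minimal sketch of such a process-execution POST might look like the following in Python. The element names follow the processexecution schema, and the technician LIMS ID of 1 is an assumption:

```python
import requests

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')

xml = '''<prx:process xmlns:prx="http://genologics.com/ri/processexecution">
  <type>BCL Conversion</type>
  <technician uri="%s/researchers/1"/>
  <input-output-map>
    <input uri="%s/artifacts/92-3771"/>
    <output type="ResultFile"/>
  </input-output-map>
</prx:process>''' % (BASE, BASE)

r = requests.post(BASE + '/processes', data=xml, auth=AUTH,
                  headers={'Content-Type': 'application/xml'})
r.raise_for_status()   # on success, the API returns the created process XML
```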
Update all URIs in the XML to point to the hostname and API version for the system.
Provide a valid URI for the lab scientist. There might be a user in the system with LIMS ID of '1'.
If the POST is successful, the API returns the valid XML for the created process.
Note: This scenario is one of the few occasions where the POST succeeds, yet returns XML that differs from the input XML. The results can be confusing, because a standard approach for validating whether a POST succeeded is to compare the returned XML with the input XML: if they differ, the POST is assumed to have failed. In this scenario, however, it did not fail.
Server-side configuration allows multiple filestores to be associated with entities (samples, projects, processes/steps, and so on) in BaseSpace Clarity LIMS.
This feature allows linking to large data files (such as results, images, and searches) on a different server, eliminating the need to move large files onto the Clarity LIMS filestore.
For example, sequencing instruments typically produce large result files. Attaching these files to the Sequencing step in Clarity LIMS has the following drawbacks:
It involves transferring the files to the Clarity LIMS filestore. The larger the file, the longer the transfer takes.
It requires a large amount of storage space as runs build up.
An alternative solution is to set up a remote filestore to be used as the results directory from which Clarity LIMS accesses the files directly.
To do this setup, three steps are required:
Set up HTTP, HTTPS, FTP, or SFTP access to the files and folders you wish to share.
Configure the Clarity LIMS server to recognize the URI of a file on the remote filestore.
POST information to Clarity LIMS, via the REST API, to reference the file from a Clarity LIMS entity (project, sample, process/step, result file, and so on).
BaseSpace Clarity LIMS can operate with many different forms of file servers – HTTP, HTTPS, FTP, and SFTP access are all supported.
It is your responsibility to set up this access. For HTTP, you may be interested in httpd or HFS for HTTP file serving.
To track a new remote filestore, Clarity LIMS requires four database properties: directory, host, port, and scheme.
The four properties share a base name, but have different suffixes attached (dir, host, port, scheme). These suffixes are summarized in the following table.
The base name can be anything. Clarity LIMS finds any property ending in .scheme and uses its base name to find the other properties.
If necessary, add the last three properties listed in the table (with the .domain, .user, and .password suffixes) to specify a domain, username, and password to be used when accessing files.
Clarity LIMS v5 and later—For the property changes to take effect, Tomcat must be restarted.
Use the omxprops-ConfigTool.jar to create, update, and retrieve values of the database properties. This tool is found at the following location: /opt/gls/clarity/tools/propertytool
To create a property, use the following examples:
NOTE: These properties may not be global properties. Do not use the -g flag here.
To get the value of an existing property:
To update the value of an existing property:
To encrypt a password:
NOTE: To set a property to the encrypted result, wrap the result in ENC(), that is, set the value to ENC(<encrypted result>).
The following example maps a remote HTTP URI: http://YourHTTPHost:80/limsdata/LegacyFile.RAW
In this case, the base name for the properties is http-lims-files.
Steps
As the glsjboss user, access the omxprops property tool in /opt/gls/clarity/tools/propertytool.
Add the following dir, host, port, and scheme properties to the server from the command line:
In the example above, the http-lims-files.dir property value is /limsdata. Any file in http://YourHTTPHost/limsdata/ is available to be referenced by BaseSpace Clarity LIMS.
To make all files on the web server available, use / as the property value, for example:
After the filestore properties are added to Clarity LIMS (and JBoss/Tomcat has been restarted, as applicable), you can attach the files to Clarity LIMS.
To attach the files to Clarity LIMS:
POST to http://hostname/api/v2/files, with the content-location tag pointing to the remote filestore.
An example XML POST is provided, using the filestore created in the previous example:
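A sketch of such a POST in Python; the artifact to attach to (92-1234) and the original-location value are placeholders:

```python
import requests

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')

file_xml = '''<file:file xmlns:file="http://genologics.com/ri/file">
  <attached-to>%s/artifacts/92-1234</attached-to>
  <content-location>http://YourHTTPHost:80/limsdata/LegacyFile.RAW</content-location>
  <original-location>http://YourHTTPHost:80/limsdata/LegacyFile.RAW</original-location>
</file:file>''' % BASE

r = requests.post(BASE + '/files', data=file_xml, auth=AUTH,
                  headers={'Content-Type': 'application/xml'})
r.raise_for_status()
```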
Results
The file is now downloadable directly from Clarity LIMS.
Any entity that can have a file attached to it may be referenced in the attached-to element.
For more information on working with files, see Work with Files.
| Property Name | Usage | Example Value | Required | Description |
|---|---|---|---|---|
| Directory | ${baseName}.dir | /limsdata | True | The highest level directory in which it is valid to access files. Files outside this directory are not attached. |
| Hostname/IP | ${baseName}.host | YourHTTPHost | True | The hostname or IP address to use when accessing the files. |
| Port | ${baseName}.port | 80 | True | The port to use when accessing the files. |
| Scheme | ${baseName}.scheme | http | True | The scheme of the URI used to access the files. Examples are HTTP, HTTPS, FTP, and SFTP. |
| Domain | ${baseName}.domain | YourAuthDomain | False | The domain to use when authenticating access to the files. |
| Username | ${baseName}.user | fileUser | False | The username to use when authenticating access to the files. |
| Password | ${baseName}.password | filePassword | False | The password to use when authenticating access to the files. |
It is highly recommended that you encrypt your password. See the following section for details.
This topic explains how to:
Detect when files have been uploaded.
Extract the key information that might comprise a notification.
The Files API Resource
The key resource to investigate is the files resource, which provides a listing of files within the system.
On a test system, accessing the files resource as follows:
produces the following output:
Although not particularly useful in itself, the files resource becomes more interesting when filtered to include only files uploaded after a specified date-time, and only those with a published status of 'true'.
For example, the following URI:
produces this output on a test system:
This outcome is much more manageable. Files uploaded via the Collaborations Interface inherently have a published status of 'true', so this status can be used to exclude regular files uploaded to the LIMS via other methods and interfaces.
By following the URIs to retrieve the full XML representations of these files, the output is similar to the following:
and:
Retrieve the associated project/sample, and extract the names and/or IDs to embed into the notification, by following the URI in the 'attached-to' elements.
In this case, the following result is produced:
and:
A script must be run periodically (hourly or daily) that queries the files resource for files that have a published status of 'true' and were last modified in the period of interest.
After this list of files is retrieved, the following pseudocode can be applied:
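A sketch of such a script, assuming the requests library; the filter parameter names are assumptions and should be checked against your files resource:

```python
import requests
from xml.etree import ElementTree as ET

BASE = 'https://yourserver/api/v2'   # placeholders
AUTH = ('apiuser', 'apipassword')

def get_xml(uri):
    return ET.fromstring(requests.get(uri, auth=AUTH).text)

def notify_new_files(since_iso):
    """List recently modified published files and report what they are attached to."""
    listing = get_xml('%s/files?published=true&last-modified=%s' % (BASE, since_iso))
    for link in listing.findall('file'):
        f = get_xml(link.get('uri'))
        attached_uri = f.findtext('attached-to')
        entity = get_xml(attached_uri)   # project, sample artifact, and so on
        print('File %s is attached to %s (%s)' % (
            f.findtext('original-location'), entity.findtext('name'), attached_uri))

# e.g. run hourly from cron:
# notify_new_files('2014-09-10T00:00:00Z')
```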
An example derived from the above XML could lead to the following notifications: