In Clarity LIMS, you often need to process multiple entities at once. To do this quickly and effectively, you can use batch operations, which allow you to retrieve multiple entities in a single interaction with the API instead of iterating over a list and retrieving each entity individually.
Batch operations greatly improve the performance of a script. These methods are available for containers and artifacts; in this example, both entity types are retrieved using the batchGET() operation. Batch operations also speed up updates, for example when updating a batch of output analytes (derived samples). For more information, refer to Work with Batch Resources and Introduction to Batch Resources.
Before you follow the example, make sure that you have the following items:
Several samples have been added to the LIMS.
A process / step that generates derived samples in containers has been run on the samples.
A compatible version of the API (v2 r21 or later).
When derived samples ('analyte artifacts' in the API) are run through a process / step, their information can be accessed by examining that process / step. In this example, we will retrieve all of the input artifacts and their respective containers.
To do this effectively using batch operations, we must first collect the URIs of all the entities. These URIs must be unique; otherwise, the batch operation will fail. All of the entities can then be retrieved in one action. Note that only one type of entity can be retrieved per call.
To retrieve the process step information, use the GET method with the process LIMS ID:
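For example, here is a minimal sketch in Python using the requests library. The server URL, credentials, and process LIMS ID are placeholders you would replace with your own:

```python
import requests

# Placeholder server details and credentials -- replace with your own.
BASE_URI = "https://your-server-ip/api/v2"
AUTH = ("apiuser", "apipassword")

# GET the process resource by its LIMS ID (placeholder ID shown here).
response = requests.get(f"{BASE_URI}/processes/24-1234", auth=AUTH)
response.raise_for_status()
process_xml = response.text
```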
To retrieve the artifact URIs, collect the inputs of the process's input-output-map. A condition of the batchGET operation is that every entity to get must be unique. Therefore, you must call unique() on your list.
You can now use batchGET to retrieve the unique input analytes:
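A sketch of both steps in Python, continuing from the GET above and assuming the standard <ri:links> batch retrieve payload:

```python
import requests
from xml.etree import ElementTree

BASE_URI = "https://your-server-ip/api/v2"   # placeholders, as above
AUTH = ("apiuser", "apipassword")

# Collect the input artifact URIs from the process's input-output-map.
# (In Clarity XML only the root element is namespaced, so child elements
# are looked up without a namespace prefix.)
process = ElementTree.fromstring(process_xml)
input_uris = [inp.get("uri")
              for iom in process.findall("input-output-map")
              for inp in iom.findall("input")]

# Batch operations fail on duplicates, so reduce the list to unique URIs.
# (If the same artifact appears with different ?state= parameters, you may
# also need to strip the query string before de-duplicating.)
unique_uris = sorted(set(input_uris))

# Build the <ri:links> payload and POST it to the batch retrieve resource.
links = ElementTree.Element("{http://genologics.com/ri}links")
for uri in unique_uris:
    ElementTree.SubElement(links, "link", uri=uri, rel="artifacts")

response = requests.post(f"{BASE_URI}/artifacts/batch/retrieve",
                         data=ElementTree.tostring(links),
                         auth=AUTH,
                         headers={"Content-Type": "application/xml"})
response.raise_for_status()
artifacts_xml = response.text
```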
The same can be done to gather the analytes' containers. After collecting the unique container URIs, retrieve the containers in a single batch call. By printing the name and URI of each container, you can verify the result, as in the sketch below.
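Continuing the sketch, the container URIs come from each artifact's location element:

```python
# Collect each artifact's container URI from the batch retrieve response.
details = ElementTree.fromstring(artifacts_xml)
container_uris = sorted({loc.find("container").get("uri")
                         for artifact in details
                         for loc in artifact.findall("location")})

# Retrieve all of the containers in a single batch call.
links = ElementTree.Element("{http://genologics.com/ri}links")
for uri in container_uris:
    ElementTree.SubElement(links, "link", uri=uri, rel="containers")

response = requests.post(f"{BASE_URI}/containers/batch/retrieve",
                         data=ElementTree.tostring(links),
                         auth=AUTH,
                         headers={"Content-Type": "application/xml"})
response.raise_for_status()

# Print the name and URI of each unique container.
for container in ElementTree.fromstring(response.text):
    print(container.findtext("name"), container.get("uri"))
```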
To retrieve the step information, use the GET method with the step LIMS ID.
To retrieve the artifact IDs, collect the inputs of the step's input-output-map. A condition of the batch retrieve operation is that every entity to get must be unique. To ensure this, add the LUIDs to a set().
You can now use the getArtifacts() function, which is included in glsapiutils.py, to retrieve the unique input analytes:
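A sketch of that approach is shown below. The connection setup and the exact signature of getArtifacts() vary between versions of glsapiutils.py, so treat the calls here as assumptions and check them against your copy of the helper:

```python
import glsapiutils

# Hypothetical connection setup -- adapt to your copy of glsapiutils.py.
api = glsapiutils.glsapiutil()
api.setHostname("https://your-server-ip")
api.setVersion("v2")
api.setup("apiuser", "apipassword")

# Add the input LUIDs from the step's input-output-map to a set() so that
# every entity in the batch call is unique. input_uris is assumed to hold
# the uri attributes harvested from the input-output-map.
luids = set()
for uri in input_uris:
    luids.add(uri.split("/")[-1].split("?")[0])

# getArtifacts() wraps the artifacts batch retrieve endpoint (signature
# assumed here -- verify it in your glsapiutils.py).
artifacts_xml = api.getArtifacts(list(luids))
```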
UsingBatchGet.groovy:
Batchexample.py:
For a general overview of batch resources, refer to Introduction to Batch Resources.
The powerful batch resources included in the Clarity LIMS Rapid Scripting API significantly increase the speed of script execution by allowing batch operations on samples and containers. These resources are useful when working with multiple samples and containers in high throughput labs.
The following simple example uses batch resources to move samples from one workflow queue into another queue.
It is useful to review Work with Batch Resources before you begin.
Use a batch retrieve request to find all the artifacts in an artifact group, and then use a batch update request to move those artifacts into another artifact group.
The following steps are required:
Find all the artifacts that are in a particular artifact group.
Use the artifacts.batch.retrieve (list) resource to retrieve the details for all the artifacts.
Use the artifacts.batch.update (list) resource to update the artifacts and move them into a different artifact group, posting them back as a batch.
NOTE: The only HTTP method for batch resources is POST.
Before you follow the steps, make sure that you have the following items:
Clarity LIMS contains a collection of samples (artifacts) residing in the same workflow queue (artifact group)
A second queue exists into which you can move the collection of samples
A compatible version of the API (v2 r21 or later).
In the REST API, artifacts are grouped with the artifact group resource. In Clarity LIMS, an artifact group is displayed as a workflow. Workflows are configured as queues, allowing lab scientists to quickly locate the samples they need to work with on the bench.
To find the samples (artifacts) in a workflow queue (artifact group), use the following request, editing the server details and artifact group name to match those in your system:
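For example, using Python and requests (server, credentials, and queue name are placeholders; the artifacts list resource accepts an artifactgroup filter):

```python
import requests

BASE_URI = "https://your-server-ip/api/v2"
AUTH = ("apiuser", "apipassword")

# Filter the artifacts list resource by artifact group (workflow queue) name.
response = requests.get(f"{BASE_URI}/artifacts",
                        params={"artifactgroup": "my_queue"},
                        auth=AUTH)
response.raise_for_status()
print(response.text)
```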
This request returns a list of URI links for all artifacts in the artifact group specified. In our example, the my_queue queue contains three artifacts:
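The response has roughly the following shape (the LIMS IDs and URIs shown are placeholders):

```xml
<art:artifacts xmlns:art="http://genologics.com/ri/artifact">
    <artifact uri="http://your-server-ip/api/v2/artifacts/A1-101" limsid="A1-101"/>
    <artifact uri="http://your-server-ip/api/v2/artifacts/A1-102" limsid="A1-102"/>
    <artifact uri="http://your-server-ip/api/v2/artifacts/A1-103" limsid="A1-103"/>
</art:artifacts>
```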
To retrieve the detailed XML for all of the artifacts, use a <links> tag to post the set of URI links to the server using a batch retrieve request:
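For example, the request and payload look something like this (the URIs are the placeholders from the previous step):

```
POST http://your-server-ip/api/v2/artifacts/batch/retrieve

<ri:links xmlns:ri="http://genologics.com/ri">
    <link uri="http://your-server-ip/api/v2/artifacts/A1-101" rel="artifacts"/>
    <link uri="http://your-server-ip/api/v2/artifacts/A1-102" rel="artifacts"/>
    <link uri="http://your-server-ip/api/v2/artifacts/A1-103" rel="artifacts"/>
</ri:links>
```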
This returns the detailed XML for each of the artifacts in the batch.
The XML returned includes the artifact group name and URI:
<artifact-group name="my_queue" uri="http://your-server-ip/api/v2/artifactgroups/1"/>
To move the artifacts into another queue, simply update the artifact-group name and URI values:
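For example, in each artifact's detail XML, the artifact-group element would now point at the second queue (name and ID here are placeholders):

```xml
<artifact-group name="my_other_queue" uri="http://your-server-ip/api/v2/artifactgroups/2"/>
```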
Finally, post the XML back to the server using a batch update request:
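A sketch of the batch update request; the <art:details> wrapper is the same document returned by the batch retrieve, with the artifact-group elements edited (elided content marked with ...):

```
POST http://your-server-ip/api/v2/artifacts/batch/update

<art:details xmlns:art="http://genologics.com/ri/artifact">
    <art:artifact uri="http://your-server-ip/api/v2/artifacts/A1-101">
        ...
        <artifact-group name="my_other_queue" uri="http://your-server-ip/api/v2/artifactgroups/2"/>
    </art:artifact>
    ...
</art:details>
```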
The Clarity LIMS API has batch retrieve endpoints for samples, artifacts, containers, and files. This article talks generically about links for any of those four entities.
When using the batch endpoints, you may need to process hundreds of links or more. Intuitively, a single API call containing all the links might seem like the fastest way to retrieve the data. However, analysis of API performance shows that once the number of links grows beyond a threshold, the time per object increases.
To retrieve the data most efficiently, it is best to make multiple POSTs, each containing a batch of the optimal size. For a single sample (or other entity), a batch call takes longer than a simple GET to that entity's endpoint; however, as soon as more than one or two entities are needed, the batch endpoint is more efficient.
Before you follow the example, be aware of the following points about the optimal batch size:
The optimal size depends on your specific server and on the number of UDFs / custom fields or other data attached to the object being retrieved.
The optimal batch size may differ for artifacts, samples, files, and containers. For example, if the optimal size for samples is 500, 10 batches of 500 samples will retrieve the data faster than one batch of 5000.
You must also have a compatible version of the API (v2 r21 or later).
Attached below is a simple Python script that times how long batch retrieves take for an array of batch sizes. Efficiency is measured as the duration of the call divided by the number of links posted.
The script has hard-coded parameters that define the range and increments of batch sizes to test, as well as the number of replications for each size. These parameters are found on line 110 and may not require modification, since they are already set to the following defaults:
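The defaults look something like the following (variable names here are illustrative, not necessarily those used in the attached script; check line 110 for the actual names):

```python
# Range of batch sizes to test: 100, 125, ..., 275.
MIN_SIZE = 100    # smallest batch size tested
MAX_SIZE = 300    # upper bound (exclusive)
STEP = 25         # increment between sizes

# Number of replicate calls per size, to average out noise.
REPLICATES = 3

batch_sizes = range(MIN_SIZE, MAX_SIZE, STEP)
```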
For example, the above parameters will test the following sizes: 100, 125, 150, 175, 200, 225, 250, 275.
The parameters that are specific to your server are entered at the command line.
An example of the full syntax to invoke the script is as follows:
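For example (username, password, and hostname are placeholders; note that the -s value includes "/api/v2" and -t names the entity type to test):

```
python BatchOptimalSizeTest.py -u apiuser -p apipassword -s https://your-server-ip/api/v2 -t artifact
```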
The script tracks how long each batch call takes to complete. It outputs a .txt file with the raw numeric data and reports the batch size that returns the minimum value, which is the most efficient.
Viewing this data in a scatterplot, you can see that the range of optimal batch sizes for the artifacts/batch/retrieve endpoint is about 200 to 300 artifacts. This result is valid for artifacts only; each entity type (e.g., sample, file, or container) should be evaluated separately.
The batch size with the shortest time per artifact is the most efficient.
By default, the LIMS send and receive timeout configuration is 60 seconds. Very large batch calls will not complete if their duration is greater than the timeout configuration. This configuration is located at
BatchOptimalSizeTest.py:
As previously shown in Update UDF/Custom Field Values for a Derived Sample Output, you can update the user-defined fields/custom fields of the derived samples (referred to as analytes in the API) generated by a step. This example uses batch operations to improve the performance of that script.
As of Clarity LIMS v5, the term user-defined field (UDF) has been replaced with custom field in the user interface. However, the API resource is still called UDF.
Master step fields—Configured on master steps. Master step fields only apply to the following:
The master step on which the fields are configured.
The steps derived from those master steps.
Global fields—Configured on entities (e.g., submitted sample, derived sample, measurement). Global fields apply to the entire Clarity LIMS system.
Before you follow the example, make sure that you have the following items:
A global custom field named Library Size that is configured on the Derived Sample object.
A configured Library Prep step that applies Library Size to generated derived samples.
A Library Prep step that has been run and has generated derived samples.
A compatible version of API (v2 r21 or later).
In Clarity LIMS, the Record Details screen displays the information about the derived samples generated by a step. You can view the global fields associated with the derived samples in the Sample table.
The screenshot below shows the Library Size values for the derived samples.
Derived sample information is stored in the API in the analyte resource. Step information is stored in the process resource. Each global field value is stored as a udf.
An analyte resource contains specific derived sample details that are recorded in lab steps. Those details are typically stored in global fields, configured in the LIMS on the Derived Sample object and then associated with the step. When you update the information for a derived sample by updating the analyte API resource, only the global fields that are associated with the step can be updated.
To retrieve the process information, you can perform a GET on the created process URI.
You can now collect all of the output analytes and harvest their URIs.
After you have collected the output analyte URIs, you can retrieve the analytes with a batchGET() operation. The URIs must be unique for the batch operations to succeed.
You can now iterate through the retrieved list of analytes and set each analyte's 'Library Size' UDF to 25.
To update the analytes in the system, call batchPUT(), which attempts a PUT for each node in the list. (Note that each node must be unique.)
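A sketch of the complete flow in Python with requests. The server details and process LIMS ID are placeholders, the field is assumed to be numeric, and the payload conventions match the batch retrieve and batch update sketches earlier in this article:

```python
import requests
from xml.etree import ElementTree

BASE_URI = "https://your-server-ip/api/v2"           # placeholder server
AUTH = ("apiuser", "apipassword")                    # placeholder credentials
UDF = "{http://genologics.com/ri/userdefined}field"  # qualified udf tag

# 1. GET the process and collect the unique output analyte URIs.
process = ElementTree.fromstring(
    requests.get(f"{BASE_URI}/processes/24-1234", auth=AUTH).text)
output_uris = sorted({out.get("uri")
                      for iom in process.findall("input-output-map")
                      for out in iom.findall("output")
                      if out.get("output-type") == "Analyte"})

# 2. Batch retrieve the analytes (every URI in the payload must be unique).
links = ElementTree.Element("{http://genologics.com/ri}links")
for uri in output_uris:
    ElementTree.SubElement(links, "link", uri=uri, rel="artifacts")
details = ElementTree.fromstring(
    requests.post(f"{BASE_URI}/artifacts/batch/retrieve",
                  data=ElementTree.tostring(links), auth=AUTH,
                  headers={"Content-Type": "application/xml"}).text)

# 3. Set each analyte's Library Size UDF to 25, adding the field if absent.
for analyte in details:
    field = next((f for f in analyte.findall(UDF)
                  if f.get("name") == "Library Size"), None)
    if field is None:
        # type="Numeric" assumes Library Size is a numeric custom field.
        field = ElementTree.SubElement(analyte, UDF,
                                       name="Library Size", type="Numeric")
    field.text = "25"

# 4. Batch update: POST the edited details document back to the server.
response = requests.post(f"{BASE_URI}/artifacts/batch/update",
                         data=ElementTree.tostring(details), auth=AUTH,
                         headers={"Content-Type": "application/xml"})
response.raise_for_status()
```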
In the Record Details screen, the Sample table now shows the updated Library Size.
UsingBatchPut.groovy:
| Flag | Value |
|---|---|
| -u | username |
| -p | password |
| -s | hostname, including "/api/v2" |
| -t | entity (either: artifact, sample, file, container) |