Lab Instrument Toolkit

Lab Instrument Toolkit provides a number of scripts that can be used for custom configuration.

The driver_file_generator script (Template File Generator) is a file-generation solution that produces custom template files without requiring scripting or development knowledge or resources.
The addBlankLines script allows for the creation of files that require a line entry for every well in a container, including those wells that are empty.
The convertToExcel script converts separated-value files (e.g., CSV) to Microsoft Excel XLS or XLSX spreadsheet format.
The parseCSV script allows for the data for each well to be parsed into fields on either derived samples or measurement records that map directly to the derived samples being measured.
The parseXmlBySampleName script matches data in a result file to samples in the LIMS, using the measurement record LIMSID.
The PlacementHelper script automates sample placement according to a transfer file produced by a robot. The script covers a one-to-one, many-to-one (pooling), or one-to-many (replicates) mapping of samples for placement.

Template File Generator

Available from: BaseSpace Clarity LIMS v4.2.x

The Template File Generator is a file-generation solution that allows Clarity LIMS admins, such as lab managers, to produce custom template files without requiring scripting or development knowledge or resources.

At run time, the Template File Generator uses a script (driver_file_generator) and the supplied template file to generate a file. This may be a simple file that includes a subset of LIMS data, or a more complex sample sheet file for upload to the sequencing instrument to start a run.

The format of a template file is typically a comma-delimited CSV file. However, the following file formats are also supported: .bak, .groovy, .md5, .tsv, .txt, .xml.

How the Template File Generator Works

In Clarity LIMS:
- An automation is configured and enabled on a step.
- The driver_file_generator script is triggered from the automation command line.
The script uses the template file to generate a file, the contents of which are based on the specifications provided in the template.
The script extracts data from the LIMS via the API, based on tokens defined within the template file.
The script parses the template file and processes the sections, metadata, and tokens it contains. Sections may include header block, header, data, and footer, each of which is enclosed inside tags.

Creating Template Files

Available from: BaseSpace Clarity LIMS v4.2.x

You can create template files that the Template File Generator script (driver_file_generator) will use to generate custom files for use in your lab.

This article provides details on the following:

The parameters used by the script.
The sections of the template file—these define what is output in the generated file.
Sorting logic—options for sorting the data in the generated file.
Rules and constraints to keep in mind when creating templates and generating files.
Examples of how you can use specific tokens and metadata in your template files.

For a complete list of the metadata elements and tokens that you can include in a template file, see Template File Contents.

Script Parameters

Upgrade Note: process vs step URIs

The driver_file_generator script now uses steps instead of processes for fetching information. When a process URI is supplied, the script detects it and automatically switches it to a step URI. (The PROCESS.TECHNICIAN token, which is only available on 'process' in the API, is still supported.)

The behavior of the script has not changed, except that the long-form -processURI parameter must be replaced by -stepURI in configuration. The -i version of this parameter remains supported and now accepts both process and step URI values. If your configuration is using -processURI or --processURI, replace each instance with -i (or -stepURI/--stepURI).

The following table defines the parameters used by the driver_file_generator script.

Command-line example:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar script:driver_file_generator -i {stepURI:v2} -u {username} -p {password} -t /opt/gls/clarity/customextensions/InfiniumHT/driverfiletemplates/NextSeq.csv -o {compoundOutputFileLuid0}.csv -l {compoundOutputFileLuid1}"

Command-line example using -quickAttach and -destLIMSID:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar script:driver_file_generator -i {stepURI:v2} -u {username} -p {password} -t /opt/gls/clarity/customextensions/Robot.csv -quickAttach true -destLIMSID {compoundOutputFileLuid0} -o extended_driver_x384.csv -l {compoundOutputFileLuid2}"

Data Source

The input-output-maps of the step (defined by the -stepURI parameter) are used as the data source for the content of the generated file.

If they are present, input-output-maps with the attribute output-generation-type=PerInput are used. Otherwise, all input-output-map items are used.
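
The selection rule above can be modelled in a few lines of Python. This is only a sketch of the described behavior, not the script's own code: the dictionary layout and the function name select_data_source are assumptions made for illustration, and the LIMS IDs are invented.

```python
def select_data_source(io_maps):
    """Return the PerInput maps if any exist; otherwise return every map."""
    per_input = [m for m in io_maps
                 if m.get("output-generation-type") == "PerInput"]
    return per_input if per_input else list(io_maps)


# Hypothetical input-output-map entries (IDs are made up for the example).
maps = [
    {"input": "ADM1A1PA1", "output": "2-101",
     "output-generation-type": "PerInput"},
    {"input": "ADM1A1PA1", "output": "92-102",
     "output-generation-type": "PerAllInputs"},
]

selected = select_data_source(maps)        # only the PerInput entry
fallback = select_data_source(maps[1:])    # no PerInput entry: all are kept
```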

By default, the data source entries are sorted alphanumerically by LIMS ID. You can modify the sort order by using the SORT.BY and SORT.VERTICAL metadata elements (see Metadata section of the Template File Contents article).

Template Sections

The content of the generated file is determined by the sections defined in the template. Content for each section is contained within xml-like opening and closing tags that are structured as follows:

<SECTION>
    section content
</SECTION>

Most template files follow the same basic structure and include some or all the following sections (by convention, section names are written in capital letters, but this is not required):

<HEADER_BLOCK>
<HEADER>
<DATA>
<FOOTER>

The order of the section blocks in the template does not affect the output. In the output file, blocks will always be in the order shown.

The area outside of the sections can contain metadata elements (see Metadata section of the Template File Contents article). Anything else outside of the section tags is ignored.
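
To make the section handling described above concrete, here is a minimal Python sketch. It is an illustrative model only: the function names split_sections and ordered_sections are hypothetical, and the real script also processes metadata, tokens, and the formatting sections, which this sketch leaves out.

```python
import re

SECTION_ORDER = ["HEADER_BLOCK", "HEADER", "DATA", "FOOTER"]
TAG = re.compile(r"<(/?)([A-Za-z_]+)>")


def split_sections(template_text):
    """Collect the lines between <NAME> and </NAME>, keyed by upper-cased NAME."""
    sections, current, buffer = {}, None, []
    for line in template_text.splitlines():
        tag = TAG.fullmatch(line.strip())
        name = tag.group(2).upper() if tag else None
        is_closing = bool(tag and tag.group(1))
        if current is None:
            if tag and not is_closing:
                current, buffer = name, []
            # Anything else outside a section is not collected here.
        elif tag and is_closing and name == current:
            sections[current] = buffer
            current = None
        else:
            buffer.append(line)
    # A section that was opened but never closed is never stored.
    return sections


def ordered_sections(sections):
    """Return (name, lines) pairs in the fixed output order."""
    return [(n, sections[n]) for n in SECTION_ORDER if n in sections]


template = """OUTPUT.SEPARATOR,COMMA
<FOOTER>
end of file
</FOOTER>
<DATA>
${INPUT.NAME},${INPUT.CONTAINER.PLACEMENT}
</DATA>
<HEADER>
Sample ID, Placement
</HEADER>
"""

ordered = ordered_sections(split_sections(template))
names = [name for name, _ in ordered]
```

Although the template lists FOOTER first, names comes out as HEADER, DATA, FOOTER, which mirrors the fixed ordering of the generated file.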

The <PLACEMENT> and <TOKEN_FORMAT> sections are not part of this list and do not create distinct sections in the generated file. Instead, they alter the formatting of the generated output.

HEADER_BLOCK

The header block section may include both plain text and data from the LIMS. It consists of information that does not appear multiple times in the generated file; that is, the information is not included in the data rows (see DATA section).

Tokens in the header block always resolve in the context of the first input and first output available. For example, suppose the INPUT.CONTAINER.TYPE token is used in the header block:

If there is only one type of input container present in the data source, that container type will be present in the output file.
If multiple input container types are present in the data source, only the first one encountered while processing the data will be present in the output file.

For this reason, we recommend against using tokens that will resolve to different values for different samples, such as SAMPLE.NAME. If one of these tokens is encountered, a warning is logged and the first value retrieved from the API is used. (Note that you may use .ALL tokens, where available.)

To include a header block section in a template, enclose it within the <HEADER_BLOCK> and </HEADER_BLOCK> tags.

HIDE feature: If one of the tokens of a line is empty and is part of a HIDE statement, that line will be removed entirely. See Using HIDE to Exclude Empty Columns and Using HIDE to Exclude Empty HEADER rows examples.

HEADER

The header section describes the header line of the data section (see DATA section). A simple example might be "Sample ID, Placement".

The content of this section can only include plain text and is output as is. Tokens are not supported.

To include a header section in a template, enclose it within the <HEADER> and </HEADER> tags.

HIDE feature: See 'Hide feature' in DATA section. Also note:

If multiple <HEADER> lines are present, at least one must have the same number of columns as the <DATA> template line.
<HEADER> lines that do not match the number of columns are unaffected by the HIDE feature.

DATA

Each data source entry creates a data row for each template line in the section. All entries are output for the first template line, then the next template line runs, and so on.

The data section allows tokens and text entries. All tokens are supported.

Note the following:

Duplicated rows are eliminated, if present. A row is considered duplicated if its content (after all variables and placeholders have been replaced with their corresponding values) is identical to a previous row. Tokens must therefore provide distinctive enough data (ie, something more than just CONTAINER.NAME) if all of the input-output entry pairs are desired in the generated file.
By default, the script processes only sample entries. However, there are metadata options that allow inclusion of result files/measurements and exclusion of samples.
Metadata sorting options are applied to this section of the template file only.
By default, pooled artifacts are treated as a single input artifact. They can be demultiplexed using the PROCESS.POOLED.ARTIFACTS metadata element.
If there is at least one token relevant to the step inputs or outputs, this section will produce a row for each PerInput entry in the step input-output-map. If no PerInput entries are present in the step input-output-map, the script will attempt to add data rows for PerAllInputs entries.
Input and output artifacts are always loaded if a <DATA> section is present in the template file, due to the need to determine what type of artifacts the script is dealing with.
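
The duplicate-row rule in the notes above is easy to trip over, so a small Python model may help. It is a sketch under simplifying assumptions: render_rows is a hypothetical helper, tokens are substituted by plain string replacement, and the entries are invented.

```python
def render_rows(template_line, entries):
    """Substitute ${TOKEN} placeholders per entry, dropping repeated rows."""
    seen = set()
    rows = []
    for entry in entries:
        row = template_line
        for token, value in entry.items():
            row = row.replace("${" + token + "}", value)
        if row not in seen:          # identical rendered rows are kept once
            seen.add(row)
            rows.append(row)
    return rows


# Two samples in the same plate (hypothetical data).
entries = [
    {"INPUT.CONTAINER.NAME": "Plate1", "INPUT.CONTAINER.PLACEMENT": "A:1"},
    {"INPUT.CONTAINER.NAME": "Plate1", "INPUT.CONTAINER.PLACEMENT": "B:1"},
]

only_container = render_rows("${INPUT.CONTAINER.NAME}", entries)
with_placement = render_rows(
    "${INPUT.CONTAINER.NAME},${INPUT.CONTAINER.PLACEMENT}", entries)
```

With the container name alone, both entries render to the same text and collapse into one row; adding the placement token keeps both.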

To include a data section in a template, enclose it within the <DATA> and </DATA> tags.

HIDE feature: If the token in a given column is empty for all lines and that token is part of a HIDE statement, that column (including the matching <HEADER> columns) will be removed entirely. There can only be one <DATA> template line present when using the HIDE feature. See Using HIDE to Exclude Empty Columns and Using HIDE to Exclude Empty HEADER rows examples.

FOOTER

The content of this section can only include plain text and is output as is. Tokens are not supported.

To include a footer section in a template, enclose it within the <FOOTER> and </FOOTER> tags.

PLACEMENT

This section contains groovy code that controls the formatting of PLACEMENT tokens (see the PLACEMENT tokens in Template File Contents article Tokens table).

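Within the groovy code, the following variables are available (as listed in the comments of the examples in this section):

row (String): the row part of the placement, for example "A"
column (String): the column part of the placement, for example "1"
containerTypeNode (Node): the container type of the container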

Note the following:

The script must return a string, which replaces the corresponding <PLACEMENT> tag in the template.
Logic within the placement tags can be as complex as needed, provided it can be compiled by a groovy compiler.
If an error occurs while running formatting code, the original location value is used.

To include a placement section in a template, enclose it within the <PLACEMENT> and </PLACEMENT> tags.

Placement Example: Container Type

In the following example:

If the container type is a 96 well plate, sample placement A1 will return as "A_1"
If the container type is not a 96 well plate, sample placement A1 will return as "A:1"

<PLACEMENT>
// The inputs to this segment are: String row, String column, Node containerTypeNode
if (containerTypeNode.@name == "96 well plate") return row + "_" + column
else return row + ":" + column
</PLACEMENT>

Placement Example: Zero Padding

<PLACEMENT>
// The inputs to this segment are: String row, String column, Node containerTypeNode
String zeroPad (String entry) {
if (entry.isNumber() && entry.size() == 1) return "0" + entry
return entry
}
return zeroPad(row) + ":" + zeroPad(column)
</PLACEMENT>

TOKEN FORMAT

This section defines logic to be applied to specific tokens to change the format in which they appear in the generated file.

Special formatting rules can be defined per token using the following groovy syntax:

${token.identifier}
…groovy code…
// or 
${token.identifier##Name}
…groovy code…

Within the groovy code, the variable 'token' refers to the original value being transformed by the formatting code. The logic replaces all instances of that token with the result.

${token.identifier} marks the beginning of the token formatting code and the end of the previous token formatting code (if any).

You can define multiple formatting logic rules for a given token, by assigning a name to the formatting section (named formatters are called 'variations'). This is done by appending “##” after the token name (eg “${token.identifier##formatterName}”).
Using the named formatter syntax without giving a name (“${token.identifier##}”) will abort the file generation.
If an error occurs while running formatting code, the resulting value will be blank.
If a named formatter is used but not defined, the value is used as is.

To include a token format section in a template, enclose it within the <TOKEN_FORMAT> and </TOKEN_FORMAT> tags.
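
The fallback rules for formatters described above (missing name, formatting error, undefined variation) can be summarized with a short Python model. This is a sketch only: the real formatters are groovy snippets inside the template, and apply_format and the formatters dictionary are hypothetical names used for illustration.

```python
def apply_format(reference, value, formatters):
    """Resolve 'TOKEN' or 'TOKEN##Name' against a dict of formatter callables."""
    if reference.endswith("##"):
        # Named-formatter syntax without a name aborts file generation.
        raise ValueError("formatter name missing in " + reference)
    formatter = formatters.get(reference)
    if formatter is None:
        return value      # formatter not defined: the value is used as is
    try:
        return formatter(value)
    except Exception:
        return ""         # error while formatting: the value becomes blank


# Two hypothetical variations of the same token.
formatters = {
    "PROCESS.TECHNICIAN##FirstName": lambda token: token.split(" ")[0],
    "PROCESS.TECHNICIAN##LastName": lambda token: token.split(" ")[1],
}

first = apply_format("PROCESS.TECHNICIAN##FirstName", "Ada Lovelace", formatters)
blank = apply_format("PROCESS.TECHNICIAN##LastName", "Ada", formatters)
as_is = apply_format("PROCESS.TECHNICIAN##Initials", "Ada Lovelace", formatters)
```

The second call shows why formatting code that indexes into a split name can yield a blank value when the technician name has only one part.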

TOKEN FORMAT Example: Technician Name

In this example, a custom format is defined for displaying the name of the technician who ran a process (step).

The name of the token appears at the beginning of the groovy code that will then be applied. In this code, the variable 'token' refers to the token being affected. The return value is what will replace all instances of this token in the file.

<TOKEN_FORMAT>
${PROCESS.TECHNICIAN}
def name = token.split(" ")
return "First name: " + name[0] + ", Last name: " + name[1]
</TOKEN_FORMAT>

TOKEN FORMAT Example: Appending a String to Container Name or Sample Name

In this second example, special formatting is required for two tokens, so the logic for both appears inside the same set of tags.

The example appends a string to the end of the input container name and adds a prefix to the beginning of the submitted sample name.

<TOKEN_FORMAT>
${INPUT.CONTAINER.NAME}
return token + "-PlateName"
${SAMPLE.NAME}
return "SN-" + token
</TOKEN_FORMAT>

Metadata

Metadata provides information about the template file that is not retrieved from the API — such as the file output directory to use, and how the data contents should be grouped and sorted.

Metadata is not strictly confined to a section, and is not designated by opening and closing tags. However, each metadata entry must be on a separate line.

Metadata entries can be anywhere in the template, but the recommended best practice is to group them either at the top or the bottom of the file.

For a list of supported metadata elements, rules for using them, and examples, see Template File Contents, Metadata section.

Sorting Logic

Sorting in the generated file is done either alphanumerically or by vertical placement information, using the SORT.BY. and SORT.VERTICAL metadata elements.

Sorting must be done using a combination of sort keys - provided to SORT.BY. as one or more ${token} values, each of which always produces a unique value in the file. For example, sorting by just OUTPUT.CONTAINER.NAME would work for samples placed in tubes, but would not work for samples in 96 well plates. Sorting behavior on nonunique combinations is not guaranteed to be predictable.

To sort vertically:

Include the SORT.VERTICAL metadata element in the template file. In addition, the SORT.BY.${token}, ${token} metadata must also be included, as follows:

SORT.BY.${OUTPUT.CONTAINER.ROW}${OUTPUT.CONTAINER.COLUMN}

Any SORT.BY. tokens will be sorted using the vertical sorter instead of the alphanumeric sort.

To apply sorting to samples in 96 well plates:

You could narrow the sort key to a unique combination such as:

SORT.BY.${OUTPUT.CONTAINER.NAME}${OUTPUT.CONTAINER.ROW}${OUTPUT.CONTAINER.COLUMN}

See also SORT.VERTICAL and SORT.BY. in the Template File Contents article.
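
Because sorting on a nonunique key is not predictable, it can be worth checking a candidate key against representative data before relying on it. The following Python sketch shows one way to do that; is_unique_key is a hypothetical helper and the rows are invented, so treat it as an illustration of the uniqueness requirement rather than part of the script.

```python
def is_unique_key(rows, key_tokens):
    """Return True if the combined token values differ for every row."""
    keys = [tuple(row[token] for token in key_tokens) for row in rows]
    return len(set(keys)) == len(keys)


# Two samples in the same 96 well plate (hypothetical data).
rows = [
    {"OUTPUT.CONTAINER.NAME": "Plate1",
     "OUTPUT.CONTAINER.ROW": "A", "OUTPUT.CONTAINER.COLUMN": "1"},
    {"OUTPUT.CONTAINER.NAME": "Plate1",
     "OUTPUT.CONTAINER.ROW": "B", "OUTPUT.CONTAINER.COLUMN": "1"},
]

name_only = is_unique_key(rows, ["OUTPUT.CONTAINER.NAME"])
name_row_column = is_unique_key(rows, [
    "OUTPUT.CONTAINER.NAME",
    "OUTPUT.CONTAINER.ROW",
    "OUTPUT.CONTAINER.COLUMN",
])
```

Here name_only is False, because both samples share the plate name, while the three-token key is unique.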

Rules and Constraints

The template must adhere to the following rules:

Metadata entries must each appear on a new line and be the only entry on that line.
Metadata entries must not appear inside tags.
Opening and closing section tags must appear on a new line and as the only entry on that line.
Each opened tag must be closed, otherwise it is skipped by the script.
Any sections (opening tag + closing tag combination) can be omitted from the template file.
Entries that are separated by commas in the template will be delimited by the metadata-specified separator (default: COMMA) in the generated file.
White space is allowed in the template. However, if there is a blank line inside a tag, it will also be present in the generated file.
If an entry in the template is enclosed in double quotes, it will be imported as a single entry and written to the generated file as such, even if it has commas inside.
To include double-quotes or single-quotes in the template file, use the escape character: Example: \" or \'
To include an escape character in the template file, use two escape characters inside double-quotes. For example, if you want to see \\Share\Folder\Filename.txt use "\\\\Share\\Folder\\Filename.txt" as the token.

If any of the following conditions is not met, the tag and everything inside it are ignored by the script, and a warning is written to the log file:

Except for the metadata, all template sections must be enclosed inside tags.
Each tag must have its own line, and must be the only tag present on that line.
No other entries, even empty ones, are allowed.
All opened tags must be closed.
Custom field names must not contain periods.

Examples

Illumina Instrument Sample Sheets

The LIMS provides configuration to support generation of sample sheets that are compatible with some Illumina instruments. For details, see the Illumina Instrument Sample Sheets documentation.

Generating Sample Sheets for QC Instruments

The LIMS provides configured automations that generate sample sheets compatible with a number of QC instruments. The default automation command lines are provided below.

Generate Bioanalyzer Driver File Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/bioA_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
&& /opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar -u {username} -p {password} \
script:addBlankLines \
-i {stepURI:v2} \
-f {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
-sep COMMA \
-b ',False,' \
-h 1 \
-c LIMSID \
-pre 'Sample '"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
<HEADER_BLOCK>
</HEADER_BLOCK>
<HEADER>
"\"Sample Name\",\"Sample Comment\",\"Rest. Digest\",\"Observation\""
</HEADER>
<DATA>
${OUTPUT.LIMSID},,False,
</DATA>
<FOOTER>
Ladder,,False,"\"Chip Lot #\",\"Reagent Kit Lot #\",
\"QC1 Min [%]\",\"QC1 Max [%]\",\"QC2 Min [%]\",\"QC2 Max [%]\"
,,,
\"Chip Comment\"
"
</FOOTER>

Generate NanoDrop Driver File Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/nd_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
<HEADER_BLOCK>
</HEADER_BLOCK>
<HEADER>
"Well Location, Sample Name"
</HEADER>
<DATA>
${INPUT.CONTAINER.PLACEMENT},${INPUT.CONTAINER.NAME}_${INPUT.CONTAINER.PLACEMENT}_${INPUT.NAME}
</DATA>

Generate Tapestation Input Sample Table CSV Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/tapestation_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
SORT.BY.${INPUT.LIMSID}
<DATA>
${OUTPUT.LIMSID}_${INPUT.NAME}
</DATA>

Create GenomeStudio Driver File Automation

Command line:

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar driver_file_generator \
-i {processURI} -u {username} -p {password} -t /opt/gls/clarity/extensions/conf/driverfiletemplates/GenomeStudioGeneExpressionTemplate.csv \
-o {compoundOutputFileLuid0}.csv -l {compoundOutputFileLuid1}.html"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
OUTPUT.SEPARATOR,COMMA
LIST.SEPARATOR,";"
ILLEGAL.CHARACTERS,COMMA
ILLEGAL.CHARACTER.REPLACEMENTS,_
SORT.BY.${INPUT.CONTAINER.NAME}${INPUT.CONTAINER.ROW}${INPUT.CONTAINER.COLUMN}
<HEADER_BLOCK>
[HEADER]
Investigator Name, ${PROCESS.TECHNICIAN}
Project Name, ${SAMPLE.PROJECT.NAME.ALL}
Experiment Name
Date, ${DATE}
[Manifests]
${PROCESS.UDF.Manifest A}
</HEADER_BLOCK>
<HEADER>
[DATA]
Sample_Name,Sample_Well,Sample_Plate,Pool_ID,Sentrix_ID,Sentrix_Position
</HEADER>
<DATA>
${INPUT.NAME},,,,${INPUT.CONTAINER.NAME},${INPUT.CONTAINER.PLACEMENT}
</DATA>
<PLACEMENT>
// inputs to this section are String row, String column, Node containerTypeNode
int convertAlphaToNumeric(String letters) {
    int result = 0
    letters = letters.toUpperCase()
    for (int i = 0; i < letters.length(); i++) {
        result += (letters.charAt(i).minus('A' as char) + 1) * (26 ** (letters.length() - i - 1))
    }
    return result
}
int SENTRIX_POS_THRESHOLD = 12
int WELL_PLATE_SIZE_96 = 96
int xSize = containerTypeNode.'x-dimension'.size.text().toInteger()
int ySize = containerTypeNode.'y-dimension'.size.text().toInteger()
int containerSize = xSize * ySize
boolean xIsAlpha = containerTypeNode.'x-dimension'.'is-alpha'.text().toBoolean()
boolean yIsAlpha = containerTypeNode.'y-dimension'.'is-alpha'.text().toBoolean()
if (containerSize <= SENTRIX_POS_THRESHOLD && (xIsAlpha || yIsAlpha)) {
    return row
}
// R001_C001 for 96 well plate, r01c01 for other container types
if (containerSize == WELL_PLATE_SIZE_96) {
    def numFormat = java.text.NumberFormat.getNumberInstance()
    numFormat.setMinimumIntegerDigits(3)
    String xStr = numFormat.format(column.isInteger() ? column as int : convertAlphaToNumeric(column))
    String yStr = numFormat.format(row.isInteger() ? row as int : convertAlphaToNumeric(row))
    // Row is mapped to x coordinate, while column is mapped to y.
    // When creating an array type of size 96, swap the row and column dimension.
    // e.g 12 x 8 array should be mapped as an 8 x 12 array
    //
    // This mapping has been in RI for a while.
    // In AddIlluminaArraysStep, all 2D Illumina arrays added have a dimension of 8 x 12.
    // This driver file template then converts it back to 12 x 8.
    // This logic is now corrected to follow other arrays to make sure the driver file
    // generated is compatible with existing arrays and software.
    return "R"+xStr+"_C"+yStr
} else {
    def numFormat = java.text.NumberFormat.getNumberInstance()
    numFormat.setMinimumIntegerDigits(2)
    String xStr = numFormat.format(column.isInteger() ? column as int : convertAlphaToNumeric(column))
    String yStr = numFormat.format(row.isInteger() ? row as int : convertAlphaToNumeric(row))
    // row is mapped to y, column is mapped to x
    return "r"+yStr +"c"+xStr
}
</PLACEMENT>
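
The letter-to-number conversion and the 96 well plate branch of the placement code above can be checked by hand with a small Python equivalent. This is a sketch for reasoning about the expected values, not a replacement for the groovy code: the function names are chosen for the example, str.isdigit stands in for groovy's isInteger, and plain zero padding stands in for NumberFormat (which may also apply locale-specific grouping to large numbers).

```python
def convert_alpha_to_numeric(letters):
    """Treat letters as base-26 digits: A=1 ... Z=26, AA=27, and so on."""
    result = 0
    for ch in letters.upper():
        result = result * 26 + (ord(ch) - ord("A") + 1)
    return result


def sentrix_position_96(row, column):
    """Mirror the 96 well plate branch: column feeds R, row feeds C."""
    x = int(column) if column.isdigit() else convert_alpha_to_numeric(column)
    y = int(row) if row.isdigit() else convert_alpha_to_numeric(row)
    return "R%03d_C%03d" % (x, y)


single = convert_alpha_to_numeric("Z")
double = convert_alpha_to_numeric("AA")
position = sentrix_position_96("B", "12")
```

For a sample in row B, column 12, the sketch yields R012_C002, which reflects the row and column swap explained in the code comments.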

Renaming Generated Files

In the template file, the following OUTPUT.FILE.NAME metadata element names the generated file 'NewTemplateFileName.csv':

OUTPUT.FILE.NAME, NewTemplateFileName.csv

In the automation command line, the following will attach the generated file to the {compoundOutputFileLuid0} placeholder, with the name defined by the OUTPUT.FILE.NAME metadata element.

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar \
script:driver_file_generator \
-i {stepURI:v2} \
-u {username} \
-p {password} \
-t /opt/gls/clarity/customextensions/Robot.csv \
-q true -destLIMSID {compoundOutputFileLuid0} \
-o extended_driver_x384.csv \
-l {compoundOutputFileLuid2}"

If the quickAttach parameter is provided without the destLIMSID parameter, the script logs an error and stops execution.
If destLIMSID is provided without using quickAttach, it is ignored.

Using Token Values In File Names

The OUTPUT.FILE.NAME and OUTPUT.TARGET.DIR metadata elements support token values. This allows you to name files based on input / output values of the step - the input or output container name, for example.

The following tokens are supported for this feature:

PROCESS.LIMSID
PROCESS.UDF.<UDF NAME>
PROCESS.TECHNICIAN
DATE
INPUT.CONTAINER.NAME
INPUT.CONTAINER.TYPE
INPUT.CONTAINER.LIMSID
OUTPUT.CONTAINER.NAME
OUTPUT.CONTAINER.TYPE
OUTPUT.CONTAINER.LIMSID

Rules and Constraints

When using token values in file names, the following rules and constraints apply:

Container-related functions will return the value from a single container, even if there are multiple containers.
Other tokens will function, but will only return the value for the first row of the file (first input or output).
If the OUTPUT.FILE.NAME specified does not match the LIMS ID of the file, the output file will not be attached in the LIMS user interface. To ensure that the file is attached, include the quickAttach and destLIMSID parameters in the command-line string.
It is highly recommended that you do not use SAMPLE.PROJECT.NAME.ALL or SAMPLE.PROJECT.CONTACT.ALL, because the result is prone to surpassing the maximum length of a file name. There are similar issues with other SAMPLE tokens when dealing with pools.
Only the following characters are supported in the file name. Any other characters will be replaced by an _ (underscore) by default. This replacement character can be configured with the OUTPUT.FILE.NAME.ILLEGAL.CHARACTER.REPLACEMENT metadata element.
- a-z
- A-Z
- 0–9
- _ (underscore)
- - (dash)
- . (period)
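
The character rule above amounts to a single substitution. The following Python sketch models it for the name portion of a file only; sanitize_file_name is a hypothetical helper, and the replacement argument plays the role of the OUTPUT.FILE.NAME.ILLEGAL.CHARACTER.REPLACEMENT metadata element.

```python
import re


def sanitize_file_name(name, replacement="_"):
    """Replace every character other than a-z, A-Z, 0-9, _, - and . ."""
    return re.sub(r"[^A-Za-z0-9_.\-]", replacement, name)


default_result = sanitize_file_name("Plate 1/run#2.csv")
custom_result = sanitize_file_name("Plate 1.csv", replacement="-")
```

A container name such as "Plate 1" therefore ends up as Plate_1 in the file name unless a different replacement character is configured.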

Defining a Project Name for Control Samples

You can use the CONTROL.SAMPLE.DEFAULT.PROJECT.NAME metadata element to define a project name for control samples. The value specified by this token will be used when determining one or more values for the SAMPLE.PROJECT.NAME and SAMPLE.PROJECT.NAME.ALL tokens.

Example:

CONTROL.SAMPLE.DEFAULT.PROJECT.NAME, My Control Sample Project

Rules and Constraints

If the token is found in the template, but with no value then no project name will be given for control samples.
If the token is not found in the template, then no project name will be given for control samples.
If multiple values are provided, the first one will be used.
The SAMPLE.PROJECT.NAME.ALL list will include the control project name.

Using HIDE to Exclude Empty Columns

You can use the HIDE metadata element to optionally hide a column if it contains no data. The following lines in the metadata will hide a data column when empty:

HIDE, ${OUTPUT.UDF.SAMPLEUDF}, IF, NODATA

Assuming ${OUTPUT.UDF.SAMPLEUDF} is one of the data columns specified in the template, then that column will be hidden whenever there is no data to show in the output file. If a list of fields is provided, then any empty ones will be hidden:

HIDE, ${OUTPUT.UDF.SAMPLEUDF},${PROCESS.TECHNICIAN}, ${PROCESS.LIMSID}, IF, NODATA

You may also hide only one representation of a specific column or field:

HIDE, ${PROCESS.TECHNICIAN##FirstName}, IF, NODATA
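
The column-hiding behavior described in this section can be pictured as follows. The Python sketch is a simplified model with assumed names: hide_empty_columns is hypothetical, columns are addressed by index instead of by token, and a value counts as empty only when it is an empty string.

```python
def hide_empty_columns(header, rows, hideable):
    """Drop hideable columns (by index) that are empty in every data row."""
    drop = {i for i in hideable if all(row[i] == "" for row in rows)}
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


header = ["Sample", "Concentration", "Technician"]
rows = [
    ["S1", "", "Ann"],
    ["S2", "", "Ann"],
]

# Columns 1 and 2 are named in a HIDE statement; only column 1 is empty.
new_header, new_rows = hide_empty_columns(header, rows, hideable={1, 2})
```

The matching header column is removed together with the data column, while a hideable column that holds at least one value stays in place.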

Using HIDE to Exclude Empty HEADER rows

You can also use the HIDE metadata element with tokens in the header section. If one or more tokens are used for a header key value pair, and there are no values for any of the tokens, the entire row will be hidden.

Assuming ${OUTPUT.UDF.SAMPLEUDF} is one of the rows specified in the template header section, that header row will be hidden whenever there is no data to display in the output file.

If a list of tokens is provided for the value, the row will only be shown if one or more of the tokens resolves to a value:

HIDE, ${OUTPUT.UDF.SAMPLEUDF},${PROCESS.TECHNICIAN}, ${PROCESS.LIMSID}, IF, NODATA

Generating Multiple Files

If you would like to generate multiple files, you can use the following GROUP.FILES.BY metadata elements:

GROUP.FILES.BY.INPUT.CONTAINERS
GROUP.FILES.BY.OUTPUT.CONTAINERS

These elements allow a file to be created per instance of the specified element in the step, for example, one file per input or per output container. Step level information appears in all files, but sample information is specific to the samples in the given container.

For example, suppose that a step has two samples - each in their own container - with a template file calling for information about process UDFs and sample names. Using this metadata will produce two files, each of which will contain:

One sample entry
The same process UDF information

As a best practice, we recommend storing a copy of generated files in the LIMS. To do this, you must use the quickAttach script parameter. This parameter must be used with the destLIMSID parameter, which tells the Template File Generator script which file placeholder to use. (For details, see Script Parameters.)

Naming The Files

When generating multiple files, the script gathers them all into one zip file so only one file placeholder is needed regardless of how many containers are in the step.

The zip file name may be provided in the metadata as follows:

GROUP.FILES.BY.INPUT.CONTAINERS,<zip file name>

GROUP.FILES.BY.OUTPUT.CONTAINERS,<zip file name>

Inside the zip file, include any paths specified for where files should be written. An example final structure inside the zip, where the subfolders are specified using the container name token, could be as follows:

GROUP.FILES.BY.INPUT.CONTAINERS,MyZip.zip
MyZip.zip
      \-- Container1
              \-- SampleSheet.csv
      \-- Container2
              \-- SampleSheet.csv

The file naming, writing, and uploading process works as follows:

The outputPath parameter element is required for the script. You can use this parameter to specify the path to which the generated files will be written and/or the name to use for the file. Use this in the following scenarios:
- When the target path/name is constant OR
- When the target path/name includes something that can only be passed to the script via the command line - for example, if you want to include the value of a {compoundOutputFileLuidN} in the path.
The OUTPUT.TARGET.DIR metadata element overrides any path provided by outputPath, but does not change the file name. Use this:
- When the target path includes something that can only be accessed with token templates - for example, the name of the user who ran the step.
The OUTPUT.FILE.NAME metadata element overrides any value provided by outputPath entirely. This token determines the name of the files that are produced for each container - for example, SampleSheet.csv. It may also contain tokens to access information, such as the container name, and it may also contain a path.

If you provide all three of outputPath, OUTPUT.TARGET.DIR, and OUTPUT.FILE.NAME, the result is that outputPath is ignored and the path specified by OUTPUT.TARGET.DIR is used as the parent under which OUTPUT.FILE.NAME is created, even if OUTPUT.FILE.NAME includes a path in addition to the file name.
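
The precedence between the three settings described above can be written out as a small decision function. This Python sketch is one reading of those rules, with assumptions stated plainly: resolve_output is a hypothetical name, paths use forward slashes, and a leading slash on OUTPUT.FILE.NAME is dropped when a target directory is given so that the name always lands under that directory.

```python
import posixpath


def resolve_output(output_path, target_dir=None, file_name=None):
    """Model how outputPath, OUTPUT.TARGET.DIR and OUTPUT.FILE.NAME combine."""
    if target_dir is not None and file_name is not None:
        # outputPath is ignored; the target dir is the parent of the file name.
        return posixpath.join(target_dir, file_name.lstrip("/"))
    if file_name is not None:
        # OUTPUT.FILE.NAME overrides outputPath entirely.
        return file_name
    if target_dir is not None:
        # OUTPUT.TARGET.DIR overrides the path but keeps the file name.
        return posixpath.join(target_dir, posixpath.basename(output_path))
    return output_path


path_only = resolve_output("/tmp/out/SampleSheet.csv")
with_dir = resolve_output("/tmp/out/SampleSheet.csv", target_dir="Plate1")
with_name = resolve_output("/tmp/out/x.csv", file_name="sub/Sheet.csv")
all_three = resolve_output("/tmp/out/x.csv", "Plate1", "sub/Sheet.csv")
```

The last call yields Plate1/sub/Sheet.csv: the outputPath value plays no part, and the path inside the file name is created under the target directory.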

If you wish to only attach files to placeholders in the LIMS and do not wish to also write anything to disk, then omit OUTPUT.TARGET.DIR and provide the outputPath parameter value as ".". This will cause files to only be written to the temporary directory that is cleaned up after the automation completes.

To produce the example of MyZip.zip, you could use the following:

Script parameters:

-outputPath SampleSheet.csv
-q 'true'
-destLIMSID {compoundOutputFileLuid0}

Template:

GROUP.FILES.BY.OUTPUT.CONTAINERS,MyZip.zip
OUTPUT.TARGET.DIR,${OUTPUT.CONTAINER.NAME}

Rules and Constraints

You can only use one GROUP.FILES.BY metadata element in each template file.
To attach the files in the LIMS as a zip file, you must provide the quickAttach parameter along with the destLIMSID.
The zip file name may optionally be specified with the GROUP.FILES.BY metadata.
If quickAttach is used and no zip name is specified in the template, the zip will be named using the destLIMSID parameter value.
The zip file name, file paths, and file names should not contain characters that are illegal for directories and files on the target operating system. Illegal characters will be replaced with underscores.
If a file name is not unique to the target directory, e.g., if multiple SampleSheet.csv files are being written to /my/target/path, an error will be thrown and no files written.
When specifying the OUTPUT.TARGET.DIR metadata element, if a token is used that may resolve to multiple values for a single path (for example, using INPUT.NAME in the path when it will resolve to multiple sample names), one value will be chosen arbitrarily for the path. For example, you may end up with /Container1/Sample1/myfile.csv when there are two samples in the container.
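Two of the rules above, replacing illegal characters with underscores and rejecting duplicate target files before anything is written, can be illustrated as follows. This is a sketch under assumptions: the set of illegal characters depends on the target operating system, so the pattern below is a hypothetical conservative choice, and the function names are illustrative only.

```python
import re

# Assumption: characters rejected by common file systems. The real set
# depends on the target operating system.
ILLEGAL_CHARACTERS = re.compile(r'[<>:"|?*\\]')


def sanitize_name(name):
    """Replace each illegal character in a file or zip name with "_"."""
    return ILLEGAL_CHARACTERS.sub("_", name)


def check_unique_targets(paths):
    """Raise before any file is written if two files share a target path."""
    seen = set()
    for path in paths:
        if path in seen:
            raise ValueError(f"Duplicate target file: {path}")
        seen.add(path)
    return list(paths)


print(sanitize_name("My:Zip?.zip"))
check_unique_targets(["Container1/SampleSheet.csv", "Container2/SampleSheet.csv"])
```

Checking all targets up front mirrors the documented behavior that an error is thrown and no files are written when names collide.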

Template File Contents

This article describes the metadata, tokens, and special characters that you can include in your custom template files for use with the Template File Generator.

Available from: BaseSpace Clarity LIMS v5.1.x

Metadata

The following table lists and describes the metadata elements that you can include in your template files.

Unless otherwise specified, metadata elements are optional. In some cases, a metadata element must be used in conjunction with another element. For example, ILLEGAL.CHARACTERS must be used with ILLEGAL.CHARACTER.REPLACEMENTS.
Unless otherwise specified, metadata elements can appear multiple times in the template. However, if they are paired with values, only the first occurrence is used. The other lines are silently ignored.
Unless otherwise specified, if a metadata element requires a single value, any additional values are ignored when the file is generated. For example, suppose you include the OUTPUT.TARGET.DIR <path> metadata in your template file and provide more than one value for <path>. The script will process only the first (valid) path value and will ignore all other values.
Unless "metadata syntax must match exactly" is specified, metadata elements are detected and used even if there is text appended before or after them. For example the following expressions are equivalent:

For more information on metadata and how to use metadata elements in your template files, see in article.

Tokens

A token is a placeholder variable that is replaced with unique data at run time. You can include tokens in automation command lines, in scripts, and in template files.

For example, suppose you include the INPUT.CONTAINER.NAME token in a template file generated by a step. At run time, this token is replaced with the name of the container that was input to the step.

All tokens included in a template file must appear in the following form: ${TOKEN}, for example - ${INPUT.CONTAINER.NAME}.
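To make the ${TOKEN} form concrete, the following sketch shows one way such placeholders could be substituted in a line of text. It is not the Template File Generator's implementation: the function name is hypothetical, the values are supplied as a plain dictionary, and leaving unknown tokens untouched is an assumption.

```python
import re

# Matches ${TOKEN} and captures the token name between the braces.
TOKEN_PATTERN = re.compile(r"\$\{([^}]+)\}")


def expand_tokens(text, values):
    """Replace each ${TOKEN} in text with its value from the values dict."""
    def replace(match):
        name = match.group(1)
        # Assumption: a token with no known value is left as written.
        return values.get(name, match.group(0))

    return TOKEN_PATTERN.sub(replace, text)


line = "${INPUT.CONTAINER.NAME}/SampleSheet.csv"
print(expand_tokens(line, {"INPUT.CONTAINER.NAME": "Plate123"}))
```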

Input and Output Tokens

INCLUDE.INPUT.RESULTFILES
INCLUDE.OUTPUT.RESULTFILES

Process Tokens

Submitted Sample Tokens

Other Tokens

Special Characters

CSV and template file generation special characters have substitution symbols within templates.

Template File Generator Troubleshooting

Available from: BaseSpace Clarity LIMS v5.1.x

When the Template File Generator generates the file, it creates a log file and attaches it to the step in the LIMS.

If the file generation process encounters errors, these error conditions appear in the log file.

If the file generation process completes successfully, the log file contents resemble the following example:

When troubleshooting Template File Generator issues, you will find detailed information in the Automation Worker log files.

Add Blank Lines

Available from: BaseSpace Clarity LIMS v3.0.0

Some instruments, such as Bioanalyzer and the ABI 7900HT, require input files that include a line for every well in a container.

Currently, files created with the sample input sheet generator can only contain data lines for each well that is in use and cannot add lines for empty wells. This extension script allows you to process a file in a certain format to add customizable lines for the empty container wells.

The Script

The addBlankLines script allows for the creation of files that require a line entry for every well in a container, including those wells that are empty.

To accomplish this, the script takes in an existing file, created with the sample input sheet generator, and processes it to add the new lines.

The script reads the file and separates it into header, data, and footer sections.
Using the API, the script obtains the container size, its unavailable wells, and the placement of each sample.
Using this information, the script runs through each possible placement and builds a full set of data. This includes a line for each empty well, in addition to all original lines of data. If desired, a line for each unavailable well may also be included by using the addUnavailable script parameter (see Parameter Details).
The file is then rewritten with the header, full data, and footer.

The addBlankLines script overwrites the input file provided, replacing it with the new file that includes the additional container line entries.

Placement information is written using a numeric index that is converted from the LIMS well placement. The line information that is inserted for an empty well is provided to the script. This is controlled by the blankLineString parameter (see Parameter Details).

The logic that determines whether the script uses input samples and containers or output samples and containers is as follows:

By default the script will use outputs.
If there are no outputs or if the optional parameter -forceInputs true is configured, the script will use inputs.

Script Parameters and Usage

Parameter Details

columnType

The script uses this parameter to match each row to a sample in the LIMS. The parameter tells the script what data type to expect in the first column of any data row. Supported data types are PLACEMENT and LIMSID. For example:

-c LIMSID

LIMSID: The script reads the LIMS ID in the first column of data, matches it directly to the sample, and uses this to get the placement of that sample through the API. It does not replace this entry in the data, however.
PLACEMENT: The script reads the sample placement in the first column of data, then finds the sample in the container at that location. The script then replaces this placement entry with an entry of the form "{index-prefix}{index}".

forceInputs

This parameter lets you force the script to use input samples and input container, even if output samples and output container are found for the step. If this parameter is not provided, the script uses output samples and output container if available; input samples and container if they are not.

indexPrefix

This parameter provides a way to place a prefix directly before the index on blank lines. If the column type is set to PLACEMENT, the indexPrefix will also be printed before the index when replacing the well placement in the data.

blankLineString

This parameter defines what is placed after the index in a blank line for an empty well. A separator character is automatically placed between the index and the blankLineString.

Any "\t" found in the blankLineString is replaced by a tab character in the output. Otherwise, the blankLineString is left unchanged.

separator

The script supports tab and comma separators in the output file. These can be provided with this parameter in the form “TAB” and “COMMA”. The same separator is used to interpret the original file and produce the updated file.

countUnavailable

If provided, a container's unavailable wells will be included when determining the placement index of a well.

For example, imagine a 2x2 container with wells A:1, A:2, B:1, and B:2, where wells B:1 and B:2 are marked as unavailable. By default, the script will calculate well A:2 as well index 2, excluding the unavailable wells from the index. When the countUnavailable parameter is provided, A:2 is instead calculated as well index 3.

addUnavailable

If provided, blank lines will be added to the file for unavailable wells in addition to blank lines for unoccupied wells.

Index

The index is the well number when the wells are counted from left to right, from top to bottom. By default, unavailable wells are not counted in this index. However, if the script is configured with the countUnavailable parameter, they will be included in the index count.
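The effect of countUnavailable on the index can be sketched as below. This is an illustration only: the function name is hypothetical, and the wells are passed in already sorted in the script's traversal order, so the sketch makes no claim about how that order is derived. The example uses a hypothetical single-row container.

```python
def well_indices(wells_in_order, unavailable, count_unavailable=False):
    """Map each well to its 1-based index.

    wells_in_order    -- all wells of the container, already in traversal order
    unavailable       -- set of wells marked as unavailable
    count_unavailable -- mirrors the countUnavailable parameter
    """
    indices = {}
    position = 0
    for well in wells_in_order:
        if well in unavailable and not count_unavailable:
            # Skipped wells do not consume an index value.
            continue
        position += 1
        indices[well] = position
    return indices


wells = ["A:1", "A:2", "A:3", "A:4"]  # hypothetical 1x4 container
print(well_indices(wells, {"A:2"}))
print(well_indices(wells, {"A:2"}, count_unavailable=True))
```

In this example, well A:3 receives index 2 by default and index 3 when unavailable wells are counted.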

Blank Line Format

The format of any blank line printed for an empty well is as follows:

{indexPrefix}{index}{separator}{blankLineString}
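As a rough illustration of this format, the following sketch assembles a blank line from the documented pieces. The function name is hypothetical and the code is not taken from the script. It assumes the separator is given as "TAB" or "COMMA" and that a literal "\t" in the blankLineString stands for a tab, as described under Parameter Details.

```python
SEPARATORS = {"TAB": "\t", "COMMA": ","}


def blank_line(index, blank_line_string, separator="COMMA", index_prefix=""):
    """Build the line written for an empty well."""
    sep = SEPARATORS[separator]
    # A literal backslash followed by "t" is replaced by a tab character.
    text = blank_line_string.replace("\\t", "\t")
    return f"{index_prefix}{index}{sep}{text}"


# Values taken from Example 1: -b ', False,' -sep COMMA -pre 'Sample '
print(blank_line(5, ", False,", "COMMA", "Sample "))
```

With the values used in Example 1, the line for an empty well at index 5 would be "Sample 5,, False,".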

Rules and Constraints

The script operates with the following constraints:

The input file and log file to update must exist locally.
Only supports TAB and COMMA as separators in the file.
Only supports processing one container at a time.
Can only use the first column of the data and requires either well placement or LIMS ID as values in this column.
Indexing is always done from left to right, top to bottom.
The well placement format in the original file must be alphanumeric (eg A1), or separated by a colon (eg 1:1).
As designed, the script supports processing one container only. As such, any step on which it is run must use either a single input container or a single output container, depending on which type of container you expect the script to use (see forceInputs).

Configuration

The script should be configured to run after the creation of a file occurs - either within the same automation call or as a separate automation triggered after the call that generated the file.
It can also be configured standalone for files that are locally accessible.

Example Automation Strings

Example 1

This example uses the addBlankLines script within the same automation call as the file generation. It also includes the optional parameter '-pre', adding the prefix 'Sample' before the index of each well in the file output.

The first half of the call (the section preceding the '&&' characters) uses the file generator to create the file.
The second half of the call (the section following the '&&' characters) takes the generated file and processes it to add blank lines.

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar driver_file_generator \
-i {processURI:v2:http} \
-u {username} \
-p {password} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/bioA_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
&& /opt/gls/clarity/bin/java \
-cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar addBlankLines \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-f {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
-sep COMMA \
-b ', False,' \
-h 1 \
-c LIMSID \
-pre 'Sample '"

Example 2

This example shows the configuration of an automation trigger that runs the addBlankLines script by itself, on a local file:

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar addBlankLines \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-f /opt/gls/clarity/customextensions/roboticsfiles/example.csv \
-l /opt/gls/clarity/customextensions/roboticsfiles/log.html \
-sep COMMA \
-b ', False,' \
-h 1 \
-c LIMSID"

Convert CSV to Excel

Available from: BaseSpace Clarity LIMS v3.1.0

Included as part of the NGS Extensions package, the convertToExcel script is designed to convert separated-value files (eg CSV) to Microsoft Excel spreadsheets of type XLS and XLSX.

The script can be run on comma- and tab-separated files with any file extension. The original file is not edited, unless its name matches the name given for the output file.
The script can update an existing Excel spreadsheet or produce an entirely new one.
When updating an existing Excel spreadsheet, if this spreadsheet does not have a file extension, XLSX will be used by default.
A single worksheet is updated with the input file contents. When producing a new Excel spreadsheet, this worksheet name may optionally be specified. Otherwise, the default name will be used.
The worksheet name must be provided when updating an existing Excel spreadsheet. If the worksheet exists, its contents will be overwritten with the contents from the input file. Otherwise, a new worksheet will be added.
Each line in the input file becomes a row in the output file, and its values are placed into the cells of that row. The first value in the input file becomes the value of cell A:1, and so forth.
When updating an existing worksheet, cells that are not overwritten by values from the input file are left untouched. For example, there may be a footer section that is not updated.
The Excel file produced may be written to a location accessible from the LIMS server (a location on the server, a mounted drive, or a shared file store, for example) and also attached in the LIMS via quickAttach. If both options are specified, the script will warn if the file cannot be written and report an error if the file cannot be uploaded.
The cell types currently supported are Numeric, Boolean, Blank, and String.
Supported number formats include period (.) as the decimal point and numbers that include an exponential (eg, 1e-8 or 4E2).
Boolean values are case-insensitive.
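The supported cell types can be illustrated with a small classifier. This is a sketch under assumptions, not the convertToExcel logic: the function name is hypothetical, booleans are assumed to be the words "true" and "false", and only an empty string is treated as blank.

```python
import re

# Period as decimal point, with an optional exponent such as 1e-8 or 4E2.
NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def classify_cell(value):
    """Return the cell type a value from the input file would map to."""
    if value == "":
        return "Blank"
    if value.lower() in ("true", "false"):
        # Boolean values are matched case-insensitively.
        return "Boolean"
    if NUMERIC.fullmatch(value):
        return "Numeric"
    return "String"


for value in ["12.5", "1e-8", "4E2", "TRUE", "", "1,5", "Sample 1"]:
    print(repr(value), classify_cell(value))
```

A value such as "1,5" falls through to String because only a period is accepted as the decimal point.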

Macros and equations are not supported when updating an existing Excel file. Other cells in the file that depend on the new values will not be updated when the worksheet is.

Script Parameters and Usage

Configuration

The convertToExcel script can be run on any step, provided there is a way to supply it with an input file to convert.
The recommended configuration is to use a minimum of two shared result files on the step: One result file used to attach the final converted file; the other the log file. The input file placeholder may be the same as the final file destination, if the input file is to be overwritten with the script results, and likewise for an existing Excel file to be updated.
Configure an automation trigger, usually on the Record Details view, to use the script. The input file may be attached manually or produced automatically by another script such as the sample sheet generator.
To configure the script to both attach its output file to a placeholder in the LIMS and to write it to a location on the server (or in a directory with shared access), provide both outputFileName and destLIMSID with quickAttach. Include the destination path in outputFileName, for example:
```
-outputFileName '/opt/gls/clarity/customextensions/example/output.xls'
```

Example Script Automation Strings

The following examples include various options for file handling. These options exist to reduce the FTP/Automated Informatics (AI) overhead so that the script executes faster.

For example, if quickAttach is set to true, the script will attach the file directly to the LIMS through FTP. It will only write the file locally if upload/attachment via the API fails.

Example 1: Typical Use

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar script:convertToExcel \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-srcLIMSID {compoundOutputFileLuid0} \
-outputFileName {compoundOutputFileLuid0}-converted.xlsx \
-logFileLIMSID {compoundOutputFileLuid1}"

In this example:

The file currently attached in the LIMS with LIMS ID {compoundOutputFileLuid0} is downloaded.
The file is converted to an XLSX file with the name {compoundOutputFileLuid0}-converted.xlsx.
- This file is left in the current local directory for AI to attach to the LIMS automatically.
When attached, the file overwrites the file with LIMS ID {compoundOutputFileLuid0} that was originally downloaded.
Finally, the log file is uploaded to the LIMS with the name {compoundOutputFileLuid1}-LogFile.html.

Example 2: Updating an attached Excel file and both writing it to a specific location and uploading the result

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar script:convertToExcel \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-logFileLIMSID {compoundOutputFileLuid2} \
-srcLIMSID {compoundOutputFileLuid0} \
-outputFileName '/opt/gls/clarity/customextensions/example/{compoundOutputFileLuid1}.xls' \
-destLIMSID {compoundOutputFileLuid1} \
-worksheet 'Samples' \
-updateFileLIMSID {compoundOutputFileLuid1} \
-q 'true'"

In this example:

The input file currently attached in the LIMS with LIMS ID {compoundOutputFileLuid0} is downloaded.
The Excel file to update, currently attached in the LIMS with LIMS ID {compoundOutputFileLuid1}, is downloaded.
The file to update has a worksheet with the name Samples updated (or overwritten, if already present in the file) using the contents of the input file.
The resulting file is written to /opt/gls/clarity/customextensions/example as {compoundOutputFileLuid1}.xls.
Because quickAttach is passed as true:
- The file is added to the LIMS directly with FTP with the LIMS ID {compoundOutputFileLuid1}.
- This overwrites the Excel file to update, which was previously attached here.
Finally, the log file is uploaded to the LIMS with the name {compoundOutputFileLuid2}-LogFile.html.

Example 3: Use with sample sheet generator

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar driver_file_generator \
-i {processURI:v2:http} \
-u {username} \
-p {password} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/bioA_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}-LogFile.html \
&& /opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar script:convertToExcel \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-inputFileName {compoundOutputFileLuid0}.csv \
-destLIMSID {compoundOutputFileLuid8} \
-quickAttach true \
-logFileLIMSID {compoundOutputFileLuid1}"

In this example:

A driver file is generated with the name {compoundOutputFileLuid0}.csv, and the {compoundOutputFileLuid1}-LogFile.html log file is created by the sample sheet generator.
The conversion script is executed on {compoundOutputFileLuid0}.csv.
Because quickAttach is passed as true and no outputFileName was provided, after the file has been converted:
- It is added to the LIMS directly with FTP with the LIMS ID {compoundOutputFileLuid8}.
- No file is created locally.
As the input file is converted, log messages are appended to the {compoundOutputFileLuid1}-LogFile.html file.

Example 4: Use with sample sheet generator and add blank lines

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar driver_file_generator \
-i {processURI:v2:http} \
-u {username} \
-p {password} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/bioA_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}-LogFile.html \
&& /opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
script:addBlankLines \
-f {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}-LogFile.html \
-sep COMMA \
-b ', False,' \
-h 1 \
-c LIMSID \
-pre 'Sample ' \
script:convertToExcel \
-inputFileName {compoundOutputFileLuid0}.csv \
-destLIMSID {compoundOutputFileLuid0} \
-quickAttach false \
-xls true \
-logFileLIMSID {compoundOutputFileLuid1}"

In this example:

Sample sheet generator creates the base driver file with name {compoundOutputFileLuid0}.csv.
The add blank lines script takes that file and adds extra lines for empty wells in the container, editing the file in place.
Finally, the convertToExcel script is run on that result.
- In this case, the final output is an XLS file named {compoundOutputFileLuid0}.xls.
- Because quickAttach is false, this file is written in the current local directory and it is assumed that AI will upload it to the LIMS.
- The same log file is appended to by all three programs that are run, and is attached to the LIMS.

Rules and constraints

The input file separators supported are comma and tab.
Spaces between entries in the input file are not supported (eg "Sample Name, A:1" must instead be "Sample Name,A:1").

Logging

A short message is logged after each successful action by the script.
Any errors that occur will be logged in the log file before the script terminates.
Warnings will also be captured in the log file, and if any occur, a notification will be sent on script completion.
If a local log file exists that matches the log file name configured for the script, or if a file exists in the LIMS with the associated log file LIMSID, the log messages will be appended to these files. Otherwise, a new file will be created.

Parse CSV

Available from: Clarity LIMS v2.0.5

The -haltOnMissingSample option and support for header section values (e.g., containerName) were introduced in NGS v5.4.0.

Data might sometimes need to be parsed from an instrument result file (CSV, TSV, or other character-separated format) into Clarity LIMS, for the purposes of QC.

For example, suppose that a 96 well plate is run on a Caliper GX. The instrument produces a result file, which the user imports into Clarity LIMS. The per-sample data are parsed and stored for a range of capabilities, such as QC threshold checking, searching, and visibility in the Clarity LIMS interface.

The parseCSV script allows for the data for each well to be parsed into fields on either derived samples or result files (measurement records) that map directly to the derived samples being measured.

If the instrument result file contains data that apply to the batch of derived samples being measured, these data are stored in fields on the step.

The Script

The parseCSV script automates parsing a separated-value file, configurable but typically comma- or tab-separated, into the LIMS.

Data lines in the file are matched to the corresponding sample in the LIMS using well placement information.
A line that references well A1 of container Plate123 will have its parsed data mapped to the sample placed in well position A:1 of container Plate123 in the LIMS.
Values from the file are mapped to fields (known as UDFs in the API) in Clarity LIMS based on the automation configuration for the script.

Workflow and Configuration

Configure the step to invoke the script manually via a button on the Record Details screen.
Before pressing the button that invokes the script, upload a shared result file to be parsed.
Configure the automation command line to match the destination fields configured in Clarity LIMS.
Create a field for each column that will be brought into the LIMS. Field names must not contain the separator used for the automation parameter string, "::".
When using NGS v5.0 or later, fields can be configured for the step, input samples, output samples, or output result files. Versions before this release support only output result files.
Input result files are not supported.

Script Parameters

Association Strategy

The association strategy describes how information in the file is mapped to samples in the LIMS.

When running this script, there are two association strategies you can implement. Which strategy you choose is determined by the contents of the file that will be parsed. Both strategies rely on sample placement information (well and container name) to perform the mapping to the LIMS.

Strategy 1: Provide the -containerName and -wellPosition parameters to the script. Use this strategy when the well and container information are found in separate columns of the file, eg "Plate123" in column "Plate Name" and "A1" in column "Well Label".
Strategy 2: Provide the -sampleLocation parameter to the script. Use this strategy when the placement information is all found in the same column, in the following format: <container_name>_<well>_<free text>, eg "Plate123_A1_control" in column "Sample ID".
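To illustrate Strategy 2, the sketch below splits a combined value into a container name and a LIMS-style well position. It is not the parseCSV implementation. The function names are hypothetical, and two assumptions are made: the container name contains no underscore, and a well such as "A1" or "A01" is normalized to "A:1".

```python
import re

WELL_PATTERN = re.compile(r"([A-Za-z]+):?(\d+)")


def normalize_well(well):
    """Convert a well label such as "A1" to the "A:1" form."""
    match = WELL_PATTERN.fullmatch(well.strip())
    if match is None:
        raise ValueError(f"Unrecognized well label: {well!r}")
    row, column = match.groups()
    return f"{row.upper()}:{int(column)}"


def parse_sample_location(value):
    """Split "<container_name>_<well>_<free text>" into (container, well)."""
    # Assumption: the container name itself contains no underscore.
    parts = value.split("_", 2)
    if len(parts) < 2:
        raise ValueError(f"No well information in: {value!r}")
    return parts[0], normalize_well(parts[1])


print(parse_sample_location("Plate123_A1_control"))
```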

Header Section Parsing (NGS v5.4 and later)

For the association strategy provided, if matching headers are not found in the file at the provided header index, the script will then search the lines of the file that appear prior to this index (the header section) for a match.

For example, when using association strategy 1 and providing -containerName and -wellPosition, if the file contains information for only a single container, the container name may appear only once, in a header section. For a comma-separated file, this may look like "ContainerID, plate123". With -containerName provided as "ContainerID", the script uses the adjacent value as the container name for the entire file and interprets the well positions as being within this container.
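The header-section fallback can be sketched as a search over the lines that precede the header row. This is an illustration, not the script's code: the function name is hypothetical, the header index is assumed to be zero-based, and surrounding whitespace in each cell is assumed to be ignored.

```python
def find_header_section_value(lines, header_index, name, separator=","):
    """Return the value adjacent to `name` in the lines before the header row.

    Returns None when the name does not appear in the header section.
    """
    for line in lines[:header_index]:
        cells = [cell.strip() for cell in line.split(separator)]
        # Stop one short of the end: a name in the last cell has no neighbor.
        for position, cell in enumerate(cells[:-1]):
            if cell == name:
                return cells[position + 1]
    return None


file_lines = [
    "ContainerID, plate123",   # header section
    "Well Label,Conc",         # header row (index 1)
    "A1,12.5",
]
print(find_header_section_value(file_lines, 1, "ContainerID"))
```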

Mapping Parameters

Mapping parameters (*measurementUDFMap*, *partialMatchUDFMap*, and *processUDFMap*) determine which information is mapped from the file to fields in the LIMS.

The structure in which to provide these parameters is as follows, where the <Header Name> is the name of the data column or header section row in the file:

File Separator

While the most common file formats are *.csv (comma-separated) and *.tsv (tab-separated), the script may be configured to use any separator.

To use a comma or tab as the separator, provide these using the -separator parameter as "comma" or "tab" as they require additional handling by the script.

Boolean Parameters

The script supports several boolean parameters. Boolean parameter values must be provided in quotes, eg "true".

Script Usage

Example 1

This example uses matching Strategy 1 for a comma-separated file and maps two columns, "Region[100–1000] Conc. (ng/ul)" and "Region[100–1000] Size at Maximum [BP]", to output resultfile fields "Concentration" and "Size (bp)" in the LIMS, respectively:

Example 2

This example uses matching Strategy 2 for a tab-separated file, running in relaxed mode. It maps a column to an input sample field, using that input sample placement information, and maps a header section row to a protocol step field:

Parameter Details

measurementUDFMap

This performs a 1:1 parsing of column information from the file to individual sample fields in the LIMS. The column names must match exactly. The exact destination (input/output sample or result file fields) is controlled through other script options.

partialMatchUDFMap

This allows customization of the column names that appear in the file by matching on only the first part of the column name. For example, a partial match of "Sample" will match a column customized to "Sample (internal ID)". Other than providing this flexibility, this parameter functions the same as *measurementUDFMap*.

If two columns are found that begin with the partial match provided, the script will log an error and stop execution.
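A minimal sketch of this prefix matching, including the error raised for an ambiguous match, might look as follows. The function name is hypothetical and the real script logs an error rather than raising a Python exception.

```python
def find_partial_match(headers, prefix):
    """Return the single column header that begins with `prefix`.

    Returns None when nothing matches and raises when the match is ambiguous.
    """
    matches = [header for header in headers if header.startswith(prefix)]
    if len(matches) > 1:
        raise ValueError(f"Partial match {prefix!r} is ambiguous: {matches}")
    return matches[0] if matches else None


headers = ["Well", "Sample (internal ID)", "Conc. (ng/ul)"]
print(find_partial_match(headers, "Sample"))
```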

processUDFMap

The process UDF option is provided to parse per-run information into protocol step fields in the LIMS. When provided, the script will search for a match in the header section and the data column headers of the file.

In the following example file:

The first two lines (beginning with OPERATOR and WORKFLOW) represent a header section with information for the batch of derived samples.
The third line (S_PLATE_ID) is the data section header (header row).
The remaining lines make up the data section, which contains data for each derived sample.

How it Works

If there is a matching header in both the header section and column headers, the value from the header section will be used.
If no matching header is found and the script isn't running in relaxed mode, the script will log an error and stop execution.
When a match is found only among the column headers, validation is done to ensure all the values in that column are equal (because they will be mapped to a single destination field). If not all of the values are the same, a warning will be logged listing the distinct values and the field in the LIMS will not be updated.
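The lookup order described above can be summarized in a short sketch. It is an illustration of the documented rules rather than the script's code: the function name and data structures are hypothetical, with the header section given as a dictionary and the data columns as lists of values keyed by column header.

```python
def resolve_step_value(name, header_section, columns, relaxed=False):
    """Return (value, warning) for a processUDFMap header name.

    header_section -- dict of header-section names to values
    columns        -- dict of column headers to the list of values per row
    """
    # A header-section match takes priority over a column match.
    if name in header_section:
        return header_section[name], None

    if name in columns:
        distinct = sorted(set(columns[name]))
        if len(distinct) == 1:
            return distinct[0], None
        # Differing values cannot be mapped to one field: warn, no update.
        return None, f"Column {name!r} has distinct values: {distinct}"

    if relaxed:
        return None, f"Header {name!r} not found"
    raise LookupError(f"Header {name!r} not found")


header = {"OPERATOR": "kim"}
data = {"S_PLATE_ID": ["P1", "P1"], "CONC": ["1.0", "2.0"]}
print(resolve_step_value("OPERATOR", header, data))
print(resolve_step_value("S_PLATE_ID", header, data))
print(resolve_step_value("CONC", header, data))
```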

matchOutput Mode

This parameter is provided as a boolean true/false value (default is false). It toggles whether information from the file is matched to the LIMS by comparing it to the placement of the protocol step inputs or protocol step outputs.

If set to False: The script uses the placement information of the inputs.
If set to True: The script uses the placement information of protocol step outputs.

setOutput Mode

This parameter is provided as a boolean true/false value (default is true). It toggles whether per-sample information is mapped to fields on the protocol step inputs or outputs.

If set to True: The script updates the protocol step outputs.
If set to False: The script updates field information on the protocol step inputs.

Input samples, output samples, and output result files are supported. The script expects either output samples or output result files, not both.

The script will log an error and stop execution if there is more than one kind of per-input output configured for the protocol step.

relaxed Mode

This parameter is provided as a boolean true/false value (default is false) to toggle relaxed mode.

If set to False: The script considers all provided header mappings to be mandatory headers and throws an exception if anything cannot be found in the file.
If set to True: In relaxed mode, the script will log a warning if a header cannot be found in the file, and will continue execution.

haltOnMissingSample mode (NGS v5.4 and later)

This parameter is provided as a boolean true/false value (default is true) to toggle halt on missing sample mode.

If set to False: The script will warn but continue execution when placement information for a line in the file cannot be determined. This mode can be used to handle, for example, ladder entries or footer sections, where the lines in the file will not contain valid sample information for the parser to use.
If set to True: The script will log an error and stop execution when a line in the file is encountered where it cannot determine the placement information for a sample. This mode allows strict matching of all contents.

Additional Information

Other scripts you may find useful are as follows.

Name Matching XML Parser

Available from: BaseSpace Clarity LIMS v2.1.0

Often, data can be parsed from an instrument result file in XML format into Clarity LIMS, for the purposes of QC.

For example, suppose that a TapeStation instrument run is performed. This produces an XML result file, which the user imports into the LIMS. The file includes information of interest for each sample, which should be parsed and stored for a range of capabilities, such as QC threshold checking, searching, and visibility in the LIMS interface.

The XmlSampleNameParser tool allows for sample data to be parsed into UDFs on result files (measurement records) that map directly to the derived samples being measured.

The XmlSampleNameParser tool is installed as a standalone jar file as part of the NGS Extensions Package. Currently it contains one script, parseXmlBySampleName.

Provided the result file is in XML format, this script can be used to match data in the file to samples in the LIMS using the measurement record LIMSID.

Values are mapped to UDFs in the LIMS using a configuration file that contains XPath mappings for the result file. (External resources, such as w3schools, can be used to learn more about XPath, and many XML viewing tools will generate it automatically for elements of interest.)

The format for the data needed to make the association between the file contents and the sample in the LIMS is: LIMSID_NAME.

The name is optional and is supported for readability. This means it may come from the input sample on which the step is being run.
The LIMSID must come from the output result file, which is also where the parsed information will be stored in UDFs.
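
As a rough illustration of the LIMSID_NAME convention, the following Python sketch splits such a value into its two parts. It is not part of the toolkit: the function name split_limsid_name, the SampleKey type, and the example LIMSID values are hypothetical, and the sketch assumes that the LIMSID itself contains no underscore.

```python
from typing import NamedTuple, Optional


class SampleKey(NamedTuple):
    limsid: str
    name: Optional[str]


def split_limsid_name(value: str) -> SampleKey:
    """Split a LIMSID_NAME string into its LIMSID and optional name.

    Assumption: the LIMSID contains no underscore, so the first
    underscore separates it from the (optional) sample name.
    """
    limsid, separator, name = value.strip().partition("_")
    if not limsid:
        raise ValueError(f"No LIMSID found in {value!r}")
    # The name is optional, so a bare LIMSID is accepted as well.
    return SampleKey(limsid, name if separator and name else None)
```

For example, split_limsid_name("92-1234_Sample A") yields the LIMSID "92-1234" and the name "Sample A", while split_limsid_name("92-1234") yields the same LIMSID with no name.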

Typically, it is ideal to set up the instrument run with the sample and result file information, so that it will appear in the same format in the XML result file. To automate setup, you can use a tool such as the template driver file generator.

The LIMSID_NAME can be provided to the instrument as the sample name, or as a comment or other field on the sample. The only conditions are that:

The sample field that you want to use for the LIMSID_NAME must be passed into the result file (eg, via a driver file).
The configuration file must be set up such that it can access this field from the correct location. (See Configuration File Format.)

Script Parameters and Usage

The parseXmlBySampleName script uses the following parameters, all of which are required:

Example

This example shows the script run on a manually imported TapeStation XML file that has been attached to the TapeStation DNA QC process.

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/XmlSampleNameParser.jar
script:parseXmlBySampleName
-i {processURI:v2:http}
-u {username}
-p {password}
-inputFile {compoundOutputFileLuid0}
-log {compoundOutputFileLuid1}.html
-configFile /opt/gls/clarity/extensions/conf/tapestation/defaultTapeStationDNAConfig.groovy"

Configuration

The process type for the steps on which information will be tracked must be configured with the following output generation:

1x fixed ResultFile output per input
2x fixed ResultFile outputs applied to all inputs
Shared output naming pattern example: {LIST:Instrument Result XML (required),XML Parsing Log}

This represents the minimum configuration. Additional shared output files may be added as required.

For each piece of information that will be parsed from the XML file and stored on the step outputs, configure desired UDFs on ResultFile and associate them with the per-input output result files for the process type.

Configuration File Format

The configuration file should be produced as a .groovy file and stored in the /opt/gls/clarity/customextensions directory. Its format allows for four types of entries:

baseSampleXPath
sampleNameXPath
process.run.UDF."UDF name"
process.output.UDF."UDF name"

The examples provided here use XPath for a TapeStation XML result file.

baseSampleXPath

Provide this one time.
This XPath indicates the list of samples and the associated sample information, relative to the root of the XML file. Specific sample information will be retrieved relative to this path.

sampleNameXPath

Provide this one time.

This XPath indicates where the LIMS sample association information (LIMSID_NAME) can be found, relative to the sample list indicated by baseSampleXPath. Often this will be stored as the sample name or in a comment field for the sample.

// **Sample matching information**
// These two entries are required to locate and identify individual samples' information in the XML file.
baseSampleXPath = "/File[1]/Samples[1]/Sample[Observations!='Ladder']"
sampleNameXPath = "${baseSampleXPath}/Comment[1]/text()"
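
To see how the two entries work together, the following Python sketch walks a small, made-up XML document shaped to match the XPath entries above. It only approximates what the parser does: the XML content and the sample_names function are hypothetical, and the ladder filter is written in plain Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical result file, shaped to match the XPath entries above.
SAMPLE_XML = """
<File>
  <Samples>
    <Sample><Observations>Ladder</Observations><Comment>Ladder</Comment></Sample>
    <Sample><Observations></Observations><Comment>92-1001_SampleA</Comment></Sample>
    <Sample><Observations></Observations><Comment>92-1002_SampleB</Comment></Sample>
  </Samples>
</File>
"""


def sample_names(xml_text: str) -> List[str]:
    """Return the LIMSID_NAME value of every non-ladder sample."""
    root = ET.fromstring(xml_text)  # root is the <File> element
    names = []
    # Equivalent of baseSampleXPath: every Sample under Samples ...
    for sample in root.findall("./Samples/Sample"):
        observations = sample.findtext("Observations")
        # ... that has an Observations element whose text is not 'Ladder'.
        if observations is None or observations == "Ladder":
            continue
        # Equivalent of sampleNameXPath: the Comment text of that sample.
        names.append(sample.findtext("Comment") or "")
    return names
```

Calling sample_names(SAMPLE_XML) skips the ladder entry and returns the two Comment values, which is where the LIMSID_NAME is expected in this configuration.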

process.run.UDF."UDF name"

May be provided multiple times.
Indicates information that is tracked for the entire run, and not on individual samples.
Typically, this will be XPath relative to the root of the XML file, as shown.

The destination result file UDF name is specified as part of the entry name and must match the UDF name in the LIMS exactly.

// **Details that correspond to the whole run**
process.run.UDF."Conc. Units".xPath = "/File[1]/Assay[1]/Units[1]/ConcentrationUnit[1]/text()"
process.run.UDF."Molarity Units".xPath = "/File[1]/Assay[1]/Units[1]/MolarityUnit[1]/text()"
process.run.UDF."MW Units".xPath = "/File[1]/Assay[1]/Units[1]/MolecularWeightUnit[1]/text()"

In the example above, three values will be parsed into the LIMS from the XML file, represented on each individual measurement record (result file) output:

Conc. Units
Molarity Units
MW Units

process.output.UDF."UDF name"

May be provided multiple times.
Indicates information that is tracked on individual samples.
Typically, this will be XPath relative to the sample XPath (baseSampleXPath), as shown.

The destination result file UDF name is specified as part of the entry name and must match the UDF name in the LIMS exactly.

// **Details that correspond to Samples**
process.output.UDF."Concentration".xPath = "${baseSampleXPath}/Concentration[1]/text()"
process.output.UDF."Region 1 Average Size - bp".xPath = "${baseSampleXPath}/Regions/Region[1]/AverageSize[1]/text()"
process.output.UDF."Region 1 Conc.".xPath = "${baseSampleXPath}/Regions/Region[1]/Concentration[1]/text()"
process.output.UDF."Peak 1 MW".xPath = "${baseSampleXPath}/Peaks/Peak[1]/Size[1]/text()"
process.output.UDF."Peak 1 Conc.".xPath = "${baseSampleXPath}/Peaks/Peak[1]/CalibratedQuantity[1]/text()"

In the example above, five values will be parsed into the LIMS from the XML file, represented on each individual measurement record (result file) output:

Concentration
Region 1 Average Size - bp
Region 1 Conc.
Peak 1 MW
Peak 1 Conc.

Additional Information

Some other scripts you may find useful:

Template file generator
Parse CSV

Sample Placement Helper

Compatibility: Clarity LIMS v2.5 or later, and v3.0 or later

The sample placement helper tool allows for automated sample placement and validation in Clarity LIMS. It consists of the following scripts, each of which handles different use cases.

Container Name Validation

Clarity LIMS supports renaming of destination containers in a step in two ways:

Renaming the containers on the Placement screen.
Renaming the containers on the Record Details screen.

Often, the container name will be the barcode of the physical container (plate, chip, and so on) in the lab, which is entered into the system using a barcode scanner. However, the barcodes may be small or close together and the scanner may have a wide scanning range, so the same barcode can be scanned more than once by accident.

The purpose of this script is to validate that the same barcode has not been scanned twice during container renaming.

How it Works

The script validates the names of the destination containers in the step, and reports if any duplicates are found.

The script may be run on any step that includes sample placement, and may be run:

Manually from the Record Details screen OR
Automatically on transition at any point after placement has been done (for example, on exit from the Placement screen).

The script may be configured to do the following:

Report either a warning or failure if duplicate container names are detected.
Reset the destination container names in the step to the default value of the container LIMSID.

Modes

Trigger Mode

Trigger mode must be provided when configuring the script to be triggered manually.
This mode determines how the script reports the outcome of its validation to the user, because manual triggering uses a different underlying mechanism than running on screen transition.

Error Mode

The errMode parameter accepts values of "warn" and "fail."
When in "warn" mode, you are still able to proceed in the step if duplicate destination container names are detected.
If set to "fail," you will not be able to continue as long as duplicates are present.

Reset Mode

Typically, a duplicate name is caused by an accidental repeated scan of a barcode.

Reset mode may be used to automatically reset the destination container names in the step if duplicates are detected.
This option resets every destination container name in the step to the LIMSID of that container, which is the default value used for container names in the LIMS.
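
The interaction between the error mode and reset mode can be sketched in Python as follows. This is an illustrative model only, not the script's implementation: the function name, its return shape, and the example LIMSIDs and barcodes are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def validate_container_names(
    names_by_limsid: Dict[str, str],
    err_mode: str = "warn",
    reset: bool = False,
) -> Tuple[str, List[str], Dict[str, str]]:
    """Return (outcome, duplicate names, resulting container names).

    outcome is "ok" when no duplicates exist, otherwise the error mode
    ("warn" or "fail") that the caller should act on.
    """
    if err_mode not in ("warn", "fail"):
        raise ValueError("err_mode must be 'warn' or 'fail'")
    counts = Counter(names_by_limsid.values())
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if not duplicates:
        return "ok", [], dict(names_by_limsid)
    if reset:
        # Reset mode: every container name falls back to its own LIMSID.
        names = {limsid: limsid for limsid in names_by_limsid}
    else:
        names = dict(names_by_limsid)
    return err_mode, duplicates, names
```

With the hypothetical input {"27-1": "BC001", "27-2": "BC001", "27-3": "BC002"}, err_mode "fail", and reset enabled, the sketch reports "BC001" as a duplicate and returns all three names reset to their LIMSIDs.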

Script Parameters and Usage

Example 1: Validate container names on Record Details

In this example, the script is triggered manually on the Record Details screen. It will warn the user if duplicates are detected, and the destination container names will not be changed.

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar script:validate_container_names \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-errMode 'warn' \
-t 'true'"

Example 2: Validate container names on Transition

In this example, the script is configured to run automatically on screen transition. It will fail if duplicates are detected, preventing the user from advancing in the step, and the destination container names will be reset to the container LIMSID.

bash -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar \
script:validate_container_names \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-errMode 'fail' \
-reset 'true'"

Placement by Pattern File

Some instruments or workflows require specific placement patterns based on the container type. Clarity LIMS may need to place samples in an unusual pattern, such as every second well in a plate. To handle this, the ability to specify new patterns based on specific types of containers is required.

The purpose of the Placement by Pattern File script is to automate sample placement according to a specific pattern based on container type. The script reads the pattern in from a file, allowing for easy customization.

There are two script modes:

Source sample index placement: This is the default mode. This mode assigns an index to each input sample and its replicates, and then transfers them to a particular well in the output container. This mode allows for cherry picking of samples.
Source well placement: This mode uses the well position of an input sample to transfer it to a certain well position in the output container. In this mode, samples from one well will always be transferred to the same well in the output container.

How it Works

Pattern files are stored in a directory that is passed as a parameter to the script.

Files are automatically selected based on the mode of the script and the container type names:

Source sample index placement is toggled with the useIndexMode parameter. When using this mode, the file name must contain "IndexedTransfer" and the selected container type.
When using source well placement, the file name must contain the input container and selected container types of the protocol step.

For example, suppose that there are samples in a 96 well plate that are to be transferred into several 12x1 BeadChips.

If source well placement is used, the pattern file would be named 96 well plate_12x1 HD BeadChip.tsv.
If source sample index placement is used, the pattern file would be named IndexedTransfer_12x1 HD BeadChip.tsv (for information on the format of the pattern files, see the Pattern File Format section).

Both script modes support multiple input containers of the same type and multiple output containers of the same type. Only source sample index placement supports multiple types of input containers.

All input samples are sorted by container LIMS ID to ensure accurate placement. If useIndexMode is set to "true", samples are sorted further to assign an index: after being sorted by container LIMS ID, they are sorted based on input sample well position. The sortOrder parameter specifies whether indexed samples are sorted by row or by column with respect to well position, and is only used with source sample index placement. Input samples are indexed from 1, and all replicates of a particular sample are assigned the same index.
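
The indexing described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the script's code: the function names, the "row" and "column" values for the sort order, the "A:1" well format, and the plain string comparison of container LIMS IDs are all hypothetical choices.

```python
from typing import Dict, Iterable, Tuple

# (sample id, container LIMS ID, well position such as "A:1")
Sample = Tuple[str, str, str]


def well_key(well: str, sort_order: str) -> Tuple:
    """Build a sort key from a well position written as ROW:COLUMN."""
    row, column = well.split(":")
    column_number = int(column)
    # Assumption: "row" walks along each row (A:1, A:2, ... B:1), and
    # "column" walks down each column (A:1, B:1, ... A:2).
    if sort_order == "row":
        return (row, column_number)
    return (column_number, row)


def assign_indexes(samples: Iterable[Sample], sort_order: str = "row") -> Dict[str, int]:
    """Sort by container LIMS ID, then by well, and index from 1."""
    ordered = sorted(samples, key=lambda s: (s[1], well_key(s[2], sort_order)))
    # Each input sample gets one index; its replicates would share it.
    return {sample_id: index for index, (sample_id, _, _) in enumerate(ordered, start=1)}
```

Because the index is attached to the input sample, every replicate generated from that sample is looked up under the same index in the pattern file.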

Output containers are created and given temporary names of "Plate 1," "Plate 2," "Plate 3"... These can then be updated as desired, for example, by scanning in the barcode of the container. This temporary naming is done for ease-of-use, to make sure that the visual order in the interface is correct, making it easier to confirm that the placement pattern was followed.

It is possible to configure the script to fail or produce a warning message if the number of samples does not match the contents of the pattern file. There are three parameters to enforce the number of samples and replicates in the step: replicatesMode, minSamplesMode, and maxSamplesMode.

For example, if replicatesMode is set to "warn" and maxSamplesMode is set to "fail", then the script will produce a warning after placement occurs if the number of replicates is inaccurate and it will fail if there are more samples in the step than specified in the pattern file.

Process and Workflow Configuration

The script may be configured on any process that includes sample placement:

Configure this script to run automatically on entry to the Sample Placement screen.
Before using the script, confirm that the desired pattern files are correctly configured and are located in the correct location with the appropriate parameter provided. (See Script Parameters and Usage.)

Script Parameters and Usage

Example 1

This example uses source well placement and a pattern file placed in: /opt/gls/clarity/extensions/conf/infinium/placementpatterns

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar \
script:place_samples \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-d /opt/gls/clarity/extensions/conf/infinium/placementpatterns"

Example 2

This example uses source sample index placement. The replicatesMode setting fails the script if the number of replicates is inaccurate, minSamplesMode warns the user if there are fewer samples than specified in the pattern file, and maxSamplesMode fails the script if there are more samples than specified in the pattern file. The pattern file is located in: /opt/gls/clarity/extensions/quantstudio/conf/placementpatterns

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar script:place_samples \
-i {stepURI:v2:http} \
-u {username} \
-p {password} \
-d /opt/gls/clarity/extensions/quantstudio/conf/placementpatterns \
-useIndexMode true \
-replicatesMode fail \
-minSamplesMode warn \
-maxSamplesMode fail"

Pattern File Selection

The location of the pattern file is customized using the appropriate parameter, as described in Script Parameters and Usage.

When source well placement is used, the exact file to use is selected based on the names of the input container type and destination container type:

The name must contain an exact match for each of the container types (eg "384 well plate_96 well plate.tsv")
Container type names must be separated from other text by underscores; any other text in the name is ignored. For example, "{type}_{type}.tsv" and "{type}_{type}_transfer file.tsv" are both valid pattern file names.
For a transfer from one container to another of the same type, "{type}.tsv" is a valid pattern file name. However, the script currently uses the first file it finds with a name containing the source and destination container types. If both "{type1}.tsv" and "{type1}_{type2}.tsv" are found in the same location, the script may not accurately detect which should be used for a protocol step that has input and output containers of type "type1." This can be avoided by using the long form for naming, ie {type1}_{type1}.tsv.

When source sample index placement is used, the pattern file name is slightly different:

The name must contain "IndexedTransfer" along with an exact match for the destination container type (eg "IndexedTransfer_96 well plate.tsv")
As with source well placement, container type names must be separated from other text by underscores; any other text in the name is ignored.
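
The naming rules above can be modeled with a short Python sketch. It is a hypothetical approximation of the selection logic, not the script's implementation: the select_pattern_file function and its parameters are invented for illustration, and "first file found" is modeled as the first match in the order the names are supplied.

```python
from pathlib import PurePath
from typing import Iterable, Optional


def select_pattern_file(
    file_names: Iterable[str],
    dest_type: str,
    source_type: Optional[str] = None,
    use_index_mode: bool = False,
) -> Optional[str]:
    """Return the first file name that contains the required parts."""
    if use_index_mode:
        required = ["IndexedTransfer", dest_type]
    else:
        if source_type is None:
            raise ValueError("source_type is required for source well placement")
        required = [source_type, dest_type]
    for file_name in file_names:
        # Parts are separated by underscores; other parts are ignored.
        parts = PurePath(file_name).stem.split("_")
        if all(part in parts for part in required):
            return file_name
    return None
```

Because the check only asks whether each required type appears in the name, a file named "{type1}.tsv" and a file named "{type1}_{type2}.tsv" both match a type1-to-type1 transfer in this sketch, which mirrors the ambiguity described above.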

Pattern File Format

The pattern file format is a tab-separated file (.tsv) that has three columns with headers. These headers depend on the script mode. For source well placement, SRC_WELL, DEST_CONTAINER_INDEX, and DEST_WELL are the required headers. When useIndexMode is enabled, the pattern file must contain SRC_SAMPLE_INDEX, DEST_CONTAINER_INDEX, and DEST_WELL.

The table describes the four columns:

Part of an example pattern file for source well placement is shown below:

An example of a pattern file for placement via source sample index is shown below. It expects samples to have eight replicates each.

Placement by Robot Transfer File

Some labs require support for a robot-driven sample placement scenario, in which sample placement is performed on the robot and is then recorded in the LIMS, without requiring manual entry of the sample placement.

The purpose of the Placement by Transfer File script is to automate sample placement according to a transfer file produced by a robot. The script covers a one-to-one, many-to-one (pooling), or one-to-many (replicates) mapping of samples for placement. Support for reagents is planned for a future release.

The script is available as of Clarity LIMS v3.0. Replicate support was added in NGS Extensions Package v5.3.1 (Placement Helper v2.1.0). Pooling support was added in NGS Extensions Package v5.4.0 (Placement Helper v2.2.0) and requires LIMS v3.1.

How it Works

The robot produces a transfer file (worksheet, work list file, and so on) that contains information about which samples were used and their source and destination locations. Upload this file to the protocol Step Setup screen in the LIMS.

The Placement by Transfer File script automatically performs the same placement as the robot, recording the work that has been done in the lab so that you do not need to enter this information manually.

The script looks at the protocol step inputs and matches these to the source information in the transfer file, then uses the destination information to place the protocol step outputs. On pooling steps, the script has the intermediary step of creating the pools. After the pools are created, you are given the opportunity to review their contents before the script does placements for the pools.

(For information on the format of the transfer file, see the Transfer File Format section.)

The script will first search for containers that exist in the LIMS with names matching the destination container names provided in the transfer file, and if only a single container matches a given destination container name it will be used. Otherwise the containers will be created using the destination container type, or selected container type if type information is not provided in the transfer file. The only case where the script will proceed if it finds multiple containers in the LIMS matching a destination container name in the transfer file is if one of those containers is the selected container for the protocol step.
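
One possible reading of these container-matching rules is sketched below in Python. The function name and the LIMSID values are hypothetical, and the behavior for multiple matches reflects an interpretation of the paragraph above rather than confirmed script behavior.

```python
from typing import List, Optional


def resolve_destination(matches: List[str], selected_limsid: Optional[str]) -> Optional[str]:
    """Decide which existing container (by LIMSID) to use for a name.

    matches holds the LIMSIDs of containers whose name equals the
    destination container name in the transfer file. A return value of
    None means that a new container should be created.
    """
    if not matches:
        return None  # nothing exists yet, so create a container
    if len(matches) == 1:
        return matches[0]  # a single match is used directly
    if selected_limsid in matches:
        # Several matches are tolerated only when one of them is the
        # selected container for the protocol step.
        return selected_limsid
    raise ValueError("Multiple containers match the destination container name")
```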

The sample name column is used for validation. If the sample found in the LIMS that matches the input placement information does not have the same name, an exception will be thrown by the script.

Process and Workflow Configuration

The script may be configured on any protocol step that includes sample placement.
Configure this script to run automatically on entry to the Sample Placement screen. (See Script Parameters and Usage.)
If you are pooling samples, the script must also be configured to run on entry to the Pool Samples screen. The configuration strings would be the same for both automatic EPP triggers.
The Step Setup screen should also be enabled in the protocol step configuration, to allow for attachment of the transfer file.

Script Parameters and Usage

Example 1

This example uses the minimum configuration for the script:

bash -l -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar place_samples_by_robot_file
-i {stepURI:v2:http}
-u {username}
-p {password}
-f {compoundOutputFileLuid0}
-srcContainer 'S_PLATE_ID'
-srcWell 'S_PLATE_XY'
-destContainer 'D_PLATE_ID'
-destWell 'D_PLATE_XY'"

Example 2

This example provides some of the optional parameters. The script will parse a tab-separated file that has its header on the third line, with additional validation on sample name and destination container type:

bash -l -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/PlacementHelper.jar place_samples_by_robot_file
-i {stepURI:v2:http}
-u {username}
-p {password}
-f {compoundOutputFileLuid0}
-headerIndex '3'
-srcContainer 'S_PLATE_ID'
-srcWell 'S_PLATE_XY'
-sampleName 'SAMPLE_ID'
-destContainer 'D_PLATE_ID'
-destWell 'D_PLATE_XY'
-destType 'D_PLATE_TYPE'
-separator 'tab'"

Transfer File Format

Currently the script supports comma- and tab-separated formats with a single-line header followed by data rows for the transfer information.

The minimum information required in the transfer file is a column for each of the following:

source container
source well location
destination container
destination well location

For additional validation, it is possible to specify columns to use for the following:

sample name
destination container type

The contents of the transfer file must correspond to the number of sample outputs per input configured for the step. In the transfer file, replicates are represented as multiple lines with the same source container and well, but different destination containers or wells. A line must appear for each expected output, including replicates, or else an exception will be thrown by the script and placement will fail.

To pool multiple inputs together, make sure that the corresponding inputs have the same destination container and well. If so, and the step is configured for pooling, when the script is triggered on entry to the pooling screen your pools will be created. If the step is not configured for pooling, then such a transfer file will cause an error to be thrown by the script.
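
The way replicates and pools appear in a transfer file can be illustrated with a small Python sketch. It is not the script's parser: the group_transfers function is hypothetical, the default column names are borrowed from the command-line examples above, and any file content passed to it is assumed to have its header on the first line.

```python
import csv
import io
from collections import defaultdict


def group_transfers(
    text: str,
    separator: str = "\t",
    src_container: str = "S_PLATE_ID",
    src_well: str = "S_PLATE_XY",
    dest_container: str = "D_PLATE_ID",
    dest_well: str = "D_PLATE_XY",
):
    """Return (replicates, pools) found in the transfer file text."""
    by_source = defaultdict(list)
    by_destination = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        source = (row[src_container], row[src_well])
        destination = (row[dest_container], row[dest_well])
        by_source[source].append(destination)
        by_destination[destination].append(source)
    # One source with several destinations represents replicates.
    replicates = {s: d for s, d in by_source.items() if len(d) > 1}
    # Several sources sharing one destination represent a pool.
    pools = {d: s for d, s in by_destination.items() if len(s) > 1}
    return replicates, pools
```

A step that is not configured for pooling would be expected to reject any file for which the pools result is not empty.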

An example from a Hamilton robot (.tsv, tab-separated file) is shown below:

Constraints

Placements separated by a colon (e.g., 1:1) or that are alphanumeric (e.g., A1) are valid. Placements that are numeric only (e.g., 11) are not supported.
If the destination container type information is not supplied, the selected container type for the protocol step will be used to place the samples.
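
The placement format constraint can be approximated with two regular expressions, as in the following Python sketch. The patterns and the is_supported_placement function are assumptions made for illustration; the script's own validation rules may be stricter or looser.

```python
import re

# Assumed shapes: "1:1" or "A:1" (colon-separated), and "A1" (letters then digits).
COLON_SEPARATED = re.compile(r"[A-Za-z0-9]+:[A-Za-z0-9]+")
ALPHANUMERIC = re.compile(r"[A-Za-z]+[0-9]+")


def is_supported_placement(value: str) -> bool:
    """Return True for colon-separated or alphanumeric placements."""
    value = value.strip()
    return bool(COLON_SEPARATED.fullmatch(value) or ALPHANUMERIC.fullmatch(value))
```

Under these assumptions, "1:1", "A:1", and "A1" are accepted, while a numeric-only value such as "11" is rejected because it cannot be split into a row and a column unambiguously.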

Additional Information

Some other scripts you may find useful:

Parse CSV

Creating Template Files

Available from: BaseSpace Clarity LIMS v4.2.x

You can create template files that the Template File Generator script (driver_file_generator) will use to generate custom files for use in your lab.

This article provides details on the following:

The parameters used by the script.
The sections of the template file—these define what is output in the generated file.
Sorting logic—options for sorting the data in the generated file.
Rules and constraints to keep in mind when creating templates and generating files.
Examples of how you can use specific tokens and metadata in your template files.

For a complete list of the metadata elements and tokens that you can include in a template file, see Template File Contents.

Script Parameters

Upgrade Note: process vs step URIs

The driver_file_generator script now uses steps instead of processes for fetching information. When a process URI is supplied, the script detects it and automatically switches it to a step URI. (The PROCESS.TECHNICIAN token, which is only available on 'process' in the API, is still supported.) The behavior of the script has not changed, except that the long form -processURI parameter must be replaced by -stepURI in configuration. The -i version of this parameter remains supported and now accepts both process and step URI values. If your configuration is using -processURI or --processURI, replace each instance with -i (or -stepURI/--stepURI).

The following table defines the parameters used by the driver_file_generator script.

Command-line example:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar script:driver_file_generator -i {stepURI:v2} -u {username} -p {password} -t /opt/gls/clarity/customextensions/InfiniumHT/driverfiletemplates/NextSeq.csv -o {compoundOutputFileLuid0}.csv -l {compoundOutputFileLuid1}"

Command-line example using -quickAttach and -destLIMSID:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar script:driver_file_generator -i {stepURI:v2} -u {username} -p {password} -t /opt/gls/clarity/customextensions/Robot.csv -quickAttach true -destLIMSID {compoundOutputFileLuid0} -o extended_driver_x384.csv -l {compoundOutputFileLuid2}"

Data Source

The input-output-maps of the step (defined by the -stepURI parameter) are used as the data source for the content of the generated file.

If they are present, input-output-maps with the attribute output-generation-type=PerInput are used. Otherwise, all input-output-map items are used.

The output generation type specifies how the step outputs were generated in relation to the inputs. PerInput entries are available for the following step types: Standard, Standard QC, Add Labels, and Analysis.
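
The selection rule for the data source can be expressed in a few lines of Python. This is a simplified model: each input-output-map is represented as a plain dictionary, and the select_data_source function and the example values are hypothetical.

```python
from typing import Dict, List


def select_data_source(io_maps: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Prefer PerInput maps; fall back to all maps when none exist."""
    per_input = [
        io_map for io_map in io_maps
        if io_map.get("output-generation-type") == "PerInput"
    ]
    return per_input if per_input else list(io_maps)
```

In this model, a step with both PerInput and PerAllInputs entries contributes only its PerInput entries to the generated file, while a step with no PerInput entries contributes everything it has.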

Template Sections

The content of the generated file is determined by the sections defined in the template. Content for each section is contained within XML-like opening and closing tags that are structured as follows:

<SECTION>
    section content
</SECTION>

Most template files follow the same basic structure and include some or all of the following sections (by convention, section names are written in capital letters, but this is not required):

<HEADER_BLOCK>
<HEADER>
<DATA>
<FOOTER>

The order of the section blocks in the template does not affect the output. In the output file, blocks will always be in the order shown.

The area outside of the sections can contain metadata elements (see Metadata section of the Template File Contents article). Anything else outside of the section tags is ignored.

The <PLACEMENT> and <TOKEN_FORMAT> sections are not part of this list and do not create distinct sections in the generated file. Instead, they alter the formatting of the generated output.

HEADER_BLOCK

Only a subset of the tokens is available for use in the header block section. For details, see the Template File Contents article Tokens table. If an unsupported token is included, file generation will complete with a warning message and a warning will appear in the log file.

Tokens in the header block always resolve in the context of the first input and first output available. For example, suppose the INPUT.CONTAINER.TYPE token is used in the header block:

If there is only one type of input container present in the data source, that container type will be present in the output file.
If multiple input container types are present in the data source, only the first one encountered while processing the data will be present in the output file.

To include a header block section in a template, enclose it within the <HEADER_BLOCK> and </HEADER_BLOCK> tags.

HEADER

The header section describes the header line of the data section (see DATA section). A simple example might be "Sample ID, Placement".

The content of this section can only include plain text and is output as is. Tokens are not supported.

To include a header section in a template, enclose it within the <HEADER> and </HEADER> tags.

HIDE feature: See 'Hide feature' in DATA section. Also note:

If multiple <HEADER> lines are present, at least one must have the same number of columns as the <DATA> template line.
<HEADER> lines that do not match the number of columns are unaffected by the HIDE feature.

DATA

Each data source entry creates a data row for each template line in the section. All entries are output for the first template line, then the next template line runs, and so on.

The data section allows tokens and text entries. All tokens are supported.

Note the following:

Duplicated rows are eliminated, if present. A row is considered duplicated if its content (after all variables and placeholders have been replaced with their corresponding values) is identical to a previous row. Tokens must therefore provide distinctive enough data (ie, something more than just CONTAINER.NAME) if all of the input-output entry pairs are desired in the generated file.
By default, the script processes only sample entries. However, there are metadata options that allow inclusion of result files/measurements and exclusion of samples.
Metadata sorting options are applied to this section of the template file only.
By default, pooled artifacts are treated as a single input artifact. They can be demultiplexed using the PROCESS.POOLED.ARTIFACTS metadata element.
If there is at least one token relevant to the step inputs or outputs, this section will produce a row for each PerInput entry in the step input-output-map. If no PerInput entries are present in the step input-output-map, the script will attempt to add data rows for PerAllInputs entries.
Input and output artifacts are always loaded if a <DATA> section is present in the template file, due to the need to determine what type of artifacts the script is dealing with.
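
The duplicate-row rule is easier to see with a small model. The following Python sketch renders one template line against several data source entries and drops repeated rows; the render_rows function, the ${TOKEN} substitution, and the example values are assumptions made for illustration and do not reproduce the script's internals.

```python
from typing import Dict, List


def render_rows(template_line: str, entries: List[Dict[str, str]]) -> List[str]:
    """Resolve tokens for each entry and keep only the first copy of a row."""
    seen = set()
    rows = []
    for entry in entries:
        row = template_line
        for token, value in entry.items():
            row = row.replace("${" + token + "}", value)
        # A row identical to an earlier one (after substitution) is dropped.
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows
```

With two samples in the same container, a template line that contains only the container name token produces a single row, while a line that also contains the sample name token produces one row per sample.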

To include a data section in a template, enclose it within the <DATA> and </DATA> tags.

FOOTER

The content of this section can only include plain text and is output as is. Tokens are not supported.

To include a footer section in a template, enclose it within the <FOOTER> and </FOOTER> tags.

PLACEMENT

This section contains Groovy code that controls the formatting of PLACEMENT tokens (see the PLACEMENT tokens in the Template File Contents article Tokens table).

Within the Groovy code, the following variables are available:

Note the following:

The script must return a string, which replaces the corresponding <PLACEMENT> tag in the template.
Logic within the placement tags can be as complex as needed, provided it can be compiled by a Groovy compiler.
If an error occurs while running formatting code, the original location value is used.

To include a placement section in a template, enclose it within the <PLACEMENT> and </PLACEMENT> tags.

Placement Example: Container Type

In the following example:

If the container type is a 96 well plate, sample placement A1 will return as "A_1"
If the container type is not a 96 well plate, sample placement A1 will return as "A:1"

<PLACEMENT>
// The inputs to this segment are: String row, String column, Node containerTypeNode
if (containerTypeNode.@name == "96 well plate") return row + "_" + column
else return row + ":" + column
</PLACEMENT>

Placement Example: Zero Padding

<PLACEMENT>
// The inputs to this segment are: String row, String column, Node containerTypeNode
String zeroPad (String entry) {
if (entry.isNumber() && entry.size() == 1) return "0" + entry
return entry
}
return zeroPad(row) + ":" + zeroPad(column)
</PLACEMENT>

TOKEN FORMAT

This section defines logic to be applied to specific tokens to change the format in which they appear in the generated file.

Special formatting rules can be defined per token using the following Groovy syntax:

${token.identifier}
…groovy code…
// or 
${token.identifier##Name}
…groovy code…

Within the Groovy code, the variable 'token' refers to the original value being transformed by the formatting code. The logic replaces all instances of that token with the result.

${token.identifier} marks the beginning of the token formatting code and the end of the previous token formatting code (if any).

You can define multiple formatting logic rules for a given token, by assigning a name to the formatting section (named formatters are called 'variations'). This is done by appending “##” after the token name (eg “${token.identifier##formatterName}”).
Using the named formatter syntax without giving a name (“${token.identifier##}”) will abort the file generation.
If an error occurs while running formatting code, the resulting value will be blank.
If a named formatter is used but not defined, the value is used as is.

To include a token format section in a template, enclose it within the <TOKEN_FORMAT> and </TOKEN_FORMAT> tags.

TOKEN FORMAT Example: Technician Name

In this example, a custom format is defined for displaying the name of the technician who ran a process (step).

<TOKEN_FORMAT>
${PROCESS.TECHNICIAN}
def name = token.split(" ")
return "First name: " + name[0] + ", Last name: " + name[1]
</TOKEN_FORMAT>

TOKEN FORMAT Example: Appending a String to Container Name or Sample Name

In this second example, special formatting is required for two tokens, so the logic for both appears inside the same set of tags.

The example appends a string to the end of the input container name and adds a prefix to the beginning of the submitted sample name.

<TOKEN_FORMAT>
${INPUT.CONTAINER.NAME}
return token + "-PlateName"
${SAMPLE.NAME}
return "SN-" + token
</TOKEN_FORMAT>

Metadata

Metadata provides information about the template file that is not retrieved from the API — such as the file output directory to use, and how the data contents should be grouped and sorted.

Metadata is not strictly confined to a section, and is not designated by opening and closing tags. However, each metadata entry must be on a separate line.

Metadata entries can be anywhere in the template, but the recommended best practice is to group them either at the top or the bottom of the file.

For a list of supported metadata elements, rules for using them, and examples, see Template File Contents, Metadata section.

Sorting Logic

Sorting in the generated file is done either alphanumerically or by vertical placement information, using the SORT.BY. and SORT.VERTICAL metadata elements.

To sort vertically:

Include the SORT.VERTICAL metadata element in the template file. A SORT.BY. metadata element that lists the row and column tokens must also be included, as follows:

SORT.BY.${OUTPUT.CONTAINER.ROW}${OUTPUT.CONTAINER.COLUMN}

Any SORT.BY. tokens will be sorted using the vertical sorter instead of the alphanumeric sort.

To apply sorting to samples in 96 well plates:

You could narrow the sort key to a unique combination such as:

SORT.BY.${OUTPUT.CONTAINER.NAME}${OUTPUT.CONTAINER.ROW}${OUTPUT.CONTAINER.COLUMN}

See also SORT.VERTICAL and SORT.BY. in the Template File Contents article.
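As an illustration only, the following Python sketch contrasts a plain string sort with a column-first ordering. It assumes that vertical sorting visits wells down each column (A1, B1, then A2), that the alphanumeric sort behaves like a plain string comparison, and that rows are letters and columns are numbers. The function names are hypothetical and are not part of the script.

```python
def alphanumeric_order(wells):
    """Sort (row, column) pairs by their concatenated text, as a string sort would."""
    return sorted(wells, key=lambda well: well[0] + well[1])


def vertical_order(wells):
    """Sort (row, column) pairs column by column, then by row within each column."""
    return sorted(wells, key=lambda well: (int(well[1]), well[0]))


wells = [("B", "2"), ("A", "10"), ("A", "1"), ("B", "1"), ("A", "2")]

print(["".join(well) for well in alphanumeric_order(wells)])
# ['A1', 'A10', 'A2', 'B1', 'B2']
print(["".join(well) for well in vertical_order(wells)])
# ['A1', 'B1', 'A2', 'B2', 'A10']
```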

Rules and Constraints

The template must adhere to the following rules:

Metadata entries must each appear on a new line and be the only entry on that line.
Metadata entries must not appear inside tags.
Opening and closing section tags must appear on a new line and as the only entry on that line.
Each opened tag must be closed, otherwise it is skipped by the script.
Any sections (opening tag + closing tag combination) can be omitted from the template file.
Entries that are separated by commas in the template will be delimited by the metadata-specified separator (default: COMMA) in the generated file.
White space is allowed in the template. However, a blank line inside a tag will also appear in the generated file.
If an entry in the template is enclosed in double quotes, it is imported as a single entry and written to the generated file as such, even if it contains commas.
To include double quotes or single quotes in the generated file, use the escape character. For example: \" or \'
To include an escape character in the generated file, use two escape characters inside double quotes. For example, if you want to see \\Share\Folder\Filename.txt use "\\\\Share\\Folder\\Filename.txt" as the token.

If any of the following conditions is not met, the tag and everything inside it are ignored by the script, and a warning is written to the log file:

Except for the metadata, all template sections must be enclosed inside tags.
Each tag must have its own line, and must be the only tag present on that line.
No other entries, even empty ones, are allowed.
All opened tags must be closed.
Custom field names must not contain periods.
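A template can be checked against the tag conditions before it is used. The Python sketch below is a hypothetical pre-check, not the validation performed by the script: it only knows the section tags that appear in this article, and it assumes that surrounding white space on a tag line is acceptable.

```python
import re

# Section tags used in this article; any other tag name is not recognized here.
SECTION_TAGS = ("HEADER_BLOCK", "HEADER", "DATA", "FOOTER", "PLACEMENT", "TOKEN_FORMAT")
TAG = re.compile(r"</?(" + "|".join(SECTION_TAGS) + r")>")


def check_tags(template_text):
    """Return a list of warnings for tags that break the rules above."""
    warnings = []
    open_tag = None
    for number, line in enumerate(template_text.splitlines(), start=1):
        matches = list(TAG.finditer(line))
        if not matches:
            continue
        if len(matches) > 1 or matches[0].group(0) != line.strip():
            warnings.append("line %d: tag must be the only entry on its line" % number)
            continue
        name = matches[0].group(1)
        if not matches[0].group(0).startswith("</"):
            if open_tag is not None:
                warnings.append("line %d: <%s> was not closed" % (number, open_tag))
            open_tag = name
        elif open_tag == name:
            open_tag = None
        else:
            warnings.append("line %d: </%s> has no matching opening tag" % (number, name))
    if open_tag is not None:
        warnings.append("<%s> was not closed" % open_tag)
    return warnings


good = "<HEADER>\nA,B\n</HEADER>\n<DATA>\n${INPUT.NAME}\n</DATA>"
bad = "<HEADER>\nA,B\n<DATA>,\n${INPUT.NAME}\n</DATA>"
print(check_tags(good))
print(check_tags(bad))
```

The second template produces three warnings: the trailing comma after <DATA>, the unmatched </DATA>, and the unclosed <HEADER>.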

Examples

Illumina Instrument Sample Sheets

The LIMS provides configuration to support generation of sample sheets that are compatible with some Illumina instruments. For details, see the Illumina Instrument Sample Sheets documentation.

Generating Sample Sheets for QC Instruments

The LIMS provides configured automations that generate sample sheets compatible with a number of QC instruments. The default automation command lines are provided below.

Generate Bioanalyzer Driver File Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/bioA_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
&& /opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/ngs-extensions.jar -u {username} -p {password} \
script:addBlankLines \
-i {stepURI:v2} \
-f {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1} \
-sep COMMA \
-b ',False,' \
-h 1 \
-c LIMSID \
-pre 'Sample '"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
<HEADER_BLOCK>
</HEADER_BLOCK>
<HEADER>
"\"Sample Name\",\"Sample Comment\",\"Rest. Digest\",\"Observation\""
</HEADER>
<DATA>
${OUTPUT.LIMSID},,False,
</DATA>
<FOOTER>
Ladder,,False,"\"Chip Lot #\",\"Reagent Kit Lot #\",
\"QC1 Min [%]\",\"QC1 Max [%]\",\"QC2 Min [%]\",\"QC2 Max [%]\"
,,,
\"Chip Comment\"
"
</FOOTER>

Generate NanoDrop Driver File Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/nd_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
<HEADER_BLOCK>
</HEADER_BLOCK>
<HEADER>
"Well Location, Sample Name"
</HEADER>
<DATA>
${INPUT.CONTAINER.PLACEMENT},${INPUT.CONTAINER.NAME}_${INPUT.CONTAINER.PLACEMENT}_${INPUT.NAME}
</DATA>

Generate Tapestation Input Sample Table CSV Automation

Command line:

bash -l -c "/opt/gls/clarity/bin/java -jar /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar -u {username} -p {password} \
script:driver_file_generator \
-i {processURI:v2} \
-t /opt/gls/clarity/extensions/ngs-common/v5/EPP/conf/readonly/tapestation_driver_file_template.csv \
-o {compoundOutputFileLuid0}.csv \
-l {compoundOutputFileLuid1}"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
SORT.BY.${INPUT.LIMSID}
<DATA>
${OUTPUT.LIMSID}_${INPUT.NAME}
</DATA>

Create GenomeStudio Driver File Automation

Command line:

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar driver_file_generator \
-i {processURI} -u {username} -p {password} -t /opt/gls/clarity/extensions/conf/driverfiletemplates/GenomeStudioGeneExpressionTemplate.csv \
-o {compoundOutputFileLuid0}.csv -l {compoundOutputFileLuid1}.html"

Template file content:

INCLUDE.OUTPUT.RESULTFILES
OUTPUT.SEPARATOR,COMMA
LIST.SEPARATOR,";"
ILLEGAL.CHARACTERS,COMMA
ILLEGAL.CHARACTER.REPLACEMENTS,_
SORT.BY.${INPUT.CONTAINER.NAME}${INPUT.CONTAINER.ROW}${INPUT.CONTAINER.COLUMN}
<HEADER_BLOCK>
[HEADER]
Investigator Name, ${PROCESS.TECHNICIAN}
Project Name, ${SAMPLE.PROJECT.NAME.ALL}
Experiment Name
Date, ${DATE}
[Manifests]
${PROCESS.UDF.Manifest A}
</HEADER_BLOCK>
<HEADER>
[DATA]
Sample_Name,Sample_Well,Sample_Plate,Pool_ID,Sentrix_ID,Sentrix_Position
</HEADER>
<DATA>
${INPUT.NAME},,,,${INPUT.CONTAINER.NAME},${INPUT.CONTAINER.PLACEMENT}
</DATA>
<PLACEMENT>
// inputs to this section are String row, String column, Node containerTypeNode
int convertAlphaToNumeric(String letters) {
    int result = 0
    letters = letters.toUpperCase()
    for (int i = 0; i < letters.length(); i++) {
        result += (letters.charAt(i).minus('A' as char) + 1) * (26 ** (letters.length() - i - 1))
    }
    return result
}
int SENTRIX_POS_THRESHOLD = 12
int WELL_PLATE_SIZE_96 = 96
int xSize = containerTypeNode.'x-dimension'.size.text().toInteger()
int ySize = containerTypeNode.'y-dimension'.size.text().toInteger()
int containerSize = xSize * ySize
boolean xIsAlpha = containerTypeNode.'x-dimension'.'is-alpha'.text().toBoolean()
boolean yIsAlpha = containerTypeNode.'y-dimension'.'is-alpha'.text().toBoolean()
if (containerSize <= SENTRIX_POS_THRESHOLD && (xIsAlpha || yIsAlpha)) {
    return row
}
// R001_C001 for 96 well plate, r01c01 for other container types
if (containerSize == WELL_PLATE_SIZE_96) {
    def numFormat = java.text.NumberFormat.getNumberInstance()
    numFormat.setMinimumIntegerDigits(3)
    String xStr = numFormat.format(column.isInteger() ? column as int : convertAlphaToNumeric(column))
    String yStr = numFormat.format(row.isInteger() ? row as int : convertAlphaToNumeric(row))
    // Row is mapped to x coordinate, while column is mapped to y.
    // When creating an array type of size 96, swap the row and column dimension.
    // e.g 12 x 8 array should be mapped as an 8 x 12 array
    //
    // This mapping has been in RI for a while.
    // In AddIlluminaArraysStep, all 2D Illumina arrays added have a dimension of 8 x 12.
    // This driver file template then converts it back to 12 x 8.
    // This logic is now corrected to follow other arrays to make sure the driver file
    // generated is compatible with existing arrays and software.
    return "R"+xStr+"_C"+yStr
} else {
    def numFormat = java.text.NumberFormat.getNumberInstance()
    numFormat.setMinimumIntegerDigits(2)
    String xStr = numFormat.format(column.isInteger() ? column as int : convertAlphaToNumeric(column))
    String yStr = numFormat.format(row.isInteger() ? row as int : convertAlphaToNumeric(row))
    // row is mapped to y, column is mapped to x
    return "r"+yStr +"c"+xStr
}
</PLACEMENT>
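To check what the placement logic above produces for a given well, it can help to port it to a language that runs outside the LIMS. The following Python sketch mirrors the Groovy segment under the assumption that row and column labels are either plain digits or letters. The function names and the flattened parameter list are hypothetical; in the template, the dimensions come from containerTypeNode.

```python
def alpha_to_numeric(letters):
    """Base-26 value of a letter label: A -> 1, Z -> 26, AA -> 27."""
    result = 0
    for ch in letters.upper():
        result = result * 26 + (ord(ch) - ord("A") + 1)
    return result


def to_number(label):
    """Convert a numeric or alphabetic label to its 1-based index."""
    return int(label) if label.isdigit() else alpha_to_numeric(label)


def sentrix_position(row, column, x_size, y_size, x_is_alpha, y_is_alpha):
    """Mirror of the PLACEMENT segment above for a single well."""
    size = x_size * y_size
    # Small containers with an alphabetic dimension report the row only.
    if size <= 12 and (x_is_alpha or y_is_alpha):
        return row
    x, y = to_number(column), to_number(row)
    if size == 96:
        # Three-digit padding; column maps to R and row maps to C, as in the template.
        return "R%03d_C%03d" % (x, y)
    # Two-digit padding for all other container types.
    return "r%02dc%02d" % (y, x)


print(sentrix_position("B", "3", 12, 8, False, True))   # 96 well plate
print(sentrix_position("B", "3", 24, 16, False, True))  # 384 well plate
```

For well B:3, the sketch returns R003_C002 on a 96 well plate and r02c03 on a 384 well plate.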

Renaming Generated Files

In the template file, the following OUTPUT.FILE.NAME metadata element names the generated file NewTemplateFileName.csv:

OUTPUT.FILE.NAME, NewTemplateFileName.csv

In the automation command line, the following will attach the generated file to the {compoundOutputFileLuid0} placeholder, with the name defined by the OUTPUT.FILE.NAME metadata element.

bash -c "/opt/gls/clarity/bin/java -cp /opt/gls/clarity/extensions/ngs-common/v5/EPP/DriverFileGenerator.jar \
script:driver_file_generator \
-i {stepURI:v2} \
-u {username} \
-p {password} \
-t /opt/gls/clarity/customextensions/Robot.csv \
-q true -destLIMSID {compoundOutputFileLuid0} \
-o extended_driver_x384.csv \
-l {compoundOutputFileLuid2}"

When the LIMS attaches a file to a placeholder, it assumes that the file is named with the step LIMSID, and uses this LIMSID to identify the placeholder to which the file should be attached. However, when using OUTPUT.FILE.NAME, you can give the file a name that does not begin with the LIMSID of the placeholder to which it will be attached. To do this, you must use the quickAttach (-q) and destLIMSID parameters in the automation command line.
If the quickAttach parameter is provided without the destLIMSID parameter, the script logs an error and stops execution.
If destLIMSID is provided without quickAttach, it is ignored.
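The interaction between the two parameters can be summarized in a few lines of Python. This is a sketch of the documented behaviour only; the function name and the example LIMSID are hypothetical.

```python
def resolve_attachment(quick_attach, dest_limsid=None):
    """Return the placeholder LIMSID to attach to, or None when quickAttach is off."""
    if quick_attach and not dest_limsid:
        # quickAttach without destLIMSID: the script logs an error and stops.
        raise ValueError("quickAttach requires destLIMSID")
    if not quick_attach:
        # destLIMSID without quickAttach is ignored.
        return None
    return dest_limsid


print(resolve_attachment(True, "92-1234"))
print(resolve_attachment(False, "92-1234"))
```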

Using Token Values In File Names

The following tokens are supported for this feature:

PROCESS.LIMSID
PROCESS.UDF.<UDF NAME>
PROCESS.TECHNICIAN
DATE
INPUT.CONTAINER.NAME
INPUT.CONTAINER.TYPE
INPUT.CONTAINER.LIMSID
OUTPUT.CONTAINER.NAME
OUTPUT.CONTAINER.TYPE
OUTPUT.CONTAINER.LIMSID

Rules and Constraints

When using token values in file names, the following rules and constraints apply:

Container-related functions will return the value from a single container, even if there are multiple containers.
Other tokens will function, but will only return the value for the first row of the file (first input or output).
If the OUTPUT.FILE.NAME specified does not match the LIMS ID of the file, the output file will not be attached in the LIMS user interface. To ensure that the file is attached, include the quickAttach and destLIMSID parameters in the command-line string.
It is highly recommended that you do not use SAMPLE.PROJECT.NAME.ALL or SAMPLE.PROJECT.CONTACT.ALL, because the result is prone to surpassing the maximum length of a file name. There are similar issues with other SAMPLE tokens when dealing with pools.
Only the following characters are supported in the file name. Any other characters will be replaced by an _ (underscore) by default. This replacement character can be configured with the OUTPUT.FILE.NAME.ILLEGAL.CHARACTER.REPLACEMENT metadata element.
- a-z
- A-Z
- 0-9
- _ (underscore)
- - (dash)
- . (period)
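The character rule above can be approximated with a regular expression. The Python sketch below is a model of the documented behaviour for a plain file name (no directory part), not the script's own code; the function name is hypothetical.

```python
import re

# Everything except a-z, A-Z, 0-9, underscore, dash, and period is replaced.
ILLEGAL = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_file_name(name, replacement="_"):
    """Replace each unsupported character with the replacement character."""
    return ILLEGAL.sub(lambda match: replacement, name)


print(sanitize_file_name("Plate 1/run#2.csv"))
# Plate_1_run_2.csv
```

Passing a different replacement argument corresponds to setting OUTPUT.FILE.NAME.ILLEGAL.CHARACTER.REPLACEMENT.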

Providing a full file path for OUTPUT.FILE.NAME is still supported, but deprecated. If the full path is provided, the file/directory separator will be automatically detected and will not be replaced in the static parts of the file name. Any of these separators derived from the result of a token value will be replaced.

Defining a Project Name for Control Samples

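The CONTROL.SAMPLE.DEFAULT.PROJECT.NAME metadata element defines the project name to use for control samples.

Example: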

CONTROL.SAMPLE.DEFAULT.PROJECT.NAME, My Control Sample Project

Rules and Constraints

If the token is found in the template but has no value, no project name is given for control samples.
If the token is not found in the template, no project name is given for control samples.
If multiple values are provided, the first one will be used.
The SAMPLE.PROJECT.NAME.ALL list will include the control project name.

Using HIDE to Exclude Empty Columns

You can use the HIDE metadata element to optionally hide a column if it contains no data. The following lines in the metadata will hide a data column when empty:

HIDE, ${OUTPUT.UDF.SAMPLEUDF}, IF, NODATA

HIDE, ${OUTPUT.UDF.SAMPLEUDF},${PROCESS.TECHNICIAN}, ${PROCESS.LIMSID}, IF, NODATA

You may also hide only one representation of a specific column or field:

HIDE, ${PROCESS.TECHNICIAN##FirstName}, IF, NODATA
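The effect of HIDE on data columns can be pictured with the following Python sketch. It assumes that a column has no data when every row holds an empty or white-space-only value; the function name and the sample data are hypothetical.

```python
def hide_empty_columns(header, rows, hideable):
    """Drop each column named in `hideable` when no row has data for it."""
    keep = [
        index
        for index, name in enumerate(header)
        if not (name in hideable and all(not row[index].strip() for row in rows))
    ]
    new_header = [header[index] for index in keep]
    new_rows = [[row[index] for index in keep] for row in rows]
    return new_header, new_rows


header = ["Sample", "Comment", "Technician"]
rows = [["S1", "", "Ada"], ["S2", " ", "Ada"]]

print(hide_empty_columns(header, rows, {"Comment", "Technician"}))
```

Here the Comment column is removed because it is empty in every row, while Technician is kept because it contains data.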

Using HIDE to Exclude Empty HEADER rows

Assuming ${OUTPUT.UDF.SAMPLEUDF} is one of the rows specified in the template header section, that header row will be hidden whenever there is no data to display in the output file.

If a list of tokens is provided for the value, the row will only be shown if one or more of the tokens resolves to a value:

HIDE, ${OUTPUT.UDF.SAMPLEUDF},${PROCESS.TECHNICIAN}, ${PROCESS.LIMSID}, IF, NODATA

Generating Multiple Files

If you would like to generate multiple files, you can use the following GROUP.FILES.BY metadata elements:

GROUP.FILES.BY.INPUT.CONTAINERS
GROUP.FILES.BY.OUTPUT.CONTAINERS

One sample entry
The same process UDF information

Naming The Files

When generating multiple files, the script gathers them all into one zip file so only one file placeholder is needed regardless of how many containers are in the step.

The zip file name may be provided in the metadata as follows:

GROUP.FILES.BY.INPUT.CONTAINERS,<zip file name>

GROUP.FILES.BY.OUTPUT.CONTAINERS,<zip file name>

GROUP.FILES.BY.INPUT.CONTAINERS,MyZip.zip
MyZip.zip
      \-- Container1
              \-- SampleSheet.csv
      \-- Container2
              \-- SampleSheet.csv

The file naming, writing, and uploading process works as follows:

The outputPath parameter is required for the script. You can use this parameter to specify the path to which the generated files will be written, the name to use for the file, or both. Use this in the following scenarios:
- When the target path/name is constant OR
- When the target path/name includes something that can only be passed to the script via the command line - for example, if you want to include the value of a {compoundOutputFileLuidN} in the path.
The OUTPUT.TARGET.DIR metadata element overrides any path provided by outputPath, but does not change the file name. Use this:
- When the target path includes something that can only be accessed with token templates - for example, the name of the user who ran the step.
The OUTPUT.FILE.NAME metadata element overrides any value provided by outputPath entirely. This element determines the name of the files that are produced for each container - for example, SampleSheet.csv. It may also contain tokens to access information, such as the container name, and it may also contain a path.
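The precedence described above can be sketched in Python as follows. The function name is hypothetical, token values are assumed to be already resolved, and the behaviour when OUTPUT.TARGET.DIR and a path inside OUTPUT.FILE.NAME are both present is an assumption (the target directory wins here), because the text above does not state it.

```python
import posixpath


def resolve_output_path(output_path, target_dir=None, file_name=None):
    """Combine outputPath, OUTPUT.TARGET.DIR, and OUTPUT.FILE.NAME into one path."""
    # OUTPUT.FILE.NAME replaces the outputPath value entirely.
    base = file_name if file_name else output_path
    directory, name = posixpath.split(base)
    # OUTPUT.TARGET.DIR replaces the directory but keeps the file name.
    if target_dir:
        directory = target_dir
    return posixpath.join(directory, name)


print(resolve_output_path("/tmp/out/SampleSheet.csv"))
print(resolve_output_path("SampleSheet.csv", target_dir="Container1"))
print(resolve_output_path("SampleSheet.csv", file_name="Run.csv"))
```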

To produce the example of MyZip.zip, you could use the following:

Script parameters:

-outputPath SampleSheet.csv
-q 'true'
-destLIMSID {compoundOutputFileLuid0}

Template:

GROUP.FILES.BY.OUTPUT.CONTAINERS,MyZip.zip
OUTPUT.TARGET.DIR,${OUTPUT.CONTAINER.NAME}

Rules and Constraints

You can only use one GROUP.FILES.BY metadata element in each template file.
To attach the files in the LIMS as a zip file, you must provide the quickAttach parameter along with the destLIMSID.
The zip file name may optionally be specified with the GROUP.FILES.BY metadata.
If quickAttach is used and no zip name is specified in the template, the zip will be named using the destLIMSID parameter value.
The zip file name, file paths, and file names should not contain characters that are illegal for directories and files on the target operating system. Illegal characters will be replaced with underscores.
If a file name is not unique to the target directory, e.g., if multiple SampleSheet.csv files are being written to /my/target/path, an error will be thrown and no files written.
When specifying the OUTPUT.TARGET.DIR metadata element, if a token is used that may resolve to multiple values for a single path (for example, using INPUT.NAME in the path when it will resolve to multiple sample names), one value will be chosen arbitrarily for the path. For example, you may end up with /Container1/Sample1/myfile.csv when there are two samples in the container.
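The uniqueness rule can be illustrated with a small Python sketch that groups rows by container and refuses to plan two files at the same path. It is a model of the documented behaviour only; the function name, the row layout, and the path-building callbacks are hypothetical.

```python
from collections import defaultdict


def plan_files(rows, group_key, path_for_group):
    """Group rows by container and map each group to one file path."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    planned = {}
    for container, members in groups.items():
        path = path_for_group(container)
        if path in planned:
            # A non-unique file name in the target directory is an error,
            # and no files are written.
            raise ValueError("duplicate file path: %s" % path)
        planned[path] = members
    return planned


rows = [
    {"container": "Container1", "sample": "S1"},
    {"container": "Container1", "sample": "S2"},
    {"container": "Container2", "sample": "S3"},
]

# One directory per container keeps each SampleSheet.csv unique.
plan = plan_files(rows, "container", lambda name: name + "/SampleSheet.csv")
print(sorted(plan))
```

Using the same path for every container, for example /my/target/path/SampleSheet.csv, raises an error in this model, which corresponds to the case described above where no files are written.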