Creating Template Files
Available from: BaseSpace Clarity LIMS v4.2.x
You can create template files that the Template File Generator script (driver_file_generator) will use to generate custom files for use in your lab.
This article provides details on the following:
The parameters used by the script.
The sections of the template file—these define what is output in the generated file.
Sorting logic—options for sorting the data in the generated file.
Rules and constraints to keep in mind when creating templates and generating files.
Examples of how you can use specific tokens and metadata in your template files.
For a complete list of the metadata elements and tokens that you can include in a template file, see Template File Contents.
Script Parameters
⚠️ Upgrade Note: process vs step URIs
The driver_file_generator script now uses steps instead of processes for fetching information. When a process URI is supplied, the script detects it and automatically switches it to a step URI. (The PROCESS.TECHNICIAN token, which is only available on 'process' in the API, is still supported.)
The behavior of the script has not changed, except that the long form -processURI parameter must be replaced by -stepURI in configuration. The -i version of this parameter remains supported and now accepts both process and step URI values. If your configuration uses -processURI or --processURI, replace each instance with -i (or -stepURI/--stepURI).
The following table defines the parameters used by the driver_file_generator script.
Option
Name
Description
-i {stepURI:v2} -stepURI {stepURI:v2}
Step URI
(Required) LIMS step URI. Provides context to resolve all token values. See Upgrade Note above.
-u {username} -username {username}
Username
(Required) LIMS login username
-p {password} -password {password}
Password
(Required) LIMS login password
-t <templateFile> -templatePath <templateFile>
Template file
(Required) Template file path
-o <outputFile> -outputPath <outputFile>
Output file
(Required) Output file path. If the folder structure specified in the path does not exist, it is created. Details on the following metadata elements are provided in the Metadata section of the Template File Contents article:
This output file parameter value is overwritten by OUTPUT.FILE.NAME
To output multiple files, use GROUP.FILES.BY.INPUT.CONTAINERS and GROUP.FILES.BY.OUTPUT.CONTAINERS
Files generated are in CSV format by default. Other value-separated formats are available—see OUTPUT.SEPARATOR.
-l <logFile> -logFileName <logFile>
Log file
(Required) Log file name
-q [true|false] -quickAttach [true|false]
-destLIMSID <LIMSID>
Destination LIMS ID
LIMSID of the output to attach the template file to. Use with quickAttach. See Renaming Generated Files and Generating Multiple Files examples.
Command-line example:
Command-line example using -quickAttach and -destLIMSID:
See also the Examples section.
Data Source
The input-output-maps of the step (defined by the -stepURI parameter) are used as the data source for the content of the generated file.
If they are present, input-output-maps with the attribute output-generation-type=PerInput are used. Otherwise, all input-output-map items are used.
By default, the data source entries are sorted alphanumerically by LIMS ID. You can modify the sort order by using the SORT.BY and SORT.VERTICAL metadata elements (see Metadata section of the Template File Contents article).
ℹ️ The output generation type specifies how the step outputs were generated in relation to the inputs. PerInput entries are available for the following step types: Standard, Standard QC, Add Labels, and Analysis.
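The selection rule above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the script's implementation: the dictionary keys (`limsid`, `output_generation_type`) and the sample LIMS IDs are hypothetical, and a plain string sort stands in for the script's alphanumeric ordering.

```python
def select_data_source(io_maps):
    """Prefer PerInput entries; fall back to all entries if there are none."""
    per_input = [m for m in io_maps if m.get("output_generation_type") == "PerInput"]
    chosen = per_input if per_input else list(io_maps)
    # Default ordering: by LIMS ID (hypothetical "limsid" key, string comparison).
    return sorted(chosen, key=lambda m: m["limsid"])


io_maps = [
    {"limsid": "2-103", "output_generation_type": "PerAllInputs"},
    {"limsid": "2-102", "output_generation_type": "PerInput"},
    {"limsid": "2-101", "output_generation_type": "PerInput"},
]

selected = select_data_source(io_maps)
print([m["limsid"] for m in selected])
```

With the sample data, only the two PerInput entries are kept. If none were present, every entry would be used instead.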
Template Sections
The content of the generated file is determined by the sections defined in the template. Content for each section is contained within xml-like opening and closing tags that are structured as follows:
Most template files follow the same basic structure and include some or all of the following sections (by convention, section names are written in capital letters, but this is not required):
The order of the section blocks in the template does not affect the output. In the output file, blocks will always be in the order shown.
The area outside of the sections can contain metadata elements (see Metadata section of the Template File Contents article). Anything else outside of the section tags is ignored.
The <PLACEMENT> and <TOKEN_FORMAT> sections are not part of the list and do not create distinct sections in the generated file. Instead, they alter the formatting of the generated output.
HEADER_BLOCK
⚠️ Only a subset of the tokens is available for use in the header block section. For details, see the Template File Contents article Tokens table. If an unsupported token is included, file generation will complete with a warning message and a warning will appear in the log file.
The header block section may include both plain text and data from the LIMS. It consists of information that does not appear multiple times in the generated file; that is, the information is not included in the data rows (see DATA section).
Tokens in the header block always resolve in the context of the first input and first output available. For example, suppose the INPUT.CONTAINER.TYPE token is used in the header block:
If there is only one type of input container present in the data source, that container type will be present in the output file.
If multiple input container types are present in the data source, only the first one encountered while processing the data will be present in the output file.
For this reason, we recommend against using tokens that resolve to different values for different samples, such as SAMPLE.NAME. If one of these tokens is encountered, a warning is logged and the first value retrieved from the API is used. (Note that you may use .ALL tokens, where available.)
To include a header block section in a template, enclose it within the <HEADER_BLOCK> and </HEADER_BLOCK> tags.
HIDE feature: If one of the tokens of a line is empty and is part of a HIDE statement, that line will be removed entirely. See Using HIDE to Exclude Empty Columns and Using HIDE to Exclude Empty HEADER rows examples.
HEADER
The header section describes the header line of the data section (see DATA section). A simple example might be "Sample ID, Placement".
The content of this section can only include plain text and is output as is. Tokens are not supported.
To include a header section in a template, enclose it within the <HEADER> and </HEADER> tags.
HIDE feature: See 'Hide feature' in DATA section. Also note:
If multiple <HEADER> lines are present, at least one must have the same number of columns as the <DATA> template line.
<HEADER> lines that do not match the number of columns are unaffected by the HIDE feature.
DATA
Each data source entry creates a data row for each template line in the section. All entries are output for the first template line, then the next template line runs, and so on.
The data section allows tokens and text entries. All tokens are supported.
Note the following:
Duplicated rows are eliminated, if present. A row is considered duplicated if its content (after all variables and placeholders have been replaced with their corresponding values) is identical to a previous row. Tokens must therefore provide distinctive enough data (ie, something more than just CONTAINER.NAME) if all of the input-output entry pairs are desired in the generated file.
By default, the script processes only sample entries. However, there are metadata options that allow inclusion of result files/measurements and exclusion of samples.
Metadata sorting options are applied to this section of the template file only.
By default, pooled artifacts are treated as a single input artifact. They can be demultiplexed using the PROCESS.POOLED.ARTIFACTS metadata element.
If there is at least one token relevant to the step inputs or outputs, this section will produce a row for each PerInput entry in the step input-output-map. If no PerInput entries are present in the step input-output-map, the script will attempt to add data rows for PerAllInputs entries.
Input and output artifacts are always loaded if a <DATA> section is present in the template file, due to the need to determine what type of artifacts the script is dealing with.
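The duplicate-row rule can be illustrated with a small Python sketch. It assumes each row has already been rendered to its final text; the function name and sample rows are hypothetical and only model the described behavior.

```python
def drop_duplicate_rows(rendered_rows):
    """Keep the first occurrence of each fully rendered row, in order."""
    seen = set()
    unique = []
    for row in rendered_rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Three samples on one plate, template line uses only the container name:
name_only = ["Plate1", "Plate1", "Plate1"]
# Same samples, template line also includes the well location:
name_and_well = ["Plate1,A:1", "Plate1,B:1", "Plate1,C:1"]

print(drop_duplicate_rows(name_only))
print(drop_duplicate_rows(name_and_well))
```

The first call collapses to a single row, while the second keeps all three, which is why the template line needs tokens that distinguish each input-output pair.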
To include a data section in a template, enclose it within the <DATA> and </DATA> tags.
HIDE feature: If the token in a given column is empty for all lines and that token is part of a HIDE statement, that column (including the matching <HEADER> columns) will be removed entirely. There can only be one <DATA> template line present when using the HIDE feature. See Using HIDE to Exclude Empty Columns and Using HIDE to Exclude Empty HEADER rows examples.
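As a rough model of the column-level HIDE behavior, the Python sketch below removes a column only when its token is listed for hiding and every data row is empty in that column. The function and variable names are hypothetical, and the header is assumed to have the same number of columns as the single data template line.

```python
def hide_empty_columns(column_tokens, header, rows, hidden_tokens):
    """Drop columns whose token is hidden and whose value is empty in every row."""
    keep = [
        index
        for index, token in enumerate(column_tokens)
        if not (token in hidden_tokens and all(row[index] == "" for row in rows))
    ]
    trimmed_header = [header[i] for i in keep]
    trimmed_rows = [[row[i] for i in keep] for row in rows]
    return trimmed_header, trimmed_rows


column_tokens = ["INPUT.NAME", "OUTPUT.UDF.SAMPLEUDF"]
header = ["Sample Name", "Sample UDF"]
rows = [["Sample1", ""], ["Sample2", ""]]

trimmed_header, trimmed_rows = hide_empty_columns(
    column_tokens, header, rows, hidden_tokens={"OUTPUT.UDF.SAMPLEUDF"}
)
print(trimmed_header)
print(trimmed_rows)
```

If even one row had a value for the hidden token, the column and its header entry would be kept.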
FOOTER
The content of this section can only include plain text and is output as is. Tokens are not supported.
To include a footer section in a template, enclose it within the <FOOTER> and </FOOTER> tags.
PLACEMENT
This section contains Groovy code that controls the formatting of PLACEMENT tokens (see the PLACEMENT tokens in Template File Contents article Tokens table).
Within the Groovy code, the following variables are available:
Variable Name
Description
containerTypeNode
The container type holding the derived sample
row
The row part of the derived sample's location
column
The column part of the derived sample's location
Note the following:
The script must return a string, which replaces the corresponding <PLACEMENT> tag in the template.
Logic within the placement tags can be as complex as needed, provided it can be compiled by a Groovy compiler.
If an error occurs while running formatting code, the original location value is used.
To include a placement section in a template, enclose it within the <PLACEMENT> and </PLACEMENT> tags.
Placement Example: Container Type
In the following example:
If the container type is a 96 well plate, sample placement A1 will return as "A_1"
If the container type is not a 96 well plate, sample placement A1 will return as "A:1"
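The decision described by these two cases is sketched below in Python, purely to show the logic. The real placement section is written in Groovy inside the template and receives containerTypeNode, row, and column; here a plain container type name string is assumed in place of containerTypeNode, and the function name is hypothetical.

```python
def format_placement(container_type_name, row, column):
    """Return the placement string for one derived sample location."""
    if container_type_name == "96 well plate":
        return f"{row}_{column}"
    return f"{row}:{column}"


print(format_placement("96 well plate", "A", "1"))
print(format_placement("Tube", "A", "1"))
```

As in the template, the function returns a string, which is what replaces the placement value in the generated file.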
Placement Example: Zero Padding
TOKEN FORMAT
This section defines logic to be applied to specific tokens to change the format in which they appear in the generated file.
Special formatting rules can be defined per token using the following Groovy syntax:
Within the Groovy code, the variable 'token' refers to the original value being transformed by the formatting code. The logic replaces all instances of that token with the result.
${token.identifier} marks the beginning of the token formatting code and the end of the previous token formatting code (if any).
You can define multiple formatting logic rules for a given token, by assigning a name to the formatting section (named formatters are called 'variations'). This is done by appending “##” after the token name (eg “${token.identifier##formatterName}”).
Using the named formatter syntax without giving a name (“${token.identifier##}”) will abort the file generation.
If an error occurs while running formatting code, the resulting value will be blank.
If a named formatter is used but not defined, the value is used as is.
To include a token format section in a template, enclose it within the <TOKEN_FORMAT> and </TOKEN_FORMAT> tags.
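The fallback rules for token formatters can be summarized with a short Python model. It is a sketch under stated assumptions: formatters are represented as a dictionary of Python callables keyed by the token reference (with or without a "##name" suffix), which is not how the Groovy-based template stores them, and the formatter bodies are invented for illustration.

```python
def apply_token_format(reference, value, formatters):
    """Resolve one token reference such as 'TOKEN' or 'TOKEN##variation'."""
    _token, separator, variation = reference.partition("##")
    if separator and not variation:
        # Named formatter syntax without a name aborts file generation.
        raise ValueError(f"Formatter name missing in {reference!r}")
    formatter = formatters.get(reference)
    if formatter is None:
        # No formatter (or an undefined named formatter): value is used as is.
        return value
    try:
        return str(formatter(value))
    except Exception:
        # An error in formatting code yields a blank value.
        return ""


formatters = {
    "INPUT.CONTAINER.NAME": lambda token: token + "-PlateName",
    "INPUT.CONTAINER.NAME##short": lambda token: token[:4],
    "PROCESS.TECHNICIAN": lambda token: token[100],  # deliberately fails
}

print(apply_token_format("INPUT.CONTAINER.NAME", "Plate1", formatters))
print(apply_token_format("INPUT.CONTAINER.NAME##short", "Plate1", formatters))
print(apply_token_format("INPUT.CONTAINER.NAME##undefined", "Plate1", formatters))
```

The failing PROCESS.TECHNICIAN formatter returns an empty string for any short name, and a reference ending in "##" raises an error instead of producing output.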
TOKEN FORMAT Example: Technician Name
In this example, a custom format is defined for displaying the name of the technician who ran a process (step).
The name of the token appears at the beginning of the Groovy code that will then be applied. In this code, the variable 'token' refers to the token being affected. The return value is what will replace all instances of this token in the file.
TOKEN FORMAT Example: Appending a String to Container Name or Sample Name
In this second example, when special formatting is required for two tokens, the logic for both appears inside the same set of tags.
The example appends a string to the end of the input container name or a prefix to the beginning of the submitted sample name.
Metadata
Metadata provides information about the template file that is not retrieved from the API — such as the file output directory to use, and how the data contents should be grouped and sorted.
Metadata is not strictly confined to a section, and is not designated by opening and closing tags. However, each metadata entry must be on a separate line.
Metadata entries can be anywhere in the template, but the recommended best practice is to group them either at the top or the bottom of the file.
For a list of supported metadata elements, rules for using them, and examples, see Template File Contents, Metadata section.
Sorting Logic
Sorting in the generated file is done either alphanumerically or by vertical placement information, using the SORT.BY. and SORT.VERTICAL metadata elements.
Sort keys are provided to SORT.BY. as one or more ${token} values. Choose a combination of keys that always produces a unique value for each row in the file. For example, sorting by just OUTPUT.CONTAINER.NAME would work for samples placed in tubes, but would not work for samples in 96 well plates. Sorting behavior on nonunique combinations is not guaranteed to be predictable.
To sort vertically:
Include the SORT.VERTICAL metadata element in the template file. In addition, the SORT.BY.${token}, ${token} metadata must also be included, as follows:
Any SORT.BY. tokens will be sorted using the vertical sorter instead of the alphanumeric sort.
To apply sorting to samples in 96 well plates:
You could narrow the sort key to a unique combination such as:
See also SORT.VERTICAL and SORT.BY. in the Template File Contents article.
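To show why a unique key combination matters, the Python sketch below sorts plate samples by container name plus well location. It assumes that vertical sorting means ordering column by column (A:1, B:1, then A:2, B:2); check that assumption against the Template File Contents article. The dictionary keys and the "row:column" well format are illustrative only.

```python
entries = [
    {"container": "Plate1", "well": "B:1"},
    {"container": "Plate1", "well": "A:2"},
    {"container": "Plate1", "well": "A:1"},
    {"container": "Plate1", "well": "B:2"},
]


def alphanumeric_key(entry):
    # Container name alone is not unique on a plate, so add the well.
    return (entry["container"], entry["well"])


def vertical_key(entry):
    # Assumed vertical order: walk down each column before moving to the next.
    row, column = entry["well"].split(":")
    return (entry["container"], int(column), row)


by_row = [e["well"] for e in sorted(entries, key=alphanumeric_key)]
by_column = [e["well"] for e in sorted(entries, key=vertical_key)]
print(by_row)
print(by_column)
```

Both keys are unique per sample, so the resulting order does not depend on the order in which the entries were read.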
Rules and Constraints
The template must adhere to the following rules:
Metadata entries must each appear on a new line and be the only entry on that line.
Metadata entries must not appear inside tags.
Opening and closing section tags must appear on a new line and as the only entry on that line.
Each opened tag must be closed, otherwise it is skipped by the script.
Any sections (opening tag + closing tag combination) can be omitted from the template file.
Entries that are separated by commas in the template will be delimited by the metadata-specified separator (default: COMMA) in the generated file.
White space is allowed in the template. However, if there is a blank line inside a tag, it will also be present in the generated file.
If an entry in the template is enclosed in double quotes, it will be imported as a single entry and written to the generated file as such, even if it has commas inside.
To include double-quotes or single-quotes in the generated file, use the escape character. Example: \" or \'
To include an escape character in the generated file, use two escape characters inside double-quotes. For example, if you want to see \\Share\Folder\Filename.txt use "\\\\Share\\Folder\\Filename.txt" as the token.
If any of the following conditions is not met, the tag and everything inside it are ignored by the script, and a warning is written to the log file:
Except for the metadata, all template sections must be enclosed inside tags.
Each tag must have its own line, and must be the only tag present on that line.
No other entries, even empty ones, are allowed.
All opened tags must be closed.
Custom field names must not contain periods.
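A quick pre-check for the tag rules can be written in Python. This is a hypothetical helper, not part of the Template File Generator: it treats a line as a section tag only when the tag is the sole content of the line, and reports sections that are opened but never closed. The template text is a made-up example.

```python
import re

# A tag counts only if it is the entire line, for example <DATA> or </DATA>.
TAG_LINE = re.compile(r"^<(/?)([A-Za-z_]+)>$")


def find_unclosed_sections(template_text):
    """Return the names of sections that were opened but never closed."""
    open_sections = []
    for line in template_text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue
        closing, name = match.groups()
        if closing:
            if name in open_sections:
                open_sections.remove(name)
        else:
            open_sections.append(name)
    return open_sections


template = """<HEADER>
Sample Name, Container
</HEADER>
<DATA>
${INPUT.NAME}, ${INPUT.CONTAINER.NAME}
"""

print(find_unclosed_sections(template))
```

Here the DATA section is reported because its closing tag is missing, which is the kind of section the script would skip.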
Examples
Illumina Instrument Sample Sheets
The LIMS provides configuration to support generation of sample sheets that are compatible with some Illumina instruments. For details, see the Illumina Instrument Sample Sheets documentation.
Generating Sample Sheets for QC Instruments
The LIMS provides configured automations that generate sample sheets compatible with a number of QC instruments. The default automation command lines are provided below.
Renaming Generated Files
In the template file, the following OUTPUT.FILE.NAME metadata element names the generated file 'NewTemplateFileName':
In the automation command line, the following will attach the generated file to the {compoundOutputFileLuid0} placeholder, with the name defined by the OUTPUT.FILE.NAME metadata element.
ℹ️ When the LIMS attaches a file to a placeholder in the LIMS, it assumes that the file is named with the step LIMSID, and uses this LIMSID to identify the placeholder to which the file should be attached. However, when using OUTPUT.FILE.NAME, you can give the file a name that does not begin with the LIMSID of the placeholder to which it will be attached. To do this, you must use the quickAttach and destLIMSID parameters in the automation command line.
If the quickAttach parameter is provided without destLIMSID parameter, the script logs an error and stops execution.
If destLIMSID is provided without using quickAttach, it is ignored.
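The interaction between the two parameters can be expressed as a small Python check. This models only the rules stated above; the function name and the example LIMS ID are hypothetical.

```python
def resolve_attach_target(quick_attach, dest_limsid=None):
    """Return the LIMS ID to attach to, or None when nothing is quick-attached."""
    if not quick_attach:
        # destLIMSID without quickAttach is ignored.
        return None
    if not dest_limsid:
        # quickAttach without destLIMSID is an error that stops execution.
        raise ValueError("quickAttach requires destLIMSID")
    return dest_limsid


print(resolve_attach_target(True, "92-1234"))
print(resolve_attach_target(False, "92-1234"))
```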
Using Token Values In File Names
The OUTPUT.FILE.NAME and OUTPUT.TARGET.DIR metadata elements support token values. This allows you to name files based on input / output values of the step - the input or output container name, for example.
The following tokens are supported for this feature:
PROCESS.LIMSID
PROCESS.UDF.<UDF NAME>
PROCESS.TECHNICIAN
DATE
INPUT.CONTAINER.NAME
INPUT.CONTAINER.TYPE
INPUT.CONTAINER.LIMSID
OUTPUT.CONTAINER.NAME
OUTPUT.CONTAINER.TYPE
OUTPUT.CONTAINER.LIMSID
Rules and Constraints
When using token values in file names, the following rules and constraints apply:
Container-related functions will return the value from a single container, even if there are multiple containers.
Other tokens will function, but will only return the value for the first row of the file (first input or output).
If the OUTPUT.FILE.NAME specified does not match the LIMS ID of the file, the output file will not be attached in the LIMS user interface. To ensure that the file is attached, include the quickAttach and destLIMSID parameters in the command-line string.
It is highly recommended that you do not use SAMPLE.PROJECT.NAME.ALL or SAMPLE.PROJECT.CONTACT.ALL, because the result is prone to surpassing the maximum length of a file name. There are similar issues with other SAMPLE tokens when dealing with pools.
Only the following characters are supported in the file name. Any other characters will be replaced by an _ (underscore) by default. This replacement character can be configured with the OUTPUT.FILE.NAME.ILLEGAL.CHARACTER.REPLACEMENT metadata element.
a-z
A-Z
0–9
_ (underscore)
- (dash)
. (period)
⚠️ Providing a full file path for OUTPUT.FILE.NAME is still supported, but deprecated. If the full path is provided, the file/directory separator will be automatically detected and will not be replaced in the static parts of the file name. Any of these separators derived from the result of a token value will be replaced.
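The character rule can be approximated with a regular expression. The Python sketch below is an illustration of the rule, not the script's code: it assumes the name contains no directory part, keeps only the supported characters, and substitutes everything else with a configurable replacement (underscore by default).

```python
import re

UNSUPPORTED = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_file_name(name, replacement="_"):
    """Replace every character outside a-z, A-Z, 0-9, '_', '-', '.'."""
    return UNSUPPORTED.sub(lambda _match: replacement, name)


print(sanitize_file_name("Plate 1/Run#2.csv"))
print(sanitize_file_name("Plate 1/Run#2.csv", replacement="-"))
```

The second call mirrors what configuring a different replacement character would do.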
Defining a Project Name for Control Samples
You can use the CONTROL.SAMPLE.DEFAULT.PROJECT.NAME metadata element to define a project name for control samples. The value specified by this token will be used when determining one or more values for the SAMPLE.PROJECT.NAME and SAMPLE.PROJECT.NAME.ALL tokens.
Example:
Rules and Constraints
If the token is found in the template but has no value, then no project name will be given for control samples.
If the token is not found in the template, then no project name will be given for control samples.
If multiple values are provided, the first one will be used.
The SAMPLE.PROJECT.NAME.ALL list will include the control project name.
Using HIDE to Exclude Empty Columns
You can use the HIDE metadata element to optionally hide a column if it contains no data. The following lines in the metadata will hide a data column when empty:
Assuming ${OUTPUT.UDF.SAMPLEUDF} is one of the data columns specified in the template, then that column will be hidden whenever there is no data to show in the output file. If a list of fields is provided, then any empty ones will be hidden:
You may also hide only one representation of a specific column or field:
Using HIDE to Exclude Empty HEADER rows
You can also use the HIDE metadata element with tokens in the header section. If one or more tokens are used for a header key value pair, and there are no values for any of the tokens, the entire row will be hidden.
Assuming ${OUTPUT.UDF.SAMPLEUDF} is one of the rows specified in the template header section, that header row will be hidden whenever there is no data to display in the output file.
If a list of tokens is provided for the value, the row will only be shown if one or more of the tokens resolves to a value:
Generating Multiple Files
If you would like to generate multiple files, you can use the following GROUP.FILES.BY metadata elements:
GROUP.FILES.BY.INPUT.CONTAINERS
GROUP.FILES.BY.OUTPUT.CONTAINERS
These elements allow a file to be created per instance of the specified element in the step, for example, one file per input or per output container. Step level information appears in all files, but sample information is specific to the samples in the given container.
For example, suppose that a step has two samples - each in their own container - with a template file calling for information about process UDFs and sample names. Using this metadata will produce two files, each of which will contain:
One sample entry
The same process UDF information
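The two-sample scenario can be modeled in Python as a simple grouping step. The data structures, key names, and the "Run Name" UDF are hypothetical; the sketch only shows that step-level information is repeated while sample rows are split per container.

```python
from collections import defaultdict


def build_files_per_container(step_info, samples, container_key):
    """Group sample names by container and pair each group with the step info."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample[container_key]].append(sample["name"])
    return {
        container: {"step": step_info, "samples": names}
        for container, names in grouped.items()
    }


step_info = {"Run Name": "Run42"}  # hypothetical process UDF
samples = [
    {"name": "Sample1", "input_container": "Tube1"},
    {"name": "Sample2", "input_container": "Tube2"},
]

files = build_files_per_container(step_info, samples, "input_container")
for container, content in files.items():
    print(container, content)
```

Each of the two resulting entries holds one sample and the same step information.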
As a best practice, we recommend storing a copy of generated files in the LIMS. To do this, you must use the quickAttach script parameter. This parameter must be used with the destLIMSID parameter, which tells the Template File Generator script which file placeholder to use. (For details, see Script Parameters.)
Naming The Files
When generating multiple files, the script gathers them all into one zip file so only one file placeholder is needed regardless of how many containers are in the step.
The zip file name may be provided in the metadata as follows:
Inside the zip file, include any paths specified for where files should be written. An example final structure inside the zip, where the subfolders are specified using the container name token, could be as follows:
The file naming, writing, and uploading process works as follows:
The outputPath parameter is required for the script. You can use this parameter to specify the path to which the generated files will be written and/or the name to use for the file. Use this in the following scenarios:
When the target path/name is constant OR
When the target path/name includes something that can only be passed to the script via the command line - for example, if you want to include the value of a {compoundOutputFileLuidN} in the path.
The OUTPUT.TARGET.DIR metadata element overrides any path provided by outputPath, but does not change the file name. Use this:
When the target path includes something that can only be accessed with token templates - for example, the name of the user who ran the step.
The OUTPUT.FILE.NAME metadata element overrides any value provided by outputPath entirely. This token determines the name of the files that are produced for each container - for example, SampleSheet.csv. It may also contain tokens to access information, such as the container name, and it may also contain a path.
If you provide all three of outputPath, OUTPUT.TARGET.DIR, and OUTPUT.FILE.NAME, the result is that outputPath is ignored and the path specified by OUTPUT.TARGET.DIR is used as the parent under which OUTPUT.FILE.NAME is created, even if OUTPUT.FILE.NAME includes a path in addition to the file name.
If you wish to only attach files to placeholders in the LIMS and do not wish to also write anything to disk, then omit OUTPUT.TARGET.DIR and provide the outputPath parameter value as ".". This will cause files to only be written to the temporary directory that is cleaned up after the automation completes.
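The precedence between outputPath, OUTPUT.TARGET.DIR, and OUTPUT.FILE.NAME is sketched below in Python. It is a simplified model with assumptions: token resolution is already done, POSIX-style paths are used, a leading slash on the file name is dropped when a target directory is given, and when only OUTPUT.FILE.NAME is set its value is returned unchanged. The function name and paths are hypothetical.

```python
import posixpath


def resolve_output_path(output_path, target_dir=None, file_name=None):
    """Combine the three settings according to the precedence described above."""
    if target_dir and file_name:
        # outputPath is ignored; the file name (and any path in it) goes under target_dir.
        return posixpath.join(target_dir, file_name.lstrip("/"))
    if file_name:
        # OUTPUT.FILE.NAME overrides outputPath entirely.
        return file_name
    if target_dir:
        # OUTPUT.TARGET.DIR overrides the path but keeps the file name from outputPath.
        return posixpath.join(target_dir, posixpath.basename(output_path))
    return output_path


print(resolve_output_path("/tmp/out.csv"))
print(resolve_output_path("/tmp/out.csv", target_dir="/data/run1"))
print(resolve_output_path("/tmp/out.csv", target_dir="/data/run1", file_name="sub/SampleSheet.csv"))
```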
To produce the example of MyZip.zip, you could use the following:
Script parameters:
Template:
Rules and Constraints
You can only use one GROUP.FILES.BY metadata element in each template file.
To attach the files in the LIMS as a zip file, you must provide the quickAttach parameter along with the destLIMSID.
The zip file name may optionally be specified with the GROUP.FILES.BY metadata.
If quickAttach is used and no zip name is specified in the template, the zip will be named using the destLIMSID parameter value.
The zip file name, file paths, and file names should not contain characters that are illegal for directories and files on the target operating system. Illegal characters will be replaced with underscores.
If a file name is not unique to the target directory, e.g., if multiple SampleSheet.csv files are being written to /my/target/path, an error will be thrown and no files written.
When specifying the OUTPUT.TARGET.DIR metadata element, if a token is used that may resolve to multiple values for a single path (for example, using INPUT.NAME in the path when it will resolve to multiple sample names), one value will be chosen arbitrarily for the path. For example, you may end up with /Container1/Sample1/myfile.csv when there are two samples in the container.
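The uniqueness constraint can be checked before anything is written. The Python sketch below is a hypothetical pre-flight check that models the all-or-nothing behavior: if any resolved target path occurs more than once, it raises an error and returns nothing.

```python
from collections import Counter


def check_unique_targets(paths):
    """Raise if any target path is repeated; otherwise return the paths unchanged."""
    duplicates = sorted(path for path, count in Counter(paths).items() if count > 1)
    if duplicates:
        raise ValueError(f"Duplicate target files: {', '.join(duplicates)}")
    return list(paths)


per_container = [
    "/my/target/path/Container1/SampleSheet.csv",
    "/my/target/path/Container2/SampleSheet.csv",
]
print(check_unique_targets(per_container))
```

Writing every SampleSheet.csv directly to /my/target/path would fail this check, whereas including the container name in the path keeps each target unique.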