Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
Empty Form
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
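As a rough illustration, an empty form might look like the skeleton below. This is a sketch rather than the exact template shown in the editor: the pd prefix matches the examples in this section, but the root element name, its code and version attributes, and the namespace URI are assumptions that should be checked against the input form schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed root element and namespace URI; verify both against the input form schema -->
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <!-- File and folder inputs go here, one pd:dataInput per input -->
    <pd:dataInputs/>
    <!-- Settings go here, nested as step > tool > parameter -->
    <pd:steps/>
</pd:pipeline>
```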
The input files are specified within a single DataInputs node. Each individual input is then specified in a separate DataInput node. A DataInput node contains the following attributes:
code: a unique ID. Required.
format: the format of the input, such as FASTA, TXT, JSON, or UNKNOWN. Multiple comma-separated entries are allowed (see the single file example below). Required.
type: either FILE or DIRECTORY. Multiple entries are not allowed. Required.
required: whether this input is required to run the pipeline. Required.
multiValue: whether multiple files are allowed as input. Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
Single file input
The following example defines a single file input that can be in TXT, CSV, or FASTA format.
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
    <pd:label>Input file</pd:label>
    <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
Folder as an input
To use a folder as an input the following form is required:
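A minimal sketch of a folder input, built only from the attributes described above, could look like this. The code value, label, and description are illustrative, and using UNKNOWN as the format for a folder is an assumption.

```xml
<!-- code, label, and description are illustrative; format="UNKNOWN" is an assumption -->
<pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="true" multiValue="false">
    <pd:label>FASTQ folder</pd:label>
    <pd:description>Folder containing the input files.</pd:description>
</pd:dataInput>
```

The only structural difference from a file input is type="DIRECTORY".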
For multiple files, set the multiValue attribute to true. The variable is then treated as a list ([]), so adapt your pipeline when switching from a single value to multiValue.
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in their filenames to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
Settings
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including, but not limited to, strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:
code: a unique ID. This is the parameter name that is passed to the workflow.
minValues: the minimum number of values that must be specified for this setting. If this setting is required, set minValues to 1.
maxValues: the maximum number of values that can be specified for this setting.
classification: whether this setting is specified by the user.
In the code below a string setting with the identifier inp1 is specified.
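A sketch of such a setting, following the steps > step > tool > parameter hierarchy, is given here. The parameter attributes come from the list above; the stringType element is assumed by analogy with the integerType and optionsType elements used elsewhere in this section, and the step and tool attributes (execution, code) and their labels are illustrative assumptions.

```xml
<pd:steps>
    <!-- The step and tool attributes (execution, code) are assumptions -->
    <pd:step execution="MANDATORY" code="general">
        <pd:label>General</pd:label>
        <pd:description>General settings</pd:description>
        <pd:tool code="generalparameters">
            <pd:label>General parameters</pd:label>
            <pd:description>Settings passed to the workflow</pd:description>
            <!-- minValues="1" makes the setting required -->
            <pd:parameter code="inp1" minValues="1" maxValues="1" classification="USER">
                <pd:label>inp1</pd:label>
                <pd:description>A free-text string setting.</pd:description>
                <!-- stringType is assumed by analogy with integerType and optionsType -->
                <pd:stringType/>
                <pd:value></pd:value>
            </pd:parameter>
        </pd:tool>
    </pd:step>
</pd:steps>
```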
The subsequent sections show examples of each setting type. Within each type, the value tag can denote a default value in the UI, or can be left blank for no default. Note that a default value has no effect on analyses launched via the API.
Integers
For an integer setting, use the integerType element. To define an allowed range, use the minimumValue and maximumValue attributes.
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>
Options
The optionsType element designates options in a drop-down list in the UI. The selected option is passed to the workflow as a string. This currently has no effect when launching from the API.
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <option>CBS</option>
        <option>SLM</option>
        <option>HSLM</option>
        <option>ASLM</option>
    </pd:optionsType>
    <pd:value>false</pd:value>
</pd:parameter>
Option types can also be used to specify a boolean.
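One way to read this, assuming the boolean is modeled as a drop-down with exactly two entries, is sketched here. The parameter name enable_feature, its label, and its description are hypothetical; the option tags mirror the form used in the segmentation example.

```xml
<!-- enable_feature is a hypothetical parameter name -->
<pd:parameter code="enable_feature" minValues="0" maxValues="1" classification="USER">
    <pd:label>Enable feature</pd:label>
    <pd:description>Choose true to enable the feature, false to disable it.</pd:description>
    <pd:optionsType>
        <option>true</option>
        <option>false</option>
    </pd:optionsType>
    <!-- Default shown in the UI -->
    <pd:value>false</pd:value>
</pd:parameter>
```

Because the selected option is passed to the workflow as a string, the pipeline may receive the text true or false rather than a native boolean and should handle it accordingly.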
One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for File input and a second for String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated, so this check must be done within the pipeline itself.
Below are both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it is used. If the str parameter is set but file is not, the str parameter is used. If neither is set, the pipeline aborts with an informative error message.
nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    file(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    }
}
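A matching XML configuration could be sketched as follows. It assumes that the code values file and str map to params.file and params.str in main.nf. The TXT format, the step and tool attributes, the labels, and the stringType element (inferred by analogy with integerType and optionsType) are all assumptions. Both inputs are optional (required="false" and minValues="0"), which is why main.nf performs the check itself.

```xml
<pd:dataInputs>
    <!-- Optional file input; code="file" is assumed to map to params.file -->
    <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
        <pd:label>Input file</pd:label>
        <pd:description>Optional input file. Takes precedence over the string input.</pd:description>
    </pd:dataInput>
</pd:dataInputs>
<pd:steps>
    <!-- The step and tool attributes (execution, code) are assumptions -->
    <pd:step execution="MANDATORY" code="general">
        <pd:label>General</pd:label>
        <pd:description>General settings</pd:description>
        <pd:tool code="generalparameters">
            <pd:label>General parameters</pd:label>
            <pd:description>Settings passed to the workflow</pd:description>
            <!-- Optional string input; code="str" is assumed to map to params.str -->
            <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                <pd:label>Input string</pd:label>
                <pd:description>Optional string input, used only if no file is given.</pd:description>
                <!-- stringType is an assumed element name -->
                <pd:stringType/>
                <pd:value></pd:value>
            </pd:parameter>
        </pd:tool>
    </pd:step>
</pd:steps>
```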