
Sun Grid Engine (SGE) on ICA Bench

Running Jobs in a Bench SGE Cluster

Once a cluster is started, the cluster manager can be accessed from the workspace node.

Job resources

Every cluster member has a certain capacity, which is determined by the resource model selected for the cluster member.

The following complex values have been added to the SGE cluster environment and are requestable.

  • static_cores (default: 1)

  • static_mem (default: 2G)

These values are used to avoid oversubscribing a node, which can result in out-of-memory errors or unresponsiveness. You need to ensure these limits are not exceeded.
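The complex definitions can be inspected from the workspace node; a quick check (assuming standard SGE tooling is on the PATH) might be:

    # List the requestable complex values and filter for the Bench-specific ones
    qconf -sc | grep static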

To ensure stability of the system, some headroom is deducted from the total node capacity.

Scaling

These two values are used by the SGE auto scaler when running in dynamic mode. The SGE auto scaler sums the requested resources of all pending jobs to determine the scale-up/down operation within the defined range.

Cluster members remain in the cluster for at least 300 seconds. The auto scaler executes only one scale-up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.

Warning: Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will wait indefinitely.

The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log.
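For example, you can watch the scaler react to queued work by listing pending jobs and following its log (a sketch, run from the workspace node):

    # Show jobs that are still waiting to be scheduled
    qstat -u '*' -s p
    # Follow the auto scaler decisions as they happen
    tail -f /data/logs/sge-scaler.log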

Submitting jobs

Submitting a single job:

    qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh

Submitting a job array:

    qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
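For illustration, a hypothetical array-job script can use the SGE_TASK_ID environment variable that SGE sets for each task (paths and file names are illustrative):

    #!/bin/bash
    # /data/myscript.sh -- each array task processes its own input file
    # SGE_TASK_ID runs from 1 to 100 in the qsub example above
    INPUT="/data/inputs/sample_${SGE_TASK_ID}.fastq.gz"
    echo "Processing ${INPUT} on $(hostname)"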

Note: Do not limit the job concurrency amount, as this will result in unused cluster members.

Monitoring members

Listing all members of the cluster:

    qhost

Managing running/pending jobs

Listing all jobs in the cluster:

    qstat -f

Showing the details of a job:

    qstat -f -j <jobId>

Deleting a job:

    qdel <jobId>

Managing executed jobs

Showing the details of an executed job:

    qacct -j <jobId>

SGE Reference documentation

SGE command line options and configuration details can be found in the upstream SGE documentation.

JupyterLab

Bench workspaces require setting a Docker image to use as the image for the workspace. Illumina Connected Analytics (ICA) provides a default Docker image with JupyterLab installed.

JupyterLab supports Jupyter Notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.

The JupyterLab Docker image contains the following environment variables:

• ICA_URL: https://ica.illumina.com/ica (ICA server URL)

• ICA_PROJECT (Obsolete): ICA project ID

• ICA_PROJECT_UUID: Current ICA project UUID

• ICA_SNOWFLAKE_ACCOUNT: ICA Snowflake (Base) Account ID

• ICA_SNOWFLAKE_DATABASE: ICA Snowflake (Base) Database ID

• ICA_PROJECT_TENANT_NAME: Name of the owning tenant of the project where the workspace is created.

• ICA_STARTING_USER_TENANT_NAME: Name of the tenant of the user which last started the workspace.

• ICA_COHORTS_URL: URL of the Cohorts web application used to support the Cohorts view.

Note: To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under Projects > your_project > Data.
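You can verify these variables from a terminal in the workspace; a quick check might look like this:

    # List the ICA-provided environment variables
    env | grep '^ICA_'
    echo "Current project: ${ICA_PROJECT_UUID}"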

ICA Python Library

Included in the default JupyterLab Docker image is a Python library with APIs to perform actions in ICA, such as adding data, launching pipelines, and operating on Base tables. The Python library is generated from the ICA Open API specification using openapi-generator.

The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab Docker image.
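From a workspace terminal you can browse that bundled documentation, for example:

    # Peek at the generated API documentation shipped with the image
    ls /etc/ica/data/ica_v2_api_docs | head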

See the Bench ICA Python Library Tutorial for examples on using the ICA Python library.

Bench

ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a Docker image with access to the data and pipelines within a project. This workspace runs on the Amazon S3 system and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when not in use.

Access

Having access to Bench depends on the following conditions:

• Bench needs to be included in your ICA subscription.

• The project owner needs to enable Bench for their project.

• Individual users of that project need to be given access to Bench.

Enabling Bench for your project

After creating a project, go to the Projects > your_project > Bench > Workspaces page and click the Enable button. The entitlements you have determine the available resources for your Bench workspaces. If you have multiple entitlements, all the resources of your individual entitlements are taken into account. Once Bench is enabled, users with matching permissions have access to the Bench module in that project.

Note: If you do not see the Enable button for Bench, then either your tenant subscription does not include Bench or the tenant to which you belong is not the one where the project was created. Users from other tenants can create workspaces in Bench once Bench is enabled, but they cannot enable the Bench module itself.

Setting user-level access

    Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.

    • Tenant administrators and project owners are always able to access Bench and perform all actions.

    • The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.

      • No Access means you have no access to the Bench workspace for that project.

      • Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.

      • Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.

    • Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:

      • Upload / Download rights (Download rights are mandatory for technical reasons)

• Project Level (No Access / Data Provider / Viewer / Contributor)

      • Flow (No Access / Viewer / Contributor)

      • Base (No Access / Viewer / Contributor)

    Flow diagram of access to Bench

    Workspaces

    The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.

You can create multiple workspaces within a project. Each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity where files and results can be temporarily stored, and from which they can be exported for permanent storage in a project. The storage capacity can range from 1 GB to 16 TB.

For each workspace, you can see the status by its color.

Warning: Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.

You can see the remaining time until the next event (shutdown or restart) in the workspaces overview and on the workspace details.

Create Workspace

Note: If this is the first time you are using a workspace in a project, click Enable to create new Bench workspaces.

In order to use Bench, you first need to have a workspace. The workspace determines which Docker image will be used, with which node and storage size.

    1. Click Projects > Your_Project > Bench > Workspaces > + Create Workspace

    2. Complete the following fields and save the changes.

Note: (*1) URLs must comply with the following rules:

    • URLs can be between 1 and 263 characters including dot (.).

    • URLs can begin with a leading dot (.).

    • Domain and Sub-domains:

      • Can include alphanumeric characters (Letters A-Z and digits 0-9). Case insensitive.

      • Can contain hyphens (-) and underscores (_), but not as a first or last character.

    • Dot (.) must be placed after a domain or sub-domain.

    • If you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.

• Regex for URL: [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

Warning: (*2) When you grant workspace access to multiple users, you need to provide an API key to access the workspace. Authenticate using the icav2 config set command; the CLI will prompt for an x-api-key value. Enter the API key generated from the product dashboard. See here for more information.
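A first-time session might look like the sketch below (prompts and defaults vary by CLI version; the key value is illustrative):

    $ icav2 config set
    ...
    x-api-key : <paste the API key generated from the product dashboard>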

Example URLs

    The following are example URLs which will be considered valid.

example.com
www.example.com
https://www.example.com
subdomain.example.com
subdomain.example.com/folder
subdomain.example.com/folder/subfolder
sub-domain.example.com
sub_domain.example.com
example.co.uk
subdomain.example.co.uk
sub-domain.example.co.uk

Below is an example data science-specific whitelist compatible with restricted Bench workspaces. The first two URLs (pypi.org and files.pythonhosted.org) are required to allow Python pip installs:

pypi.org
files.pythonhosted.org
repo.anaconda.com
conda.anaconda.org
github.com
cran.r-project.org
bioconductor.org
www.npmjs.com
mvnrepository.com

    The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.

Workspace permissions

    When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.

Administrator vs Contributor

    • Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.

    • Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.

Setting Workspace Permissions

The teams setting determines if someone is an administrator or contributor, while the dedicated permissions you set on the workspace level indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter and use this workspace.

Note: For security reasons, the Tenant administrator and Project owner can always access the workspace.

Note: If one of your permissions is not high enough as a Bench contributor, you will see the message "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".

    The permissions that a Bench workspace can receive are the following:

    • Upload rights

    • Download rights (required)

• Project (No Access - Data Provider - Viewer - Contributor)

    • Flow (No Access - Viewer - Contributor)

    • Base (No Access - Viewer - Contributor)

    Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.

If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in the error RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces: it limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.

Note: Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.

Delete workspace (Bench Administrators Only)

    To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.

The workspace will no longer be accessible, nor will it be shown in the list of workspaces. Its content will be deleted, so if there is any information that should be kept, either put it in a Docker image which you can use as a starting point next time, or export it using the API.

Use workspace

    The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.

Note: As long as the workspace is running, the resources provided for this workspace will be charged.

Start workspace

To start the workspace, follow these steps:

    1. Go to Projects > your_project > Bench > Workspaces > your_workspace > Details

2. Click the Start Workspace button.

3. At the top of the Details tab, the status changes to "Starting". When you click on the >_Access tab, the message "The workspace is starting" appears.

    4. Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.

    You can refresh the workspace status by selecting the round refresh symbol at the top right.

Once a workspace is running, it can be manually stopped, or it will be automatically shut down after the amount of time configured in the Automatic Shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace when you no longer need it, to save costs.

Note: You can edit running workspaces to update the shutdown timer, shutdown reminder, and auto restart reminder.

Note: If you want to open a running workspace in a new tab, select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.

Stop workspace

    When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.

    Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.

Warning: Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.

    The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.

    You can see who is using a workspace in the workspace list view.

Workspace Tabs

Access tab

    Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.

The Docker images provided by Illumina load JupyterLab by default. The image also contains tutorial notebooks that can help you get started. A new terminal can be opened via the Launcher (+ button above the folder structure).

Docker Builds tab (Bench Administrators only)

    To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).

Note: The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image no longer runs correctly. The image does not have access to any underlying parts of the platform, so it cannot harm the platform, but inoperable Bench images will have to be deleted or corrected.

    In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.

• Name: By default, this is the same name as the original image; it is recommended to change the name. The length must be between 1 and 63 characters.

• Version: Required field which can be any value. Change the version for your new image.

• Description: The description for your Docker image (for example, indicating which apps it contains). Replace the description to describe what the image does.

• Code: The Dockerfile commands must be provided in this section. Below the line that says "#Add your commands below.", write the code necessary for running this Docker image.

The first 4 lines of the Dockerfile must NOT be edited. It is not possible to start a Dockerfile with a different FROM directive. The main Dockerfile commands are RUN and COPY. More information on them is available in the official Docker documentation.

    Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.

Tools (Bench Administrators Only)

    From within the workspace it is possible to create a tool from the Docker image.

    1. Click the Manage > Create CWL Tool button in the top right corner of the workspace.

    2. Give the tool a name.

    3. Replace the description of the tool to describe what it does.

    4. Add a version number for the tool.

    5. Click the Docker Build tab.

      • Here the image that accompanies the tool will be created.

      • Change the name for the image.

6. Click the General tab. This tab and all following tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the Tool creation section in the Flow documentation.

7. Click the Save button in the upper right-hand corner to start the build process.

    The building can take a while. When it has completed, the tool will be available in the Tool Repository.

Workspace Data

    To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.

    • For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.

• For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use the CLI command workspace-ctl data create-mount --mode read-write to do so. You cannot have fast read/write access to indexed folders, as the indexing mechanism on those would deteriorate performance.

    Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).

    File Mapping

Activity tab

The last tab of the workspace is the activity tab. This tab shows all actions performed in the workspace, for example the creation, starting, or stopping of the workspace. The activities are shown with their date, the user that performed the action, and a description of the action. This page can be used to check how long the workspace has run.

    In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.


    Run DRAGEN in Bench - Interactive

Introduction

DRAGEN can run in Bench workspaces:

• In either FPGA mode (hardware-accelerated) or software mode when using FPGA instances. This can be useful for comparing the performance gains from hardware acceleration, or for distributing concurrent processes between the FPGA and the CPU.

• In software mode when using non-FPGA instances.

Note: To run DRAGEN in software mode, you need to use the DRAGEN --sw-mode parameter.

The DRAGEN command line parameters that specify the location of the licence file differ between the two modes:

• FPGA mode uses LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"

• Software mode uses LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"

DRAGEN Bench Images

    DRAGEN software is provided in specific Bench images with names starting with Dragen. For example (available versions may vary):

    • Dragen 4.4.1 - Minimal provides DRAGEN 4.4.1 and SSH access

    • Dragen 4.4.6 provides DRAGEN 4.4.6, SSH and JupyterLab.

Prerequisites

Memory

    The instance type is selected during workspace creation (Projects > your_project > Bench > Workspaces). The amount of RAM available on the instance is critical. 256GiB RAM is a safe choice to run DRAGEN in production. All FPGA2 instances offer 256GiB or more of RAM.

When running in software mode, use himem-large (348 GiB RAM) or hicpu-large (144 GiB RAM) instance types to ensure enough RAM is available for your runs.

Note: During pipeline development, when typically using small amounts of data, you can try scaling down instance types to save costs. You can start at hicpu-large and progressively use smaller instances, though you will need at least standard-xlarge. If DRAGEN runs out of available memory, the system is rebooted, losing your currently running commands and interface.

DRAGEN version 4.4.6 and later verify that the system has at least 128 GB of memory available. If not enough memory is available, you will encounter an error stating that the available memory is less than the minimum system memory required (128 GB). This can be overridden with the command line parameter dragen --min-memory 0.

FPGA-mode

Using an fpga2-medium instance type.

Example:

    mkdir /data/demo
    cd /data/demo

    # download ref
    wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
    # => 0.5min

    # Build ht-ref
    mkdir ref
    dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
    # => 6.5min

    # run DRAGEN mapper
    FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz

    # Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering unrecognised option '--validate-pangenome-reference=false'.
    DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"

    # License Parameters
    LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"

    mkdir out
    dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
    # => 1.5min (10 sec if fpga already programmed)

Software-mode

Using a standard-xlarge instance type.

    Software mode is activated with the DRAGEN --sw-mode parameter.

Example:

    mkdir /data/demo
    cd /data/demo

    # download ref
    wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
    # => 0.5min

    # Build ht-ref
    mkdir ref
    dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
    # => 6.5min

    # run DRAGEN mapper
    FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz

    # Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering ERROR: unrecognised option '--validate-pangenome-reference=false'.
    DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"

    # When using DRAGEN 4.4.6 and later, the line above should be extended with --min-memory 0 to skip the memory check.
    DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false --min-memory 0"

    # License Parameters
    LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"

    mkdir out
    dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
    # => 2min

    FUSE Driver

Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. Both read and write are supported, with some limitations on writes that are enforced by the underlying AWS S3 storage.

As a user, you can perform the following actions from Bench (provided your user permissions meet the workspace permissions) or through the CLI:

    • Copy project data

    • Delete project data

    • Mount project data (CLI only)

    • Unmount project data (CLI only)

    When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.

Warning: This is a fully mounted version of the project data. Changes made in the workspace to project data cannot be undone.

Copy project data

    The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
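As a sketch (paths are illustrative), copying in both directions is a regular file copy:

    # Copy a project file to local workspace storage for faster processing
    cp /data/project/samples/sample1.fastq.gz /data/work/
    # Copy results back so they are persisted in the project
    cp /data/work/results.vcf.gz /data/project/results/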

Delete project data

The FUSE driver also allows you to delete data from your project. This differs from earlier Bench behaviour, where you took a local copy and the original file remained in your project.

Warning: Deleting project data through a Bench workspace via the FUSE driver will permanently delete the data in the project. This action cannot be undone.

CLI

Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further action; Mac users need to install the kernel extension from macFUSE.

Note: macOS uses hidden metadata files beginning with ._ which are copied over and exposed during a CLI copy to your project data. These can be safely deleted from your project.

Mounting and unmounting of data needs to be done through the CLI. Within Bench itself this happens automatically, so no manual action is needed there.

Warning: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.

Restrictions

Warning: Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.

Trying to update files or saving your notebook in the project folder will typically result in an error like File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.

    Some examples of other actions or commands that will not work because of the above mentioned limitations:

    • Save a jupyter notebook or R script on the /project location

    • Add/remove a file from an existing zip file

• Redirect with append to an existing file, e.g. echo "This will not work" >> myTextFile.txt

• Rename a file (due to the existing association between ICA and AWS)

• Move files or folders

• Use vi or another editor to edit an existing file in place

A file can be written only sequentially. This restriction comes from the library the FUSE driver uses to store data in AWS: it supports only sequential writes, and random writes are currently not supported. The FUSE driver will detect random writes, and the write will fail with an IO error return code. zip will not work, since zip writes a table of contents at the end of the file; use gzip instead.

Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of data and the listing being up to date. As a result, a file that was just written may temporarily appear as a zero-length file, and a file that was deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.

Jupyter notebooks

    To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
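The same path works from a terminal in the workspace, for example (the file name is illustrative):

    head -n 5 /data/project/mydata/samples.csv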

Old Bench workspaces

    This functionality won't work for old workspaces unless you enable the permissions for that old workspace.

    Containers in Bench

    Bench has the ability to handle containers inside a running workspace. This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.

    Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.

Setup

The Container Service is accessible from your Bench workspace environment by default.

The container service uses the workspace disk to store any container images you pull in or create.

To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.


Container Management

Use either the docker or podman CLI to interact with the Container Service. Both are interchangeable and support all the commonly known standardized operations.

Pulling a Container Image

To run a container, the first step is to either build a container from a source container or pull one in from a registry.

Public Registry

    A public image registry does not require any form of authentication to pull the container layers.

The following command line example shows how to pull a commonly known image:

    # Pull Container image from Dockerhub
    /data $ docker pull alpine:latest

Note: The Container Service pulls images from Dockerhub by default if no registry hostname is defined in the container image URI.

Private Registry

    To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.

The following command line example shows how to instruct the Container Service to log in to the private registry.hub.docker.com registry:

    # Pull a Container Image from Dockerhub
    /data $ docker login -u <username> registry.hub.docker.com
    Password:
    Login Succeeded!
    /data $ docker pull registry.hub.docker.com/<privateContainerUri>:<tag>

Note: Depending on your authorisations in the private registry, you will be able to pull and push images. These authorisations are managed outside the scope of ICA.

Pushing a Container Image

Depending on the registry setup, you can publish Container Images with or without authentication. If authentication is required, follow the login procedure described in Private Registry.

The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub:

    # Push a Container Image to a Private registry in Dockerhub
    /data $ docker pull alpine:latest
    /data $ docker tag alpine:latest registry.hub.docker.com/<privateContainerUri>:<tag>
    /data $ docker push registry.hub.docker.com/<privateContainerUri>:<tag>

Saving a Container Image as an Archive

The following example shows how to save a locally available Container Image as a compressed tar archive:

    # Save a Container Image as a compressed archive
    /data $ docker pull alpine:latest
    /data $ docker save alpine:latest | bzip2 > /data/alpine_latest.tar.bz2

    This lets you upload the container image into the Private ICA Docker Registry.

Listing Locally Available Container Images

The following example shows how to list all locally available Container Images:

    # List all locally available images
    /data $ docker images
    REPOSITORY                TAG         IMAGE ID      CREATED      SIZE
    docker.io/library/alpine  latest      aded1e1a5b37  3 weeks ago  8.13 MB

Deleting a Container Image

Container Images require storage capacity on the Bench workspace disk. The capacity is shown when listing the locally available container images. Container Images are persisted on disk and remain available when a workspace stops and restarts.

The following example shows how to clean up a locally available Container Image:

    # Remove a locally available image
    /data $ docker rmi alpine:latest

Note: When a Container Image has multiple tags, all the tags need to be removed individually to free up disk capacity.

Running a Container

    A Container Image can be instantiated in a Container running inside a Bench Workspace.

By default, the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.

    When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.

Run a Container as a normal user

The following command line example shows how to run an instance of a locally available Container Image as a normal user:

    # Run a Container as a normal user
    /data $ docker run -it --rm alpine:latest
    ~ $ id
    uid=1000(ica) gid=100(users) groups=100(users)

Run a Container as root user

    Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.

    By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.

The following command line example shows how to run an instance of a locally available Container as root user and install system level packages:

    # Run a Container as root user
    /data $ docker run -it --rm --userns keep-id:uid=0,gid=0 --user 0:0 alpine:latest
    / # id
    uid=0(root) gid=0(root) groups=0(root)
    / # apk add rsync
    ...
    / # rsync
    rsync  version 3.4.0  protocol version 32
    ...

When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file:

    # Run a Container as a non-mapped root user
    /data $ docker run -it --rm --user 0:0 alpine:latest
    / # id
    uid=0(root) gid=0(root) groups=100(users),0(root)
    / # touch /data/myfile
    / #
    # Exited the running Container back to the shell in the running Bench Workspace
    /data $ ls -al /data/myfile
    -rw-r--r-- 1 100000 100000 0 Mar 13 08:27 /data/myfile

    Building a Container

    To build a Container Image, you need to describe the instructions in a Dockerfile.

This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is:

    FROM alpine:latest
    RUN apk add rsync
    COPY myfile /root/myfile

The following command line example will build the actual Container Image:

    # Build a Container image locally
    /data $ mkdir /tmp/buildContext
    /data $ touch /tmp/buildContext/myFile
    /data $ docker build -f /tmp/Dockerfile -t myimage:1.0 /tmp/buildContext
    ...
    /data $ docker images
    REPOSITORY                TAG         IMAGE ID      CREATED             SIZE
    docker.io/library/alpine  latest      aded1e1a5b37  3 weeks ago         8.13 MB
    localhost/myimage         1.0         06ef92e7544f  About a minute ago  12.1 MB

Note: When defining the build context location, keep in mind that using the HOME folder (/data) as context will index all files available in /data, which can be a lot and will slow down the build. Hence, use a minimal build context whenever possible.

Bench Clusters

Managing a Bench cluster

Introduction

Workspaces can have their own dedicated cluster, which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.

The cluster consists of two components:

    • The manager node which orchestrates the workload across the members.

• Anywhere between 0 and a maximum of 50 member nodes.

Clusters can run in two modes:

    • Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.

• Dynamic - A dynamic cluster has a manager node and a dynamic number of workers up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources, as only as many worker nodes as needed to perform the work are used.

Configuration

    You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.

    The following settings can be defined for a bench cluster:

    Field
    Description

    Web access

    Enable or disable web access to the cluster manager.

    Dedicated Cluster Manager

    Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling. For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.

Type

Choose between Static and Dynamic cluster members.

Scaling interval

For static, set the number of cluster member nodes (maximum 50); for dynamic, choose the minimum and maximum (up to 50) number of cluster member nodes.

Resource model

The type of machine on which the cluster member(s) will run. For every cluster member, one of these machines is used as a resource, so be aware of the possible cost impact when running many machines with a high individual cost.

Economy mode

Economy mode uses AWS spot instances. This halves many compute iCredit rates versus standard mode, but may be interrupted. See Pricing for a list of which resource models support economy pricing.

Include ephemeral storage

Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.

Storage size

How much storage space (1 GB - 16 TB) should be reserved per node as dedicated scratch space, available at /scratch.

Operations

    Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.

Managing Data in a Bench cluster

    Data in a bench workspace can be divided into three groups:

• Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina Connected Analytics UI. This is persistent storage: data remains when a workspace is shut down.

• Project data can be accessed from all workspace components at /data/project. Every component has its own dedicated mount to the project. Depending on the project data permissions, you will be able to access it in either read-only or read-write mode.

• Scratch data is available on the cluster members at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster (see the sketch below).
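As a sketch, a cluster job might stage intermediate files on scratch and persist only the final result (paths are illustrative; JOB_ID is set by SGE):

    # Work in per-member scratch space
    WORKDIR=/scratch/job_${JOB_ID}
    mkdir -p "$WORKDIR"
    # ... produce intermediate results in $WORKDIR ...
    # Persist the final output before the member is removed from the cluster
    cp "$WORKDIR"/result.txt /data/project/results/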

Fast Read-Only Access

For fast data access, Bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.

All mounts occur in /data/mounts/ (see data access and workspace-ctl data). Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node will have its own dedicated mount.

Show mounts

    workspace-ctl data get-mounts

Creating a mount

    workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata

For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.

Note: The command above has the same effect as passing the --mode read-only option explicitly, because read-only is the default mode of workspace-ctl data create-mount.

Removing a mount

    workspace-ctl data delete-mount --mount-path /data/mounts/mydata
Spark on ICA Bench

Running a Spark application in a Bench Spark Cluster

Running a pyspark application

The JupyterLab environment is by default configured with 3 additional kernels:

• PySpark – Local

• PySpark – Remote

  • PySpark – Remote – Dynamic

When one of the above kernels is selected, the Spark context is automatically initialised and can be accessed using the sc object.

PySpark - Local

    The PySpark - Local runtime environment launches the spark driver locally on the workspace node and all spark executors are created locally on the same node. It does not require a spark cluster to run and can be used for running smaller spark applications which don’t exceed the capacity of a single node.

    The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.

Note: Making changes to the configuration requires a restart of the Jupyter kernel.

PySpark - Remote

    The PySpark – Remote runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.

    This configuration will not dynamically spin up executors, hence it will not trigger the cluster to auto scale when using a Dynamic Bench cluster.

    The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.

Note: Making changes to the configuration requires a restart of the Jupyter kernel.

PySpark – Remote – Dynamic

    The PySpark – Remote - Dynamic runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.

This configuration will increase/decrease the required executors, which results in a cluster that auto-scales when using a Dynamic Bench cluster.

    The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.

Note: Making changes to the configuration requires a restart of the Jupyter kernel.

Job resources

Every cluster member has a certain capacity, depending on the resource model selected for the member.

A Spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).

The following setting defines the number of CPUs needed per task:

    spark.task.cpus 1

The following settings define the size of a single executor, which handles the execution of tasks:

    spark.executor.cores 4
    spark.executor.memory 4g

The above example allows an executor to handle 4 tasks concurrently, sharing a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB allow 8 concurrent executors on a single cluster member).

Spark User Interface

The Spark UI can be accessed via the cluster. The Web Access URL is displayed on the workspace details page.

    This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.

Spark Reference documentation

See the Apache Spark website.

    Bring Your Own Bench Image

Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to https://docs.docker.com/reference/dockerfile/.

    The following steps are needed to get your bench image running in ICA.

Requirements

You need to have Docker installed in order to build your images.

For your Docker bench image to work in ICA, it must run on Linux x86 architecture and have the correct user ID and initialization script in the Dockerfile.

Note: For easy reference, you can find examples of preconfigured Bench images on the Illumina website which you can copy to your local machine and edit to suit your needs.

    Bench-console provides an example to build a minimal image compatible with ICA Bench to run a SSH Daemon.

    Bench-web provides an example to build a minimal image compatible with ICA Bench to run a Web Daemon.

Bench-rstudio provides an example to build a minimal image compatible with ICA Bench to run RStudio Open Source.

    These examples come with information on the available parameters.

Scripts

The following scripts must be part of your Docker bench image. Please refer to the examples from the Illumina website for more details.

Init Script (Dockerfile)

This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location from where it can be started by ICA when you request to start your workspace.

User (Dockerfile)

    The user settings must be set up so that bench runs with UID 1000.
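You can verify that a built image starts as the expected user with a quick check (the image name is illustrative):

    # Confirm the image runs as UID 1000
    docker run --rm mybenchimage:0.0.1 id
    # expected output starts with: uid=1000 ...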

Shutdown Script (ica_start.sh)

To do a clean shutdown, you can trap the SIGTERM signal, which is transmitted 30 seconds before the workspace is terminated.
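A minimal sketch of such a start script, assuming a hypothetical your_service_command as the main process (see the Illumina example images for the authoritative script):

    #!/bin/bash
    # Trap SIGTERM for a clean shutdown of child processes
    cleanup() {
        echo "Received SIGTERM, stopping child processes..."
        kill "$child_pid" 2>/dev/null
        wait "$child_pid" 2>/dev/null
        exit 0
    }
    trap cleanup TERM

    your_service_command &   # hypothetical main service
    child_pid=$!
    wait "$child_pid"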

Building a Bench Image

    Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.

    1. Open the command prompt on your machine.

    2. Navigate to the root folder of your Docker files.

3. Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . with mybenchimage being the name you want to give your image and 0.0.1 replaced with the version number you want your bench image to have. For more information on this command, see https://docs.docker.com/reference/cli/docker/buildx/build/

4. Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2. The resulting tar file will appear next to the root folder of your Docker files.

Note: If you want to build on a Mac with Apple Silicon, the build command is docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .

Upload Your Docker Image to ICA

    1. Open ICA and log in.

    2. Go to Projects > your_project > Data.

3. For small Docker images, upload the Docker image file which you generated in the previous step. For large Docker images, use the service connector for better performance and reliability when importing the Docker image.

    4. Select the uploaded image file and perform Manage > Change Format.

    5. From the format list, select DOCKER and save the change.

    6. Go to System Settings > Docker Repository > Create > Image.

    7. Select the uploaded docker image and fill out the other details.

      • Name: The name by which your docker image will be seen in the list

      • Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1

    8. Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your docker image will be partial during creation and available once completed.

Start Your Bench Image

    1. Navigate to Projects > your_project > Bench > Workspaces.

    2. Create a new workspace with + Create Workspace or edit an existing workspace.

    3. Fill in the bench workspace details according to Workspaces.

    4. Save your changes.

    5. Select Start Workspace

6. Wait for the workspace to start; you can then access it via the console or the GUI.

Access Bench Image

    Once your bench image has been started, you can access it via console, web or both, depending on your configuration.

• Web access (HTTP) is done either from the Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.

    • Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.

Note: The password needed for SSH access is any one of your personal API keys.

Command-line Interface

    To execute the commands, your workspace needs a way to run them such as the inclusion of an SSH daemon, be it integrated into your web access image or into your console access. There is no need to download the workspace command-line interface, you can run it from within the workspace.

Restrictions

Root User

• The bench image will be instantiated as a container which is forcibly started as a user with UID 1000 and GID 100.

    • You cannot elevate your permissions in a running workspace.

Warning: Do not run containers as root, as this is bad security practice.

Read-only Root Filesystem

    Only the following folders are writeable:

    • /data

    • /tmp

    All other folders are mounted as read-only.

Network Access

    For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.

    • Web: TCP/8888

    • Console: TCP/2222

    For outbound access, a workspace can be started in two modes:

• Public: Access to public IPs is allowed using the TCP protocol.

• Restricted: Access to a list of allowed URLs.

Context

Environment Variables

    At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.

    Name
    Description
    Example Values

    ICA_WORKSPACE

    The unique identifier related to the started workspace. This value is bound to a workspace and will never change.

    32781195

    ICA_CONSOLE_ENABLED

    Whether Console access is enabled for this running workspace.

    true, false

    ICA_WEB_ENABLED

    Whether Web access is enabled for this running workspace.

    true, false

    ICA_SERVICE_ACCOUNT_USER_API_KEY

    An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace.

    ICA_BENCH_URL

    The host part of the public URL which provides access to the running workspace.

    use1-bench.platform.illumina.com

    ICA_PROJECT_UUID

    The unique identifier related to the ICA project in which the workspace was started.

    ICA_URL

    The ICA Endpoint URL.

    https://ica.illumina.com/ica

    HTTP_PROXY

    HTTPS_PROXY

    The proxy endpoint in case the workspace was started in restricted mode.

    HOME

    The home folder.

    /data
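    A minimal sketch of using these variables from a workspace shell script; the REST endpoint path is an assumption for illustration, so check the ICA API documentation for the exact call you need:

    # Only meaningful when console access was enabled at startup
    if [ "${ICA_CONSOLE_ENABLED}" = "true" ]; then
        echo "Console access is enabled on workspace ${ICA_WORKSPACE}"
    fi

    # Authenticate an API call with the service account key (endpoint path is an assumption)
    curl -s -H "X-API-Key: ${ICA_SERVICE_ACCOUNT_USER_API_KEY}" \
        "${ICA_URL}/rest/api/projects/${ICA_PROJECT_UUID}"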

    hashtag
    Configuration Files

    The following files and folders are provided to the workspace and made accessible for reading at runtime.

    Name
    Description

    /etc/workspace-auth

    Contains the SSH RSA public/private keypair which must be used to run the workspace SSHD.

    hashtag
    Software Files

    At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.

    New versions of ICA software will be made available after a restart of your workspace.

    hashtag
    Important Folders

    Name
    Description

    /data

    This folder contains all data specific to your workspace.

    Data in this folder is not persisted in your project and will be removed at deletion of the workspace.

    /data/project

    This folder contains all your project data.

    /data/.software

    This folder contains ICA-related software.

    hashtag
    Bench Lifecycle

    hashtag
    Workspace Lifecycle

    When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh

    circle-info

    This script needs to be available and executable otherwise your workspace will not boot.

    This script is the main process in your running workspace and must not run to completion, as exiting will stop the workspace and trigger a restart (see init script).

    This script can be used to invoke other scripts.

    When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully (see shutdown script) and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process hasn’t stopped within that period.

    hashtag
    Troubleshooting

    hashtag
    Build Argument

    If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, a possible cause is a missing trailing . (the build context) at the end of the command.

    hashtag
    Server Connection Error

    When you stop the workspace while users are still actively using it, they will receive a message showing a Server Connection Error.

    https://docs.docker.com/reference/dockerfile/
    Bring Your Own Bench Image Steps
    • Import any nf-core pipeline from their public repository.

    • Run the pipeline in Bench.

    • Monitor the execution.

  • Deploy pipeline as an ICA Flow pipeline.

  • Launch Flow validation test from Bench.

    hashtag
    Preparation

    • Start Bench workspace

      • For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

        • If using a cluster, choose standard-small or standard-medium for the workspace master node

        • Otherwise, choose at least standard-large as nf-core pipelines often need more than 4 cores to run.

      • Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.

      • Specify at least 100GB of disk space

    • Optional: After choosing the image, enable a cluster with at least one standard-large instance type.

    • Start the workspace, then (if applicable) start the cluster

    hashtag
    Import nf-core Pipeline to Bench

    If conda and/or nextflow are not installed, pipeline-dev will offer to install them.

    The Nextflow files are pulled into the nextflow-src subfolder.

    circle-info

    A larger example that still runs quickly is nf-core/sarek

    hashtag
    Result

    hashtag
    Run Validation Test in Bench

    All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.

    The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.

    circle-info

    The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copied and adjusted if you need additional options.

    hashtag
    Result

    hashtag
    Monitoring

    When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps

    When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

    • qstat to see the tasks being pending or running

    • tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
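    If your image provides the watch utility, a simple monitoring loop from a second terminal could look like this (a sketch; the exact log file name follows the pattern above):

    watch -n 10 qstat                     # pending/running cluster tasks
    tail -f /data/logs/sge-scaler.log.*   # auto-scaler activity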

    hashtag
    Data Locations

    • The output of the pipeline is in the outdir folder

    • Nextflow work files are under the work folder

    • Log files are .nextflow.log* and output.log

    hashtag
    Deploy as Flow Pipeline

    After generating a few ICA-specific files (JSON input specs for the Flow launch UI, plus the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed. In ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here. It then asks if you want to update the latest version or create a new one.

    Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.

    At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

    hashtag
    Run Validation Test in Flow

    This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.

    Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.

    hashtag
    Hints

    hashtag
    Using older versions of Nextflow

    Some older nf-core flows still use DSL1, which only works up to Nextflow 22.

    An easy solution is to create a conda environment for nextflow 22:


    Pipeline Development in Bench (Experimental)

    hashtag
    Introduction

    The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:

    • Import to Bench

      • From public nf-core pipelines

      • From existing ICA Flow Nextflow pipelines

    • Run in Bench

    • Modify and re-run in Bench, providing fast development iterations

    • Deploy to Flow

    • Launch validation in Flow

    # Init script invoked at start of a bench workspace
    COPY --chmod=0755 --chown=root:root ${FILES_BASE}/ica_start.sh /usr/local/bin/ica_start.sh
    # Bench workspaces need to run as user with uid 1000 and be part of group with gid 100
    RUN adduser -H -D -s /bin/bash -h ${HOME} -u 1000 -G users ica
    # Terminate function
    function terminate() {
            # Send SIGTERM to child processes
            kill -SIGTERM $(jobs -p)
    
            # Send SIGTERM to waitpid
            echo "Stopping ..."
            kill -SIGTERM ${WAITPID}
    }
    
    # Catch SIGTERM signal and execute terminate function.
    # A workspace will be informed 30s before forcefully being shutdown.
    trap terminate SIGTERM
    
    # Hold init process until TERM signal is received
    tail -f /dev/null &
    WAITPID=$!
    wait $WAITPID
    conda create -n nextflow22
     
    # If, like me, you never ran "conda init", do it now:
    conda init
    bash -l # To load the conda's bashrc changes
     
    conda activate nextflow22
    conda install -y nextflow=22
     
    # Check
    nextflow -version
     
    # Then use the pipeline-dev tools as in the demo
    mkdir demo
    cd demo
    pipeline-dev import-from-nextflow nf-core/demo
    /data/demo $ pipeline-dev import-from-nextflow nf-core/demo
    
    Creating output folder nf-core/demo
    Fetching project nf-core/demo
    
    Fetching project info
    project name: nf-core/demo
    repository  : https://github.com/nf-core/demo
    local path  : /data/.nextflow/assets/nf-core/demo
    main script : main.nf
    description : An nf-core demo pipeline
    author      : Christopher Hakkaart
    
    Pipeline “nf-core/demo” successfully imported into nf-core/demo.
    
    Suggested actions:
      cd nf-core/demo
      pipeline-dev run-in-bench
      [ Iterative dev: Make code changes + re-validate with previous command ]
      pipeline-dev deploy-as-flow-pipeline
      pipeline-dev launch-validation-in-flow
    cd nf-core/demo
    pipeline-dev run-in-bench
    pipeline-dev deploy-as-flow-pipeline
    Choice: 3
    Creating ICA Flow pipeline dev-nf-core-demo_v4
    Sending inputForm.json
    Sending onRender.js
    Sending main.nf
    Sending nextflow.config
    pipeline-dev launch-validation-in-flow



    hashtag
    Prerequisites

    • Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.

    • The pipeline development tools require

      • Conda, which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.

      • Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)

      • git (automatically installed using conda)

      • jq, curl (which should be made available in the image)

    hashtag
    NextFlow Requirements / Best Practices

    Pipeline development tools work best when the following items are defined:

    • Nextflow profiles:

      • test profile, specifying inputs appropriate for a validation run

      • docker profile, instructing NextFlow to use Docker

    • nextflow_schema.json, as described in the nf-core documentation. This is useful for the launch UI generation. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.

    ICA Flow adds one additional constraint: the output directory out is the only one automatically copied to the project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
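    For example, a local validation run that satisfies this constraint could be launched as follows (a sketch, assuming the test and docker profiles described above are defined):

    nextflow run nextflow-src/ -profile test,docker --outdir=out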

    hashtag
    Pipeline Development Tools

    circle-info

    New Bench pipeline development tools only become active after a workspace reboot.

    These are installed in /data/.software (which should be in your $PATH); the pipeline-dev script is the front-end to the other pipeline-dev-* tools.

    Pipeline-dev fulfils a number of roles:

    • Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.

    • Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.

    • Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.

    • Launches the appropriate sub-tool.

    • Prints out errors with backtrace, to help report issues.


    hashtag
    Usage

    hashtag
    1) Starting a new Project

    A pipeline-dev project relies on the following folder structure, which is auto-generated when using the pipeline-dev import* tools.

    circle-exclamation

    If you start a project manually, you must follow the same folder structure.

    • Project base folder

      • nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.

        • main.nf

        • nextflow.config

        • nextflow_schema.json

      • pipeline-dev.project-info: contains project name, description, etc.

      • nextflow-bench.config (automatically generated when needed): contains definitions for bench.

      • ica-flow-config: Directory of files used when deploying pipeline to Flow.

        • inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.

        • onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.

        • launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.

    hashtag
    Pipeline Sources

    When starting a project manually, the above-mentioned structure must be created by hand. The nf-core CLI tools can assist in generating the nextflow_schema.json. Tutorial Creating a Pipeline from Scratch goes into more details about this use case.

    A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.

    Tutorial nf-core Pipelines goes into more details about this use case.

    A directory called imported-flow-analysis is created and the analysis+pipeline assets are downloaded.

    Tutorial Updating an Existing Flow Pipeline goes into more details about this use case.

    circle-info

    Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.


    hashtag
    2) Running in Bench

    Optional parameters --local / --sge can be added to force execution on the local workspace node or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.

    The script then launches Nextflow; the full nextflow command line is printed before being executed.

    In case of errors, full logs are saved as .pipeline-dev.log.
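    For example (a sketch; both flags are described above):

    pipeline-dev run-in-bench --sge     # force execution on the Bench cluster
    pipeline-dev run-in-bench --local   # force execution on the workspace node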

    circle-info

    Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.

    hashtag
    Output Example

    Nextflow output

    hashtag
    Container (Docker) images

    Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular when executing with the Nextflow docker profile (-profile docker).

    In NextFlow, Docker images can be specified at the process level

    • This is done with the container "<image_name:version>" directive, which can be specified

      • in nextflow config files (preferred method when following the nf-core best practices)

      • or at the start of each process definition.

    • Each process can use a different docker image

    • It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.

    Resources such as CPUs and memory can be specified as described here. See containers or our tutorials for details about Nextflow-Docker syntax.

    Bench can push/pull/create/modify Docker images, as described in Containers.


    hashtag
    3) Deploying to ICA Flow

    This command does the following:

    1. Generate the JSON file describing the ICA Flow user interface.

      • If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.

    2. Generate the JSON file containing the validation launch inputs.

      • If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from the nextflow -profile test inputs.

      • If local files are used as validation inputs or as default input values:

        • copy them to /data/project/pipeline-dev-files/temp.

        • get their ICA file ids.

        • use these file ids in the launch specifications.

      • If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.

    3. Identify the pipeline name to use for this new pipeline deployment:

      • If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.

      • Identify which already-deployed pipelines have the same base name, with or without suffixes that could be some versioning (_v<number>, _<number>, _<date>).

      • Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the --create/--update parameters when specified, for scripting without user interactions (see the sketch after this list).

    4. A new ICA Flow pipeline gets created (except in the case of a pipeline update).

      • The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow

    5. The nextflow-src folder is uploaded file by file as pipeline assets.
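    For scripted, non-interactive deployments, the --create/--update parameters mentioned in step 3 can be used (a sketch):

    pipeline-dev deploy-as-flow-pipeline --update   # update the latest deployed version
    pipeline-dev deploy-as-flow-pipeline --create   # create a new version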

    Output Example:

    The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

    Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:


    hashtag
    4) Launching Validation in Flow

    The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow -profile test”.

    Output Example:

    launch-validation-in-flow

    The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.


    hashtag
    Tutorials

    • Creating a Pipeline from Scratch

    • nf-core Pipelines

    • Updating an Existing Flow Pipeline

    Creating a Pipeline from Scratch

    hashtag
    Introduction

    This tutorial shows you how to start a new pipeline from scratch:

    • prepare the linux tool + validation inputs

    • wrap it in Nextflow

    • wrap the pipeline in Bench

    • deploy the pipeline as an ICA Flow pipeline

    • launch a Flow validation test from Bench

    $ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
    $ pipeline-dev import-from-flow [--analysis-id=…] 
    $ pipeline-dev run-in-bench [--local|--sge] 
    $ pipeline-dev deploy-as-flow-pipeline [--create|--update] 
    $ pipeline-dev launch-validation-in-flow 




    hashtag
    Preparation

    Start Bench workspace

    • For this tutorial, any instance size will work, even the smallest standard-small.

    • Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.

    • A small amount of disk space (10GB) will be enough.

    We are going to wrap the "gzip" linux compression tool with inputs:

    • 1 file

    • compression level: integer between 1 and 9

    circle-info

    We intentionally do not include sanity checks, to keep this scenario simple.

    hashtag
    Creation of test file:


    hashtag
    Wrapping in Nextflow

    Here is an example of NextFlow code that wraps the gzip command and publishes the final output in the “out” folder:

    hashtag
    nextflow-src/main.nf

    Save this file as nextflow-src/main.nf, and check that it works:

    hashtag
    Result


    hashtag
    Wrap the Pipeline in Bench

    We now need to:

    • Use Docker

    • Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools

    hashtag
    Using Docker:

    In NextFlow, Docker images can be specified at the process level

    • Each process may use a different docker image

    • It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.

    Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified

    • at the start of each process definition

    • or in nextflow config files (preferred when following nf-core guidelines)

    For example, create nextflow-src/nextflow.config:

    We can now run with nextflow's -with-docker option:

    Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:

    hashtag
    Create NextFlow “test” profile

    Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:

    hashtag
    nextflow-src/nextflow.config

    With this profile defined, we can now run the same test as before with this command:

    hashtag
    Create NextFlow “docker” profile

    A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:

    hashtag
    nextflow-src/nextflow.config

    We can now run the same test as before with this command:

    We also have enough structure in place to start using the pipeline-dev command:

    In order to deploy our pipeline to ICA, we need to generate the user interface input form.

    This is done by using nf-core's recommended nextflow_schema.json.

    For our simple example, we generate a minimal one by hand (done by using one of the nf-core pipelines as example):

    hashtag
    nextflow-src/nextflow_schema.json

    In the next step, this gets converted to the ica-flow-config/inputForm.json file.

    circle-info

    Note: For large pipelines, as described on the nf-core website:

    Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.

    We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.


    hashtag
    Deploy as a Flow Pipeline

    We just need to create a final file, which we had skipped until now: our project description file, which can be created via the command pipeline-dev project-info --init:

    hashtag
    pipeline-dev.project-info

    We can now run:

    After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).

    It then asks if we want to update the latest version or create a new one.

    Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.

    At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.


    hashtag
    Run Validation Test in Flow

    This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

    Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

    hashtag
    Result


    Updating an Existing Flow Pipeline

    hashtag
    Introduction

    This tutorial shows you how to:

    • import an existing ICA Flow pipeline, with a supporting validation analysis

    • run the pipeline in Bench

    • monitor the execution

    • Iterative development: modify pipeline code and validate in Bench

      • Modify nextflow code

      • Modify Docker image contents (Dockerfile or Interactive method)

    • redeploy the pipeline to ICA Flow

    • launch a Flow validation test from Bench

    mkdir demo_gzip
    cd demo_gzip
    echo test > test_input.txt
    mkdir nextflow-src
    # Create nextflow-src/main.nf using contents below
    vi nextflow-src/main.nf
    nextflow.enable.dsl=2
     
    process COMPRESS {
      // publishDir is a process directive, so it must appear before the input/output blocks
      publishDir 'out', mode: 'symlink'
     
      input:
        path input_file
        val compression_level
     
      output:
        path "${input_file.simpleName}.gz" // .simpleName keeps just the filename
     
      script:
        """
        gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
        """
    }
     
    workflow {
        input_path = file(params.input_file)
        gzip_out = COMPRESS(input_path, params.compression_level)
    }
    nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5
    process.container = 'ubuntu:latest'
    nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker
    process.container = 'ubuntu:latest'
     
    profiles {
      test {
        params {
          input_file = 'test_input.txt'
          compression_level = 5
        }
      }
    }
    nextflow run nextflow-src/ -profile test -with-docker
    process.container = 'ubuntu:latest'
     
    profiles {
      test {
        params {
          input_file = 'test_input.txt'
          compression_level = 5
        }
      }
     
      docker {
        docker.enabled = true
      }
    }
    nextflow run nextflow-src/ -profile test,docker
    pipeline-dev run-in-bench
    {
        "$defs": {
            "input_output_options": {
                "title": "Input/output options",
                "properties": {
                    "input_file": {
                        "description": "Input file to compress",
                        "help_text": "The file that will get compressed",
                        "type": "string",
                        "format": "file-path"
                    },
                    "compression_level": {
                        "type": "integer",
                        "description": "Compression level to use (1-9)",
                        "default": 5,
                        "minimum": 1,
                        "maximum": 9
                   }
                }
            }
        }
    }
    $ pipeline-dev project-info --init
     
    pipeline-dev.project-info not found. Let's create it with 2 questions:
     
    Please enter your project name: demo_gzip
    Please enter a project description: Bench gzip demo
    pipeline-dev deploy-as-flow-pipeline
    pipeline-dev launch-validation-in-flow
    /data/demo $ pipeline-dev launch-validation-in-flow
    
    pipelineId: 331f209d-2a72-48cd-aa69-070142f57f73
    Getting Analysis Storage Id
    Launching as ICA Flow Analysis...
    ICA Analysis created:
    - Name: Test demo_gzip
    - Id: 17106efc-7884-4121-a66d-b551a782b620
    - Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782b620


    hashtag
    Preparation

    Make sure you have access in ICA Flow to:

    • the pipeline you want to work with

    • an analysis exercising this pipeline, preferably with a short execution time, to use as validation test

    hashtag
    Start Bench Workspace

    For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

    • When using a cluster, choose standard-small or standard-medium for the workspace master node

    • Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.

    • Select the "single user workspace" permissions (aka "Access limited to workspace owner "), which allows us to deploy pipelines

    • Specify at least 100GB of disk space

    • Optional: After choosing the image, enable a cluster with at least one standard-large instance type.

    • Start the workspace, then (if applicable) also start the cluster

    hashtag
    Import Existing Pipeline and Analysis to Bench

    The starting point is the analysis id that is used as pipeline validation test (the pipeline id is obtained from the analysis metadata).

    If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.

    • If conda and/or nextflow are not installed, pipeline-dev will offer to install them.

    • A folder called imported-flow-analysis is created.

    • Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.

    • Pipeline input form and associated javascript are downloaded into the ica-flow-config sub-folder.

    • Analysis input specs are downloaded to the ica-flow-config/launchPayload_inputFormValues.json file.

    • The analysis inputs are converted into a "test" profile for Nextflow, stored – among other items – in nextflow-bench.config

    hashtag
    Results

    hashtag
    Run Validation Test in Bench

    The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:

    circle-info

    The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copied and adjusted if you need additional options.

    hashtag
    Monitoring

    When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

    • qstat to see the tasks being pending or running

    • tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

    hashtag
    Data Locations

    • The output of the pipeline is in the outdir folder

    • Nextflow work files are under the work folder

    • Log files are .nextflow.log* and output.log

    hashtag
    Modify Pipeline

    Nextflow files (located in the nextflow-src folder) are easy to modify. Depending on your environment (SSH access / docker image with JupyterLab or VNC, with or without Visual Studio Code), various source code editors can be used.

    After modifying the source code, you can run a validation iteration with the same command as before:

    hashtag
    Identify Docker Image

    Modifying the Docker image is the next step.

    Nextflow (and ICA) allow the Docker images to be specified at different places:

    • in config files such as nextflow-src/nextflow.config

    • in nextflow code files:

    grep container may help locate the correct files:
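    For instance (a sketch; recursively searches the pipeline sources for container directives):

    grep -rn container nextflow-src/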

    hashtag
    Docker Image Update: Dockerfile Method

    Use case: Update some of the software (mimalloc) by compiling a new version

    With the appropriate permissions, you can then "docker login" and "docker push" the new image.

    hashtag
    Docker Image Update: Interactive Method

    With the appropriate permissions, you can then "docker login" and "docker push" the new image.
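    For example (a sketch; IMAGE_AFTER as set in this tutorial's code blocks, and the registry must be one you are allowed to push to):

    docker login docker.io
    docker push ${IMAGE_AFTER}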

    circle-info

    Fun fact: VScode with the "Dev Containers" extension lets you edit the files inside your running container.

    circle-exclamation

    Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...

    Update the nextflow code and/or configs to use the new image

    Validate your changes in Bench:

    hashtag
    Deploy as Flow Pipeline

    After generating a few ICA-specific files (JSON input specs for the Flow launch UI, plus the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed. In ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here.

    It then asks if we want to update the latest version or create a new one.

    At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

    hashtag
    Result

    hashtag
    Run Validation Test in Flow

    This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

    Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

    hashtag
    Result

    mkdir demo-flow-dev
    cd demo-flow-dev
     
    pipeline-dev import-from-flow
     or
    pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f
    Enter the number of the entry you want to use: 21
    Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
    Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
    Fetching pipeline files... main.nf
    Fetching test inputs
    New Json inputs detected
    Resolving test input ids to /data/mounts/project paths
    Fetching input form..
    Pipeline "GWAS pipeline_1.
    _2_1_20241215_130117" successfully imported.
    pipeline name: GWAS pipeline_1_2_1_20241215_130117 
    analysis name: Test GWAS pipeline_1_2_1_20241215_130117 
    pipeline id : bb47d612-5906-4d5a-922e-541262c966df
    analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
    Suggested actions:
    pipeline-dev run-in-bench 
    [ Iterative dev: Make code changes + re-validate with previous command ]
    pipeline-dev deploy-as-flow-pipeline
    pipeline-dev run-in-flow
    cd imported-flow-analysis
    pipeline-dev run-in-bench
    /data/demo $ tail /data/logs/sge-scaler.log.*
    2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
    2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
    2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning,
    Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2
    code nextflow-src # Open in Visual Studio Code
    code .            # Open current dir in Visual Studio Code
    vi nextflow-src/main.nf
    pipeline-dev run-in-bench
    /data/demo-flow-dev $ head nextflow-src/main.nf
    nextflow.enable.dsl = 2
    process top_level_process {
    container 'docker.io/ljanin/gwas-pipeline:1.2.1'
    IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
    IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
     
    # Create directory for Dockerfile
    mkdir dirForDockerfile
    cd dirForDockerfile
    
    # Create Dockerfile
    cat <<EOF > Dockerfile
    FROM ${IMAGE_BEFORE}
    RUN mkdir /mimalloc-compile \
     && cd /mimalloc-compile \
     && git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
     && mkdir -p mimalloc/out/release \
     && cd mimalloc/out/release \
     && cmake ../.. \
     && make \
     && make install \
     && cd / \
     && rm -rf mimalloc-compile
    EOF
    
    # Build image
    docker build -t ${IMAGE_AFTER} .
    IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
    IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
    docker run -it --rm ${IMAGE_BEFORE} bash
     
    # Make some modifications
    vi /scripts/plot_manhattan.py
    <Fix "manhatten.png" into "manhattAn.png">
    <Enter :wq to save and quit vi>
    <Start another terminal (try Ctrl+Shift+T if using wezterm)>
    # Identify container id
    # Save container changes into new image layer
    CONTAINER_ID=c18670335247
    docker commit ${CONTAINER_ID} ${IMAGE_AFTER}
    sed --in-place "s/${IMAGE_BEFORE}/${IMAGE_AFTER}/" nextflow-src/main.nf
    pipeline-dev run-in-bench
    pipeline-dev deploy-as-flow-pipeline
    Choice: 2
    Creating ICA Flow pipeline dev-nf-core-demo_v4
    Sending inputForm.json
    Sending onRender.js
    Sending main.nf
    Sending nextflow.config
    /data/demo $ pipeline-dev deploy-as-flow-pipeline
    
    Generating ICA input specs...
    Extracting nf-core test inputs...
    Deploying project nf-core/demo
    - Currently being developed as: dev-nf-core-demo
    - Last version updated in ICA:  dev-nf-core-demo_v3
    - Next suggested version:       dev-nf-core-demo_v4
    
    How would you like to deploy?
    1. Update dev-nf-core-demo (current version)
    2. Create dev-nf-core-demo_v4
    3. Enter new name
    4. Update dev-nf-core-demo_v3 (latest version updated in ICA)
    Sending docs/images/nf-core-demo-subway.svg
    Sending docs/images/nf-core-demo_logo_dark.png
    Sending docs/images/nf-core-demo_logo_light.png
    Sending docs/images/nf-core-demo-subway.png
    Sending docs/README.md
    Sending docs/output.md
    
    Pipeline successfully deployed
    - Id : 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
    - URL: https://stage.v2.stratus.illumina.com/ica/projects/1873043/pipelines/26bc5aa5-0218-4e79-8a63-ee92954c6cd9
    
    Suggested actions:
      pipeline-dev run-in-flow
    pipeline-dev launch-validation-in-flow
    /data/demo $ pipeline-dev launch-validation-in-flow
    
    pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
    Getting Analysis Storage Id
    Launching as ICA Flow Analysis...
    
    ICA Analysis created:
    - Name: Test dev-nf-core-demo_v4
    - Id:   cadcee73-d975-435d-b321-5d60e9aec1ec
    - Url:   https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec

    Bench Command Line Interface

    hashtag
    Command Index

    The following is a list of available bench CLI commands and their options.

    Please refer to the examples from the Illumina website for more details.
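    For instance, a couple of read-only calls to explore the environment (a sketch; run from within the workspace, see the command index below):

    workspace-ctl workspace get-workspace-settings
    workspace-ctl data get-mounts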

    hashtag
    workspace-ctl

    hashtag
    workspace-ctl completion

    hashtag
    workspace-ctl compute

    workspace-ctl compute get-cluster-details

    workspace-ctl compute get-logs

    workspace-ctl compute get-pools

    workspace-ctl compute scale-pool

    hashtag
    workspace-ctl data

    workspace-ctl data create-mount

    workspace-ctl data delete-mount

    workspace-ctl data get-mounts

    hashtag
    workspace-ctl help

    workspace-ctl help completion

    workspace-ctl help compute

    workspace-ctl help compute get-cluster-details

    workspace-ctl help compute get-logs

    workspace-ctl help compute get-pools

    workspace-ctl help compute scale-pool

    workspace-ctl help data

    workspace-ctl help data create-mount

    workspace-ctl help data delete-mount

    workspace-ctl help data get-mounts

    workspace-ctl help help

    workspace-ctl help software

    workspace-ctl help software get-server-metadata

    workspace-ctl help software get-software-settings

    workspace-ctl help workspace

    workspace-ctl help workspace get-cluster-settings

    workspace-ctl help workspace get-connection-details

    workspace-ctl help workspace get-workspace-settings

    hashtag
    workspace-ctl software

    workspace-ctl software get-server-metadata

    workspace-ctl software get-software-settings

    hashtag
    workspace-ctl workspace

    workspace-ctl workspace get-cluster-settings

    workspace-ctl workspace get-connection-details

    workspace-ctl workspace get-workspace-settings

    Usage:
      workspace-ctl [flags]
      workspace-ctl [command]
    
    Available Commands:
      completion  Generate completion script
      compute     
      data        
      help        Help about any command
      software    
      workspace   
    
    Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
      -h, --help               help for workspace-ctl
          --help-tree          
          --help-verbose       
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    
    Use "workspace-ctl [command] --help" for more information about a command.
    cmd execute error:  accepts 1 arg(s), received 0
    Usage:
      workspace-ctl compute [flags]
      workspace-ctl compute [command]
    
    Available Commands:
      get-cluster-details 
      get-logs            
      get-pools           
      scale-pool          
    
    Flags:
      -h, --help           help for compute
          --help-tree      
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    
    Use "workspace-ctl compute [command] --help" for more information about a command.
    Usage:
      workspace-ctl compute get-cluster-details [flags]
    
    Flags:
      -h, --help           help for get-cluster-details
          --help-tree      
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    Usage:
      workspace-ctl compute get-logs [flags]
    
    Flags:
      -h, --help           help for get-logs
          --help-tree      
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    Usage:
      workspace-ctl compute get-pools [flags]
    
    Flags:
          --cluster-id string   Required. Cluster ID
      -h, --help                help for get-pools
          --help-tree           
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    Usage:
      workspace-ctl compute scale-pool [flags]
    
    Flags:
          --cluster-id string       Required. Cluster ID
      -h, --help                    help for scale-pool
          --help-tree               
          --help-verbose            
          --pool-id string          Required. Pool ID
          --pool-member-count int   Required. New pool size
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
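    Putting these together, manually scaling a pool could look like this (a sketch; the placeholder ids come from the preceding two commands):

    workspace-ctl compute get-cluster-details
    workspace-ctl compute get-pools --cluster-id <cluster-id>
    workspace-ctl compute scale-pool --cluster-id <cluster-id> --pool-id <pool-id> --pool-member-count 3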
    Usage:
      workspace-ctl data [flags]
      workspace-ctl data [command]
    
    Available Commands:
      create-mount Create a data mount under /data/mounts. Return newly created mount.
      delete-mount Delete a data mount
      get-mounts   Returns the list of data mounts
    
    Flags:
      -h, --help           help for data
          --help-tree      
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    
    Use "workspace-ctl data [command] --help" for more information about a command.
    Create a data mount under /data/mounts. Return newly created mount.
    
    Usage:
      workspace-ctl data create-mount [flags]
    
    Aliases:
      create-mount, mount
    
    Flags:
      -h, --help                help for create-mount
          --help-tree           Display commands as a tree
          --help-verbose        Extended help topics and options
          --mode string         Enum:["read-only","read-write"]. Mount mode i.e. read-only, read-write
          --mount-path string   Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
          --source string       Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
          --wait                Wait for new mount to be available on all nodes before sending response
          --wait-timeout int    Max number of seconds for wait option. Absolute max: 300 (default 300)
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
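    For example, mounting a project folder read-only and waiting until it is available on all nodes (a sketch; the source and mount path values follow the examples in the help text above):

    workspace-ctl data create-mount \
        --source /data/project/myData/hg38 \
        --mount-path hg38data \
        --mode read-only \
        --wait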
    Delete a data mount
    
    Usage:
      workspace-ctl data delete-mount [flags]
    
    Aliases:
      delete-mount, unmount
    
    Flags:
      -h, --help                help for delete-mount
          --help-tree           
          --help-verbose        
          --id string           Id of mount to remove
          --mount-path string   Path of mount to remove
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    Returns the list of data mounts
    
    Usage:
      workspace-ctl data get-mounts [flags]
    
    Aliases:
      get-mounts, list-mounts
    
    Flags:
      -h, --help           help for get-mounts
          --help-tree      
          --help-verbose
    
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    Usage:
      workspace-ctl [flags]
      workspace-ctl [command]
    
    Available Commands:
      completion  Generate completion script
      compute     
      data        
      help        Help about any command
      software    
      workspace   
    
    Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
      -h, --help               help for workspace-ctl
          --help-tree          
          --help-verbose       
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
    
    Use "workspace-ctl [command] --help" for more information about a command.
    To load completions:
    
    Bash:
    
      $ source <(yourprogram completion bash)
    
      # To load completions for each session, execute once:
      # Linux:
      $ yourprogram completion bash > /etc/bash_completion.d/yourprogram
      # macOS:
      $ yourprogram completion bash > /usr/local/etc/bash_completion.d/yourprogram
    
    Zsh:
    
      # If shell completion is not already enabled in your environment,
      # you will need to enable it.  You can execute the following once:
    
      $ echo "autoload -U compinit; compinit" >> ~/.zshrc
    
      # To load completions for each session, execute once:
      $ yourprogram completion zsh > "${fpath[1]}/_yourprogram"
    
      # You will need to start a new shell for this setup to take effect.
    
    fish:
    
      $ yourprogram completion fish | source
    
      # To load completions for each session, execute once:
      $ yourprogram completion fish > ~/.config/fish/completions/yourprogram.fish
    
    PowerShell:
    
      PS> yourprogram completion powershell | Out-String | Invoke-Expression
    
      # To load completions for every new session, run:
      PS> yourprogram completion powershell > yourprogram.ps1
      # and source this file from your PowerShell profile.
    
    Usage:
      workspace-ctl completion [bash|zsh|fish|powershell]
    
    Flags:
      -h, --help   help for completion
    Usage:
      workspace-ctl compute [flags]
      workspace-ctl compute [command]
    
    Available Commands:
      get-cluster-details 
      get-logs            
      get-pools           
      scale-pool          
    
    Flags:
      -h, --help           help for compute
          --help-tree      
          --help-verbose
    
    Use "workspace-ctl compute [command] --help" for more information about a command.
    Usage:
      workspace-ctl compute get-cluster-details [flags]
    
    Flags:
      -h, --help           help for get-cluster-details
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl compute get-logs [flags]
    
    Flags:
      -h, --help           help for get-logs
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl compute get-pools [flags]
    
    Flags:
          --cluster-id string   Required. Cluster ID
      -h, --help                help for get-pools
          --help-tree           
          --help-verbose
    Usage:
      workspace-ctl compute scale-pool [flags]
    
    Flags:
          --cluster-id string       Required. Cluster ID
      -h, --help                    help for scale-pool
          --help-tree               
          --help-verbose            
          --pool-id string          Required. Pool ID
          --pool-member-count int   Required. New pool size
    Usage:
      workspace-ctl data [flags]
      workspace-ctl data [command]
    
    Available Commands:
      create-mount Create a data mount under /data/mounts. Return newly created mount.
      delete-mount Delete a data mount
      get-mounts   Returns the list of data mounts
    
    Flags:
      -h, --help           help for data
          --help-tree      
          --help-verbose
    
    Use "workspace-ctl data [command] --help" for more information about a command.
    Create a data mount under /data/mounts. Return newly created mount.
    
    Usage:
      workspace-ctl data create-mount [flags]
    
    Aliases:
      create-mount, mount
    
    Flags:
      -h, --help                help for create-mount
          --help-tree           
          --help-verbose        
          --mount-path string   Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
          --source string       Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
          --wait                Wait for new mount to be available on all nodes before sending response
          --wait-timeout int    Max number of seconds for wait option. Absolute max: 300 (default 300)
    Delete a data mount
    
    Usage:
      workspace-ctl data delete-mount [flags]
    
    Aliases:
      delete-mount, unmount
    
    Flags:
      -h, --help                help for delete-mount
          --help-tree           
          --help-verbose        
          --id string           Id of mount to remove
          --mount-path string   Path of mount to remove
    Returns the list of data mounts
    
    Usage:
      workspace-ctl data get-mounts [flags]
    
    Aliases:
      get-mounts, list-mounts
    
    Flags:
      -h, --help           help for get-mounts
          --help-tree      
          --help-verbose
    Help provides help for any command in the application.
    Simply type workspace-ctl help [path to command] for full details.
    
    Usage:
      workspace-ctl help [command] [flags]
    
    Flags:
      -h, --help   help for help
    Usage:
      workspace-ctl software [flags]
      workspace-ctl software [command]
    
    Available Commands:
      get-server-metadata   
      get-software-settings 
    
    Flags:
      -h, --help           help for software
          --help-tree      
          --help-verbose
    
    Use "workspace-ctl software [command] --help" for more information about a command.
    Usage:
      workspace-ctl software get-server-metadata [flags]
    
    Flags:
      -h, --help           help for get-server-metadata
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl software get-software-settings [flags]
    
    Flags:
      -h, --help           help for get-software-settings
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl workspace [flags]
      workspace-ctl workspace [command]
    
    Available Commands:
      get-cluster-settings   
      get-connection-details 
      get-workspace-settings 
    
    Flags:
      -h, --help           help for workspace
          --help-tree      
          --help-verbose
    
    Use "workspace-ctl workspace [command] --help" for more information about a command.
    Usage:
      workspace-ctl workspace get-cluster-settings [flags]
    
    Flags:
      -h, --help           help for get-cluster-settings
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl workspace get-connection-details [flags]
    
    Flags:
      -h, --help           help for get-connection-details
          --help-tree      
          --help-verbose
    Usage:
      workspace-ctl workspace get-workspace-settings [flags]
    
    Flags:
      -h, --help           help for get-workspace-settings
          --help-tree      
          --help-verbose
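
    # Illustrative examples: inspect the cluster, connection, and workspace settings.
    workspace-ctl workspace get-cluster-settings
    workspace-ctl workspace get-connection-details
    workspace-ctl workspace get-workspace-settings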

    # The Global Flags below are printed identically in the help of every
    # workspace-ctl command above; they are consolidated here once.
    Global Flags:
          --X-API-Key string   
          --base-path string   For example: / (default "/")
          --config string      config file path
          --debug              output debug logs
          --dry-run            do not send the request to server
          --hostname string    hostname of the service (default "api:8080")
          --print-curl         print curl equivalent do not send the request to server
          --scheme string      Choose from: [http] (default "http")
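
    # Illustrative example of a global flag: print the equivalent curl command
    # without sending the request to the server.
    workspace-ctl workspace get-workspace-settings --print-curl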