Once a cluster is started, the cluster manager can be accessed from the workspace node.
Every cluster member has a certain capacity, which is determined by the resource model selected for that member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscription of a node, which can result in out-of-memory errors or unresponsiveness. You must ensure these limits are not exceeded.
To ensure stability of the system, some headroom is deducted from the total node capacity.
These two values are used by the SGE auto scaler when running in dynamic mode. The auto scaler sums the resources requested by all pending jobs to determine the scale-up/down operation within the defined range. For example, if the pending jobs together request more static_cores or static_mem than the current members provide, additional members are added, up to the configured maximum.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler only executes one scale-up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will remain pending indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log.
Submitting a single job:
qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh

Submitting a job array:
qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
Do not limit the job concurrency amount as this will result in unused cluster members.

Listing all members of the cluster:
qhost

Listing all jobs in the cluster:
qstat -f

Showing the details of a job:
qstat -f -j <jobId>

Deleting a job:
qdel <jobId>

Showing the details of an executed job:
qacct -j <jobId>

SGE command line options and configuration details can be found in the SGE documentation.

Running a Spark application in a Bench Spark Cluster
The JupyterLab environment is by default configured with 3 additional kernels
PySpark – Local
PySpark – Remote
PySpark – Remote - Dynamic
When one of the above kernels is selected, the Spark context is automatically initialised and can be accessed using the sc object.
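As a quick illustration, here is a minimal sketch assuming the pre-initialised sc object provided by these kernels; the dataset and partition count are placeholders:

# Distribute a small placeholder dataset over 8 partitions.
rdd = sc.parallelize(range(1000), numSlices=8)
# Transformations run as tasks on the executors of the selected runtime.
squares = rdd.map(lambda x: x * x)
# An action triggers a Spark job and returns the result to the driver.
print(squares.sum())
# The default parallelism reflects the capacity of the chosen runtime.
print(sc.defaultParallelism)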
The PySpark – Local runtime environment launches the Spark driver locally on the workspace node and all Spark executors are created locally on the same node. It does not require a Spark cluster to run and can be used for running smaller Spark applications which do not exceed the capacity of a single node.
The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
The PySpark – Remote runtime environment launches the Spark driver locally on the workspace node and interacts with the cluster manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration does not dynamically spin up executors, hence it will not trigger the cluster to auto-scale when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
The PySpark – Remote - Dynamic runtime environment launches the Spark driver locally on the workspace node and interacts with the cluster manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will increase or decrease the number of required executors, resulting in a cluster that auto-scales when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
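The exact contents of the platform-provided conf-dynamic file are not reproduced here; as an illustration only, dynamic executor allocation in Spark is controlled by standard properties such as the following (the values shown are placeholders):

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 16
spark.dynamicAllocation.executorIdleTimeout 60s

With settings like these, the driver requests executors when tasks are queued and releases idle ones, which in turn drives the cluster scaling behaviour described above.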
Every cluster member has a certain capacity depending on the resource model selected for the member.
A Spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).
The following setting defines the number of CPUs needed per task:
spark.task.cpus 1
The following settings define the size of a single executor which handles the execution of tasks:
spark.executor.cores 4
spark.executor.memory 4g
The above example allows an executor to handle 4 tasks concurrently, sharing a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB allow 8 concurrent executors on a single cluster member).
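To check which values are active in your session, you can read them back from the running context; a minimal sketch, assuming the kernel's pre-initialised sc (the fallback value is only illustrative):

# Inspect the effective task and executor sizing of the current SparkContext.
conf = sc.getConf()
print(conf.get("spark.task.cpus", "1"))         # CPUs reserved per task
print(conf.get("spark.executor.cores", None))   # cores per executor
print(conf.get("spark.executor.memory", None))  # memory per executor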
The Spark UI can be accessed via the cluster Web Access URL, which is displayed on the Workspace details page.
This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.
See the Apache Spark documentation for more information.


The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, RStudio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, you can see the status indicated by its color.
Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.
You can see the remaining time until the next event (Shutdown or restart) in the workspaces overview and on the workspace details.
If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.
Click Projects > Your_Project > Bench > Workspaces > + Create Workspace
Complete the following fields and save the changes.
(*1) URLs must comply with the following rules:
URLs can be between 1 and 263 characters including dot (.).
URLs can begin with a leading dot (.).
Domain and sub-domains:
Can include alphanumeric characters (letters A-Z and digits 0-9). Case insensitive.
Can contain hyphens (-) and underscores (_), but not as a first or last character.
Length between 1 and 63 characters.
Dot (.) must be placed after a domain or sub-domain.
If you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.
Regex for URL: [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
The following are example URLs which will be considered valid:
example.com
www.example.com
https://www.example.com
subdomain.example.com
subdomain.example.com/folder
subdomain.example.com/folder/subfolder
sub-domain.example.com
sub_domain.example.com
example.co.uk
subdomain.example.co.uk
sub-domain.example.co.uk
Example data science-specific whitelist compatible with restricted Bench workspaces. There are two required URLs to allow for Python pip installs:
pypi.org
files.pythonhosted.org
repo.anaconda.com
conda.anaconda.org
github.com
cran.r-project.org
bioconductor.org
www.npmjs.com
mvnrepository.com
(*2) When you grant workspace access to multiple users, you need to provide an API key to access the workspace. Authenticate using the icav2 config set command; the CLI will prompt for an x-api-key value. Enter the API key generated from the product dashboard. See the documentation on API keys for more information.
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
The user's Bench role determines whether someone is an administrator or contributor, while the dedicated workspace permissions indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
For security reasons, the Tenant administrator and Project owner can always access the workspace.
If one of your permissions is not high enough as a Bench contributor, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Dataprovider - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
As long as the workspace is running, the resources provided for this workspace will be charged.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
Once a workspace is running, it can be manually stopped or it will be automatically shut down after the amount of time configured in the automatic shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace when you no longer need it to save costs.
You can edit running workspaces to update the shutdown timer, shutdown reminder and auto restart reminder.
If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The docker images provided by Illumina will load JupyterLab by default. It also contains Tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher, + button above the folder structure.
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image no longer runs correctly. The image does not have access to any underlying parts of the platform, so it will not be able to harm the platform, but inoperable Bench images will have to be deleted or corrected.
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
The first 4 lines of the Dockerfile must NOT be edited. It is not possible to start a Dockerfile with a different FROM directive. The main Dockerfile commands are RUN and COPY. More information on them is available in the official Docker documentation.
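As a purely hypothetical sketch (the package names and copied file path are placeholders, not part of the Illumina images), added layers typically look like this:

# Install additional packages as new layers on top of the existing image.
RUN pip install --no-cache-dir pandas scikit-learn
# COPY adds files from the build context into the image (placeholder path).
COPY my_helper_script.py /opt/scripts/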
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a tool from the Docker image.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Click the Docker Build tab. Here the image that accompanies the tool will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says "#Add your commands below." write the code necessary for running this docker image.
Click the General tab. This tab and all next tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the Tool creation section in the Flow documentation.
Click the Save button in the upper right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use workspace-ctl data create-mount --mode read-write to do so. You cannot have fast read-write access to indexed folders as the indexing mechanism on those would deteriorate the performance.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab, all actions performed in the workspace are shown, for example the creation of the workspace and the starting or stopping of the workspace. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
Workspaces can have their own dedicated cluster which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.
The cluster consists of 2 components:
The manager node which orchestrates the workload across the members.
Between 0 and a maximum of 50 member nodes.
Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.
Dynamic - A dynamic cluster has a manager node and a dynamic number of workers up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources as only as many worker nodes as needed to perform the work are being used.
You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.
The following settings can be defined for a bench cluster:
Include ephemeral storage
Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.
Storage size
How much storage space (1GB - 16TB) should be reserved per node as dedicated scratch space. It is available at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster.
Web access
Enable or disable web access to the cluster manager.
Dedicated Cluster Manager
Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling. For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.
Type
Choose between Static and Dynamic cluster members.
Scaling interval
For static, set the number of cluster member nodes (maximum 50); for dynamic, choose the minimum and maximum (up to 50) number of cluster member nodes.
Resource model
The compute capacity (CPU cores and memory) of each cluster member node.
Economy mode
Economy mode uses AWS spot instances. This halves many compute iCredit rates vs standard mode, but may be interrupted. See Pricing for a list of which resource models support economy pricing.
Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details, and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.
Data in a bench workspace can be divided into three groups:
Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes ) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina connected analytics UI. This is persistent storage and data remains when a workspace is shut down.
Project data can be accessed from all workspace components at /data/project. Every component will have their own dedicated mount to the project. Depending on the project data permissions you will be able to access it in either Read-Only or Read-Write mode.
All mounts occur in /data/mounts/; see below for read-only and read/write mounts.
Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node will have its own dedicated mount.
For fast data access, Bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.
All mounts occur in path /data/mounts/
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only.
The following commands list, create, and delete mounts:
workspace-ctl data get-mounts
workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata
workspace-ctl data delete-mount --mount-path /data/mounts/mydata
Creating a mount without specifying --mode has the same effect as using the --mode read-only option, because read-only is applied by default when using workspace-ctl data create-mount.
