Once a cluster is started, the cluster manager can be accessed from the workspace node.
Every cluster member has a certain capacity, which is determined by the resource model selected for that member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscription of a node, which can result in out-of-memory errors or unresponsiveness. You must ensure these limits are not exceeded.
To ensure stability of the system, some headroom is deducted from the total node capacity.
These two values are used by the SGE auto scaler when running in dynamic mode. The auto scaler sums the resources requested by all pending jobs to determine the scale-up/down operation within the defined range. For example, if the pending jobs together request more static_cores or static_mem than the current members provide, additional members are added, up to the configured maximum.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler only executes one scale-up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will remain pending indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log.
Submitting a single job:
qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh

Submitting a job array:
qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
Do not limit the job concurrency amount as this will result in unused cluster members.

Listing all members of the cluster:
qhost

Listing all jobs in the cluster:
qstat -f

Showing the details of a job:
qstat -f -j <jobId>

Deleting a job:
qdel <jobId>

Showing the details of an executed job:
qacct -j <jobId>

SGE command line options and configuration details can be found in the SGE documentation.

Running a Spark application in a Bench Spark Cluster
The JupyterLab environment is by default configured with 3 additional kernels
PySpark – Local
PySpark – Remote
PySpark – Remote - Dynamic
When one of the above kernels is selected, the Spark context is automatically initialised and can be accessed using the sc object.
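As a quick illustration, here is a minimal sketch assuming the pre-initialised sc object provided by these kernels; the dataset and partition count are placeholders:

# Distribute a small placeholder dataset over 8 partitions.
rdd = sc.parallelize(range(1000), numSlices=8)
# Transformations run as tasks on the executors of the selected runtime.
squares = rdd.map(lambda x: x * x)
# An action triggers a Spark job and returns the result to the driver.
print(squares.sum())
# The default parallelism reflects the capacity of the chosen runtime.
print(sc.defaultParallelism)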
The PySpark – Local runtime environment launches the Spark driver locally on the workspace node and all Spark executors are created locally on the same node. It does not require a Spark cluster to run and can be used for running smaller Spark applications which do not exceed the capacity of a single node.
The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
The PySpark – Remote runtime environment launches the Spark driver locally on the workspace node and interacts with the cluster manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration does not dynamically spin up executors, hence it will not trigger the cluster to auto-scale when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
The PySpark – Remote - Dynamic runtime environment launches the Spark driver locally on the workspace node and interacts with the cluster manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will increase or decrease the number of required executors, resulting in a cluster that auto-scales when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
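The exact contents of the platform-provided conf-dynamic file are not reproduced here; as an illustration only, dynamic executor allocation in Spark is controlled by standard properties such as the following (the values shown are placeholders):

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 16
spark.dynamicAllocation.executorIdleTimeout 60s

With settings like these, the driver requests executors when tasks are queued and releases idle ones, which in turn drives the cluster scaling behaviour described above.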
Every cluster member has a certain capacity depending on the resource model selected for the member.
A Spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).
The following setting defines the number of CPUs needed per task:
spark.task.cpus 1
The following settings define the size of a single executor which handles the execution of tasks:
spark.executor.cores 4
spark.executor.memory 4g
The above example allows an executor to handle 4 tasks concurrently, sharing a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB allow 8 concurrent executors on a single cluster member).
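To check which values are active in your session, you can read them back from the running context; a minimal sketch, assuming the kernel's pre-initialised sc (the fallback value is only illustrative):

# Inspect the effective task and executor sizing of the current SparkContext.
conf = sc.getConf()
print(conf.get("spark.task.cpus", "1"))         # CPUs reserved per task
print(conf.get("spark.executor.cores", None))   # cores per executor
print(conf.get("spark.executor.memory", None))  # memory per executor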
The Spark UI can be accessed via the cluster Web Access URL, which is displayed on the Workspace details page.
This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.
See the Apache Spark documentation for more information.


The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, RStudio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, you can see the status indicated by its color.
Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.
You can see the remaining time until the next event (Shutdown or restart) in the workspaces overview and on the workspace details.
If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.
Click Projects > Your_Project > Bench > Workspaces > + Create Workspace
Complete the following fields and save the changes.
(*1) URLs must comply with the following rules:
URLs can be between 1 and 263 characters including dot (.).
URLs can begin with a leading dot (.).
Domain and sub-domains:
Can include alphanumeric characters (letters A-Z and digits 0-9). Case insensitive.
Can contain hyphens (-) and underscores (_), but not as a first or last character.
Length between 1 and 63 characters.
Dot (.) must be placed after a domain or sub-domain.
If you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.
Regex for URL: [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
The following are example URLs which will be considered valid:
example.com
www.example.com
https://www.example.com
subdomain.example.com
subdomain.example.com/folder
subdomain.example.com/folder/subfolder
sub-domain.example.com
sub_domain.example.com
example.co.uk
subdomain.example.co.uk
sub-domain.example.co.uk
Example data science-specific whitelist compatible with restricted Bench workspaces. There are two required URLs to allow for Python pip installs:
pypi.org
files.pythonhosted.org
repo.anaconda.com
conda.anaconda.org
github.com
cran.r-project.org
bioconductor.org
www.npmjs.com
mvnrepository.com
(*2) When you grant workspace access to multiple users, you need to provide an API key to access the workspace. Authenticate using the icav2 config set command; the CLI will prompt for an x-api-key value. Enter the API key generated from the product dashboard. See the documentation on API keys for more information.
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
The user's Bench role determines whether someone is an administrator or contributor, while the dedicated workspace permissions indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
For security reasons, the Tenant administrator and Project owner can always access the workspace.
If one of your permissions is not high enough as a Bench contributor, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Dataprovider - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
As long as the workspace is running, the resources provided for this workspace will be charged.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
Once a workspace is running, it can be manually stopped or it will be automatically shut down after the amount of time configured in the automatic shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace when you no longer need it to save costs.
You can edit running workspaces to update the shutdown timer, shutdown reminder and auto restart reminder.
If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The docker images provided by Illumina will load JupyterLab by default. It also contains Tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher, + button above the folder structure.
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image no longer runs correctly. The image does not have access to any underlying parts of the platform, so it will not be able to harm the platform, but inoperable Bench images will have to be deleted or corrected.
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
The first 4 lines of the Dockerfile must NOT be edited. It is not possible to start a Dockerfile with a different FROM directive. The main Dockerfile commands are RUN and COPY. More information on them is available in the official Docker documentation.
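As a purely hypothetical sketch (the package names and copied file path are placeholders, not part of the Illumina images), added layers typically look like this:

# Install additional packages as new layers on top of the existing image.
RUN pip install --no-cache-dir pandas scikit-learn
# COPY adds files from the build context into the image (placeholder path).
COPY my_helper_script.py /opt/scripts/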
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a tool from the Docker image.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Click the Docker Build tab. Here the image that accompanies the tool will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says "#Add your commands below." write the code necessary for running this docker image.
Click the General tab. This tab and all next tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the Tool creation section in the Flow documentation.
Click the Save button in the upper right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use workspace-ctl data create-mount --mode read-write to do so. You cannot have fast read-write access to indexed folders as the indexing mechanism on those would deteriorate the performance.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab, all actions performed in the workspace are shown, for example the creation of the workspace and the starting or stopping of the workspace. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
Workspaces can have their own dedicated cluster which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.
The cluster consists of 2 components:
The manager node which orchestrates the workload across the members.
Between 0 and a maximum of 50 member nodes.
Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.
Dynamic - A dynamic cluster has a manager node and a dynamic number of workers up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources as only as many worker nodes as needed to perform the work are being used.
You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.
The following settings can be defined for a bench cluster:
Include ephemeral storage
Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.
Storage size
How much storage space (1GB - 16TB) should be reserved per node as dedicated scratch space. It is available at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster.
Web access
Enable or disable web access to the cluster manager.
Dedicated Cluster Manager
Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling. For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.
Type
Choose between Static and Dynamic cluster members.
Scaling interval
For static, set the number of cluster member nodes (maximum 50); for dynamic, choose the minimum and maximum (up to 50) number of cluster member nodes.
Resource model
The compute capacity (CPU cores and memory) of each cluster member node.
Economy mode
Economy mode uses AWS spot instances. This halves many compute iCredit rates vs standard mode, but may be interrupted. See Pricing for a list of which resource models support economy pricing.
Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details, and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.
Data in a bench workspace can be divided into three groups:
Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes ) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina connected analytics UI. This is persistent storage and data remains when a workspace is shut down.
Project data can be accessed from all workspace components at /data/project. Every component will have their own dedicated mount to the project. Depending on the project data permissions you will be able to access it in either Read-Only or Read-Write mode.
All mounts occur in /data/mounts/; see below for read-only and read/write mounts.
Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node will have its own dedicated mount.
For fast data access, Bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.
All mounts occur in path /data/mounts/
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only.
The following commands list, create, and delete mounts:
workspace-ctl data get-mounts
workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata
workspace-ctl data delete-mount --mount-path /data/mounts/mydata
Creating a mount without specifying --mode has the same effect as using the --mode read-only option, because read-only is applied by default when using workspace-ctl data create-mount.
