Bench workspaces require a Docker image to run. Illumina Connected Analytics (ICA) provides a default Docker image with JupyterLab installed.
JupyterLab supports Jupyter Notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.
The JupyterLab docker image contains the following environment variables:
ICA_URL: set to the ICA server URL https://ica.illumina.com/ica
ICA_PROJECT: (OBSOLETE) set to the current ICA project ID
ICA_PROJECT_UUID: set to the current ICA project UUID
ICA_SNOWFLAKE_ACCOUNT: set to the ICA Snowflake (Base) Account ID
ICA_SNOWFLAKE_DATABASE: set to the ICA Snowflake (Base) Database ID
ICA_PROJECT_TENANT_NAME: set to the tenant name of the owning tenant of the project where the workspace is created
ICA_STARTING_USER_TENANT_NAME: set to the tenant name of the tenant of the user who last started the workspace
ICA_COHORTS_URL: set to the URL of the Cohorts web application used to support the Cohorts view
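As a minimal sketch, the variables listed above can be read from a notebook cell with the Python standard library. The helper name read_ica_environment is hypothetical and only used for illustration; the obsolete ICA_PROJECT variable is left out on purpose.

```python
import os

# Variable names as listed above; ICA_PROJECT is marked obsolete and is skipped.
ICA_VARIABLES = [
    "ICA_URL",
    "ICA_PROJECT_UUID",
    "ICA_SNOWFLAKE_ACCOUNT",
    "ICA_SNOWFLAKE_DATABASE",
    "ICA_PROJECT_TENANT_NAME",
    "ICA_STARTING_USER_TENANT_NAME",
    "ICA_COHORTS_URL",
]


def read_ica_environment(environ=None):
    """Return a dict of the ICA variables; variables that are not set map to None."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in ICA_VARIABLES}


if __name__ == "__main__":
    for name, value in read_ica_environment().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```

Outside a Bench workspace these variables are normally not set, so the sketch reports them as missing instead of raising an error.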
Note: To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data.
Included in the default JupyterLab docker image is a python library with APIs to perform actions in ICA, such as add data, launch pipelines, and operate on Base tables. The python library is generated from the ICA Open API specification using openapi-generator.
The ICA Python library API documentation can be found in the folder /etc/ica/data/ica_v2_api_docs within the JupyterLab docker image.
See the Bench ICA Python Library Tutorial for examples on using the ICA Python library.
Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. There are both read and write capabilities with some limitations on write capabilities that are enforced by the underlying AWS S3 storage.
As a user, you are allowed to do the following actions from Bench (when having the correct user permissions compared to the workspace permissions) or through the CLI:
Copy project data
Delete project data
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
WARNING: This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
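A minimal sketch of copying a workspace file into the project mount from Python. The helper name copy_to_project is hypothetical, the 500 GB limit is assumed to mean decimal gigabytes, and the sketch refuses to overwrite because written project files cannot be changed afterwards. It has not been verified against an actual FUSE mount.

```python
import os
import shutil

# 500 GB per-file limit of the FUSE driver (decimal gigabytes assumed).
FUSE_MAX_FILE_BYTES = 500 * 1000**3


def copy_to_project(src, project_dir="/data/project"):
    """Copy one local file into project_dir and return the destination path.

    Raises ValueError for files over the size limit and FileExistsError
    when the destination already exists.
    """
    size = os.path.getsize(src)
    if size > FUSE_MAX_FILE_BYTES:
        raise ValueError(f"{src} is {size} bytes, above the 500 GB per-file limit")
    dst = os.path.join(project_dir, os.path.basename(src))
    if os.path.exists(dst):
        raise FileExistsError(dst)
    # copyfile copies only the file contents, not permissions or timestamps.
    shutil.copyfile(src, dst)
    return dst
```

Copying in the other direction (from /data/project to the workspace) works the same way with the source and destination swapped.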
The FUSE driver also allows you to delete data from your project. This is different from the use of Bench before where you took a local copy and still kept the original file in your project.
WARNING: Deleting project data through Bench workspace through the FUSE driver will permanently delete the data in the Project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further actions. Mac users need to install the kernel extension from macFuse.
MacOS uses hidden metadata files beginning with ._ which are copied over and exposed during CLI copy to your project data. These can be safely deleted from your project.
Mounting and unmounting of data is done through the CLI. In Bench, this happens automatically and no manual action is needed.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data on the destination location will be deleted.
❗️ Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update files or save your notebook in the project folder will typically result in the error File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.
Some examples of other actions or commands that will not work because of the above mentioned limitations:
Save a jupyter notebook or R script on the /project location
Add/remove a file from an existing zip file
Redirect with append to an existing file e.g. echo "This will not work" >> myTextFile.txt
Rename a file due to the existing association between ICA and AWS
Move files or folders.
Using vi or another editor
A file can only be written sequentially. This restriction comes from the library that the FUSE driver uses to store data in AWS. That library supports only sequential writes; random writes are currently not supported. The FUSE driver detects random writes and fails the write with an IO error return code. Zip will not work, since zip writes a table of contents at the end of the file. Please use gzip instead.
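One way to stay within the sequential-write restriction is to build the compressed file in local scratch space first and then stream it to the project location in a single front-to-back pass. The sketch below illustrates that pattern; the function name write_gzip_sequentially is hypothetical, and the behavior on the actual FUSE mount is an assumption that has not been verified.

```python
import gzip
import os
import shutil
import tempfile


def write_gzip_sequentially(lines, destination, scratch_dir=None):
    """Compress text lines locally, then copy the result to destination once.

    The destination is only ever written front to back and never reopened,
    appended to, or overwritten.
    """
    if os.path.exists(destination):
        raise FileExistsError(destination)
    # Scratch file lives in local storage (the system temp folder by default).
    fd, scratch_path = tempfile.mkstemp(suffix=".gz", dir=scratch_dir)
    os.close(fd)
    try:
        with gzip.open(scratch_path, "wt", encoding="utf-8") as handle:
            for line in lines:
                handle.write(line + "\n")
        # copyfileobj reads and writes fixed-size chunks in order.
        with open(scratch_path, "rb") as src, open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst)
    finally:
        os.remove(scratch_path)
    return destination
```

A call such as write_gzip_sequentially(["a", "b"], "/data/project/results.txt.gz") would then produce a new gzip file in the project; updating it later requires writing a new file under a different name.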
Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.
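Because a freshly written file can briefly show up with zero length, a script that depends on the listing can poll until the reported size matches what it wrote. This is a minimal sketch; the function name and the default timeout and interval are illustrative assumptions, not documented values.

```python
import os
import time


def wait_until_listed(path, expected_size, timeout=30.0, interval=1.0):
    """Poll until path exists with expected_size bytes.

    Returns True when the listing matches, False when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.stat(path).st_size == expected_size:
                return True
        except FileNotFoundError:
            pass  # not visible yet, keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```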
To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.
ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a docker image with access to the data and pipelines within a project. This workspace runs on the Amazon S3 system and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when they are not in use.
Having access to Bench depends on the following conditions:
Bench needs to be included in your ICA subscription.
The project owner needs to enable Bench for their project.
Individual users of that project need to be given access to Bench.
After creating a project, go to Projects > your_project > Bench > Workspaces page and click the Enable button. If you do not see this option, then either your tenant subscription does not include Bench or you belong to a tenant different from the one where the project was created. Users from other tenants cannot enable the Bench module, but can create workspaces. Once enabled, every user who has the correct permissions has access to the Bench module in that project.
Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.
Tenant administrators and project owners are always able to access Bench and perform all actions.
The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.
No Access means you have no access to the Bench workspace for that project.
Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.
Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.
Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:
Upload / Download rights (Download rights are mandatory for technical reasons)
Project Level (No Access / Data Provider / Viewer / Contributor)
Flow (No Access / Viewer / Contributor)
Base (No Access / Viewer / Contributor)
Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to https://docs.docker.com/reference/dockerfile/
The following steps are needed to get your bench image running in ICA.
You need to have Docker installed in order to build your images.
For your Docker bench image to work in ICA, it must run on Linux x86 architecture and have the correct user ID and initialization script in the Docker file.
For easy reference, you can find examples of preconfigured Bench images on the Illumina website which you can copy to your local machine and edit to suit your needs.
Bench-console provides an example to build a minimal image compatible with ICA Bench to run a SSH Daemon.
Bench-web provides an example to build a minimal image compatible with ICA Bench to run a Web Daemon.
Bench-rstudio provides an example to build a minimal image compatible with ICA Bench to run RStudio Open Source.
These examples come with information on the available parameters.
The following scripts must be part of your Docker bench image. Please refer to the examples from the Illumina website for more details.
This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location in your project from where it can be started by ICA when you request to start your workspace.
The user settings must be set up so that bench runs with UID 1000.
To do a clean shutdown, you can capture the SIGTERM signal which is transmitted 30 seconds before the workspace is terminated.
Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.
Open the command prompt on your machine.
Navigate to the root folder of your Docker files.
Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . (including the trailing dot), with mybenchimage being the name you want to give to your image and 0.0.1 replaced with the version number which you want your bench image to have. For more information on this command, see https://docs.docker.com/reference/cli/docker/buildx/build/
Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2
The resulting tar file will appear next to the root folder of your docker files.
If you want to build on a mac with Apple Silicon, then the build command is docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .
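The build and save commands above differ only in the image name, the version and the platform option, so they can be assembled programmatically. The sketch below only constructs the commands and does not run Docker; the helper name bench_image_commands is hypothetical.

```python
import shlex


def bench_image_commands(name, version, apple_silicon=False, dockerfile="Dockerfile"):
    """Return (build_argv, save_shell_command) for a Bench image.

    build_argv is an argument list that always ends with the build
    context "." so the trailing dot cannot be forgotten.
    """
    tag = f"{name}:{version}"
    if apple_silicon:
        # Cross-build for x86 on a Mac with Apple Silicon.
        build = ["docker", "buildx", "build", "--platform", "linux/amd64"]
    else:
        build = ["docker", "build"]
    build += ["-f", dockerfile, "-t", tag, "."]
    archive = shlex.quote(f"{name}-{version}") + ".tar.bz2"
    save = f"docker save {shlex.quote(tag)} | bzip2 > ../{archive}"
    return build, save


if __name__ == "__main__":
    build, save = bench_image_commands("mybenchimage", "0.0.1")
    print(shlex.join(build))
    print(save)
```

The build list can be passed to subprocess.run(build, check=True) from the root folder of your Docker files. The save command contains a pipe and a redirect, so it needs a shell to run.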
Open ICA and log in.
Go to Projects > your_project > Data.
For small Docker images, upload the docker image file which you generated in the previous step. For large Docker images, use the service connector to import the Docker image for better performance and reliability.
Select the uploaded image file and perform Manage > Change Format.
From the format list, select DOCKER and save the change.
Go to System Settings > Docker Repository > Create > Image.
Select the uploaded docker image and fill out the other details.
Name: The name by which your docker image will be seen in the list
Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1
Description: Provide a description explaining what your docker image does or is suited for.
Type: The type of this image is Bench. The Tool type is reserved for tool images.
Cluster compatible: [For Future Use, not currently supported] Indicates if this docker image is suited for cluster computing
Access: This setting must match the available access options of your Docker image. You can choose web access, console access or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support, will result in access denied errors when trying to run the workspace.
Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.
Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your docker image will be partial during creation and available once completed.
Navigate to Projects > your_project > Bench > Workspaces.
Create a new workspace with + New Workspace or edit an existing workspace.
Fill in the bench workspace details according to Workspaces.
Save your changes.
Select Start Workspace
Wait for the workspace to start, after which you can access it via the console or the GUI.
Once your bench image has been started, you can access it via console, web or both, depending on your configuration.
Web access is done from either the Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
Console access is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
The bench image will be instantiated as a container which is forcibly started as a user with UID 1000 and GID 100.
You cannot elevate your permissions in a running workspace.
Do not run containers as root as this is bad security practice.
Only the following folders are writeable:
/data
/tmp
All other folders are mounted as read-only.
For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.
Web: TCP/8888
Console: TCP/2222
For outbound access, a workspace can be started in two modes:
Public: Access to public IPs is allowed using the TCP protocol.
Restricted: Access is allowed only to a list of URLs.
At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.
Following files and folders will be provided to the workspace and made accessible for reading at runtime.
At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.
New versions of ICA software will be made available after a restart of your workspace.
When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh
This script needs to be available and executable otherwise your workspace will not boot.
This script is the main process in your running workspace and must not run to completion, as that will stop the workspace and trigger a restart (see init script).
This script can be used to invoke other scripts.
When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully (see shutdown script) and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process hasn’t stopped within the given period.
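If the process that has to shut down cleanly is a Python program (for example one started from ica_start.sh), trapping the TERM signal could look like the sketch below. The function names are hypothetical, and the cleanup callback is assumed to finish well within the 30-second window described above.

```python
import signal
import threading

# Set by the signal handler; the main loop checks it between units of work.
stop_requested = threading.Event()


def handle_term(signum, frame):
    """Signal handler: only record that a stop was requested."""
    stop_requested.set()


def run_until_stopped(do_work, cleanup, poll_seconds=1.0):
    """Call do_work repeatedly until SIGTERM arrives, then call cleanup once."""
    signal.signal(signal.SIGTERM, handle_term)
    while not stop_requested.is_set():
        do_work()
        # wait() returns early as soon as the handler sets the event.
        stop_requested.wait(poll_seconds)
    cleanup()
```

Keeping the handler itself trivial and doing the real cleanup in the main loop avoids running complex code inside a signal handler.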
If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, then a possible cause is that the final . (dot) of the command is missing.
When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.
The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project. Each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be stored temporarily before being exported for permanent storage in a Project. The size of the storage capacity can range from 1GB – 16TB.
If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.
Click Projects > Your_Project > Bench > Workspaces > + New Workspace
Complete the following fields:
Name: (required) must be a unique name.
Docker image: (required) The list of docker images includes base images from ICA and images uploaded to the docker repository for that domain.
Storage size: (required) Represents the size of the storage available on the workspace. A storage from 10GB to 64TB can be provided.
Resource model: (required) Size of the machine on which the workspace will run and whether or not the machine should contain a Graphics Processing Unit (GPU).
Resource Model | CPU | Memory (GB) | GPU | GPU Memory (GB) |
---|
Access: The options here are determined by the selected Docker image. The options you select will become available on the details tab of the Workspace when it is running.
Web access allows you to interact with the workspace via a browser.
Console access provides a terminal to interact with the workspace.
Access mode: (required) Type of access to the internet which should be provided for this workspace
Open: Internet access is allowed
Restricted: Creates a workspace with no internet access. Access to the ICA Project Data is still available in this mode.
Whitelisted URLs: Specify URLs and paths that are allowed in a restricted workspace. Separate URLs with a new line. Only domains and subdomains in the specified URL will be allowed.
URLs must comply with the following:
URLs can be between 1 and 263 characters including dot (.).
URLs can begin with a leading dot (.).
Domain and Sub-domains:
Can include alphanumeric characters (Letters A-Z and digits 0-9). Case insensitive.
Can contain hyphens (-) and underscores (_), but not as a first or last character.
Length between 1 and 63 characters.
Dot (.) must be placed after a domain or sub-domain.
Note that if you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.
Regex for URL : [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]\{2,256}\.[a-z]\{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
Example URLs accepted:
example.com
www.example.com
https://www.example.com
subdomain.example.com
subdomain.example.com/folder
subdomain.example.com/folder/subfolder
sub-domain.example.com
sub_domain.example.com
example.co.uk
subdomain.example.co.uk
sub-domain.example.co.uk
Example data science specific whitelist compatible with restricted Bench workspaces. Note that there are two required URLs to allow for Python pip installs:
pypi.org
files.pythonhosted.org
repo.anaconda.com
conda.anaconda.org
github.com
cran.r-project.org
bioconductor.org
www.npmjs.com
mvnrepository.com
Workspace Permissions: Your workspace will operate with these permissions. For security reasons, users will need to have the permissions matching what you set here to run the workspace, regardless of their role.
Description: A place to provide additional information about the workspace
Click “Save”
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
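Before pasting entries into the Whitelisted URLs field, it can help to check them locally against the stated rules for length, leading dot and domain labels. The sketch below is one possible reading of those rules, not the validation ICA itself performs; it applies the 263-character limit to the whole entry and ignores the scheme and path when checking the domain part, both of which are assumptions.

```python
import re

# One domain or sub-domain label: 1 to 63 characters, alphanumeric, with
# hyphens and underscores allowed anywhere except first or last position.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9])?")


def is_valid_whitelist_entry(entry):
    """Return True when entry satisfies the whitelist rules listed above."""
    if not 1 <= len(entry) <= 263:
        return False
    # Drop an optional http(s) scheme and everything after the host.
    host = re.sub(r"^https?://", "", entry, flags=re.IGNORECASE).split("/", 1)[0]
    if host.startswith("."):
        host = host[1:]  # a single leading dot is allowed
    # Empty labels (for example from "a..b") fail the label pattern.
    return all(LABEL.fullmatch(label) for label in host.split("."))
```

For example, is_valid_whitelist_entry("sub_domain.example.com") is accepted, while "-bad.example.com" is rejected because the label starts with a hyphen.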
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
For security reasons, the Tenant administrator and Project owner can always access the workspace.
If one of your permissions as Bench contributor is not high enough, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Data Provider - Viewer - Contributor)
Flow (No Access - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
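The "meet or exceed" comparison between user rights and workspace permissions can be illustrated with a short sketch. It assumes that the levels are ordered from lowest to highest in the order they are listed above; the dictionary keys and function names are hypothetical, and this is not ICA's actual implementation.

```python
# Levels ordered from lowest to highest (ordering assumed from the lists above).
PROJECT_LEVELS = ["No Access", "Data Provider", "Viewer", "Contributor"]
MODULE_LEVELS = ["No Access", "Viewer", "Contributor"]


def meets(levels, user_level, required_level):
    """True when user_level is at least required_level in the given ordering."""
    return levels.index(user_level) >= levels.index(required_level)


def can_use_workspace(user, workspace):
    """Compare user rights with the permissions required by a workspace.

    Both arguments are dicts with boolean keys "upload" and "download"
    and string keys "project", "flow" and "base".
    """
    for right in ("upload", "download"):
        if workspace[right] and not user[right]:
            return False
    return (
        meets(PROJECT_LEVELS, user["project"], workspace["project"])
        and meets(MODULE_LEVELS, user["flow"], workspace["flow"])
        and meets(MODULE_LEVELS, user["base"], workspace["base"])
    )
```

A user who is Viewer on Base would be refused by a workspace that requires Contributor on Base, even if all other rights are sufficient.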
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
In case of an old workspace, you have the possibility to enable these workspace permissions. If enabled, you will see the same behavior as described above. If not, the workspace will continue working as before.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
As long as the workspace is running, the resources provided for this workspace will be charged.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.
If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.
When you exit a workspace, a choice will be given if you want to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right. The workspace will be stopped if it is not accessed for more than 7 days to avoid unnecessary costs.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored. Storage will continue to be charged until the workspace is deleted.
NOTE: Administrators will also see a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
To see if somebody is actively working in a workspace, go to Projects > your_project > Bench > Workspaces and look at the workspace. If there is a user in the workspace, they will be shown on the bottom right of the workspace.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The docker images provided by Illumina will load JupyterLab by default. It also contains Tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher, + button above the folder structure.
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
NOTE: The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image is no longer running correctly. The image does not have access to any underlying parts of the platform so will not be able to harm the platform, but inoperable Bench images will have to be deleted or corrected.
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
Code: The Docker file commands must be provided in this section.
The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a docker image and a tool from it at the same time.
Click the + Create Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Add a version number for the tool.
Click the Image tab.
Here the IMAGE that accompanies the TOOL will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says “#Add your commands below.” write the code necessary for running this docker image.
Click the General Tool tab. This tab and all next tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instruction check out the Tool creation section in the Flow documentation.
Click the Save button in the upper, right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab, all actions performed in the workspace are shown, for example the creation, starting or stopping of the workspace. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
Bench-specific environment variables available to the workspace at runtime:
Name | Description | Example Values |
---|---|---|
ICA_WORKSPACE | The unique identifier related to the started workspace. This value is bound to a workspace and will never change. | 32781195 |
ICA_CONSOLE_ENABLED | Whether Console access is enabled for this running workspace. | true, false |
ICA_WEB_ENABLED | Whether Web access is enabled for this running workspace. | true, false |
ICA_SERVICE_ACCOUNT_USER_API_KEY | An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace. | |
ICA_BENCH_URL | The host part of the public URL which provides access to the running workspace. | use1-bench.platform.illumina.com |
ICA_PROJECT_UUID | The unique identifier related to the ICA project in which the workspace was started. | |
ICA_URL | The ICA Endpoint URL. | |
HTTP_PROXY, HTTPS_PROXY | The proxy endpoint in case the workspace was started in restricted mode. | |
HOME | The home folder. | /data |

Files and folders provided to the workspace at runtime:
Name | Description |
---|---|
/etc/workspace-auth | Contains the SSH RSA public/private keypair which is required to run the workspace SSHD. |

Name | Description |
---|---|
/data | This folder contains all data specific to your workspace. Data in this folder is not persisted in your project and will be removed at deletion of the workspace. |
/data/project | This folder contains all your project data. |
/data/.software | This folder contains ICA-related software. |

Actions allowed per Bench role:
Role | create/edit | delete | start/stop | access contents |
---|---|---|---|---|
Contributor | - | - | X | when permissions match those of the workspace |
Administrator | X | X | X | when permissions match those of the workspace |

The project team settings determine if someone is an administrator or contributor, while the dedicated permissions you set on the workspace level indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
Alternatively, you can see who is using a workspace in the workspace list view or in the workspace itself, near the workspace actions at the top.

Resource models:
Resource Model | CPU | Memory (GB) | GPU | GPU Memory (GB) |
---|---|---|---|---|
standard-small | 2 | 8 | | |
standard-medium | 4 | 16 | | |
standard-large | 8 | 32 | | |
standard-xlarge | 16 | 64 | | |
standard-2xlarge | 32 | 128 | | |
gpu-small | 8 | 61 | 1 | 16 |
gpu-medium | 32 | 244 | 4 | 64 |
gpu-large | 64 | 488 | 8 | 128 |
hicpu-small | 16 | 32 | | |
hicpu-medium | 36 | 72 | | |
hicpu-large | 72 | 144 | | |
himem-small | 8 | 64 | | |
himem-medium | 16 | 128 | | |
himem-large | 48 | 384 | | |
himem-xlarge | 96 | 768 | | |
hiio-small | 2 | 16 | | |
hiio-medium | 4 | 32 | | |
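As an illustration of how the resource model values listed above can be used, the sketch below picks a model that satisfies minimum CPU, memory and GPU requirements. The values are copied from the list, models without GPU values are treated as having no GPU, and "smallest" is defined here as fewest GPUs, then fewest CPUs, then least memory. That ordering is an assumption and does not reflect actual pricing.

```python
# (name, cpu, memory_gb, gpus, gpu_memory_gb), copied from the values above.
# Models listed without GPU values are assumed to have no GPU.
RESOURCE_MODELS = [
    ("standard-small", 2, 8, 0, 0),
    ("standard-medium", 4, 16, 0, 0),
    ("standard-large", 8, 32, 0, 0),
    ("standard-xlarge", 16, 64, 0, 0),
    ("standard-2xlarge", 32, 128, 0, 0),
    ("gpu-small", 8, 61, 1, 16),
    ("gpu-medium", 32, 244, 4, 64),
    ("gpu-large", 64, 488, 8, 128),
    ("hicpu-small", 16, 32, 0, 0),
    ("hicpu-medium", 36, 72, 0, 0),
    ("hicpu-large", 72, 144, 0, 0),
    ("himem-small", 8, 64, 0, 0),
    ("himem-medium", 16, 128, 0, 0),
    ("himem-large", 48, 384, 0, 0),
    ("himem-xlarge", 96, 768, 0, 0),
    ("hiio-small", 2, 16, 0, 0),
    ("hiio-medium", 4, 32, 0, 0),
]


def pick_resource_model(min_cpu=1, min_memory_gb=1, min_gpus=0):
    """Return the name of the smallest model meeting the minimums, or None."""
    candidates = [
        m for m in RESOURCE_MODELS
        if m[1] >= min_cpu and m[2] >= min_memory_gb and m[3] >= min_gpus
    ]
    if not candidates:
        return None
    # Prefer fewest GPUs, then fewest CPUs, then least memory.
    return min(candidates, key=lambda m: (m[3], m[1], m[2]))[0]
```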