Bench Workspaces use a FUSE driver to mount project data directly into the workspace file system. The mount supports both reading and writing, but writes are subject to limitations imposed by the underlying AWS S3 storage.
You can perform the following actions from Bench (provided your user permissions are sufficient relative to the workspace permissions) or through the CLI:
Copy project data
Delete project data
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
WARNING: This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
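As an illustration of the copy step and the per-file limit, here is a minimal Python sketch. The helper name copy_with_size_check and the constant MAX_FILE_BYTES are hypothetical, and reading "500 GB" as 500 x 10^9 bytes is an assumption; the driver may count the limit differently.

```python
import os
import shutil

# Assumption: "500 GB" is read here as decimal gigabytes (500 * 10**9 bytes).
MAX_FILE_BYTES = 500 * 10**9


def copy_with_size_check(src, dest, limit=MAX_FILE_BYTES):
    """Copy src to dest after checking the file size against the limit.

    Returns the number of bytes in src. Raises ValueError if src is
    larger than the limit, so the copy is never started.
    """
    size = os.path.getsize(src)
    if size > limit:
        raise ValueError(f"{src} is {size} bytes, above the {limit}-byte limit")
    # copyfile copies only the file contents, not permissions or timestamps.
    shutil.copyfile(src, dest)
    return size
```

In a workspace this might be called as copy_with_size_check("/data/project/filename", "filename") to take a local copy, or with the arguments reversed to place a new file in the project.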
The FUSE driver also allows you to delete data from your project. This differs from earlier use of Bench, where you worked on a local copy and the original file remained in your project.
WARNING: Deleting project data from a Bench workspace through the FUSE driver permanently deletes the data in the project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further action. Mac users need to install the kernel extension from macFUSE.
macOS uses hidden metadata files beginning with ._, which are copied over and become visible in your project data during a CLI copy. These can be safely deleted from your project.
Mounting and unmounting data must be done through the CLI. In Bench, this happens automatically, so no manual mount step is needed.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss because data at the destination location will be deleted.
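A short Python sketch of how those ._ files could be located and removed. The function names are hypothetical, and the default is a dry run because deletions on the mounted project are permanent.

```python
from pathlib import Path


def find_appledouble_files(root):
    """Return every regular file under root whose name starts with '._'."""
    return sorted(p for p in Path(root).rglob("._*") if p.is_file())


def delete_appledouble_files(root, dry_run=True):
    """List the '._' files under root and delete them unless dry_run is True.

    Returns the list of matching paths in both cases.
    """
    found = find_appledouble_files(root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling delete_appledouble_files("/data/project") first shows what would be removed; passing dry_run=False performs the deletion.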
❗️ Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update a file or save your notebook in the project folder will typically result in an error such as: File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.
Some examples of other actions or commands that will not work because of the above-mentioned limitations:
Save a Jupyter notebook or R script on the /project location
Add a file to or remove a file from an existing zip file
Redirect with append to an existing file, e.g. echo "This will not work" >> myTextFile.txt
Rename a file (not possible because of the existing association between ICA and AWS)
Move files or folders
Edit a file in place with vi or another editor
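Since existing files cannot be updated, renamed, or moved, one possible workaround is to edit in the local workspace and write each result to the project under a name that is not yet in use. The sketch below assumes this pattern; the helper names and the _vN suffix scheme are hypothetical, not something the platform prescribes.

```python
import shutil
from pathlib import Path


def next_free_path(directory, filename):
    """Return directory/filename, or the first unused name_vN variant."""
    base = Path(directory) / filename
    if not base.exists():
        return base
    n = 2
    while True:
        candidate = base.with_name(f"{base.stem}_v{n}{base.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def publish_new_version(local_file, project_dir):
    """Copy local_file into project_dir without touching existing files."""
    dest = next_free_path(project_dir, Path(local_file).name)
    # A plain content copy: the destination is created once, never modified.
    shutil.copyfile(local_file, dest)
    return dest
```

Because the file listing can lag behind recent writes, the existence check may briefly be out of date right after a previous upload.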
A file can only be written sequentially. This restriction comes from the library the FUSE driver uses to store data in AWS: it supports only sequential writing, and random writes are currently not supported. The FUSE driver detects random writes and fails the write with an IO error return code. Zip will not work, since zip writes a table of contents at the end of the file. Use gzip instead.
Listing data (ls -l) reads data from the platform, while the actual data comes from AWS, so there can be a short delay between writing data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that, besides the FUSE driver itself, the library it uses to implement the raw FUSE protocol and the OS kernel may also cache information.
To use a specific file in a Jupyter notebook, reference it by its full path, for example '/data/project/filename'.
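For illustration, a minimal Python sketch of compressing a single file with gzip in one front-to-back pass. The function name gzip_to is hypothetical, and whether a given tool's write pattern is accepted by the driver is an assumption to verify in your own workspace.

```python
import gzip
import shutil


def gzip_to(src_path, dest_path, chunk_size=1024 * 1024):
    """Compress src_path into dest_path, writing the output in order.

    The input is read in chunks and each compressed chunk is appended to
    the output, so the destination is produced in a single forward pass.
    """
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)
    return dest_path
```

A call such as gzip_to("results.csv", "/data/project/results.csv.gz") would compress a local file straight into a new file in the project.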
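When a script needs to rely on the listing right after a write, one option is to poll until the reported size matches what was written. The following Python sketch assumes that approach; the function name wait_for_size and the default timeout and interval are hypothetical values, not documented settings.

```python
import os
import time


def wait_for_size(path, expected_size, timeout=60.0, interval=2.0):
    """Poll path until its reported size equals expected_size.

    Returns True once the size matches, or False if the timeout passes
    first. A missing file is treated the same as a size mismatch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.stat(path).st_size == expected_size:
                return True
        except FileNotFoundError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, after copying a local file to the project, wait_for_size(dest, os.path.getsize(local_file)) returns True once the listing reflects the full file.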
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.