The data associated with a sequencing run can consist of thousands of files, including base call (BCL) data, images, metrics, log files, and samplesheets. Once the data has been processed into FASTQ files, the raw run data is almost never accessed again.
We have started to roll out an automated process in BaseSpace Sequence Hub that zips the BCL and image files associated with old runs. The feature is being enabled in stages across different user groups and regions. Any run that hasn’t been modified for more than 90 days is eligible for zipping. Metrics, log files, samplesheets, and other files in the run folder are left untouched, and you can still review run metrics as before.
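The 90-day rule amounts to a simple date comparison. The sketch below is illustrative only. The function name `is_eligible_for_zipping` is hypothetical, and the sketch models just the age threshold stated above. It does not cover the staged rollout or the other exclusions described elsewhere in this FAQ.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the FAQ; the function below is a hypothetical sketch.
ZIP_ELIGIBILITY_AGE = timedelta(days=90)


def is_eligible_for_zipping(last_modified: datetime, now: datetime) -> bool:
    """Return True if the run has not been modified for more than 90 days."""
    return now - last_modified > ZIP_ELIGIBILITY_AGE


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_eligible_for_zipping(datetime(2024, 1, 15, tzinfo=timezone.utc), now))  # True
print(is_eligible_for_zipping(datetime(2024, 4, 1, tzinfo=timezone.utc), now))   # False
```

A run modified exactly 90 days ago is not yet eligible, because the text says "more than 90 days".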
The generated zip files can be downloaded and unzipped with the desktop or command-line tools available by default on common operating systems. Zipping is not expected to reduce file size considerably, because BCL files are already compressed. However, reducing the number of files associated with each run makes it faster and more reliable to move runs in and out of archival storage. Downloading runs is also quicker, as is transferring them to other Illumina systems.
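For scripted downloads, Python's standard `zipfile` module can do the same job. This is a minimal sketch with several assumptions. It assumes the run folder has already been downloaded to local disk and keeps the layout given in the file table in this FAQ: `Data/{runId}_Data.zip`, plus `Thumbnails/{runId}_Thumbnails.zip` when that directory exists. It also assumes that paths inside each archive are relative to the folder holding the zip. Check `ZipFile.namelist()` on your own data before relying on that. The function name `unzip_run_folder` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_run_folder(run_dir: Path) -> list[Path]:
    """Extract the zip archives found in Data/ and Thumbnails/ into those folders.

    Returns the archives that were extracted; the archives themselves are kept.
    """
    extracted = []
    for subdir in ("Data", "Thumbnails"):
        folder = run_dir / subdir
        if not folder.is_dir():  # Thumbnails/ only exists for some runs
            continue
        for archive in sorted(folder.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                # Assumption: member paths are relative to the archive's folder.
                zf.extractall(folder)
            extracted.append(archive)
    return extracted
```

Usage: `unzip_run_folder(Path("path/to/downloaded/run"))`. Extracting next to the archive keeps the BCL files where downstream tools would normally look for them. Extracting into a separate folder is the safer choice if you are unsure of the archive layout.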
When a run is zipped, it is not available for re-queue or for use as an input to applications launched from the BaseSpace website. To re-process a zipped run, you first need to unzip it using the Unzip Run option, which is available in the File menu while viewing that run in BaseSpace. Depending on the size of the run, unzipping can take anywhere from a few minutes to a few hours. The file status changes to Active when unzipping is complete, and the run can then be used as input to applications.
It probably doesn’t. Older runs are automatically zipped to reduce the number of files that the BaseSpace system must keep track of. All details of the run can be viewed as before. If you need to process the run using an application, you will need to unzip it first.
It probably doesn’t. Your run has been selected to be zipped automatically by the BaseSpace system. While a run is being actively zipped, most editing operations are disabled and the run cannot be deleted. The run should transition to a status of Zipped automatically after a few minutes to a few hours, depending on the size of the run.
In the BaseSpace website, navigate to the zipped run. In the File menu, select Unzip, and then Unzip Run. The run’s file status changes to Unzipping and then to Active when unzipping completes, which can take minutes or hours depending on the size of the run. Only a run’s owner can unzip the run.
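The status changes described in these answers form a small lifecycle: Active, Zipping, Zipped, Unzipping, and back to Active. The sketch below encodes only the rules stated in this FAQ: zipped runs cannot be used as application input, runs cannot be deleted while zipping, and only the owner can unzip. The `RunStatus` enum and the helper functions are hypothetical illustrations, not a BaseSpace API. Automatic retries after a failed zip attempt are not modeled.

```python
from enum import Enum


class RunStatus(Enum):
    ACTIVE = "Active"
    ZIPPING = "Zipping"
    ZIPPED = "Zipped"
    UNZIPPING = "Unzipping"


# Allowed transitions, following the lifecycle described in the FAQ.
TRANSITIONS = {
    RunStatus.ACTIVE: {RunStatus.ZIPPING},     # started automatically by BaseSpace
    RunStatus.ZIPPING: {RunStatus.ZIPPED},     # minutes to hours, by run size
    RunStatus.ZIPPED: {RunStatus.UNZIPPING},   # File > Unzip > Unzip Run
    RunStatus.UNZIPPING: {RunStatus.ACTIVE},   # minutes to hours, by run size
}


def can_transition(current: RunStatus, target: RunStatus) -> bool:
    return target in TRANSITIONS[current]


def can_use_as_input(status: RunStatus) -> bool:
    """Assumption: only Active runs can be re-queued or used as application input."""
    return status is RunStatus.ACTIVE


def can_delete(status: RunStatus) -> bool:
    """Models only the stated rule that deletion is blocked while zipping."""
    return status is not RunStatus.ZIPPING


def request_unzip(status: RunStatus, user: str, owner: str) -> RunStatus:
    """Return the new status after an Unzip Run request, or raise an error."""
    if user != owner:
        raise PermissionError("Only a run's owner can unzip the run")
    if status is not RunStatus.ZIPPED:
        raise ValueError(f"Cannot unzip a run with status {status.value}")
    return RunStatus.UNZIPPING
```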
The zipping process only processes run data files. FASTQ files are considered the results of a secondary analysis and will not be modified.
Some small and useful files from the run directory are not zipped. Here’s a breakdown of how files are treated:
| Files | Treatment |
| --- | --- |
| BCL files in the Data/ directory | Zipped into Data/{runId}_Data.zip |
| Images in the Thumbnails/ directory, if it exists | Zipped into Thumbnails/{runId}_Thumbnails.zip |
| Samplesheet.csv | Left unzipped, useful to inspect |
| Metrics files in InterOp/ directory | Left unzipped, to provide data for charts |
| Log files, other top-level files | Left unzipped, for inspection |
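The table above can be read as a simple path rule that predicts which archive a given file in a run folder ends up in. This sketch makes a simplifying assumption. It treats every file under `Data/` as BCL data and every file under `Thumbnails/` as an image, whereas the table only states that BCL files and images are zipped. The function name `zip_target` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def zip_target(relative_path: str, run_id: str) -> Optional[str]:
    """Return the archive a run-folder file is zipped into, or None if left unzipped.

    Simplification: everything under Data/ is treated as BCL data and
    everything under Thumbnails/ as images.
    """
    parts = PurePosixPath(relative_path).parts
    top = parts[0] if parts else ""
    if top == "Data":
        return f"Data/{run_id}_Data.zip"
    if top == "Thumbnails":
        return f"Thumbnails/{run_id}_Thumbnails.zip"
    # Samplesheet.csv, InterOp/ metrics, logs and other top-level files stay as-is.
    return None
```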
In the context of BaseSpace Sequence Hub, all these terms have distinct meanings:
Zipping: Old runs are zipped automatically by the BaseSpace system. Thousands of BCL files are combined into a single zip file, which makes most run operations (download, deletion, transfer) more efficient. Some small disk space savings are expected.
Archiving: Runs and datasets can be archived, which means that the associated files are moved to a cheaper storage tier. This can dramatically lower your storage bill. See the archival documentation for more information.
Compression: Compression is used to make files take up less space on disk. Some file types are automatically compressed by the BaseSpace system. The process is transparent to end users.
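The point that zipping saves little disk space, because BCL files are already compressed, can be checked with a quick experiment. This sketch uses synthetic base calls (random A/C/G/T text, not real BCL binary data). It uses zlib's DEFLATE as a stand-in for whatever method the zip step actually uses. The first pass shrinks the data substantially, while compressing the already-compressed output a second time saves almost nothing.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using DEFLATE via zlib."""
    return len(zlib.compress(data, 9)) / len(data)


# Synthetic stand-in for raw base calls: 200,000 random A/C/G/T characters.
rng = random.Random(0)
raw = "".join(rng.choice("ACGT") for _ in range(200_000)).encode("ascii")

# Stand-in for a BCL file that has already been compressed.
already_compressed = zlib.compress(raw, 9)

first_pass = compression_ratio(raw)
second_pass = compression_ratio(already_compressed)
print(f"first pass: {first_pass:.2f}  second pass: {second_pass:.2f}")
```

Exact figures for real runs will differ. The experiment only shows why a second compression pass gains little.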
Definitely. If you need to keep your run data for an extended period and don’t need frequent access to the run’s raw data, you should consider archiving it to lower your storage bill. Archiving a run before it has been zipped prevents automated zipping. You are welcome to do this, but archiving a zipped run makes any subsequent retrieval cheaper and more efficient.
Neither zipping nor unzipping costs the run owner anything. Archiving a zipped run can dramatically reduce your storage bill, and zipping before archiving makes the archiving process quicker.
As a first step, wait 24 hours. A large run may take several hours to zip, and the BaseSpace system may make several attempts to zip the run if a failure occurs. If a run has a status of Zipping for more than 48 hours, please contact Illumina technical support (techsupport@illumina.com) with the ID of the problematic run so the issue can be resolved.
If automated run zipping is interfering with your workflows, please contact Illumina technical support (techsupport@illumina.com), who can mark your account as exempt from the automated zipping process.
Run zipping is a continuous background process coordinated by BaseSpace Sequence Hub. The process zips the oldest available runs first and must clear a long backlog of runs. There may also be other reasons a run has not been zipped. For example, the run may have been previously archived and un-archived, uploaded directly to ICA storage, or be too small to be worth zipping. Finally, run zipping is being rolled out in stages across different regions and user groups, so it may not yet be enabled for the specific run you're interested in.