Data
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the .)
See the list of supported
On the Projects > your_project > Data page, you can view file information and preview files.
To view file details, click on the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then select View to preview the file.
To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When clicking on an ongoing action itself, the data job details of the most recently created data job are shown.
For folders, the list of ongoing actions is displayed at the top left of the folder details. When you click the list, the details of the most recently created data job across all actions are shown.
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
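As a rough illustration of the mounting behavior described above, the sketch below computes where companion files would land when the primary file is provided as a pipeline input. The function name `mounted_paths` and the example paths are hypothetical. This is not ICA code, only a small model of "same parent folder as the primary data file".

```python
import posixpath


def mounted_paths(primary_path, secondary_names):
    """Return the mount path of each secondary file next to the primary file.

    Assumption: each secondary record is identified by its file name and is
    placed in the parent folder of the primary file, as the text describes.
    """
    parent = posixpath.dirname(primary_path)
    return [posixpath.join(parent, name) for name in secondary_names]


# Hypothetical example: a BAM file with its companion index file.
paths = mounted_paths("/inputs/run1/sample.bam", ["sample.bam.bai"])
print(paths)
```

Because the index ends up beside the BAM file, a tool that looks for the index next to its input can find it without extra configuration.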
To hyperlink to data, use the following syntax:
ServerURL: See the browser address bar.
projectID: At YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
FolderID: At YourProject > Data > folder > folder details > ID
AnalysisID: At YourProject > Flow > Analyses > YourAnalysis > ID
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your files are added to the Data page with status partial during upload and become available when upload completes.
You can copy data from the same project to a different folder or from another project to which you have access.
In order to copy data, the following rights must be assigned to the person copying the data:
Within a project
Source project: Contributor rights, Upload and Download rights
Destination project: Contributor rights, Upload and Download rights
Between different projects
Source project: Download rights, Viewer rights
Destination project: Upload rights, Contributor rights
The following restrictions apply when copying data:
Within a project
Source project: No linked data, No partial data, No archived data
Destination project: No linked data
Between different projects
Source project: Data sharing enabled, No partial data, No archived data, Within the same region
Destination project: No linked data, Within the same region
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter out the data or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any metadata which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
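If you track batch jobs from a script, the states listed above can be grouped into "still working" and "finished". The sketch below is a minimal polling loop under stated assumptions: `fetch_state` is a hypothetical caller-supplied function that returns the current state string, and treating STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED and FAILED as final is inferred from their descriptions rather than documented here.

```python
import time

# States listed above. Which ones are final is an assumption based on
# their descriptions, not a documented guarantee.
IN_PROGRESS = {"INITIALIZED", "WAITING_FOR_RESOURCES", "RUNNING"}
TERMINAL = {"STOPPED", "SUCCEEDED", "PARTIALLY_SUCCEEDED", "FAILED"}


def wait_for_batch_job(fetch_state, poll_seconds=30.0, max_polls=120):
    """Poll until the job reaches a terminal state and return that state.

    fetch_state is a hypothetical callable returning the current state.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL:
            return state
        if state not in IN_PROGRESS:
            raise ValueError(f"unexpected batch job state: {state!r}")
        time.sleep(poll_seconds)
    raise TimeoutError("batch job did not finish within the polling budget")


def needs_follow_up(state):
    """True when some or all items may not have been processed."""
    return state in {"STOPPED", "PARTIALLY_SUCCEEDED", "FAILED"}
```

A PARTIALLY_SUCCEEDED result is worth handling separately from FAILED, since only the items that were modified or unavailable during the job need to be retried.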
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
There is a difference in copy type behavior between copying files and folders. The behavior is designed for files and it is best practice to not copy folders if there already is a folder with the same name in the destination location.
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move From is used when you are in the destination location.
Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Rights
Within a project
Source project: Contributor rights
Destination project: Contributor rights
Between different projects
Source project: Download rights, Contributor rights
Destination project: Upload rights, Viewer rights
Restrictions
Within a project
Source project: No linked data, No partial data, No archived data
Destination project: No linked data
Between different projects
Source project: Data sharing enabled, Data owned by user's tenant, No linked data, No partial data, No archived data, No externally managed projects, Within the same region
Destination project: No linked data, Within the same region
Move jobs will fail if any data being moved is in the "Partial" or "Archived" state.
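Copy and move treat "Partial" and "Archived" data differently: a copy job skips such data, while a move job fails. The sketch below models that difference so a selection can be checked before a job is started. The record layout and the "Available" status name are hypothetical, and the functions are illustrative helpers, not part of ICA.

```python
# "Partial" and "Archived" come from the text; the record layout and the
# "Available" status name are assumptions made for this example.
BLOCKING_STATUSES = {"Partial", "Archived"}


def plan_copy(records):
    """Split records into (to_copy, skipped), mirroring a copy job that
    skips data in the Partial or Archived state."""
    to_copy = [r for r in records if r["status"] not in BLOCKING_STATUSES]
    skipped = [r for r in records if r["status"] in BLOCKING_STATUSES]
    return to_copy, skipped


def check_move(records):
    """Raise if any record is in a state that would make a move job fail."""
    blocked = [r["name"] for r in records if r["status"] in BLOCKING_STATUSES]
    if blocked:
        raise ValueError(f"cannot move Partial or Archived data: {blocked}")


records = [
    {"name": "a.fastq.gz", "status": "Available"},
    {"name": "b.fastq.gz", "status": "Partial"},
]
to_copy, skipped = plan_copy(records)
print([r["name"] for r in to_copy], [r["name"] for r in skipped])
```

Running a check like `check_move` first avoids starting a move that is bound to fail because of a single unfinished upload.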
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Go to Projects > your_project > Data > your_source_location > Manage > Move To.
Select your target project and location.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Download file.
Files for which ICA can display the contents can be viewed by clicking on the filename, followed by the View tab. Select the download action on the view tab to download the file. Note that larger files may take some time to load.
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Download files or folders using a service connector. This will display a list of all available connectors.
Select a connector, and then select Schedule for Download. If you do not find the connector you need or you do not have a connector, you can click the Don't have a connector yet? option to create a new connector. You must then install this new connector and return to the file selection in step 1 to use it.
You can view the progress of the download or stop the download on the Activity page for the project.
The data records contained in a project can be exported in CSV, JSON, and Excel format.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected file, select the Selected rows as the Rows to export option. To export all files on the page, select Current page.
To export only the columns present for the file, select the Visible columns as the Columns to export option.
Select the export format.
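The export options above amount to two filters, one on rows and one on columns, followed by a format choice. The sketch below reproduces that logic for CSV and JSON using only the standard library. The record fields and the `export_records` helper are hypothetical, and Excel output is left out because it needs a third-party library.

```python
import csv
import io
import json


def export_records(records, selected_ids, visible_columns, fmt="csv"):
    """Export the selected rows, restricted to the visible columns.

    records: list of dicts; the "id" key and column names are hypothetical.
    """
    rows = [
        {col: rec.get(col, "") for col in visible_columns}
        for rec in records
        if rec["id"] in selected_ids
    ]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=visible_columns, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


records = [
    {"id": "1", "name": "a.bam", "size": 10, "status": "Available"},
    {"id": "2", "name": "b.bam", "size": 20, "status": "Available"},
]
print(export_records(records, {"1"}, ["name", "size"]))
```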
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints:
Modify the dates of the file to be deleted/archived.
Linking a folder creates a dynamic read-only view of the source data. You can use this to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required.
You can recognise linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that require write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
You can perform analysis on data from other projects by linking data from that project.
Select Projects > your_project > Data > Manage, and then select Link.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > Activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
To unlink the data, go to the root level of your project and select the linked folder or, if you have linked individual files separately, select those linked files (limited to 100 at a time), and then select Manage > Unlink. As with linking a folder, the progress of unlinking can be monitored at Projects > your_project > Activity > Batch Jobs.
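Because individually linked files can only be unlinked 100 at a time, a larger set has to be worked through in batches. The sketch below shows one way to split a list of files accordingly. The identifiers are made up for the example, and the helper only does the bookkeeping; it does not unlink anything.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical identifiers for 250 individually linked files.
linked_file_ids = [f"file-{n:04d}" for n in range(250)]
for batch in batches(linked_file_ids):
    # Each batch corresponds to one selection followed by Manage > Unlink.
    print(len(batch))
```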
The GUI considers non-indexed folders as a single object. You can access the contents from a non-indexed folder
as Analysis input/output
in Bench
via the API
Creation: Yes.
Deletion: Yes. You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete, or with the /api/projects/{projectId}/data/{dataId}:delete endpoint.
Uploading Data: API, Bench, Analysis. Use non-indexed folders as normal folders for Analysis runs and Bench. Different methods are available with the API, such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl.
Downloading Data: Yes. Use non-indexed folders as normal folders for Analysis runs and Bench. Use temporary credentials to list and download data with the API.
Analysis Input/Output: Yes. Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.
Bench: Yes. Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.
Viewing: No. The folder is a single object; you cannot view the contents.
Linking: No. You cannot see non-indexed folder contents.
Copying: No. Prohibited to prevent storage issues.
Moving: No. Prohibited to prevent storage issues.
Managing tags: No. You cannot see non-indexed folder contents.
Managing format: No. You cannot see non-indexed folder contents.
Use as Reference Data: No. You cannot see non-indexed folder contents.
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (e.g., AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and ensure uploads do not include unintended data in order to avoid unintentional privacy breaches. More guidance can be found in the .
See
Uploads via the UI are limited to 5TB and no more than 100 concurrent files at a time, but for practical and performance reasons, it is recommended to use the CLI or when uploading large amounts of data.
For instructions on uploading/downloading data via CLI, see .
Copying data from your own S3 storage requires additional configuration. See and ..
This partial move may cause data at the destination to become unsynchronized between the object store (S3) and ICA. To resolve this, users can create a folder session on the parent folder of the destination directory by following the steps in the API: and then . Ensure the Move job is already aborted before submitting the folder session create and complete requests. Wait for the session status t
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see .
the file's information.
the updated information back in ICA.
Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container. You can access the container which contains all the data, but you cannot access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.
You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder, or with the /api/projects/{projectId}/data:createNonIndexedFolder endpoint.
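For scripting against the endpoints named on this page, the path templates can be filled in with a small helper. The sketch below only builds the paths. The server base URL, HTTP method, authentication and request body are not given on this page, so they are deliberately left out, and the function names and example IDs are hypothetical.

```python
from urllib.parse import quote


def _segment(value):
    """Percent-encode one path segment so that IDs cannot break the path."""
    return quote(str(value), safe="")


def create_non_indexed_folder_path(project_id):
    return f"/api/projects/{_segment(project_id)}/data:createNonIndexedFolder"


def create_file_with_upload_url_path(project_id):
    return f"/api/projects/{_segment(project_id)}/data:createFileWithUploadUrl"


def delete_data_path(project_id, data_id):
    return f"/api/projects/{_segment(project_id)}/data/{_segment(data_id)}:delete"


# Hypothetical IDs, for illustration only.
print(create_non_indexed_folder_path("my-project-id"))
print(delete_data_path("my-project-id", "my-folder-id"))
```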
To add filters, select the funnel/filter symbol at the top right, next to the search field.
To change which columns are displayed, select the three columns symbol and select which columns should be shown.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.
Replace
Overwrites the existing data. Folders will copy their data in an existing folder with existing files. Existing files will be replaced when a file with the same name is copied and new files will be added. The remaining files in the target folder will remain unchanged.
Don't copy
The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.
Keep both
Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.
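The three options differ only in what happens when a file name already exists at the destination. The sketch below models a folder as a dictionary of file name to content and applies each option. The mode identifiers and the `_1`, `_2` suffix placed before the extension are assumptions, since the text only says that a number is appended.

```python
import posixpath

MODES = {"replace", "dont_copy", "keep_both"}  # hypothetical identifiers


def numbered_name(name, taken):
    """Return `name` with a number appended so it does not clash with `taken`.

    The exact naming scheme is an assumption made for this example.
    """
    stem, ext = posixpath.splitext(name)
    n = 1
    while f"{stem}_{n}{ext}" in taken:
        n += 1
    return f"{stem}_{n}{ext}"


def copy_files(source, destination, mode):
    """Return the destination folder contents after copying `source` into it."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    result = dict(destination)
    for name, content in source.items():
        if name not in result:
            result[name] = content  # new files are added in every mode
        elif mode == "replace":
            result[name] = content  # the existing file is overwritten
        elif mode == "keep_both":
            result[numbered_name(name, result)] = content
        # "dont_copy": the existing file is kept and nothing is written
    return result


source = {"report.txt": "new", "extra.txt": "x"}
destination = {"report.txt": "old", "notes.txt": "n"}
for mode in sorted(MODES):
    print(mode, sorted(copy_files(source, destination, mode)))
```

In all three modes, files that exist only at the destination are left unchanged, which matches the folder-merge behavior described above.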