Connected Insights - Cloud supports data uploads from Connected Analytics (ICA) and BaseSpace Sequence Hub (BSSH) for defined workflows.
You must be a member of the workgroup to which the data is uploaded. A project not linked to the workgroup must be granted Contributor-level permissions in the Team tab of ICA to be accessible from Connected Insights. The user must have the Lab Director role in Connected Insights to run the ici-uploader. For more information on adding users to a workgroup and adding permissions, refer to .
You must be a member of the workgroup to which the data is uploaded in both Connected Insights and BSSH. The user must have Has Access permission in BSSH for the data to be accessible from Connected Insights.
This section provides instructions for adding a supported ICA pipeline.
In Connected Insights, select the gear in the top-right corner to open the Configuration page.
Select the General tab.
Select Data Upload.
Select the From Illumina Connected Analytics tab.
For Cloud Pipelines, select Add.
For Choose compatible pipeline from the catalog, select the applicable pipeline from the drop-down list, refer to .
For Test Definition, select the applicable definition.
Select Save.
ℹ️ For the configured Illumina Connected Analytics pipeline, Connected Insights automatically uploads new analyses in ICA for the specified workflow. When data is uploaded, the status to the right of the workflow name shows the last analysis upload date. If a new project is created or linked after the configuration is made, refresh the configuration by editing and saving it before it begins to pick up data from that project.
Locate the pipeline, and then select ...
Select Delete.
When prompted, select Yes, Remove.
This section provides instructions for manually uploading a completed analysis for a supported ICA pipeline.
From the Case List, select + New Case.
Select the Import from Connected Analytics option, and then click Import from Connected Analytics.
Ensure the correct project is selected under Select Project.
Select the desired analysis to upload, and then click Continue.
Click Review.
Review the information and click Submit to begin processing the selected analysis.
❗ The user must first configure the pipeline in Data Upload Configuration, see above, before manually uploading analyses from ICA.
This section provides instructions for manually uploading a completed analysis for a supported BSSH pipeline.
From the Case List, select + New Case.
Select the Import from BaseSpace Sequence Hub option, and then click Import from BaseSpace Sequence Hub.
Select the desired analysis to upload, and then click Continue.
Select the desired test definition to apply to the selected analysis.
[Optional] Select a case metadata file to upload (up to five files may be uploaded at one time); refer to Custom Case Data Upload.
Click Review.
Review the information and click Submit to begin processing the selected analysis.
Use an API to upload data to Connected Insights - Cloud. For more information, refer to .
The Connected Insights - Cloud Data Uploader tool supports uploading VCF files and analysis output from user-provided machines and storage devices. It can be downloaded from the Configuration > Data Upload section of the application.
Connected Insights - Local includes the Data Uploader tool, which is installed on the DRAGEN server v4 as part of the Connected Insights - Local installation and reads secondary analysis results from the configured external storage drive. The Data Uploader tool is detected and identified as Default-CLI-Installation in the Data Upload section of the Configuration page. If Default-CLI-Installation does not show as Online, refer to Software Errors and Corrective Actions.
The Data Uploader logs can be found at /staging/ici/logs/tss-cli/. With Connected Insights - Local, these logs can also be downloaded from the Activity tab in the Case Details pane on the Cases page. For more information, refer to Case Details.
The secondary analysis input logs can be found at <ExternalStorageDriveMountPath>/d53e4b2d-0428-4b3e-92bf-955f7153c360/d53e4b2d-0428-4b3e-92bf-955f7153c360/upload-logs/<DataUploadConfiguredMonitoringLocation>/<SecondaryAnalysisRunFolder>/run_<Timestamp>.json
This section identifies the requirements for uploading data from user storage, enabling ingestion from the network drive, and adding an existing pipeline. For Connected Insights - Cloud, this section also covers how to download and install the Data Uploader tool and has instructions for generating an API key.
To upload data from user storage, the following requirements must be met:
Data Uploader with Java Virtual Machine (JVM) 8 or newer, compatible with Mac, Windows, or Linux (CentOS). To check your Java version, open a terminal or command prompt and run java -version. If Java is not installed, download and install it from the official Java website.
An API key and access to the workgroup to which the data is uploaded. The API key comes with the installer. To generate a new API key, refer to [Optional] Generate an API Key.
Administrator access on the computer where Data Uploader is installed.
Before you begin, perform the following actions:
Create custom case data.
Create test components.
Create test definitions.
For more information, refer to Configuration.
Download and install the Data Uploader tool as follows.
In Connected Insights, select the gear at the top right of the page.
In the General tab, under Data Upload, select the From Local Storage tab.
Under Download and Launch the Data Uploader, select the storage device operating system from the drop-down list.
Select Download Data Uploader. A progress bar displays during the download.
Copy the installation directory of the storage device. Make sure that Data Uploader is in a location that has access to your secondary analysis output folder.
Use the following tar command to extract the files, replacing {ici uploader script} with the applicable file name: tar xvzf {ici uploader script}.tar.gz
❗ Extraction steps can vary by operating system. Most terminals support the tar command. On Windows, you can use an archive extraction application (e.g., 7-Zip) to extract the contents of the tar archive.
Make sure that the files in the following table are in the extracted Data Uploader folder.

| Component | Description |
| --- | --- |
| ici-uploader-daemon.jar | Used to run Data Uploader as a daemon. |
| ici-uploader.jar | Used to run Data Uploader on demand. |
| uploader-config.json | Contains the configuration that allows Data Uploader to connect to Connected Insights. |
| ici-uploader or ici-uploader.exe | The installer for ici-uploader (ici-uploader.exe on Windows). |
| com.illumina.isi.daemon.plist (Mac) | Used to set up Data Uploader as a system service on Mac. |
| com.illumina.isi.daemon.service (Linux) | Used to set up Data Uploader as a system service on Linux. |
| wrapper.exe | Used to set up Data Uploader as a system service on Windows. |
| README.txt | Third-party licensing information for Data Uploader. |
Install Data Uploader as follows:
a. [Windows] Start the command prompt as an administrator and run ici-uploader.exe install.
❗ For Windows environments, the user must be in the installation directory to install or uninstall the application. If the installation fails, start the daemon manually using ici-uploader.exe start-daemon.
b. [Mac and Linux] Run the following command in the terminal: ./ici-uploader install
❗ If the installation fails, start the daemon manually using ici-uploader start-daemon.
c. Follow any prompts in Data Uploader.
d. Enter your API key and press Enter.
❗ If you did not download a bundle with an auto-generated API key, provide one to the installer when prompted. If you are installing Data Uploader for the first time, you must generate an API key. For more information, refer to [Optional] Generate an API Key.
e. Under Define and Monitor Data Uploads, make sure that your machine displays.
Data Uploader is now set up for automatic and manual ingestion. You can change the name of the installed Data Uploader by selecting ... and then Edit Server Name. To download logs associated with Data Uploader from the last 24 hours, select ... and then Download Logs.
❗ If the user is not a system administrator, the daemon can be started with ici-uploader start-daemon (Mac/Linux) or ici-uploader.exe start-daemon (Windows). This command does not run the process as a system service and requires you to start and stop the service manually. To stop the daemon, run ici-uploader stop-daemon.
After Data Uploader has been downloaded and installed on the user storage, you can set up configurations for the tool to upload data into Connected Insights automatically. Each configured pipeline monitors user storage for new molecular data.
f. The user can stop ici-uploader (running as a system service) using ici-uploader stop-service (Mac/Linux) or ici-uploader.exe stop-service (Windows), and start it again using ici-uploader start-service.
❗ If Data Uploader is already running as a system service and the user also runs the uploader manually via ici-uploader start-daemon, it may cause issues. Stop the running Data Uploader with ici-uploader stop-service before starting the uploader manually.
If you are using Linux, automatic case upload is already enabled and this section is not applicable. For Mac, contact Illumina Technical Support. For Windows, run the "illumina ICI Auto Launch Service" to make the network drive available for Data Uploader as follows:
When the Data Uploader is running, open the Services application from the Start menu and locate the "illumina ICI Auto Launch Service".
Right-click "illumina ICI Auto Launch Service" and select Properties from the drop-down menu.
In Properties, select the Log On tab and enter your account ID and password. Confirm the password.
Select the General tab.
Select Stop, then select Start to start the service. The network drive is available for ingestion for Data Uploader.
After completing these steps, the Data Uploader details are visible in Connected Insights.
If the computer where Data Uploader is installed is behind a proxy server, the uploader proxy setting must be enabled. Before you install Data Uploader, run the following command in the terminal (Linux).
❗ export JDK_JAVA_OPTIONS='-Dhttps.proxyHost=<IP address of the proxy server e.g. 1.2.3.4> -Dhttps.proxyPort=<Port of the proxy server e.g. 8080>'
For Windows, it can be set via the following command.
❗ setx JDK_JAVA_OPTIONS "-Dhttps.proxyHost=<IP Address of the proxy server e.g. 1.2.3.4> -Dhttps.proxyPort=<Port of the proxy server e.g. 8080>"
Remember to replace <IP address of the proxy server> and <Port of the proxy server> with the actual values you need.
In Configuration Settings, select the radio button next to Choose compatible pipeline from catalog, refer to Supported Pipelines.
Select a pipeline from the drop-down list (for example, DRAGEN TruSight Oncology 500 Analysis Software v2.5.2).
❗ When running the DRAGEN Somatic Whole Genome pipeline in Tumor-Only mode, you must set --output-file-prefix to match the sample-id (the RGSM of the sample in the FASTQ list) of the run.
For Test Definition, select the applicable definition.
For Choose a folder to monitor for case metadata (optional), enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.
If you are using Data Uploader for the first time, you must generate a new API key. All Data Uploader operations require an API key for authentication. The Data Uploader bundle can be downloaded with an auto-generated API key; if the bundle includes an API key, skip this section.
❗ If you are installing Data Uploader on multiple machines, manually create and track your API key by running the following command (used for both auto-updating and manual runs): ici-uploader configure --api-key={apiKey}. Make sure that the API key is within single quotation marks (for example, '{SYSTEM_API-KEY}').
In Connected Insights, select Manage API Keys from the Account drop-down menu.
Select Generate.
Enter a name for the API key.
To generate a global API key, select All workgroups and roles.
Select Generate.
In the API Key Generated window, select one of the following options:
Show — Reveals the API key.
Download API Key — Downloads the API key in .TXT file format.
Select Close after you have stored the API key.
❗ The API key cannot be viewed again after closing this window. Download the API key or save it in a secure location.
The API key is added to the Manage API keys list.
Perform any of the following actions in the Manage API Keys list:
Select Regenerate to generate a new API key with the existing API key name.
Select Edit to edit the API key name or change the workgroups and roles selection.
Select Delete to delete the API key.
The following information is applicable to both Connected Insights - Local and Connected Insights - Cloud. Create each configuration as follows.
Input the {monitoring location} for secondary analysis output.
The monitoring location is the full path to the location where secondary analysis output data is deposited in the user's storage.
For example:
/rest/of/storage path/{monitoring location}/{runFolder}/{sample folders, sample sheet, inputs, tumor_fastq_list.csv}
For DRAGEN server v4 standalone results:
/rest/of/storage path/{monitoring location}/{sample name folders}/{sample sheet, inputs, VCFs, .bams}
Associate the workflow schema for this pipeline.
a. Under Choose compatible pipeline from the catalog, select the applicable pipeline from the drop-down list (for example, DRAGEN WGS Somatic v4.2).
b. If you are running a custom workflow, upload a workflow schema that corresponds to the data that the configuration uploads. For more information, refer to Custom Pipeline Configuration.
Select the test definition to be associated with cases created by this configuration. For more information, refer to Test Definition Setup.
[Optional] Input the location where custom case data is stored. This location is the full path to where custom case data is deposited in user storage. For more information, refer to Custom Case Data Upload.
Select Save to complete the configuration. If Data Uploader is running, it monitors this configuration for data upload.
Perform any of the following actions:
Edit — Modify the pipeline configuration by clicking Edit next to a configuration. Note: Changes to the configuration do not affect cases that are already ingested.
Delete — Delete the pipeline configuration by clicking Delete next to a configuration.
Requeue (Connected Insights - Local) — Resume or re-upload a secondary analysis output folder by clicking Requeue next to a configuration. Upon clicking Requeue, the application re-attempts to create the case from run folders that previously failed due to one of the following errors:
SampleSheet validation error
Case ingestion has stopped due to zero GEs balance
Case ingestion has stopped due to low space on external mounted storage or /staging
Upload an analysis output by running the following command:
ici-uploader analysis upload --folder={path-to-analysis} --pipeline-name={pipelineName}
--folder={path-to-analysis} — The absolute path to the analysis output to upload into Connected Insights. This folder contains the sample sheet.
--pipeline-name={pipelineName} — The name of the pipeline created in Connected Insights to apply to cases uploaded from this analysis. Pipeline names must include only letters, numbers, underscores, and dashes. The name cannot include spaces or special characters.
[Optional] --runId={runId} — The ID of the run to be created in place of the run ID determined by the run folder name.
[Optional] --pair-id={pair-id} — The pair ID of the analysis to upload from the sample sheet when limiting upload to a single analysis.
[Optional] --case-display-id={case-display-id} — The ID of the case to be created in place of the pair ID when uploading a single analysis with --pair-id={pair-id}.
[Optional] ici-uploader logs show — Displays the logs in ICI_Data_Upload_log.json.
[Optional] ici-uploader logs download — Downloads Data Uploader logs in a zipped folder.
Upload the custom case data file associated to one or more cases by running the following command:
ici-uploader case-data --filePath={absolute-path-of-csv-file}
For more information on custom case data files, refer to Custom Case Data Upload.
The following table shows the approximate time it takes for example datasets to upload and receive the Ready for Interpretation status. The duration is evaluated by analyzing a batch of samples ingested through the Connected Insights tertiary analysis in conjunction with the DRAGEN secondary analysis performed on similar datasets on the DRAGEN server v4.
| Sample Size | Duration | CPU Usage | Memory Usage | Network I/O |
| --- | --- | --- | --- | --- |
| 8 samples, TSO 500 DNA cases with an average of 90,000 variants | 55 minutes | 88.4% | 26.04% | write: 22573.098 kb/s, read: 25442.099 kb/s |
| 8 samples, TSO 500 ctDNA | 1h 5m | 96.65% | 51.0% | write: 288181.985 kb/s, read: 176519.153 kb/s |
| 8 samples, WGS Tumor Normal | 15m per sample | 97.05% | 46.32% | write: 22336.569 kb/s, read: 35373.43 kb/s |
| 8 samples, WGS Tumor Only | 3h 41m 21s per sample | 94.05% | 49.9% | write: 377878.867 kb/s, read: 197353.179 kb/s |
This section describes how to upload data with Connected Insights and includes setup instructions. Connected Insights requires the following data types:
Secondary analysis output data — Connected Insights is compatible with a broad range of analysis pipeline outputs. To configure the input file format, select a compatible pipeline (for example, DRAGEN TruSight Oncology 500 v2.5.2) or configure a custom pipeline.
Case, subject, and sample data — For more information on custom case data, refer to Custom Case Data Upload
Connected Insights - Cloud allows data upload from the following sources:
User storage — Data ingestion is managed by the Data Uploader tool that supports uploading variant call format (VCF) files and other analysis output files. For more information, refer to Data Upload from User Storage (Connected Insights - Local Storage).
Cloud storage on ICA and BSSH — Data ingestion can be configured through the Data Upload page or API. For more information on using the Data Upload tab or API, refer to Data Upload from ICA (Connected Insights - Cloud).
Before uploading, create the following items:
Custom case data (if applicable)
Test components
Test definitions
For more information on creating test components and definitions, refer to Test Definition Setup.
Connected Insights directly supports the following pipelines:
Pipeline Name and YAML
Support (Local Storage / ICA)
Local Storage and ICA
Local Storage and ICA
Local Storage
Local Storage
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage and ICA
Local Storage
Local Storage
Connected Insights imports variant calls for the following variant types in the Variant Call File (VCF) file format (v4.1 and later):
Small variants (SNVs, MNVs, and small indels)
Structural variants (SVs)
Copy number variants (CNVs)
RNA fusion variants
RNA splice variants
❗ Imported VCF files must contain at least one sample and be sorted correctly to ensure valid display of results in Connected Insights.
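The two requirements in the note above (at least one sample column, coordinate-sorted records) can be pre-checked before upload. The following is an illustrative sketch using naive text parsing, not part of Connected Insights; real pipelines may prefer a dedicated VCF library:

```python
def check_vcf(lines):
    # Returns (ok, sample_count): ok is False if the file has no sample
    # column or if positions are not non-decreasing within a chromosome.
    samples = 0
    last = {}  # chromosome -> last POS seen
    for line in lines:
        if line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            samples = len(cols) - 9  # 9 fixed columns end at FORMAT
        elif line and not line.startswith("#"):
            chrom, pos = line.split("\t")[:2]
            pos = int(pos)
            if last.get(chrom, 0) > pos:
                return False, samples  # out of order
            last[chrom] = pos
    return samples >= 1, samples

vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0/1\n",
    "chr1\t250\t.\tG\tC\t.\tPASS\t.\tGT\t0/1\n",
]
print(check_vcf(vcf))  # (True, 1)
```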
The following sample fields are supported for each variant type:
¹ The following GT values are interpreted as an absence of the reported variant and are not imported: "." , "./." , "0" , "0/0"
¹ The following GT values are expected given the CN of the variant:
0: The copy number is normal in a region expected to be haploid.
1: The copy number differs from normal in a region expected to be haploid.
0/0: The copy number is normal in a region expected to be diploid.
0/1: The copy number differs from normal and is not a complete loss in a region expected to be diploid.
1/1: The copy number is a complete loss in a region expected to be diploid.
¹ The following GT values are interpreted as an absence of the reported variant and are not imported: "." , "./." , "0" , "0/0"
| Sample Field | VCF Fields | Details |
| --- | --- | --- |
| Allele Depths | AD | The read support for variants called at this position. Expected as a comma-separated list of values for the reference allele followed by each alternate allele. |
| Total Depth | DP | The total read support for all alleles at this position. Calculated as the sum of all allele depths if not provided. |
| Variant Read Frequency / Variant Allele Frequency | VF (or derived from AD) | The proportion of reads supporting each alternate allele. Expected as a comma-separated list of values for each alternate allele. Calculated from allele depths and total depth if not provided. |
| Genotype | GT¹ | The genotype of the sample at the given position. |
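The fallback calculations described for small variants (Total Depth as the sum of allele depths; VAF derived from AD and DP when VF is absent) can be sketched as follows. This is an illustrative assumption about the arithmetic, not Connected Insights source code:

```python
def small_variant_metrics(ad, dp=None, vf=None):
    # ad: allele depths, [ref, alt1, alt2, ...]
    # dp: total depth; defaults to sum(ad) when not provided
    # vf: variant allele frequencies; derived from ad/dp when not provided
    total_depth = dp if dp is not None else sum(ad)
    if vf is None:
        vf = [alt / total_depth for alt in ad[1:]]  # one value per ALT
    return total_depth, vf

dp, vaf = small_variant_metrics([90, 10])
print(dp, vaf)  # 100 [0.1]
```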
| Sample Field | VCF Fields | Details |
| --- | --- | --- |
| Fold Change | FC, SM | Estimated fold change for the copy number variant. |
| Copy Number | CN | Estimated absolute copy number for the copy number variant. |
| Minor-haplotype Copy Number | MCN | Estimated absolute copy number for the minor haplotype of a copy number variant. When MCN is zero, the copy number variant can be determined to be LOH. |
| Genotype | (Derived from CN when available)¹ | The genotype of the sample at the given position. |
| Sample Field | VCF Fields | Details |
| --- | --- | --- |
| Paired Reads | PR | The paired read support for variants called at this position. Expected as a comma-separated list of values for the reference allele followed by each alternate allele. |
| Split Reads | SR | The split read support for variants called at this position. Expected as a comma-separated list of values for the reference allele followed by each alternate allele. |
| Supporting Reads | (Derived from PR and SR) | The cumulative read support from split reads and paired reads for variants called at this position. |
| Total Depth | (Derived from PR and SR) | The total reads for all alleles called at this position. |
| Variant Read Frequency / Variant Allele Frequency | (Derived from PR and SR) | The proportion of reads supporting each alternate allele. Calculated from supporting reads and total depth. |
| Genotype | GT¹ (or derived from PR and SR) | The genotype of the sample at the given position. |
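The derived structural-variant fields above (Supporting Reads, Total Depth, and VAF computed from PR and SR) can be sketched as below, assuming PR and SR are each given as [ref, alt] read counts. This is an illustrative interpretation, not Connected Insights source code:

```python
def sv_metrics(pr, sr):
    # pr, sr: [ref, alt] paired-read and split-read counts
    supporting = pr[1] + sr[1]          # cumulative ALT support
    total = sum(pr) + sum(sr)           # total depth across both fields
    vaf = supporting / total if total else 0.0
    return supporting, total, vaf

print(sv_metrics([40, 10], [20, 10]))  # (20, 80, 0.25)
```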
| Biomarker | Format | File | Example |
| --- | --- | --- | --- |
| Tumor Mutational Burden (TMB) | JSON | *.tmb.json | {"TmbPerMb": 3.15823318} |
| Tumor Mutational Burden (TMB) | CSV | *.tmb.metrics.csv | TMB, 5.51 |
| Microsatellite Instability (MSI) | JSON | *.msi.json | { "PercentageUnstableSites": 3.19 } |
| Genomic Instability Score (GIS) | JSON | *.gis.json | { "MYRIAD": { "score": { "GIS": "3" } } } |
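The JSON payloads shown in the examples above can be read with any standard JSON parser. The sketch below parses the exact example values from the table; the file names and shapes follow those examples and are not a guaranteed API:

```python
import json

# Example payloads copied from the table above
tmb = json.loads('{"TmbPerMb": 3.15823318}')                   # *.tmb.json
msi = json.loads('{ "PercentageUnstableSites": 3.19 }')        # *.msi.json
gis = json.loads('{ "MYRIAD": { "score": { "GIS": "3" } } }')  # *.gis.json

print(tmb["TmbPerMb"])                 # 3.15823318
print(msi["PercentageUnstableSites"])  # 3.19
print(gis["MYRIAD"]["score"]["GIS"])   # 3 (note: stored as a string)
```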
Connected Insights accepts metadata about the case, subject, and sample in CSV format and uses it for case creation, display, and reporting.
Overview of uploading custom case data. Each step is further detailed below:
Download the Case Metadata template file from the Connected Insights Cases page.
Edit the Case Metadata template file to include the desired data.
Upload the Case Metadata file via the Connected Insights user-interface, Data Uploader, or API.
Download the Case Metadata template file from Connected Insights:
Navigate to the Connected Insights Cases page.
Select Upload Case Metadata (top-right corner of the page).
Select Upload CSV.
Click attached template to download the Case Metadata template CSV file.
❗ Starting from the template can help guide formatting, however, any CSV file that follows the content formatting requirements detailed below can be used to upload case metadata. Files containing non-English characters must be encoded as UTF-8.
Edit the case metadata file to add and correctly format the desired information. See the example at the bottom of this subsection:
Open the CSV file with software capable of editing CSV files (for example, a text editor; if using Excel, be cautious of potential unexpected formatting and character additions).
Ensure the following formatting requirements are met:
The first row must contain the headers of the fields to be updated. Each subsequent row contains data.
Each row must contain information in the Sample_ID and Case_ID columns. Sample_ID values are case-sensitive.
Fields that require a date must be in yyyy-mm-dd format.
Tumor_Type values must be the SNOMEDCT ID for the disease.
The SNOMEDCT ID can be found by navigating to an existing case and searching for the disease in the Case Details or assertion form. It can also be found by navigating to the Configuration page, Disease Configuration section, clicking New +, then searching in Associated Disease Term(s). Lastly, the ID can also be found by using the International Edition browser at the SNOMED International SNOMED CT Browser website.
When the tumor type is unknown, SNOMEDCT ID 363346000 ("Malignant neoplastic disease") or 255052006 ("Malignant tumor of unknown origin") can be used. However, actionability accuracy is higher when a more specific tumor type is provided.
All other columns must follow formatting requirements specified in the Case Metadata template file.
Data in columns for fields defined in the Custom Case Data Definition section of the Configuration page must match the formatting requirements based on whether the data type is text, a number, or a date (for details, refer to Custom Case Data Definition).
Once all rows are added, save the file.
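The formatting rules above (required Sample_ID and Case_ID values, yyyy-mm-dd dates) can be checked before upload. The validator below is a hypothetical helper for illustration; the date column name is an example, and Connected Insights performs its own validation on upload:

```python
import csv
import io
import re

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_case_metadata(csv_text, date_fields=()):
    # Returns a list of error strings; row numbers count the header as row 1.
    errors = []
    for n, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        if not row.get("Sample_ID") or not row.get("Case_ID"):
            errors.append(f"row {n}: Sample_ID and Case_ID are required")
        for field in date_fields:
            value = row.get(field)
            if value and not DATE_RE.fullmatch(value):
                errors.append(f"row {n}: {field} must use yyyy-mm-dd format")
    return errors

# "Collection_Date" is a hypothetical custom date column for this example.
sample = (
    "Sample_ID,Case_ID,Collection_Date\n"
    "S-001,C-001,2024-03-15\n"
    ",C-002,03/15/2024\n"
)
print(validate_case_metadata(sample, date_fields=("Collection_Date",)))
```

The second data row produces two errors (missing Sample_ID and a mis-formatted date).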
Refer to the following formatting example:
Case metadata can be uploaded in three ways. See below for instructions on each method:
Upload from local storage via Connected Insights user-interface.
Upload from local storage via the Data Uploader.
Upload from local storage via an API.
Upload Case Metadata files via the Connected Insights user interface
Navigate to the Connected Insights Cases page.
Select Upload Case Metadata (top-right corner of the page).
Select Upload CSV.
Upload the file.
Upload Case Metadata files via the Data Uploader
Install the Data Uploader in a location with access to the Case Metadata file (refer to Data Upload from User Storage). Proceed to the next step if it is already installed.
Ensure the Case Metadata file is in a location accessible by the Data Uploader.
Create a new pipeline or edit an existing pipeline and set the Choose a folder to monitor for case metadata (optional) field to the file path of the directory containing the Case Metadata file (refer to Data Upload from User Storage).
Next time the Data Uploader daemon runs, the file will be ingested (this may take ~10-15 minutes).
Upload Case Metadata files via an API
Refer to Case APIs, subsection "Upload Metadata files via an API".
Update case metadata of an existing case
To update an existing case, use the Sample_ID and Case_ID of the existing case and add updated information in additional fields. It may be useful to use an API in the Case Metadata section of the API page to retrieve the existing case metadata for a case (for details on using APIs, see APIs). The overwriting and preservation logic is as follows:
Overwriting Logic: If data conflicts, new data will overwrite existing data. For the Tags field, new data will be added as an additional tag and will not overwrite any existing tag(s).
Preservation Logic: Existing data will not be overwritten if the data for the field is the same as existing or no data is entered. Additionally, fields not included in the Case Metadata file will not be affected (for example, if the Case Metadata file does not include the Date of Birth field, existing data in that field will not be updated).
Create a new case by uploading a Case Metadata file
To create a new case, use a new Sample_ID and a new Case_ID and upload the Case Metadata file before ingestion of molecular data (for example, VCF files or other secondary analysis output files).
In order for the molecular data to associate with the correct case metadata, the Sample_ID and Case_ID must match (for example, the Sample_ID value used in the Case Metadata file should match the Sample_ID value in the sample sheet).
Cases created this way will have a Status column value of Awaiting Molecular Data and a Workflow Name column value of N/A in the Case List on the Cases page. The Status and Workflow Name will update after completely uploading molecular data.
The Case Metadata Uploads page displays Case Metadata file upload history and error messages:
Navigate to the Connected Insights Cases page.
Select Upload Case Metadata (top-right corner of the page).
Select View Past Uploads.
The table displays Case Metadata file upload history, status, and details.
If errors occur, the Details column will state this and provide a link to download a copy of the Case Metadata file annotated with error messages for each row.
The custom pipeline option enables Connected Insights to understand the structure of secondary analysis output files produced by a pipeline that is not yet compatible with the software. This option requires the creation of a workflow schema file that describes the content and location of the secondary analysis output files. For an example of how to configure a custom pipeline for TSO 500 Analysis Module v2.2, refer to Custom Pipeline Configuration Example.
In Configuration Settings, select the radio button next to Configure custom pipeline.
If necessary, create a workflow schema file by selecting Download the template file. For more information on setting up the template file, refer to Create a Workflow Schema File.
Select Choose File to upload your template file.
For Custom Pipeline Name, enter a name for the pipeline.
For Test Definition, select the applicable definition.
For the Choose a folder to monitor for case metadata (optional) field, enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.
To set up data upload for secondary analysis output data that is not yet compatible with Connected Insights, create a workflow schema file (.yaml format). This file specifies the files in the secondary analysis output data that Connected Insights analyzes. This file is only used when configuring a custom pipeline.
Download a workflow schema file template from Connected Insights as follows.
On the top toolbar, select Configuration.
Select the General tab.
Select Data Upload.
Select From Local Storage.
For Define and Monitor Data Uploads, select Add Path.
For Configuration Settings, select the radio button next to Configure custom pipeline.
Select Download a template file to download the workflow schema template file. If you do not want to create a pipeline, select Cancel. When prompted, select Yes, clear.
Edit the file as needed to reflect the files for upload. Refer to the following topics that pertain to the workflow schema template file sections:
❗ If Optional appears after the file name, Connected Insights uploads the file if it is available or moves on to the next available file.
After the workflow schema file is edited, create a pipeline. Then, select Configure manually under Configuration Settings.
Select Choose File and upload the edited workflow schema file.
Complete the remaining fields and save the pipeline.
❗ If this pipeline is used for manual uploading, make sure that the pipeline name consists only of numbers, letters, underscores, and dashes. The name cannot include spaces or special characters. This name is used in the --pipeline-name= argument described in On-Demand Data Upload from User Storage (Connected Insights - Cloud Only).
This section of the file can be partially or completely deleted if the upload does not use some (or all) of the following settings:
Required
successMarkerFile and failureMarkerFile: Specify a success marker file or failure marker file. When this file is present in the specified location, upload begins or stops, respectively.
Optional
sampleType — If the analysis output contains only DNA or only RNA samples, you can override the sample type with sampleType. If sampleType is not specified, the system determines it from the analysis output.
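As an illustration, the settings above might appear in the workflow schema file as follows. The key names are taken from this section; the file paths and exact nesting are assumptions, so treat the downloaded template as authoritative.

```yaml
# Illustrative sketch only -- file paths and nesting are assumptions.
successMarkerFile: Results/analysis_complete.txt   # upload begins when this file appears
failureMarkerFile: Results/analysis_failed.txt     # upload stops when this file appears
sampleType: DNA   # optional; omit to let Connected Insights infer DNA/RNA from the output
```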
This section specifies the sample sheet file path found in the analysis folder, the data header row marker, and column aliases. The following information is used to create cases in Connected Insights:
Required
filePath — The path to the sample sheet used to create the cases.
Optional
columnAliases — Specify the column aliases. These aliases must match the sample information column headers. Some aliases are required and others are optional.
sampleId — Appears in the Sample ID column of the Cases page.
caseId — Appears in the Case ID column of the Cases page. For DNA-RNA paired samples, both the DNA and RNA sample rows have the same value in the column whose header is aliased to caseId. If the caseId is aliased to column header Pair_ID, a DNA-RNA sample must contain the same value in the Pair_ID column in both the DNA and RNA sample rows in Sample Sheet.
Sample_Type — No alias can be made for Sample_Type. The sample sheet must include a column header titled Sample_Type with all sample rows containing DNA or RNA in this column.
sex — Aliased to the header title of the column containing the sex of each sample.
Disease aliases — Determine the list of Key Genes used for this sample. For more information, refer to Overview Tab. If the disease name or ID is not provided, the Status column on the Cases page displays Missing Required Data until the disease name or ID is added. You can add a disease by uploading disease information as custom case data, or open the case in Connected Insights and enter the disease for an individual case.
id — Can be optionally aliased to the header title of the column containing the sample disease ID according to SNOMEDCT. The SNOMEDCT ID can be found by opening an existing case and searching for the disease in the Case Details or assertion form, or by using the International Edition browser at the SNOMED International SNOMED CT Browser.
name — Can be optionally aliased to the header title of the column containing the sample disease name according to SNOMEDCT. If a disease ID is specified, a name is required. If you do not want to specify a name while using a disease ID, enter null or the name of a nonexistent column for the name field.
dataHeaderRowMarker — Specifies the sample sheet data header row marker. The default value is [Data]. The marker is the cell text in the first (leftmost) column of the row directly above the row containing the column headers; the row below the marker contains the sample information headers, and the rows after that contain the sample information values for each sample.
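Taken together, the sample sheet settings above might be sketched as follows. The alias keys come from this page; the column header values and the exact nesting are assumptions, so follow the downloaded template.

```yaml
# Illustrative sketch only -- column header values and nesting are assumptions.
sampleSheet:
  filePath: SampleSheet.csv          # required: sample sheet path in the analysis folder
  dataHeaderRowMarker: "[Data]"      # cell text one row above the column headers (default)
  columnAliases:
    sampleId: Sample_ID              # shown on the Cases page
    caseId: Pair_ID                  # same value in the DNA and RNA rows of a pair
    sex: Sex
    disease:
      id: Disease_ID                 # SNOMEDCT disease ID column
      name: Disease_Name             # required whenever a disease ID is specified
```

Note that Sample_Type cannot be aliased; the sample sheet must contain a column with that exact header.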
Specifies the file paths for biomarkers and metrics to be included for interpretation. File names can include symbolic references that depend on the Sample ID or Pair ID:
{pairId}
{sampleId.DNA}
{sampleId.RNA}
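For example, entries using these symbolic references might look like the following. The key names come from the table below; the folder layout and file names are assumptions for illustration only.

```yaml
# Illustrative sketch only -- folder layout and file names are assumptions.
snvFiles:
  - Results/{pairId}/{sampleId.DNA}_SmallVariants.genome.vcf
rnaFusionFiles:
  - Results/{pairId}/{sampleId.RNA}_Fusions.vcf
```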
When using the workflow schema file template downloaded from the Configuration page, lines for files that are not uploaded can be deleted. The ", Optional" designation can be removed unless the file is optional for the pipeline.
| File | Compatibility |
| --- | --- |
| gisFile | JSON containing genomic instability score data. |
| msiFile | JSON containing microsatellite instability data. |
| tmbFile | JSON or CSV file containing tumor mutational burden data. |
| purityPloidyFiles | TSV or VCF file containing purity and ploidy estimates. |
| snvFiles | VCF files containing small variant calls. |
| cnvFiles | VCF files containing copy number variant calls. |
| svFiles | VCF files containing structural variant calls. The structural variant caller can also call longer small variant insertion/deletion/delins events and can duplicate calls from the small variant caller. |
| rnaSpliceFiles | VCF files containing RNA splice variant calls. |
| rnaFusionFiles | VCF files containing RNA fusion variant calls. |
| metricsQCFile | TSV file containing QC metrics data. |
The following table shows specific sample visualization files used for IGV. File formats include .bam and .bam.bai. For more information, refer to IGV Visualizations. Under alignmentFiles, the ", Optional" designation can be removed unless the file is optional for the pipeline.
| File | Compatibility |
| --- | --- |
| dnaBamFile | BAM file for the DNA alignment (under alignmentFiles). |
| dnaBaiFile | BAI file for the DNA alignment (under alignmentFiles). |
| rnaBamFile | BAM file for the RNA alignment (under alignmentFiles). |
| rnaBaiFile | BAI file for the RNA alignment (under alignmentFiles). |
| coverageFiles | TSV file containing coverage data (under visualizationFiles). |
| balleleFiles | BEDGraph containing b-allele data (under visualizationFiles). |
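A hedged sketch of the visualization entries above follows. The key names and their grouping come from the table; the file paths are assumptions for illustration only.

```yaml
# Illustrative sketch only -- file paths are assumptions.
alignmentFiles:
  dnaBamFile: Results/{sampleId.DNA}.bam
  dnaBaiFile: Results/{sampleId.DNA}.bam.bai
  rnaBamFile: Results/{sampleId.RNA}.bam
  rnaBaiFile: Results/{sampleId.RNA}.bam.bai
visualizationFiles:
  coverageFiles:
    - Results/{sampleId.DNA}_CoverageReport.tsv
  balleleFiles:
    - Results/{sampleId.DNA}_BAlleleCounts.bedgraph
```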
The following example shows the custom pipeline configuration process using Local Run Manager TruSight Oncology 500 Analysis Module v2.2. For details on this process, refer to Custom Pipeline Configuration.
Uploaded data is organized as cases that provide details about the sample. A case is a secondary analysis result that has been imported and annotated. These files include VCF files for genetic variants (or CSV files for TruSight Oncology 500 RNA fusion variants). The Cases page lists all cases for your account or workgroup. The following files can be uploaded, but are not required:
BAM files
JSON, TSV, and CSV files for TMB, MSI, and GIS biomarkers or for QC metrics
Make sure that the sample sheet is included in the secondary analysis results folder. The following example shows the structure of the [Data] section of the sample sheet:
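A minimal [Data] section consistent with the cases in the table below might look like this. The Sample_Type header is required exactly as shown; the other headers are assumptions that would be mapped through columnAliases (for example, caseId aliased to Pair_ID).

```csv
[Data]
Pair_ID,Sample_ID,Sample_Type,Disease_ID,Disease_Name
Control-Case,DNA_Control,DNA,255052006,Malignant tumor of unknown origin
Control-Case,RNA_Control,RNA,255052006,Malignant tumor of unknown origin
Lung_001,Lung_DNA_001,DNA,254637007,Non-small cell lung cancer
Lung_001,Lung_RNA_001,RNA,254637007,Non-small cell lung cancer
Breast_002,Breast_DNA_002,DNA,254837009,Malignant tumor of breast
```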
Using this example, Connected Insights creates the following cases:
| Case ID | Workflow Type | Disease | Sample ID | Sample Type |
| --- | --- | --- | --- | --- |
| Control-Case | DNA and RNA | Malignant tumor of unknown origin (SNOMEDCT ID 255052006) | DNA_Control, RNA_Control | DNA, RNA |
| Lung_001 | DNA and RNA | Non-small cell lung cancer (SNOMEDCT ID 254637007) | Lung_DNA_001, Lung_RNA_001 | DNA, RNA |
| Breast_002 | DNA | Malignant tumor of breast (SNOMEDCT ID 254837009) | Breast_DNA_002 | DNA |
Open the secondary analysis results folder and find the files that must be identified in the workflow schema file. The following example shows the secondary analysis results folder structure:
For more information, refer to Create a Workflow Schema File in Custom Pipeline Configuration.
The following example shows the workflow schema file structure:
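A condensed sketch of the overall structure, assembled from the sections described in Custom Pipeline Configuration, is shown below. The key names come from that topic; the paths, file names, and nesting are assumptions, so treat the downloaded template as authoritative.

```yaml
# Illustrative sketch only -- paths, file names, and nesting are assumptions.
successMarkerFile: Results/analysis_complete.txt
failureMarkerFile: Results/analysis_failed.txt
sampleSheet:
  filePath: SampleSheet.csv
  dataHeaderRowMarker: "[Data]"
  columnAliases:
    sampleId: Sample_ID
    caseId: Pair_ID
snvFiles:
  - Results/{pairId}/{sampleId.DNA}_SmallVariants.genome.vcf, Optional
alignmentFiles:
  dnaBamFile: Results/{sampleId.DNA}.bam, Optional
```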
❗ If ", Optional" appears after a file name, Connected Insights uploads the file if it is available; otherwise, it moves on to the next available file.