The custom pipeline option lets Connected Insights interpret the structure of secondary analysis output files produced by a pipeline that is not yet compatible with the software. This option requires a workflow schema file that describes the content and location of the secondary analysis output files. For an example of how to configure a custom pipeline for TSO 500 Analysis Module v2.2, refer to Custom Pipeline Configuration Example.
In Configuration Settings, select the radio button next to Configure custom pipeline.
If necessary, create a workflow schema file by selecting Download the template file. For more information on setting up the template file, refer to Create a Workflow Schema File.
Select Choose File to upload your template file.
For Custom Pipeline Name, enter a name for the pipeline.
For Test Definition, select the applicable definition.
For the Choose a folder to monitor for case metadata (optional) field, enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.
To set up data upload for secondary analysis output data that is not yet compatible with Connected Insights, create a workflow schema file (.yaml format). This file specifies the files in the secondary analysis output data that Connected Insights analyzes. This file is only used when configuring a custom pipeline.
Download a workflow schema file template from Connected Insights as follows.
On the top toolbar, select Configuration.
Select the General tab.
Select Data Upload.
Select From Local Storage.
For Define and Monitor Data Uploads, select Add Path.
For Configuration Settings, select the radio button next to Configure custom pipeline.
Select Download a template file to download the workflow schema template file. If you do not want to create a pipeline, select Cancel. When prompted, select Yes, clear.
Edit the file as needed to reflect the files for upload. Refer to the following topics that pertain to the workflow schema template file sections:
❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available. Otherwise, it moves on to the next available file.
After the workflow schema file is edited, create a pipeline. Then, select Configure manually under Configuration Settings.
Select Choose File and upload the edited workflow schema file.
Complete the remaining fields and save the pipeline.
❗ If this pipeline is used for manual uploading, make sure that the pipeline name consists only of numbers, letters, underscores, and dashes. The name cannot include spaces or special characters. This name is used in the --pipeline-name= command listed in On-Demand Data Upload from User Storage (Connected Insights - Cloud Only).
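The naming rule above can be checked before a pipeline is saved. The following Python sketch is illustrative only: the function name is hypothetical, and restricting letters to ASCII is an assumption, because the text does not say whether non-ASCII letters are accepted.

```python
import re

# Allowed characters per the note above: letters, numbers, underscores, dashes.
# Restricting "letters" to ASCII is an assumption of this sketch.
_PIPELINE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if the name contains only letters, digits, '_' and '-'."""
    return _PIPELINE_NAME.fullmatch(name) is not None


print(is_valid_pipeline_name("TSO500_v2-2"))   # True
print(is_valid_pipeline_name("TSO 500 v2.2"))  # False (space and period)
```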
This section of the file can be partially or completely deleted if uploading does not entail any (or all) of the following aspects:
Required
successMarkerFile and failureMarkerFile: Specify a success marker file or failure marker file. When this file is present in the specified location, upload begins or stops, respectively.
Optional
sampleType: If the analysis output contains only DNA or only RNA, you can override the sample type with sampleType. If sampleType is not specified, the system determines it from the analysis output.
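The marker file behavior described above can be pictured with a short Python sketch. It is not the Connected Insights implementation. The function name, the "wait" state returned when neither file exists, and the choice to let a failure marker take precedence when both files exist are assumptions made for illustration.

```python
from pathlib import Path


def upload_decision(analysis_dir, success_marker, failure_marker):
    """Return 'start', 'stop', or 'wait' based on which marker file exists.

    Assumptions of this sketch: a failure marker takes precedence when both
    files exist, and 'wait' is returned when neither file exists yet.
    """
    root = Path(analysis_dir)
    if (root / failure_marker).is_file():
        return "stop"
    if (root / success_marker).is_file():
        return "start"
    return "wait"
```

The marker paths are resolved relative to the analysis folder, so the same two relative names can be reused for every run folder that is monitored.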
This section specifies the sample sheet file path found in the analysis folder, the data header row marker, and column aliases. The following information is used to create cases in Connected Insights:
Required
filePath: Specify the file path to the sample sheet for the cases.
Optional
columnAliases: Specify the column aliases. These aliases must match the sample information column headers. Some aliases are required and others are optional.
sampleId: Appears in the Case ID column of the Cases page.
caseId: Appears in the Case ID column of the Cases page. For DNA-RNA paired samples, the DNA and RNA sample rows must have the same value in the column whose header is aliased to caseId. For example, if caseId is aliased to the column header Pair_ID, the DNA and RNA sample rows of a paired sample must contain the same value in the Pair_ID column of the sample sheet.
Sample_Type: Cannot be aliased. The sample sheet must include a column titled Sample_Type, and every sample row must contain DNA or RNA in this column.
sex: Aliased to the header of the column containing the sex of each sample.
Disease aliases (id and name): Determine the list of Key Genes used for this sample. For more information, refer to Overview Tab. If the disease name or ID is not provided, the Status column on the Cases page displays Missing Required Data until the disease name or ID is added. You can add a disease by uploading disease information as custom case data. You can also open the case in Connected Insights and enter the disease for an individual case.
id: Disease alias that can optionally be aliased to the header of the column containing the sample disease ID number according to SNOMEDCT. The SNOMEDCT ID can be found by navigating to an existing case and searching for the disease in the CaseDetails or assertion form. The ID can also be found by using the International Edition browser at the SNOMED International SNOMED CT Browser.
name: Disease alias that can optionally be aliased to the header of the column containing the sample disease name according to SNOMEDCT. If a disease ID is specified, a name is required. To use a disease ID without specifying a name, enter null or the name of a nonexistent column for the name field.
dataHeaderRowMarker: Specify the sample sheet data header row marker. The default value is [Data]. The marker is the text in the first (leftmost) column of the row directly above the column headers. The row after the marker contains the sample information headers, and the rows after that contain the sample information values for each sample.
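The following Python sketch shows one way the data header row marker and column aliases can be used to locate sample information. It is a minimal illustration, not the Connected Insights parser. The function name, the sample sheet contents, and the Sample_ID header are assumptions made for the example, and the sheet is assumed to be comma-separated.

```python
import csv
import io


def read_sample_rows(sheet_text, column_aliases, marker="[Data]"):
    """Return one dict per sample row, keyed by alias name.

    column_aliases maps an alias (for example "caseId") to the column
    header it points to (for example "Pair_ID").
    """
    rows = list(csv.reader(io.StringIO(sheet_text)))
    # Find the row whose first cell is the data header row marker.
    marker_idx = next(
        (i for i, row in enumerate(rows) if row and row[0].strip() == marker),
        None,
    )
    if marker_idx is None:
        raise ValueError(f"marker {marker!r} not found in sample sheet")
    headers = [h.strip() for h in rows[marker_idx + 1]]  # one row below marker
    samples = []
    for row in rows[marker_idx + 2:]:                     # two rows below and beyond
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        record = dict(zip(headers, (c.strip() for c in row)))
        sample = {alias: record.get(col, "") for alias, col in column_aliases.items()}
        # Sample_Type cannot be aliased, so it is read by its literal header.
        sample["Sample_Type"] = record.get("Sample_Type", "")
        samples.append(sample)
    return samples


# Hypothetical sample sheet used only for this example.
SHEET = """[Header]
Experiment,Demo
[Data]
Sample_ID,Pair_ID,Sample_Type
Lung_DNA_001,Lung_001,DNA
Lung_RNA_001,Lung_001,RNA
"""
ALIASES = {"sampleId": "Sample_ID", "caseId": "Pair_ID"}

for sample in read_sample_rows(SHEET, ALIASES):
    print(sample)
```

Both rows resolve caseId to Lung_001, which is the condition described above for a DNA-RNA paired sample.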
This section specifies the file paths for biomarkers and metrics to be included for interpretation. File names can include the following symbolic references, which depend on the Sample ID or Pair ID:
{pairId}
{sampleId.DNA}
{sampleId.RNA}
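To illustrate how these symbolic references might resolve to concrete paths, the following Python sketch substitutes each token with a value from the sample sheet. The function name and the path template are hypothetical, and the actual resolution logic in Connected Insights may differ.

```python
def resolve_path(template, pair_id, dna_sample_id=None, rna_sample_id=None):
    """Substitute the symbolic references in a workflow schema file path."""
    values = {
        "{pairId}": pair_id,
        "{sampleId.DNA}": dna_sample_id,
        "{sampleId.RNA}": rna_sample_id,
    }
    resolved = template
    for token, value in values.items():
        if token in resolved:
            if value is None:
                raise ValueError(f"no value available for {token}")
            resolved = resolved.replace(token, value)
    return resolved


# Hypothetical path template; the folder layout is illustrative only.
print(resolve_path("Results/{pairId}/{sampleId.DNA}.vcf", "Lung_001", "Lung_DNA_001"))
# Results/Lung_001/Lung_DNA_001.vcf
```

Plain string replacement is used instead of str.format because a token such as {sampleId.DNA} would be read by str.format as an attribute lookup.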
When using the workflow schema file template downloaded from the Configuration page, lines for files that are not uploaded can be deleted. The ", Optional" designation can be removed unless the file is an optional file for the pipeline.
The following table shows specific sample visualization files used for IGV. File formats include .bam and .bam.bai. For more information, refer to IGV Visualizations. Under alignmentFiles, the ", Optional" designation can be removed unless the file is an optional file for the pipeline.
The following example shows the custom pipeline configuration process using Local Run Manager TruSight Oncology 500 Analysis Module v2.2. For details on this process, refer to Custom Pipeline Configuration.
Uploaded data is organized as cases that provide details about the sample. A case is a secondary analysis result that has been imported and annotated. The imported files include VCF files for genetic variants (or CSV files for TruSight Oncology 500 RNA Fusion variants). The Cases page lists all cases for your account or workgroup. The following files can be uploaded, but are not required:
BAM files
JSON, TSV, and CSV files for TMB, MSI, and GIS biomarkers or for QC metrics
Make sure that the sample sheet is included in the secondary analysis results folder. The following example shows the structure of the [Data] section of the sample sheet:
Using this example, Connected Insights creates the following cases:
Open the secondary analysis results folder and find the files that must be identified in the workflow schema file. The following example shows the secondary analysis results folder structure:
For more information, refer to Create a Workflow Schema File in Custom Pipeline Configuration.
The following example shows the workflow schema file structure:
❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available. Otherwise, it moves on to the next available file.
| File | Compatibility |
|---|---|
| gisFile | JSON containing genomic instability score data. |
| msiFile | JSON containing microsatellite instability data. |
| tmbFile | JSON or CSV file containing tumor mutational burden data. |
| purityPloidyFiles | TSV or VCF file containing purity and ploidy estimates. |
| snvFiles | VCF files containing small variant calls. |
| cnvFiles | VCF files containing copy number variant calls. |
| svFiles | VCF files containing structural variant calls. The structural variant caller can also call longer small variant insertion/deletion/delins events and can duplicate calls from the small variant caller. |
| rnaSpliceFiles | VCF files containing RNA splice variant calls. |
| rnaFusionFiles | VCF files containing RNA fusion variant calls. |
| metricsQCFile | TSV file containing QC metrics data. |
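The file and compatibility listing above can be turned into a quick pre-upload check. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and mapping each listed format to a single lowercase file extension (for example, JSON to .json) is an assumption rather than documented behavior.

```python
# Accepted formats per file key, transcribed from the listing above.
# Mapping each format to these file extensions is an assumption of this sketch.
ACCEPTED_EXTENSIONS = {
    "gisFile": (".json",),
    "msiFile": (".json",),
    "tmbFile": (".json", ".csv"),
    "purityPloidyFiles": (".tsv", ".vcf"),
    "snvFiles": (".vcf",),
    "cnvFiles": (".vcf",),
    "svFiles": (".vcf",),
    "rnaSpliceFiles": (".vcf",),
    "rnaFusionFiles": (".vcf",),
    "metricsQCFile": (".tsv",),
}


def has_accepted_format(file_key, path):
    """Return True if the path ends with an extension listed for file_key."""
    return path.lower().endswith(ACCEPTED_EXTENSIONS[file_key])


print(has_accepted_format("tmbFile", "sample.tmb.csv"))  # True
print(has_accepted_format("gisFile", "sample.gis.csv"))  # False
```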
| File | Compatibility |
|---|---|
| dnaBamFile | BAM file for the DNA alignment (under alignmentFiles). |
| dnaBaiFile | BAI file for the DNA alignment (under alignmentFiles). |
| rnaBamFile | BAM file for the RNA alignment (under alignmentFiles). |
| rnaBaiFile | BAI file for the RNA alignment (under alignmentFiles). |
| coverageFiles | TSV file containing coverage data (under visualizationFiles). |
| balleleFiles | BEDGraph containing b-allele data (under visualizationFiles). |
| Case ID | Workflow Type | Disease | Sample ID | Sample Type |
|---|---|---|---|---|
| Control-Case | DNA and RNA | Malignant tumor of unknown origin (SNOMEDCT ID 255052006) | DNA_Control, RNA_Control | DNA, RNA |
| Lung_001 | DNA and RNA | Non-small cell lung cancer (SNOMEDCT ID 254637007) | Lung_DNA_001, Lung_RNA_001 | DNA, RNA |
| Breast_002 | DNA | Malignant tumor of breast (SNOMEDCT ID 254837009) | Breast_DNA_002 | DNA |
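The grouping shown in the cases above, where samples that share a Case ID form one case and the workflow type reflects the sample types present, can be sketched in Python. The function and field names are hypothetical, and the sketch only mirrors the example rather than the product's internal logic.

```python
from collections import defaultdict


def build_cases(samples):
    """Group (case_id, sample_id, sample_type) tuples into cases."""
    grouped = defaultdict(list)
    for case_id, sample_id, sample_type in samples:
        grouped[case_id].append((sample_id, sample_type))
    cases = {}
    for case_id, members in grouped.items():
        types = sorted({sample_type for _, sample_type in members})
        cases[case_id] = {
            "workflow_type": " and ".join(types),  # "DNA", "RNA", or "DNA and RNA"
            "sample_ids": [sample_id for sample_id, _ in members],
        }
    return cases


# Sample rows taken from the example cases above.
SAMPLES = [
    ("Control-Case", "DNA_Control", "DNA"),
    ("Control-Case", "RNA_Control", "RNA"),
    ("Lung_001", "Lung_DNA_001", "DNA"),
    ("Lung_001", "Lung_RNA_001", "RNA"),
    ("Breast_002", "Breast_DNA_002", "DNA"),
]
cases = build_cases(SAMPLES)
print(cases["Lung_001"]["workflow_type"])    # DNA and RNA
print(cases["Breast_002"]["workflow_type"])  # DNA
```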
Connected Insights imports variant calls for the following variant types in Variant Call Format (VCF) v4.1 and later:
Small variants (SNVs, MNVs, and small indels)
Structural variants (SVs)
Copy number variants (CNVs)
RNA fusion variants
RNA splice variants
❗ Imported VCF files must contain at least one sample and be sorted correctly to ensure valid display of results in Connected Insights.
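A VCF file can be screened for both requirements before upload. The Python sketch below is illustrative only. The text does not define "sorted correctly", so the check assumes it means that records for each contig are contiguous and in ascending position order. The function name is hypothetical, and the input is assumed to be the lines of an uncompressed VCF.

```python
def check_vcf(lines):
    """Return a list of problems found in the lines of an uncompressed VCF."""
    problems = []
    current = None    # contig of the previous record
    finished = set()  # contigs whose block of records has already ended
    last_pos = 0
    saw_header = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            saw_header = True
            # Eight fixed columns plus FORMAT precede the sample columns.
            if len(line.split("\t")) < 10:
                problems.append("no sample column")
            continue
        chrom, pos_text = line.split("\t")[:2]
        pos = int(pos_text)
        if chrom != current:
            if chrom in finished:
                problems.append(f"{chrom} records are not contiguous")
            if current is not None:
                finished.add(current)
            current = chrom
        elif pos < last_pos:
            problems.append(f"{chrom}:{pos} is out of order")
        last_pos = pos
    if not saw_header:
        problems.append("missing #CHROM header line")
    return problems
```

An empty list means that neither problem was detected under these assumptions. It does not guarantee that the file is accepted by Connected Insights.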
The following sample fields are supported for each variant type:
¹ The following GT values are interpreted as an absence of the reported variant and are not imported:
.
./.
0
0/0
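The GT rule in this footnote amounts to a simple membership test. The Python sketch below uses hypothetical names and covers only the four values listed. Other notations, such as phased genotypes, are not addressed in the text and are treated here as imported.

```python
# GT values that the footnote lists as "absence of the reported variant".
ABSENT_GENOTYPES = {".", "./.", "0", "0/0"}


def is_imported(gt):
    """Return False for GT values interpreted as absence of the variant."""
    return gt not in ABSENT_GENOTYPES


calls = ["0/1", "0/0", "./.", "1/1", "1"]
print([gt for gt in calls if is_imported(gt)])  # ['0/1', '1/1', '1']
```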
¹ The following GT values are expected given the CN of the variant:
0: The copy number is normal in a region expected to be haploid.
1: The copy number differs from normal in a region expected to be haploid.
0/0: The copy number is normal in a region expected to be diploid.
0/1: The copy number differs from normal and is not a complete loss in a region expected to be diploid.
1/1: The copy number is a complete loss in a region expected to be diploid.
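The mapping from copy number to expected GT can be written as a small function. This Python sketch is an illustration with a hypothetical function name. It assumes that "normal" means a copy number equal to the expected ploidy (1 for haploid regions, 2 for diploid regions) and that a complete loss means a copy number of 0.

```python
def cnv_genotype(copy_number, expected_ploidy):
    """Return the GT expected for a copy number in a haploid (1) or diploid (2) region.

    Assumption of this sketch: "normal" means copy_number == expected_ploidy.
    """
    if expected_ploidy == 1:
        return "0" if copy_number == 1 else "1"
    if expected_ploidy == 2:
        if copy_number == 2:
            return "0/0"   # normal
        if copy_number == 0:
            return "1/1"   # complete loss
        return "0/1"       # gain or partial loss
    raise ValueError("expected_ploidy must be 1 or 2")


print(cnv_genotype(2, 2))  # 0/0
print(cnv_genotype(5, 2))  # 0/1
print(cnv_genotype(0, 2))  # 1/1
```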
¹ The following GT values are interpreted as an absence of the reported variant and are not imported:
.
./.
0
0/0
Small variants

| Sample Field | VCF Fields | Details |
|---|---|---|
| Allele Depths | AD | The read support for variants called at this position. Expected as a comma separated list of values for the reference allele followed by each alternate allele. |
| Total Depth | DP | The total read support for all alleles at this position. Will be calculated as the sum of all allele depths if not provided. |
| Variant Read Frequency / Variant Allele Frequency | VF (or derived from AD) | The proportion of reads supporting each alternate allele. Expected as a comma separated list of values for each alternate allele. Will be calculated based on allele depths and total depth if not provided. |
| Genotype | GT¹ | The genotype of the sample at the given position. |
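The derivations described for Total Depth and Variant Read Frequency can be worked through in a few lines of Python. This is a sketch with a hypothetical function name. It assumes the frequency for each alternate allele is its allele depth divided by the total depth, and it reports 0.0 when there are no reads, which the text does not specify.

```python
def depth_and_frequencies(ad, dp=None):
    """Return (DP, [VF per alternate allele]) from an AD string such as "90,10".

    The first AD value is the reference allele; the rest are alternate alleles.
    """
    depths = [int(value) for value in ad.split(",")]
    total = dp if dp is not None else sum(depths)  # DP falls back to sum of AD
    if total == 0:
        return 0, [0.0 for _ in depths[1:]]  # assumption: 0.0 when no reads
    return total, [alt / total for alt in depths[1:]]


print(depth_and_frequencies("90,10"))     # (100, [0.1])
print(depth_and_frequencies("50,30,20"))  # (100, [0.3, 0.2])
```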
Copy number variants

| Sample Field | VCF Fields | Details |
|---|---|---|
| Fold Change | FC, SM | Estimated fold change for the copy number variant. |
| Copy Number | CN | Estimated absolute copy number for the copy number variant. |
| Minor-haplotype Copy Number | MCN | Estimated absolute copy number for the minor-haplotype of a copy number variant. When MCN is zero the copy number variant can be determined to be LOH. |
| Genotype | (Derived from CN when available)¹ | The genotype of the sample at the given position. |
Structural variants

| Sample Field | VCF Fields | Details |
|---|---|---|
| Paired Reads | PR | The paired read support for variants called at this position. Expected as a comma separated list of values for the reference allele followed by each alternate allele. |
| Split Reads | SR | The split read support for variants called at this position. Expected as a comma separated list of values for the reference allele followed by each alternate allele. |
| Supporting Reads | (Derived from PR and SR) | The cumulative read support from split reads and paired reads for variants called at this position. |
| Total Depth | (Derived from PR and SR) | The total reads for all alleles called at this position. |
| Variant Read Frequency / Variant Allele Frequency | (Derived from PR and SR) | The proportion of reads supporting each alternate allele. Calculated based on supporting reads and total depth. |
| Genotype | GT¹ (or derived from PR and SR) | The genotype of the sample at the given position. |
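The fields derived from PR and SR can be illustrated with a short Python sketch. The exact formulas used by Connected Insights are not given in the text, so the following are assumptions: supporting reads are the alternate counts from PR and SR added together, and total depth is the sum of the reference and alternate counts from both fields. The function name is hypothetical, and only the first alternate allele is handled.

```python
def sv_support(pr, sr=None):
    """Return (supporting_reads, total_depth, vf) for the first alternate allele.

    pr and sr are "ref,alt" strings. Summing reference and alternate counts
    from both fields to get total depth is an assumption of this sketch.
    """
    pr_ref, pr_alt = (int(v) for v in pr.split(",")[:2])
    sr_ref, sr_alt = (int(v) for v in sr.split(",")[:2]) if sr else (0, 0)
    supporting = pr_alt + sr_alt
    total = pr_ref + sr_ref + supporting
    vf = supporting / total if total else 0.0
    return supporting, total, vf


print(sv_support("30,10", "40,20"))  # (30, 100, 0.3)
```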