
Custom Pipeline Configuration

The custom pipeline option lets Connected Insights interpret the structure of the secondary analysis output files produced by a pipeline that is not yet compatible with the software. This option also requires a workflow schema file that describes the content and location of the secondary analysis output files. For an example of how to configure a custom pipeline for TSO 500 Analysis Module v2.2, refer to Custom Pipeline Configuration Example.

Select Custom Pipeline

In Configuration Settings, select the radio button next to Configure custom pipeline.
If necessary, create a workflow schema file by selecting Download the template file. For more information on setting up the template file, refer to Create a Workflow Schema File.
Select Choose File to upload your workflow schema file.
For Custom Pipeline Name, enter a name for the pipeline.
For Test Definition, select the applicable definition.
For the Choose a folder to monitor for case metadata (optional) field, enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.

Create a Workflow Schema File

To set up data upload for secondary analysis output data that is not yet compatible with Connected Insights, create a workflow schema file (.yaml format). This file specifies the files in the secondary analysis output data that Connected Insights analyzes. This file is only used when configuring a custom pipeline.

Download a workflow schema file template from Connected Insights as follows.

On the top toolbar, select Configuration.
Select the General tab.
Select Data Upload.
Select From Local Storage.
For Define and Monitor Data Uploads, select Add Path.
For Configuration Settings, select the radio button next to Configure custom pipeline.
Select Download a template file to download the workflow schema template file. If you do not want to create a pipeline, select Cancel. When prompted, select Yes, clear.
Edit the file as needed to list the files to upload. For details on each section of the workflow schema template file, refer to the following topics: Pipeline, Sample Sheet, Joint Files, and Sample Files.
❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available; otherwise, it moves on to the next available file.
After the workflow schema file is edited, create a pipeline. Then, select Configure manually under Configuration Settings.
Select Choose File and upload the edited workflow schema file.
Complete the remaining fields and save the pipeline.
❗ If this pipeline is used for manual uploading, make sure that the pipeline name contains only letters, numbers, underscores, and dashes. The name cannot include spaces or other special characters. This name is used with the --pipeline-name= option described in On-Demand Data Upload from User Storage (Connected Insights - Cloud Only).
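The naming rule above can be checked before the name is entered. The following Python sketch assumes that "letters" and "numbers" mean the ASCII characters A-Z, a-z, and 0-9; is_valid_pipeline_name is a hypothetical helper and is not part of Connected Insights.

```python
import re

# Assumption: "letters" and "numbers" mean ASCII A-Z, a-z, and 0-9.
# The dash is last in the character class, so it is matched literally.
PIPELINE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if name is non-empty and uses only letters, numbers, _ and -."""
    return PIPELINE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["TSO500_v2-2", "TSO 500", "tso500.v2"]:
        print(candidate, is_valid_pipeline_name(candidate))
```

Running it prints True for TSO500_v2-2 and False for the other two names (a space and a period are not allowed). fullmatch is used instead of a ^...$ pattern because $ also matches before a trailing newline.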

Pipeline

This section of the file can be partially or completely deleted if the upload does not use some or all of the following settings:

Required

successMarkerFile and failureMarkerFile: Specify a success marker file or a failure marker file. When the success marker file is present at the specified location, the upload begins; when the failure marker file is present, the upload stops.
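The marker-file rule can be pictured as a check that runs against each monitored output folder. The following Python sketch is a hypothetical illustration, not Data Uploader's implementation; it assumes the marker paths are relative to the secondary analysis output folder and that the failure marker takes precedence if both files exist.

```python
from pathlib import Path


def upload_state(output_dir: str, success_marker: str, failure_marker: str) -> str:
    """Return "stop", "start", or "wait" for one secondary analysis output folder.

    Assumption: if both marker files exist, the failure marker wins.
    """
    root = Path(output_dir)
    if (root / failure_marker).exists():
        return "stop"   # failure marker present: do not upload
    if (root / success_marker).exists():
        return "start"  # success marker present: begin the upload
    return "wait"       # neither marker yet: check again later
```

With the example schema in this guide, the call would be upload_state(run_folder, "Logs_Intermediates/MetricsOutput/MetricsOutput.tsv", "./failure.txt"), where run_folder is a hypothetical path to one analysis output folder.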

Optional

sampleType: If the analysis output contains only DNA samples or only RNA samples, you can use sampleType to override the sample type of the samples. If sampleType is not specified, the system determines the sample type from the analysis output.

Sample Sheet

This section specifies the sample sheet file path found in the analysis folder, the data header row marker, and column aliases. The following information is used to create cases in Connected Insights:

Required

filePath: The path to the sample sheet in the analysis folder, which is used to create the cases.

Optional

columnAliases: Specify the column aliases. Each alias must match a sample information column header in the sample sheet. Some aliases are required and others are optional.
- sampleId: Appears in the Case ID column of the Cases page.
- caseId: Appears in the Case ID column of the Cases page. For DNA-RNA paired samples, the DNA and RNA sample rows must have the same value in the column aliased to caseId. For example, if caseId is aliased to the Pair_ID column header, the DNA and RNA rows of a paired sample must contain the same Pair_ID value in the sample sheet.
- Sample_Type: Cannot be aliased. The sample sheet must include a column with the header Sample_Type, and every sample row must contain DNA or RNA in this column.
- sex: Aliased to the header of the column containing the sex of each sample.
- Disease aliases: Determine the list of Key Genes used for the sample. For more information, refer to Overview Tab. If the disease name or ID is not provided, the Status column on the Cases page displays Missing Required Data until the disease name or ID is added. You can add a disease by uploading disease information as custom case data, or by opening an individual case in Connected Insights and entering the disease.
  - id: Optionally aliased to the header of the column containing the SNOMED CT ID of the sample disease. To find a SNOMED CT ID, open an existing case and search for the disease in the Case Details or assertion form, or use the International Edition of the SNOMED International SNOMED CT Browser.
  - name: Optionally aliased to the header of the column containing the SNOMED CT name of the sample disease. If a disease ID is specified, a name is required. To use a disease ID without specifying a name, enter null or the name of a column that does not exist for the name field.
dataHeaderRowMarker: Specify the sample sheet data header row marker. The default value is [Data]. The marker is the text in the first (leftmost) cell of the row directly above the sample information column headers. The row immediately after the marker contains the column headers, and each row after that contains the sample information for one sample.
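To illustrate how dataHeaderRowMarker locates the sample rows, the following Python sketch reads a comma-separated sample sheet and returns one dictionary per sample, keyed by the column headers. It approximates the rule described above and is not Connected Insights code; read_sample_rows is a hypothetical name, and the sketch assumes that no other section follows the data rows.

```python
import csv
import io


def read_sample_rows(text: str, marker: str = "[Data]") -> list[dict[str, str]]:
    """Return one dict per sample row below the data header row marker.

    The row after the marker supplies the column headers; each later
    non-blank row is one sample. Assumes comma-separated text and that no
    other section follows the data rows.
    """
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    for i, row in enumerate(rows):
        if row and row[0].strip() == marker:
            if i + 1 >= len(rows):
                raise ValueError("no header row after the data header row marker")
            header = [cell.strip() for cell in rows[i + 1]]
            samples = []
            for values in rows[i + 2:]:
                if not any(cell.strip() for cell in values):
                    continue  # skip blank rows
                sample = dict(zip(header, (cell.strip() for cell in values)))
                # Sample_Type cannot be aliased and must be DNA or RNA.
                if sample.get("Sample_Type") not in ("DNA", "RNA"):
                    raise ValueError(f"invalid Sample_Type in row: {values}")
                samples.append(sample)
            return samples
    raise ValueError(f"data header row marker {marker!r} not found")
```

Passing a different marker string models a sample sheet whose data section uses another header row marker.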

Joint Files

Specifies the file paths for the biomarkers and metrics to include for interpretation. File paths can include the following symbolic references, which resolve to the Pair ID or to the DNA or RNA Sample ID:

{pairId}
{sampleId.DNA}
{sampleId.RNA}
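The following Python sketch shows one way these symbolic references could be expanded for a single case. resolve_path is a hypothetical helper, and returning None when a case lacks the referenced DNA or RNA sample is an assumption, not documented behavior.

```python
from typing import Optional


def resolve_path(template: str, pair_id: str,
                 dna_sample_id: Optional[str] = None,
                 rna_sample_id: Optional[str] = None) -> Optional[str]:
    """Replace {pairId}, {sampleId.DNA}, and {sampleId.RNA} in a schema path.

    Returns None if the template needs a sample the case does not have,
    for example {sampleId.RNA} in a DNA-only case (an assumption).
    """
    substitutions = {
        "{pairId}": pair_id,
        "{sampleId.DNA}": dna_sample_id,
        "{sampleId.RNA}": rna_sample_id,
    }
    for token, value in substitutions.items():
        if token in template:
            if value is None:
                return None
            template = template.replace(token, value)
    return template
```

str.replace is used instead of str.format because format would read {sampleId.DNA} as an attribute lookup on sampleId.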

When you use the workflow schema file template downloaded from the Configuration page, you can delete the lines for files that are not uploaded. The ", Optional" designation can be removed unless the file is optional for the pipeline.
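The ", Optional" designation can be modeled as a flag on each schema entry. This Python sketch assumes the entries have already been resolved for a case (no symbolic references remain) and are relative to the output folder; split_entry and select_files are hypothetical names, and raising an error for a missing required file is an assumption, since this guide does not describe that case.

```python
from pathlib import Path


def split_entry(entry: str) -> tuple[str, bool]:
    """Split a schema value such as 'a/b.json, Optional' into (path, is_optional)."""
    path, sep, flag = entry.rpartition(",")
    if sep and flag.strip() == "Optional":
        return path.strip(), True
    return entry.strip(), False


def select_files(output_dir: str, entries: list[str]) -> list[Path]:
    """Return the existing files for the given resolved schema entries.

    A missing optional file is skipped and the next entry is checked.
    Treating a missing required file as an error is an assumption.
    """
    root = Path(output_dir)
    selected = []
    for entry in entries:
        relative_path, optional = split_entry(entry)
        candidate = root / relative_path
        if candidate.exists():
            selected.append(candidate)
        elif not optional:
            raise FileNotFoundError(f"required file not found: {candidate}")
    return selected
```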

Sample Files

The following table shows the sample-specific visualization files used for IGV. Supported file formats are .bam and .bam.bai. For more information, refer to IGV Visualizations. Under alignmentFiles, the ", Optional" designation can be removed unless the file is optional for the pipeline.
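IGV needs an index file to display a BAM file, so a quick consistency check on a resolved list of alignment files can catch a .bam entry listed without its .bam.bai index. This Python sketch compares path strings only and does not access the file system; missing_bam_indexes is a hypothetical name.

```python
def missing_bam_indexes(paths: list[str]) -> list[str]:
    """Return every .bam path that has no matching .bam.bai path in the list."""
    listed = set(paths)
    return [p for p in paths if p.endswith(".bam") and p + ".bai" not in listed]
```

For example, missing_bam_indexes(["DnaAlignment/S1/S1.bam", "DnaAlignment/S1/S1.bam.bai", "RnaAlignment/S2/S2.bam"]) returns ["RnaAlignment/S2/S2.bam"].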

Custom Pipeline Configuration Example

The following example shows the custom pipeline configuration process using Local Run Manager TruSight Oncology 500 Analysis Module v2.2. For details on this process, refer to Custom Pipeline Configuration.

Uploaded Data

Uploaded data is organized into cases that provide details about the sample. A case is a secondary analysis result that has been imported and annotated. The uploaded result files include VCF files for genetic variants (or CSV files for TruSight Oncology 500 RNA fusion variants). The Cases page lists all cases for your account or workgroup. The following files can also be uploaded but are not required:

BAM files
JSON, TSV, and CSV files for TMB, MSI, and GIS biomarkers or for QC metrics

Example Sample Sheet

Make sure that the sample sheet is included in the secondary analysis results folder. The following example shows the structure of the [Data] section of the sample sheet:

[Data]
Sample_ID, Sample_Type, Pair_ID, Tumor_Type
DNA_Control, DNA, Control-Case, 255052006
RNA_Control, RNA, Control-Case, 255052006
Lung_DNA_001, DNA, Lung_001, 254637007
Lung_RNA_001, RNA, Lung_001, 254637007
Breast_DNA_002, DNA, Breast_002, 254837009

Using this example, Connected Insights creates the following cases:

Example Analysis Results

Open the secondary analysis results folder and find the files that must be identified in the workflow schema file. The following example shows the secondary analysis results folder structure:

Example Workflow Schema File

For more information, refer to Create a Workflow Schema File in Custom Pipeline Configuration.

The following example shows the workflow schema file structure:

pipeline:
  successMarkerFile: Logs_Intermediates/MetricsOutput/MetricsOutput.tsv
  failureMarkerFile: "./failure.txt"
sampleSheet:
  filePath: SampleSheet.csv
jointFiles:
  msiFile: Logs_Intermediates/Msi/{sampleId.DNA}/{sampleId.DNA}.msi.json, Optional
  tmbFile: Logs_Intermediates/Tmb/{sampleId.DNA}/{sampleId.DNA}.tmb.json, Optional
  snvFiles: Logs_Intermediates/VariantMatching/{sampleId.DNA}/{sampleId.DNA}_MergedSmallVariants.genome.vcf, Optional
  cnvFiles:
    - Logs_Intermediates/CnvCaller/{sampleId.DNA}/{sampleId.DNA}_CopyNumberVariants.vcf, Optional
  rnaSpliceFiles: Logs_Intermediates/RnaSpliceVariantCalling/{sampleId.RNA}/{sampleId.RNA}_SpliceVariants.vcf, Optional
  rnaFusionFiles:
    - Logs_Intermediates/FusionCalling/{sampleId.RNA}/{sampleId.RNA}.vcf.gz, Optional
    - Logs_Intermediates/RnaFusionMerge/{sampleId.RNA}/{sampleId.RNA}_AllFusions.csv, Optional
  metricsQCFile: Logs_Intermediates/MetricsOutput/MetricsOutput.tsv, Optional
sampleFiles:
  tumorPairId:
    alignmentFiles:
      dnaBamFile: Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam, Optional
      dnaBaiFile: Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam.bai, Optional
      rnaBamFile: Logs_Intermediates/RnaAlignment/{sampleId.RNA}/{sampleId.RNA}.bam, Optional
      rnaBaiFile: Logs_Intermediates/RnaAlignment/{sampleId.RNA}/{sampleId.RNA}.bam.bai, Optional

❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available; otherwise, it moves on to the next available file.

VCF Input Requirement

Connected Insights imports variant calls for the following variant types from Variant Call Format (VCF) files, version 4.1 and later:

Small variants (SNVs, MNVs, and small indels)
Structural variants (SVs)
Copy number variants (CNVs)
RNA fusion variants
RNA splice variants

❗ Imported VCF files must contain at least one sample and be sorted correctly to ensure valid display of results in Connected Insights.
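The two VCF requirements above can be checked before upload. The following Python sketch makes an assumption about what "sorted correctly" means (records for each contig form one contiguous block, and POS does not decrease within it); it does not check contig order against the reference, and check_vcf_lines is a hypothetical name. For a .vcf.gz file, the lines could be read with gzip.open(path, "rt").

```python
def check_vcf_lines(lines: list[str]) -> list[str]:
    """Return problems found in VCF text lines; an empty list means none found.

    Assumed meaning of "sorted": each contig forms one contiguous block and
    POS never decreases within a block. Contig order is not checked.
    """
    problems = []
    seen_contigs = set()
    current_contig = None
    last_pos = 0
    header_found = False
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            header_found = True
            # 8 fixed columns (CHROM..INFO), then FORMAT, then one column per sample.
            if len(line.split("\t")) < 10:
                problems.append("header line has no sample columns")
            continue
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            problems.append(f"line {number}: malformed record")
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom != current_contig:
            if chrom in seen_contigs:
                problems.append(f"line {number}: {chrom} appears in more than one block")
            seen_contigs.add(chrom)
            current_contig = chrom
        elif pos < last_pos:
            problems.append(f"line {number}: POS {pos} follows {last_pos} on {chrom}")
        last_pos = pos
    if not header_found:
        problems.append("no #CHROM header line")
    return problems
```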

The following sample fields are supported for each variant type:

Small variants

¹ The following GT values are interpreted as an absence of the reported variant and are not imported:

.
./.
0
0/0
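The GT rule above, which also applies to structural variants and RNA fusion variants, can be expressed as a lookup against the listed values. In this Python sketch, sample_gt and is_imported are hypothetical names, and keeping records that have no GT field is an assumption.

```python
from typing import Optional

# GT values listed above as meaning the reported variant is absent.
# Only these exact strings are covered; other encodings such as the
# phased "0|0" are not in the documented list.
ABSENT_GT_VALUES = {".", "./.", "0", "0/0"}


def sample_gt(format_field: str, sample_field: str) -> Optional[str]:
    """Return the GT value of one VCF sample column, or None if FORMAT has no GT."""
    keys = format_field.split(":")
    if "GT" not in keys:
        return None
    values = sample_field.split(":")
    index = keys.index("GT")
    # A truncated sample column is treated as a missing genotype.
    return values[index] if index < len(values) else "."


def is_imported(format_field: str, sample_field: str) -> bool:
    """Apply the GT rule; records without a GT field are kept (an assumption)."""
    gt = sample_gt(format_field, sample_field)
    return gt is None or gt not in ABSENT_GT_VALUES
```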

Copy number variants

¹ The following GT values are expected given the CN of the variant:

0: The copy number is normal in a region expected to be haploid.
1: The copy number differs from normal in a region expected to be haploid.
0/0: The copy number is normal in a region expected to be diploid.
0/1: The copy number differs from normal and is not a complete loss in a region expected to be diploid.
1/1: The copy number is a complete loss in a region expected to be diploid.
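The CNV GT expectations above can be written as a function of the copy number and the expected ploidy of the region. This Python sketch assumes that the normal copy number equals the ploidy (1 for haploid regions, 2 for diploid regions); expected_cnv_gt is a hypothetical name.

```python
def expected_cnv_gt(copy_number: int, ploidy: int) -> str:
    """Return the GT expected for a CNV call with the given copy number.

    Assumption: the normal copy number equals the ploidy of the region.
    """
    if ploidy == 1:
        # Haploid: normal is "0"; any difference, including loss, is "1".
        return "0" if copy_number == 1 else "1"
    if ploidy == 2:
        if copy_number == 2:
            return "0/0"  # normal
        if copy_number == 0:
            return "1/1"  # complete loss
        return "0/1"      # differs from normal but is not a complete loss
    raise ValueError("only haploid (1) and diploid (2) regions are described")
```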

Structural variants and RNA fusion variants

¹ The following GT values are interpreted as an absence of the reported variant and are not imported:

.
./.
0
0/0
