Custom Pipeline Configuration

The custom pipeline option lets Connected Insights understand the structure of the secondary analysis output files produced by a pipeline that is not yet compatible with the software. This option also requires a workflow schema file that describes the content and location of the secondary analysis output files. For an example of how to configure a custom pipeline for TSO 500 Analysis Module v2.2, refer to Custom Pipeline Configuration Example.

Select Custom Pipeline

In Configuration Settings, select the radio button next to Configure custom pipeline.
If necessary, create a workflow schema file by selecting Download the template file. For more information on setting up the template file, refer to Create a Workflow Schema File.
Select Choose File to upload your template file.
For Custom Pipeline Name, enter a name for the pipeline.
For Test Definition, select the applicable definition.
For the Choose a folder to monitor for case metadata (optional) field, enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.

Create a Workflow Schema File

To set up data upload for secondary analysis output data that is not yet compatible with Connected Insights, create a workflow schema file (.yaml format). This file specifies the files in the secondary analysis output data that Connected Insights analyzes. This file is only used when configuring a custom pipeline.

Download a workflow schema file template from Connected Insights as follows.

On the top toolbar, select Configuration.
Select the General tab.
Select Data Upload.
Select From Local Storage.
For Define and Monitor Data Uploads, select Add Path.
For Configuration Settings, select the radio button next to Configure custom pipeline.
Select Download a template file to download the workflow schema template file. If you do not want to create a pipeline, select Cancel. When prompted, select Yes, clear.
Edit the file as needed to reflect the files for upload. For each section of the workflow schema template file, refer to the Pipeline, Sample Sheet, Joint Files, and Sample Files topics that follow.
❗ If , Optional appears after a file name, Connected Insights uploads the file if it is available and otherwise moves on to the next available file.
After the workflow schema file is edited, create a pipeline. Then, select Configure manually under Configuration Settings.
Select Choose File and upload the edited workflow schema file.
Complete the remaining fields and save the pipeline.
❗ If this pipeline is used for manual uploading, make sure that the pipeline name consists only of letters, numbers, underscores, and dashes. The name cannot include spaces or other special characters. This name is used in the --pipeline-name= command listed in On-Demand Data Upload from User Storage (Connected Insights - Cloud Only).
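You can check the naming rule before starting a manual upload. The following Python sketch is a minimal illustration. It assumes the rule is exactly "one or more ASCII letters, digits, underscores, or dashes". The helper name is_valid_pipeline_name is hypothetical and is not part of Connected Insights.

```python
import re

# Hypothetical check for the pipeline naming rule described above.
# Assumption: only ASCII letters, digits, "_" and "-", at least one character.
_PIPELINE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if the name contains only letters, numbers, underscores, and dashes."""
    return _PIPELINE_NAME.fullmatch(name) is not None


for candidate in ["TSO500_v2-2", "TSO 500", "tso500!"]:
    print(candidate, is_valid_pipeline_name(candidate))
```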

Pipeline

You can delete part or all of this section of the file if the upload does not involve some or all of the following settings:

Required

successMarkerFile and failureMarkerFile: Specify a success marker file or a failure marker file. When the success marker file is present in the specified location, upload begins. When the failure marker file is present, upload stops.

Optional

sampleType — If the analysis output contains only DNA or only RNA samples, you can use sampleType to override the sample type of the samples. If sampleType is not specified, the system determines the sample type from the analysis output.
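The marker-file behavior can be sketched as a simple check on the analysis folder. This is a hypothetical illustration, not how Connected Insights or Data Uploader is implemented. The function name upload_state and its three return values are invented for the example. The rule that a failure marker takes precedence when both markers exist is an assumption, because the section above does not cover that case.

```python
import tempfile
from pathlib import Path
from typing import Optional


def upload_state(analysis_dir: Path, success_marker: str,
                 failure_marker: Optional[str] = None) -> str:
    """Hypothetical check of an analysis folder's marker files.

    Returns "failed" if the failure marker exists (assumed to take precedence),
    "ready" if the success marker exists, and "waiting" otherwise.
    """
    if failure_marker and (analysis_dir / failure_marker).exists():
        return "failed"
    if (analysis_dir / success_marker).exists():
        return "ready"
    return "waiting"


# Usage with the marker paths from the TSO 500 example schema.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    success = "Logs_Intermediates/MetricsOutput/MetricsOutput.tsv"
    print(upload_state(root, success, "./failure.txt"))  # waiting
    (root / success).parent.mkdir(parents=True)
    (root / success).touch()
    print(upload_state(root, success, "./failure.txt"))  # ready
```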

Sample Sheet

This section specifies the sample sheet file path found in the analysis folder, the data header row marker, and column aliases. The following information is used to create cases in Connected Insights:

Required

filePath — The path to the sample sheet in the analysis folder. The sample sheet is used to create the cases.

Optional

columnAliases — Specify the column aliases. These aliases must match the sample information column headers. Some aliases are required and others are optional.
- sampleId — Appears in the Case ID column of the Cases page.
- caseId — Appears in the Case ID column of the Cases page. For DNA-RNA paired samples, the DNA and RNA sample rows must have the same value in the column whose header is aliased to caseId. For example, if caseId is aliased to the column header Pair_ID, the DNA and RNA rows of a pair must contain the same value in the Pair_ID column of the sample sheet.
- Sample_Type — No alias can be made for Sample_Type. The sample sheet must include a column header titled Sample_Type with all sample rows containing DNA or RNA in this column.
- sex — Aliased to the header title of the column containing the sex of each sample.
- Disease aliases — Determine the list of Key Genes used for this sample. For more information, refer to Overview Tab. If the disease name or ID is not provided, the Status column on the Cases page displays Missing Required Data until the disease name or ID is added. You can add a disease by uploading disease information as custom case data. You can also open the case in Connected Insights and enter the disease for that individual case.
- id — Can optionally be aliased to the header of the column containing the SNOMEDCT disease ID of each sample. If no value is specified for id, the column name defaults to Tumor_Type. To find a SNOMEDCT ID, open an existing case and search for the disease in the Case Details or assertion form. You can also use the International Edition browser at the SNOMED International SNOMED CT Browser.
- name — Can optionally be aliased to the header of the column containing the SNOMEDCT disease name of each sample. If a disease ID is specified, a name is required. To use a disease ID without specifying a name, enter null, or the name of a column that does not exist, for the name field.
dataHeaderRowMarker - Specify the sample sheet data header row marker. The default value is [Data]. The row directly below the marker contains the sample information column headers. Each row below the headers contains the sample information values for one sample. The marker must be the text of the first (leftmost) cell in the row directly above the column header row. That row is two rows above the first row of sample information.

sampleSheet:
  filePath: "SampleSheet.csv"
  columnAliases:
    sampleId: Sample_ID
    sampleType: Manifest
    caseId: Pair_ID
    disease:
      id: Tumor_Type
      name: Sample_Description
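The following Python sketch shows how a data header row marker and column aliases could be applied to a sample sheet. It is a minimal illustration, not the Connected Insights parser. The names read_data_section and apply_aliases are hypothetical. The rule that the data section ends at a blank row or at the next "[...]" section is an assumption. The flat key diseaseId stands in for the nested disease/id alias.

```python
import csv
import io


def read_data_section(text: str, marker: str = "[Data]") -> list[dict[str, str]]:
    """Return one dict per sample row found below the data header row marker.

    The row right after the marker holds the column headers; each later row
    holds one sample. Assumption: the section ends at the first blank row or
    at the next row whose first cell starts with "[" (another section).
    """
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    for i, row in enumerate(rows):
        if row and row[0].strip() == marker:
            header = [h.strip() for h in rows[i + 1]]
            samples = []
            for values in rows[i + 2:]:
                if not any(v.strip() for v in values) or values[0].startswith("["):
                    break
                samples.append(dict(zip(header, (v.strip() for v in values))))
            return samples
    raise ValueError(f"data header row marker {marker!r} not found")


def apply_aliases(sample: dict[str, str], aliases: dict[str, str]) -> dict[str, str]:
    """Look up each schema field (e.g. sampleId) through its aliased column header."""
    return {field: sample.get(column, "") for field, column in aliases.items()}


SHEET = """[Header]
Experiment Name,Example
[Data]
Sample_ID, Sample_Type, Pair_ID, Tumor_Type
Lung_DNA_001, DNA, Lung_001, 254637007
Lung_RNA_001, RNA, Lung_001, 254637007
"""

# "diseaseId" is a hypothetical flat stand-in for the nested disease/id alias.
ALIASES = {"sampleId": "Sample_ID", "caseId": "Pair_ID", "diseaseId": "Tumor_Type"}

for sample in read_data_section(SHEET):
    print(apply_aliases(sample, ALIASES))
```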


Joint Files

Specifies the file paths for biomarkers and metrics to be included for interpretation. File paths can include symbolic references that are resolved from the Sample ID or Pair ID:

{pairId}
{sampleId.DNA}
{sampleId.RNA}
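The substitution of these symbolic references can be sketched in Python. This is a hypothetical helper, not the Connected Insights resolver. Returning None when a path refers to a sample that the case does not have (for example, an RNA path for a DNA-only case) is an assumption made for the illustration.

```python
import re
from typing import Optional

# The three symbolic references listed above.
_TOKEN = re.compile(r"\{(pairId|sampleId\.DNA|sampleId\.RNA)\}")


def resolve_path(template: str, pair_id: str,
                 dna_sample_id: Optional[str] = None,
                 rna_sample_id: Optional[str] = None) -> Optional[str]:
    """Substitute symbolic references in a schema path for one case.

    Returns None when the path needs a sample the case does not have
    (assumption, for illustration only).
    """
    values = {"pairId": pair_id, "sampleId.DNA": dna_sample_id,
              "sampleId.RNA": rna_sample_id}
    if any(values[token] is None for token in _TOKEN.findall(template)):
        return None
    return _TOKEN.sub(lambda m: values[m.group(1)], template)


msi = "Logs_Intermediates/Msi/{sampleId.DNA}/{sampleId.DNA}.msi.json"
splice = "Logs_Intermediates/RnaSpliceVariantCalling/{sampleId.RNA}/{sampleId.RNA}_SpliceVariants.vcf"
print(resolve_path(msi, "Lung_001", "Lung_DNA_001", "Lung_RNA_001"))
print(resolve_path(splice, "Breast_002", "Breast_DNA_002"))  # None: no RNA sample
```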

When using the workflow schema file template downloaded from the Configuration page, you can delete the lines for files that are not uploaded. You can remove the , Optional designation unless the file is optional for the pipeline.

File | Compatibility
gisFile | JSON file containing genomic instability score data.
msiFile | JSON file containing microsatellite instability data.
tmbFile | JSON or CSV file containing tumor mutational burden data.
purityPloidyFiles | TSV or VCF file containing purity and ploidy estimates.
snvFiles | VCF files containing small variant calls.
cnvFiles | VCF files containing copy number variant calls.
svFiles | VCF files containing structural variant calls. The structural variant caller can also call longer small variant insertion, deletion, and delins events, and these calls can duplicate calls from the small variant caller.
rnaSpliceFiles | VCF files containing RNA splice variant calls.
rnaFusionFiles | VCF files containing RNA fusion variant calls.
metricsQCFile | TSV file containing QC metrics data.
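The file formats in the table above can also serve as an advisory check on a draft schema. The following Python sketch is hypothetical. It expects a bare path with any , Optional suffix already removed, and it treats a .gz-compressed VCF as VCF, which is an assumption. The TSO 500 example later in this documentation lists a CSV file under rnaFusionFiles, which this table-based check would flag. Treat a mismatch as a prompt to double-check, not as an error.

```python
from pathlib import PurePosixPath

# File formats listed in the Joint Files table.
EXPECTED_FORMATS = {
    "gisFile": {".json"},
    "msiFile": {".json"},
    "tmbFile": {".json", ".csv"},
    "purityPloidyFiles": {".tsv", ".vcf"},
    "snvFiles": {".vcf"},
    "cnvFiles": {".vcf"},
    "svFiles": {".vcf"},
    "rnaSpliceFiles": {".vcf"},
    "rnaFusionFiles": {".vcf"},
    "metricsQCFile": {".tsv"},
}


def format_matches(key: str, path: str) -> bool:
    """Advisory check that a path's extension matches the table for its key."""
    suffixes = PurePosixPath(path).suffixes
    if suffixes and suffixes[-1] == ".gz":  # assumption: *.vcf.gz counts as VCF
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1].lower() in EXPECTED_FORMATS[key]


print(format_matches("msiFile", "Logs_Intermediates/Msi/S1/S1.msi.json"))
print(format_matches("rnaFusionFiles", "Logs_Intermediates/RnaFusionMerge/S1/S1_AllFusions.csv"))
```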

Sample Files

The following table lists the sample files used for IGV visualizations. Alignment files (.bam and .bam.bai) are listed under alignmentFiles, and coverage and B-allele files are listed under visualizationFiles. For more information, refer to IGV Visualizations. Under alignmentFiles, you can remove the , Optional designation unless the file is optional for the pipeline.

File | Compatibility
dnaBamFile | BAM file for the DNA alignment (under alignmentFiles).
dnaBaiFile | BAI file for the DNA alignment (under alignmentFiles).
rnaBamFile | BAM file for the RNA alignment (under alignmentFiles).
rnaBaiFile | BAI file for the RNA alignment (under alignmentFiles).
coverageFiles | TSV file containing coverage data (under visualizationFiles).
balleleFiles | BEDGraph file containing B-allele data (under visualizationFiles).
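BAM files are normally loaded into IGV together with their .bai index, so it can help to check that each BAM entry in alignmentFiles has a matching BAI entry. The following Python sketch is a hypothetical lint, not a Connected Insights feature. It assumes that the index is named <bam path>.bai, as in the TSO 500 example schema. It also expects bare paths with any , Optional suffix already removed.

```python
def check_alignment_pairs(alignment_files: dict[str, str]) -> list[str]:
    """Return problems found in an alignmentFiles mapping (hypothetical lint).

    Assumes each BAI index is named <bam path>.bai, as in the TSO 500 example.
    """
    problems = []
    for prefix in ("dna", "rna"):
        bam = alignment_files.get(f"{prefix}BamFile")
        bai = alignment_files.get(f"{prefix}BaiFile")
        if bam and not bai:
            problems.append(f"{prefix}BamFile has no {prefix}BaiFile")
        elif bai and not bam:
            problems.append(f"{prefix}BaiFile has no {prefix}BamFile")
        elif bam and bai and bai != bam + ".bai":
            problems.append(f"{prefix}BaiFile is not {bam}.bai")
    return problems


ALIGNMENT = {
    "dnaBamFile": "Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam",
    "dnaBaiFile": "Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam.bai",
}
print(check_alignment_pairs(ALIGNMENT))
```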

Custom Pipeline Configuration Example

The following example shows the custom pipeline configuration process using Local Run Manager TruSight Oncology 500 Analysis Module v2.2. For details on this process, refer to Custom Pipeline Configuration.

Uploaded Data

Uploaded data is organized as cases that provide details about the sample. A case is a secondary analysis result that has been imported and annotated. Case data includes VCF files for genetic variants (or CSV files for TruSight Oncology 500 RNA fusion variants). The Cases page lists all cases for your account or workgroup. The following files can be uploaded but are not required:

BAM files
JSON, TSV, and CSV files for TMB, MSI, and GIS biomarkers or for QC metrics

Example Sample Sheet

Make sure that the sample sheet is included in the secondary analysis results folder. The following example shows the structure of the [Data] section of the sample sheet:

[Data]
Sample_ID, Sample_Type, Pair_ID, Tumor_Type
DNA_Control, DNA, Control-Case, 255052006
RNA_Control, RNA, Control-Case, 255052006
Lung_DNA_001, DNA, Lung_001, 254637007
Lung_RNA_001, RNA, Lung_001, 254637007
Breast_DNA_002, DNA, Breast_002, 254837009

Using this example, Connected Insights creates the following cases:

Case ID | Workflow Type | Disease | Sample ID | Sample Type
Control-Case | DNA and RNA | Malignant tumor of unknown origin (SNOMEDCT ID 255052006) | DNA_Control, RNA_Control | DNA, RNA
Lung_001 | DNA and RNA | Non-small cell lung cancer (SNOMEDCT ID 254637007) | Lung_DNA_001, Lung_RNA_001 | DNA, RNA
Breast_002 | DNA | Malignant tumor of breast (SNOMEDCT ID 254837009) | Breast_DNA_002 | DNA
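The mapping from the example sample sheet rows to the cases above can be sketched in Python. This is a minimal illustration that assumes cases are grouped only by the value in the column aliased to caseId (Pair_ID here). It does not reproduce the actual Connected Insights logic. The names ROWS, group_cases, and workflow_type are hypothetical.

```python
from collections import defaultdict

# Rows of the example [Data] section: (Sample_ID, Sample_Type, Pair_ID, Tumor_Type)
ROWS = [
    ("DNA_Control", "DNA", "Control-Case", "255052006"),
    ("RNA_Control", "RNA", "Control-Case", "255052006"),
    ("Lung_DNA_001", "DNA", "Lung_001", "254637007"),
    ("Lung_RNA_001", "RNA", "Lung_001", "254637007"),
    ("Breast_DNA_002", "DNA", "Breast_002", "254837009"),
]


def group_cases(rows):
    """Group sample rows by Pair_ID, keeping sample IDs and types in sheet order."""
    cases = defaultdict(lambda: {"samples": [], "types": []})
    for sample_id, sample_type, pair_id, _tumor_type in rows:
        cases[pair_id]["samples"].append(sample_id)
        cases[pair_id]["types"].append(sample_type)
    return dict(cases)


def workflow_type(types):
    """Label a case "DNA", "RNA", or "DNA and RNA" from its sample types."""
    return " and ".join(t for t in ("DNA", "RNA") if t in types)


for case_id, case in group_cases(ROWS).items():
    print(case_id, "|", workflow_type(case["types"]), "|", ", ".join(case["samples"]))
```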

Example Analysis Results

Open the secondary analysis results folder and find the files that must be identified in the workflow schema file. The following example shows the secondary analysis results folder structure:

Example Workflow Schema File

For more information, refer to Create a Workflow Schema File in Custom Pipeline Configuration.

The following example shows the workflow schema file structure:

pipeline:
  successMarkerFile: Logs_Intermediates/MetricsOutput/MetricsOutput.tsv
  failureMarkerFile: "./failure.txt"
sampleSheet:
  filePath: SampleSheet.csv
jointFiles:
  msiFile: Logs_Intermediates/Msi/{sampleId.DNA}/{sampleId.DNA}.msi.json, Optional
  tmbFile: Logs_Intermediates/Tmb/{sampleId.DNA}/{sampleId.DNA}.tmb.json, Optional
  snvFiles: Logs_Intermediates/VariantMatching/{sampleId.DNA}/{sampleId.DNA}_MergedSmallVariants.genome.vcf, Optional
  cnvFiles:
    - Logs_Intermediates/CnvCaller/{sampleId.DNA}/{sampleId.DNA}_CopyNumberVariants.vcf, Optional
  rnaSpliceFiles: Logs_Intermediates/RnaSpliceVariantCalling/{sampleId.RNA}/{sampleId.RNA}_SpliceVariants.vcf, Optional
  rnaFusionFiles:
    - Logs_Intermediates/FusionCalling/{sampleId.RNA}/{sampleId.RNA}.vcf.gz, Optional
    - Logs_Intermediates/RnaFusionMerge/{sampleId.RNA}/{sampleId.RNA}_AllFusions.csv, Optional
  metricsQCFile: Logs_Intermediates/MetricsOutput/MetricsOutput.tsv, Optional
sampleFiles:
  tumorPairId:
    alignmentFiles:
      dnaBamFile: Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam, Optional
      dnaBaiFile: Logs_Intermediates/DnaAlignment/{sampleId.DNA}/{sampleId.DNA}.bam.bai, Optional
      rnaBamFile: Logs_Intermediates/RnaAlignment/{sampleId.RNA}/{sampleId.RNA}.bam, Optional
      rnaBaiFile: Logs_Intermediates/RnaAlignment/{sampleId.RNA}/{sampleId.RNA}.bam.bai, Optional

❗ If , Optional appears after a file name, Connected Insights uploads the file if it is available and otherwise moves on to the next available file.
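The following Python sketch shows one way to interpret the , Optional suffix. It assumes that the suffix appears literally as ", Optional" at the end of a value. It also assumes that a missing required file should stop the upload; the note above does not say what happens in that case. The names parse_entry and pick_files are hypothetical.

```python
def parse_entry(value: str) -> tuple[str, bool]:
    """Split a schema value such as "path, Optional" into (path, is_optional)."""
    path, sep, flag = value.rpartition(",")
    if sep and flag.strip() == "Optional":
        return path.strip(), True
    return value.strip(), False


def pick_files(entries, exists):
    """Return the paths to upload; raise if a required file is missing.

    entries: one schema value or a list of values (as for rnaFusionFiles).
    exists: callable reporting whether a resolved path is present.
    """
    if isinstance(entries, str):
        entries = [entries]
    selected = []
    for entry in entries:
        path, optional = parse_entry(entry)
        if exists(path):
            selected.append(path)
        elif not optional:
            raise FileNotFoundError(path)  # assumption: required file missing
        # Optional and absent: skip it and move on to the next file.
    return selected


fusion = [
    "Logs_Intermediates/FusionCalling/S1/S1.vcf.gz, Optional",
    "Logs_Intermediates/RnaFusionMerge/S1/S1_AllFusions.csv, Optional",
]
available = {"Logs_Intermediates/RnaFusionMerge/S1/S1_AllFusions.csv"}
print(pick_files(fusion, available.__contains__))
```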
