Custom Pipeline Configuration
The custom pipeline option lets Connected Insights interpret the structure of the secondary analysis output files produced by a pipeline that is not yet compatible with the software. This option requires a workflow schema file that describes the content and location of the secondary analysis output files. For an example of how to configure a custom pipeline for TSO 500 Analysis Module v2.2, refer to Custom Pipeline Configuration Example.
Select Custom Pipeline
In Configuration Settings, select the radio button next to Configure custom pipeline.
If necessary, create a workflow schema file by selecting Download the template file. For more information on setting up the template file, refer to Create a Workflow Schema File.
Select Choose File to upload your template file.
For Custom Pipeline Name, enter a name for the pipeline.
For Test Definition, select the applicable definition.
For the Choose a folder to monitor for case metadata (optional) field, enter the path for the folder in the secondary analysis folder created by Data Uploader.
Select Save.
Create a Workflow Schema File
To set up data upload for secondary analysis output data that is not yet compatible with Connected Insights, create a workflow schema file (.yaml format). This file specifies the files in the secondary analysis output data that Connected Insights analyzes. This file is only used when configuring a custom pipeline.
Download a workflow schema file template from Connected Insights as follows.
On the top toolbar, select Configuration.
Select the General tab.
Select Data Upload.
Select From Local Storage.
For Define and Monitor Data Uploads, select Add Path.
For Configuration Settings, select the radio button next to Configure custom pipeline.
Select Download a template file to download the workflow schema template file. If you do not want to create a pipeline, select Cancel. When prompted, select Yes, clear.
Edit the file as needed to reflect the files for upload. The sections of the workflow schema template file are described in the following topics: Pipeline, Sample Sheet, Joint Files, and Sample Files.
❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available. Otherwise, it moves on to the next available file.
After the workflow schema file is edited, create a pipeline. Then, select Configure manually under Configuration Settings.
Select Choose File and upload the edited workflow schema file.
Complete the remaining fields and save the pipeline.
❗ If this pipeline is used for manual uploading, make sure that the pipeline name consists only of numbers, letters, underscores, and dashes. The name cannot include spaces or special characters. This name is used in the --pipeline-name= parameter listed in On-Demand Data Upload from User Storage (Connected Insights - Cloud Only).
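The naming rule in the note above can be checked before a name is entered. The following is a minimal sketch, not part of Connected Insights. The function name is hypothetical, and the sketch assumes that "letters" and "numbers" mean ASCII letters and digits.

```python
import re

# Assumption: only ASCII letters, digits, underscores, and dashes are allowed.
_PIPELINE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if the name uses only the characters the note allows."""
    return _PIPELINE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("TSO500_v2-2", "TSO 500", "tso500.v2"):
        print(candidate, is_valid_pipeline_name(candidate))
```

A name such as TSO500_v2-2 passes this check, while names containing a space or a period do not.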
Pipeline
Parts of this section, or the entire section, can be deleted if the upload does not involve some or all of the following settings:
Required
successMarkerFile and failureMarkerFile: Specify a success marker file or failure marker file. When this file is present in the specified location, upload begins or stops, respectively.
Optional
sampleType: If the analysis output contains only DNA or only RNA samples, you can override the sample type with sampleType. If sampleType is not specified, the system determines it from the analysis output.
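The marker file behavior described above can be pictured with a short sketch. This is an illustration only, not the Connected Insights implementation. The function name and the "wait" state are hypothetical, and checking the failure marker first is an assumption because the source does not say which marker takes precedence.

```python
from pathlib import Path


def upload_decision(analysis_dir, success_marker, failure_marker):
    """Return 'start', 'stop', or 'wait' depending on which marker file exists.

    Both marker arguments are paths relative to the analysis folder.
    """
    root = Path(analysis_dir)
    # Assumption: a failure marker wins if both markers are present.
    if (root / failure_marker).is_file():
        return "stop"
    if (root / success_marker).is_file():
        return "start"
    # Neither marker exists yet, so the analysis is presumably still running.
    return "wait"
```

In this sketch, a monitoring loop would call upload_decision repeatedly until it returns something other than "wait".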
Sample Sheet
This section specifies the sample sheet file path found in the analysis folder, the data header row marker, and column aliases. The following information is used to create cases in Connected Insights:
Required
filePath: Specify the file path to the sample sheet used to create cases.
Optional
columnAliases: Specify the column aliases. These aliases must match the sample information column headers. Some aliases are required and others are optional.
sampleId: Appears in the Sample ID column of the Cases page.
caseId: Appears in the Case ID column of the Cases page. For DNA-RNA paired samples, the DNA and RNA sample rows must have the same value in the column whose header is aliased to caseId. For example, if caseId is aliased to the column header Pair_ID, the DNA and RNA sample rows of a paired sample must contain the same value in the Pair_ID column of the sample sheet.
Sample_Type: No alias can be made for Sample_Type. The sample sheet must include a column header titled Sample_Type, and every sample row must contain DNA or RNA in this column.
sex: Aliased to the header title of the column containing the sex of each sample.
Disease aliases: Determine the list of Key Genes used for this sample. For more information, refer to Overview Tab. If the disease name or ID is not provided, the Status column on the Cases page displays Missing Required Data until the disease name or ID is added. You can add a disease by uploading disease information as custom case data. You can also open the case in Connected Insights and enter the disease for an individual case.
id: Can be optionally aliased to the header title of the column containing the sample disease ID number according to SNOMEDCT. The SNOMEDCT ID can be found by navigating to an existing case and searching for the disease in the Case Details or assertion form. The ID can also be found by using the International Edition browser at the SNOMED International SNOMED CT Browser.
name: Can be optionally aliased to the header title of the column containing the sample disease name according to SNOMEDCT. If a disease ID is specified, a name is required. If you do not want to specify a name while using a disease ID, enter null or the name of any nonexistent column for the name field.
dataHeaderRowMarker: Specify the sample sheet data header row marker. The default value is [Data]. The marker is the text in the first (leftmost) cell of the row directly above the column headers. The row below the marker contains the sample information headers, and the rows after that contain the sample information values for each sample.
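The role of the data header row marker can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not the parser that Connected Insights uses. The function name, the sample sheet content, and the Sample_ID column header are hypothetical (Pair_ID and Sample_Type come from the text above), and the sketch assumes that the data section is the last section of a comma-separated sample sheet.

```python
import csv
import io

# Hypothetical sample sheet used only for illustration.
SHEET = """[Header]
Experiment,Demo
[Data]
Sample_ID,Pair_ID,Sample_Type
Lung_DNA_001,Lung_001,DNA
Lung_RNA_001,Lung_001,RNA
"""


def read_sample_rows(sheet_text, marker="[Data]"):
    """Return one dict per sample row found below the data header row marker."""
    rows = list(csv.reader(io.StringIO(sheet_text)))
    for i, row in enumerate(rows):
        # The marker is the text of the first (leftmost) cell of its row.
        if row and row[0].strip() == marker:
            break
    else:
        raise ValueError(f"marker {marker!r} not found in sample sheet")
    if i + 1 >= len(rows):
        raise ValueError("no header row found below the marker")
    # One row below the marker: the sample information headers.
    headers = [h.strip() for h in rows[i + 1]]
    samples = []
    # Two rows below the marker and beyond: one row of values per sample.
    for row in rows[i + 2:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        samples.append(dict(zip(headers, (c.strip() for c in row))))
    return samples


if __name__ == "__main__":
    for sample in read_sample_rows(SHEET):
        print(sample)
```

The headers returned here are the strings that the column aliases must match.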
Joint Files
Specifies the file paths for biomarkers and metrics to be included for interpretation. File names can include symbolic references to the files that depend on the Sample ID or Pair ID:
{pairId}
{sampleId.DNA}
{sampleId.RNA}
When using the workflow schema file template downloaded from the Configuration page, lines for files that are not uploaded can be deleted. The ", Optional" designation can be removed unless the file is an optional file for the pipeline.
| File | Compatibility |
| --- | --- |
| gisFile | JSON containing genomic instability score data. |
| msiFile | JSON containing microsatellite instability data. |
| tmbFile | JSON or CSV file containing tumor mutational burden data. |
| purityPloidyFiles | TSV or VCF file containing purity and ploidy estimates. |
| snvFiles | VCF files containing small variant calls. |
| cnvFiles | VCF files containing copy number variant calls. |
| svFiles | VCF files containing structural variant calls. The structural variant caller can also call longer small variant insertion/deletion/delins events and can duplicate calls from the small variant caller. |
| rnaSpliceFiles | VCF files containing RNA splice variant calls. |
| rnaFusionFiles | VCF files containing RNA fusion variant calls. |
| metricsQCFile | TSV file containing QC metrics data. |
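The symbolic references {pairId}, {sampleId.DNA}, and {sampleId.RNA} listed in this section can be thought of as placeholders that are filled in per case. The following sketch shows one way such a substitution could work. It is an illustration only. The function name and the example path are hypothetical, and the way Connected Insights resolves these references internally is not described in the source.

```python
def resolve_path(template, pair_id=None, dna_sample_id=None, rna_sample_id=None):
    """Replace the symbolic references in a file path template."""
    values = {
        "{pairId}": pair_id,
        "{sampleId.DNA}": dna_sample_id,
        "{sampleId.RNA}": rna_sample_id,
    }
    for token, value in values.items():
        if token in template:
            if value is None:
                raise ValueError(f"no value available for {token}")
            template = template.replace(token, value)
    return template


if __name__ == "__main__":
    # Hypothetical path layout, used only to show the substitution.
    print(resolve_path(
        "Results/{pairId}/{sampleId.DNA}.tmb.json",
        pair_id="Lung_001",
        dna_sample_id="Lung_DNA_001",
    ))
```

Plain string replacement is used instead of str.format because str.format would read {sampleId.DNA} as an attribute lookup on an argument named sampleId.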
Sample Files
The following table shows the sample visualization files used for IGV. File formats include .bam and .bam.bai. For more information, refer to IGV Visualizations. Under alignmentFiles, the ", Optional" designation can be removed unless the file is an optional file for the pipeline.
| File | Compatibility |
| --- | --- |
| dnaBamFile | BAM file for the DNA alignment (under alignmentFiles). |
| dnaBaiFile | BAI file for the DNA alignment (under alignmentFiles). |
| rnaBamFile | BAM file for the RNA alignment (under alignmentFiles). |
| rnaBaiFile | BAI file for the RNA alignment (under alignmentFiles). |
| coverageFiles | TSV file containing coverage data (under visualizationFiles). |
| balleleFiles | BEDGraph containing b-allele data (under visualizationFiles). |
Custom Pipeline Configuration Example
The following example shows the custom pipeline configuration process using Local Run Manager TruSight Oncology 500 Analysis Module v2.2. For details on this process, refer to Custom Pipeline Configuration.
Uploaded Data
Uploaded data is organized as cases that provide details about the sample. A case is a secondary analysis result that has been imported and annotated. The imported files include VCF files for genetic variants (or CSV files for TruSight Oncology 500 RNA fusion variants). The Cases page lists all cases for your account or workgroup. The following files can be uploaded, but are not required:
BAM files
JSON, TSV, and CSV files for TMB, MSI, and GIS biomarkers or for QC metrics
Example Sample Sheet
Make sure that the sample sheet is included in the secondary analysis results folder. The following example shows the structure of the [Data] section of the sample sheet:
Using this example, Connected Insights creates the following cases:
| Case ID | Workflow Type | Disease | Sample ID | Sample Type |
| --- | --- | --- | --- | --- |
| Control-Case | DNA and RNA | Malignant tumor of unknown origin (SNOMEDCT ID 255052006) | DNA_Control, RNA_Control | DNA, RNA |
| Lung_001 | DNA and RNA | Non-small cell lung cancer (SNOMEDCT ID 254637007) | Lung_DNA_001, Lung_RNA_001 | DNA, RNA |
| Breast_002 | DNA | Malignant tumor of breast (SNOMEDCT ID 254837009) | Breast_DNA_002 | DNA |
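The grouping of sample rows into the cases listed above can be reproduced with a short sketch. This only illustrates the pairing rule and is not how Connected Insights is implemented. The rows are reconstructed from the case table, the column names Sample_ID and Pair_ID are assumptions (Sample_Type is the required header named in Custom Pipeline Configuration), and the function name is hypothetical.

```python
from collections import defaultdict

# Rows reconstructed from the case table; Sample_ID and Pair_ID are assumed
# column headers that would be aliased to sampleId and caseId.
ROWS = [
    {"Sample_ID": "DNA_Control", "Pair_ID": "Control-Case", "Sample_Type": "DNA"},
    {"Sample_ID": "RNA_Control", "Pair_ID": "Control-Case", "Sample_Type": "RNA"},
    {"Sample_ID": "Lung_DNA_001", "Pair_ID": "Lung_001", "Sample_Type": "DNA"},
    {"Sample_ID": "Lung_RNA_001", "Pair_ID": "Lung_001", "Sample_Type": "RNA"},
    {"Sample_ID": "Breast_DNA_002", "Pair_ID": "Breast_002", "Sample_Type": "DNA"},
]


def group_cases(rows, case_col="Pair_ID", sample_col="Sample_ID"):
    """Group sample rows that share a case ID and derive the workflow type."""
    grouped = defaultdict(list)
    for row in rows:
        if row["Sample_Type"] not in ("DNA", "RNA"):
            raise ValueError(f"Sample_Type must be DNA or RNA: {row}")
        grouped[row[case_col]].append(row)
    cases = {}
    for case_id, members in grouped.items():
        types = sorted({m["Sample_Type"] for m in members})
        cases[case_id] = {
            "workflow_type": " and ".join(types),
            "sample_ids": [m[sample_col] for m in members],
        }
    return cases


if __name__ == "__main__":
    for case_id, info in group_cases(ROWS).items():
        print(case_id, info)
```

Rows that share a Pair_ID value collapse into one DNA and RNA case, and a row with a unique value becomes a single-sample case.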
Example Analysis Results
Open the secondary analysis results folder and find the files that must be identified in the workflow schema file. The following example shows the secondary analysis results folder structure:
Example Workflow Schema File
For more information, refer to Create a Workflow Schema File in Custom Pipeline Configuration.
The following example shows the workflow schema file structure:
❗ If ", Optional" follows the file name, Connected Insights uploads the file if it is available. Otherwise, it moves on to the next available file.
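The effect of the ", Optional" designation can be sketched as follows. This is a hedged illustration, not the Connected Insights implementation. The function names and the file names in the usage note are hypothetical, the sketch assumes the designation is written exactly as ", Optional" at the end of the value, and raising an error for a missing required file is an assumption because the source does not describe that case.

```python
from pathlib import Path

OPTIONAL_SUFFIX = ", Optional"


def parse_entry(entry):
    """Split a schema value into (path, is_optional)."""
    entry = entry.strip()
    if entry.endswith(OPTIONAL_SUFFIX):
        return entry[: -len(OPTIONAL_SUFFIX)].strip(), True
    return entry, False


def select_files(entries, root):
    """Return the paths to upload, skipping optional files that are missing."""
    selected = []
    for entry in entries:
        path, optional = parse_entry(entry)
        if (Path(root) / path).is_file():
            selected.append(path)
        elif not optional:
            # Assumed behavior: a missing required file is treated as an error.
            raise FileNotFoundError(path)
        # A missing optional file is skipped and the next entry is checked.
    return selected
```

For example, with the entries "calls.vcf" and "tmb.json, Optional" and a folder that contains only calls.vcf, select_files returns just calls.vcf.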