Analysis Output

When the analysis run completes, the DRAGEN TruSight Oncology 500 Analysis Software generates an analysis output folder in a specified location.

To view analysis output, navigate to the analysis output folder and select the files that you want to view.

Single Node Analysis Output Folder Structure

The single-node output folder structure is as follows.

Logs_Intermediates
- AdditionalSarjMetrics—Contains per pair ID calculations to support the PCT_TARGET_250X metric.
- Annotation—Contains outputs for small variant annotation.
  - Subfolders per sample ID—Contains the aligned small variants JSON.
- CombinedVariantOutput
  - Subfolders per pair ID—Contains the combined variant output TSV files.
  - A combined output log file.
- Contamination
  - Subfolders per DNA sample ID—Contains the contamination metrics JSON file and output logs.
- DnaDragenCaller
  - Subfolders per sample ID—Contains the aligned BAM and index files, small variant VCF and gVCF, copy number variant VCF, MSI JSON, exon coverage report bed, and QC outputs in CSV format.
- DnaDragenExonCNVCaller
  - Subfolders per DNA sample ID—Contains the exon-level CNV JSON, the supporting calculation, and the QC files.
- DnaFastqValidation—Contains the FASTQ validation output log for DNA samples.
- FastqDownsample
  - Subfolders per RNA sample ID—Contains FASTQ files and output logs.
  - FastqDownsample output
- FastqGeneration
- Gis—Contains GIS-related files for HRD samples.
  - Subfolders per HRD sample ID—Contains the GIS JSON, the supporting calculation, and the QC files.
  - Also contains the annotated CNV VCF and gene level TSV file with absolute copy number and minor copy number information
- LrAnnotation
  - Subfolders per DNA sample ID—Contains the annotated exon-level CNV JSON.
- LrCalculator
  - Subfolders per DNA sample ID—Contains the exon-level CNV VCF.
- MetricsOutput
  - Subfolders per pair ID—Contains the metrics output TSV files.
  - A combined output log file.
- ResourceVerification—Contains the resource file checksum verification logs.
- RnaAnnotation
  - Subfolders per RNA sample ID—Contains the annotated splice variant JSON.
- RnaDragenCaller
  - Subfolders per sample ID—Contains the aligned BAM, fusion candidates CSV, exon coverage report bed and QC outputs in CSV format.
- RnaFastqValidation—Contains the FASTQ validation output log for RNA samples.
- RnaFusion
  - Subfolders per RNA sample ID—Contains the All Fusions CSV and Fusion Processor logs.
- RnaQcMetrics
  - Subfolders per RNA sample ID—Contains the RNA QC metrics JSON.
- RnaSpliceVariantCalling
  - Subfolders per RNA sample ID—Contains the splice variants VCF.
- Run QC—Contains the Run QC metrics JSON, Intermediate Run QC metrics JSON, and log file.
- SampleAnalysisResults
  - Subfolders per pair ID—Contains the Sample Analysis Results JSON and detailed log file.
- SampleSheetValidation—Contains the intermediate sample sheet and validation log.
- Tmb
  - Subfolders per DNA sample ID—Contains the TMB metrics CSV, TMB trace TSV, and related files and logs.
- passing_sample_steps.json—Contains the steps passed for each sample ID.
- pipeline_trace.txt—Contains a summary and troubleshooting file that lists each Nextflow task executed and the status (for example, COMPLETED or FAILED).
- run.log—Contains a complete trace-level log file describing the Nextflow pipeline execution.
- run_report.html—Contains high-level run statistics (performance, usage, etc.).
- run_timeline.html—Contains timeline-related information about the analysis run.
Results
- Metrics Output TSV (all pair IDs)
- Pair ID—The following outputs are produced for each sample:
  - Combined Variant Output TSV
  - Metrics Output TSV
  - TMB Trace TSV
  - Small Variant Genome VCF
  - Small Variant Genome Annotated JSON
  - Copy Number Variant VCF
  - GIS JSON
  - MSI JSON
  - Large Rearrangements CNV VCF
  - Large Rearrangements CNV Annotated JSON
  - All Fusion CSV
  - Splice Variant VCF
  - Splice Variant Annotated JSON
  - Exon Coverage Report TSV
  - Gene Coverage Report TSV
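
The pipeline_trace.txt file described above lists each Nextflow task and its status. The following is a minimal sketch of how the statuses could be tallied. It assumes the file is tab-separated with a header row that includes `name` and `status` columns, as in a default Nextflow trace file; the actual column set in a given analysis version may differ, and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_trace(trace_text):
    """Count task statuses and collect names of tasks that did not complete.

    Assumes tab-separated text with 'name' and 'status' header columns.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    counts = Counter()
    not_completed = []
    for row in reader:
        status = (row.get("status") or "").strip()
        counts[status] += 1
        if status != "COMPLETED":
            not_completed.append((row.get("name") or "").strip())
    return counts, not_completed


# Made-up example rows; a real trace file has more columns.
example = (
    "task_id\tname\tstatus\n"
    "1\tDnaFastqValidation (S1)\tCOMPLETED\n"
    "2\tTmb (S1)\tFAILED\n"
)
counts, failed = summarize_trace(example)
print(dict(counts), failed)
```

To use it on a real run, read the file contents (for example with `open(path).read()`) and pass the text to `summarize_trace`.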

Multiple Node Analysis Output Folder Structure

The multiple-node output folder structure is as follows.

Demultiplex Output
- A Logs_Intermediates folder containing FASTQ files per sample.
Node(X) Output—The following outputs are produced for each node used:
- A Logs_Intermediates folder containing step-specific and component-specific outputs and logs for every step/component run in the analysis pipeline for the sample run on the node.
- A Results folder containing results only for the sample run on the node.
Gathered Output
- A Logs_Intermediates folder containing step-specific and component-specific outputs and logs for every step/component run in each analysis pipeline on every node. This folder contains outputs for all samples and pairs run across all nodes in the analysis.
- A Results folder containing results for all samples and pairs run across all nodes. Results are organized by Pair_ID, then Sample_ID. This folder also contains summary files with information on all samples.

ICA Output Folder Structure

This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed. The same output folder structure and content exist in ICA and BaseSpace Sequence Hub.

High-Level Folder Structure

Run ID
- TSO500_Nextflow_logs
  - _manifest.json
- Results
  - _tags.json
- Logs_intermediates
- Errors—This folder is only present when analysis fails

TSO500_Nextflow_logs Folder Structure

The TSO_500_Nextflow_Logs folder provides information related to the execution of the pipeline on ICA as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains files used to execute parts of the workflow on different nodes, as well as records of the Nextflow execution on those nodes.

TSO_500_Nextflow_Logs
- _manifest.json

Results Folder Structure

Contains the aggregated MetricsOutput.tsv file at the root level. Additionally, the Results folder contains a subfolder for each pair ID.

Results
- MetricsOutput.tsv
- Sample_1
- Sample_2
- Sample_<#>
- _tags.json

The Results subfolder contains the following files:

Results
- MetricsOutput.tsv
- <Pair_id>
  - CombinedVariantOutput.tsv
  - <SampleName>_MetricsOutput.tsv
- <DNA_Sample_id>
  - CopyNumberVariants.vcf
  - DNAMergedSmallVariants_Annotated.json.gz
  - MergedSmallVariants.genome.vcf
  - MergedSmallVariants.vcf
  - microstat_output.json
  - TMB_Trace.tsv
- <RNA_Sample_id>
  - AllFusions.csv
  - RNA_Annotated.json.gz
  - SpliceVariants.vcf
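
As a quick sanity check after download, the listing above can be turned into a small script that reports which expected files are absent from a sample folder. This is only a sketch: the file names are copied from the listing, the helper `missing_files` and the `"dna"`/`"rna"` keys are hypothetical, and which files are actually produced may depend on the assay and analysis version.

```python
import tempfile
from pathlib import Path

# File names copied from the Results listing; treat the exact set as an
# assumption for any particular analysis version.
EXPECTED_FILES = {
    "dna": [
        "CopyNumberVariants.vcf",
        "DNAMergedSmallVariants_Annotated.json.gz",
        "MergedSmallVariants.genome.vcf",
        "MergedSmallVariants.vcf",
        "microstat_output.json",
        "TMB_Trace.tsv",
    ],
    "rna": ["AllFusions.csv", "RNA_Annotated.json.gz", "SpliceVariants.vcf"],
}


def missing_files(sample_dir, sample_type):
    """Return the expected file names that are absent from sample_dir."""
    folder = Path(sample_dir)
    return [
        name
        for name in EXPECTED_FILES[sample_type]
        if not (folder / name).is_file()
    ]


# Demonstration on a throwaway folder holding two of the three RNA files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("AllFusions.csv", "SpliceVariants.vcf"):
        (Path(tmp) / name).touch()
    demo_missing = missing_files(tmp, "rna")
print(demo_missing)
```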

Logs_intermediates Folder Structure

Contains folders for each submodule in the DRAGEN TSO 500 on ICA pipeline. The folders contain a copy of all the relevant files required to create the metric output files and report files, as well as the combined log files at the root level and subfolders for each sample.

Logs_intermediates
- DnaDragenCaller
- AdditionalSarjMetrics
- CombinedVariantOutput
- FastqGeneration
- MetricsOutput
- DnaDragenExonCnvCaller
- DnaFastqValidation
- DNACoverageReport
- Gis
- Tmb
- SampleAnalysisResults
- SampleSheetValidation
- passing_sample_steps.json
- RnaFusion
- Contamination
- Annotation
- RnaAnnotation
- RnaDragenCaller
- RnaSpliceVariantCalling
- RunQc
- FastqDownsample
- PassingSampleSteps
- ResourceVerification
- LrCalculator
- LrAnnotation
- RnaQcMetrics
- RnaFastqValidation
- RNACoverageReport

Errors Folder Structure

Contains Errors.tsv. This file contains the summary of all the errors encountered during pipeline execution.

Errors
- Errors.tsv

NovaSeq 6000Dx Analysis Application Output Folder Structure

The following files and folders are created during analysis by NovaSeq 6000Dx Analysis Application:

analysisResults.json
CopyComplete.txt
edgeos.nextflow.config
inputs/
- sampleMapping.json
- SampleSheet.csv
- SampleSheet.json
Manifest.tsv
params.json
Results/
workflowLogs/
- nf-main-***.log

When the analysis run completes, the analysis application generates an analysis output in a specified location. To view analysis output, follow the steps below:

- On the Completed runs tab, select the run.
- Review the run details page, which provides the information needed to access the output folder:
  - External Location: the input for the run.
  - Analysis Output Folder: where the output is stored. To navigate to this folder, follow the "server location" and the GDS analysis output folder.
- Navigate to the directory that contains the analysis output folder.
- Open the folder, and then select the files that you want to view.

DNA Output

Refer to DNA Analysis Methods for more information.

Small Variant gVCF

File name: {SAMPLE_ID}_hard-filtered.gvcf.gz

The small variant genome variant call file contains information on all candidate small variants evaluated, including complex variants up to 15 bp from phased variant calling across the entire TSO 500 panel.

The variant status is determined by the FILTER column in the genome VCF as follows.

| Filter | Note |
| --- | --- |
| PASS | PASS variants. |
| base_quality | Site filtered because median base quality of alt reads at this locus does not meet threshold. |
| filtered_reads | Site filtered because the fraction of reads is too large. |
| fragment_length | Site filtered because absolute difference between the median fragment length of alt reads and median fragment length of ref reads at this locus exceeds threshold. |
| low_depth | Site filtered because the read depth is too low. |
| low_frac_info_reads | Site filtered because the fraction of informative reads is below threshold. |
| long_indel | Site filtered because the indel length is too long. |
| mapping_quality | Site filtered because median mapping quality of alt reads at this locus does not meet threshold. |
| multiallelic | Site filtered because more than two alt alleles pass tumor LOD. |
| no_reliable_supporting_read | Site filtered because no reliable supporting somatic read exists. |
| read_position | Site filtered because median of distances between start/end of read and this locus is below threshold. |
| str_contraction | Site filtered due to suspected PCR error where the alt allele is one repeat unit less than the reference. |
| too_few_supporting_reads | Site filtered because there are too few supporting reads in the tumor sample. |
| weak_evidence | Somatic variant score (SQ) does not meet threshold. |
| systematic_noise | Site filtered based on evidence of systematic noise in normal sample. |
| excluded_regions | Site overlaps with VC excluded regions bed. |
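
To see how often each filter was applied in a sample, the FILTER column can be tallied directly. The sketch below relies only on the standard VCF layout (tab-separated records, FILTER as the seventh column, multiple filters joined by semicolons); the function names are illustrative and the example records are made up.

```python
import gzip
from collections import Counter


def tally_filters(lines):
    """Count FILTER values across VCF data lines (header lines are skipped)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th VCF column; several filters are ';'-separated.
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts


def tally_filters_in_gvcf(path):
    """Tally filters in a gzip-compressed gVCF such as *_hard-filtered.gvcf.gz."""
    with gzip.open(path, "rt") as handle:
        return tally_filters(handle)


# Made-up records for illustration.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "chr1\t200\t.\tG\tC\t.\tlow_depth;weak_evidence\t.\n",
]
filter_counts = tally_filters(example)
print(dict(filter_counts))
```

Because a gVCF also contains reference (non-variant) records, the PASS count from such a tally is not the same as the number of passing variants.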

Small Variant Annotated JSON

File name: {SAMPLE_ID}_DNAVariants_Annotated.json.gz

The small variants annotated file provides variant annotation information for all nonreference positions from the genome VCF including pass and nonpass variants.

TMB Trace

The TMB trace file provides comprehensive information on how the TMB value is calculated for a given sample. All passing small variants from the small variant filtering step are included in this file. To calculate the numerator of the TmbPerMb value in the TMB JSON, set the TSV file filter to use the IncludedInTMBNumerator with a value of True.

The TMB trace file is not intended to be used for variant inspections. The filtering statuses are exclusively set for TMB calculation purposes. Setting a filter does not translate into the classification of a variant as somatic or germline.

| Column | Description |
| --- | --- |
| Chromosome | Chromosome of variant |
| Position | Position of variant |
| RefCall | Reference base |
| AltCall | Alternate base |
| VAF | Variant allele frequency |
| Depth | Coverage of position |
| CytoBand | Cytoband of variant |
| GeneName | Name of gene if applicable. A semicolon-delimited list is used for multiple genes. |
| VariantType | Type of the variant: SNV, insertion, deletion, MNV |
| CosmicIDs | COSMIC IDs; multiple IDs are concatenated with ";" |
| MaxCosmicCount | Maximum COSMIC study count |
| AlleleCountsGnomadExome | Variant allele count in gnomAD exome database |
| AlleleCountsGnomadGenome | Variant allele count in gnomAD genome database |
| AlleleCounts1000Genomes | Variant allele count in 1000 Genomes database |
| MaxDatabaseAlleleCounts | Maximum variant allele count over the three databases |
| GermlineFilterDatabase | TRUE if variant was filtered by the database filter |
| GermlineFilterProxi | TRUE if variant was filtered by the proxi filter |
| CodingVariant | TRUE if variant is in the coding region |
| Nonsynonymous | TRUE if variant has any transcript annotations with nonsynonymous consequences |
| IncludedInTMBNumerator | TRUE if variant is used in the TMB calculation |
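
The numerator filter described for the TMB trace file can also be applied programmatically. The following sketch counts rows whose numerator flag is true. The header spelling `IncludedInTMBNumerator` and the tab-separated layout are assumptions to verify against the actual file, and the example rows are invented.

```python
import csv
import io


def count_tmb_numerator(tsv_text, column="IncludedInTMBNumerator"):
    """Count rows flagged as contributing to the TMB numerator.

    The comparison is case-insensitive so that both 'True' and 'TRUE' match.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(
        1 for row in reader if (row.get(column) or "").strip().lower() == "true"
    )


# Invented rows showing only a few of the columns.
example = (
    "Chromosome\tPosition\tVAF\tIncludedInTMBNumerator\n"
    "chr1\t100\t0.12\tTrue\n"
    "chr2\t200\t0.48\tFalse\n"
    "chr7\t300\t0.07\tTRUE\n"
)
numerator = count_tmb_numerator(example)
print(numerator)
```

The result is only the variant count in the numerator; it is not the TmbPerMb value itself, which also depends on a denominator that this sketch does not compute.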

Copy Number VCF

The copy number VCF file contains CNV calls for DNA libraries of the amplification genes targeted by DRAGEN TruSight Oncology 500 Analysis Software. The CNV call indicates fold change results for each gene classified as reference, deletion, or amplification.

The value in the QUAL column of the VCF is a Phred transformation of the p-value, where Q = -10 × log10(p-value). The p-value is derived from the t-test between the fold change of the gene against the rest of the genome. Higher Q-scores indicate higher confidence in the CNV call.
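
As a small illustration of the Phred transformation, the sketch below converts between a p-value and a Q-score. The function names are illustrative only.

```python
import math


def p_value_to_qual(p_value):
    """Phred-transform a p-value: Q = -10 * log10(p)."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -10 * math.log10(p_value)


def qual_to_p_value(qual):
    """Invert the Phred transformation: p = 10 ** (-Q / 10)."""
    return 10 ** (-qual / 10)


qual = p_value_to_qual(0.001)  # a p-value of 0.001 corresponds to roughly Q30
print(round(qual, 2), qual_to_p_value(qual))
```

Each increase of 10 in the Q-score corresponds to a tenfold smaller p-value.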

In the VCF notation, <DUP> indicates the detected fold change (FC) is greater than a predefined amplification cutoff. <DEL> indicates the detected FC is less than a predefined deletion cutoff for that gene. This cutoff can vary from gene to gene.

In analysis versions prior to v2.5, <DEL> calls in the VCF are marked as LowValidation. The LowValidation filter indicates that the calls have been validated only with in silico data sets and are provided as information only.

Each copy number variant is reported as a fold change on normalized read depth in a testing sample relative to the normalized read depth in diploid genomes. Given tumor purity, you can infer the ploidy of a gene in the sample from the reported fold change.

Given tumor purity X%, for a reported fold change Y, you can calculate the copy number n using the following equation:

$$ n = \frac{200 \times Y - 2 \times (100 - X)}{X} $$

For example, a tumor purity at 30% and a MET with fold change of 2.2x indicates that 10 copies of MET DNA are observed.
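
The relationship between fold change, tumor purity, and copy number can be sketched in code. This sketch assumes the reported fold change is a purity-weighted mix of the tumor copy number n and diploid (2-copy) normal DNA, measured relative to a diploid baseline, that is, Y = (X/100 × n + (1 − X/100) × 2) / 2. Solving for n gives n = (200 × Y − 2 × (100 − X)) / X, which reproduces the MET example above. The function names are illustrative.

```python
def copy_number(fold_change, purity_pct):
    """Infer tumor copy number n from fold change Y and tumor purity X (%).

    Assumes a diploid normal background: n = (200*Y - 2*(100 - X)) / X.
    """
    if not 0 < purity_pct <= 100:
        raise ValueError("purity must be in the interval (0, 100]")
    return (200 * fold_change - 2 * (100 - purity_pct)) / purity_pct


def fold_change(n_copies, purity_pct):
    """Inverse relation: expected fold change for n copies at a given purity."""
    fraction = purity_pct / 100
    return (fraction * n_copies + (1 - fraction) * 2) / 2


n_met = copy_number(2.2, 30)  # the MET example: purity 30%, fold change 2.2
print(round(n_met, 3))
```

At 100% purity the relation reduces to n = 2 × Y, so a fold change of 1.0 corresponds to the diploid state.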

RNA Output

Refer to RNA Analysis Methods for more information.

Splice Variant VCF

The splice variant VCF contains all candidate splice variants targeted by the analysis panel identified by the RNA analysis pipeline. You can apply the following filters for each variant call:

| Filter Name | Description |
| --- | --- |
| LowQ | Splice variant score < passing quality score threshold value of 1. |
| PASS | Splice variant score ≥ passing quality score threshold value of 1. |
| LowUniqueAlignments | All splice junction supporting reads map to a unique genomic interval near at least one of the two splice sites. |

Refer to the headers in the output for more information about each column.

Splice Variant Annotated JSON

If available, each splice variant is annotated using the Illumina Annotation Engine. The following information is captured in the JSON:

HGNC Gene
Transcript
Exons
Introns
Canonical
Consequence

All Fusions CSV

The all fusions CSV file contains all candidate fusions identified by the DRAGEN RNA pipeline. Two output columns in the file describe the candidate fusions: Filter and KeepFusion.

The following table describes the semicolon-separated output found in the Filter column. Each output is either a confidence filter or information only, as indicated. If none of the confidence filters are triggered, the Filter column contains the output PASS; otherwise, it contains the output FAIL.

Filter Column Output

| Filter | Filter Type | Description |
| --- | --- | --- |
| DOUBLE_BROKEN_EXON | Confidence filter | If both breakpoints are distant from annotated exon boundaries, the number of supporting reads does not satisfy a high threshold requirement (≥ 10 supporting reads). |
| LOW_MAPQ | Confidence filter | All fusion supporting read alignments at either of the breakpoints have MAPQ < 20. |
| LOW_UNIQUE_ALIGNMENTS | Confidence filter | All fusion supporting read alignments map to a unique genomic interval at either of the breakpoints. |
| LOW_SCORE | Confidence filter | The fusion candidate has a low probabilistic score as determined by the features of the candidate. |
| MIN_SUPPORT | Confidence filter | The fusion candidate has very few fusion supporting reads (< 5 supporting read pairs). |
| READ_THROUGH | Confidence filter | The breakpoints are cis neighbors (< 200 kbp) on the reference genome. |
| ANCHOR_SUPPORT | Information only | Read alignments of fusion supporting reads are not long enough (12 bp) at either of the two breakpoints. |
| HOMOLOGOUS | Information only | The candidate is likely a false candidate generated because the two genes involved have high gene homology. |
| LOW_ALT_TO_REF | Information only | The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints. |
| LOW_GENE_COVERAGE | Information only | Each breakpoint in an enriched gene has fewer than 125 bp with nonzero read coverage. |
| NO_COMPLETE_SPLIT_READS | Confidence filter | For every fusion-supporting split read, the total number of aligned bases across two breakpoints is less than 60% of the read length. |
| UNENRICHED_GENE | Confidence filter | Neither of the two parent genes is in the enrichment panel. |

The KeepFusion column of the output has a value of TRUE when none of the confidence filters are triggered.
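
The relationship between the Filter column and KeepFusion can be sketched as follows. The filter names and their types come from this section; the exact layout of the Filter cell (for example, whether PASS or FAIL appears alongside the individual filter names) is an assumption, and the function name is illustrative.

```python
# Filter names classified as in the Filter Column Output section.
CONFIDENCE_FILTERS = {
    "DOUBLE_BROKEN_EXON",
    "LOW_MAPQ",
    "LOW_UNIQUE_ALIGNMENTS",
    "LOW_SCORE",
    "MIN_SUPPORT",
    "READ_THROUGH",
    "NO_COMPLETE_SPLIT_READS",
    "UNENRICHED_GENE",
}
INFORMATION_ONLY = {
    "ANCHOR_SUPPORT",
    "HOMOLOGOUS",
    "LOW_ALT_TO_REF",
    "LOW_GENE_COVERAGE",
}


def keep_fusion(filter_value):
    """Return True when no confidence filter appears in the Filter cell.

    Assumes the cell is a semicolon-separated list of filter names,
    possibly including PASS or FAIL.
    """
    names = {part.strip() for part in filter_value.split(";") if part.strip()}
    return not (names & CONFIDENCE_FILTERS)


print(keep_fusion("PASS;LOW_ALT_TO_REF"), keep_fusion("FAIL;LOW_MAPQ;HOMOLOGOUS"))
```

Information-only outputs never change the result in this sketch, which mirrors the statement that KeepFusion is TRUE when none of the confidence filters are triggered.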

Refer to the headers in the output for more information about each column.

Fusion Columns

| Fusion Object Field | Source |
| --- | --- |
| Gene A | The gene associated with the A side of the fusion. A semicolon-delimited list is used for multiple genes. |
| Gene B | The gene associated with the B side of the fusion. A semicolon-delimited list is used for multiple genes. |
| Gene A Breakpoint | [Information only] The chromosome and offset of the Gene A side of the fusion. |
| Gene A Location | Location of the breakpoint within Gene A: IntactExon (matches exon boundary), BrokenExon (inside an exon), Intronic (within an intron), or Intergenic (no gene overlap; currently excluded). If multiple genes are in Gene A, a semicolon-separated list of locations is used. This column is used internally to identify genes to report when a breakpoint occurs in a region overlapping multiple genes. Occasionally, additional values are listed for genes that were excluded from the GeneA list. |
| Gene A Sense | Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene A. If multiple genes are in Gene A, a semicolon-separated list of Booleans is used. |
| Gene A Strand | Strand of Gene A, + for forward, - for reverse. |
| Gene B Breakpoint | [Information only] The chromosome and offset of the Gene B side of the fusion. |
| Gene B Location | Location of the breakpoint within Gene B: IntactExon (matches exon boundary), BrokenExon (inside an exon), Intronic (within an intron), or Intergenic (no gene overlap; currently excluded). If multiple genes are in Gene B, a semicolon-separated list of locations is used. This column is used internally to identify genes to report when a breakpoint occurs in a region overlapping multiple genes. Occasionally, additional values are listed for genes that were excluded from the GeneB list. |
| Gene B Sense | Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene B. If multiple genes are in Gene B, a semicolon-separated list of Booleans is used. |
| Gene B Strand | Strand of Gene B, + for forward, - for reverse. |
| Score | The quality of fusion as determined by DRAGEN server. |
| Filter | The filter associated with the fusion as determined by the respective caller. Results from different callers are not equivalent. |
| Ref A Dedup | Gene A uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included. |
| Ref B Dedup | Gene B uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included. |
| Alt Split Dedup | Uniquely mapping reads split by the junction. Supports fusion. Duplicate reads are not included. |
| Alt Pair Dedup | Uniquely mapping reads paired across junction. Supports fusion. Duplicate reads are not included. |
| KeepFusion | The determination whether the fusion should be kept or dropped from the list of fusions. |
| Fusion Directionality Known | Whether fusion directionality is known and indicated by gene order. |

When using Microsoft Excel to view this report, gene names that are convertible to dates (such as MARCH1) are automatically converted to dd-mm format (1 Mar) by Excel.

The following are fusion allow list genes:

ABL1
AKT3
ALK
AR
AXL
BCL2
BRAF
BRCA1
BRCA2
CDK4
CSF1R
EGFR
EML4
ERBB2
ERG
ESR1
ETS1
ETV1
ETV4
ETV5
EWSR1
FGFR1
FGFR2
FGFR3
FGFR4
FLI1
FLT1
FLT3
JAK2
KDR
KIF5B
KIT
KMT2A
MET
MLLT3
MSH2
MYC
NOTCH1
NOTCH2
NOTCH3
NRG1
NTRK1
NTRK2
NTRK3
PAX3
PAX7
PDGFRA
PDGFRB
PIK3CA
PPARG
RAF1
RET
ROS1
RPS6KB1
TMPRSS2

Combined Variant Output

File name: {Pair_ID}_CombinedVariantOutput.tsv

The combined variant output file contains the variants and biomarkers in a single file that is based on a single sample. If using pair ID, the file is based on paired DNA and RNA samples from the same individual. The output contains the following variant types and biomarkers:

Small variants
Copy number variants (CNV) (with absolute copy number when HRD Assay is run)
TMB
MSI
Fusions
Splice variants
GIS (when HRD Assay is run)
Gene-level Loss of Heterozygosity (when HRD Assay is run)
Large Rearrangements

The combined variant output file also contains Analysis Details and Sequencing Run Details sections. The details of each are listed below:

Analysis Details
- Pair ID
- DNA sample ID (if DNA is run)
- RNA sample ID (if RNA is run)
- Library Prep Kit
- Output date
- Output time
- Module version
- Pipeline version (Docker image version #)

Sequencing Run Details
- Run name
- Run date
- DNA sample index ID (if DNA is run)
- RNA sample index ID (if RNA is run)
- [HRD] Sample feature
- Instrument ID
- Instrument control software version
- Instrument type
- RTA version
- Reagent cartridge lot number

Combined variant output produces small variants with blank fields in the following situations:

The variant has been matched to a canonical RefSeq transcript on an overlapping gene not targeted by TruSight Oncology 500.
The variant is located in a region designated iSNP, indel, or Flanking in the TST500_Manifest.bed file located in the Resources folder.

Variant Filtering Rules

Small Variants
- All variants with the FILTER field marked as PASS in the hard-filtered genome VCF are present in the combined variant output.
- Gene information is only present for variants belonging to canonical transcripts that are within the Gene Allow List–Small Variants.
- Transcript information is only present for variants belonging to canonical transcripts that are within the Gene Allow List–Small Variants.

Copy Number Variants
Copy number variants must meet the following conditions:
- FILTER field marked as PASS.
- ALT field is <DUP> or <DEL>.

Fusion Variants
Fusion variants must meet the following conditions:
- Passing variant call (KeepFusion field is true).
- Contains at least one gene on the fusion allow list.
Genes separated by a dash (-) indicate that the fusion directionality could be determined. Genes separated by a slash (/) indicate that the fusion directionality could not be determined.

Biomarkers TMB/MSI
- Always present when DNA sample is processed.

Splice Variants
- Passing splice variants that are contained on genes EGFR, MET, and AR.

Biomarker GIS
- Present only if TruSight Oncology 500 HRD analysis is performed.

Loss of Heterozygosity
Present only when TruSight Oncology 500 HRD is run. Loss of heterozygosity (LOH) must meet the following condition:
- MCN field is equal to 0.

Large Rearrangements CNV
Large Rearrangements CNVs must meet the following conditions:
- BRCA1 or BRCA2 contains at least one affected exon.
- ALT field is <DUP> or <LOSS>.
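
The fusion rules above can be sketched as two small helpers: one that decides whether a fusion would appear in the combined variant output, and one that formats the gene pair with a dash or slash. The gene set is copied from the fusion allow list in this document; the function names and the handling of semicolon-delimited gene lists are assumptions for illustration, not the software's implementation.

```python
# Fusion allow list genes as listed in this document.
FUSION_ALLOW_LIST = set(
    "ABL1 AKT3 ALK AR AXL BCL2 BRAF BRCA1 BRCA2 CDK4 CSF1R EGFR EML4 ERBB2 "
    "ERG ESR1 ETS1 ETV1 ETV4 ETV5 EWSR1 FGFR1 FGFR2 FGFR3 FGFR4 FLI1 FLT1 "
    "FLT3 JAK2 KDR KIF5B KIT KMT2A MET MLLT3 MSH2 MYC NOTCH1 NOTCH2 NOTCH3 "
    "NRG1 NTRK1 NTRK2 NTRK3 PAX3 PAX7 PDGFRA PDGFRB PIK3CA PPARG RAF1 RET "
    "ROS1 RPS6KB1 TMPRSS2".split()
)


def is_reported_fusion(gene_a, gene_b, keep_fusion):
    """Apply both conditions: KeepFusion is true and an allow-list gene is present.

    gene_a and gene_b may be semicolon-delimited lists of gene names.
    """
    genes = {g.strip() for g in (gene_a + ";" + gene_b).split(";") if g.strip()}
    return bool(keep_fusion) and bool(genes & FUSION_ALLOW_LIST)


def fusion_label(gene_a, gene_b, directionality_known):
    """Join genes with '-' when directionality is known, otherwise with '/'."""
    separator = "-" if directionality_known else "/"
    return f"{gene_a}{separator}{gene_b}"


print(is_reported_fusion("EML4", "ALK", True), fusion_label("EML4", "ALK", True))
```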

Metrics Output

The MetricsOutput.tsv file contains the following quality control metrics for all samples:

DNA library QC metrics for:
- Small variant calling
- TMB
- MSI
- CNV
- [HRD] GIS
RNA library QC metrics
Run QC metrics, analysis status, and contamination

This TSV file also includes expanded DNA library QC metrics per sample, based on total reads, collapsed reads, chimeric reads, and on-target reads. Analysis using RNA samples also produces RNA library QC metrics and expanded RNA library QC metrics per sample based on total reads and coverage.

The MetricsOutput.tsv file is a final combined metrics report with sample status, key analysis metrics, and metadata. Sample metrics within the report include suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run.

For troubleshooting information, refer to Troubleshooting.

DNA Expanded Metrics

DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

Metric

Description

Troubleshooting

TOTAL_PF_READS (count)

Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

Primarily driven by the data output of the sequencer, the quality of the library, and the balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are lower, this may suggest non-specific enrichment.

Low values for all samples indicate a poor quality run with possible low cluster numbers or low numbers of Q30 and PF%.

A low value for an individual sample indicates poor pooling of this library into the final pool.

MEAN_FAMILY_SIZE (count)

A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean family size over the entire population of reads assembled into UMI families.

The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.

As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good-quality DNA, values < 10 are usually observed.

UMI family size = 1 is not ideal as it is harder to correct for errors.

UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.

MEDIAN_TARGET_COVERAGE (count)

Median depth across all the unique loci occurring in all regions of the manifest file.

Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.

PCT_CHIMERIC_READS (%)

Chimeric reads occur when one sequencing read aligns to two distinct portions of the genome with little or no overlap. The metric is the proportion of chimeric reads among the total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

While this can be indicative of large-scale structural rearrangement of the genome, values that are elevated above the usual baseline may indicate enrichment probe contamination during library preparation. A suggested metric USL is 8% (samples with higher values might see decreased performance in small variant and TMB scores).

PCT_EXON_100X (%)

Percentage of exon bases with 100X fragment coverage. Calculated against all regions in manifest containing _exon in name.

Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.

PCT_READ_ENRICHMENT (%)

Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest.

Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.

PCT_USABLE_UMI_READS (%)

Percentage of reads that have valid UMI sequences associated with them.

As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.

MEAN_TARGET_COVERAGE (count)

Mean depth across all the unique loci defined in the manifest file.

Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.

PCT_ALIGNED_READS (%)

Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.

PCT_CONTAMINATION_EST (%)

This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed by taking 2.0 × the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 − allele frequency if it is greater than or equal to 0.5.

If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning because it is driven by statistical noise (for example, the few variants that naturally fall outside an expected interval around 0.5 due to random chance).

High contamination estimates may be due to any of the following:

Inter-sample contamination caused by mixing of samples during extraction or library preparation.

Intra-sample contamination, due to mixing of clonally different cell populations during extraction.

Large-scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.

PCT_TARGET_0.4X_MEAN (%)

Percentage of target positions (all locations in manifest) that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).

Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.

PCT_TARGET_50X (%)

Percentage of target bases with 50X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_100X (%)

Percentage of target bases with 100X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_250X (%)

Percentage of target bases with 250X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_SOFT_CLIPPED_BASES (%)

Percentage of bases that were not used for alignment but retained as part of the alignment file.

Soft clipped reads are used as a part of the downstream analysis for small variants calling. A higher-than-expected number could indicate a low-quality enrichment step.

PCT_Q30_BASES (%)

Average percentage of bases ≥ Q30. A prediction of the probability of an incorrect base call (Q‑score).

An indicator of sequencing run quality, low Q30 across all samples on a run could be the result of run overclustering.

ALLELE DOSAGE_RATIO (with HRD add-on)

Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-Allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and higher amount of noise (or lower coverage) will have higher Allele Dosage Ratio. The upper limit of the score is 50, therefore any sample with 50 Allele Dosage Ratio can be assumed to have tumor fraction close to zero and typically has a GIS = 0.

MEDIAN TARGET HRD (with HRD add-on)

Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap.
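
The arithmetic described for PCT_CONTAMINATION_EST in the table above can be sketched as follows. How the software selects the variants that enter the calculation is not described in this document, so the input list here is simply assumed to be the already-selected allele frequencies; the function name is illustrative, and the result is returned as a fraction rather than a percentage.

```python
def contamination_estimate(allele_frequencies):
    """Return 2.0 times the mean adjusted allele frequency.

    The adjusted allele frequency is af when af < 0.5, otherwise 1 - af.
    Input values are fractions in the range 0 to 1 for already-selected variants.
    """
    if not allele_frequencies:
        raise ValueError("at least one allele frequency is required")
    adjusted = [af if af < 0.5 else 1 - af for af in allele_frequencies]
    return 2.0 * sum(adjusted) / len(adjusted)


# Invented allele frequencies: adjusted values are 0.02, 0.02, and 0.04.
estimate = contamination_estimate([0.02, 0.98, 0.04])
print(round(estimate, 4))
```

Folding frequencies around 0.5 treats a variant at 0.98 the same as one at 0.02, since both sit 0.02 away from a homozygous state.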

RNA Expanded Metrics

RNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

| Metric | Description | Units |
| --- | --- | --- |
| PCT_CHIMERIC_READS | Percentage of reads that are aligned as two segments which map to nonconsecutive regions in the genome. | |
| PCT_ON_TARGET_READS | Percentage of reads that cross any part of the target region versus total reads. A read that partially maps to a target region is counted as on target. | |
| SCALED_MEDIAN_GENE_COVERAGE | Median of median base coverage of genes scaled by length. An indication of median coverage depth of genes in the panel. | Count |
| TOTAL_PF_READS | Total number of reads passing filter. | Count |
| GENE_MEDIAN_COVERAGE | The median coverage depth of all genes in the panel. | Count |
| GENE_ABOVE_MEDIAN_CUTOFF | Number of genes above the median coverage cutoff. | Count |
| PER_GENE_MEDIAN_COVERAGE | Median deduped coverage across each gene (available in Logs_Intermediates only). | Count |
| PCT_SOFT_CLIPPED_BASES | Percentage of bases that were not used for alignment but retained as part of the alignment file. | |
| RNA_PCT_Q30_BASES | Average percentage of bases ≥ Q30. A prediction of the probability of an incorrect base call (Q‑score). Troubleshooting: An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering. | |
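
The Q30 metrics rest on the standard Phred relationship Q = -10 log10(P), where P is the probability of an incorrect base call. The following sketch shows how a percentage of bases at or above Q30 could be computed from FASTQ quality strings. It assumes Phred+33 encoding and is illustrative only; the software calculates the reported metric itself. The function names and the example quality strings are hypothetical.

```python
def error_probability(q_score: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q_score / 10)


def pct_q30_bases(quality_strings, offset: int = 33) -> float:
    """Return the percentage of bases with a quality score >= 30.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    """
    total = 0
    q30 = 0
    for quality in quality_strings:
        for char in quality:
            total += 1
            if ord(char) - offset >= 30:
                q30 += 1
    return 100.0 * q30 / total if total else 0.0


# Hypothetical quality strings: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
example_qualities = ["II?#", "5?II"]
print(pct_q30_bases(example_qualities))
print(error_probability(30))
```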

HRD Metrics Report

The Illumina DRAGEN TruSight Oncology 500 Analysis Software allows for analysis of sequencing data generated from the TruSight Oncology 500 HRD assay. When HRD samples are analyzed, new results and metrics are included in the CombinedVariantOutput and MetricsOutput files, respectively. The following tables detail how these scores and QC metrics are derived.

| Metric | Description |
| --- | --- |
| GIS Score* | Proprietary Genomic Instability Score (GIS) indicating the level of genomic instability in the sample genome. Combination of Loss of Heterozygosity (LOH), Telomeric Allelic Imbalance, and Large-scale State Transitions (LST) scores. The GIS scores provided by TruSight Oncology 500 HRD show good correlation (R² = 0.98) with Myriad Genetics GIS; however, they are not identical. Refer to TruSight Oncology 500 HRD Product Data Sheet Doc# M-GL-00748 for more details. GIS from alternative HRD assays should not be considered equivalent to Illumina/Myriad GIS. |

*The GIS algorithm within the TSO500 pipeline (which does not have a cell line mode due to the TSO500 pipeline being non-configurable) is only intended for FFPE samples. Cell line samples will not accurately report GIS results as the tumor fraction (>90%) is too high to reliably distinguish tumor vs germline variants.

HRD Metrics Added to Metrics Output File

| Metric | Description | Section in Metrics Output |
| --- | --- | --- |
| PCT_TARGET_HRD_50X | Percent of HRD probe SNP panel covered by at least 50X coverage. | DNA Library QC Metrics for GIS |
| EXCESSIVE_TF | Indicates whether there is excessive tumor content in the sample. Troubleshooting: Samples with pure tumor fraction >90% are outside the design for GIS estimation (this includes pure tumor cell lines). | DNA Library QC Metrics for GIS |
| ALLELE_DOSAGE_RATIO | | DNA Expanded Metrics |
| MEDIAN_TARGET_HRD_COVERAGE | Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap. | DNA Expanded Metrics |

Coverage Reports

The gene and exon coverage reports are tab-separated value (TSV) files. They contain coverage values for the genes and exons, respectively, that are specified in the manifest file, for both DNA and RNA samples.

Block List

The block list represents high noise regions in the panel where false positive variant calls are likely to be produced. As a result, all positions in the gVCF that fall in these regions are marked as Filter=excluded_regions to indicate that variant call results are not reliable there.

The block list includes the following genes and regions:

HLA-A
HLA-B
HLA-C
KMT2B
KMT2C
KMT2D
chrY
Any position with VAF 1% occurrence in six or more of the 60 baseline samples.
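
When post-processing the gVCF, block list positions can be recognized by their filter value. The following is a minimal sketch that checks the FILTER column (the seventh tab-separated column in the VCF format) for excluded_regions. The function name and the example records are hypothetical and are shown only to illustrate the check.

```python
def is_in_block_list(vcf_line: str) -> bool:
    """Return True if a VCF data line carries the excluded_regions filter."""
    if vcf_line.startswith("#"):
        return False  # header and metadata lines carry no FILTER value
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 7:
        return False  # not a complete VCF data line
    # FILTER may hold several semicolon-separated values.
    return "excluded_regions" in fields[6].split(";")


# Hypothetical records (coordinates and alleles are placeholders).
blocked = "chr6\t100\t.\tA\t<NON_REF>\t.\texcluded_regions\tEND=200"
passing = "chr7\t300\t.\tC\tT\t50\tPASS\tDP=500"

print(is_in_block_list(blocked))
print(is_in_block_list(passing))
```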

Analysis Output

When the analysis run completes, the DRAGEN TruSight Oncology 500 Analysis Software generates an analysis output folder in a specified location.

To view analysis output, navigate to the analysis output folder and select the files that you want to view.

Single Node Analysis Output Folder Structure

The single node output folder structure is as follows.

Logs_Intermediates
- AdditionalSarjMetrics—Contains per pair ID calculations to support the PCT_TARGET_250X metric.
- Annotation—Contains outputs for small variant annotation.
  - Subfolders per sample ID—Contains the aligned small variants JSON.
- CombinedVariantOutput
  - Subfolders per pair ID—Contains the combined variant output TSV files.
  - A combined output log file.
- Contamination
  - Subfolders per DNA sample ID—Contains the contamination metrics JSON file and output logs.
- DnaDragenCaller
  - Subfolders per sample ID—Contains the aligned BAM and index files, small variant VCF and gVCF, copy number variant VCF, MSI JSON, exon coverage report bed, and QC outputs in CSV format.
- DnaDragenExonCNVCaller
  - Subfolders per DNA sample ID—Contains the exon-level CNV JSON, the supporting calculation, and the QC files.
- DnaFastqValidation—Contains the FASTQ validation output log for DNA samples.
- FastqDownsample
  - Subfolders per RNA sample ID—Contains FASTQ files and output logs.
  - FastqDownsample output
- FastqGeneration
- Gis—Contains GIS-related files for HRD samples.
  - Subfolders per HRD sample ID—Contains the GIS JSON, the supporting calculation, and the QC files.
  - Also contains the annotated CNV VCF and gene level TSV file with absolute copy number and minor copy number information
- LrAnnotation
  - Subfolders per DNA sample ID—Contains the annotated exon-level CNV JSON.
- LrCalculator
  - Subfolders per DNA sample ID—Contains the exon-level CNV VCF.
- MetricsOutput
  - Subfolders per pair ID—Contains the metrics output TSV files.
  - A combined output log file.
- ResourceVerification—Contains the resource file checksum verification logs.
- RnaAnnotation
  - Subfolders per RNA sample ID—Contains the annotated splice variant JSON.
- RnaDragenCaller
  - Subfolders per sample ID—Contains the aligned BAM, fusion candidates CSV, exon coverage report bed and QC outputs in CSV format.
- RnaFastqValidation—Contains the FASTQ validation output log for RNA samples.
- RnaFusion
  - Subfolders per RNA sample ID—Contains the All Fusions CSV and Fusion Processor logs.
- RnaQcMetrics
  - Subfolders per RNA sample ID—Contains the RNA QC metrics JSON.
- RnaSpliceVariantCalling
  - Subfolders per RNA sample ID—Contains the splice variants VCF.
- Run QC—Contains the Run QC metrics JSON, Intermediate Run QC metrics JSON, and log file.
- SampleAnalysisResults
  - Subfolders per pair ID—Contains the Sample Analysis Results JSON and detailed log file.
- SampleSheetValidation—Contains the Intermediate sample sheet and validation log.
- Tmb
  - Subfolders per DNA sample ID—Contains the TMB metrics CSV, TMB trace TSV, and related files and logs.
- passing_sample_steps.json—Contains the steps passed for each sample ID.
- pipeline_trace.txt—Contains a summary and troubleshooting file that lists each Nextflow task executed and the status (for example, COMPLETED or FAILED).
- run.log—Contains a complete trace-level log file describing the Nextflow pipeline execution.
- run_report.html—Contains high-level run statistics (performance, usage, etc.).
- run_timeline.html—Contains timeline-related information about the analysis run.
Results
- Metrics Output TSV (all pair IDs)
- Pair ID—The following outputs are produced for each sample:
  - Combined Variant Output TSV
  - Metrics Output TSV
  - TMB Trace TSV
  - Small Variant Genome VCF
  - Small Variant Genome Annotated JSON
  - Copy Number Variant VCF
  - GIS JSON
  - MSI JSON
  - Large Rearrangements CNV VCF
  - Large Rearrangements CNV Annotated JSON
  - All Fusion CSV
  - Splice Variant VCF
  - Splice Variant Annotated JSON
  - Exon Coverage Report TSV
  - Gene Coverage Report TSV
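
For troubleshooting, the pipeline_trace.txt file in Logs_Intermediates can be scanned for tasks that did not complete. The following sketch assumes the file follows the usual Nextflow trace layout: tab-separated, with a header row that includes name and status columns. That layout is an assumption and should be confirmed against an actual file. The function name and the sample content are hypothetical.

```python
import csv
import io


def failed_tasks(trace_text: str) -> list:
    """Return the names of tasks whose status column is FAILED.

    Assumes a tab-separated trace with 'name' and 'status' header columns.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [row["name"] for row in reader if row.get("status") == "FAILED"]


# Hypothetical trace content used only to exercise the function.
sample_trace = (
    "task_id\tname\tstatus\n"
    "1\tDnaFastqValidation (Sample_1)\tCOMPLETED\n"
    "2\tTmb (Sample_1)\tFAILED\n"
)

print(failed_tasks(sample_trace))
```

With a real run, the text would come from reading pipeline_trace.txt, for example with `pathlib.Path(...).read_text()`.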

Multiple Node Analysis Output Folder Structure

The multiple node output folder structure is as follows.

Demultiplex Output
- A Logs_Intermediates folder containing FASTQ files per sample.
Node(X) Output—The following outputs are produced for each node used:
- A Logs_Intermediates folder containing step specific and component specific outputs and logs for every step/component run in the analysis pipeline for the sample run on the node.
- A Results folder containing results only for the sample run on the node.
Gathered Output
- A Logs_Intermediates folder containing step specific and component specific outputs and logs for every step/component run in each analysis pipeline on every node. This folder contains outputs for all samples and pairs run across all nodes in the analysis.
- A Results folder containing results for all samples and pairs run across all nodes. Results are organized by Pair_ID, then Sample_ID. This folder also contains summary files with information on all samples.

ICA Output Folder Structure

High-Level Folder Structure

Run ID
- TSO500_Nextflow_logs
  - _manifest.json
- Results
  - _tags.json
- Logs_intermediates
- Errors—This folder is only present when analysis fails

TSO500_Nextflow_logs Folder Structure

TSO500_Nextflow_logs
- _manifest.json

Results Folder Structure

The Results folder contains the aggregated MetricsOutput.tsv file at the root level and a subfolder for each pair ID.

Results
- MetricsOutput.tsv
- Sample_1
- Sample_2
- Sample_<#>
- _tags.json

The Results subfolder contains the following files:

Results
- MetricsOutput.tsv
- <Pair_id>
  - CombinedVariantOutput.tsv
  - <SampleName>_MetricsOutput.tsv
- <DNA_Sample_id>
  - CopyNumberVariants.vcf
  - DNAMergedSmallVariants_Annotated.json.gz
  - MergedSmallVariants.genome.vcf
  - MergedSmallVariants.vcf
  - microstat_output.json
  - TMB_Trace.tsv
- <RNA_Sample_id>
  - AllFusions.csv
  - RNA_Annotated.json.gz
  - SpliceVariants.vcf

Logs_intermediates Folder Structure

Logs_intermediates
- DnaDragenCaller
- AdditionalSarjMetrics
- CombinedVariantOutput
- FastqGeneration
- MetricsOutput
- DnaDragenExonCnvCaller
- DnaFastqValidation
- DNACoverageReport
- Gis
- Tmb
- SampleAnalysisResults
- SampleSheetValidation
- passing_sample_steps.json
- RnaFusion
- Contamination
- Annotation
- RnaAnnotation
- RnaDragenCaller
- RnaSpliceVariantCalling
- RunQc
- FastqDownsample
- PassingSampleSteps
- ResourceVerification
- LrCalculator
- LrAnnotation
- RnaQcMetrics
- RnaFastqValidation
- RNACoverageReport

Errors Folder Structure

The Errors folder contains Errors.tsv, which summarizes all errors encountered during pipeline execution.

Errors
- Errors.tsv
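
Because the Errors folder is only present when analysis fails, a script that reads Errors.tsv should handle its absence. The following sketch returns the rows of the file as lists of strings and makes no assumption about column names, which are not described here. The function name and the demonstration content are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_errors(run_dir) -> list:
    """Return the rows of Errors/Errors.tsv, or an empty list if it is absent."""
    errors_file = Path(run_dir) / "Errors" / "Errors.tsv"
    if not errors_file.exists():
        return []  # no Errors folder: the analysis did not report a failure
    with errors_file.open(newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


# Demonstration with placeholder content; the real column layout may differ.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "Errors").mkdir()
    (run / "Errors" / "Errors.tsv").write_text("step\tmessage\nTmb\texample failure\n")
    demo_rows = read_errors(run)

print(demo_rows)
```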

NovaSeq 6000Dx Analysis Application Output Folder Structure

The following files and folders are created during analysis by NovaSeq 6000Dx Analysis Application:

analysisResults.json
CopyComplete.txt
edgeos.nextflow.config
inputs/
- sampleMapping.json
- SampleSheet.csv
- SampleSheet.json
Manifest.tsv
params.json
Results/
workflowLogs/
- nf-main-***.log

When the analysis run completes, the analysis application generates an analysis output in a specified location. To view analysis output, follow the steps below:

On the Completed runs tab, select the run.
Review the run details page, which provides the information needed to access the output folder:
- External Location: the input for the run.
- Analysis Output Folder: where the output is stored. To navigate to it, follow the “server location” and the gds analysis output folder.
Navigate to the directory that contains the analysis output folder.
Open the folder, and then select the files that you want to view.

RNA Output

Refer to RNA Analysis Methods for more information.

Splice Variant VCF

The splice variant VCF contains all candidate splice variants targeted by the analysis panel that are identified by the RNA analysis pipeline. The following filters can be applied to each variant call:

| Filter Name | Description |
| --- | --- |
| LowQ | Splice variant score < passing quality score threshold value of 1. |
| PASS | Splice variant score ≥ passing quality score threshold value of 1. |
| LowUniqueAlignments | All splice junction supporting reads map to a unique genomic interval near at least one of the two splice sites. |

Refer to the headers in the output for more information about each column.
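
A quick way to summarize a splice variant VCF is to count how many records carry each filter. The following sketch tallies the FILTER column (the seventh tab-separated column in the VCF format). The function name and the example records are hypothetical; positions and alleles are placeholders.

```python
from collections import Counter


def tally_filters(vcf_lines) -> Counter:
    """Count FILTER values across VCF data lines, skipping header lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # incomplete record
        # A record may list several filters separated by semicolons.
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts


# Hypothetical records used only to exercise the function.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr7\t100\t.\tA\t<DEL>\t.\tPASS\t.",
    "chr7\t200\t.\tC\t<DEL>\t.\tLowQ\t.",
    "chr7\t300\t.\tG\t<DEL>\t.\tLowQ;LowUniqueAlignments\t.",
]

print(tally_filters(example_vcf))
```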

Splice Variant Annotated JSON

If available, each splice variant is annotated using the Illumina Annotation Engine. The following information is captured in the JSON:

HGNC Gene
Transcript
Exons
Introns
Canonical
Consequence

All Fusions CSV

The all fusions CSV file contains all candidate fusions identified by the DRAGEN RNA pipeline. Two output columns in the file describe the candidate fusions: Filter and KeepFusion.

Filter Column Output

| Filter | Filter Type | Description |
| --- | --- | --- |
| DOUBLE_BROKEN_EXON | Confidence filter | If both breakpoints are distant from annotated exon boundaries, the number of supporting reads does not satisfy a high threshold requirement (≥ 10 supporting reads). |
| LOW_MAPQ | Confidence filter | All fusion supporting read alignments at either of the breakpoints have MAPQ < 20. |
| LOW_UNIQUE_ALIGNMENTS | Confidence filter | All fusion supporting read alignments map to a unique genomic interval at either of the breakpoints. |
| LOW_SCORE | Confidence filter | The fusion candidate has a low probabilistic score as determined by the features of the candidate. |
| MIN_SUPPORT | Confidence filter | The fusion candidate has very few fusion supporting reads (< 5 supporting read pairs). |
| READ_THROUGH | Confidence filter | The breakpoints are cis neighbors (< 200 kbp) on the reference genome. |
| ANCHOR_SUPPORT | Information only | Read alignments of fusion supporting reads are not long enough (12 bp) at either of the two breakpoints. |
| HOMOLOGOUS | Information only | The candidate is likely a false candidate generated because the two genes involved have high gene homology. |
| LOW_ALT_TO_REF | Information only | The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints. |
| LOW_GENE_COVERAGE | Information only | Each breakpoint in an enriched gene has fewer than 125 bp with nonzero read coverage. |
| NO_COMPLETE_SPLIT_READS | Confidence filter | For every fusion-supporting split read, the total number of aligned bases across two breakpoints is less than 60% of the read length. |
| UNENRICHED_GENE | Confidence filter | Neither of the two parent genes is in the enrichment panel. |

The KeepFusion column of the output has a value of TRUE when none of the confidence filters are triggered.

Refer to the headers in the output for more information about each column.
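
The relationship between the Filter and KeepFusion columns can be sketched as a set check: a fusion is kept when none of its filters is a confidence filter, and information-only filters do not affect the outcome. The following example is illustrative only, because the software writes KeepFusion itself. The semicolon delimiter, the PASS value for unfiltered candidates, and the function name are assumptions.

```python
# Confidence filters as listed in the Filter Column Output table.
CONFIDENCE_FILTERS = frozenset({
    "DOUBLE_BROKEN_EXON",
    "LOW_MAPQ",
    "LOW_UNIQUE_ALIGNMENTS",
    "LOW_SCORE",
    "MIN_SUPPORT",
    "READ_THROUGH",
    "NO_COMPLETE_SPLIT_READS",
    "UNENRICHED_GENE",
})


def keep_fusion(filter_value: str, delimiter: str = ";") -> bool:
    """Return True when no confidence filter appears in the Filter value.

    The delimiter between multiple filters is an assumption.
    """
    triggered = {
        name.strip() for name in filter_value.split(delimiter) if name.strip()
    }
    return not (triggered & CONFIDENCE_FILTERS)


print(keep_fusion("PASS"))                       # no filters triggered
print(keep_fusion("LOW_ALT_TO_REF;HOMOLOGOUS"))  # information-only filters
print(keep_fusion("LOW_MAPQ;ANCHOR_SUPPORT"))    # one confidence filter
```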

Fusion Columns

| Fusion Object Field | Source |
| --- | --- |
| Gene A | The gene associated with the A side of the fusion. A semicolon delimited list is used for multiple genes. |
| Gene B | The gene associated with the B side of the fusion. A semicolon delimited list is used for multiple genes. |
| Gene A Breakpoint | [Information only] The chromosome and offset of the Gene A side of the fusion. |
| Gene A Location | |
| Gene A Sense | Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene A. If multiple genes are in Gene A, then semicolon separated list of bools. |
| Gene A Strand | Strand of Gene A, + for forward, - for reverse. |
| Gene B Breakpoint | [Information only] The chromosome and offset of the Gene B side of the fusion. |
| Gene B Location | |
| Gene B Sense | Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene B. If multiple genes are in Gene B, then semicolon separated list of bools. |
| Gene B Strand | Strand of Gene B, + for forward, - for reverse. |
| Score | The quality of fusion as determined by DRAGEN server. |
| Filter | The filter associated with the fusion as determined by the respective caller. Results from different callers are not equivalent. |
| Ref A Dedup | Gene A uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included. |
| Ref B Dedup | Gene B uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included. |
| Alt Split Dedup | Uniquely mapping reads split by the junction. Supports fusion. Duplicate reads are not included. |
| Alt Pair Dedup | Uniquely mapping reads paired across junction. Supports fusion. Duplicate reads are not included. |
| KeepFusion | The determination whether the fusion should be kept or dropped from the list of fusions. |
| Fusion Directionality Known | Whether fusion directionality is known and indicated by gene order. |
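
Because the gene fields and the sense fields can hold semicolon delimited lists, scripts that consume the all fusions CSV typically split them before use. The following is a minimal sketch. The function names are hypothetical, the gene names are placeholders, and the textual form of the booleans (True/False) is an assumption.

```python
def split_genes(value: str) -> list:
    """Split a semicolon delimited gene field into a list of gene names."""
    return [gene.strip() for gene in value.split(";") if gene.strip()]


def split_bools(value: str) -> list:
    """Split a semicolon delimited sense field into a list of booleans.

    Assumes booleans are written as True/False (case-insensitive).
    """
    return [
        item.strip().lower() == "true"
        for item in value.split(";")
        if item.strip()
    ]


# Hypothetical field values with two genes on the A side.
gene_a = "GENE1;GENE2"
gene_a_sense = "True;False"

# Pair each gene with its sense flag.
print(list(zip(split_genes(gene_a), split_bools(gene_a_sense))))
```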

When using Microsoft Excel to view this report, gene names that are convertible to dates (such as MARCH1) are automatically converted to dd-mm format (1 Mar) by Excel.

The following are fusion allow list genes:

ABL1
AKT3
ALK
AR
AXL
BCL2
BRAF
BRCA1
BRCA2
CDK4
CSF1R
EGFR
EML4
ERBB2
ERG
ESR1
ETS1
ETV1
ETV4
ETV5
EWSR1
FGFR1
FGFR2
FGFR3
FGFR4
FLI1
FLT1
FLT3
JAK2
KDR
KIF5B
KIT
KMT2A
MET
MLLT3
MSH2
MYC
NOTCH1
NOTCH2
NOTCH3
NRG1
NTRK1
NTRK2
NTRK3
PAX3
PAX7
PDGFRA
PDGFRB
PIK3CA
PPARG
RAF1
RET
ROS1
RPS6KB1
TMPRSS2