File name: {SampleID}_CombinedVariantOutput.tsv
The combined variant file contains the variants and biomarkers in a single file. The output contains the following variant types and biomarkers:
Small variants (including EGFR complex variants)
Copy number variants
Tumor Mutational Burden (TMB)
MSI
DNA Fusions
The combined variant output file also contains Analysis Details and Sequencing Run Details sections. The details of each are listed in the following table:
Combined variant output produces small variants with blank fields in the following situations:
The variant has been matched to a canonical RefSeq transcript on an overlapping gene not targeted by TruSight Oncology 500 ctDNA.
The variant is located in a region designated iSNP, indel, or Flanking in the TST500_Manifest.bed file located in the Resources folder.
Small Variants - All variants that have the FILTER field marked as PASS and a canonical RefSeq transcript are present in the combined variant output.
Gene and transcript information is only present for variants belonging to canonical transcripts that are within the Gene list–Small Variants.
Copy Number Variants - Copy number variants must meet the following conditions:
FILTER field marked as PASS.
ALT field is <DUP> or <DEL>.
Gene is part of the copy number variant gene list.
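The three copy number variant conditions above can be expressed as a single predicate. The following is a minimal Python sketch, not the pipeline's implementation. The function name, argument names, and the example gene set are hypothetical, and the sketch assumes the FILTER, ALT, and gene values have already been extracted from a copy number variant VCF record.

```python
from typing import Iterable

# ALT values accepted for copy number variants, as listed in the conditions above.
ALLOWED_CNV_ALTS = {"<DUP>", "<DEL>"}


def cnv_is_reportable(filter_field: str, alt: str, gene: str,
                      cnv_gene_list: Iterable[str]) -> bool:
    """Return True when a CNV record meets all three stated conditions.

    cnv_gene_list is a caller-supplied collection of gene symbols
    (hypothetical input; the real gene list ships with the software).
    """
    return (
        filter_field == "PASS"
        and alt in ALLOWED_CNV_ALTS
        and gene in set(cnv_gene_list)
    )
```

For example, `cnv_is_reportable("PASS", "<DUP>", "ERBB2", {"ERBB2", "MET"})` evaluates to `True`, while the same record with a FILTER value other than PASS does not.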
Fusion Variants - Fusion variants must meet the following conditions:
Passes fusion filtering criteria with "PASS" from the DNAFF module.
Contains at least one gene on the fusion allow list.
Genes separated by a dash (-) indicate that the fusion directionality could be determined. Genes separated by a slash (/) indicate that the fusion directionality could not be determined.
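The separator convention described above can be read programmatically. The sketch below is illustrative only: the function name and the example labels are hypothetical, and it assumes that neither gene symbol contains the separator character itself. Hyphenated gene symbols (for example HLA-A or NKX2-1) would need to be resolved against a known gene list before splitting.

```python
from typing import Tuple


def parse_fusion_label(label: str) -> Tuple[str, str, bool]:
    """Split a fusion label into (gene_a, gene_b, direction_known).

    A slash means directionality could not be determined;
    a dash means directionality could be determined.
    """
    if "/" in label:
        gene_a, gene_b = label.split("/", 1)
        return gene_a, gene_b, False
    if "-" in label:
        # Assumption: the first dash is the separator, which is only
        # safe when the gene symbols themselves contain no dash.
        gene_a, gene_b = label.split("-", 1)
        return gene_a, gene_b, True
    raise ValueError(f"No fusion separator found in {label!r}")
```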
Analysis Details: Sample ID, Output date, Output time, Pipeline version (Docker image version number)
Sequencing Run Details: Run name, Run date, Sample index ID, Instrument ID, Instrument control software version, Instrument type, RTA version, SBS reagent cartridge lot number, Cluster reagent cartridge lot number
When the analysis run completes, the DRAGEN TruSight Oncology 500 ctDNA Analysis Software generates an analysis output folder in a specified location.
To view analysis output, navigate to the analysis output folder and select the files that you want to view.
Single output folder structure is as follows.
Logs_Intermediates
AdditionalSarjMetrics
Annotation—Contains outputs for small variant annotation.
Subfolders per sample ID—Contains the aligned small variants JSON.
Results
Metrics Output TSV (all Sample IDs)
Sample ID—The following outputs are produced for each sample:
This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed. The same output folder structure and content exist in ICA and BaseSpace Sequence Hub.
Run ID
TSO500_Nextflow_logs
_manifest.json
The TSO_500_Nextflow_Logs folder provides information related to the execution of the pipeline on ICA as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains files used to execute parts of the workflow on different nodes as well as records of the Nextflow execution on those nodes.
TSO_500_Nextflow_Logs
_manifest.json
Contains the aggregated MetricsOutput.tsv file at the root level. Additionally, the Results folder contains a subfolder for each sample ID.
Results
MetricsOutput.tsv
Sample_1
The Results subfolder contains the following files:
Results
MetricsOutput.tsv
<Sample_id>
Contains folders for each submodule in the DRAGEN TSO 500 ctDNA on ICA pipeline. The folders contain a copy of all the relevant files required to create the metric output files and report files, as well as the combined log files at the root level and subfolders for each sample.
Logs_intermediates
AdditionalSarjMetrics
Annotation
All logs in Logs_Intermediates are generated from the running analysis software. Inputs to the running Docker container (for example, the run folder, sample sheet, and FASTQ folder) are mapped from native locations on the server to the following locations in the container:
The paths in the log messages refer to paths within the running Docker container, not paths on the server.
Contains Errors.tsv. This file contains the summary of all the errors encountered during pipeline execution.
Errors
Errors.tsv
The following files and folders are created during analysis by NovaSeq 6000Dx Analysis Application:
analysisResults.json
CopyComplete.txt
edgeos.nextflow.config
inputs/
When the analysis run completes, the analysis application generates an analysis output in a specified location. To view analysis output, follow the steps below:
On the “Completed” runs tab, select the run.
Review the run details page, which provides the information needed to access the output folder.
External Location: the input for the run.
CombinedVariantOutput
Subfolders per sample ID—Contains the combined variant output TSV files.
A combined output log file.
Contamination
Subfolders per sample ID—Contains the contamination metrics JSON file and output logs.
CoverageReports
DnaFusionFiltering
DragenCaller
Subfolders per sample ID—Contains the aligned BAM and index files, small variant VCF and gVCF, copy number variant VCF, MSI JSON, exon coverage report bed, and QC outputs in CSV format.
FastqValidation—Contains the FASTQ validation output log for the samples.
FastqGeneration
MetricsOutput
Subfolders per sample ID—Contains the metrics output TSV files.
A combined output log file.
ResourceVerification—Contains the resource file checksum verification logs.
Run QC—Contains the Run QC metrics JSON, Intermediate Run QC metrics JSON, and log file.
SampleAnalysisResults
Subfolders per sample ID—Contains the Sample Analysis Results JSON and detailed log file. The sample analysis results file (SARJ) is an aggregated results file created for each sample. The SARJ file is used for the generation of downstream outputs. The file contains passing variants and passing variant annotations.
SampleSheetValidation—Contains the Intermediate sample sheet and validation log.
Passing Sample Steps - JSON file that contains the steps passed for each Sample ID
Tmb
Subfolders per sample ID—Contains the TMB metrics CSV, TMB trace TSV, and related files and logs.
pipeline_trace.txt—Contains a summary and troubleshooting file that lists each Nextflow task executed and the status (for example, COMPLETED or FAILED).
run.log—Contains a complete trace-level log file describing the Nextflow pipeline execution.
run_report.html—Contains high-level run statistics (performance, usage, etc.).
run_timeline.html—Contains timeline-related information about the analysis run.
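Because pipeline_trace.txt lists each Nextflow task with its status, failed tasks can be pulled out with a few lines of Python. This is a hedged sketch: it assumes the file is tab-separated with a header row that includes `name` and `status` columns (the layout of a default Nextflow trace report), which should be confirmed against an actual file. The function name is hypothetical.

```python
import csv
import io
from typing import List


def failed_task_names(trace_text: str) -> List[str]:
    """Return the names of tasks whose status is FAILED.

    trace_text is the full content of pipeline_trace.txt, assumed to be
    tab-separated with 'name' and 'status' header columns.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [row["name"] for row in reader if row.get("status") == "FAILED"]
```

A typical use would be `failed_task_names(open("pipeline_trace.txt").read())`, followed by a look at the matching module folder under Logs_Intermediates.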
Metrics Output TSV
TMB Trace TSV
Small Variant Genome VCF
Small Variant VCF
Small Variant Annotated JSON
Copy Number Variant VCF
MSI JSON
Fusions CSV
Exon Coverage Report TSV
Gene Coverage Report TSV
_tags.json
Logs_intermediates
Errors—This folder is only present when analysis fails
Sample_<#>
Fusions.csv
tmb.trace.tsv
hard-filtered.gvcf
hard-filtered.vcf
SmallVariants_Annotated.json.gz
cnv.vcf
exon_cov_report.tsv
gene_cov_report.tsv
MetricsOutput.tsv
microsat_output.json
Contamination
CoverageReports
DnaFusionFiltering
DragenCaller
FastqValidation
FastqGeneration
MetricsOutput
PassingSampleSteps
ResourceVerification
Run QC
SampleAnalysisResults
SampleSheetValidation
Tmb
sampleMapping.json
SampleSheet.csv
SampleSheet.json
Logs_Intermediates
Manifest.tsv
params.json
Results/
workflowLogs/
nf-main-***.log
Navigate to the directory that contains the analysis output folder
Open the folder, and then select the files that you want to view
Run folder: /opt/illumina/run-folder
Sample sheet: /opt/illumina/SampleSheet.csv
FASTQ folder: /opt/illumina/fastq-folder
Resources: /opt/illumina/resources
Analysis output folder: /opt/illumina/analysis-folder
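Since log messages report paths inside the container, it can help to translate them back to server paths when troubleshooting. The sketch below uses the container locations listed above. The function name, the dictionary keys, and the example server paths are hypothetical, and the server-side locations must be supplied by the user because they depend on how the analysis was launched.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Container-side locations, taken from the mapping listed above.
CONTAINER_MOUNTS: Dict[str, str] = {
    "run_folder": "/opt/illumina/run-folder",
    "sample_sheet": "/opt/illumina/SampleSheet.csv",
    "fastq_folder": "/opt/illumina/fastq-folder",
    "resources": "/opt/illumina/resources",
    "analysis_output": "/opt/illumina/analysis-folder",
}


def to_server_path(container_path: str,
                   server_locations: Dict[str, str]) -> Optional[str]:
    """Map a container path from a log message to its server path.

    server_locations maps the keys of CONTAINER_MOUNTS to the native
    server paths (user-supplied). Returns None when no mount matches.
    """
    path = PurePosixPath(container_path)
    for key, mount in CONTAINER_MOUNTS.items():
        if key not in server_locations:
            continue
        try:
            relative = path.relative_to(mount)
        except ValueError:
            continue  # path is not under this mount
        return str(PurePosixPath(server_locations[key]) / relative)
    return None
```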
The exon and gene coverage report files are tab-separated value (TSV) files with coverage values for the exons and genes, respectively, specified in the manifest file.
File Name: MetricsOutput.tsv
The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline‑suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run.
One metrics output file is generated for the entire run. An additional file is generated for each sample.
All metrics and guidelines are applicable to all versions of DRAGEN TSO 500 ctDNA analysis software (v2.1 and above).
Run metrics from the analysis module indicate the quality of the sequencing run.
Review the following metrics to assess run data quality:
The values in the Run Metrics section are listed as NA in the following situations:
The analysis was started from FASTQ files.
The analysis was started from BCL files and the InterOp files are missing or corrupt.
[NovaSeq X Plus only] NovaSeq X Plus runs have no PCT_PF_READS value, so PCT_PF_READS is always NA.
Review the following metrics to assess sample data quality:
*The recommended threshold of 0.059 for GENE_SCALED_MAD only applies to real cell‑free DNA.
For troubleshooting information, refer to
GENE_SCALED_MAD: The median of absolute deviations normalized by gene fold change. Guideline: ≤ 0.059*. Variant class: CNV.
MEDIAN_BIN_COUNT_CNV_TARGET: The median raw bin count per CNV target. Guideline: ≥ 6.0. Variant class: CNV.
PCT_PF_READS: Percentage of reads on the sequencing flow cell that pass the filter. Guideline: ≥ 55.0 (no lower specification limit for NovaSeq X Plus).
PCT_Q30_R1: Percentage of bases with a quality score ≥ 30 from Read 1. Guideline: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus).
PCT_Q30_R2: Percentage of bases with a quality score ≥ 30 from Read 2. Guideline: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus).
CONTAMINATION_SCORE: The contamination score based on VAF distribution of SNPs. Guideline: ≤ 1227. Variant class: All.
MEDIAN_EXON_COVERAGE: Median exon fragment coverage across all exon bases. Guideline: ≥ 1300. Variant class: Small variant, TMB, fusion, MSI.
PCT_EXON_1000X: Percent exon bases with 1000X fragment coverage. Guideline: ≥ 80.0. Variant class: Small variant, TMB.
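The sample-level guideline thresholds above lend themselves to an automated check. The following is a minimal sketch and not part of the analysis software: the function and constant names are hypothetical, the metric values are assumed to have been read from MetricsOutput.tsv by the caller, and the GENE_SCALED_MAD limit is only meaningful for real cell-free DNA as noted above.

```python
import operator
from typing import Dict, Mapping, Optional

_OPS = {"<=": operator.le, ">=": operator.ge}

# Guideline limits copied from the sample metrics listed above.
SAMPLE_GUIDELINES = {
    "CONTAMINATION_SCORE": ("<=", 1227.0),
    "MEDIAN_EXON_COVERAGE": (">=", 1300.0),
    "PCT_EXON_1000X": (">=", 80.0),
    "GENE_SCALED_MAD": ("<=", 0.059),
    "MEDIAN_BIN_COUNT_CNV_TARGET": (">=", 6.0),
}


def check_sample_metrics(metrics: Mapping[str, object]) -> Dict[str, Optional[bool]]:
    """Compare metric values with the guideline limits.

    Returns True (within guideline), False (outside guideline), or
    None (metric missing or reported as NA) for each metric.
    """
    results: Dict[str, Optional[bool]] = {}
    for name, (op, limit) in SAMPLE_GUIDELINES.items():
        raw = metrics.get(name)
        if raw is None or str(raw).strip().upper() == "NA":
            results[name] = None
        else:
            results[name] = _OPS[op](float(raw), limit)
    return results
```

Returning None for missing or NA values keeps "not evaluated" distinct from "failed", which matters when a metric is legitimately absent.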
DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.
TOTAL_PF_READS (count)
Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
Primarily driven by data output of the sequencer, quality of the library, and balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are not, this may suggest non-specific enrichment.
Low values for all samples indicate a poor quality run with possible low cluster numbers or low numbers of Q30 and PF%.
A low value for an individual sample indicates poor pooling of this library into the final pool.
MEAN_FAMILY_SIZE (count)
A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families.
The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.
As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good quality DNA we usually observe values <10.
UMI family size = 1 is not ideal as it is harder to correct for errors.
UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
MEDIAN_TARGET_COVERAGE (count)
Median depth across all the unique loci occurring in all regions of the manifest file.
Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.
PCT_CHIMERIC_READS (%)
Chimeric reads occur when one sequencing read aligns to two distinct portions of the genome with little or no overlap. Metric is proportion of total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
While this can be indicative of large-scale structural rearrangement of the genome, values that are elevated above the usual baseline may indicate enrichment probe contamination during library preparation. A suggested USL for this metric is 8%; samples with higher values might see decreased performance in small variant calling and TMB scores.
PCT_EXON_500X (%)
Percentage of exon bases with 500X fragment coverage. Calculated against all regions in manifest containing _exon in name.
Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.
PCT_EXON_1500X (%)
Percentage of exon bases with 1500X fragment coverage. Calculated against all regions in manifest containing _exon in name.
Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.
PCT_READ_ENRICHMENT (%)
Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest.
Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.
PCT_USABLE_UMI_READS (%)
Percentage of reads that have valid UMI sequences associated with them.
As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.
MEAN_TARGET_COVERAGE (count)
Mean depth across all the unique loci defined in the manifest file.
Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.
PCT_ALIGNED_READS (%)
Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.
PCT_CONTAMINATION_EST (%)
This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed by taking 2.0 times the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 - allele frequency if it is greater than or equal to 0.5.
If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning, as it will be driven by statistical noise (e.g. the few variants that naturally fall outside an expected interval around 0.5 due to random chance).
High contamination estimates may be due to any of the following:
Inter-sample contamination caused by mixing of samples during extraction or library preparation.
Intra-sample contamination, due to mixing of clonally different cell populations during extraction.
Large scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.
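The PCT_CONTAMINATION_EST calculation described above (2.0 times the average of the adjusted allele frequencies) can be illustrated with a short worked sketch. The function name is hypothetical, the input is assumed to be the allele frequencies of the variants already selected by the software (the selection step is not reproduced here), and the result is returned as a fraction; whether the reported metric is this value multiplied by 100 is an assumption to verify.

```python
from typing import Optional, Sequence


def contamination_estimate(allele_freqs: Sequence[float]) -> Optional[float]:
    """Return 2.0 * mean(adjusted allele frequency), as a fraction.

    Adjusted AF is the AF itself when it is below 0.5, and 1 - AF
    when it is 0.5 or higher. Returns None for an empty selection.
    """
    if not allele_freqs:
        return None
    adjusted = [af if af < 0.5 else 1.0 - af for af in allele_freqs]
    return 2.0 * sum(adjusted) / len(adjusted)
```

For selected variants with allele frequencies 0.02, 0.98, and 0.04, the adjusted values are 0.02, 0.02, and 0.04, giving an estimate of about 0.053.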
PCT_TARGET_0.4X_MEAN (%)
Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).
Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.
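As a worked illustration of the PCT_TARGET_0.4X_MEAN uniformity metric, the sketch below computes the percentage of target positions whose depth exceeds 0.4 times the mean depth. It is not the software's implementation: the function name is hypothetical, and it assumes one depth value per target locus is already available.

```python
from typing import Optional, Sequence


def pct_target_above_fraction_of_mean(depths: Sequence[float],
                                      fraction: float = 0.4) -> Optional[float]:
    """Percentage of loci with depth > fraction * mean depth."""
    if not depths:
        return None
    mean_depth = sum(depths) / len(depths)
    above = sum(1 for d in depths if d > fraction * mean_depth)
    return 100.0 * above / len(depths)
```

With depths of 100, 100, 100, 10, and 90, the mean is 80 and the cutoff is 32, so four of five loci pass and the result is 80.0.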
PCT_TARGET_500X (%)
Percentage of target bases with 500X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_1000X (%)
Percentage of target bases with 1000X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_1500X (%)
Percentage of target bases with 1500X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_DUPLEXFAMILIES (%)
Percent of collapsed reads that are duplex (e.g. composed of original forward strand and original reverse strand reads). Number of families that are merged as duplex over total number of families.
Higher is more desirable; lower family depth leads to lower percent duplex families. If low, check for under clustering or chemistry concerns.
MEDIAN_INSERT_SIZE (bp)
Median fragment size for sample.
A low median insert size could be a sign of low sample quality or degradation.
MAX_SOMATIC_AF
Max somatic allele frequency of a variant; a proxy for tumor fraction. The TMB step flags the variants by potential somatic status using database, VAF, and clonal hematopoiesis information. The remaining variants are ranked by variant allele frequency in descending order. The variant allele frequency of the first COSMIC hotspot (count >50) or confident somatic variant (having significantly shorter fragment size) is reported as the MaxSomaticVaf for each sample. If no such variant exists, the 4th variant is reported.
This metric is driven by sample tumor fraction.
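The ranking rule in the MAX_SOMATIC_AF description can be sketched as follows. This is an interpretation for illustration only, not the TMB module's code: the function name and the dictionary keys `vaf`, `cosmic_count`, and `confident_somatic` are hypothetical, the input is assumed to contain only the variants that remain after the flagging step, and returning None when fewer than four variants remain is an assumption because the source does not describe that case.

```python
from typing import List, Mapping, Optional


def max_somatic_af(candidates: List[Mapping[str, object]]) -> Optional[float]:
    """Pick the reported VAF from the remaining candidate variants.

    Variants are ranked by VAF, highest first. The first one that is a
    COSMIC hotspot (count > 50) or a confident somatic variant wins;
    otherwise the 4th-ranked variant is used.
    """
    ranked = sorted(candidates, key=lambda v: float(v["vaf"]), reverse=True)
    for variant in ranked:
        is_hotspot = int(variant.get("cosmic_count", 0)) > 50
        is_confident = bool(variant.get("confident_somatic", False))
        if is_hotspot or is_confident:
            return float(variant["vaf"])
    if len(ranked) >= 4:
        return float(ranked[3]["vaf"])
    return None  # assumption: behavior with fewer than 4 variants is not documented
```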
PCT_SOFT_CLIPPED_BASES (%)
Percentage of bases that were not used for alignment but retained as part of the alignment file.
Soft clipped reads are used as a part of the downstream analysis for small variants calling. A higher-than-expected number could indicate a low-quality enrichment step.
PCT_Q30_BASES (%)
Average percentage of bases ≥ Q30. A prediction of the probability of an incorrect base call (Q‑score).
An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering.
The following table lists the genes that have associated block listed sites. For the exact location of the block listed site, contact Illumina Technical Support.
ABL1
5
FGFR2
144
PAX7
5
AKT2
5
FGFR3
1
PAX8
275
AKT3
20
FGFR4
36
PBRM1
3
ALK
90
FLCN
2
PDCD1
2
ANKRD11
6
FLI1
36
PDGFRA
5
ANKRD26
9
FLT1
91
PDGFRB
2
AR
81
FLT4
3
PDK1
1
ARID1A
40
FOXA1
48
PDPK1
6
ARID1B
87
FOXL2
4
PGR
5
ARID2
1
FOXO1
2
PHF6
2
ASXL1
3
FOXP1
3
PHOX2B
15
ASXL2
5
FUBP1
1
PIK3C2G
2
ATM
2
GATA4
6
PIK3CA
18
ATR
3
GATA6
12
PIK3CB
42
ATRX
17
GEN1
1
PIK3R1
6
AURKA
1
GID4
3
PIK3R2
2
AXIN2
4
GNAQ
4
PLCG2
3
AXL
74
GNAS
11
PLK2
2
BBC3
2
GPR124
3
PMAIP1
7
BCL10
2
GRM3
1
PMS2
1
BCL2L11
16
H3F3A
1
POLE
3
BCOR
2
H3F3C
2
PPARG
446
BCORL1
1
HGF
1
PRDM1
1
BCR
64
HIST1H1C
2
PRKCI
2
BIRC3
1
HLA-A
72
PRKDC
5
BLM
4
HNF1A
2
PTCH1
13
BMPR1A
4
HNRNPK
9
PTEN
41
BRAF
283
HOXB13
1
PTPRS
14
BRCA1
49
HSP90AA1
4
PTPRT
2
BRCA2
21
ICOSLG
6
QKI
2
BRD4
16
IFNGR1
2
RAD21
1
CARD11
4
iIndel
91
RAD50
5
CASP8
2
INHBA
4
RAD51
18
CBL
8
INPP4A
1
RAD51B
8
CCND1
25
INPP4B
1
RAF1
98
CCND3
49
IRS1
9
RANBP2
12
CCNE1
72
IRS2
19
RARA
2
CD74
50
iSNP
4
RASA1
1
CDH1
4
JAK2
4
RB1
5
CDK12
3
JUN
7
RBM10
13
CDK4
46
KAT6A
5
RECQL4
3
CDK6
13
KDM5A
7
REL
3
CDK8
4
KDM5C
2
RET
3
CDKN2B
2
KDM6A
2
RFWD2
22
CEBPA
12
KDR
1
RICTOR
1
CHD2
5
KIF5B
7
ROS1
287
CHD4
12
KIT
5
RPS6KA4
3
CHEK1
75
KMT2B
51
RPS6KB1
109
CHEK2
64
KMT2C
118
RUNX1
3
chrY
93
KMT2D
108
SDHA
18
CIC
2
KRAS
44
SDHB
3
CREBBP
4
LAMP1
64
SDHD
17
CSNK1A1
4
LATS1
1
SETBP1
7
CTNNB1
1
LATS2
4
SETD2
26
CUL3
1
LoH
85
SF3B1
1
CUX1
9
LRP1B
3
SH2B3
4
DAXX
5
LZTR1
1
SH2D1A
2
DDR2
1
MAGI2
2
SLIT2
1
DDX41
1
MALT1
4
SLX4
2
DIS3
2
MAP2K2
1
SMARCA4
4
DNAJB1
6
MAP2K4
5
SMC1A
1
DNMT1
1
MAP3K1
8
SMC3
8
DNMT3A
4
MAP3K14
2
SMO
2
DOT1L
2
MAP3K4
10
SOX10
7
E2F3
70
MAPK1
6
SOX17
1
EGFR
304
MAPK3
6
SOX9
14
EIF4E
12
MCL1
1
SPEN
4
EML4
9
MDC1
23
STAG1
5
EP300
1
MDM2
53
STAG2
2
ERBB2
14
MDM4
67
STAT4
1
ERBB3
62
MED12
28
STAT5A
1
ERCC1
53
MGA
6
STAT5B
4
ERCC2
57
MLL
9
SUFU
5
ERCC3
4
MLLT3
18
SUZ12
9
ERCC5
4
MRE11A
5
TAF1
9
ERG
2
MSH3
10
TBX3
1
ESR1
32
MSH6
2
TCEB1
1
ETS1
45
MSI
148
TCF3
2
ETV1
862
MST1
18
TCF7L2
6
ETV4
502
MYB
402
TERT
2
ETV5
11
MYC
78
TET1
1
ETV6
187
MYCL1
28
TET2
23
EWSR1
364
MYCN
69
TFE3
299
EZH2
2
MYOD1
3
TFRC
33
FANCA
1
NAB2
10
TGFBR1
6
FANCD2
11
NCOA3
28
TGFBR2
2
FANCG
10
NCOR1
9
TMEM127
5
FANCI
1
NF1
3
TMPRSS2
236
FANCL
1
NKX2-1
4
TOP2A
1
FAT1
2
NOTCH1
4
TP53
22
FBXW7
4
NOTCH3
7
TRAF7
4
FGF1
25
NOTCH4
9
TSC1
4
FGF10
17
NPM1
5
TSC2
1
FGF14
15
NRAS
29
U2AF1
1
FGF19
102
NRG1
47
VEGFA
7
FGF2
26
NTRK1
134
WISP3
2
FGF23
38
NTRK2
145
WT1
10
FGF3
60
NTRK3
13
XIAP
1
FGF4
25
NUTM1
134
XPO1
2
FGF5
14
PAK1
68
XRCC2
1
FGF6
9
PAK3
8
YAP1
1
FGF7
9
PALB2
1
ZBTB7A
11
FGF8
30
PARK2
23
ZFHX3
56
FGF9
21
PARP1
2
ZNF703
7
FGFR1
26
PAX3
156
ZRSR2
2