The MetricsOutput.tsv file contains the following quality control metrics for all samples:
DNA library QC metrics for:
Small variant calling
TMB
MSI
CNV
[HRD] GIS
RNA library QC metrics
Run QC metrics, analysis status, and contamination
This TSV file also includes expanded DNA library QC metrics per sample, based on total reads, collapsed reads, chimeric reads, and on-target reads. Analysis using RNA samples also produces RNA library QC metrics and expanded RNA library QC metrics per sample based on total reads and coverage.
The MetricsOutput.tsv file is a final combined metrics report with sample status, key analysis metrics, and metadata. Sample metrics within the report include suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run.
For troubleshooting information, refer to Troubleshooting.
DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.
TOTAL_PF_READS (count)
Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
Primarily driven by the data output of the sequencer, the quality of the library, and the balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are not, this may suggest non-specific enrichment.
Low values for all samples indicate a poor quality run with possible low cluster numbers or low numbers of Q30 and PF%.
A low value for an individual sample indicates poor pooling of this library into the final pool.
MEAN_FAMILY_SIZE (count)
A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families.
The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.
As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good quality DNA we usually observe values <10.
UMI family size = 1 is not ideal as it is harder to correct for errors.
UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
MEDIAN_TARGET_COVERAGE (count)
Median depth across all the unique loci occurring in all regions of the manifest file.
Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.
PCT_CHIMERIC_READS (%)
Chimeric reads occur when one sequencing read aligns to two distinct portions of the genome with little or no overlap. The metric is the proportion of chimeric reads among the total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
While this can be indicative of large-scale structural rearrangement of the genome, values that are elevated above the usual baseline may indicate enrichment probe contamination during library preparation. A suggested USL for this metric is 8% (samples with higher values might see decreased performance in small variant calling and TMB scores).
PCT_EXON_100X (%)
Percentage of exon bases with 100X fragment coverage. Calculated against all regions in manifest containing _exon in name.
Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.
PCT_READ_ENRICHMENT (%)
Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest.
Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.
PCT_USABLE_UMI_READS (%)
Percentage of reads that have valid UMI sequences associated with them.
As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.
MEAN_TARGET_COVERAGE (count)
Mean depth across all the unique loci defined in the manifest file.
Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.
PCT_ALIGNED_READS (%)
Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.
PCT_CONTAMINATION_EST (%)
This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed as 2.0 times the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 minus the allele frequency if it is greater than or equal to 0.5.
If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning because it is driven by statistical noise (e.g. the few variants that naturally fall outside an expected interval around 0.5 due to random chance).
High contamination estimates may be due to any of the following:
Inter-sample contamination caused by mixing of samples during extraction or library preparation.
Intra-sample contamination, due to mixing of clonally different cell populations during extraction.
Large-scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.
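The contamination estimate described above (2.0 times the mean adjusted allele frequency) can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function names are hypothetical, the input is assumed to be the allele frequencies of variants that have already been selected (the selection criteria are not described here), and the result is a fraction rather than a percentage.

```python
from typing import Iterable


def adjusted_allele_frequency(vaf: float) -> float:
    """Fold an allele frequency around 0.5: keep it if < 0.5, else use 1 - vaf."""
    return vaf if vaf < 0.5 else 1.0 - vaf


def contamination_estimate(selected_vafs: Iterable[float]) -> float:
    """Return 2.0 * mean(adjusted allele frequency) over the selected variants.

    Assumption: `selected_vafs` already contains only the variants chosen for
    the estimate; how they are chosen is outside the scope of this sketch.
    """
    adjusted = [adjusted_allele_frequency(v) for v in selected_vafs]
    if not adjusted:
        raise ValueError("at least one selected variant is required")
    return 2.0 * sum(adjusted) / len(adjusted)


# Illustrative values only (not real data).
estimate = contamination_estimate([0.02, 0.98, 0.5])
print(f"{estimate:.2f}")
```

Multiply the returned fraction by 100 to compare it with a value reported in percent.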
PCT_TARGET_0.4X_MEAN (%)
Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).
Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.
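As a rough illustration of this uniformity metric, the sketch below computes the percentage of target positions whose depth exceeds 0.4 times the mean depth. It assumes a hypothetical list of per-position depths as input; the pipeline's own calculation may differ in detail (for example, in how positions are defined or deduplicated).

```python
from typing import Sequence


def pct_target_above_fraction_of_mean(
    depths: Sequence[float], fraction: float = 0.4
) -> float:
    """Percentage of positions with depth > fraction * mean depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    threshold = fraction * mean_depth
    above = sum(1 for d in depths if d > threshold)
    return 100.0 * above / len(depths)


# Illustrative depths only: mean is 100, so the threshold is 40.
print(pct_target_above_fraction_of_mean([100, 100, 100, 10, 190]))
```

A value close to 100 indicates even coverage; a falling value over successive runs points to a growing share of poorly covered positions.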
PCT_TARGET_50X (%)
Percentage of target bases with 50X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_100X (%)
Percentage of target bases with 100X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_250X (%)
Percentage of target bases with 250X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
ALLELE DOSAGE_RATIO (with HRD add-on)
Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-Allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and higher amount of noise (or lower coverage) will have higher Allele Dosage Ratio. The upper limit of the score is 50, therefore any sample with 50 Allele Dosage Ratio can be assumed to have tumor fraction close to zero and typically has a GIS = 0.
MEDIAN TARGET HRD (with HRD add-on)
Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap.
Refer to RNA Analysis Methods for more information.
The splice variant VCF contains all candidate splice variants targeted by the analysis panel identified by the RNA analysis pipeline. You can apply the following filters for each variant call:
LowQ
Splice variant score < passing quality score threshold value of 1.
PASS
Splice variant score ≥ passing quality score threshold value of 1.
LowUniqueAlignments
All splice junction supporting reads map to a unique genomic interval near at least one of the two splice sites.
Refer to the headers in the output for more information about each column.
If available, each splice variant is annotated using the Illumina Annotation Engine. The following information is captured in the JSON:
HGNC Gene
Transcript
Exons
Introns
Canonical
Consequence
The all fusions CSV file contains all candidate fusions identified by the DRAGEN RNA pipeline. Two output columns in the file describe the candidate fusions: Filter and KeepFusion.
The following table describes the semicolon-separated output found in the Filter column. Each output is either a confidence filter or information only, as indicated. If none of the confidence filters are triggered, the Filter column contains the output PASS; otherwise, it contains the output FAIL.
Filter Column Output
DOUBLE_BROKEN_EXON
Confidence filter
If both breakpoints are distant from annotated exon boundaries, the number of supporting reads does not satisfy a high threshold requirement (≥ 10 supporting reads).
LOW_MAPQ
Confidence filter
All fusion supporting read alignments at either of the breakpoints have MAPQ < 20.
LOW_UNIQUE_ALIGNMENTS
Confidence filter
All fusion supporting read alignments map to a unique genomic interval at either of the breakpoints.
LOW_SCORE
Confidence filter
The fusion candidate has a low probabilistic score as determined by the features of the candidate.
MIN_SUPPORT
Confidence filter
The fusion candidate has very few fusion supporting reads (< 5 supporting read pairs).
READ_THROUGH
Confidence filter
The breakpoints are cis neighbors (< 200 kbp) on the reference genome.
ANCHOR_SUPPORT
Information only
Read alignments of fusion supporting reads are not long enough (12 bp) at either of the two breakpoints.
HOMOLOGOUS
Information only
The candidate is likely a false candidate generated because the two genes involved have high gene homology.
LOW_ALT_TO_REF
Information only
The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints.
LOW_GENE_COVERAGE
Information only
Each breakpoint in an enriched gene has fewer than 125 bp with nonzero read coverage.
NO_COMPLETE_SPLIT_READS
Confidence filter
For every fusion-supporting split read, the total number of aligned bases across the two breakpoints is less than 60% of the read length.
UNENRICHED_GENE
Confidence filter
Neither of the two parent genes is in the enrichment panel.
The KeepFusion column of the output has a value of TRUE when none of the confidence filters are triggered.
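The relationship between the Filter column and KeepFusion can be sketched as below. This is an illustration under stated assumptions: the confidence and information-only labels are taken from the table above, the Filter cell is assumed to be a semicolon-separated string that may also contain PASS or FAIL, and the function names are hypothetical.

```python
# Filter outputs labeled "Confidence filter" in the table above.
CONFIDENCE_FILTERS = frozenset({
    "DOUBLE_BROKEN_EXON",
    "LOW_MAPQ",
    "LOW_UNIQUE_ALIGNMENTS",
    "LOW_SCORE",
    "MIN_SUPPORT",
    "READ_THROUGH",
    "NO_COMPLETE_SPLIT_READS",
    "UNENRICHED_GENE",
})


def triggered_confidence_filters(filter_field: str) -> list:
    """Return the confidence filters present in a semicolon-separated cell."""
    tokens = [t.strip() for t in filter_field.split(";") if t.strip()]
    return [t for t in tokens if t in CONFIDENCE_FILTERS]


def keep_fusion(filter_field: str) -> bool:
    """True when no confidence filter is triggered (information-only is ignored)."""
    return not triggered_confidence_filters(filter_field)


# Hypothetical cell contents, for illustration only.
print(keep_fusion("PASS;LOW_ALT_TO_REF"))
print(keep_fusion("FAIL;LOW_MAPQ;HOMOLOGOUS"))
```

In practice the KeepFusion column already carries this result, so a check like this is mainly useful for understanding why a candidate was dropped.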
Refer to the headers in the output for more information about each column.
Fusion Columns
Gene A
The gene associated with the A side of the fusion. A semicolon delimited list is used for multiple genes.
Gene B
The gene associated with the B side of the fusion. A semicolon delimited list is used for multiple genes.
Gene A Breakpoint
[Information only] The chromosome and offset of the Gene A side of the fusion.
Gene A Location
Location of the breakpoint within Gene A: IntactExon (matches an exon boundary), BrokenExon (inside an exon), Intronic (within an intron), or Intergenic (no gene overlap; currently excluded). If multiple genes are in Gene A, a semicolon-separated list of locations is given. This column is used internally to identify genes to report when a breakpoint occurs in a region overlapping multiple genes. Occasionally, additional values are listed for genes that were excluded from the GeneA list.
Gene A Sense
Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene A. If multiple genes are in Gene A, then semicolon separated list of bools.
Gene A Strand
Strand of Gene A, + for forward, - for reverse.
Gene B Breakpoint
[Information only] The chromosome and offset of the Gene B side of the fusion.
Gene B Location
Location of the breakpoint within Gene B: IntactExon (matches an exon boundary), BrokenExon (inside an exon), Intronic (within an intron), or Intergenic (no gene overlap; currently excluded). If multiple genes are in Gene B, a semicolon-separated list of locations is given. This column is used internally to identify genes to report when a breakpoint occurs in a region overlapping multiple genes. Occasionally, additional values are listed for genes that were excluded from the GeneB list.
Gene B Sense
Boolean indicating whether left/right breakpoint order suggests fusion transcript is in the same sense of Gene B. If multiple genes are in Gene B, then semicolon separated list of bools.
Gene B Strand
Strand of Gene B, + for forward, - for reverse.
Score
The quality of fusion as determined by DRAGEN server.
Filter
The filter associated with the fusion as determined by the respective caller. Results from different callers are not equivalent.
Ref A Dedup
Gene A uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included.
Ref B Dedup
Gene B uniquely mapping reads paired across or split by the junction. Does not support fusion. Duplicate reads are not included.
Alt Split Dedup
Uniquely mapping reads split by the junction. Supports fusion. Duplicate reads are not included.
Alt Pair Dedup
Uniquely mapping reads paired across junction. Supports fusion. Duplicate reads are not included.
KeepFusion
The determination whether the fusion should be kept or dropped from the list of fusions.
Fusion Directionality Known
Whether fusion directionality is known and indicated by gene order.
When using Microsoft Excel to view this report, gene names that can be interpreted as dates (such as MARCH1) are automatically converted to dd-mm format (1 Mar) by Excel.
The following are fusion allow list genes:
ABL1
AKT3
ALK
AR
AXL
BCL2
BRAF
BRCA1
BRCA2
CDK4
CSF1R
EGFR
EML4
ERBB2
ERG
ESR1
ETS1
ETV1
ETV4
ETV5
EWSR1
FGFR1
FGFR2
FGFR3
FGFR4
FLI1
FLT1
FLT3
JAK2
KDR
KIF5B
KIT
KMT2A
MET
MLLT3
MSH2
MYC
NOTCH1
NOTCH2
NOTCH3
NRG1
NTRK1
NTRK2
NTRK3
PAX3
PAX7
PDGFRA
PDGFRB
PIK3CA
PPARG
RAF1
RET
ROS1
RPS6KB1
TMPRSS2
The block list represents high noise regions in the panel where false positive variant calls are likely produced. As a result, all positions in the gVCF are marked as Filter=excluded_regions to indicate variant call results are not reliable in such regions.
The block list includes the following genes:
HLA-A
HLA-B
HLA-C
KMT2B
KMT2C
KMT2D
chrY
Any position with VAF 1% occurrence in six or more of the 60 baseline samples.
The software processes sequencing data to perform quality control, detect variants, determine tumor mutational burden (TMB), microsatellite instability (MSI) status, and genomic instability score (GIS), and report results. The following sections describe the analysis methods used in DRAGEN TruSight Oncology 500 Analysis Software.
DRAGEN TruSight Oncology 500 Analysis Software uses the following workflows to analyze sequencing data.
FASTQ Generation
DNA Analysis
DNA Alignment and Realignment
Read Collapsing
Indel Realignment and Read Stitching
Small Variant Calling
Small Variant Filtering
Copy Number Variant (CNV) Calling
Phased Variant Calling
Variant Merging
Annotation
Tumor Mutational Burden (TMB) Scoring
Microsatellite Instability (MSI) Status
Contamination Detection
RNA Analysis
Downsampling
Read Trimming
Alignment
Duplicate Marking
Fusion Calling
RNA Fusion Filtering
Splice Variant Calling
Annotation
Fusion Merging
Quality Control
Run QC
DNA Sample QC
RNA Sample QC
RNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.
PCT_CHIMERIC_READS
Percentage of reads that are aligned as two segments which map to nonconsecutive regions in the genome.
%
PCT_ON_TARGET_READS
Percentage of reads that cross any part of the target region versus total reads. A read that partially maps to a target region is counted as on target.
%
SCALED_MEDIAN_GENE_COVERAGE
Median of median base coverage of genes scaled by length. An indication of median coverage depth of genes in the panel.
Count
TOTAL_PF_READS
Total number of reads passing filter.
Count
GENE_MEDIAN_COVERAGE
The median coverage depth of all genes in the panel.
Count
GENE_ABOVE_MEDIAN_CUTOFF
Number of genes above the median coverage cutoff.
Count
PER_GENE_MEDIAN_COVERAGE
Median deduped coverage across each gene (available in Logs_Intermediates only)
Count
Sequencing data stored in BCL format are demultiplexed through a process that uses the index sequences unique to each sample to assign clusters to the library from which they originated. Each cluster contains two indexes (i7 and i5 sequences, one at each end of the library fragment). The combination of those index sequences is used to demultiplex the pooled libraries.
After demultiplexing, this process generates FASTQ files, which contain the sequencing reads for each individual sample library and the associated quality scores for each base call, excluding reads from any clusters that did not pass filter.
Refer to RNA Output for more information.
Each sample is downsampled to 30 million RNA reads. This number represents the total number of single reads (eg, R1 + R2, from all lanes). When using the recommended sequencing configurations or plexity, the samples can have fewer reads than the downsampling limit. In these cases, the FASTQ files are left as-is.
Reads are trimmed to 76 base pairs for further processing.
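The downsampling limit and trim length above can be illustrated with a small sketch. The constants come from the text; everything else is an assumption for illustration, including the function names and the choice to keep the first 76 bases of each read. How the software actually selects reads when downsampling is not described here and is not modeled.

```python
DOWNSAMPLE_LIMIT = 30_000_000  # total single reads (R1 + R2, all lanes)
TRIM_LENGTH = 76               # bases kept per read


def needs_downsampling(total_single_reads: int, limit: int = DOWNSAMPLE_LIMIT) -> bool:
    """True only when a sample has more reads than the downsampling limit."""
    return total_single_reads > limit


def trim_read(sequence: str, qualities: str, length: int = TRIM_LENGTH):
    """Keep the first `length` bases and their quality characters.

    Assumption: trimming removes bases from the end of the read.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality strings must have equal length")
    return sequence[:length], qualities[:length]


# Illustrative 101-base read.
seq, qual = trim_read("A" * 101, "I" * 101)
print(len(seq), len(qual), needs_downsampling(25_000_000))
```

Reads that are already 76 bases or shorter pass through unchanged.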
RNA alignment and fusion detection uses trimmed reads in FASTQ format as input. The outputs include a BAM file that contains duplicate-marked read alignments, an SJ.out.tab file that contains unannotated splice junctions, and a CSV file that contains fusion candidates.
DRAGEN aligns RNA reads in a transcript-aware mode using the human hg19 genome containing unplaced contigs (ie, chrUn_gl regions) and uses GENCODEv19 transcript annotations to identify splice sites. DRAGEN identifies and marks duplicate read alignments using start and end coordinates of alignments, which are adjusted for soft clipped reads.
Fusion and splice variant calling only use deduped fragments to score variants. DRAGEN identifies fusion candidates using chimeric split read alignments (pairs of primary and supplementary alignments) against multiple genes. DRAGEN scores and filters reads based on the various features of each candidate such as the number of supporting reads, mapping quality of supporting reads, and sequence homology between parent genes.
The DRAGEN RNA Fusion caller identifies gene fusions by searching for chimeric reads spanning two distinct parent genes. Based on the chimeric reads, DRAGEN first creates a list of fusion candidates, then scores the candidates to report the list of high confidence fusion calls from the candidate pool.
DRAGEN RNA Fusion caller performs the following steps:
Generates fusion candidates based on split read alignments.
Recruits additional evidence from fusion supporting discordant read pairs and soft-clipped reads.
Computes fusion candidate features such as gene coverage, read mapping quality, alternate allele frequency, gene homology, alignment anchor length, and breakpoint distance from exon boundary.
Scores and ranks the fusion candidates using a logistic regression model.
Selects a final list of fusion calls based on score and other filters including number of supporting reads, unique read alignment count, read through transcripts, and fusions matching the enriched regions.
RNA splice variant calling is performed for RNA sample libraries. Candidate splice variants (junctions) from RNA Alignment are compared against a database of known transcripts and a splice variant baseline of non-tumor junctions generated from a set of normal FFPE samples from different tissue types. Any splice variants that match the database or baseline are filtered out unless they are in a set of junctions with known oncological function. If there is sufficient read support, the candidate splice variant is kept. This process also identifies candidate RNA fusions.
Fusions identified during RNA fusion calling are merged with fusions from proximal genes identified during RNA splice variant calling. These are then annotated with gene symbols or names with respect to a static database of transcripts (GENCODE Release 19). The result of this process is a set of fusion calls that are eligible for reporting.
The Illumina Annotation Engine annotates detected RNA splice variant calls with transcript-level changes (eg, affected exons in the transcript of a gene) with respect to RefSeq. This RefSeq database is the same RefSeq database used by the small variant annotation process.
When the analysis run completes, the DRAGEN TruSight Oncology 500 Analysis Software generates an analysis output folder in a specified location.
To view analysis output, navigate to the analysis output folder and select the files that you want to view.
Single output folder structure is as follows.
Logs_Intermediates
AdditionalSarjMetrics—Contains per pair ID calculations to support the PCT_TARGET_250X metric.
Annotation—Contains outputs for small variant annotation.
Subfolders per sample ID—Contains the aligned small variants JSON.
CombinedVariantOutput
Subfolders per pair ID—Contains the combined variant output TSV files.
A combined output log file.
Contamination
Subfolders per DNA sample ID—Contains the contamination metrics JSON file and output logs.
DnaDragenCaller
Subfolders per sample ID—Contains the aligned BAM and index files, small variant VCF and gVCF, copy number variant VCF, MSI JSON, and QC outputs in CSV format.
DnaDragenExonCNVCaller
Subfolders per DNA sample ID—Contains the exon-level CNV JSON, the supporting calculation, and the QC files.
DnaFastqValidation—Contains the FASTQ validation output log for DNA samples.
FastqDownsample
Subfolders per RNA sample ID—Contains FASTQ files and output logs.
FastqDownsample output
FastqGeneration
Gis—Contains GIS-related files for HRD samples.
Subfolders per HRD sample ID—Contains the GIS JSON, the supporting calculation, and the QC files.
Also contains the annotated CNV VCF and gene level TSV file with absolute copy number and minor copy number information
LrAnnotation
Subfolders per DNA sample ID—Contains the annotated exon-level CNV JSON.
LrCalculator
Subfolders per DNA sample ID—Contains the exon-level CNV VCF.
MetricsOutput
Subfolders per pair ID—Contains the metrics output TSV files.
A combined output log file.
ResourceVerification—Contains the resource file checksum verification logs.
RnaAnnotation
Subfolders per RNA sample ID—Contains the annotated splice variant JSON.
RnaDragenCaller
Subfolders per sample ID—Contains the aligned BAM, fusion candidates CSV and QC outputs in CSV format.
RnaFastqValidation—Contains the FASTQ validation output log for RNA samples.
RnaFusion
Subfolders per RNA sample ID—Contains the All Fusions CSV and Fusion Processor logs.
RnaQcMetrics
Subfolders per RNA sample ID—Contains the RNA QC metrics JSON.
RnaSpliceVariantCalling
Subfolders per RNA sample ID—Contains the splice variants VCF.
Run QC—Contains the Run QC metrics JSON, Intermediate Run QC metrics JSON, and log file.
SampleAnalysisResults
Subfolders per pair ID—Contains the Sample Analysis Results JSON and detailed log file.
SampleSheetValidation—Contains the Intermediate sample sheet and validation log.
Tmb
Subfolders per DNA sample ID—Contains the TMB metrics CSV, TMB trace TSV, and related files and logs.
passing_sample_steps.json—Contains the steps passed for each sample ID.
pipeline_trace.txt—Contains a summary and troubleshooting file that lists each Nextflow task executed and the status (for example, COMPLETED or FAILED).
run.log—Contains a complete trace-level log file describing the Nextflow pipeline execution.
run_report.html—Contains high-level run statistics (performance, usage, etc.).
run_timeline.html—Contains timeline-related information about the analysis run.
Results
Metrics Output TSV (all pair IDs)
Pair ID—The following outputs are produced for each sample:
Combined Variant Output TSV
Metrics Output TSV
TMB Trace TSV
Small Variant Genome VCF
Small Variant Genome Annotated JSON
Copy Number Variant VCF
GIS JSON
MSI JSON
Large Rearrangements CNV VCF
Large Rearrangements CNV Annotated JSON
All Fusion CSV
Splice Variant VCF
Splice Variant Annotated JSON
Multiple output folder structure is as follows.
Demultiplex Output
A Logs_Intermediates folder containing FASTQ files per sample.
Node(X) Output—The following outputs are produced for each node used:
A Logs_Intermediates folder containing step specific and component specific outputs and logs for every step/component run in the analysis pipeline for the sample run on the node.
A Results folder containing results only for the sample run on the node.
Gathered Output
A Logs_Intermediates folder containing step-specific and component-specific outputs and logs for every step or component run in each analysis pipeline on every node. This folder contains outputs for all samples and pairs run across all nodes in the analysis.
A Results folder containing results for all samples and pairs run across all nodes. Results are organized by Pair_ID, then Sample_ID. This folder also contains summary files with information on all samples.
This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed. The same output folder structure and content exist in ICA and BaseSpace Sequence Hub.
Run ID
TSO500_Nextflow_logs
_manifest.json
Results
_tags.json
Logs_intermediates
Errors—This folder is only present when analysis fails
The TSO_500_Nextflow_Logs folder provides information related to the execution of the pipeline on ICA as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains files used to execute parts of the workflow on different nodes as well as records of the Nextflow execution on those nodes.
TSO_500_Nextflow_Logs
_manifest.json
Contains the aggregated MetricsOutput.tsv file at the root level. Additionally, the Results folder contains a subfolder for each pair ID.
Results
MetricsOutput.tsv
Sample_1
Sample_2
Sample_<#>
_tags.json
The Results subfolder contains the following files:
Results
  MetricsOutput.tsv
  <Pair_id>
    CombinedVariantOutput.tsv
    <SampleName>_MetricsOutput.tsv
    <DNA_Sample_id>
      CopyNumberVariants.vcf
      DNAMergedSmallVariants_Annotated.json.gz
      MergedSmallVariants.genome.vcf
      MergedSmallVariants.vcf
      microstat_output.json
      TMB_Trace.tsv
    <RNA_Sample_id>
      AllFusions.csv
      RNA_Annotated.json.gz
      SpliceVariants.vcf
Contains folders for each submodule in the DRAGEN TSO 500 on ICA pipeline. The folders contain a copy of all the relevant files required to create the metric output files and report files, as well as the combined log files at the root level and subfolders for each sample.
Logs_intermediates
DnaDragenCaller
AdditionalSarjMetrics
CombinedVariantOutput
FastqGeneration
MetricsOutput
DnaDragenExonCnvCaller
DnaFastqValidation
Gis
Tmb
SampleAnalysisResults
SampleSheetValidation
passing_sample_steps.json
RnaFusion
Contamination
Annotation
RnaAnnotation
RnaDragenCaller
RnaSpliceVariantCalling
RunQc
FastqDownsample
PassingSampleSteps
ResourceVerification
LrCalculator
LrAnnotation
RnaQcMetrics
RnaFastqValidation
Contains Errors.tsv. This file contains the summary of all the errors encountered during pipeline execution.
Errors
Errors.tsv
Refer to DNA Analysis Methods for more information.
File name: {SAMPLE_ID}_hard-filtered.gvcf.gz
The small variant genome variant call file contains information on all candidate small variants evaluated, including complex variants up to 15 bp from phased variant calling across the entire TSO 500 panel.
The variant status is determined by the FILTER column in the genome VCF as follows.
PASS
PASS variants.
base_quality
Site filtered because median base quality of alt reads at this locus does not meet threshold.
filtered_reads
Site filtered because too large a fraction of reads has been filtered out.
fragment_length
Site filtered because absolute difference between the median fragment length of alt reads and median fragment length of ref reads at this locus exceeds threshold.
low_depth
Site filtered because the read depth is too low.
low_frac_info_reads
Site filtered because the fraction of informative reads is below threshold.
long_indel
Site filtered because the indel length is too long.
mapping_quality
Site filtered because median mapping quality of alt reads at this locus does not meet threshold.
multiallelic
Site filtered because more than two alt alleles pass tumor LOD.
no_reliable_supporting_read
Site filtered because no reliable supporting somatic read exists.
read_position
Site filtered because median of distances between start/end of read and this locus is below threshold.
str_contraction
Site filtered due to suspected PCR error where the alt allele is one repeat unit less than the reference.
too_few_supporting_reads
Site filtered because there are too few supporting reads in the tumor sample.
weak_evidence
Somatic variant score (SQ) does not meet threshold.
systematic_noise
Site filtered based on evidence of systematic noise in normal sample.
excluded_regions
Site overlaps with VC excluded regions bed.
File name: {SAMPLE_ID}_DNAVariants_Annotated.json.gz
The small variants annotated file provides variant annotation information for all nonreference positions from the genome VCF including pass and nonpass variants.
The TMB trace file provides comprehensive information on how the TMB value is calculated for a given sample. All passing small variants from the small variant filtering step are included in this file. To calculate the numerator of the TmbPerMb value in the TMB JSON, set the TSV file filter to use the IncludedInTMBNumerator with a value of True.
The TMB trace file is not intended to be used for variant inspections. The filtering statuses are exclusively set for TMB calculation purposes. Setting a filter does not translate into the classification of a variant as somatic or germline.
Chromosome
Chromosome
Position
Position of variant
RefCall
Reference base
AltCall
Alternate base
VAF
Variant allele frequency
Depth
Coverage of position
CytoBand
Cytoband of variant
GeneName
Name of gene if applicable. A semicolon delimited list is used for multiple genes.
VariantType
Type of the variant: SNV, insertion, deletion, MNV
CosmicIDs
Cosmic IDs, if multiple concatenated by “;”
MaxCosmicCount
Maximum Cosmic study count
AlleleCountsGnomadExome
Variant allele count in gnomAD exome database
AlleleCountsGnomadGenome
Variant allele count in gnomAD genome database
AlleleCounts1000Genomes
Variant allele count in 1000 genomes database
MaxDatabaseAlleleCounts
Maximum variant allele count over the three databases
GermlineFilterDatabase
TRUE if variant was filtered by the database filter
GermlineFilterProxi
TRUE if variant was filtered by the proxi filter
CodingVariant
TRUE if variant is in the coding region
Nonsynonymous
TRUE if variant has any transcript annotations with nonsynonymous consequences
IncludedInTMBNumerator
TRUE if variant is used in the TMB calculation
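Selecting the variants that contribute to the TMB numerator from a TMB trace file can be sketched with the standard library as follows. This is a minimal example under assumptions: the file is tab-separated with a header row, the column is named IncludedInTMBNumerator, and true values are written as True or TRUE. The function name and the sample rows are hypothetical.

```python
import csv
import io


def tmb_numerator_variants(tsv_stream, column: str = "IncludedInTMBNumerator"):
    """Return the rows whose `column` value is true (case-insensitive).

    `tsv_stream` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(tsv_stream, delimiter="\t")
    return [
        row for row in reader
        if (row.get(column) or "").strip().lower() == "true"
    ]


# Made-up rows for illustration; a real trace file has many more columns.
example = io.StringIO(
    "Chromosome\tPosition\tIncludedInTMBNumerator\n"
    "chr1\t100\tTrue\n"
    "chr2\t200\tFalse\n"
)
selected = tmb_numerator_variants(example)
print(len(selected))
```

For a file on disk, pass the handle from `open(path, newline="")` instead of the in-memory stream. If the header in your file is spelled differently, pass that spelling as `column`.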
The copy number VCF file contains CNV calls for DNA libraries of the amplification genes targeted by DRAGEN TruSight Oncology 500 Analysis Software. The CNV call indicates fold change results for each gene classified as reference, deletion, or amplification.
The value in the QUAL column of the VCF is a Phred transformation of the p-value, where Q = -10 × log10(p-value). The p-value is derived from the t-test between the fold change of the gene against the rest of the genome. Higher Q-scores indicate higher confidence in the CNV call.
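The Phred transformation is a one-line calculation. The sketch below shows it together with its inverse; the function names are illustrative and not part of the software.

```python
import math


def phred_quality(p_value: float) -> float:
    """Q = -10 * log10(p-value); defined for 0 < p-value <= 1."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -10.0 * math.log10(p_value)


def p_value_from_phred(q: float) -> float:
    """Inverse transformation: p-value = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


print(round(phred_quality(0.001), 6))
```

Each increase of 10 in Q corresponds to a tenfold smaller p-value.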
In the VCF notation, <DUP> indicates the detected fold change (FC) is greater than a predefined amplification cutoff. <DEL> indicates the detected FC is less than a predefined deletion cutoff for that gene. This cutoff can vary from gene to gene.
In analysis versions prior to v2.5, <DEL> calls in the VCF are marked as LowValidation. The LowValidation filter indicates that the calls have been validated only with in silico data sets and are provided as information only.
Each copy number variant is reported as a fold change on normalized read depth in a testing sample relative to the normalized read depth in diploid genomes. Given tumor purity, you can infer the ploidy of a gene in the sample from the reported fold change.
Given tumor purity X%, for a reported fold change Y, you can calculate the copy number n using the following equation:
n = [(200 × Y) - 2 × (100 - X)] / X
For example, a tumor purity of 30% and a MET fold change of 2.2x indicate that 10 copies of MET DNA are observed.
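The worked example above can be checked with a small Python sketch. It assumes that the fold change is measured against a diploid (2-copy) baseline and that the non-tumor fraction of the sample is diploid, which gives n = (200 × Y - 2 × (100 - X)) / X; the function name is hypothetical.

```python
def copy_number_from_fold_change(fold_change, tumor_purity_pct):
    """Infer the tumor copy number n from fold change Y and purity X (%).

    Assumes Y = (X * n + (100 - X) * 2) / 200, that is, a diploid baseline
    and diploid non-tumor cells.
    """
    if not 0.0 < tumor_purity_pct <= 100.0:
        raise ValueError("tumor purity must be in (0, 100]")
    return (200.0 * fold_change - 2.0 * (100.0 - tumor_purity_pct)) / tumor_purity_pct

# MET example from the text: purity 30%, fold change 2.2x.
print(round(copy_number_from_fold_change(2.2, 30.0), 2))
```

At 100% purity the expression reduces to n = 2 × Y, which is a quick sanity check on the formula.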
DNA alignment and error correction involves aligning sequencing reads derived from DNA libraries to a reference genome and correcting errors in the sequencing reads prior to variant calling.
DRAGEN unique molecular identifier (UMI) error correction comprises three main steps:
DRAGEN UMI uses its hardware-accelerated mapper (based on a hash table implementation) to align DNA sequences in FASTQ files to the hg19 reference genome. These alignments are not written to a BAM file.
The raw alignments are processed to remove errors, including errors introduced during FFPE preservation, PCR amplification, and sequencing. Reads from the same original DNA molecule are tagged with the same UMI during library preparation. The UMI allows DRAGEN to compare related reads, remove outlier signals, and collapse multiple reads into a single high-quality sequence. Read collapsing adds the following BAM tags:
RX/XU—UMI.
XV—Number of reads in the family.
XW—Number of reads in the duplex-family or 0 if not a duplex family.
DRAGEN performs a final alignment step on the UMI-collapsed reads. These final alignments are then written to a BAM file and a corresponding BAM index file is created.
DRAGEN continues to use these final alignments as input for gene amplification (copy number) calling, small variant calling (SNV, indel, MNV, delins), microsatellite instability (MSI) status determination, and DNA library quality control.
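To inspect the read-collapsing tags listed above, one option is to read the final BAM as SAM text (for example, the output of `samtools view`) and collect the XV values. The sketch below assumes that XV is stored as an integer optional field of the form `XV:i:<n>`; the helper name and the example records are made up for illustration.

```python
def family_sizes(sam_lines, tag="XV"):
    """Collect the integer value of `tag` from each SAM alignment line."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        for optional in fields[11:]:
            name, _type, value = optional.split(":", 2)
            if name == tag:
                sizes.append(int(value))
                break
    return sizes

# Made-up SAM records, for illustration only.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tFFFFF\tRX:Z:ACGTAC\tXV:i:4\tXW:i:0",
    "read2\t99\tchr1\t300\t60\t5M\t=\t400\t105\tTTGCA\tFFFFF\tRX:Z:GGTTCA\tXV:i:2\tXW:i:2",
]
sizes = family_sizes(example_sam)
print(sizes, sum(sizes) / len(sizes))
```

The average printed here is a simple summary of the tag values and is not claimed to match the MEAN_FAMILY_SIZE metric.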
DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. Insertions and deletions are validated for lengths of 0–25 bp, and lengths of more than 25 bp can be supported. DRAGEN also uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Any such co-phased variants that fall within a 15 bp window can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants that can be further analyzed to identify tumor mutations. Variant calling extends ±10 bp into introns; details of the regions covered can be found in the assay manifest file. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.
DRAGEN small variant calling includes the following steps:
Detects regions with sufficient read coverage (callable regions).
Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).
Assembles de novo graph haplotypes from reads (haplotype assembly).
Extracts possible somatic or germline calls (events) from column wise pileup analysis.
Calibrates read base qualities to account for FFPE noise.
Computes read likelihoods for each read/haplotype pair.
Performs mutation calling by summing the genotype probabilities across all read/haplotype pairs.
Performs additional filtering to improve variant calling accuracy, including using a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome. This noise file is constructed using clean (normal) samples. Regions where noise is common (for example, difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.
The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, uses a preprocessed panel of normal samples to normalize target counts, corrects for GC coverage bias, calculates CNV event scores from the observed coverage, and makes copy number calls.
The BRCA large rearrangement step generates segmentation of the BRCA1 and BRCA2 genes for exon-level CNV detection from the BAM file. Using the same method as CNV calling, the large rearrangement component counts coverage of each target interval of the panel, performs normalization, and calculates the fold change values for each probe across the BRCA genes. Normalization includes GC bias correction, sequencing depth, and probe efficiency using a collection of normal FFPE and genomic DNA samples. Initial segmentation is performed for each gene with circular binary segmentation. The merging of segments is then determined by amplitude, noise, and variance at adjacent segments using thresholds established with in silico data. A large rearrangement is reported for genes with more than one segment. Coordinates of the exon-level CNV and the log2 mean fold change for each of the BRCA gene segments are found in the *_DragenExonCNV.json file.
The Illumina Annotation Engine performs annotation of small variants, CNVs, and exon-level CNVs. The inputs are gVCF files and the outputs are annotated JSON files.
The Illumina Annotation Engine processes each variant entry and annotates with available information from databases such as dbSNP, gnomAD genome and exome, 1000 genomes, ClinVar, COSMIC, RefSeq, and Ensembl. The header includes version information and general details. Each annotated variant is included as a nested dictionary structure in separate lines following the header.
The following table shows version information for each annotation database:
DRAGEN is used to compute tumor mutational burden (TMB) in coding regions where there is sufficient coverage.
The following variants are excluded from the TMB calculation:
Non-PASS variants.
Mitochondrial variants.
MNVs.
Variants that do not meet a minimum depth threshold.
Variants that do not meet the minimum variant allele threshold.
Variants that fall outside the eligible regions.
Tumor driver mutations. Variants with a population allele count ≥ 50 are treated as tumor driver mutations.
Germline variants are not counted towards TMB. Variants are determined as germline based on a database and a proxy filter.
Variants with a population allele count ≥ 10 that are observed in either the 1000 Genomes or gnomAD databases are marked as germline. MNVs, which do not count towards TMB, may be marked as germline when all their component small variants are marked as germline. The proxy filter scans the variants surrounding a specific variant and identifies those variants with similar variant allele frequencies (VAF). If the majority of surrounding variants of similar VAF are germline, then the variant is also marked as germline.
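The two germline filters described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the DRAGEN implementation: the `Variant` class, the `vaf_tolerance` value that defines "similar VAF", and the expectation that the caller has already selected the surrounding variants are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    position: int
    vaf: float
    max_db_allele_count: int  # maximum over 1000 Genomes and gnomAD counts

def database_germline(variant, min_count=10):
    """Database filter: population allele count >= 10 marks germline."""
    return variant.max_db_allele_count >= min_count

def proxy_germline(target, neighbors, vaf_tolerance=0.05):
    """Proxy filter: germline if most similar-VAF neighbors are germline.

    `vaf_tolerance` is an assumed definition of "similar VAF".
    """
    similar = [n for n in neighbors if abs(n.vaf - target.vaf) <= vaf_tolerance]
    if not similar:
        return False
    germline = sum(database_germline(n) for n in similar)
    return germline > len(similar) / 2

# Made-up variants, for illustration only.
target = Variant(position=1000, vaf=0.49, max_db_allele_count=0)
neighbors = [
    Variant(900, 0.50, 120),
    Variant(1100, 0.47, 35),
    Variant(1200, 0.51, 0),
    Variant(1300, 0.10, 0),
]
print(database_germline(target), proxy_germline(target, neighbors))
```

In this example the target is absent from the databases, but two of its three similar-VAF neighbors are database germline, so the proxy filter marks it as germline.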
The formula for TMB calculation is:
Outputs are captured in a _TMB_Trace.tsv file that contains information on variants used in the TMB calculation and a .tmb.json file that contains the TMB score calculation and configuration details.
DRAGEN can determine the MSI status of a sample. It uses a normal reference file, which was created from a set of normal samples. During sequencing, normal reference files are generated by tabulating read counts for each microsatellite site. The normal file contains the read count distribution for each microsatellite.
MSI calling for a tumor-only sample is performed by first tabulating tumor counts from the read alignments for each microsatellite site. Then, the Jensen-Shannon distance (JSD) is calculated between each pair of tumor and normal baseline samples. DRAGEN determines unstable sites by performing Chi-square testing of tumor JSD and normal JSD distributions. A site is called unstable if the mean distance difference of the two JSD distributions is ≥ the distance threshold and the Chi-square p-value is ≤ the p-value threshold. Lastly, DRAGEN produces an MSI status given the assessed site count, the unstable site count, the percentage of unstable sites among all assessed sites, and the sum of the Jensen-Shannon distances of all the unstable sites.
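For readers unfamiliar with the Jensen-Shannon distance, the following Python sketch computes it between two read-count distributions at one microsatellite site and applies the two-threshold rule described above. The function names and the example threshold values are hypothetical, the base-2 logarithm is a choice made here so that the distance falls between 0 and 1, and the Chi-square p-value is taken as an input rather than computed.

```python
import math

def _normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def _kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(counts_a, counts_b):
    """Jensen-Shannon distance (square root of the JS divergence)."""
    p, q = _normalize(counts_a), _normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(0.0, divergence))

def is_unstable(mean_distance_difference, p_value,
                distance_threshold, p_value_threshold):
    """Site is unstable if both threshold conditions hold."""
    return (mean_distance_difference >= distance_threshold
            and p_value <= p_value_threshold)

# Made-up repeat-length read counts for one site, for illustration only.
tumor = [2, 10, 30, 8]
normal = [0, 3, 40, 7]
print(round(js_distance(tumor, normal), 3))
print(is_unstable(0.2, 0.001, distance_threshold=0.1, p_value_threshold=0.01))
```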
Requires HRD add-on assay
Genomic instability score (GIS) is a whole genome signature for homologous recombination deficiency. The GIS is composed of the sum of three components: loss of heterozygosity, telomeric allele imbalance, and large-scale state transition. These components are estimated using the GIS algorithm contracted from Myriad Genetics, which uses an input of the b-allele frequency and coverage across a genome-wide single nucleotide panel. A panel of normal samples is used for both bias reduction and normalization prior to GIS estimation. Final GIS results can be found in the *.gis.json file.
The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file that are generated during the small variant calling and the TMB trace file. The software determines whether a sample has foreign DNA using the contamination score. In contaminated samples, the variant allele frequencies in SNPs shift from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap with common SNPs that have variant allele frequencies of < 25% or > 75%. Then, the algorithm computes the likelihood that the positions are an error or a real mutation. The contamination score is the sum of all the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and that are not likely due to CNV events.
The larger the contamination score, the more likely there is foreign DNA contamination. A sample is considered to be contaminated if the contamination score is above the predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes and in HRD samples: 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.
This is a beta feature. Beta feature results are included in the Combined Variant Output file and other files. However, disclaimers that the results are generated by beta features are only provided in the Combined Variant Output file. Requires HRD add-on assay.
Tumor fraction is calculated as described in the User Guide, section “HRD Metrics Report”, and leverages the Myriad Genetics algorithm. Tumor fraction is output in the Logs_Intermediates/Gis/SAMPLE/SAMPLE.gis.json file and the Combined Variant Output file.
This is a beta feature. Beta feature results are included in the Combined Variant Output file and other files. However, disclaimers that the results are generated by beta features are only provided in the Combined Variant Output file. Requires HRD add-on assay.
Ploidy is calculated as described in the User Guide, section “HRD Metrics Report”, and leverages the Myriad Genetics algorithm. Ploidy is output in the Logs_Intermediates/Gis/SAMPLE/SAMPLE.gis.json file and the Combined Variant Output file.
This is a beta feature. Beta feature results are included in the Combined Variant Output file and other files. However, disclaimers that the results are generated by beta features are only provided in the Combined Variant Output file. Requires HRD add-on assay.
Absolute copy numbers are calculated by leveraging the Myriad Genetics algorithm. The algorithm segments the entire genome using the HRD panel and provides an A and B allele estimate for each segment. After the TSO 500 pipeline determines CNV calls (using the TSO 500 panel), the segment covering the gene is identified, and the A and B allele numbers of the segment overlapping the gene are reported. If the gene is within 300 kb of the segment boundary, the estimate is unreliable and “-1” is output. Absolute copy numbers are output in the Logs_Intermediates/Gis/SAMPLE/SAMPLE.abcn_annotated.vcf file, the Logs_Intermediates/Gis/SAMPLE/SAMPLE.abcn_genes.tsv file, and the Combined Variant Output file.
This is a beta feature. Beta feature results are included in the Combined Variant Output file and other files. However, disclaimers that the results are generated by beta features are only provided in the Combined Variant Output file. Requires HRD add-on assay.
Gene-level loss of heterozygosity is calculated based on the minor copy number reported in the abcn_annotated.vcf file. If the minor copy number is 0, the gene is assumed to have a loss of heterozygosity. Gene-level loss of heterozygosity is output in the Logs_Intermediates/Gis/SAMPLE/SAMPLE.abcn_genes.tsv file and the Combined Variant Output file.
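The segment-boundary rule for absolute copy numbers and the minor-copy-number rule for loss of heterozygosity can be sketched together in Python. The sketch rests on several assumptions that the text does not confirm: the gene lies entirely inside the segment, "within 300 kb" is treated as a strict less-than comparison, both allele values are reported as -1 when the estimate is unreliable, and the minor copy number is the smaller of the A and B values. All function names are hypothetical.

```python
def near_segment_boundary(gene_start, gene_end, seg_start, seg_end,
                          margin_bp=300_000):
    """True if either end of the gene lies within margin_bp of a boundary."""
    return ((gene_start - seg_start) < margin_bp
            or (seg_end - gene_end) < margin_bp)

def gene_allele_copies(gene_start, gene_end, seg_start, seg_end,
                       a_copies, b_copies):
    """Return (A, B) for the gene, or (-1, -1) for an unreliable estimate."""
    if near_segment_boundary(gene_start, gene_end, seg_start, seg_end):
        return (-1, -1)
    return (a_copies, b_copies)

def has_loh(a_copies, b_copies):
    """LOH is assumed when the minor (smaller) copy number is 0."""
    if a_copies < 0 or b_copies < 0:
        return False  # unreliable estimate: no LOH call in this sketch
    return min(a_copies, b_copies) == 0

# Made-up coordinates, for illustration only.
central = gene_allele_copies(5_000_000, 5_100_000, 1_000_000, 9_000_000, 3, 0)
edge = gene_allele_copies(1_100_000, 1_200_000, 1_000_000, 9_000_000, 3, 0)
print(central, has_loh(*central))
print(edge, has_loh(*edge))
```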
The following sections describe performance testing methods.
Illumina tests the analytical performance of variant calling using an approach that covers the entire workflow including library preparation, sequencing, and secondary analysis. This approach is used to test a diverse selection of variants. When the variant calling pipeline is expanded to call a new variant class, this approach is always used.
This version of the software includes results generated by features tested in silico and by beta features. Beta features have not been fully evaluated for performance.
Illumina uses in silico testing to test the ability of the software to call an expanded scope of clinically relevant variants, including rare variants. In silico testing complements analytical performance testing with wet lab steps and expands the scope of testing. For example, while Illumina has analytically verified the performance of the software for calling complex variants in EGFR, the in silico testing approach characterizes the ability of the software to call complex variants in other genes.
For in silico testing, variants of interest are extracted from public databases such as COSMIC and ClinVar. Each variant is simulated at different VAF levels. Depending on the variant class, the simulation either spikes mutant reads into a normal FFPE background (for sequence variants) or increases or decreases the coverage of exons in the normal FFPE sample (for CNVs, for example, exon-level CNVs). The simulated reads match the expected quality of typical FFPE samples, such as fragment length, error rate, and family size. After the simulation, the software processes samples with spiked-in variants and determines the results. This approach does not include library prep and sequencing of tumor FFPE samples that include the rare variants of interest. The software reports these variants, but analytical verification was not performed.
DRAGEN TruSight Oncology 500 Analysis Software includes the following features that were tested in silico for both TruSight Oncology 500 and TruSight Oncology 500 HT:
Complex variants in genes beyond EGFR
Insertions and deletions > 25 bp
CNV amplifications
CNV deletions
Variants in intron-exon junctions (2 bp – 10 bp into introns)
In addition, the following features were tested in silico for TruSight Oncology 500 HT:
Exon-level CNVs in BRCA1 and BRCA2
This version of DRAGEN TruSight Oncology 500 Analysis Software includes beta features that have not been verified by Illumina because of limited access to samples or the lack of an appropriate orthogonal method for testing, and because in silico testing alone is not sufficient for verification purposes.
Customers are responsible for evaluating and demonstrating performance of any beta features they choose to implement. Beta features are indicated as such in the CombinedVariantOutput.tsv file. Illumina will continue to evaluate beta features with intent to fully release upon completion of verification for each feature.
This version includes the following beta features that may be used with the TruSight Oncology 500 HRD Assay:
Tumor fraction (beta)
Ploidy (beta)
Absolute copy numbers (ACN) (beta)
Gene-level loss of heterozygosity (LOH) events (beta)
Beta feature results are included in the Combined Variant Output file and other files. However, disclaimers that the results are generated by beta features are only provided in the Combined Variant Output file.
The software calculates several quality control metrics for runs and samples.
The Run Metrics section of the metrics output report provides sequencing run quality metrics along with suggested values to determine if they are within an acceptable range. The overall percentage of reads passing filter is compared to a minimum threshold. For Read 1 and Read 2, the average percentage of bases ≥ Q30 is also compared to a minimum threshold; the Q-score predicts the probability of an incorrect base call. The following tables show run metric and quality threshold information for different systems.
The values in the Run Metrics section are listed as NA in the following situations:
If the analysis was started from FASTQ files.
If the analysis was started from BCL files and the InterOp files are missing or corrupt.
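The comparison of run metrics against minimum thresholds, including the NA case described above, can be sketched in Python as follows. The function name, the dictionary layout, and the example values are assumptions for illustration; use the thresholds that apply to your sequencing system.

```python
def check_run_metrics(metrics, minimum_thresholds):
    """Compare each run metric to its minimum threshold.

    Returns "PASS", "FAIL", or "NA" per metric. "NA" covers values that
    are listed as NA or are missing from the input.
    """
    results = {}
    for name, minimum in minimum_thresholds.items():
        value = metrics.get(name, "NA")
        if value == "NA":
            results[name] = "NA"
        else:
            results[name] = "PASS" if float(value) >= minimum else "FAIL"
    return results

# Made-up values, for illustration only.
metrics = {
    "PCT_PF_READS (%)": "91.2",
    "PCT_Q30_R1 (%)": "78.5",
    "PCT_Q30_R2 (%)": "NA",
}
thresholds = {
    "PCT_PF_READS (%)": 80.0,
    "PCT_Q30_R1 (%)": 80.0,
    "PCT_Q30_R2 (%)": 80.0,
}
print(check_run_metrics(metrics, thresholds))
```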
DRAGEN TruSight Oncology 500 uses QC metrics to assess the validity of analysis for DNA libraries that pass contamination quality control. If the library fails one or more quality metrics, then the corresponding variant type or biomarker is not reported, and the associated QC category in the report header displays FAIL. Additionally, a companion diagnostic result may not be available if it relies on QC passing for one or more of the following QC categories.
DNA library QC results are available in the MetricsOutput.tsv file. Refer to Metrics Output for details.
The input for RNA Library QC is RNA alignment. Metrics and guideline thresholds can be found in the MetricsOutput.tsv file. Refer to Metrics Output for details.
To avoid failing RNA samples unnecessarily, Illumina does not recommend a universal threshold to determine RNA sample quality. RNA expression varies significantly across tissue types, and the panel size is small (55 genes), which makes normalization challenging. Tissue-specific thresholds could be considered for normalization.
File name: {Pair_ID}_CombinedVariantOutput.tsv
The combined variant output file contains the variants and biomarkers in a single file that is based on a single sample. If using pair ID, the file is based on paired DNA and RNA samples from the same individual. The output contains the following variant types and biomarkers:
Small variants
Copy number variants (CNV) (with absolute copy number when HRD Assay is run)
TMB
MSI
Fusions
Splice variants
GIS (when HRD Assay is run)
Gene-level Loss of Heterozygosity (when HRD Assay is run)
Exon-level CNVs
The combined variant output file also contains Analysis Details and Sequencing Run Details sections. The details of each are listed in the following table:
Combined variant output produces small variants with blank fields in the following situations:
The variant has been matched to a canonical RefSeq transcript on an overlapping gene not targeted by TruSight Oncology 500.
The variant is located in a region designated iSNP, indel, or Flanking in the TST500_Manifest.bed file located in the Resources folder.
Small Variants - All variants with the FILTER field marked as PASS in the hard-filtered genome VCF are present in the combined variant output.
Gene information is only present for variants belonging to canonical transcripts that are within the Gene Allow List–Small Variants.
Transcript information is only present for variants belonging to canonical transcripts that are within the Gene Allow List–Small Variants.
Copy Number Variants - Copy number variants must meet the following conditions:
FILTER field marked as PASS.
ALT field is <DUP> or <DEL>.
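The two copy number conditions above can be applied to VCF text with a short Python sketch. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function name and the example records are made up for illustration and do not reproduce the software's own filtering code.

```python
def passing_cnv_records(vcf_lines):
    """Yield (chrom, pos, alt) for records with FILTER=PASS and a DUP/DEL ALT."""
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, _ref, alt, _qual, filt = fields[:7]
        if filt == "PASS" and alt in {"<DUP>", "<DEL>"}:
            yield chrom, int(pos), alt

# Made-up VCF records, for illustration only.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr7\t116312459\t.\tN\t<DUP>\t60\tPASS\tEND=116438440",
    "chr17\t41196312\t.\tN\t<DEL>\t45\tPASS\tEND=41277500",
    "chr10\t89624227\t.\tN\t<DEL>\t12\tLowValidation\tEND=89728532",
    "chr12\t25358180\t.\tN\t.\t5\tPASS\tEND=25403854",
]
print(list(passing_cnv_records(example_vcf)))
```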
Fusion Variants - Fusion variants must meet the following conditions:
Passing variant call (KeepFusion field is true).
Contains at least one gene on the fusion allow list.
Genes separated by a dash (-) indicate that the fusion directionality could be determined. Genes separated by a slash (/) indicate that the fusion directionality could not be determined.
Biomarkers TMB/MSI - Always present when DNA sample is processed.
Splice Variants - Passing splice variants that are contained on genes EGFR, MET, and AR.
Biomarker GIS - Present only if TruSight Oncology 500 HRD analysis is performed.
Loss of Heterozygosity - Present only when TruSight Oncology 500 HRD is run. Loss of heterozygosity (LOH) must meet the following condition:
MCN field is equal to 0.
Exon-level CNVs - Exon-level CNVs must meet the following conditions:
BRCA1 or BRCA2 contains at least one affected exon.
ALT field is <DUP> or <LOSS>.
PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 80.0 | All
PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 80.0 | All
PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 80.0 | All

PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 55.0 | All
PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 80.0 | All
PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 80.0 | All

PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 85.0 | All
PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 85.0 | All
PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 85.0 | All

PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 85.0 | All
PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 85.0 | All
CONTAMINATION_SCORE | The contamination score is based on VAF distribution of SNPs. | Contamination Score ≤ | All
MEDIAN_EXON_COVERAGE | Median exon fragment coverage across all exon bases. | ≥ 150 | Small variant, TMB
PCT_EXON_50X | Percent exon bases with 50x fragment coverage. | ≥ 90.0 | Small variant, TMB
MEDIAN_INSERT_SIZE | The median fragment length in the sample. | ≥ 70 | Small variant, TMB
USABLE_MSI_SITES | The number of MSI sites usable for MSI calling. | ≥ 40 | MSI
MEDIAN_BIN_COUNT_CNV_TARGET | The median raw bin count per CNV target. | ≥ 1.0 | CNV
MEDIAN_CV_GENE_500X | The median CV for all genes with median coverage > 500x. Genes with median coverage > 500x are likely to be highly expressed. A higher median CV for these genes indicates an issue with library preparation (poor sample input, a probe pulldown issue, or both). | | Fusion, Splice
MEDIAN_INSERT_SIZE | The median fragment length in the sample. | ≥ 80 | Fusion, Splice
TOTAL_ON_TARGET_READS | The total number of reads that map to the target regions. | ≥ 9000000 | Fusion, Splice
GENE_MEDIAN_COVERAGE | The median deduped coverage across all genes in the RNA panel (55 genes). | N/A* | Fusion, Splice
Analysis Details | Pair ID; DNA sample ID (if DNA is run); RNA sample ID (if RNA is run); Output date; Output time; Module version; Pipeline version (Docker image version #)
Sequencing Run Details | Run name; Run date; DNA sample index ID (if DNA is run); RNA sample index ID (if RNA is run); [HRD] Sample feature; Instrument ID; Instrument control software version; Instrument type; RTA version; Reagent cartridge lot number
gnomAD | 2.1
COSMIC | v84
ClinVar | 2019-02-04
dbSNP | v151
1000 Genomes Project | Phase 3 v5a
RefSeq | NCBI Homo sapiens Annotation Release 105.20201022
The Illumina DRAGEN TruSight Oncology 500 Analysis Software allows for analysis of sequencing data generated from the TruSight Oncology 500 HRD assay. When HRD samples are analyzed, new results and metrics are included in the CombinedVariantOutput and MetricsOutput files, respectively. The following tables detail how these scores and QC metrics are derived.
GIS Score*
Proprietary Genomic Instability Score (GIS) indicating the level of genomic instability in the sample genome. It combines the Loss of Heterozygosity (LOH), Telomeric Allelic Imbalance, and Large-scale State Transitions (LST) scores. The GIS scores provided by TruSight Oncology 500 HRD show good correlation (R² = 0.98) with Myriad Genetics GIS; however, they are not identical (refer to TruSight Oncology 500 HRD Product Data Sheet Doc# M-GL-00748 for more details). GIS from alternative HRD assays should not be considered equivalent to Illumina/Myriad GIS.
*The GIS algorithm within the TSO500 pipeline (which does not have a cell line mode due to the TSO500 pipeline being non-configurable) is only intended for FFPE samples. Cell line samples will not accurately report GIS results as the tumor fraction (>90%) is too high to reliably distinguish tumor vs germline variants.
HRD Metrics Added to Metrics Output File
PCT_TARGET_HRD_50X | Percent of HRD probe SNP panel covered by at least 50X coverage | DNA Library QC Metrics for GIS
EXCESSIVE_TF | EXCESSIVE TF indicates if there is excessive tumor content in sample. Troubleshooting: Samples with pure tumor fraction >90% are outside the design for GIS estimation (this includes pure tumor cell lines) | DNA Library QC Metrics for GIS
ALLELE_DOSAGE_RATIO | Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-Allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and higher amount of noise (or lower coverage) will have higher Allele Dosage Ratio. The upper limit of the score is 50, therefore any sample with 50 Allele Dosage Ratio can be assumed to have tumor fraction close to zero and typically has a GIS = 0. | DNA Expanded Metrics
MEDIAN_TARGET_HRD_COVERAGE | Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap. | DNA Expanded Metrics