Analysis Output

When the analysis run completes, the DRAGEN TruSight Oncology 500 Analysis Software generates an analysis output folder in a specified location.

To view analysis output, navigate to the analysis output folder and select the files that you want to view.

Single Node Analysis Output Folder Structure

The single-node output folder structure is as follows.

  • Logs_Intermediates

    • AdditionalSarjMetrics—Contains per pair ID calculations to support the PCT_TARGET_250X metric.

    • Annotation—Contains outputs for small variant annotation.

      • Subfolders per sample ID—Contains the aligned small variants JSON.

    • CombinedVariantOutput

      • Subfolders per pair ID—Contains the combined variant output TSV files.

      • A combined output log file.

    • Contamination

      • Subfolders per DNA sample ID—Contains the contamination metrics JSON file and output logs.

    • DnaDragenCaller

      • Subfolders per sample ID—Contains the aligned BAM and index files, small variant VCF and gVCF, copy number variant VCF, MSI JSON, and QC outputs in CSV format.

    • DnaDragenExonCNVCaller

      • Subfolders per DNA sample ID—Contains the exon-level CNV JSON, the supporting calculation, and the QC files.

    • DnaFastqValidation—Contains the FASTQ validation output log for DNA samples.

    • FastqDownsample

      • Subfolders per RNA sample ID—Contains FASTQ files and output logs.

      • FastqDownsample output

    • FastqGeneration

    • Gis—Contains GIS-related files for HRD samples.

      • Subfolders per HRD sample ID—Contains the GIS JSON, the supporting calculation, and the QC files.

      • Also contains the annotated CNV VCF and the gene-level TSV file with absolute copy number and minor copy number information.

    • LrAnnotation

      • Subfolders per DNA sample ID—Contains the annotated exon-level CNV JSON.

    • LrCalculator

      • Subfolders per DNA sample ID—Contains the exon-level CNV VCF.

    • MetricsOutput

      • Subfolders per pair ID—Contains the metrics output TSV files.

      • A combined output log file.

    • ResourceVerification—Contains the resource file checksum verification logs.

    • RnaAnnotation

      • Subfolders per RNA sample ID—Contains the annotated splice variant JSON.

    • RnaDragenCaller

      • Subfolders per sample ID—Contains the aligned BAM, fusion candidates CSV, and QC outputs in CSV format.

    • RnaFastqValidation—Contains the FASTQ validation output log for RNA samples.

    • RnaFusion

      • Subfolders per RNA sample ID—Contains the All Fusions CSV and Fusion Processor logs.

    • RnaQcMetrics

      • Subfolders per RNA sample ID—Contains the RNA QC metrics JSON.

    • RnaSpliceVariantCalling

      • Subfolders per RNA sample ID—Contains the splice variants VCF.

    • Run QC—Contains the Run QC metrics JSON, Intermediate Run QC metrics JSON, and log file.

    • SampleAnalysisResults

      • Subfolders per pair ID—Contains the Sample Analysis Results JSON and detailed log file.

      • SampleSheetValidation—Contains the Intermediate sample sheet and validation log.

    • Tmb

      • Subfolders per DNA sample ID—Contains the TMB metrics CSV, TMB trace TSV, and related files and logs.

    • passing_sample_steps.json—Contains the steps passed for each sample ID.

    • pipeline_trace.txt—A summary and troubleshooting file that lists each Nextflow task executed and its status (for example, COMPLETED or FAILED).

    • run.log—A complete trace-level log file describing the Nextflow pipeline execution.

    • run_report.html—Contains high-level run statistics (performance, usage, etc.).

    • run_timeline.html—Contains timeline-related information about the analysis run.

  • Results

    • Metrics Output TSV (all pair IDs)

    • Pair ID—The following outputs are produced for each sample:

      • Combined Variant Output TSV

      • Metrics Output TSV

      • TMB Trace TSV

      • Small Variant Genome VCF

      • Small Variant Genome Annotated JSON

      • Copy Number Variant VCF

      • GIS JSON

      • MSI JSON

      • Large Rearrangements CNV VCF

      • Large Rearrangements CNV Annotated JSON

      • All Fusion CSV

      • Splice Variant VCF

      • Splice Variant Annotated JSON
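
The pipeline_trace.txt file listed under Logs_Intermediates records the status of each Nextflow task, so it is a natural first stop when troubleshooting. The following is a minimal Python sketch for listing tasks that did not complete. It assumes the file is tab-separated with a header row that includes name and status columns, which matches common Nextflow trace output but is not confirmed by this guide. The function name failed_tasks is hypothetical.

```python
import csv
from pathlib import Path


def failed_tasks(trace_path):
    """Return (task name, status) pairs for tasks whose status is not COMPLETED."""
    failed = []
    with Path(trace_path).open(newline="") as handle:
        # Assumption: tab-separated text with a header row that
        # contains "name" and "status" columns.
        for row in csv.DictReader(handle, delimiter="\t"):
            status = (row.get("status") or "").strip()
            if status != "COMPLETED":
                failed.append(((row.get("name") or "").strip(), status))
    return failed
```

Calling failed_tasks with the path to Logs_Intermediates/pipeline_trace.txt returns an empty list when every task completed.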

Multiple Node Analysis Output Folder Structure

The multiple-node output folder structure is as follows.

  • Demultiplex Output

    • A Logs_Intermediates folder containing FASTQ files per sample.

  • Node(X) Output—The following outputs are produced for each node used:

    • A Logs_Intermediates folder containing step-specific and component-specific outputs and logs for every step/component run in the analysis pipeline for the sample run on the node.

    • A Results folder containing results only for the sample run on the node.

  • Gathered Output

    • A Logs_Intermediates folder containing step-specific and component-specific outputs and logs for every step/component run in each analysis pipeline on every node. This folder contains outputs for all samples and pairs run across all nodes in the analysis.

    • A Results folder containing results for all samples and pairs run across all nodes. Results are organized by Pair_ID, then Sample_ID. This folder also contains summary files with information on all samples.
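
Because the gathered Results folder is organized by Pair_ID and then Sample_ID, it can be walked with two nested directory loops. The following Python sketch builds an index of the files found for each pair and sample. The function name index_results is hypothetical, the path to the gathered Results folder must be supplied by the caller, and the sketch assumes that every subfolder of Results is a pair folder.

```python
from pathlib import Path


def index_results(results_dir):
    """Map pair ID -> sample ID -> sorted file names under a Results folder."""
    index = {}
    for pair_dir in sorted(Path(results_dir).iterdir()):
        if not pair_dir.is_dir():
            continue  # skip summary files stored at the root of Results
        samples = {}
        for sample_dir in sorted(pair_dir.iterdir()):
            if sample_dir.is_dir():
                samples[sample_dir.name] = sorted(
                    f.name for f in sample_dir.iterdir() if f.is_file()
                )
        index[pair_dir.name] = samples
    return index
```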

ICA Output Folder Structure

This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed. The same output folder structure and content exist in ICA and BaseSpace Sequence Hub.

High-Level Folder Structure

  • Run ID

    • TSO500_Nextflow_logs

      • _manifest.json

    • Results

      • _tags.json

    • Logs_intermediates

    • Errors—This folder is only present when analysis fails.

TSO500_Nextflow_logs Folder Structure

The TSO500_Nextflow_logs folder provides information related to the execution of the pipeline on ICA as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains the files used to execute parts of the workflow on different nodes as well as records of the Nextflow execution on those nodes.

  • TSO500_Nextflow_logs

    • _manifest.json

Results Folder Structure

The Results folder contains the aggregated MetricsOutput.tsv file at the root level and a subfolder for each pair ID.

  • Results

    • MetricsOutput.tsv

    • Sample_1

    • Sample_2

    • Sample_<#>

    • _tags.json

The Results subfolder contains the following files:

  • Results

    • MetricsOutput.tsv

    • <Pair_id>

      • CombinedVariantOutput.tsv

      • <SampleName>_MetricsOutput.tsv

    • <DNA_Sample_id>

      • CopyNumberVariants.vcf

      • DNAMergedSmallVariants_Annotated.json.gz

      • MergedSmallVariants.genome.vcf

      • MergedSmallVariants.vcf

      • microstat_output.json

      • TMB_Trace.tsv

    • <RNA_Sample_id>

      • AllFusions.csv

      • RNA_Annotated.json.gz

      • SpliceVariants.vcf
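
The listing above can be turned into a quick completeness check after downloading a Results folder. The following Python sketch reports which of the listed files are absent from a given pair, DNA sample, or RNA sample folder. The file names come from the listing. The grouping keys (pair, dna, rna) and the function name missing_files are labels invented for this sketch. A missing file is not necessarily an error, because some outputs depend on the sample type and on which analysis steps ran.

```python
from pathlib import Path

# File names are taken from the listing above. The keys are labels
# chosen for this sketch and are not part of the software.
EXPECTED = {
    "pair": ["CombinedVariantOutput.tsv"],
    "dna": [
        "CopyNumberVariants.vcf",
        "DNAMergedSmallVariants_Annotated.json.gz",
        "MergedSmallVariants.genome.vcf",
        "MergedSmallVariants.vcf",
        "microstat_output.json",
        "TMB_Trace.tsv",
    ],
    "rna": ["AllFusions.csv", "RNA_Annotated.json.gz", "SpliceVariants.vcf"],
}


def missing_files(folder, kind):
    """Return the listed file names of the given kind that are absent from folder."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    missing = [name for name in EXPECTED[kind] if name not in present]
    # Pair folders also hold <SampleName>_MetricsOutput.tsv, matched by suffix.
    if kind == "pair" and not any(n.endswith("_MetricsOutput.tsv") for n in present):
        missing.append("<SampleName>_MetricsOutput.tsv")
    return missing
```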

Logs_intermediates Folder Structure

Contains folders for each submodule in the DRAGEN TSO 500 on ICA pipeline. The folders contain a copy of all the relevant files required to create the metric output files and report files, as well as the combined log files at the root level and subfolders for each sample.

  • Logs_intermediates

    • DnaDragenCaller

    • AdditionalSarjMetrics

    • CombinedVariantOutput

    • FastqGeneration

    • MetricsOutput

    • DnaDragenExonCnvCaller

    • DnaFastqValidation

    • Gis

    • Tmb

    • SampleAnalysisResults

    • SampleSheetValidation

    • passing_sample_steps.json

    • RnaFusion

    • Contamination

    • Annotation

    • RnaAnnotation

    • RnaDragenCaller

    • RnaSpliceVariantCalling

    • RunQc

    • FastqDownsample

    • PassingSampleSteps

    • ResourceVerification

    • LrCalculator

    • LrAnnotation

    • RnaQcMetrics

    • RnaFastqValidation

Errors Folder Structure

The Errors folder contains Errors.tsv, a summary of all errors encountered during pipeline execution.

  • Errors

    • Errors.tsv
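
Because the Errors folder is only present when analysis fails, a script that post-processes a run can use it as a failure signal. The following Python sketch reads Errors.tsv if it exists. It assumes the file is tab-separated, as the extension suggests, and makes no assumption about the column layout, which this guide does not describe. The function name read_errors is hypothetical.

```python
import csv
from pathlib import Path


def read_errors(run_dir):
    """Return the rows of Errors/Errors.tsv as lists of fields, or [] if absent."""
    errors_file = Path(run_dir) / "Errors" / "Errors.tsv"
    if not errors_file.is_file():
        return []  # no Errors folder: the analysis did not report a failure
    with errors_file.open(newline="") as handle:
        # Assumption: tab-separated text; blank lines are skipped.
        return [row for row in csv.reader(handle, delimiter="\t") if row]
```

An empty list means that either the folder is absent or the file has no rows.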
