The following sub-pages contain recommended command line options for specific DRAGEN pipelines. For an overview of DRAGEN command line parsing, see also Multicaller Workflows.
This recipe is for processing whole genome sequencing data for germline workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
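As a minimal sketch of this "build each component separately, then combine" approach, the snippet below keeps one option list per component and concatenates them into a single command line. The `build_command` helper, the `dragen` executable name, and the `sample1` prefix are illustrative assumptions; only flags named elsewhere on this page are used, and the result is not a complete or validated DRAGEN command.

```python
import shlex


def build_command(executable, *option_groups):
    """Concatenate per-component option lists into one argument list."""
    args = [executable]
    for group in option_groups:
        args.extend(group)
    return args


# Hypothetical placeholder values; each list can be reused across recipes.
output_opts = ["--output-file-prefix", "sample1"]
cnv_opts = ["--cnv-enable-gcbias-correction", "true"]
sv_opts = ["--enable-variant-deduplication", "true"]

command = build_command("dragen", output_opts, cnv_opts, sv_opts)
print(shlex.join(command))
```

Keeping the groups as lists rather than one long string makes it easy to swap a single component (for example, the CNV options) without touching the rest.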
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Optional settings per component are listed below. Full option list at this page.
This recipe is for processing whole exome sequencing data for somatic tumor normal workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Optional settings per component are listed below. Full option list at this page.
Please include the matched normal sample in the CNV panel of normals.
Generating Panel of Normals (PON)
Somatic WES CNV requires PON files. Follow the two steps below to generate the CNV PON:
Step 1. Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Step 2. Combined counts generation: Merge the individual PON counts into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
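The list file described above is just a text file with one path per line. A minimal sketch of how it could be assembled follows; the `write_path_list` helper, the directory layout, and the file names are hypothetical, and the caller picks exactly one of the two suffixes so that GC-corrected and uncorrected counts are not mixed.

```python
import tempfile
from pathlib import Path


def write_path_list(directory, suffix, list_path):
    """Write absolute paths of files ending in `suffix`, one per line."""
    paths = sorted(
        p.resolve()
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


# Demonstration with empty stand-in files (hypothetical sample names).
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "normal1.target.counts.gz",
        "normal2.target.counts.gz",
        "normal2.target.counts.gc-corrected.gz",
    ):
        (Path(tmp) / name).touch()
    list_file = Path(tmp) / "cnv_normals.txt"
    listed = write_path_list(tmp, ".target.counts.gz", list_file)
    lines = list_file.read_text().splitlines()
    print([p.name for p in listed])
```

The same helper could be pointed at a directory of VCFs to produce the one-file-per-line list used when building a systematic noise file.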
For more information, see Panel of Normals.
Generic SNV noise files can be downloaded here: DRAGEN Software Support Site page
When possible, we recommend building a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
The ALU bed file can be downloaded as part of the Bed File Collection: DRAGEN Software Support Site page
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that also appear as passing variants in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed to --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in a somatic workflow, which can increase sensitivity; deduplication then prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
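To illustrate what "trimming and left shifting" means when two callers report the same indel at different coordinates, here is a generic sketch of the standard VCF normalization procedure. It is not DRAGEN's implementation; the `normalize_variant` function, the toy reference string, and the way the 500-base cap is applied are assumptions made for illustration. Alleles are assumed to be non-empty, VCF-style strings.

```python
def normalize_variant(pos, ref, alt, ref_seq, max_shift=500):
    """Trim shared bases and left-shift an indel (1-based `pos`)."""
    start = pos
    # Drop shared trailing bases; when that would empty an allele,
    # first extend both alleles one base to the left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1 or start - pos >= max_shift:
                break  # cannot shift any further left
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference: a CA repeat. A "CA" deletion reported at position 6
# normalizes to the leftmost equivalent representation at position 2.
reference = "TGCACACAG"
print(normalize_variant(6, "ACA", "A", reference))
```

After both VCFs are normalized this way, two records describe the same variant when their position, reference allele, and alternate allele are identical.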
Microsatellite sites file can be downloaded here: DRAGEN Software Support Site page
This recipe is for processing sequencing data with unique molecular identifier (UMI) for somatic tumor normal workflows.
For somatic UMI tumor-normal inputs, the tumor and normal samples must be run separately through the Map/Align stage. Variant calling then starts from the tumor and normal UMI-collapsed BAMs.
For Map/Align stage:
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN
Configure UMI options
For Variant Calling stage:
Configure the INPUT options
Configure the OUTPUT options
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Map/Align stage
Variant Calling (and optional biomarkers) stage:
However, for UMI samples and panels we strongly recommend building a custom systematic noise file as follows:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
Please include the matched normal sample in the CNV panel of normals.
Generating Panel of Normals (PON)
Somatic WES CNV requires PON files. Follow the two steps below to generate the CNV PON:
Step 1. Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Step 2. Combined counts generation: Merge the individual PON counts into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
For panels, we recommend post-processing the microsatellite sites file by intersecting the WES or WGS sites with the manifest. This avoids using any off-target reads in the MSI analysis. For small panels, it may be necessary to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files, refer to the MSI Biomarker section in the user guide.
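The intersection step can be sketched as follows. This is an illustration only: it assumes that both the microsatellite sites and the manifest have already been parsed into BED-like (chrom, start, end) tuples with 0-based, half-open coordinates, and it keeps a site only when it lies entirely inside a target region. The function names and the example coordinates are hypothetical; the real sites file format is described in the user guide.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_SITES = 2000  # minimum number of sites mentioned above


def merge_regions(regions):
    """Merge overlapping target regions per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def sites_on_target(sites, regions):
    """Keep sites that fall entirely within a merged target region."""
    merged = merge_regions(regions)
    starts = {c: [s for s, _ in ivs] for c, ivs in merged.items()}
    kept = []
    for chrom, start, end in sites:
        if chrom not in merged:
            continue
        i = bisect_right(starts[chrom], start) - 1
        if i >= 0 and end <= merged[chrom][i][1]:
            kept.append((chrom, start, end))
    return kept


manifest = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
sites = [("chr1", 120, 140), ("chr1", 290, 310), ("chr1", 250, 260),
         ("chr3", 1, 5), ("chr2", 40, 50)]
kept = sites_on_target(sites, manifest)
print(len(kept), "sites kept; enough for MSI:", len(kept) >= MIN_SITES)
```

If the count falls below the minimum, that is the signal to build a custom site file rather than reuse the WES or WGS one.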
This recipe is for processing whole genome sequencing data for somatic tumor only workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
In general (for most libraries and sample types), we recommend the default values. For specific libraries or sample types where different values may be advisable, those values are listed explicitly under "library specific settings" below each variant caller section.
When possible, we recommend building a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
SV library-specific settings
To build the SV systematic noise file
You can generate systematic noise BEDPE files from normal samples collected using library prep, sequencing system, and panels.
To generate a BEDPE file, do as follows.
Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Build the BEDPE file from the VCFs using --sv-build-systematic-noise-vcfs-list, which takes a list of the input VCFs from the previous step, one VCF per line. An example command line is provided below.
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that also appear as passing variants in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed to --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in a somatic workflow, which can increase sensitivity; deduplication then prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
Build normal references of microsatellite repeat distribution
Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. This ONLY works with DRAGEN germline mode.
The --msi-microsatellites-file should be the same file used for running tumor-only mode. --msi-coverage-threshold should also be the same value used for running tumor-only mode.
A minimum of 20 normal samples is required for tumor-only mode.
This recipe is for processing whole genome sequencing data for somatic tumor normal workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
SNV library specific settings
When possible, we recommend building a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
Generating SV systematic noise BEDPE file
You can generate systematic noise BEDPE files from normal samples collected using library prep, sequencing system, and panels.
To build the SV systematic noise file
Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Build the BEDPE file from the VCFs using --sv-build-systematic-noise-vcfs-list, which takes a list of the input VCFs from the previous step, one VCF per line. An example command line is provided below.
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that also appear as passing variants in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed to --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in a somatic workflow, which can increase sensitivity; deduplication then prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
This recipe is for processing whole exome sequencing data for germline workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Please include the matched normal sample in the CNV panel of normals.
Generating Panel of Normals (PON)
WES CNV requires PON files. Follow the two steps below to generate the CNV PON:
Step 1. Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Step 2. Combined counts generation: Merge the individual PON counts into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
This recipe is for processing sequencing data with unique molecular identifier (UMI) for somatic tumor only workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
However, for UMI samples and panels we strongly recommend building a custom systematic noise file as follows:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
Generating Panel of Normals (PON)
Somatic WES CNV requires PON files. Follow the two steps below to generate the CNV PON:
Step 1. Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Step 2. Combined counts generation: Merge the individual PON counts into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
For panels, we recommend post-processing the microsatellite sites file by intersecting the WES or WGS sites with the manifest. This avoids using any off-target reads in the MSI analysis. For small panels, it may be necessary to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files, refer to the MSI Biomarker section in the user guide.
Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. This ONLY works with DRAGEN germline mode.
The --msi-microsatellites-file should be the same file used for running tumor-only mode. --msi-coverage-threshold should also be the same value used for running tumor-only mode.
A minimum of 20 normal samples is required for tumor-only mode.
This recipe is for processing whole exome sequencing data for somatic tumor only workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Generating Panel of Normals (PON)
Somatic WES CNV requires PON files. Follow the two steps below to generate the CNV PON:
Step 1. Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Step 2. Combined counts generation: Merge the individual PON counts into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
When possible, we recommend building a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
Step 2. Generate the final noise file with:
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination false positive (FP) calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that also appear as passing variants in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed to --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in a somatic workflow, which can increase sensitivity; deduplication then prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. This ONLY works with DRAGEN germline mode.
The --msi-microsatellites-file should be the same file used for running tumor-only mode. --msi-coverage-threshold should also be the same value used for running tumor-only mode.
A minimum of 20 normal samples is required for tumor-only mode.
This recipe is for processing panel data for RNA workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure the RNA MAP/ALIGN options
Configure the QUANT options
Configure the SPLICE options
Configure the FUSION options
Configure the VARIANT options
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Amplicon data
If you are running amplicon data, set --enable-rna-amplicon true --amplicon-target-bed <AMPLICON_BED_PATH>.
If RNA amplicon mode is enabled and the amplicon BED file already includes the gene names, you do not need to set the ENRICH options; DRAGEN reads the enriched gene names from the fifth column of the amplicon BED file.
SPLICE options
You can provide a list of normal splice variants to reduce noisy calls. The file should be a tab-separated file with the following first four columns:
contig name
first base of the splice junction (1-based)
last base of the splice junction (1-based)
strand (0: undefined, 1: +, 2: -)
Use the optional --rna-splice-variant-normals <SPLICE_NORMAL_FILE_PATH> option to provide the normal splice variants.
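A minimal sketch of writing a file with the four-column layout described above is shown here. The `write_splice_normals` helper and the example junctions are hypothetical; the checks only enforce what this page states (1-based coordinates and strand codes 0, 1, or 2), not any further validation DRAGEN may perform.

```python
import csv
import io

STRAND_CODES = {0: "undefined", 1: "+", 2: "-"}


def write_splice_normals(junctions, handle):
    """Write (contig, first_base, last_base, strand) rows as TSV."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for contig, first, last, strand in junctions:
        if strand not in STRAND_CODES:
            raise ValueError(f"strand must be 0, 1, or 2, got {strand!r}")
        if not 1 <= first <= last:
            raise ValueError(f"invalid 1-based junction {first}-{last}")
        writer.writerow([contig, first, last, strand])


# Hypothetical junctions, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_splice_normals([("chr1", 1000, 2000, 1), ("chr2", 500, 900, 2)], buffer)
print(buffer.getvalue(), end="")
```

In practice the handle would be an open text file whose path is then passed as <SPLICE_NORMAL_FILE_PATH>.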
This recipe is for processing Whole Transcriptome Sequencing data for RNA workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure the RNA MAP/ALIGN options
Configure the QUANT options
Configure the SPLICE options
Configure the FUSION options
Configure the VARIANT options
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
For SPLICE options, you can provide a list of normal splice variants to reduce noisy calls. The file should be a tab-separated file with the following first four columns:
contig name
first base of the splice junction (1-based)
last base of the splice junction (1-based)
strand (0: undefined, 1: +, 2: -)
Use the optional --rna-splice-variant-normals <SPLICE_NORMAL_FILE_PATH> option to provide the normal splice variants.
You can also build systematic noise BEDPE files in the cloud using the .
Option | Description |
---|---|
enable-hla | Enable HLA typer (by default, this setting genotypes only class 1 genes) |
hla-enable-class-2 | Extend genotyping to HLA class 2 genes |

Option | Description |
---|---|
--cnv-enable-gcbias-correction true | Enable or disable GC bias correction when generating target counts. For more information, see GC Bias Correction. |
--cnv-segmentation-mode $SEG_MODE | Specifies the segmentation algorithm to perform. For more information, see Segmentation. |

Option | Description |
---|---|
--vc-sq-filter-threshold $THRESHOLD | Threshold for the sensitivity-specificity tradeoff. The default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
--vc-systematic-noise $SYSTEMATIC_NOISE_FILE | Systematic noise filter. In tumor-normal calling, this filter is recommended for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic tumor-only (TO) pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples. |
--vc-somatic-hotspots somatic_hotspots_GRCh38.vcf.gz | Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override the default with a custom hotspots file if a list of positions of interest is available. |
--vc-combine-phased-variants-distance $DIST | Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero indicating the maximum distance at which calls should be combined. The recommended distance for combining phased variants is 15 base pairs. The valid range is [0, 15]. |
--vc-enable-liquid-tumor-mode true | Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control. |
--vc-override-tumor-pcr-params-with-normal false | Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation. |
--vc-target-vaf FLOAT | This option is only available starting in V4.2. The vc-target-vaf is used to select the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies lower than the selected threshold can still be detected. On high coverage and clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (like FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
--vc-systematic-noise-method | The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |

Option | FFPE |
---|---|
--vc-excluded-regions-bed $BED | Some FFPE samples may have a high rate of FP calls in SINE (and specifically in ALU) regions. Optionally use an ALU bed to hard filter all calls in this region. Steps are provided below to download an ALU region bed. |

Option | Description |
---|---|
--sv-enable-liquid-tumor-mode true | Enable liquid tumor mode. For more information, see Liquid Tumor Mode. |
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE | Set the Tumor-in-Normal (TiN) contamination tolerance level. For more information, see Liquid Tumor Mode. |

Option | Description |
---|---|
enable-hla | Enable HLA typer (by default, this setting genotypes only class 1 genes) |
hla-enable-class-2 | Extend genotyping to HLA class 2 genes |
| Configures DRAGEN to use CNV settings for Liquid Tumors (e.g., AML/MLL). |
Option | Description | Solid | Liquid |
---|---|---|---|
| | Variant minimum allele frequency for usable variants. | 0.05 (default) | 0.002 |

Option | Description | Solid | Liquid (cfDNA) |
---|---|---|---|
| | Minimum coverage for a microsatellite | 60 (default) | 500 |
| | Minimum Jensen-Shannon distance between tumor and normal for a microsatellite | 0.1 (default) | 0.02 |
| Enable HLA typer (this setting by default will only genotype class 1 genes) |
| Internal option to set min alignment score threshold |
| Minimum Alignment score of a read mate to be considered |
| Extend genotyping to HLA class 2 genes |
| When running from UMI data, one of these options is required to let DRAGEN know that the reads have been UMI-collapsed and are therefore more reliable than non-UMI reads. Solid mode is optimized for solid tumors with post-collapse coverage rates of ~200-300X and target allele frequencies of 5% and higher. Liquid mode is optimized for a liquid biopsy pipeline with post-collapse coverage rates of ~2000-2500X and target allele frequencies of 0.4% and higher. As a rough rule of thumb, choose solid for coverage below 1000X and liquid for higher coverage. |
| Threshold for sensitivity-specificity tradeoff. The default threshold is 4(Solid)/2(Liquid). Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
| Systematic noise filter. In tumor-normal calling, this filter is recommended for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples. |
| Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
| Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
| Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control. |
| Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation. |
| This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
| The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |
| In FFPE samples with UMI Simplex collapsing it may be beneficial to increase the vc-target-vaf to 0.2 or 0.3. In FFPE samples with UMI Duplex collapsing some of the strand specific FFPE deamination noise may be removed by the duplex collapsing so that the default vc-target-vaf of 0.01 may remain appropriate. |
| Some FFPE samples may have a high rate of FP calls in SINE (and specifically in ALU) regions. Optionally use an ALU bed to hard filter all calls in this region. Steps are provided below to download an ALU region bed. |
| Minimum variant allele frequency for usable variants. | 0.05 (default) | 0.002 |
| Minimum coverage for a microsatellite | 60 (default) | 500 |
| Minimum Jensen-Shannon distance between tumor and normal for a microsatellite | 0.1 (default) | 0.02 |
| Enable HLA typer (this setting by default will only genotype class 1 genes) |
| Internal option to set min alignment score threshold |
| Minimum Alignment score of a read mate to be considered |
| Extend genotyping to HLA class 2 genes |
| Some FFPE samples may have a high rate of FP calls in SINE (and specifically in ALU) regions. Optionally use an ALU bed to hard filter all calls in this region. Steps are provided below to download an ALU region bed. |
| Enable HLA typer (this setting by default will only genotype class 1 genes) |
| Extend genotyping to HLA class 2 genes |
| Threshold for sensitivity-specificity tradeoff. The default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
|
| Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
| Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
|
| This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
| The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |
|
| Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL). |
|
| 100000 |
| Configures DRAGEN to use CNV settings for HEME. |
|
| Threshold for sensitivity-specificity tradeoff. The default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
|
| Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
| Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
| Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control. |
| Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation. |
| This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
| The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |
| Configure DRAGEN to use SV settings for HEME. |
| If UMI is nonrandom, enter the path for a customized, valid UMI sequence. |
|
| Enter the path for target region in BED format. |
|
| When running from UMI data, one of these options is required to let DRAGEN know that the reads have been UMI-collapsed and are therefore more reliable than non-UMI reads. Solid mode is optimized for solid tumors with post-collapsed coverage of ~200–300X and target allele frequencies of 5% and higher. Liquid mode is optimized for a liquid biopsy pipeline with post-collapsed coverage of ~2000–2500X and target allele frequencies of 0.4% and higher. As a rough rule of thumb, choose solid for coverage below 1000X and liquid for higher coverage. |
| Threshold for sensitivity-specificity tradeoff. The default threshold is 4 (solid) or 2 (liquid). Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
| Systematic noise filter. In tumor-only variant calling, this filter is essential for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples. |
| Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
| Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
|
| This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
| The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |
| If UMI is nonrandom, enter the path for a customized, valid UMI sequence. |
|
| Enter the path for target region in BED format. |
| Threshold for sensitivity-specificity tradeoff. The default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
|
| Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
| Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
|
| This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |
| The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header. |
This recipe is for processing whole genome sequencing data for somatic heme tumor-only workflows.
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Optional settings per component are listed below. Full option list at this page.
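As a minimal sketch of building the options per component and reusing them, the following Bash snippet keeps the SNV, CNV, and SV options in separate arrays and joins them. The flags are taken from the optional settings listed for this recipe. The function name `heme_to_caller_options` and the file paths are placeholders, and the INPUT, OUTPUT, and MAP/ALIGN options are intentionally left out.

```bash
#!/usr/bin/env bash
# Sketch only: prints the joined caller options, it does not run DRAGEN.
# Paths are placeholders; flags come from the optional settings in this recipe.

heme_to_caller_options() {
  local snv_noise="$1"   # SNV systematic noise file (placeholder path)
  local sv_noise="$2"    # SV systematic noise BEDPE (placeholder path)

  # One array per component so each can be reused or swapped independently.
  local snv_opts=(--vc-systematic-noise "$snv_noise")
  local cnv_opts=(--heme-cnv true)
  local sv_opts=(--heme-sv true --sv-systematic-noise "$sv_noise"
                 --enable-variant-deduplication true)

  # The final option set is the union of the per-component arrays.
  local all_opts=("${snv_opts[@]}" "${cnv_opts[@]}" "${sv_opts[@]}")
  printf '%s\n' "${all_opts[*]}"
}

heme_to_caller_options /path/to/snv_noise.bed.gz /path/to/sv_noise.bedpe.gz
```

The printed string can be appended to the INPUT, OUTPUT, and MAP/ALIGN options when assembling the final command line.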
Generic SNV noise files (including a HEME specific WGS noise file) can be downloaded here: DRAGEN Software Support Site page
When possible, build a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
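Gathering the VCF paths can be scripted. The following is a minimal sketch, assuming all step 1 outputs sit under one directory and the VCFs end in `.vcf.gz`. The function name `write_vcf_list`, the directory layout, and the file name pattern are assumptions for illustration, so narrow the pattern if the run directories also hold other VCFs.

```bash
#!/usr/bin/env bash
# Sketch: collect the step 1 VCFs into a list file, one absolute path per line.
# The directory layout and the "*.vcf.gz" pattern are assumptions.

write_vcf_list() {
  local runs_dir="$1"   # directory holding the per-normal outputs (hypothetical)
  local vcf_list="$2"   # list file to write, later passed as ${VCF_LIST}
  local abs_dir

  # Resolve to an absolute path so every listed entry is a full path.
  abs_dir="$(cd "$runs_dir" && pwd)"

  # Index files such as *.vcf.gz.tbi do not match the pattern and are skipped.
  find "$abs_dir" -type f -name '*.vcf.gz' | sort > "$vcf_list"
}
```

For example, `write_vcf_list /data/normals_step1 /data/vcf_list.txt` (hypothetical paths) writes the list that `${VCF_LIST}` would then point to.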
Step 2. Generate the final noise file with:
To build the SV systematic noise file
You can generate systematic noise BEDPE files from normal samples collected using the same library prep, sequencing system, and panels.
To generate a BEDPE file, do as follows.
Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Build the BEDPE file using the VCFs and --sv-build-systematic-noise-vcfs-list, which takes the list of input VCFs from the previous step, one VCF per line. Example command line is provided below.
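A minimal sketch of the build step follows. It only assembles and prints the command so it can be inspected before running. `--sv-build-systematic-noise-vcfs-list` and `--output-file-prefix` appear in this document; `--ref-dir`, `--output-directory`, the function name `build_sv_noise_cmd`, and all paths are assumptions to verify against the option list for your DRAGEN version.

```bash
#!/usr/bin/env bash
# Sketch: assemble (not run) the command that builds the SV noise BEDPE.
# --ref-dir and --output-directory are assumed here; check them against
# the full option list before use.

build_sv_noise_cmd() {
  local ref_dir="$1"    # reference hash table directory (assumed option)
  local vcf_list="$2"   # text file with one per-normal SV VCF path per line
  local out_dir="$3"    # output directory (assumed option)
  local prefix="$4"     # output file prefix

  local cmd=(
    dragen
    --ref-dir "$ref_dir"
    --sv-build-systematic-noise-vcfs-list "$vcf_list"
    --output-directory "$out_dir"
    --output-file-prefix "$prefix"
  )
  # Print the command on one line for review.
  printf '%s\n' "${cmd[*]}"
}

build_sv_noise_cmd /path/to/ref sv_vcf_list.txt /path/to/out sv_noise
```

Printing rather than executing keeps the sketch safe to run on a machine without DRAGEN installed.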
You can also build systematic noise BEDPE files in the cloud using the DRAGEN Baseline Builder App on BaseSpace.
Pre-built SV systematic noise file
The following prebuilt systematic noise files for WGS are available for download on the DRAGEN Software Support Site page. To generate these noise files, we used 46 unrelated normal samples.
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup.vcf.gz. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in somatic workflows: doing so can increase sensitivity, and deduplication prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
Systematic noise filter. In tumor-only variant calling, this filter is essential for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples.
Germline filtering. Enable to tag variants as germline or somatic based on population databases. $REFERENCE can be GRCh37 or GRCh38 (GRCh37 is compatible with hs37d5 and hg19). The Nirvana annotation database is downloadable at this page.
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). For more information, see Systematic Noise Filtering.
A prebuilt systematic noise BEDPE file can be downloaded from the DRAGEN Software Support Site page.
Specify germline CNVs from the matched normal sample.
Systematic noise filter. In tumor-normal calling, this filter is recommended for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples.
Enable liquid tumor mode.
Set the Tumor-in-Normal (TiN) contamination tolerance level.
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). For more information, see Systematic Noise Filtering.
Enable or disable GC bias correction when generating target counts.
Specify the input type for the UMI sequence.
Set the batch option for different UMIs correction.
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded.
Set the consensus sequence type to output. DRAGEN UMI allows you to collapse duplex sequences from the two strands of the original molecules.
Germline filtering. Enable to tag variants as germline or somatic based on population databases. $REFERENCE can be GRCh37 or GRCh38 (GRCh37 is compatible with hs37d5 and hg19). The Nirvana annotation database is downloadable at this page.
Enable or disable GC bias correction when generating target counts.
Specifies the segmentation algorithm to perform.
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). For more information, see Systematic Noise Filtering.
Specify the input type for the UMI sequence.
Set the batch option for different UMIs correction.
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded.
Set the consensus sequence type to output. DRAGEN UMI allows you to collapse duplex sequences from the two strands of the original molecules.
Enable or disable GC bias correction when generating target counts.
Specifies the segmentation algorithm to perform.
Enable liquid tumor mode.
Set the Tumor-in-Normal (TiN) contamination tolerance level.
Enable or disable GC bias correction when generating target counts.
Specifies the segmentation algorithm to perform.
Specifies a population SNV catalog for ASCN CNV. For more information on specifying b-allele loci, see .
Systematic noise filter. In tumor-only variant calling, this filter is essential for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples.
Germline filtering. Enable to tag variants as germline or somatic based on population databases. $REFERENCE can be GRCh37 or GRCh38 (GRCh37 is compatible with hs37d5 and hg19). The Nirvana annotation database is downloadable at this page.
Option | Description |
---|---|
--heme-cnv true | Configures DRAGEN to use CNV settings for Liquid Tumors (e.g., AML/MLL). |

Option | Description |
---|---|
--vc-sq-filter-threshold $THRESHOLD | Threshold for sensitivity-specificity tradeoff. The default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity. |
--vc-systematic-noise $SYSTEMATIC_NOISE_FILE | Systematic noise filter. In tumor-only variant calling, this filter is essential for removing systematic noise observed in normal samples. Prebuilt systematic noise files are available for download on the DRAGEN Software Support Site page. Alternatively, a systematic noise file can be generated by running the somatic TO pipeline on normal samples. We recommend using a systematic noise file based on normal samples that match the library prep of the tumor samples. |
--vc-somatic-hotspots somatic_hotspots_GRCh38.vcf.gz | Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available. |
--vc-combine-phased-variants-distance $DIST | Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero that gives the maximum distance at which calls are combined. The recommended distance is 15 base pairs. The valid range is [0, 15]. |
--vc-enable-germline-tagging true --enable-variant-annotation true --variant-annotation-data $NIRVANA_ANNOTATION_FOLDER --variant-annotation-assembly $REFERENCE | Germline filtering. Enable to tag variants as germline or somatic based on population databases. $REFERENCE can be GRCh37 or GRCh38 (GRCh37 is compatible with hs37d5 and hg19). The Nirvana annotation database is downloadable at this page. |
--vc-target-vaf FLOAT | This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. This setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true). |

Option | Description |
---|---|
--sv-systematic-noise $SV_SYSTEMATIC_NOISE_BEDPE | Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). For more information, see Systematic Noise Filtering. |
--heme-sv true | Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL). |
--sv-min-scored-variant-size $MIN_SCORED_VAR_SIZE | 100000 |
--sv-somatic-ins-tandup-hotspot-regions-bed $BED_FILE | Specify a BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. By default, DRAGEN SV automatically selects a reference-specific hotspots BED file. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bp). |

Pre-built Systematic Noise File | Comment | Systematic Noise Version | DRAGEN Compatibility |
---|---|---|---|
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz | >200x coverage with 2x150bp reads for the HG38 reference | 3.0.0 | 4.3.* |