# Structural Variant De Novo Quality Scoring

You can enable *de novo* structural variant quality scoring in DRAGEN.

To enable *de novo* scoring for structural variant joint diploid calling, set `--sv-denovo-scoring` to true. To adjust the DQ threshold at or above which a variant is classified as *de novo*, use the `--sv-denovo-threshold` command line option. See DN Field for more information.

## Inputs

*De novo* scoring requires the following two files:

* A pedigree file that specifies the relationship of all samples in the pedigree.
* The VCF output from germline structural variant calling analysis run jointly over all samples in the pedigree.

### Pedigree File

The pedigree file is required for *de novo* scoring. Use the same file format as required for joint small variant calling analysis and *de novo* scoring. For information on the file format, see [Small Variant De Novo Calling](https://support-docs.illumina.com/SW/DRAGEN_v39/Content/SW/DRAGEN/PedigreeMode.htm#Small.htm). The file specifies which sample in each trio is the proband, mother, or father. If the pedigree file specifies multiple trios (for example, a multigeneration pedigree or siblings), DRAGEN automatically detects the trios and provides the *de novo* scores on the proband sample of each detected trio.
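The following Python sketch illustrates what detecting trios in a pedigree can look like. It assumes the common six-column, whitespace-delimited PED layout (family ID, sample ID, father ID, mother ID, sex, phenotype) with `0` marking an unknown parent. The `Trio` type and `find_trios` function are hypothetical helpers for illustration, and the logic is not necessarily how DRAGEN detects trios internally.

```python
from typing import List, NamedTuple


class Trio(NamedTuple):
    proband: str
    father: str
    mother: str


def find_trios(ped_text: str) -> List[Trio]:
    """Return one Trio per sample whose father and mother are both in the file.

    Assumes six-column PED rows: family, sample, father, mother, sex, phenotype.
    """
    rows = []
    for line in ped_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 4:
            rows.append(fields)

    sample_ids = {fields[1] for fields in rows}
    trios = []
    for fields in rows:
        sample, father, mother = fields[1], fields[2], fields[3]
        # "0" marks an unknown parent under the assumed PED convention.
        if father != "0" and mother != "0" and father in sample_ids and mother in sample_ids:
            trios.append(Trio(sample, father, mother))
    return trios


# Hypothetical pedigree with two siblings, so two trios are found.
example_ped = """\
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 1
FAM1 child1 father mother 1 2
FAM1 child2 father mother 2 1
"""

for trio in find_trios(example_ped):
    print(trio)
```

In this example, both `child1` and `child2` form a trio with the same parents, which mirrors the sibling case described above.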

### Joint Germline Structural Variant VCF

DRAGEN applies *de novo* scoring to the VCF output from germline structural variant analysis for all samples specified in the pedigree file. You can supply the VCF file directly using the command line or produce the file as part of the DRAGEN run where *de novo* scoring is enabled.

## Output

*De novo* scoring adds the *de novo* quality score (`DQ`) and *de novo* call (`DN`) fields for each sample in the output VCF file.
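As a rough illustration of where these fields appear, the following Python sketch reads the per-sample `DQ` and `DN` values from one VCF record by using the `FORMAT` column to label each sample's colon-separated values. The `sample_fields` function is a hypothetical helper, and the header and record lines are invented for illustration rather than taken from DRAGEN output.

```python
from typing import Dict, Sequence


def sample_fields(
    header_line: str, record_line: str, keys: Sequence[str] = ("DQ", "DN")
) -> Dict[str, Dict[str, str]]:
    """Map each sample name to the requested FORMAT values for one record."""
    # Sample names start at the tenth column of the #CHROM header line.
    samples = header_line.rstrip("\n").split("\t")[9:]
    cols = record_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")

    result = {}
    for name, value in zip(samples, cols[9:]):
        parts = dict(zip(format_keys, value.split(":")))
        # Treat a key that is absent for this sample as missing (".").
        result[name] = {key: parts.get(key, ".") for key in keys}
    return result


# Hypothetical header and record, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tproband\tfather\tmother"
record = (
    "chr1\t1000\tsv1\tN\t<DEL>\t100\tPASS\tSVTYPE=DEL;END=2000\t"
    "GT:DQ:DN\t0/1:35.2:DeNovo\t0/0:.:.\t0/0:.:."
)

for sample, values in sample_fields(header, record).items():
    print(sample, values)
```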

### DQ Field

The `DQ` field is defined as follows.

```
##FORMAT=<ID=DQ,Number=1,Type=Float,Description="Denovo quality">
```

The `DQ` field represents the Phred-scaled posterior probability of the variant being *de novo* in the proband. For example, DQ scores of 13 and 20 correspond to posterior probabilities of a *de novo* variant of approximately 0.95 and 0.99, respectively. If DRAGEN can calculate the DQ score, the score is added to the proband samples. If the DQ score cannot be calculated, the field is set to ".".
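The relationship between DQ and probability can be checked with a short Python sketch. It assumes the usual Phred convention, DQ = -10 * log10(1 - P), which is consistent with the two example values above. The function name `dq_to_posterior` is a hypothetical helper, not part of DRAGEN.

```python
def dq_to_posterior(dq: float) -> float:
    """Convert a Phred-scaled DQ score to a posterior probability.

    Assumes DQ = -10 * log10(1 - P), so P = 1 - 10 ** (-DQ / 10).
    """
    return 1.0 - 10.0 ** (-dq / 10.0)


for dq in (13, 20):
    print(dq, round(dq_to_posterior(dq), 3))
```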

### DN Field

The `DN` field is defined as follows.

```
##FORMAT=<ID=DN,Number=1,Type=String,Description="Possible values are 'DeNovo' or 'LowDQ'. Threshold for a passing de novo call is DQ >= 20">
```

DRAGEN compares valid (> 0) DQ scores against a threshold value. You can set the threshold value using the `--sv-denovo-threshold` command line option. For example, to set the threshold value to 10, add `--sv-denovo-threshold 10` to the command line. The default threshold value is 20.

If a DQ score is greater than or equal to the threshold value, the `DN` field is set to `DeNovo`. If the DQ score is below the threshold value, the `DN` field is set to `LowDQ`. If the DQ is 0 or ".", the DQ score is invalid and the `DN` field is set to ".".
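The rules above can be summarized in a minimal Python sketch. The function `classify_dn` is a hypothetical helper that takes the DQ value as it appears in the VCF (a numeric string or ".") and returns the corresponding `DN` value. It is an illustration of the described behavior, not DRAGEN's implementation, and treating non-numeric or negative values as invalid is an assumption.

```python
def classify_dn(dq_field: str, threshold: float = 20.0) -> str:
    """Return the DN value for a DQ field string, following the rules above."""
    try:
        dq = float(dq_field)
    except ValueError:
        return "."  # "." or any other non-numeric value: DQ is invalid
    if dq <= 0:
        return "."  # a DQ of 0 is invalid (negative values assumed invalid too)
    return "DeNovo" if dq >= threshold else "LowDQ"


# Default threshold of 20, then a lowered threshold of 10.
for value in ("35.2", "20", "12.5", "0", "."):
    print(value, classify_dn(value), classify_dn(value, threshold=10))
```

With the default threshold, a DQ of 12.5 is labeled `LowDQ`. With the threshold lowered to 10, the same score is labeled `DeNovo`.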

## De Novo Scoring Workflows

You can use *de novo* structural variant scoring in the following workflows.

* Perform *de novo* scoring in two DRAGEN runs. In the first, run germline structural variant analysis jointly over all samples in the pedigree file. In the second, apply *de novo* structural variant scoring to the joint germline VCF output. See Two-Run Workflow.
* Perform *de novo* scoring in one DRAGEN run. Run germline structural variant analysis jointly over all samples in the pedigree file, and then apply *de novo* scoring to the joint germline structural variant calls. See One-Run Workflow.

### Two-Run Workflow

In the two-run workflow, first run a standard DRAGEN joint germline analysis over multiple samples as shown in the following example.

```
dragen -f \
--ref-dir <HASH_TABLE> \
--bam-input <BAM1> \
--bam-input <BAM2> \
--bam-input <BAM3> \
--enable-map-align false \
--enable-sv true \
--output-directory <OUT_DIR1> \
--output-file-prefix <PREFIX1>
```

In the second run, use the VCF output (`<OUT_DIR1>/<PREFIX1>.sv.vcf.gz`) as input for *de novo* scoring. You can provide the VCF input using the `--variant` option. The following command line provides an example of the second run.

```
dragen -f \
--variant <MULTI_SAMPLE_VCF_FILE> \
--pedigree-file <PED_FILE> \
--enable-map-align false \
--sv-denovo-scoring true \
--output-directory <OUT_DIR2> \
--output-file-prefix <PREFIX2>
```

The resulting output VCF file (`<OUT_DIR2>/<PREFIX2>.sv.vcf.gz`) includes all *de novo* scoring annotations.

### One-Run Workflow

Run a standard DRAGEN joint germline analysis over multiple samples with all required *de novo* scoring options. The following example shows the one-run workflow.

```
dragen -f \
--ref-dir <HASH_TABLE> \
--bam-input <BAM1> \
--bam-input <BAM2> \
--bam-input <BAM3> \
--enable-map-align false \
--enable-sv true \
--output-directory <OUT_DIR> \
--output-file-prefix <PREFIX> \
--sv-denovo-scoring true \
--pedigree-file <PED_FILE>
```

The resulting output VCF file (`<OUT_DIR>/<PREFIX>.sv.vcf.gz`) includes all *de novo* scoring annotations.
