Ploidy Caller
The Ploidy Caller uses the per-contig median coverage values from the Ploidy Estimator to detect aneuploidy and chromosomal mosaicism in mammalian germline samples from whole genome sequencing data.
The Ploidy Caller runs by default except in the following circumstances:
The Ploidy Estimator cannot determine whether the input data is from whole genome sequencing, for example when the data is from exome or targeted sequencing.
The reference genome does not contain any autosomes following the expected naming convention (e.g. chr1 or 1).
There is no germline sample, for example in a tumor-only analysis.
Chromosomal mosaicism is detected when there is a significant shift in median coverage of a chromosome compared to the overall autosomal median coverage.
The following table displays some examples of expected shifts in coverage for a given aneuploidy and mosaic fraction.
Neutral Copy Number | Variant Copy Number | Mosaic Fraction | Expected Coverage Shift |
---|---|---|---|
2 | 1 | 10% | -5% |
2 | 1 | 5% | -2.5% |
2 | 3 | 5% | +2.5% |
2 | 3 | 10% | +5% |
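The rows in the table above are consistent with a simple relationship: the expected relative shift equals the mosaic fraction multiplied by the copy number change, divided by the neutral copy number. The following is a minimal sketch of that arithmetic. The function name `expected_coverage_shift` is hypothetical and is used only for illustration; it is not part of DRAGEN.

```python
def expected_coverage_shift(neutral_cn, variant_cn, mosaic_fraction):
    """Expected relative coverage shift for a mosaic copy number change.

    A fraction `mosaic_fraction` of cells carries `variant_cn` copies and the
    remaining cells carry `neutral_cn` copies, so the mean copy number moves by
    mosaic_fraction * (variant_cn - neutral_cn) relative to neutral_cn.
    """
    return mosaic_fraction * (variant_cn - neutral_cn) / neutral_cn


# Reproduce the four rows of the table (fractions expressed as 0.10 and 0.05).
for neutral_cn, variant_cn, fraction in [(2, 1, 0.10), (2, 1, 0.05), (2, 3, 0.05), (2, 3, 0.10)]:
    shift = expected_coverage_shift(neutral_cn, variant_cn, fraction)
    print(f"CN {neutral_cn} -> {variant_cn} at {fraction:.1%} mosaic: {shift:+.1%}")
```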
The Ploidy Caller models coverage as a normal distribution for both the null (neutral) and the alternative (mosaic) hypotheses. The two normal distributions have equal mean at the median autosomal coverage for the sample, but the variance of the alternative normal distribution is greater than that of the null normal distribution. The baseline variance of the two models at 30x coverage was determined empirically from a cohort of ~2500 WGS samples. The actual variance used for the two models is calculated from the baseline variance at 30x coverage, adjusting for the median autosomal coverage of the sample. Below are the likelihood distributions for the null and alternative hypotheses for a sample with 35x median autosomal coverage.
After applying an empirically estimated prior for chromosomal mosaicism, the Ploidy Caller generates ploidy calls according to the posterior probability of the null and alternative hypotheses, as shown below for a sample with 35x median autosomal sequencing coverage.
At 35x median autosomal coverage, the threshold for deciding between a neutral (REF) and an alternative (DEL or DUP) call is roughly at +/- 5% shift in coverage for an autosome. At 100x median autosomal coverage, the threshold is at roughly +/- 3% shift in coverage for an autosome. A Q20 threshold is used to filter low quality calls.
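The two-hypothesis model described above can be sketched as a small Bayesian calculation. This is an illustration only, not DRAGEN's implementation: the standard deviations and the prior below are hypothetical placeholder values (the empirically determined values are not stated here), coverage is expressed as a relative shift from the autosomal median so that both normal distributions are centered at zero, and the quality score is assumed to be Phred-scaled as is conventional for VCF.

```python
import math


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_mosaic(shift, sigma_null, sigma_alt, prior_alt):
    """Posterior probability of the alternative (mosaic) hypothesis.

    `shift` is the relative deviation of a chromosome's median coverage from
    the autosomal median. Both hypotheses share the same mean (zero shift);
    the alternative has the larger spread.
    """
    weighted_alt = prior_alt * normal_pdf(shift, sigma_alt)
    weighted_null = (1.0 - prior_alt) * normal_pdf(shift, sigma_null)
    return weighted_alt / (weighted_alt + weighted_null)


def phred(error_probability, floor=1e-300):
    """Phred-scaled quality; the floor avoids log10(0) for near-certain calls."""
    return -10.0 * math.log10(max(error_probability, floor))


# Hypothetical parameters chosen for illustration; NOT the values used by DRAGEN.
SIGMA_NULL = 0.015
SIGMA_ALT = 0.10
PRIOR_ALT = 0.01

for shift in (0.0, 0.03, 0.05, 0.10):
    p_alt = posterior_mosaic(shift, SIGMA_NULL, SIGMA_ALT, PRIOR_ALT)
    call = "ALT" if p_alt > 0.5 else "REF"
    quality = phred(1.0 - max(p_alt, 1.0 - p_alt))
    low_qual = quality < 20.0  # mirrors the Q20 filter described above
    print(f"shift={shift:+.2f} call={call} Q={quality:.1f} LowQual={low_qual}")
```

Because the alternative distribution is wider than the null, small shifts favor the neutral call and large shifts in either direction favor the mosaic call. The exact crossover point depends entirely on the chosen variances and prior.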
In addition to detecting aneuploidy and chromosomal mosaicism in autosomes, where the expected reference ploidy is 2, the Ploidy Caller can also detect these variants in allosomes. The reference sex karyotype used for making calls on the allosomes is determined from the sex karyotype of the sample, which is either provided on the command line using the --sample-sex option or estimated by the Ploidy Estimator. If the sex karyotype of the sample is neither provided on the command line nor determined by the Ploidy Estimator, it is assumed to be XX. Whenever the sex karyotype contains at least one Y chromosome, the reference sex karyotype is XY. Otherwise, the reference sex karyotype is XX. The following table displays the X and Y reference ploidy for each of the possible sex karyotypes of a sample. If the Y chromosome reference ploidy is zero, ploidy calling is not performed on the Y chromosome.
Sex Karyotype | X Reference Ploidy | Y Reference Ploidy |
---|---|---|
XX | 2 | 0 |
XY | 1 | 1 |
XXY | 1 | 1 |
XYY | 1 | 1 |
X0 | 2 | 0 |
XXXY | 1 | 1 |
XXX | 2 | 0 |
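The rule behind the table above can be sketched in a few lines. The function names are hypothetical and serve only to restate the documented rule: any Y chromosome in the sample karyotype yields an XY reference, anything else (including an undetermined karyotype) yields XX.

```python
def reference_sex_karyotype(sample_karyotype=None):
    """Reference sex karyotype for a sample sex karyotype such as "XXY" or "X0".

    None stands for a karyotype that was neither provided on the command line
    nor determined by the Ploidy Estimator, which is treated as XX.
    """
    if sample_karyotype is None:
        return "XX"
    return "XY" if "Y" in sample_karyotype.upper() else "XX"


def reference_ploidy(sample_karyotype=None):
    """X and Y reference ploidy implied by the reference sex karyotype."""
    reference = reference_sex_karyotype(sample_karyotype)
    return {"X": reference.count("X"), "Y": reference.count("Y")}


for karyotype in ["XX", "XY", "XXY", "XYY", "X0", "XXXY", "XXX", None]:
    ploidy = reference_ploidy(karyotype)
    calls_y = ploidy["Y"] > 0  # no Y chromosome calling when Y reference ploidy is 0
    print(karyotype, ploidy, "call Y" if calls_y else "skip Y")
```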
The Ploidy Caller generates a <output-file-prefix>.ploidy.vcf.gz file in the output directory. The output file follows the VCF 4.2 specification. A single record is reported for each reference autosome and allosome, except for the Y chromosome if the reference sex karyotype is XX. Calls are not made for other sequences in the reference genome, such as mitochondrial DNA, unlocalized or unplaced sequences, alternate contigs, decoy contigs, or the Epstein-Barr virus sequence. The VCF header is annotated with ##source=DRAGEN_PLOIDY to indicate that the file is generated by the DRAGEN PLOIDY pipeline.
The following information is provided in the VCF file.
Meta-information--The VCF output file contains common meta-information such as DRAGENVersion and DRAGEN CommandLine, as well as Ploidy Caller specific information. The VCF header contains the meta-information for median autosome depth of coverage, the provided sex karyotype if available, the estimated sex karyotype from the Ploidy Estimator if available, and the reference sex karyotype. The following is an example of the header lines:
FILTER Fields--The VCF output file includes the LowQual filter, which filters results with quality score below 20.
INFO Fields--The VCF output INFO fields include the following:
END—End position of the variant described in this record.
SVTYPE—Type of structural variant.
FORMAT Fields--The VCF output file includes the following format fields. There is no GT FORMAT field. A variant call in the VCF displays either <DUP> or <DEL> in the ALT column. A non-variant call displays a period (.) in the ALT column. If a downstream tool requires a GT field, one can be added for variant calls using ./1 for a diploid contig and 1 for a haploid contig. For non-variant calls, use 0/0 for diploid and 0 for haploid.
DC—Depth of coverage.
NDC—Normalized depth of coverage.
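The GT convention suggested above for downstream use can be sketched as a small helper. The function `genotype_for_call` is hypothetical and only encodes the mapping described in this section. It takes the ALT value of a record and the reference ploidy of the contig (2 for diploid, 1 for haploid).

```python
def genotype_for_call(alt, reference_ploidy):
    """GT string to add for a ploidy call, following the mapping described above.

    alt: the ALT column value, one of "<DUP>", "<DEL>", or "." (non-variant).
    reference_ploidy: 2 for a diploid contig, 1 for a haploid contig.
    """
    if alt not in ("<DUP>", "<DEL>", "."):
        raise ValueError(f"unexpected ALT value: {alt!r}")
    is_variant = alt != "."
    if reference_ploidy == 2:
        return "./1" if is_variant else "0/0"
    if reference_ploidy == 1:
        return "1" if is_variant else "0"
    raise ValueError(f"unsupported reference ploidy: {reference_ploidy}")


print(genotype_for_call("<DUP>", 2))  # variant call on a diploid contig
print(genotype_for_call(".", 1))      # non-variant call on a haploid contig
```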
The following is an example output file for a sample with mosaic loss of the Y chromosome.
The following is an example output file for a sample with trisomy 21.
Samples derived from cell lines frequently have coverage artifacts that might result in variant ploidy calls on some chromosomes. Chromosomes 17, 19, and 22 are the chromosomes most commonly affected by these cell line coverage artifacts. When performing accuracy assessments of ploidy calls on cell line samples, filter out chromosomes with known cell line artifacts.
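One way to apply this filtering before an accuracy assessment is sketched below. It assumes the calls have already been parsed into (contig, ALT) pairs. The helper names and the example calls are hypothetical, and the excluded set should be adjusted to the artifacts known for the specific cell line.

```python
# Chromosomes named above as most commonly affected; adjust per cell line.
ARTIFACT_CHROMOSOMES = {"17", "19", "22"}


def normalize_contig(name):
    """Strip a leading "chr" so that both chr1-style and 1-style names match."""
    return name[3:] if name.lower().startswith("chr") else name


def drop_artifact_chromosomes(calls, excluded=ARTIFACT_CHROMOSOMES):
    """Keep only calls on contigs that are not in the excluded set."""
    return [call for call in calls if normalize_contig(call[0]) not in excluded]


# Synthetic (contig, ALT) pairs for illustration; not real DRAGEN output.
example_calls = [("chr1", "."), ("chr17", "<DUP>"), ("chr21", "<DUP>"), ("22", "<DEL>")]
print(drop_artifact_chromosomes(example_calls))
```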