B-Allele frequency (BAF) output is enabled by default in germline and somatic VCF and gVCF runs.
The BAF value is calculated as either AF or (1 - AF), where:
AF = alt_count / (ref_count + alt_count)
BAF = 1 - AF only when ref base < alt base, using the base priority order A < T < G < C < N; otherwise BAF = AF.
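The calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not DRAGEN's implementation: the function name `compute_baf` and the `BASE_PRIORITY` table are hypothetical, and the sketch assumes a biallelic SNP whose record has already passed the inclusion rules described later in this section.

```python
# Hypothetical priority table for the base order A < T < G < C < N
BASE_PRIORITY = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def compute_baf(ref_base: str, alt_base: str, ref_count: int, alt_count: int) -> float:
    """Illustrative BAF calculation for a single biallelic SNP."""
    total = ref_count + alt_count
    if total == 0:
        # Such records are excluded from the BAF output, so BAF is undefined here
        raise ValueError("ref_count and alt_count are both zero")
    af = alt_count / total
    # Report 1 - AF only when the ref base ranks before the alt base
    if BASE_PRIORITY[ref_base.upper()] < BASE_PRIORITY[alt_base.upper()]:
        return 1.0 - af
    return af


# ref A, alt G, 30 ref reads, 10 alt reads: AF = 0.25, and A < G, so BAF = 1 - AF
print(compute_baf("A", "G", 30, 10))  # 0.75
```

With the reference and alternate bases swapped (ref G, alt A), the same counts return AF unchanged, because G does not rank before A.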
The B-allele frequency values are often plotted to visually inspect the spread away from a perfectly diploid heterozygous call (BAF = 50%). This plot is easier to interpret if it is symmetric about the BAF = 50% line. To ensure this symmetry, a heuristic is needed to determine when BAF = AF and when BAF = 1 - AF. This definition of B-allele frequency is based on the one used for bead arrays, because most users are accustomed to that implementation. In bead arrays, the choice of the B allele is based on the color of the dye attached to each nucleotide: A and T get one color, and G and C get the other. The bead array implementation also has a much more complex rule for tie-breaking between A and T, or between G and C, that involves top and bottom strands. That rule is unnecessary here, so the simpler hierarchical approach of prioritizing the nucleotides as A < T < G < C < N is used instead.
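One consequence of the priority rule follows directly from the definition: for a given pair of bases, BAF always reports the read fraction of the base that comes first in the order A < T < G < C < N, no matter which of the two is the reference. The sketch below checks this by evaluating the same read evidence with the bases in both roles. The helper names `compute_baf` and `b_allele` are hypothetical and exist only for this illustration.

```python
BASE_PRIORITY = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def compute_baf(ref_base: str, alt_base: str, ref_count: int, alt_count: int) -> float:
    af = alt_count / (ref_count + alt_count)
    # 1 - AF is the ref fraction; AF is the alt fraction
    return 1.0 - af if BASE_PRIORITY[ref_base] < BASE_PRIORITY[alt_base] else af


def b_allele(ref_base: str, alt_base: str) -> str:
    """Base whose read fraction BAF reports: the earlier one in A < T < G < C < N."""
    return min(ref_base, alt_base, key=BASE_PRIORITY.__getitem__)


# The same evidence (30 A reads, 10 G reads) with either base as the reference
baf_ref_a = compute_baf("A", "G", ref_count=30, alt_count=10)
baf_ref_g = compute_baf("G", "A", ref_count=10, alt_count=30)
print(b_allele("A", "G"), baf_ref_a, baf_ref_g)  # A 0.75 0.75
```

Both calls report the fraction of A reads (30 / 40), so the value plotted for a site does not depend on which base the reference happens to carry.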
For each small variant VCF entry with exactly one SNP alternate allele, the output contains a corresponding entry in the BAF output file.
<NON_REF> lines are excluded.
ForceGT variants (as marked by the "FGT" tag in the INFO field) are not included in the output, unless the variant also contains the "NML" tag in the INFO field.
Variants where the ref_count and alt_count are both zero are not included in the output.
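The inclusion rules above can be expressed as a single predicate over a parsed VCF record. This is a hedged sketch: the function `has_baf_entry` and its arguments are hypothetical, INFO flags are assumed to be parsed into a mapping keyed by tag name, and the handling of gVCF lines is an assumption (a symbolic <NON_REF> allele is ignored when counting alternate alleles, so lines whose only ALT is <NON_REF> are excluded).

```python
from typing import Mapping, Sequence

SNP_BASES = set("ACGTN")


def has_baf_entry(ref: str, alts: Sequence[str], info: Mapping[str, object],
                  ref_count: int, alt_count: int) -> bool:
    """Hypothetical check of whether a VCF record yields a BAF entry."""
    # Assumption: the symbolic <NON_REF> allele does not count as an alternate allele
    real_alts = [a for a in alts if a != "<NON_REF>"]
    # Exactly one alternate allele, and the record must be a single-base SNP
    if len(real_alts) != 1:
        return False
    alt = real_alts[0]
    if len(ref) != 1 or len(alt) != 1 or alt.upper() not in SNP_BASES:
        return False
    # ForceGT records (FGT) are dropped unless they also carry the NML tag
    if "FGT" in info and "NML" not in info:
        return False
    # Records with no reads supporting either allele are dropped
    if ref_count == 0 and alt_count == 0:
        return False
    return True


print(has_baf_entry("A", ["G"], {}, 10, 5))                # True
print(has_baf_entry("A", ["<NON_REF>"], {}, 10, 0))        # False
print(has_baf_entry("A", ["G"], {"FGT": True}, 10, 5))     # False
```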
--vc-enable-baf
Enable or disable B-allele frequency output. Enabled by default.
The BAF output files are BigWig-compressed files named <output-file-prefix>.baf.bw and <output-file-prefix>.hard-filtered.baf.bw. The hard-filtered file only contains entries for variants that pass the filters defined in the VCF (i.e., PASS entries).
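The relationship between the two files can be illustrated with plain Python data: the hard-filtered file holds the subset of entries whose source VCF record has FILTER set to PASS. This sketch does not write BigWig files; the `BafEntry` type, its `filter_status` field, and the `LowQual` filter name are hypothetical stand-ins used only to show the selection.

```python
from typing import List, NamedTuple


class BafEntry(NamedTuple):
    chrom: str
    start: int
    end: int
    baf: float
    filter_status: str  # hypothetical: FILTER value of the source VCF record


def hard_filtered(entries: List[BafEntry]) -> List[BafEntry]:
    """Keep only entries whose source record passed all VCF filters."""
    return [e for e in entries if e.filter_status == "PASS"]


entries = [
    BafEntry("chr1", 999, 1000, 0.75, "PASS"),
    BafEntry("chr1", 1999, 2000, 0.5, "LowQual"),  # hypothetical filter name
]
print(hard_filtered(entries))
```

To inspect the real .baf.bw files from Python, a BigWig reader such as pyBigWig can be used instead of this in-memory stand-in.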
Each entry contains the following fields: Chromosome, Start, End, and BAF.
Where:
Chromosome is a string matching a reference contig.
Start and End define a zero-based, half-open interval.
BAF is a floating point value.
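Because VCF positions are 1-based while the BAF entries use zero-based, half-open intervals, a SNP at VCF position POS corresponds to the interval [POS - 1, POS). The sketch below performs that conversion, assuming each entry spans exactly the single SNP base; the function name `vcf_snp_to_interval` is hypothetical.

```python
from typing import Tuple


def vcf_snp_to_interval(chrom: str, pos: int, baf: float) -> Tuple[str, int, int, float]:
    """Convert a 1-based VCF POS for a SNP into a (Chromosome, Start, End, BAF) row."""
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    start = pos - 1  # zero-based start coordinate
    end = pos        # half-open: End is exclusive, so the interval covers one base
    return chrom, start, end, baf


row = vcf_snp_to_interval("chr1", 1000, 0.75)
print(row)  # ('chr1', 999, 1000, 0.75)
```

The half-open convention means End - Start equals the number of bases covered, which is 1 for every SNP entry.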