B-Allele Frequency Output
B-allele frequency (BAF) output is enabled by default in germline and somatic VCF and gVCF runs.
The BAF value is calculated as either AF or (1 - AF), where
- AF = alt_count / (ref_count + alt_count)
- BAF = 1 - AF only when ref base < alt base; otherwise BAF = AF. The order of priority for bases is A < T < G < C < N.
The B-allele frequency values are often plotted to visually inspect the spread away from a perfectly diploid heterozygous call (BAF=50%). This plot is more easily interpreted if it is symmetric about the BAF=50% line. To ensure the symmetry, a heuristic must be used to determine when BAF = AF and when BAF = 1 - AF.
This definition of B-allele frequency is based on the definition that is used for bead arrays, as most users are accustomed to that implementation. In bead arrays, the choice of the B allele is based on the color of dye attached to each nucleotide: A and T get one color, G and C get the other color. The bead array implementation has a much more complex rule for tie-breaking between A and T or between G and C that involves top and bottom strands. That complexity is unnecessary here, so the simpler hierarchical approach of using a priority for the nucleotides, A < T < G < C < N, is used.
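The rule above can be sketched in a few lines of Python. This is a minimal illustration of the stated formula, not the tool's implementation; the function name `b_allele_frequency` and the `BASE_PRIORITY` table are hypothetical names introduced here.

```python
# Priority order taken from the text: A < T < G < C < N.
# BASE_PRIORITY and b_allele_frequency are illustrative names only.
BASE_PRIORITY = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def b_allele_frequency(ref_base, alt_base, ref_count, alt_count):
    """Return BAF for a single-SNP record, or None when there is no coverage."""
    total = ref_count + alt_count
    if total == 0:
        # Records with ref_count == alt_count == 0 get no BAF entry.
        return None
    af = alt_count / total
    # BAF = 1 - AF only when the ref base ranks below the alt base.
    if BASE_PRIORITY[ref_base.upper()] < BASE_PRIORITY[alt_base.upper()]:
        return 1.0 - af
    return af
```

With this rule, an A/G site reports the same value whichever base is the reference: `b_allele_frequency("A", "G", 30, 10)` and `b_allele_frequency("G", "A", 10, 30)` both describe 30 reads of A and 10 reads of G, and both evaluate to the fraction of A reads.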
For each small variant VCF entry with exactly one SNP alternate allele, the output contains a corresponding entry in the BAF output file.
- <NON_REF> lines are excluded.
- ForceGT variants (as marked by the "FGT" tag in the INFO field) are not included in the output, unless the variant also contains the "NML" tag in the INFO field.
- Variants where the ref_count and alt_count are both zero are not included in the output. 
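The inclusion rules above can be expressed as a single predicate. The sketch below is illustrative only: the function name and its arguments are hypothetical, and the caller is assumed to have already parsed REF, the ALT list, the INFO keys, and the two read counts from the VCF record. How gVCF records that list `<NON_REF>` alongside a real allele are treated is not stated above; this sketch simply skips any record whose ALT list contains `<NON_REF>`, which is an assumption.

```python
SINGLE_BASES = {"A", "C", "G", "T", "N"}


def gets_baf_entry(ref, alts, info_keys, ref_count, alt_count):
    """Return True if a parsed VCF record would get a BAF entry (sketch)."""
    # Assumption: any record carrying <NON_REF> in ALT is skipped.
    if "<NON_REF>" in alts:
        return False
    # Exactly one alternate allele, and it must be a SNP (single base each side).
    if len(alts) != 1:
        return False
    if ref.upper() not in SINGLE_BASES or alts[0].upper() not in SINGLE_BASES:
        return False
    # ForceGT variants are dropped unless they also carry the NML tag.
    if "FGT" in info_keys and "NML" not in info_keys:
        return False
    # No supporting reads at all: nothing to report.
    if ref_count == 0 and alt_count == 0:
        return False
    return True
```

For example, `gets_baf_entry("A", ["G"], {"FGT"}, 12, 9)` is rejected, while the same record with both `FGT` and `NML` in its INFO keys is accepted.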
 
BAF Options
- --vc-enable-baf: Enable or disable B-allele frequency output. Enabled by default.
BAF Output
The BAF output files are BigWig-compressed files named <output-file-prefix>.baf.bw and <output-file-prefix>.hard-filtered.baf.bw. The hard-filtered file contains entries only for variants that pass the filters defined in the VCF (that is, PASS entries).
Each entry contains the following information: Chromosome Start End BAF
Where:
- Chromosome is a string matching a reference contig. 
- Start and End define a zero-based, half-open interval.
- BAF is a floating point value. 
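Because VCF positions are 1-based while the BAF entries use zero-based, half-open intervals, the two coordinate systems differ by one at the start position. The sketch below shows the conversion for a single SNP, assuming the entry spans exactly the one base of the SNP; the function name `baf_entry` is hypothetical.

```python
def baf_entry(chrom, vcf_pos, baf):
    """Build a (Chromosome, Start, End, BAF) tuple from a 1-based VCF POS.

    Assumes the entry covers only the single SNP base.
    """
    start = vcf_pos - 1  # zero-based start
    end = vcf_pos        # half-open: End itself is not part of the interval
    return (chrom, start, end, float(baf))
```

Under that assumption, a SNP at VCF position 1000 on chr1 corresponds to the interval with Start 999 and End 1000.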
