vcftools: useful for population frequency calculations (see the vcftools documentation).
How to create a noise or historic database file
1. Calculate population statistics
General: Allele Number (AN): Calculate the total number of alleles in your population by multiplying the number of individuals (N) by 2:
AN=2×N.
For each variant within the dataset:
Allele Count (AC): Determine the number of times the alternate allele of a variant appears across all individuals. This is inferred from the genotype counts (n). For a biallelic variant where allele A is the reference and allele B is the alternate, the allele count for the alternate allele is calculated as follows:
AC(B)=2×n(BB)+n(AB)
Allele Frequency (AF): Calculate as the ratio of Allele Count to Allele Number:
AF(B)=AC(B)/AN.
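The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Illumina tooling: the function name `allele_stats` and the cohort numbers are hypothetical, and the calculation assumes diploid genotypes at a biallelic site.

```python
def allele_stats(n_individuals, n_hom_alt, n_het):
    """Return (AN, AC, AF) for the alternate allele B of a biallelic variant.

    n_hom_alt is n(BB) and n_het is n(AB) in the formulas above.
    """
    if n_individuals <= 0:
        raise ValueError("the population must contain at least one individual")
    if n_hom_alt + n_het > n_individuals:
        raise ValueError("carrier count exceeds the number of individuals")
    an = 2 * n_individuals       # AN = 2 x N
    ac = 2 * n_hom_alt + n_het   # AC(B) = 2 x n(BB) + n(AB)
    af = ac / an                 # AF(B) = AC(B) / AN
    return an, ac, af


# Hypothetical cohort: 6059 individuals, one homozygous carrier, no heterozygous carriers.
an, ac, af = allele_stats(6059, n_hom_alt=1, n_het=0)
print(f"AN={an};AC={ac};AF={af:.5f}")
```

With these illustrative inputs, AN is 12118 and AC is 2, so AF rounds to 0.00017.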
2. Create a VCF file with your variants
In the INFO field, include the AF sub-field.
Optionally, include AC, AN, and TEN (a list of up to 10 samples carrying the variant). You may add other fields, provided the field names do not contain underscores or hyphens.
Specify the exact format of each INFO sub-field in ##INFO meta-information lines.
See format example below.
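As a rough sketch of how the INFO column for one variant might be assembled, the Python below builds the required AF sub-field plus the optional AC, AN, and TEN sub-fields, and checks custom field names against the underscore and hyphen rule. The helper names `check_field_name` and `build_info` are hypothetical, and joining TEN samples with commas is an assumption based on the usual VCF convention for multi-value fields.

```python
import re

# Field names may not contain underscores or hyphens (rule stated above).
_FORBIDDEN = re.compile(r"[_-]")


def check_field_name(name):
    """Return the name unchanged, or raise if it breaks the naming rule."""
    if _FORBIDDEN.search(name):
        raise ValueError(f"INFO field name {name!r} contains an underscore or hyphen")
    return name


def build_info(af, ac=None, an=None, ten=()):
    """Assemble an INFO string with the required AF and optional AC, AN, TEN."""
    samples = list(ten)
    if len(samples) > 10:
        raise ValueError("TEN lists at most 10 samples")
    parts = [f"AF={af:g}"]
    if ac is not None:
        parts.append(f"AC={ac}")
    if an is not None:
        parts.append(f"AN={an}")
    if samples:
        # Assumption: samples are comma-separated inside TEN.
        parts.append("TEN=" + ",".join(samples))
    return ";".join(parts)


print(build_info(0.00017, ac=2, an=12118, ten=["SAMPLE1"]))
```

Each sub-field written this way still needs a matching ##INFO meta-information line in the VCF header.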
3. Sort variants in the VCF by chromosome and position with awk
For a curated database, include the significance sub-field in the INFO field and assign its value to each variant based on Table 1.
Only one significance value is allowed per variant. If a variant has multiple interpretations, list the variant in separate rows, each with a different significance value.
Table 1. Mapping of significance values to pathogenicity classes.
Significance value in the VCF    Pathogenicity class in the UI
0                                Unknown
1                                Benign
2                                Likely Benign
3                                VUS
4                                Likely Pathogenic
5                                Pathogenic
Optionally, include comment, category, or other fields to capture text or numerical values that are relevant to classification. You may add other fields, provided the field names do not contain underscores or hyphens.
Specify the exact format of each INFO sub-field in ##INFO meta-information lines.
See format example below.
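The mapping in Table 1 and the one-value-per-row rule can be illustrated with a short Python sketch. The function name `curated_rows` and the coordinates used below are hypothetical, and the rows carry only the significance sub-field, leaving out optional fields such as comment or category.

```python
# Table 1: pathogenicity class in the UI -> significance value in the VCF.
SIGNIFICANCE = {
    "Unknown": 0,
    "Benign": 1,
    "Likely Benign": 2,
    "VUS": 3,
    "Likely Pathogenic": 4,
    "Pathogenic": 5,
}


def curated_rows(chrom, pos, ref, alt, classes):
    """Return one tab-delimited VCF row per interpretation of the variant."""
    rows = []
    for cls in classes:
        info = f"significance={SIGNIFICANCE[cls]}"
        rows.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info]))
    return rows


# Hypothetical variant with two interpretations: it is listed in two rows.
for row in curated_rows("chr1", 12345, "A", "G", ["VUS", "Likely Pathogenic"]):
    print(row)
```

A class name that is not in Table 1 raises a KeyError, which keeps unmapped labels out of the file.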
2. Sort variants in the VCF by chromosome and position with awk
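The step names awk. As an alternative illustration of the same idea, the Python sketch below keeps the header lines in place and sorts the records by chromosome and then by position. It assumes tab-delimited records, and the chromosome order it uses (1 to 22, then X, Y, M) is an assumption rather than a documented requirement. The names `chrom_key` and `sort_vcf_lines` are hypothetical.

```python
def chrom_key(chrom):
    """Order 1-22 numerically, then X, Y, M/MT; accepts an optional 'chr' prefix."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (1, special.get(name.upper(), 4), name)


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by (CHROM, POS)."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom_key(chrom), int(pos))

    return header + sorted(records, key=key)


# Hypothetical unsorted input.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t5000\t.\tA\tG\t.\t.\tAF=0.1",
    "chr2\t300\t.\tC\tT\t.\t.\tAF=0.2",
    "chr1\t900\t.\tG\tA\t.\t.\tAF=0.3",
    "chr1\t100\t.\tT\tC\t.\t.\tAF=0.4",
]
for line in sort_vcf_lines(example):
    print(line)
```

Sorting matters because indexing with tabix expects records on the same chromosome to be contiguous and ordered by position.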
3. Compress the VCF with bgzip
4. Create a TBI index file with tabix
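The compression and indexing steps are usually run from the command line. The Python sketch below wraps the two commands so they can be scripted. It assumes bgzip and tabix (distributed with htslib) are installed and on PATH; the function names and the file name `db.vcf` are hypothetical.

```python
import shutil
import subprocess


def compress_and_index_commands(vcf_path):
    """Return the bgzip and tabix command lines for a sorted VCF."""
    gz_path = vcf_path + ".gz"
    return [
        ["bgzip", "-f", vcf_path],        # replaces vcf_path with vcf_path + ".gz"
        ["tabix", "-p", "vcf", gz_path],  # writes gz_path + ".tbi"
    ]


def compress_and_index(vcf_path):
    """Run both commands and return the paths of the .gz and .tbi files."""
    for tool in ("bgzip", "tabix"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in compress_and_index_commands(vcf_path):
        subprocess.run(cmd, check=True)
    return vcf_path + ".gz", vcf_path + ".gz.tbi"


# Show the commands without running them.
for cmd in compress_and_index_commands("db.vcf"):
    print(" ".join(cmd))
```

Calling `compress_and_index("db.vcf")` would then produce `db.vcf.gz` and `db.vcf.gz.tbi`, the two files requested in the next steps.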
Example curated DB VCF variant lines
Small variant
Copy number variant
Next steps
1. Reach out to Illumina support
Provide the VCF and TBI files to Illumina support for upload to your organization's dedicated storage bucket, along with the following information:
Database name. Underscores ("_") in a database name are not allowed.
Database type (Noise, Historic, Curated)
Variant type (SNV, CNV)
Genome reference (GRCh37, GRCh38)
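Before contacting support, the details listed above can be checked with a small script. The Python sketch below is only an illustration: the function name `check_submission` and the example database names are hypothetical, and the allowed values are taken directly from the list above.

```python
# Allowed values, copied from the list above.
ALLOWED = {
    "database_type": {"Noise", "Historic", "Curated"},
    "variant_type": {"SNV", "CNV"},
    "genome_reference": {"GRCh37", "GRCh38"},
}


def check_submission(name, database_type, variant_type, genome_reference):
    """Return a list of problems; an empty list means the details look complete."""
    problems = []
    if not name:
        problems.append("database name is empty")
    if "_" in name:
        problems.append("database name contains an underscore")
    values = {
        "database_type": database_type,
        "variant_type": variant_type,
        "genome_reference": genome_reference,
    }
    for field, value in values.items():
        if value not in ALLOWED[field]:
            problems.append(f"{field} must be one of {sorted(ALLOWED[field])}")
    return problems


print(check_submission("LabNoiseDB", "Noise", "CNV", "GRCh37"))
print(check_submission("lab_noise", "Noise", "SNV", "hg19"))
```

The first call reports no problems; the second flags the underscore in the name and the unrecognized genome reference.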
2. Register the database
Once the database is uploaded, a user with the appropriate permissions can register it by selecting it from the organization's bucket in Settings.
##fileformat=VCFv4.2
##fileDate=20250906
##reference=ftp://ftp.ensembl.org/pub/hg19
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
##contig=<ID=chr7,length=159138663,assembly=hg19>
##contig=<ID=chr8,length=146364022,assembly=hg19>
##contig=<ID=chr9,length=141213431,assembly=hg19>
##contig=<ID=chr10,length=135534747,assembly=hg19>
##contig=<ID=chr11,length=135006516,assembly=hg19>
##contig=<ID=chr12,length=133851895,assembly=hg19>
##contig=<ID=chr13,length=115169878,assembly=hg19>
##contig=<ID=chr14,length=107349540,assembly=hg19>
##contig=<ID=chr15,length=102531392,assembly=hg19>
##contig=<ID=chr16,length=90354753,assembly=hg19>
##contig=<ID=chr17,length=81195210,assembly=hg19>
##contig=<ID=chr18,length=78077248,assembly=hg19>
##contig=<ID=chr19,length=59128983,assembly=hg19>
##contig=<ID=chr20,length=63025520,assembly=hg19>
##contig=<ID=chr21,length=48129895,assembly=hg19>
##contig=<ID=chr22,length=51304566,assembly=hg19>
##contig=<ID=chrX,length=155270560,assembly=hg19>
##contig=<ID=chrY,length=59373566,assembly=hg19>
##contig=<ID=chrM,length=16571,assembly=hg19>
##INFO=<ID=COUNT,Number=1,Type=Integer,Description="Number of occurrences of the variant in the dataset">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency of the variant in the dataset">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Allele Number in the dataset">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count of the variant in the dataset">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the structural variant">
##INFO=<ID=TEN,Number=.,Type=String,Description="Ten samples containing the variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 849466 . N <DEL> . . COUNT=1;AF=0.00017;AC=2;AN=12118;END=1073402;SVTYPE=DEL;SVLEN=223936;TEN=SAMPLE1
#CHROM POS ID REF ALT QUAL FILTER INFO
16 89531962 . G T . PASS significance=3;category=interpretations=1.COMMENT='in silico: 3/3 damaging (PP3)'.OBSERVATIONS=1.UPDATE='2020-12-12'
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 145413388 . N <DUP> . . significance=1;SVLEN=333881;END=145747269;SVTYPE=DUP;category=type=duplicate.COMMENT=proximal duplicate BP2-BP3 1q21.1. 1q21 microdeletion syndrome.REMARK=Only AR OMIM genes.