Input Files
The following section describes the input files required by DRAGEN Array.
IDAT Files
For each sample, a pair of raw intensity files (.idat) is generated by the iScan System or, for non-methylation arrays, the NextSeq 550. One file holds the green-channel intensities and the other the red-channel intensities for each probe on the Infinium array.
An IDAT file is identified by the BeadChip Barcode (the 12-digit unique Sentrix ID, e.g., 123456789101), the BeadChip Position (the row and column of the sample, e.g., R01C01), and Grn (Green) or Red for the specific channel.
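These three identifiers usually appear together in the IDAT file name. The sketch below assumes the layout `<barcode>_<position>_<channel>.idat` (for example, 123456789101_R01C01_Grn.idat) and splits a name back into its parts; the layout, the `IdatName` class, and the `parse_idat_name` helper are illustrative assumptions, not part of DRAGEN Array.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed file-name layout: <12-digit barcode>_<RxxCxx position>_<Grn|Red>.idat
IDAT_NAME = re.compile(
    r"^(?P<barcode>\d{12})_(?P<position>R\d{2}C\d{2})_(?P<channel>Grn|Red)\.idat$"
)


@dataclass(frozen=True)
class IdatName:
    barcode: str   # BeadChip Barcode (Sentrix ID)
    position: str  # BeadChip Position (row and column)
    channel: str   # "Grn" or "Red"


def parse_idat_name(path: str) -> IdatName:
    """Split an IDAT file name into barcode, position, and channel."""
    match = IDAT_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected IDAT file name: {path}")
    return IdatName(match["barcode"], match["position"], match["channel"])
```

For example, `parse_idat_name("/data/123456789101_R01C01_Grn.idat")` yields barcode 123456789101, position R01C01, and channel Grn; the two files of one sample differ only in the channel.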
Manifest Files
The CSV and BPM manifest files can be found on the Illumina Support Site for all commercial Infinium BeadChips, or on MyIllumina for custom and semi-custom designs. For instructions on obtaining manifest files from MyIllumina, see the Illumina Knowledge article "How to access custom array product files (manifest and product definition files) in MyIllumina".
The CSV manifest file (.csv) provides data complementary to the BPM manifest file (.bpm) in a human-readable format. It is a required input to the genotype gtc-to-vcf command to enable VCF generation for insertion/deletion (indel) variants.
Cluster File
The cluster file (.egt) is a standard product file provided by Illumina for commercial genotyping products and is a required input for the genotype call command in DRAGEN Array. Custom cluster files may be required for optimal genotyping performance. See section Optimizing cluster files and copy number models for additional details.
CN Model File
The CN (Copy Number) model file (.dat) is a required input to the copy-number call command to enable accurate copy number calling for pharmacogenomics. Illumina provides a standard CN model file for each PGx array product. See section Optimizing cluster files and copy number models for additional details.
PGx Database File
The PGx database file (.zip) maps the variants assayed on Infinium PGx arrays to PGx variants. For each gene, every variant used in the gene's star allele definitions is mapped to the ID field in the SNV VCF file. Each line in a gene mapping file represents a single variant and contains the SNV VCF ID for that variant followed by its HGVS (Human Genome Variation Society) tag. The PGx database file is array-specific and is one of the product files Illumina provides for each PGx array product.
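The one-variant-per-line layout described above can be read into a lookup from SNV VCF ID to HGVS tag. This is a minimal sketch under several assumptions: the two fields are separated by a tab, comma, or whitespace; there is no header line; and the caller knows the member name of the gene mapping file inside the .zip. The helper names `parse_gene_mapping` and `load_gene_mapping` are hypothetical.

```python
import re
import zipfile


def parse_gene_mapping(text: str) -> dict[str, str]:
    """Map each SNV VCF ID to its HGVS tag (one variant per line)."""
    mapping: dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed delimiter: tab, comma, or other whitespace between the two fields
        fields = re.split(r"[\s,]+", line, maxsplit=1)
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected '<VCF ID> <HGVS tag>'")
        vcf_id, hgvs_tag = fields
        mapping[vcf_id] = hgvs_tag
    return mapping


def load_gene_mapping(zip_source, member: str) -> dict[str, str]:
    """Read one gene mapping file out of the PGx database .zip (member name assumed)."""
    with zipfile.ZipFile(zip_source) as archive:
        return parse_gene_mapping(archive.read(member).decode("utf-8"))
```

Looking up a VCF record's ID in the returned dictionary gives the HGVS tag used in the star allele definitions.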
Genome FASTA Files
The genome FASTA file (.fa) is a text file containing the reference genome sequences. The FASTA index file (.fai) contains metadata describing how the sequences (chromosomes) are laid out within the FASTA file, such as each sequence's name, length, and byte offset. DRAGEN Array PGx calling supports human genome builds 37 and 38. Illumina provides the genome FASTA file and FASTA index file for human; store both together in the same input folder.
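The .fai file follows the standard samtools faidx layout: one tab-separated line per sequence giving its name, length, byte offset of the first base, bases per line, and bytes per line. The sketch below reads that index from beside the FASTA file and uses it to find where a base sits in the .fa file. It assumes the index is named by appending .fai to the FASTA file name (genome.fa becomes genome.fa.fai), which is the samtools convention; whether DRAGEN Array expects exactly that name is an assumption, and `read_fai` and `byte_offset` are hypothetical helpers.

```python
from pathlib import Path
from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str        # sequence (chromosome) name
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the .fa file
    line_bases: int  # bases per FASTA line
    line_width: int  # bytes per FASTA line, including the newline


def read_fai(fasta_path: str) -> list[FaiEntry]:
    """Read the .fai index stored next to a FASTA file."""
    fai_path = Path(fasta_path + ".fai")  # assumed naming: genome.fa -> genome.fa.fai
    if not fai_path.exists():
        raise FileNotFoundError(f"expected FASTA index at {fai_path}")
    entries = []
    for line in fai_path.read_text().splitlines():
        if not line.strip():
            continue
        name, length, offset, line_bases, line_width = line.split("\t")[:5]
        entries.append(
            FaiEntry(name, int(length), int(offset), int(line_bases), int(line_width))
        )
    return entries


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """Byte position in the .fa file of 0-based base `pos` of this sequence."""
    full_lines, column = divmod(pos, entry.line_bases)
    return entry.offset + full_lines * entry.line_width + column
```

This is why the index and FASTA must stay together: the offsets in the .fai are only valid for the exact FASTA file they were computed from.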
IDAT Sample Sheet
For local analysis, the IDAT sample sheet can be a CSV or JSON formatted file with direct paths to sample IDAT files. It enables easy analysis of samples from different directories.
Example CSV format:
Green IDAT Path,Red IDAT Path
/path/to/sample1_Grn.idat,/path/to/sample1_Red.idat
/path/to/sample2_Grn.idat,/path/to/sample2_Red.idat
/path/to/sample3_Grn.idat,/path/to/sample3_Red.idat
Example JSON format:
[
{
"Green IDAT Path": "/path/to/sample1_Grn.idat",
"Red IDAT Path": "/path/to/sample1_Red.idat"
},
{
"Green IDAT Path": "/path/to/sample2_Grn.idat",
"Red IDAT Path": "/path/to/sample2_Red.idat"
},
{
"Green IDAT Path": "/path/to/sample3_Grn.idat",
"Red IDAT Path": "/path/to/sample3_Red.idat"
}
]
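When sample IDATs are spread across several directories, a sample sheet in either format can be generated rather than typed by hand. This sketch assumes each green file ends in _Grn.idat and its red partner has the same prefix ending in _Red.idat; the column names come from the examples above, while `pair_idats`, `write_csv_sheet`, and `write_json_sheet` are hypothetical helpers.

```python
import csv
import json
from pathlib import Path

HEADER = ["Green IDAT Path", "Red IDAT Path"]


def pair_idats(folders: list[str]) -> list[dict[str, str]]:
    """Pair each *_Grn.idat with the *_Red.idat that shares its prefix."""
    rows = []
    for folder in folders:
        for green in sorted(Path(folder).glob("*_Grn.idat")):
            # Assumed naming: only the channel suffix differs between the two files
            red = green.with_name(green.name[: -len("_Grn.idat")] + "_Red.idat")
            if not red.exists():
                raise FileNotFoundError(f"no red-channel IDAT for {green}")
            rows.append({HEADER[0]: str(green.resolve()), HEADER[1]: str(red.resolve())})
    return rows


def write_csv_sheet(rows: list[dict[str, str]], out_path: str) -> None:
    """Write the CSV form: a header row, then one Green,Red pair per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=HEADER)
        writer.writeheader()
        writer.writerows(rows)


def write_json_sheet(rows: list[dict[str, str]], out_path: str) -> None:
    """Write the JSON form: a list of objects, one per sample."""
    with open(out_path, "w") as handle:
        json.dump(rows, handle, indent=2)
```

Writing the JSON with `json.dump` also guarantees well-formed output, avoiding hand-editing slips such as a missing quote or a trailing comma before the closing bracket.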
For cloud analysis, the IDAT sample sheet can be a CSV-formatted file that identifies each sample by its BeadChip barcode (beadChipName) and sample section (sampleSectionName), for example:
beadChipName,sampleSectionName
204753010023,R01C01
204753010023,R02C01
204753010024,R01C01
204753010024,R02C01
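Because the cloud sheet lists barcodes and sections rather than file paths, it can be derived from a list of IDAT file names. The sketch assumes the `<barcode>_<section>_<Grn|Red>.idat` naming layout and writes one row per sample with the barcode in beadChipName and the section in sampleSectionName; `cloud_sample_sheet` is a hypothetical helper.

```python
import csv
import io
import re
from pathlib import PurePath

# Assumed IDAT naming: <12-digit barcode>_<RxxCxx>_<Grn|Red>.idat
IDAT_NAME = re.compile(r"(\d{12})_(R\d{2}C\d{2})_(?:Grn|Red)\.idat")


def cloud_sample_sheet(file_names: list[str]) -> str:
    """Build beadChipName,sampleSectionName CSV text, one row per sample."""
    samples = set()
    for name in file_names:
        match = IDAT_NAME.fullmatch(PurePath(name).name)
        if match:  # the Grn and Red files of one sample collapse to one row
            samples.add(match.groups())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["beadChipName", "sampleSectionName"])
    writer.writerows(sorted(samples))  # sorted by barcode, then section
    return out.getvalue()
```

Files that do not match the assumed pattern are silently skipped, so it is worth comparing the row count against the number of samples expected.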
For DRAGEN Array Methylation QC on cloud, the optional sample sheet fields Sample_ID and Sample_Group are also available.
Following Sample_Group, any number of additional columns can be added to include metadata fields such as sex, sample type, and plate and well information. Columns added after the Sample_Group column may use user-defined header values. The Sample_ID field and any additional metadata are replicated in the Sample QC Summary output files.
The Sample_Group field is used to populate the PCA Control Plot in the Sample QC Summary Plots file and the Principal Component Summary file. In the PCA Control Plot, each sample group is assigned a unique color, so samples with the same Sample_Group value share the same color.
beadChipName,sampleSectionName,Sample_ID,Sample_Group,MetaData1
204753010023,R01C01,NA1231,Group1,F
204753010023,R02C01,NA1232,Group2,F
204753010024,R01C01,NA1233,Group2,M
204753010024,R02C01,NA1234,Group1,M
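Before uploading, a sheet like the one above can be checked locally for column order, and its samples grouped the way the PCA Control Plot colors them (one color per Sample_Group key). This sketch assumes that, when the optional fields are present, Sample_ID and Sample_Group are the third and fourth columns, with user-defined metadata columns after them; `read_methylation_sheet` is a hypothetical helper, not a DRAGEN Array validator.

```python
import csv
import io
from collections import defaultdict

REQUIRED = ["beadChipName", "sampleSectionName"]
OPTIONAL = ["Sample_ID", "Sample_Group"]


def read_methylation_sheet(text: str) -> tuple[dict[str, list[str]], list[str]]:
    """Check column order, then group Sample_IDs by Sample_Group."""
    reader = csv.DictReader(io.StringIO(text))
    header = list(reader.fieldnames or [])
    if header[:2] != REQUIRED:
        raise ValueError(f"first columns must be {REQUIRED}, got {header[:2]}")
    # Assumption: if optional fields are present, Sample_ID and Sample_Group come next
    if len(header) > 2 and header[2:4] != OPTIONAL:
        raise ValueError(f"expected {OPTIONAL} after {REQUIRED}, got {header[2:4]}")
    metadata_columns = header[4:]  # user-defined headers that follow Sample_Group
    groups: dict[str, list[str]] = defaultdict(list)
    for row in reader:
        if len(header) > 2:
            groups[row["Sample_Group"]].append(row["Sample_ID"])
    return dict(groups), metadata_columns
```

For the example sheet above, this would return two groups (Group1 with NA1231 and NA1234, Group2 with NA1232 and NA1233) and one metadata column, MetaData1.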
GTC Sample Sheet
The GTC sample sheet is a CSV or JSON formatted file with direct paths to sample GTC files. It enables easy analysis of samples from different directories.
Example CSV format:
GTC Path
/path/to/sample1.gtc
/path/to/sample2.gtc
/path/to/sample3.gtc
Example JSON format:
[
{
"GTC Path": "/path/to/sample1.gtc"
},
{
"GTC Path": "/path/to/sample2.gtc"
},
{
"GTC Path": "/path/to/sample3.gtc"
}
]
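The CSV and JSON forms carry the same information, so a script that prepares inputs can accept either. This sketch chooses the parser by file extension and checks that every entry points to a .gtc file; both the extension-based detection and the `read_gtc_sheet` helper are assumptions for illustration, not how DRAGEN Array itself reads the sheet.

```python
import csv
import json
from pathlib import Path

COLUMN = "GTC Path"


def read_gtc_sheet(sheet_path: str) -> list[str]:
    """Return GTC paths from a CSV or JSON sample sheet (format chosen by extension)."""
    path = Path(sheet_path)
    if path.suffix.lower() == ".json":
        records = json.loads(path.read_text())  # list of {"GTC Path": ...} objects
    else:
        with path.open(newline="") as handle:
            records = list(csv.DictReader(handle))  # header row "GTC Path"
    gtc_paths = [record[COLUMN] for record in records]
    not_gtc = [p for p in gtc_paths if not p.endswith(".gtc")]
    if not_gtc:
        raise ValueError(f"entries that are not .gtc files: {not_gtc}")
    return gtc_paths
```

Reading the CSV and JSON examples above with this helper should return the same three paths in the same order.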
Input File Summary Table
In addition to the input files, there is a set of intermediate files, including GTC, SNV VCF, CNV VCF, and PGx CSV, that are outputs of some DRAGEN Array Local commands and inputs to others.
The table below summarizes the input and intermediate files, their sources, and the associated DRAGEN Array Local commands and options.
Input File | Source | Command | Option |
---|---|---|---|
IDAT | User provided from scanning instrument | genotype call | --idat-folder |
CSV Manifest | Product file from Illumina | genotype gtc-to-vcf | --csv-manifest |
BPM Manifest | Product file from Illumina | copy-number train, genotype call, genotype gtc-to-bedgraph, genotype gtc-to-vcf | --bpm-manifest |
Cluster File | Product file from Illumina or user created using GenomeStudio | genotype call | --cluster-file |
CN Model | Product file from Illumina or user created using DRAGEN Array Local | copy-number call | --cn-model |
PGx Database | Product file from Illumina | star-allele call | --database |
Genome FASTA | Product file from Illumina | genotype gtc-to-vcf, copy-number train | --genome-fasta-file |
IDAT Sample Sheet | User provided | genotype call | --idat-sample-sheet |
GTC Sample Sheet | User provided | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-sample-sheet |
GTC | DRAGEN Array output from genotype call | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-folder |
SNV and CNV VCF | DRAGEN Array output from genotype gtc-to-vcf and copy-number call | star-allele call | --vcf-folder |
PGx CSV | DRAGEN Array output from star-allele call | star-allele annotate | --star-alleles |
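The Source and Command columns for the intermediate files imply an order in which the commands can run. The sketch below encodes only those four intermediate files (GTC, SNV VCF, CNV VCF, PGx CSV) as a dependency graph and derives a valid run order with Python's `graphlib`; it deliberately leaves out product files such as the CN model, and the dictionary names are hypothetical.

```python
from graphlib import TopologicalSorter

# Which command produces each intermediate file (Source column of the table)
PRODUCED_BY = {
    "GTC": "genotype call",
    "SNV VCF": "genotype gtc-to-vcf",
    "CNV VCF": "copy-number call",
    "PGx CSV": "star-allele call",
}

# Which intermediate files each command consumes (Command column of the table)
CONSUMES = {
    "genotype call": [],
    "genotype gtc-to-bedgraph": ["GTC"],
    "genotype gtc-to-vcf": ["GTC"],
    "copy-number train": ["GTC"],
    "copy-number call": ["GTC"],
    "star-allele call": ["SNV VCF", "CNV VCF"],
    "star-allele annotate": ["PGx CSV"],
}


def run_order(consumes=CONSUMES, produced_by=PRODUCED_BY) -> list[str]:
    """Return the commands in an order where every input exists before it is used."""
    # Each command depends on the commands that produce the files it consumes
    graph = {cmd: {produced_by[f] for f in files} for cmd, files in consumes.items()}
    return list(TopologicalSorter(graph).static_order())
```

Any order returned starts with genotype call and places both genotype gtc-to-vcf and copy-number call before star-allele call, which in turn precedes star-allele annotate.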