BaseSpace Sequence Hub converts *.bcl files into FASTQ files, which contain base call and quality information for all reads that pass filtering.
BaseSpace Sequence Hub automatically generates FASTQ files in sample sheet-driven workflow apps. Other apps that perform alignment and variant calling also automatically use FASTQ files.
FASTQ files can be used as sequence input for alignment and other secondary analysis software. Do not use them with tools that are not compatible with the FASTQ format.
FASTQ files are named with the sample name and the sample number, which is a numeric assignment based on the order in which the sample is listed in the sample sheet. Example: Data\Intensities\BaseCalls\samplename_S1_L001_R1_001.fastq.gz
samplename - The sample name provided in the sample sheet. If a sample name is not provided, the file name includes the sample ID, which is a required field in the sample sheet and must be unique.
S1 - The sample number, based on the order in which samples are listed in the sample sheet, starting with 1. In this example, S1 indicates that this sample is the first sample listed in the sample sheet.
Reads that cannot be assigned to any sample are written to a FASTQ file for sample number 0 and are excluded from downstream analysis.
L001 - The lane number.
R1 - The read. In this example, R1 means Read 1. For a paired-end run, there is at least one file with R2 in the file name for Read 2.
001 - The last segment is always 001.
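The naming convention above can be checked programmatically. The following is a minimal sketch, not an Illumina tool: the regular expression and the helper name parse_fastq_name are assumptions derived only from the components listed above, and the sketch assumes the lane and read fields are plain digits.

```python
import re

# Pattern assumed from the naming convention described above:
# <samplename>_S<sample number>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample_name>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d+)"
    r"_R(?P<read>\d+)_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Split a FASTQ file name into its components (hypothetical helper)."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a recognized FASTQ file name: {file_name!r}")
    parts = match.groupdict()
    return {
        "sample_name": parts["sample_name"],
        # Sample number 0 holds reads that could not be assigned to a sample.
        "sample_number": int(parts["sample_number"]),
        "lane": int(parts["lane"]),
        "read": int(parts["read"]),
    }


print(parse_fastq_name("samplename_S1_L001_R1_001.fastq.gz"))
```

Because the sample name is matched greedily, a sample name that itself contains underscores is still split at the last _S<number>_L<lane> group.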
FASTQ files are saved compressed in the GNU zip (gzip) format, an open-source file compression format, indicated by the .gz file extension.
Each entry in a FASTQ file consists of four lines:
Sequence identifier
Sequence
Quality score identifier line (consisting only of a +)
Quality score
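The four-line layout can be read with a few lines of code. The sketch below is an illustration only, using the Python standard library; the function name read_fastq is hypothetical, and it assumes a gzip-compressed file in which every record occupies exactly four lines.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (identifier, sequence, quality) tuples from a .fastq.gz file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        while True:
            # Take the next four lines: identifier, sequence, '+', quality.
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                return  # end of file
            if (
                len(record) != 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")
            ):
                raise ValueError(f"malformed FASTQ record: {record!r}")
            yield record[0], record[1], record[3]
```

For example, iterating over read_fastq("samplename_S1_L001_R1_001.fastq.gz") would yield one tuple per read without loading the whole file into memory.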
For the Undetermined FASTQ files only, the sequence observed in the index read is written to the FASTQ header in place of the sample number. This information can be useful for troubleshooting demultiplexing.
The following table describes the elements:

| Element | Requirements | Description |
|---|---|---|
| @ | @ | Each sequence identifier line starts with @ |
| <instrument> | Characters allowed: a–z, A–Z, 0–9 and underscore | Instrument ID |
| <run number> | Numerical | Run number on instrument |
| <flowcell ID> | Characters allowed: a–z, A–Z, 0–9 | Flowcell ID |
| <lane> | Numerical | Lane number |
| <tile> | Numerical | Tile number |
| <x_pos> | Numerical | X coordinate of cluster |
| <y_pos> | Numerical | Y coordinate of cluster |
| <read> | Numerical | Read number. 1 can be a single read or Read 1 of a paired-end run; 2 is Read 2 of a paired-end run |
| <is filtered> | Y or N | Y if the read is filtered (did not pass), N otherwise |
| <control number> | Numerical | 0 when none of the control bits are on; otherwise, an even number. On HiSeq X systems, control specification is not performed and this number is always 0. |
| <sample number> | Numerical | Sample number from sample sheet |

An example of a valid entry is as follows; note the space preceding the read number element:
If the read is not identified as a control, then the 10th column (<control number>) is zero. If the read is identified as a control, the number is greater than zero, and the value specifies what type of control it is. The value is the decimal representation of a bit-wise encoding scheme. In that scheme, bit 0 has a decimal value of 1, bit 1 a value of 2, bit 2 a value of 4, and so on.
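As an illustration of how the elements and the control-number bit encoding fit together, the following sketch parses a sequence identifier line. It assumes the elements are colon-separated, with a single space before the read number; the helper names and the sample header are illustrative and are not taken from a real run. The last element is kept as text because, for Undetermined files, it holds the observed index sequence rather than a sample number.

```python
def parse_sequence_identifier(line):
    """Parse a sequence identifier line into a dict (hypothetical helper)."""
    if not line.startswith("@"):
        raise ValueError("sequence identifier line must start with '@'")
    # Assumption: a single space separates the cluster elements from the
    # read elements, and elements within each group are colon-separated.
    cluster_part, read_part = line[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = cluster_part.split(":")
    read, is_filtered, control, sample = read_part.split(":")
    if is_filtered not in ("Y", "N"):
        raise ValueError(f"<is filtered> must be Y or N, got {is_filtered!r}")
    return {
        "instrument": instrument,
        "run_number": int(run),
        "flowcell_id": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
        "read": int(read),
        "is_filtered": is_filtered == "Y",
        "control_number": int(control),
        # Sample number, or the index sequence in Undetermined files.
        "sample": sample,
    }


def control_bits(control_number):
    """Return the positions of the bits that are set in the control number."""
    return [
        bit
        for bit in range(control_number.bit_length())
        if (control_number >> bit) & 1
    ]


# Illustrative header only; the values are not from a real run.
header = parse_sequence_identifier("@SIM:1:FCX:1:15:6329:1045 1:N:0:2")
print(header["lane"], header["read"], control_bits(header["control_number"]))
```

A control number of 6, for instance, decodes to bits 1 and 2 (decimal 2 + 4); what each bit means is not specified here.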