FASTQ is a text-based file format that stores sequences together with their quality scores. Each record has four lines: an identifier (read descriptor), the sequence, a + separator line, and the quality scores. For a detailed description of the FASTQ format, see FASTQ Files.
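The four-line record layout can be illustrated with a short reader. This is a minimal sketch, not part of the uploader: the names `FastqRecord`, `read_fastq`, and `read_fastq_gz` are hypothetical, and the code assumes each record occupies exactly four lines with no line wrapping.

```python
import gzip
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    descriptor: str  # identifier line, without the leading "@"
    sequence: str    # base calls
    quality: str     # one quality character per base call


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an open text handle, four lines at a time."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        # Assumption: the first line starts with "@" and the third with "+".
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Open a gzip-compressed FASTQ file in text mode and yield its records."""
    with gzip.open(path, "rt") as handle:
        yield from read_fastq(handle)
```

For example, `for record in read_fastq_gz("SampleName_S1_L001_R1_001.fastq.gz"): ...` would iterate over the reads in a local file with that name.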
Make sure the FASTQ file adheres to the following upload requirements:
FASTQ files are generated on Illumina instruments and saved in gzip format.
The name of the FASTQ files conforms to the following convention: SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz
Examples: SampleName_S1_L001_R1_001.fastq.gz
SampleName_S1_L001_R2_001.fastq.gz
The read descriptor in the FASTQ files conforms to the following convention: @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
Examples: Read 1 descriptor: @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
The corresponding Read 2 descriptor has a 2 in the ReadNum field: @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
Quality considerations:
The number of base calls for each read equals the number of quality scores.
The number of entries for Read 1 equals the number of entries for Read 2.
The uploader determines whether files are paired-end based on matching file names in which the only difference is the ReadNum.
For paired-end reads, the Read 1 and Read 2 files must be uploaded together or combined later.
Each read has passed filter.
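The file naming convention and the name-based pairing rule above can be checked locally before an upload. The following is a hedged sketch: the regular expression is inferred from the example file names (an S-prefixed sample number, a three-digit lane, R1 or R2, and a three-digit final field), not taken from a formal specification, and `parse_fastq_name` and `find_pairs` are hypothetical helper names. It does not reproduce the uploader's actual logic.

```python
import re
from typing import Dict, List, Optional, Tuple

# SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz
# Field widths are assumptions based on the documented examples.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(name: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a conforming file name, or None."""
    match = FASTQ_NAME.match(name)
    return match.groupdict() if match else None


def find_pairs(names: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Group files whose names differ only in the read number."""
    groups: Dict[Tuple[str, str, str, str], Dict[str, str]] = {}
    for name in names:
        fields = parse_fastq_name(name)
        if fields is None:
            raise ValueError(f"file name does not follow the convention: {name}")
        key = (fields["sample"], fields["number"], fields["lane"], fields["index"])
        groups.setdefault(key, {})[fields["read"]] = name
    # A pair needs both an R1 and an R2 file under the same key.
    pairs = [(g["1"], g["2"]) for g in groups.values() if "1" in g and "2" in g]
    unpaired = [n for g in groups.values() if len(g) == 1 for n in g.values()]
    return pairs, unpaired
```

Calling `find_pairs` on the two example names returns them as one pair and an empty list of unpaired files.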
The file uploader imports the following file types to any project you have write access to: FASTQ (.fastq.gz), analysis (VCF and gVCF), manifest (.txt), or other file types. Use the file uploader when you want to analyze files generated outside of BaseSpace Sequence Hub, or to attach other information related to the project.
Open the project.
From the project, select File, Upload, and then select Files.
Select the type of files to upload.
If you are uploading a FASTQ file, do as follows.
To upload FASTQs to a sample:
Set the "Save Upload To" toggle to "Sample".
Select Finish Upload.
To upload FASTQs to a biosample:
Select "Select Biosample", then select an existing biosample or create a new one to associate the FASTQ dataset with.
Enter a library name.
Select a prep kit.
Select Finish Upload.
If you are uploading a VCF file, do as follows.
[Optional] Select "Select Biosample", then select or create a biosample that the VCF will be associated with.
Select Finish Upload.
If you are uploading a manifest file, do as follows.
[Optional] Select "Select Biosample", then select or create a biosample that the manifest will be associated with.
Select Finish Upload.
If you are uploading other file types, do as follows.
[Optional] Select "Select Biosample", then select or create a biosample that the files will be associated with.
Select Finish Upload.
Uploading multiple FASTQ, VCF, or manifest files in a single session requires files of the same type.
The FASTQ importer only works for complete samples. You cannot upload Read 2 of a paired-end sample on its own.
FASTQ files need to adhere to Illumina standards, as specified below:
Data for a single sample can span multiple files. Each sample is limited to 16 files with a combined size of 25 GB.
The uploader will only support gzipped FASTQ files generated on Illumina instruments.
The name of the FASTQ files must conform to the following convention: SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (for example, SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
The read descriptor in the FASTQ files must conform to the following convention: @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
Read 1 descriptor would look like this: @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
Read 2 would have a 2 in the ReadNum field, like this: @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
Quality considerations:
The number of base calls for each read must equal the number of quality scores.
The number of entries for Read 1 must equal the number of entries for Read 2.
The uploader will determine if files are paired-end based on the matching file names in which the only difference is the ReadNum.
For paired-end reads, the descriptor must match for every entry for both reads 1 and 2.
Each read has passed filter.
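The record-level requirements and per-sample limits listed above can be sketched as a local pre-upload check. All helper names here are hypothetical, and several points are assumptions rather than documented behavior: that a FilterFlag of N means the read passed filter, that the trailing colon in the descriptor convention is optional (the examples omit it), that "descriptors must match" means every field except ReadNum is identical, and that 1 GB means 10^9 bytes.

```python
import re
from typing import List, Sequence, Tuple

Record = Tuple[str, str, str]  # (descriptor line, sequence, quality string)

# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
DESCRIPTOR = re.compile(
    r"^@(?P<location>[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+) "
    r"(?P<read>[12]):(?P<filter>[YN]):0:(?P<sample>\d+):?$"
)

MAX_FILES_PER_SAMPLE = 16
MAX_BYTES_PER_SAMPLE = 25 * 10**9  # assumption: 1 GB = 10^9 bytes


def check_record(descriptor: str, sequence: str, quality: str) -> List[str]:
    """Return a list of problems found in a single FASTQ entry."""
    problems = []
    match = DESCRIPTOR.match(descriptor)
    if match is None:
        problems.append("descriptor does not match the convention")
    elif match.group("filter") != "N":  # assumption: N means passed filter
        problems.append("read did not pass filter")
    if len(sequence) != len(quality):
        problems.append("base calls and quality scores differ in number")
    return problems


def check_pair(read1: Sequence[Record], read2: Sequence[Record]) -> List[str]:
    """Compare Read 1 and Read 2 entries position by position."""
    problems = []
    if len(read1) != len(read2):
        problems.append("Read 1 and Read 2 have different numbers of entries")
    for i, (r1, r2) in enumerate(zip(read1, read2)):
        m1, m2 = DESCRIPTOR.match(r1[0]), DESCRIPTOR.match(r2[0])
        if m1 is None or m2 is None:
            problems.append(f"entry {i}: descriptor does not match the convention")
        elif (
            m1.group("location") != m2.group("location")
            or m1.group("sample") != m2.group("sample")
            or (m1.group("read"), m2.group("read")) != ("1", "2")
        ):
            problems.append(f"entry {i}: Read 1 and Read 2 descriptors do not match")
    return problems


def check_sample_limits(file_sizes: Sequence[int]) -> List[str]:
    """Check the per-sample file count and combined size (sizes in bytes)."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_SAMPLE:
        problems.append("more than 16 files for one sample")
    if sum(file_sizes) > MAX_BYTES_PER_SAMPLE:
        problems.append("combined size exceeds 25 GB")
    return problems
```

An empty list from each function means no problem was detected by this sketch. It does not guarantee that the uploader will accept the files.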