Quality Scores
A quality score (or Q-score) expresses an error probability. In particular, it serves as a convenient and compact way to communicate very small error probabilities.
Given an assertion, A, the quality score, Q(A), expresses the probability that A is not true, P(~A), according to the relationship:
Q(A) = -10 log10(P(~A))
where P(~A) is the estimated probability of an assertion A being wrong.
The relationship between the quality score and error probability is demonstrated with the following table:
Quality score, Q(A) | Error probability, P(~A) |
10 | 0.1 |
20 | 0.01 |
30 | 0.001 |
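The relationship above can be checked numerically. The following is a minimal Python sketch; the function names `error_probability` and `quality_score` are illustrative and not part of any Illumina software.

```python
import math


def error_probability(q):
    """Convert a quality score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_score(p):
    """Convert an error probability P (0 < P <= 1) into Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the range (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Reproduce the rows of the table above.
    for q in (10, 20, 30):
        print(q, error_probability(q))
```

Each increase of 10 in the quality score corresponds to a tenfold decrease in the error probability.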
On supported Illumina systems, Q-scores are automatically binned. The specific binning applied depends on the current Q-table. For more information, see the white paper Reducing Whole Genome Data Storage Footprint, available from the Illumina website.
Quality Score Encoding
In FASTQ files, quality scores are encoded into a compact form, which uses only 1 byte per quality value. In this encoding, the quality score is represented as the character with an ASCII code equal to its value + 33. The following table demonstrates the relationship between the encoding character, its ASCII code, and the quality score represented.
When Q-score binning is in use, only the subset of Q-scores assigned by the bins appears.
Symbol | ASCII Code | Q-Score |
! | 33 | 0 |
" | 34 | 1 |
# | 35 | 2 |
$ | 36 | 3 |
% | 37 | 4 |
& | 38 | 5 |
' | 39 | 6 |
( | 40 | 7 |
) | 41 | 8 |
* | 42 | 9 |
+ | 43 | 10 |
, | 44 | 11 |
- | 45 | 12 |
. | 46 | 13 |
/ | 47 | 14 |
0 | 48 | 15 |
1 | 49 | 16 |
2 | 50 | 17 |
3 | 51 | 18 |
4 | 52 | 19 |
5 | 53 | 20 |
6 | 54 | 21 |
7 | 55 | 22 |
8 | 56 | 23 |
9 | 57 | 24 |
: | 58 | 25 |
; | 59 | 26 |
< | 60 | 27 |
= | 61 | 28 |
> | 62 | 29 |
? | 63 | 30 |
@ | 64 | 31 |
A | 65 | 32 |
B | 66 | 33 |
C | 67 | 34 |
D | 68 | 35 |
E | 69 | 36 |
F | 70 | 37 |
G | 71 | 38 |
H | 72 | 39 |
I | 73 | 40 |
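The encoding in the table above can be applied in code. The following is a minimal Python sketch that assumes the +33 offset described in this section; the names `PHRED_OFFSET`, `decode_quality`, and `encode_quality` are illustrative and not part of any Illumina software.

```python
PHRED_OFFSET = 33  # ASCII code = Q-score + 33, as described above


def decode_quality(quality_string):
    """Convert a FASTQ quality string into a list of integer Q-scores."""
    return [ord(symbol) - PHRED_OFFSET for symbol in quality_string]


def encode_quality(scores):
    """Convert a list of integer Q-scores into a FASTQ quality string."""
    return "".join(chr(score + PHRED_OFFSET) for score in scores)


if __name__ == "__main__":
    # '!' is ASCII 33, '5' is 53, '?' is 63, and 'I' is 73.
    print(decode_quality("!5?I"))
    print(encode_quality([0, 20, 30, 40]))
```

Because each Q-score maps to exactly one character, the quality string in a FASTQ record has the same length as its sequence line.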