Quality Scores
A quality score (or Q-score) expresses an error probability. In particular, it serves as a convenient and compact way to communicate very small error probabilities.
Given an assertion, A, the quality score, Q(A), expresses the probability that A is not true, P(~A), according to the relationship:
Q(A) = -10 log10(P(~A))
where P(~A) is the estimated probability of an assertion A being wrong.
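The relationship above can be inverted to recover the error probability from a quality score: P(~A) = 10^(-Q(A)/10). The following is a minimal Python sketch of both directions. The function names `q_to_error_prob` and `error_prob_to_q` are illustrative and are not part of any Illumina software.

```python
import math


def q_to_error_prob(q: float) -> float:
    """Return P(~A) for a quality score Q(A), by inverting Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_q(p: float) -> float:
    """Return Q(A) for an error probability P(~A), which must lie in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Each additional 10 points of Q corresponds to a tenfold lower error probability.
    for q in (10, 20, 30):
        print(f"Q{q}: P(~A) = {q_to_error_prob(q):g}")
```

Because the scale is logarithmic, a small integer can stand in for a very small probability, which is what makes the score compact.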
The relationship between the quality score and error probability is demonstrated with the following table:
On supported Illumina systems, Q-scores are automatically binned. The specific binning applied depends on the current Q-table. For more information, see the white paper Reducing Whole Genome Data Storage Footprint, available from the Illumina website.
In FASTQ files, quality scores are encoded into a compact form, which uses only 1 byte per quality value. In this encoding, the quality score is represented as the character with an ASCII code equal to its value + 33. The following table demonstrates the relationship between the encoding character, its ASCII code, and the quality score represented.
When Q-score binning is in use, the subset of Q-scores applied by the bins is displayed.
| Quality score, Q(A) | Error probability, P(~A) |
|---------------------|--------------------------|
| 10                  | 0.1                      |
| 20                  | 0.01                     |
| 30                  | 0.001                    |
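| Symbol | ASCII Code | Q-Score |
|--------|------------|---------|
| !      | 33         | 0       |
| "      | 34         | 1       |
| #      | 35         | 2       |
| $      | 36         | 3       |
| %      | 37         | 4       |
| &      | 38         | 5       |
| '      | 39         | 6       |
| (      | 40         | 7       |
| )      | 41         | 8       |
| *      | 42         | 9       |
| +      | 43         | 10      |
| ,      | 44         | 11      |
| -      | 45         | 12      |
| .      | 46         | 13      |
| /      | 47         | 14      |
| 0      | 48         | 15      |
| 1      | 49         | 16      |
| 2      | 50         | 17      |
| 3      | 51         | 18      |
| 4      | 52         | 19      |
| 5      | 53         | 20      |
| 6      | 54         | 21      |
| 7      | 55         | 22      |
| 8      | 56         | 23      |
| 9      | 57         | 24      |
| :      | 58         | 25      |
| ;      | 59         | 26      |
| <      | 60         | 27      |
| =      | 61         | 28      |
| >      | 62         | 29      |
| ?      | 63         | 30      |
| @      | 64         | 31      |
| A      | 65         | 32      |
| B      | 66         | 33      |
| C      | 67         | 34      |
| D      | 68         | 35      |
| E      | 69         | 36      |
| F      | 70         | 37      |
| G      | 71         | 38      |
| H      | 72         | 39      |
| I      | 73         | 40      |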
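The FASTQ encoding described above (character ASCII code = Q-score + 33) can be sketched in Python as follows. This is an illustrative example, not Illumina code. The names `PHRED_OFFSET`, `decode_quality`, and `encode_quality` are hypothetical, and the sample quality string is made up for demonstration.

```python
PHRED_OFFSET = 33  # from the text: ASCII code = Q-score + 33


def decode_quality(quality_string: str) -> list[int]:
    """Convert a FASTQ quality string into a list of integer Q-scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def encode_quality(scores: list[int]) -> str:
    """Convert integer Q-scores into a FASTQ quality string, one character per score."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


if __name__ == "__main__":
    print(decode_quality("!+5?I"))               # [0, 10, 20, 30, 40]
    print(encode_quality([0, 10, 20, 30, 40]))   # !+5?I
```

Each decoded value matches the corresponding row of the symbol table, for example `I` (ASCII 73) decodes to a Q-score of 40.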