A Phred score is a numerical measure of how likely a sequencing base call is to be wrong, expressed on a logarithmic scale and assigned to each nucleotide in sequencing reads.
These scores are stored alongside sequences in FASTQ files and are used throughout sequencing pipelines to filter, trim, assemble, and call variants.
Why Phred scores exist
Sequencing instruments do not directly observe DNA bases. They infer bases from signals that include noise and uncertainty.
Downstream tools therefore need a way to measure confidence at each position. Phred scores allow pipelines to automatically downweight or remove unreliable bases, improving alignment, assembly, and variant detection.
Without base-level quality scores, most modern automated analysis pipelines would not function reliably.
Core mechanism
A Phred score links base-call quality to the probability of error using:
Q = −10 log10(P)
where:
- Q is the Phred score.
- P is the probability the base is incorrect.
The inverse relation is:
P = 10^(−Q/10)
Each increase of 10 in score corresponds to a tenfold reduction in error probability.
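The two formulas above translate directly into a pair of conversion functions. This is a minimal Python sketch; the names `phred_to_error_prob` and `error_prob_to_phred` are illustrative helpers, not part of any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return the error probability P = 10^(-Q/10) for a Phred score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Return the Phred score Q = -10 * log10(P) for an error probability P."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Walk through Q10..Q40: each +10 in Q divides the error probability by 10.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: P = {p:g}, accuracy = {100 * (1 - p):.2f}%")
```

Scores are returned as floats because P does not have to be an exact power of ten; FASTQ files store rounded integer scores.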
Typical interpretations:
| Phred score | Error probability | Base accuracy |
|---|---|---|
| 10 | 1 in 10 | 90% |
| 20 | 1 in 100 | 99% |
| 30 | 1 in 1,000 | 99.9% |
| 40 | 1 in 10,000 | 99.99% |
Modern short-read data commonly show scores between 20 and 40, often decreasing toward read ends.
Example interpretation in FASTQ
FASTQ files store sequences and encoded quality values:
```text
@READ_1
ACGT
+
IIII
```
Here:
- Sequence: ACGT
- Quality string: IIII
- Each quality character corresponds to one base.
In standard PHRED+33 encoding, the character I (ASCII code 73) corresponds to quality score 73 − 33 = 40, meaning an error probability of 1 in 10,000 per base.
Thus every base in this read is called with high confidence.
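Decoding the quality string is a per-character subtraction of the offset. The sketch below assumes a single four-line record held in a string and PHRED+33 encoding; `parse_fastq_record` and `decode_phred33` are hypothetical helper names, and a real pipeline would normally use an established FASTQ parser instead.

```python
def decode_phred33(quality: str) -> list[int]:
    """Convert a PHRED+33 quality string to integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def parse_fastq_record(text: str) -> tuple[str, str, list[int]]:
    """Parse one four-line FASTQ record into (read ID, sequence, scores)."""
    header, sequence, separator, quality = text.strip().splitlines()
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed FASTQ record")
    # Every base needs exactly one quality character.
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, decode_phred33(quality)


record = """@READ_1
ACGT
+
IIII"""

read_id, seq, scores = parse_fastq_record(record)
for base, q in zip(seq, scores):
    # 'I' is ASCII 73, so 73 - 33 = Q40, an error probability of 1e-4.
    print(base, q, 10 ** (-q / 10))
```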
Details that matter
Key operational points:
- Phred scores are logarithmic, not linear.
- FASTQ stores encoded characters, not numeric scores.
- Sequence and quality lengths must match.
- Quality typically decreases toward read ends.
- Read trimming and filtering rely directly on quality thresholds.
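Quality-based trimming can be as simple as cutting the 3' end once scores fall below a threshold. This is a deliberately naive sketch, assuming integer Phred scores, a hypothetical `trim_3prime` helper, and an illustrative cutoff of Q20; production trimmers typically use sliding windows or running-sum rules rather than exactly this one. The example scores are made up.

```python
def trim_3prime(sequence: str, scores: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Remove trailing bases whose Phred score is below min_q.

    Scans from the 3' end and stops at the first base with Q >= min_q.
    Low-quality bases in the middle of the read are left untouched.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and quality lengths differ")
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], scores[:end]


# Illustrative read whose quality decays toward the 3' end.
seq = "ACGTACGTAC"
quals = [38, 37, 36, 35, 30, 28, 22, 15, 12, 8]
trimmed_seq, trimmed_quals = trim_3prime(seq, quals)
print(trimmed_seq, trimmed_quals)
```

Here the last three bases (Q15, Q12, Q8) are removed and trimming stops at the Q22 base, leaving `ACGTACG`.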
Common mistakes
Frequent interpretation errors include:
- Assuming Q30 means 30% accuracy instead of 99.9%.
- Averaging quality scores without accounting for logarithmic scaling.
- Ignoring FASTQ encoding variants when converting files.
- Assuming quality is uniform across reads.
- Confusing base quality with mapping quality, which measures alignment confidence rather than base-call confidence.
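The averaging pitfall is easiest to see numerically. Because Q is a logarithm, the arithmetic mean of Q values is dominated by the good bases, whereas averaging the error probabilities first and converting back reflects the actual expected error rate. The helper names and the four scores below are illustrative.

```python
import math


def mean_phred_naive(scores: list[int]) -> float:
    """Arithmetic mean of Phred scores (misleading for a log-scale quantity)."""
    return sum(scores) / len(scores)


def mean_phred_via_error_prob(scores: list[int]) -> float:
    """Average the per-base error probabilities, then convert back to Q."""
    mean_p = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_p)


scores = [40, 40, 40, 10]
print(f"naive mean Q:       {mean_phred_naive(scores):.1f}")
print(f"error-aware mean Q: {mean_phred_via_error_prob(scores):.1f}")
```

With one Q10 base among three Q40 bases, the naive mean is 32.5, which suggests better-than-Q30 data, while the error-aware mean is about Q16, because the single poor base contributes almost all of the expected errors.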
Where Phred scores fit in pipelines
Typical workflow:
Sequencing → FASTQ with Phred scores → trimming/filtering → alignment → variant analysis
Quality scores directly influence which bases and reads contribute to downstream results.
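One way quality scores decide which reads go downstream is a read-level filter on the expected number of errors, which is simply the sum of the per-base error probabilities. Some filtering tools use this criterion; the sketch below assumes integer Phred scores, and the helper names, the example reads, and the cutoff of 1.0 expected error are illustrative choices, not recommendations.

```python
def expected_errors(scores: list[int]) -> float:
    """Expected number of miscalled bases: the sum of P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in scores)


def passes_filter(scores: list[int], max_expected_errors: float = 1.0) -> bool:
    """Keep a read only if its expected error count is at most the cutoff."""
    return expected_errors(scores) <= max_expected_errors


good_read = [35] * 100  # 100 bases at Q35
poor_read = [35] * 80 + [10] * 20  # quality collapses over the last 20 bases
print(expected_errors(good_read), passes_filter(good_read))
print(expected_errors(poor_read), passes_filter(poor_read))
```

The good read carries about 0.03 expected errors and is kept; the poor read carries about 2.0, almost entirely from its twenty Q10 bases, and is discarded.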
Adjacent concept: FASTQ quality encoding
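FASTQ files store Phred scores as ASCII characters shifted by a fixed numeric offset. Historical FASTQ variants used a different offset (Phred+64, in Illumina pipeline versions 1.3 to 1.7) or a different score system (Solexa scores), so decoding a file under the wrong assumption silently shifts or distorts every quality value.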
Modern pipelines typically standardise data to PHRED+33 encoding.
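Converting Phred+64 data to PHRED+33 is a fixed shift of 31 in each character code, provided the file really uses Phred+64 and not Solexa scores, which are not Phred values and need a nonlinear conversion. The sketch assumes the source encoding is already known; `reencode_quality` and `phred64_to_phred33` are hypothetical names, and established toolkits provide tested converters.

```python
def reencode_quality(quality: str, from_offset: int, to_offset: int) -> str:
    """Re-encode a quality string from one ASCII offset to another."""
    out = []
    for ch in quality:
        q = ord(ch) - from_offset
        # A negative score means the assumed source offset is wrong.
        if q < 0:
            raise ValueError(f"{ch!r} is below offset {from_offset}; wrong encoding?")
        out.append(chr(q + to_offset))
    return "".join(out)


def phred64_to_phred33(quality: str) -> str:
    """Convert a Phred+64 quality string to PHRED+33 (same Phred scores)."""
    return reencode_quality(quality, from_offset=64, to_offset=33)


old = "hhhB"  # Phred+64: 'h' (ASCII 104) is Q40, 'B' (ASCII 66) is Q2
print(phred64_to_phred33(old))
```

The scores themselves do not change; only the characters do, so Q40 moves from `h` to `I` and Q2 from `B` to `#`.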
References
Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research 8(3):175–185 (1998). https://doi.org/10.1101/gr.8.3.175
Phred, Phrap and Consed project page. http://www.phrap.org/phredphrapconsed.html
Footnote
Phred (Phil’s Read Editor[1]) is a computer program for base calling, that is, identifying a nucleobase sequence from fluorescence “trace” data generated by an automated DNA sequencer that uses electrophoresis and four fluorescent dyes.[2][3] When originally developed, Phred produced significantly fewer errors than other methods on the data sets examined, averaging 40–50% fewer errors. Phred quality scores have become the widely accepted way to characterize the quality of DNA sequences and can be used to compare the efficacy of different sequencing methods.
[1] Moody, Glyn (2004). Digital code of life: how bioinformatics is revolutionizing science, medicine, and business. Hoboken, New Jersey: John Wiley & Sons, Inc. ISBN 978-0-471-32788-2.