What is a Phred score?

Explanation of Phred quality scores used to quantify sequencing base call accuracy

A Phred score is a numerical measure of how likely a sequencing base call is to be wrong, expressed on a logarithmic scale and assigned to each nucleotide in sequencing reads.

These scores are stored alongside sequences in FASTQ files and are used throughout sequencing pipelines to filter, trim, assemble, and call variants.

Why Phred scores exist

Sequencing instruments do not directly observe DNA bases. They infer bases from signals that include noise and uncertainty.

Downstream tools therefore need a way to measure confidence at each position. Phred scores allow pipelines to automatically downweight or remove unreliable bases, improving alignment, assembly, and variant detection.

Without base-level quality scores, most modern automated analysis pipelines would not function reliably.

Core mechanism

A Phred score links base-call quality to the probability of error using:


Q = −10 log10(P)

where:

Q is the Phred score.
P is the probability the base is incorrect.

The inverse relation is:


P = 10^(−Q/10)

Each increase of 10 in score corresponds to a tenfold reduction in error probability.
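The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than code from any particular tool; the function names phred_to_error_prob and error_prob_to_phred are made up for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Print the error probability and accuracy for a few common scores.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}, accuracy {(1 - p) * 100:g}%")
```

Because the scale is logarithmic, moving from Q20 to Q30 removes nine out of every ten remaining errors, rather than improving accuracy by a fixed amount.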

Typical interpretations:

Phred score	Error probability	Base accuracy
10	1 in 10	90%
20	1 in 100	99%
30	1 in 1,000	99.9%
40	1 in 10,000	99.99%

Modern short-read data commonly show scores between 20 and 40, often decreasing toward read ends.

Example interpretation in FASTQ

FASTQ files store sequences and encoded quality values:

@READ_1
ACGT
+
IIII

Here:

Sequence: ACGT
Quality string: IIII
Each character corresponds to one base.

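In standard PHRED+33 encoding, the quality score is the character's ASCII code minus 33. The character I has ASCII code 73, so it corresponds to quality score 40 (73 − 33), meaning an error probability of about 1 in 10,000 per base.

Every base in this read is therefore called with high confidence.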
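The decoding step can be sketched as follows, assuming PHRED+33 encoding. The helper name decode_quality is hypothetical and used only for illustration.

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into numeric Phred scores.

    Assumes a fixed ASCII offset (33 for standard PHRED+33).
    """
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below the offset: wrong encoding assumed?")
    return scores


# The example record from above.
sequence = "ACGT"
quality = "IIII"

for base, q in zip(sequence, decode_quality(quality)):
    error_prob = 10 ** (-q / 10)
    print(f"{base}: Q{q}, error probability {error_prob:g}")
```

A negative decoded score is treated as an error here because it usually indicates that the wrong offset was assumed.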

Details that matter

Key operational points:

Phred scores are logarithmic, not linear.
FASTQ stores encoded characters, not numeric scores.
Sequence and quality lengths must match.
Quality typically decreases toward read ends.
Read trimming and filtering rely directly on quality thresholds.
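As a sketch of how a quality threshold drives trimming, the following removes low-quality bases from the 3' end of a read. It is a deliberately simple approach, not the algorithm of any specific trimming tool; the function name trim_3prime and the default threshold of 20 are assumptions made for this example.

```python
def trim_3prime(sequence: str, quality: str,
                min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q.

    Assumes PHRED+33 encoding unless another offset is given.
    """
    # Sequence and quality lengths must match in a valid FASTQ record.
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")

    end = len(quality)
    # Walk back from the 3' end while the last kept base is below threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# '#' decodes to Q2 in PHRED+33, so the last two bases are removed.
print(trim_3prime("ACGTAC", "IIII##"))
```

Production trimmers typically use more robust strategies, such as sliding windows or running sums, so that one good base near the end does not protect a poor-quality tail.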

Common mistakes

Frequent interpretation errors include:

Assuming Q30 means 30% accuracy instead of 99.9%.
Averaging quality scores arithmetically, which ignores the logarithmic scale and overstates the quality of reads that contain a few poor bases.
Ignoring FASTQ encoding variants when converting files.
Assuming quality is uniform across reads.
Confusing base quality with mapping quality, which measures alignment confidence rather than base-call confidence.
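The averaging mistake is easy to demonstrate. The sketch below compares the arithmetic mean of two scores with the score implied by their mean error probability; the function name effective_mean_quality is hypothetical.

```python
import math


def effective_mean_quality(scores: list[int]) -> float:
    """Average in probability space, then convert back to a Phred score."""
    if not scores:
        raise ValueError("scores must not be empty")
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


scores = [10, 40]  # one poor base, one excellent base

arithmetic_mean = sum(scores) / len(scores)
effective_mean = effective_mean_quality(scores)

print(f"Arithmetic mean of scores: Q{arithmetic_mean:.1f}")
print(f"Score of mean error rate:  Q{effective_mean:.1f}")
```

For this pair the arithmetic mean is Q25, but the mean error probability is (0.1 + 0.0001) / 2 ≈ 0.05, which corresponds to roughly Q13. The poor base dominates the true error rate.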

Where Phred scores fit in pipelines

Typical workflow:

Sequencing → FASTQ with Phred scores → trimming/filtering → alignment → variant analysis

Quality scores directly influence which bases and reads contribute to downstream results.

Adjacent concept: FASTQ quality encoding

FASTQ files store Phred scores using ASCII characters with a fixed numeric offset. Historical FASTQ variants used different offsets and score systems, which can cause interpretation errors if misidentified.

Modern pipelines typically standardise data to PHRED+33 encoding.
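Standardising an older file can be sketched as a simple re-encoding. The example below assumes the input uses an offset of 64 with true Phred scores, as some older Illumina pipelines did; it does not handle variants that used a different score system. The function name phred64_to_phred33 is made up for this illustration.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a PHRED+64 quality string as PHRED+33.

    Assumption: the input holds true Phred scores with offset 64.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            # A negative score suggests the input was not PHRED+64.
            raise ValueError(f"character {ch!r} is not valid PHRED+64")
        converted.append(chr(score + 33))
    return "".join(converted)


# 'h' is Q40 in PHRED+64 and becomes 'I' in PHRED+33.
print(phred64_to_phred33("hhhh"))
```

Misidentifying the offset shifts every score by 31, which is why the encoding should be confirmed before any filtering or trimming is applied.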

References

Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research (1998). https://doi.org/10.1101/gr.8.3.175

http://www.phrap.org/phredphrapconsed.html

Footnote

Phred (Phil’s Read Editor[1]) is a computer program for base calling, that is, for identifying a nucleobase sequence from the fluorescence “trace” data generated by an automated DNA sequencer that uses electrophoresis and the four-fluorescent-dye method. When originally developed, Phred produced significantly fewer errors in the data sets examined than other methods, averaging 40–50% fewer errors. Phred quality scores have since become widely accepted as a way to characterize the quality of DNA sequences, and they can be used to compare the efficacy of different sequencing methods.

[1] Moody, Glyn (2004). Digital code of life: how bioinformatics is revolutionizing science, medicine, and business. Hoboken, New Jersey: John Wiley & Sons, Inc. ISBN 978-0-471-32788-2.