Learn genomics

Short explanations of genomics and bioinformatics concepts used in sequencing and variant interpretation.

Human reference genomes quick reference

A human reference genome defines the coordinate system used to map sequencing reads and report variants. Multiple bui...

What are BAM and BAI files?

A BAM file stores sequencing read alignments to a reference genome in a compressed binary format. A BAI file i...

What is the BED format?

The BED format (Browser Extensible Data) is a tab-delimited text format used to describe genomic regions using chromo...
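As a rough illustration of how a BED region might be read, here is a minimal sketch that parses only the three required columns (chromosome, start, end). The names `BedRegion` and `parse_bed_line` are hypothetical helpers invented for this example, and any optional columns are ignored. BED coordinates are 0-based and half-open, so the region length is simply end minus start.

```python
from typing import NamedTuple


class BedRegion(NamedTuple):
    """Hypothetical container for the three required BED columns."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        # Half-open interval: no +1 is needed.
        return self.end - self.start


def parse_bed_line(line: str) -> BedRegion:
    """Parse one tab-delimited BED line, ignoring optional columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start, and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("invalid interval: expected 0 <= start <= end")
    return BedRegion(chrom, start, end)


region = parse_bed_line("chr1\t100\t200\texample_name\n")
print(region.chrom, region.start, region.end, region.length)
```

In this sketch the line `chr1 100 200` covers 100 bases, which correspond to positions 101 through 200 in 1-based coordinates.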

What is the CRAM format?

CRAM is a compressed file format used to store sequencing read alignments, equivalent to BAM but designed to achieve ...

CRAM vs SAM vs BAM explained

SAM, BAM, and CRAM are formats used to store sequencing reads after alignment to a reference genome. They record wher...

What is the CSI index format?

CSI (Coordinate Sorted Index) is a genomic index format that enables fast access to regions within large coordinate-s...

What is the FASTA format?

FASTA is a text format used to store biological sequences, such as DNA, RNA, or protein sequences, using single-lette...

What is the FASTQ format?

FASTQ is a text format used to store sequencing reads together with a quality score for each base, recording both the...
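To make the pairing of bases and quality scores concrete, the sketch below reads a single four-line FASTQ record and decodes its quality string. It assumes the common Phred+33 ASCII encoding (older files may use a different offset), and the function names `decode_qualities` and `parse_fastq_record` are hypothetical, chosen only for this example.

```python
from typing import List, Sequence, Tuple


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert ASCII quality characters to integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def parse_fastq_record(lines: Sequence[str]) -> Tuple[str, str, List[int]]:
    """Parse one four-line FASTQ record into (name, sequence, qualities)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, decode_qualities(quality)


record = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
name, sequence, qualities = parse_fastq_record(record)
print(name, sequence, qualities)
```

Each base in `sequence` lines up with the score at the same index in `qualities`.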

Genes vs DNA vs chromosomes explained

DNA is the molecule that stores genetic information. A gene is a specific stretch of DNA with a functional role. A ch...

How genetic inheritance works

Genetic inheritance is the way DNA variants are passed from parents to children through eggs and sperm. A child recei...

What is a Phred score?

A Phred score is a numerical measure of how likely a sequencing base call is to be wrong, expressed on a logarithmic ...
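The logarithmic relationship can be sketched in a few lines. The standard definition is Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. The function names below are hypothetical and exist only for this illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

Every 10-point increase in the score corresponds to a tenfold drop in the estimated error probability, so Q20 means roughly 1 error in 100 calls and Q30 roughly 1 in 1,000.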

GRCh37 vs GRCh38 reference genomes explained

Human sequencing data are aligned to a reference genome that defines genomic coordinates. GRCh37 and GRCh38 are the t...

What is the SAM format?

The SAM format (Sequence Alignment/Map) is a tab-delimited text format used to record how sequencing reads align to a...

What is Tabix indexing?

Tabix is a tool and index format that allows fast retrieval of genomic regions from large, position-sorted text files...

What is a VCF file?

A VCF (Variant Call Format) file is a tab-delimited text file that describes genomic variants detected from seque...

VCF vs gVCF explained

A VCF records positions where variants were detected in a genome, while a gVCF records both variant sites and regions...

What is a genetic disease?

A genetic disease is a disease caused by a change in DNA, whether in a single gene, multiple genes, or larger parts of...

What is a genome?

A genome is the complete set of DNA in an organism. In humans, it includes nearly all genetic material in the nucleus...

What is DNA?

DNA, short for deoxyribonucleic acid, is the molecule that stores hereditary biological information in humans and mos...