Short explanations of genomics and bioinformatics concepts used in sequencing and variant interpretation.
A human reference genome defines the coordinate system used to map sequencing reads and report variants. Multiple bui...
A BAM file is a compressed binary format used to store sequencing read alignments to a reference genome. A BAI file i...
The BED format (Browser Extensible Data) is a tab-delimited text format used to describe genomic regions using chromo...
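As an illustration of how BED coordinates work, here is a minimal sketch that reads the first three columns of one BED line. It relies on the BED convention that the start is 0-based and the end is exclusive. The names `BedInterval`, `parse_bed_line`, and `to_one_based` are hypothetical helpers made up for this example, not part of any library.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed_line(line: str) -> BedInterval:
    """Parse the three required BED columns; extra columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start, and end")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


def to_one_based(interval: BedInterval) -> str:
    """Render the interval as a 1-based, inclusive region string."""
    return f"{interval.chrom}:{interval.start + 1}-{interval.end}"


iv = parse_bed_line("chr1\t999\t2000\texample_region")
region = to_one_based(iv)     # "chr1:1000-2000"
length = iv.end - iv.start    # 1001 bases
```

Because the end is exclusive, the interval length is simply `end - start`, with no off-by-one correction.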
CRAM is a compressed file format used to store sequencing read alignments, equivalent to BAM but designed to achieve ...
SAM, BAM, and CRAM are formats used to store sequencing reads after alignment to a reference genome. They record wher...
CSI (Coordinate Sorted Index) is a genomic index format that enables fast access to regions within large coordinate-s...
FASTA is a text format used to store biological sequences, such as DNA, RNA, or protein sequences, using single-lette...
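The layout of a FASTA file can be illustrated with a small reader: each record starts with a `>` header line, and the sequence may span several lines. This is a sketch only. The function name `read_fasta` is hypothetical, and it assumes the record identifier is the first whitespace-delimited word of the header.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect sequences keyed by the first word of each '>' header."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # finish the previous record
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines are concatenated
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = ">seq1 toy DNA\nACGT\nACGT\n>seq2\nMKV\n"
sequences = read_fasta(example.splitlines())
# sequences == {"seq1": "ACGTACGT", "seq2": "MKV"}
```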
FASTQ is a text format used to store sequencing reads together with a quality score for each base, recording both the...
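To make the pairing of bases and quality scores concrete, the sketch below reads simple four-line FASTQ records and decodes each quality character into an integer score. It assumes the common Phred+33 ASCII encoding and unwrapped (single-line) sequences. Both are assumptions about the input, and `read_fastq` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read name, bases, per-base quality scores) per record."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, [ord(c) - PHRED_OFFSET for c in qual]


example = "@read1\nACGT\n+\nII5!\n"
reads = list(read_fastq(example.splitlines()))
# reads == [("read1", "ACGT", [40, 40, 20, 0])]
```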
DNA is the molecule that stores genetic information. A gene is a specific stretch of DNA with a functional role. A ch...
Genetic inheritance is the way DNA variants are passed from parents to children through eggs and sperm. A child recei...
A Phred score is a numerical measure of how likely a sequencing base call is to be wrong, expressed on a logarithmic ...
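The logarithmic relationship can be written as Q = -10 * log10(P), where P is the probability that the base call is wrong. The sketch below converts in both directions. The function names are illustrative, not taken from any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred score: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


p_q30 = phred_to_error_prob(30)    # about 0.001, i.e. 1 error in 1,000 calls
q_1pct = error_prob_to_phred(0.01)  # about 20
```

Each increase of 10 in the score corresponds to a tenfold drop in the error probability.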
Human sequencing data are aligned to a reference genome that defines genomic coordinates. GRCh37 and GRCh38 are the t...
The SAM format (Sequence Alignment/Map) is a tab-delimited text format used to record how sequencing reads align to a...
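As a rough illustration of the tab-delimited layout, the sketch below splits one SAM alignment line into a few of its eleven mandatory fields and checks two FLAG bits (0x4 for unmapped, 0x10 for reverse strand). The helper name `parse_sam_line` and the example read are made up for illustration. Real pipelines would typically use a dedicated library rather than hand parsing.

```python
from typing import Any, Dict

FLAG_UNMAPPED = 0x4   # read is unmapped
FLAG_REVERSE = 0x10   # read aligns to the reverse strand


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Parse selected mandatory fields of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag": flag,
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapped position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
        "is_unmapped": bool(flag & FLAG_UNMAPPED),
        "is_reverse": bool(flag & FLAG_REVERSE),
    }


aln = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII")
# aln["is_reverse"] is True and aln["is_unmapped"] is False
```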
Tabix is a tool and index format that allows fast retrieval of genomic regions from large, position-sorted text files...
A VCF file (Variant Call Format) is a tab-delimited text format used to describe genomic variants detected from seque...
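To show what one variant record looks like in practice, here is a minimal sketch that parses the eight fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `parse_vcf_line` and the example record are hypothetical, and the sketch skips header lines, genotype columns, and type conversion of INFO values.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict: Dict[str, Any] = {}
    if info != ".":  # "." means the field is empty
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": filt,
        "info": info_dict,
    }


variant = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=10;DB")
# variant["alt"] == ["G", "T"]; variant["info"] == {"DP": "10", "DB": True}
```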
A VCF records positions where variants were detected in a genome, while a gVCF records both variant sites and regions...
A genetic disease is a disease caused by a change in DNA, either in a single gene, multiple genes, or larger parts of...
A genome is the complete set of DNA in an organism. In humans, it includes nearly all genetic material in the nucleus...
DNA, short for deoxyribonucleic acid, is the molecule that stores hereditary biological information in humans and mos...