MAFₘₐₓ calculator

Estimate the maximum minor allele frequency (MAF) your cohort can support. A quick, transparent tool for setting MAF thresholds in human WGS cohort analysis.

MAF filtering helps exclude common, likely benign variants so that analyses focus on rare, potentially disease-causing alleles. This calculator estimates the maximum MAF for a variant observed in a cohort. It assumes diploidy (2 alleles per individual) and allows you to specify:

  • Alternate alleles in patient(s): default 2 for a single case (1 if heterozygous, 2 if homozygous or recessive). Increase beyond 2 only if you have multiple affected individuals with the same phenotype.
  • Alternate alleles from parents: default 2 (both parents are carriers); use 0 if the parents are not in the cohort. Increase beyond 2 only if multiple parental carriers are included across shared families.
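
The exact formula is not spelled out here, so the sketch below assumes the simplest reading: the allowed alternate alleles (patient plus parents) divided by the total number of alleles in a diploid cohort, 2 × N. The function name `maf_max` and its parameters are hypothetical and chosen only for illustration.

```python
def maf_max(n_individuals: int, patient_alt: int = 2, parent_alt: int = 2) -> float:
    """Assumed formula: (patient + parent alternate alleles) / (2 * cohort size)."""
    if n_individuals <= 0:
        raise ValueError("cohort must contain at least one individual")
    if patient_alt < 0 or parent_alt < 0:
        raise ValueError("allele counts cannot be negative")

    total_alleles = 2 * n_individuals       # diploid: 2 alleles per individual
    allowed_alt = patient_alt + parent_alt  # alleles the filter must tolerate
    if allowed_alt > total_alleles:
        raise ValueError("more alternate alleles than alleles in the cohort")
    return allowed_alt / total_alleles


# Example: 200 individuals, one homozygous case, two carrier parents.
print(maf_max(200))                # (2 + 2) / 400
# Same cohort size, but the parents were not sequenced.
print(maf_max(200, parent_alt=0))  # 2 / 400
```

Under this assumption the threshold shrinks as the cohort grows, because the same handful of expected alleles is spread over more chromosomes.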

The calculated MAFₘₐₓ can be used with PLINK or vcftools for filtering, for example:

# Filter the cohort VCF with the calculated MAF threshold
vcftools \
  --gzvcf "${vcf_file}" \
  --max-maf "${qv_snvindel_v2_vcftoolsMAF_value}" \
  --recode --recode-INFO-all \
  --out "${vcf_out}"

Some tools (like certain versions of vcftools or PLINK) exclude variants whose MAF exactly equals the threshold. Tick this box to add a small margin (+1 allele, i.e. 1 / (2 × cohort size)) so that variants at the boundary are included.
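
The effect of the margin can be illustrated with a small sketch. It assumes the threshold is an allele count divided by 2 × N and that the filtering tool uses a strict less-than comparison; both function names are hypothetical.

```python
def maf_threshold(n_individuals: int, allowed_alt: int, add_margin: bool = False) -> float:
    """Allele-count threshold as a frequency, optionally widened by one allele."""
    if n_individuals <= 0:
        raise ValueError("cohort must contain at least one individual")
    count = allowed_alt + 1 if add_margin else allowed_alt
    return count / (2 * n_individuals)


def passes_strict(alt_count: int, n_individuals: int, threshold: float) -> bool:
    """Mimic a (hypothetical) tool that keeps only variants with MAF < threshold."""
    return alt_count / (2 * n_individuals) < threshold


n = 200
plain = maf_threshold(n, allowed_alt=4)
padded = maf_threshold(n, allowed_alt=4, add_margin=True)

print(passes_strict(4, n, plain))   # boundary variant is dropped without the margin
print(passes_strict(4, n, padded))  # boundary variant is kept with the margin
print(passes_strict(5, n, padded))  # one allele above the boundary stays excluded
```

If the tool's comparison is inclusive instead, the padded threshold would also admit variants with one extra allele, so the margin is only worth adding when the comparison is known to be strict.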


Why is this used?

Filtering by minor allele frequency (MAF) or allele frequency (AF) is a standard step in whole genome sequencing (WGS) analysis. Cohorts are used because combining samples provides internal frequency context and improves quality control. Even when analysing single cases, they are often grouped into cohorts, either because they share a disease phenotype or because their data were processed together in the same sequencing batch.

This approach helps separate rare, potentially causal variants from common variants that are usually benign. Each individual carries thousands of variants, so frequency filtering reduces false positives and improves interpretation. A MAF of 0.01 means that the alternate allele accounts for about 1% of all alleles in the cohort, which corresponds to roughly 2% of individuals if every carrier is heterozygous (each carrier contributes one of their two alleles, so the carrier fraction is 2 × MAF). This helps identify whether a variant is truly rare or widespread in the population.

Many pipelines process cohort data jointly to detect shared artefacts or systematic sequencing errors before filtering down to individual patient results. Common tools that apply or interpret MAF filters include GATK, BCFtools, VCFtools, and PLINK for genome-wide association studies (GWAS). Filtering ensures that downstream analyses, whether statistical, Bayesian, or clinical, are based on realistic population frequencies rather than being influenced by common, neutral variation.

Related methods

The cohort-level MAF calculation above provides the simplest frequency-based filter. Other approaches use additional data or internal computations within analysis tools:

  • External population filters – use public datasets such as gnomAD or TOPMed to exclude globally common variants.
    bcftools filter -e 'INFO/gnomAD_AF >= 0.01' input.vcf.gz -Oz -o filtered.vcf.gz
  • Case–control frequency testing – compare allele frequencies between affected and unaffected groups.
    plink --bfile cohort --pheno phenotype.txt --assoc
  • Call-rate adjusted MAF – correct for missing genotypes when some samples lack calls.
    MAF = alt_alleles / (2 * n_called_individuals)
  • Population- or ancestry-stratified MAF – compute subgroup-specific frequencies for population structure or batch QC.
    bcftools +fill-tags input.vcf.gz -- -t AF,AC,AN -S ancestry_groups.txt
  • Theoretical MAF thresholds – estimate the maximum credible frequency for pathogenic variants based on disease prevalence and inheritance model.
    MAF_max = prevalence × (1 / penetrance) × (1 / genetic_heterogeneity)
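
The theoretical threshold formula above can be evaluated directly. This sketch follows the formula as written and assumes that `genetic_heterogeneity` is a count of equally contributing causal variants (at least 1), which the text does not specify. The function name and the input values are made up for illustration.

```python
def theoretical_maf_max(prevalence: float, penetrance: float,
                        genetic_heterogeneity: float) -> float:
    """prevalence * (1 / penetrance) * (1 / genetic_heterogeneity), as listed above."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    if genetic_heterogeneity < 1:
        # Assumption: heterogeneity is a count of contributing variants, so >= 1.
        raise ValueError("genetic_heterogeneity must be >= 1")
    return prevalence * (1 / penetrance) * (1 / genetic_heterogeneity)


# Illustrative inputs only: 1 in 10,000 prevalence, 50% penetrance,
# disease burden shared across 10 variants.
threshold = theoretical_maf_max(prevalence=1e-4, penetrance=0.5,
                                genetic_heterogeneity=10)
print(f"{threshold:.2e}")
```

Lower penetrance raises the threshold (more unaffected carriers are expected), while higher heterogeneity lowers it (each variant explains a smaller share of cases).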

These methods extend simple cohort-level frequency filtering and are implemented internally in tools such as GATK, BCFtools, VCFtools, and PLINK depending on study design.
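
As a sketch of the call-rate adjustment listed under Related methods, the function below divides the alternate allele count by twice the number of individuals with a genotype call, rather than by twice the full cohort size. The genotype encoding (0, 1 or 2 alternate alleles, `None` for a missing call) and the function name are assumptions for illustration. The result is the alternate allele frequency, which equals the MAF whenever the alternate allele is the minor one.

```python
from typing import Optional, Sequence


def call_rate_adjusted_af(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """alt_alleles / (2 * n_called_individuals); None if nobody was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


# Ten individuals, two of them without a genotype call at this site.
genos = [0, 1, None, 0, 2, None, 0, 0, 0, 0]

# Naive frequency: divides by all 10 individuals, missing calls included.
naive = sum(g for g in genos if g is not None) / (2 * len(genos))
# Adjusted frequency: divides by the 8 individuals that were actually called.
adjusted = call_rate_adjusted_af(genos)

print(naive, adjusted)
```

Ignoring missing calls biases the frequency downward, which matters most at sites with a low call rate.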