When sequencing became infrastructure

2026-03-14

TL;DR: Genomics won the race to read DNA. It has not yet solved what to do with what it reads. The next phase will be defined by reference genomes, interpretation, data systems, and trust, not by sequencing alone.

The first human genome took thirteen years and billions of dollars. Today a genome can be sequenced before lunch.

That change is hard to overstate. Sequencing costs have fallen more than a million-fold since the Human Genome Project. One of the great technical barriers in modern science has already fallen.

Now a different barrier has taken its place. The problem is no longer reading DNA. It is turning genomic data into decisions that are accurate, representative, and usable.

Over the next decade, genomics research is expected to generate between 2 and 40 exabytes of data. An exabyte is a billion gigabytes. Scarcity has become abundance. The limiting factor is no longer data generation. It is interpretation, computation, standards, and trust.
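A back-of-envelope calculation makes those volumes concrete. The per-genome figure below is an assumption for illustration (roughly 100 GB of raw data for a 30x whole-genome sequence is a commonly cited ballpark; actual sizes vary with coverage and file format), not a number from this article:

```python
# Back-of-envelope: how many whole genomes would fill the projected range?
# GB_PER_GENOME is an assumed rough figure, not a measured constant.

GB_PER_GENOME = 100              # ~30x whole genome, raw data (assumption)
EXABYTE_IN_GB = 1_000_000_000    # 1 EB = a billion GB (decimal units)

for exabytes in (2, 40):
    genomes = exabytes * EXABYTE_IN_GB // GB_PER_GENOME
    print(f"{exabytes} EB is roughly {genomes:,} genomes")
```

Even the low end of the projection corresponds to tens of millions of genomes' worth of raw data, which is why storage, transfer, and shared analysis platforms dominate the conversation.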

A genome sequence on its own is not a diagnosis. It is not a treatment. It is not a prevention strategy. Those outcomes depend on everything around the sequence: reference genomes, analysis methods, clinical workflows, education, governance, and public trust.

Sequencing opened the field. Infrastructure will decide what the field becomes.

The first barrier fell

The first era of genomics asked a simple question: could we read the human genome at all, and could we do it at a scale that mattered?

That work succeeded.

Recent milestones show how far the field has moved. The Telomere-to-Telomere Consortium delivered truly complete human chromosome sequences, and in 2023 completed the first gapless sequence of the human Y chromosome. For the first time, all human chromosomes had been sequenced end to end.

We are no longer working from a rough sketch of the genome. We are moving towards a complete map.

A single reference is not enough

For years, much of genomics depended on a reference that reflected only a narrow slice of humanity. That is a scientific problem. It is also a clinical one.

If the reference is too narrow, variant interpretation becomes less reliable across populations. If disease studies are too narrow, risk prediction becomes uneven. If the data are biased, the benefits are biased.

Work is now under way to move beyond the idea of a single reference genome towards a pangenome reference that captures a wider breadth of human variation. Hundreds of high-quality reference genomes are being generated for that purpose, and the direction is clear.

This is one of the most important transitions now under way in genomics. A reference that does not represent the world cannot guide the world.

The second barrier is harder

The second barrier is understanding the genome well enough to use it.

That means linking variants to function. It means understanding disease across tissues, across development, and across populations. It means integrating DNA with RNA, proteins, metabolites, clinical records, environment, and social determinants of health. It means turning vast data into conclusions that can be explained and defended.

This is why data science now sits at the centre of genomics. The field increasingly depends on shared computational systems where researchers analyse large datasets where they already sit, rather than moving copies from place to place. That shift is practical, but it is also conceptual. Genomics is now as much about systems as it is about sequence.

Sequence is not the whole story

Reading DNA tells us what is present. It does not fully explain what that sequence does, when it matters, or how it contributes to health or disease.

Functional genomics, developmental genomics, and comparative genomics are all expanding the same frontier: how variation shapes biology across tissues, across life stages, and across species. The same is true of multi-omics, where DNA is analysed alongside RNA, proteins, metabolites, and relevant clinical or environmental context.

That broader view is especially important when the aim is not simply to catalogue variation, but to understand health and disease in real populations. Current large-scale efforts are therefore combining molecular data with ancestry, environment, and social determinants of health, while deliberately increasing participation from groups that were underrepresented in earlier genomics research.

Health is not explained by sequence alone. It emerges from biology in context.

Medicine now needs systems

Genomic medicine is no longer waiting in the wings. It is already being built into clinical settings.

One visible sign is the growth of shared clinical curation efforts for genes and variants. These are helping create common interpretation frameworks that can be reused across institutions rather than rebuilt case by case. That is what clinical infrastructure looks like in genomics.

The same broader problem appears in risk prediction. Polygenic models have often worked best in populations of European ancestry because the training data were uneven. Improving this does not require a slogan. It requires broader datasets, better reference panels, and methods that perform across populations rather than only within the groups that were easiest to study first.
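The arithmetic behind a polygenic score is simple, which is exactly why the training data matter so much. A minimal sketch, with invented variant IDs and effect sizes (real models use thousands to millions of variants, with effects estimated from genome-wide association studies):

```python
# Toy polygenic score: a weighted sum of allele dosages.
# Variant IDs and effect sizes below are hypothetical, for illustration only.

effect_sizes = {
    "rs0001": 0.12,   # per-allele effect (e.g. log-odds scale), assumed
    "rs0002": -0.05,
    "rs0003": 0.30,
}

def polygenic_score(dosages: dict[str, int]) -> float:
    """Sum of effect_size * allele count (0, 1, or 2) over scored variants."""
    return sum(effect_sizes[v] * d for v, d in dosages.items() if v in effect_sizes)

person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(polygenic_score(person), 3))  # 0.12*2 - 0.05*1 + 0.30*0 = 0.19
```

The weights are where the bias enters: if they were estimated in one ancestry group, differences in allele frequencies and linkage patterns mean the same sum is a weaker predictor in others. Broader training data fixes the weights, not the formula.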

Healthcare systems are also beginning to integrate genomic information into routine care and to use real-world clinical outcomes to improve future practice. Genomics is moving from isolated discovery into operational medicine.

Rare disease shows the real bottleneck

Rare disease makes the state of the field especially clear. Many patients still endure a long diagnostic odyssey. Standard approaches have solved many cases, but a large fraction remain unresolved.

The harder cases do not yield simply because one more sequencing run is ordered. They require better references, long-read sequencing, RNA analysis, other molecular layers, and better ways of combining evidence.

Rare disease therefore exposes the real bottleneck in modern genomics. The limiting step is no longer only data generation. It is evidence integration.

Ethics is part of the machinery

A field this powerful cannot treat ethics as an afterthought. Genomics changes what can be known about people, families, ancestry, disease risk, and future health. A system with that reach must also be serious about privacy, fairness, communication, misuse, and trust.

The important point is not that ethics sits alongside the science. It is that ethics is part of the infrastructure too. Community engagement, genomic literacy, accessible enrolment, multilingual materials, and wider participation are not decorative additions. They are part of what makes a genomic system usable and legitimate.

What comes next

The Human Genome Project was a beginning. The past two decades reduced costs and expanded capacity. The coming decade will be defined by something harder: building the systems that make genomic information trustworthy, interpretable, and actionable for everyone.

That means better references. Better data platforms. Better tools across ancestries. Better integration of biology, environment, and clinical practice. Better standards for evidence. Better public trust.

Genomics is no longer limited by whether we can read DNA. It is limited by whether we can build the infrastructure worthy of what we have learned.

Sequencing opened the door. Infrastructure will decide who gets through it.