Quantum Pangenomics
Rapid advances in sequencing technologies and data analysis methods have raised questions about whether a single reference genome can capture the full genetic diversity of a species. Pangenomics, the joint study of many complete genomes from the same species, offers a more comprehensive approach: it analyses the genetic variation across the whole species rather than measuring it against one reference.
However, pangenomic analysis is computationally intensive because pangenome data are large and highly structured, typically represented as graphs in which shared and variant sequences branch and rejoin. Classical algorithms often rely on heuristics to keep this tractable, which limits their scalability and accuracy as datasets grow.
Quantum computing could transform the field through algorithms designed to navigate such complex data efficiently. If realised, this quantum advantage could enable more accurate and scalable pangenome analysis without relying on heuristics, yielding insights into highly variable regions (e.g., the HLA-DRB1 gene, critical for human immune function) and improving pathogen surveillance (e.g., tracking mutations in the SARS-CoV-2 spike protein).
As part of the Wellcome Leap Q4Bio initiative, our international team is pioneering the application of quantum computing to pangenomics, laying the foundations of a new research field, Quantum Pangenomics, and demonstrating its potential to improve human health on a global scale.
In Phase I of the project, our team reformulated complex problems such as genome assembly and phylogenetic tree construction for a hybrid quantum-classical framework, opening the way to potential quantum speedups on emerging hardware. Our key innovations include scalable quantum data encoding algorithms, which lay the groundwork for storing and manipulating large amounts of genomic data, and faster algorithms for genome assembly and phylogenetic tree inference.
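To make "quantum data encoding" more concrete, here is a minimal sketch of one standard scheme, amplitude encoding, applied to k-mer frequencies. Each of the 4^k possible k-mers indexes one basis state of 2k qubits, and its amplitude is the square root of that k-mer's frequency, so measuring the state samples k-mers in proportion to how often they occur. This is an illustrative assumption, not the QPG team's encoding, and the function name `kmer_amplitude_encoding` is hypothetical. The sketch only builds the target state vector with NumPy; preparing it on hardware would need a separate state-preparation circuit.

```python
import itertools

import numpy as np


def kmer_amplitude_encoding(seq: str, k: int = 2) -> np.ndarray:
    """Hypothetical helper: map a DNA sequence to a normalised state over 2k qubits.

    Basis state i corresponds to the i-th k-mer in lexicographic ACGT order;
    its amplitude is sqrt(count / total), so squared amplitudes are k-mer
    frequencies and the vector has unit norm.
    """
    seq = seq.upper()
    alphabet = "ACGT"
    # Assign every possible k-mer a basis-state index (AA..A -> 0, ..., TT..T -> 4**k - 1).
    index = {"".join(p): i for i, p in enumerate(itertools.product(alphabet, repeat=k))}
    counts = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers containing N or other non-ACGT symbols
            counts[index[kmer]] += 1
    total = counts.sum()
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    # Square roots of frequencies give a valid (unit-norm) amplitude vector.
    return np.sqrt(counts / total)


state = kmer_amplitude_encoding("ACGTACGT", k=2)
n_qubits = int(np.log2(state.size))  # 4 qubits for k = 2
```

For `"ACGTACGT"` with `k=2` the seven overlapping k-mers give a 16-amplitude vector over 4 qubits in which `AC`, `CG` and `GT` each have probability 2/7 and `TA` has 1/7. The appeal of amplitude encoding is its compression (n qubits hold 2^n amplitudes); the catch is that loading arbitrary amplitudes onto real hardware can itself be expensive.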
The role of research computing
Phase II has required us to simulate our core algorithms at scale on high-performance computing (HPC) systems.
Our simulations use machine-learning-oriented encoding schemes and tensor network methods. Alongside this work, we have tested how well our algorithms resolve regions of the genome graph that are intractable for classical methods. As we learn how the algorithms perform at scale, we hope to progress to Phase III: implementation on real quantum hardware. Working with quantum hardware vendors, we will ensure that our implementations account for each platform's architecture and noise characteristics.
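Tensor network methods make large simulations feasible because a state with limited entanglement can be stored as a matrix product state (MPS): a chain of small tensors whose bond dimension χ bounds the entanglement across each cut, needing O(nχ²) numbers instead of 2^n. The sketch below shows this generic technique, not the team's simulation code; the function names `state_to_mps` and `mps_to_state` are hypothetical. It converts a dense state vector into an MPS by successive singular value decompositions, truncating each bond to `max_bond`, and contracts the chain back for checking. Production simulators instead apply gates directly to the MPS and never form the dense vector.

```python
import numpy as np


def state_to_mps(psi: np.ndarray, n_qubits: int, max_bond: int) -> list:
    """Split a 2**n state vector into n tensors of shape (left, 2, right) via SVD."""
    tensors = []
    remainder = psi.reshape(1, -1)  # (left bond, remaining physical indices)
    for _ in range(n_qubits - 1):
        left = remainder.shape[0]
        # Group (left bond, current qubit) as rows, all later qubits as columns.
        mat = remainder.reshape(left * 2, -1)
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        chi = min(max_bond, len(s))  # truncate the bond to at most max_bond
        u, s, vh = u[:, :chi], s[:chi], vh[:chi, :]
        tensors.append(u.reshape(left, 2, chi))
        # Absorb singular values into the rest of the state and continue.
        remainder = np.diag(s) @ vh
    tensors.append(remainder.reshape(remainder.shape[0], 2, 1))
    return tensors


def mps_to_state(tensors: list) -> np.ndarray:
    """Contract the MPS bonds left to right to recover the dense state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)


n = 6
ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)  # (|000000> + |111111>) / sqrt(2)
ghz_mps = state_to_mps(ghz, n, max_bond=2)
```

A six-qubit GHZ state has bond dimension 2 across every cut, so `max_bond=2` reconstructs it exactly. Truncating a generic random state to a small `max_bond` would instead introduce an approximation error, which is the accuracy-versus-cost dial tensor network simulations expose.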
Our RSE and RTP team has played a critical role in this quantum bioinformatics research, from providing a robust platform for quantum simulations to fast-tracking the procurement and installation of dedicated hardware.
Our RSEs have worked directly with the researchers in an R&D role. The algorithms for data encoding, graph tangle resolution, and phylogenetic tree construction have been adapted to use GPU resources and optimised for HPC. The RSEs have also stayed actively involved in developing the algorithms, from the drawing board to execution.
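One formulation that appears in the quantum genome-assembly literature casts finding a route through a tangle of contigs as a Hamiltonian-path problem written as a QUBO (quadratic unconstrained binary optimisation), the input form used by quantum annealers and QAOA. We do not claim this is the QPG team's tangle-resolution method; the functions `hamiltonian_path_qubo`, `brute_force_min` and `decode_path`, and the penalty weighting, are hypothetical. Binary variable x[v, p] is 1 when node v sits at position p of the walk; penalty terms enforce that every node and every position is used exactly once, and each consecutive pair of positions joined by a missing edge costs 1. Exhaustive search stands in for the quantum solver and only works for a handful of nodes.

```python
import itertools

import numpy as np


def hamiltonian_path_qubo(n_nodes: int, edges, penalty: float = 2.0):
    """Return (Q, offset) so that energy(x) = x @ Q @ x + offset for binary x.

    Energy is 0 exactly when x encodes a directed path visiting every node
    once using only the given edges; any violation gives positive energy.
    """
    n = n_nodes
    idx = lambda v, p: v * n + p  # flatten x[v, p] to a single index
    Q = np.zeros((n * n, n * n))
    offset = 0.0
    # One-hot groups: each node appears once, each position holds one node.
    groups = [[idx(v, p) for p in range(n)] for v in range(n)]
    groups += [[idx(v, p) for v in range(n)] for p in range(n)]
    for g in groups:
        # (1 - sum x)^2 expands to 1 - sum x + 2 * sum_{i<j} x_i x_j, since x_i**2 == x_i.
        offset += penalty
        for i in g:
            Q[i, i] -= penalty
        for i, j in itertools.combinations(g, 2):
            Q[i, j] += penalty  # symmetric halves sum to 2 * penalty
            Q[j, i] += penalty
    edge_set = set(edges)
    # Cost 1 for stepping from u to v between consecutive positions without an edge.
    for p in range(n - 1):
        for u, v in itertools.permutations(range(n), 2):
            if (u, v) not in edge_set:
                i, j = idx(u, p), idx(v, p + 1)
                Q[i, j] += 0.5
                Q[j, i] += 0.5
    return Q, offset


def brute_force_min(Q: np.ndarray, offset: float):
    """Exhaustively find the lowest-energy assignment (stand-in for a quantum solver)."""
    best_x, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        x = np.array(bits)
        e = x @ Q @ x + offset
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


def decode_path(x: np.ndarray, n: int) -> list:
    """Read off which node occupies each position of the walk."""
    return [v for p in range(n) for v in range(n) if x[v * n + p] == 1]


# Three contigs where only 0 -> 1 and 1 -> 2 overlap.
Q, offset = hamiltonian_path_qubo(3, [(0, 1), (1, 2)])
x_best, e_best = brute_force_min(Q, offset)
```

Here the unique zero-energy assignment decodes to the path `[0, 1, 2]`. Because both the one-hot penalties and the edge costs are non-negative, a zero-energy minimum exists if and only if the graph has a Hamiltonian path, which is what makes the QUBO a faithful encoding; the exhaustive search scales as 2^(n²) and is exactly the part a quantum optimiser would replace.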
Commitment to open science: a public web pipeline for quantum bioinformatics
We have developed a public web interface to our quantum bioinformatics simulation tools, built on the work of the QPG collaboration.
This pipeline lets external users submit their own data and run simulations of our core quantum algorithms on Sanger HPC.