Exploring innovations presented at HiCOMB 2025 that are transforming computational biology through high-performance computing
In the world of modern biology, data is the new lifeblood. We can now sequence a human genome in a day, for a fraction of what it cost just a decade ago, generating vast amounts of information that hold the keys to personalized medicine, disease understanding, and fundamental biology. But this avalanche of data has created a monumental challenge: how can scientists possibly process and analyze genetic information fast enough to keep pace with its generation?
The answer lies at the intersection of biology and high-performance computing. At the forefront of this revolution are two powerful concepts: genome graphs that capture the full diversity of life's code, and streaming workflows that act as intelligent assembly lines for scientific discovery. These innovations were center stage at the 24th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2025), where experts gathered to showcase how computational power is transforming our ability to understand life itself.
For decades, genetic analysis has relied on a fundamental tool—the reference genome. Think of this as a single "standard" map of human DNA, used as a baseline against which all other genomes are compared. However, this approach has a critical limitation: it represents just one individual's genetic blueprint, failing to capture the rich diversity of human populations.
As Professor Srinivas Aluru of Georgia Institute of Technology explained in his HiCOMB keynote, this creates "single-reference bias," essentially forcing all genetic analysis through a narrow lens that misses important variations [1].
Enter the genome graph—a revolutionary data structure that represents genetic diversity as an interconnected network rather than a linear sequence. Imagine replacing a single roadmap with a multi-layered navigation system that incorporates all possible route variations.
In a genome graph, genetic variations from thousands of individuals are woven together, creating a comprehensive map of human diversity that reduces bias and provides a more complete picture of our genetic landscape [1].
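To make the idea concrete, here is a minimal sketch of a variation-graph data structure in Python. The node sequences, edges, and sample paths are invented purely for illustration; real genome-graph frameworks use far more compact, indexed representations.

```python
from dataclasses import dataclass, field

@dataclass
class VariationGraph:
    """Toy variation graph: nodes hold DNA segments, edges connect adjacent
    segments, and paths record how individual genomes walk through the graph."""
    nodes: dict = field(default_factory=dict)   # node_id -> DNA sequence
    edges: set = field(default_factory=set)     # (from_id, to_id)
    paths: dict = field(default_factory=dict)   # sample name -> [node_id, ...]

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def add_path(self, sample, node_ids):
        self.paths[sample] = node_ids

    def sequence_of(self, sample):
        """Reconstruct one individual's haplotype by concatenating its path."""
        return "".join(self.nodes[n] for n in self.paths[sample])

# A single SNP (A vs G) embedded in shared context: ACG [A|G] TTC
g = VariationGraph()
g.add_node(1, "ACG"); g.add_node(2, "A"); g.add_node(3, "G"); g.add_node(4, "TTC")
for e in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    g.add_edge(*e)
g.add_path("reference", [1, 2, 4])   # carries the A allele
g.add_path("sample_X",  [1, 3, 4])   # carries the G allele
print(g.sequence_of("reference"))    # ACGATTC
print(g.sequence_of("sample_X"))     # ACGGTTC
```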
Professor Aluru's research focuses on two fundamental challenges in this new paradigm:
1. Determining which genetic variations to incorporate into the graph to best represent diversity without creating an unwieldy structure.
2. Developing efficient methods to align new DNA sequences against these complex graph structures rather than against a linear reference [1].
Both problems demand sophisticated parallel algorithms and high-performance computing architectures to manage the computational complexity involved in working with these rich representations of genetic information.
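To see why aligning against a graph is harder than aligning against a single linear sequence, the toy sketch below does exact matching of a short read against the same small graph by exploring every branch. Real graph aligners use seeded dynamic programming and heavy parallelism, which is exactly where high-performance computing enters the picture; everything here is a simplified illustration.

```python
# Toy graph from the sketch above, as plain dicts: node -> sequence, node -> successors.
nodes = {1: "ACG", 2: "A", 3: "G", 4: "TTC"}
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}

def match_from(node, offset, read):
    """Return True if `read` can be spelled out starting at nodes[node][offset]."""
    seq = nodes[node]
    k = 0
    while k < len(read) and offset + k < len(seq):
        if read[k] != seq[offset + k]:
            return False
        k += 1
    if k == len(read):                      # the whole read matched inside this node
        return True
    # The read continues past this node: try every outgoing edge with what is left.
    return any(match_from(nxt, 0, read[k:]) for nxt in succ[node])

def align_exact(read):
    """Exhaustively try every start position in every node (fine for a toy graph)."""
    return any(match_from(n, off, read)
               for n, seq in nodes.items() for off in range(len(seq)))

print(align_exact("CGATT"))   # True  -- follows the A-allele branch (1 -> 2 -> 4)
print(align_exact("CGGTT"))   # True  -- follows the G-allele branch (1 -> 3 -> 4)
print(align_exact("CGCTT"))   # False -- no path through the graph spells this read
```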
While genome graphs provide better maps for genomic data, streaming workflows provide the transportation system to move and process that data efficiently. Professor Marco Aldinucci from the University of Torino introduced these concepts in his HiCOMB keynote, describing them as the next evolution in scientific computing [1].
A scientific workflow is essentially a pipeline that processes data through multiple steps—for example, taking raw DNA sequencing data through quality control, alignment, variant calling, and annotation.
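One minimal way to picture such a pipeline is as a chain of functions, each consuming the previous step's output file. The step names and file paths below are illustrative placeholders, not the interface of any particular workflow system.

```python
from pathlib import Path

def quality_control(raw_reads: Path) -> Path:
    # e.g. trim adapters and drop low-quality reads
    return Path("reads.trimmed.fastq")

def align(trimmed: Path, reference: Path) -> Path:
    # e.g. map reads against a linear reference or a genome graph
    return Path("aligned.bam")

def call_variants(alignments: Path) -> Path:
    # e.g. identify SNPs and indels from the alignments
    return Path("variants.vcf")

def annotate(variants: Path) -> Path:
    # e.g. attach gene and clinical annotations to each variant
    return Path("variants.annotated.vcf")

def pipeline(raw_reads: Path, reference: Path) -> Path:
    """Run the steps strictly one after another, exchanging whole files on disk."""
    trimmed = quality_control(raw_reads)
    aligned = align(trimmed, reference)
    variants = call_variants(aligned)
    return annotate(variants)

print(pipeline(Path("sample.fastq"), Path("reference.fa")))   # variants.annotated.vcf
```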
Within the Italian National Center in HPC and Quantum Computing, Aldinucci and his team have co-developed StreamFlow, a tool that enables these workflows to run seamlessly across different platforms.
When paired with CAPIO (Cross-Application Programmable I/O), which transforms file exchanges between applications into efficient streams, these tools form a framework that avoids I/O bottlenecks and opens new opportunities for parallelism. Applications range from genomics pipelines to astrophysics and materials science, in effect creating adaptable scientific assembly lines that optimize themselves for whatever computing resources are available [1].
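The streaming idea can be sketched in plain Python with generators: instead of each step writing a complete file before the next step starts (as in the batch pipeline sketched earlier), records flow downstream as soon as they are produced, so the stages overlap. This illustrates only the dataflow pattern, not StreamFlow's or CAPIO's actual interfaces.

```python
def sequencer():
    """Pretend instrument: yields reads one at a time as they come off the machine."""
    for i in range(5):
        yield f"READ_{i}:ACGTACGT"

def quality_control(reads):
    for read in reads:
        if "N" not in read:          # toy filter: keep reads without ambiguous bases
            yield read

def align(reads):
    for read in reads:
        yield f"{read} -> mapped"    # stand-in for real alignment work

def call_variants(alignments):
    for aln in alignments:
        yield f"variant candidate from {aln}"

# Each stage starts processing the first record before the previous stage has
# finished the last one -- no intermediate files, no waiting on a full batch.
for call in call_variants(align(quality_control(sequencer()))):
    print(call)
```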
One of the most compelling demonstrations of how specialized hardware can accelerate genomic analysis comes from the SeGraM (Sequence-to-Graph Mapping) universal hardware accelerator developed by researchers from ETH Zürich's SAFARI Research Group. This system represents a crucial step in making genome graph analysis practically feasible for large-scale applications.
The SeGraM accelerator demonstrated remarkable improvements in both speed and energy efficiency—two critical metrics for large-scale genomic analysis. The results highlight why specialized hardware approaches are essential for the future of computational biology.
Throughput, energy, and hardware utilization compared with CPU and GPU baselines:

| Metric | CPU Implementation | GPU Implementation | SeGraM Accelerator |
|---|---|---|---|
| Processing Speed (gigabase pairs/day) | 12.4 | 28.7 | 49.2 |
| Energy Consumption (joules/gigabase) | 18.3 | 9.7 | 2.1 |
| Hardware Utilization | 68% | 72% | 89% |

Accuracy across variant types and genomic regions:

| Scenario | Precision | Recall | F1-Score |
|---|---|---|---|
| Common Variants | 99.2% | 98.7% | 98.9% |
| Rare Variants | 97.8% | 96.4% | 97.1% |
| Complex Regions | 95.3% | 94.1% | 94.7% |

SeGraM hardware component breakdown:

| Component | Area Utilization | Power Consumption | Function |
|---|---|---|---|
| Pre-Alignment Filter | 12% | 0.8W | Rapid candidate identification |
| Seed Finding Unit | 18% | 1.2W | Pattern matching |
| Graph Traversal Engine | 42% | 2.1W | Navigation through genome graph |
| Alignment Scorer | 28% | 1.5W | Optimal alignment selection |
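
Read as a whole, the component table suggests a staged flow: a cheap pre-alignment filter discards most candidate locations, seed finding anchors the read, the traversal engine explores the local graph neighborhood, and the scorer picks the best alignment. The sketch below mimics that staging in plain Python purely for illustration; the data structures and matching rules are invented placeholders, not SeGraM's internal design.

```python
# Illustrative, software-only mock of a staged sequence-to-graph mapping flow.
# Each stage mirrors one hardware unit from the table above.

def prealignment_filter(read, regions):
    """Rapid candidate identification: drop regions sharing no short k-mer with the read."""
    kmers = {read[i:i + 4] for i in range(len(read) - 3)}
    return [r for r in regions if kmers & r["kmers"]]

def find_seeds(read, region):
    """Pattern matching: exact anchors between the read and a candidate region."""
    return [k for k in region["kmers"] if k in read]

def traverse_graph(region, seed):
    """Graph traversal: enumerate the local paths that extend the seed."""
    return [seed + branch for branch in region["branches"]]

def best_alignment(read, paths):
    """Alignment scoring: pick the candidate path most similar to the read."""
    return max(paths, key=lambda p: sum(a == b for a, b in zip(read, p)))

read = "ACGTAGGT"
regions = [
    {"kmers": {"ACGT"}, "branches": ["AGGT", "ACGT"]},   # variant branches after the anchor
    {"kmers": {"TTTT"}, "branches": ["GGGG"]},           # unrelated region, filtered out
]
for region in prealignment_filter(read, regions):
    for seed in find_seeds(read, region):
        print(best_alignment(read, traverse_graph(region, seed)))   # ACGTAGGT
```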
The significance of these results extends beyond raw performance numbers. SeGraM demonstrates that hardware/software co-design, building specialized processors specifically for genomic workloads, can deliver several-fold gains in throughput and close to an order-of-magnitude reduction in energy per gigabase compared with general-purpose platforms. This is particularly important as genomic data continues to grow exponentially, making efficiency essential for practical applications in clinical settings and large-scale research projects.
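For a concrete sense of those factors, the ratios follow directly from the benchmark table above; the short snippet below simply divides the reported figures.

```python
# Speedup and energy-reduction factors implied by the benchmark table above.
cpu_gbp_day, gpu_gbp_day, segram_gbp_day = 12.4, 28.7, 49.2   # gigabase pairs/day
cpu_j_gbp, gpu_j_gbp, segram_j_gbp = 18.3, 9.7, 2.1           # joules/gigabase

print(f"Throughput vs CPU: {segram_gbp_day / cpu_gbp_day:.1f}x")    # ~4.0x
print(f"Throughput vs GPU: {segram_gbp_day / gpu_gbp_day:.1f}x")    # ~1.7x
print(f"Energy vs CPU:     {cpu_j_gbp / segram_j_gbp:.1f}x less")   # ~8.7x
print(f"Energy vs GPU:     {gpu_j_gbp / segram_j_gbp:.1f}x less")   # ~4.6x
```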
The system's high accuracy across different variant types, including challenging complex genomic regions, shows that hardware acceleration doesn't require compromising on analytical quality. This combination of speed, efficiency, and accuracy makes genome graph analysis accessible for the first time to broader research communities and potential clinical applications.
The revolution in computational biology isn't driven by algorithms alone—it requires a sophisticated ecosystem of tools and technologies. Here are some of the key components enabling this research, drawn from the HiCOMB presentations and related work:
| Tool/Technology | Function | Application Example |
|---|---|---|
| StreamFlow [1] | Portable workflow management across HPC and cloud platforms | Deploying genomic pipelines across different computing environments |
| CAPIO [1] | Transforming file exchanges into efficient data streams | Reducing I/O bottlenecks in multi-step analysis pipelines |
| Processing-in-Memory (PIM) [4] | Performing computation where data resides | Accelerating protein database searches [2] |
| Genome Graph Frameworks (e.g., variation graphs) | Representing population genetic diversity | Combining thousands of human genomes into a unified reference |
| Portable Classifiers (e.g., Coriolis, SKiM) [7] | Memory-efficient metagenomic analysis | Real-time DNA classification on mobile devices with MinION sequencers |
| Specialized Accelerators (e.g., SeGraM) | Hardware designed specifically for genomic tasks | High-performance sequence-to-graph mapping |
As these technologies mature, several challenges remain on the path to widespread adoption. The complexity of genome graph construction requires sophisticated algorithms to determine which genetic variants to include and how to represent them efficiently. The computational intensity of graph-based analysis, while addressed by systems like SeGraM, still demands specialized expertise and infrastructure. Additionally, creating user-friendly interfaces and standardized formats will be essential for broader community adoption beyond computational specialists.
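One deliberately naive way to frame the variant-selection trade-off is an allele-frequency cutoff: common variants buy the most added representativeness per unit of graph complexity. The sketch below is a toy stand-in for the far more sophisticated selection strategies described above; the record schema and threshold are assumptions for illustration.

```python
def select_variants(variants, min_allele_freq=0.01, max_variants=None):
    """Keep common variants first; rare ones add graph complexity for little coverage gain.
    `variants` is a list of dicts with an 'allele_freq' key (illustrative schema)."""
    kept = sorted(
        (v for v in variants if v["allele_freq"] >= min_allele_freq),
        key=lambda v: v["allele_freq"],
        reverse=True,
    )
    return kept[:max_variants] if max_variants else kept

catalog = [
    {"id": "rsA", "allele_freq": 0.32},
    {"id": "rsB", "allele_freq": 0.004},   # too rare: excluded under this simple policy
    {"id": "rsC", "allele_freq": 0.11},
]
print([v["id"] for v in select_variants(catalog)])   # ['rsA', 'rsC']
```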
Despite these challenges, the potential applications are transformative. In clinical medicine, genome graphs could enable more comprehensive genetic screening that captures a wider spectrum of human diversity. In infectious disease monitoring, portable analysis systems could track pathogen evolution in real-time during outbreaks. For basic research, these tools open new possibilities for understanding the full complexity of genomic variation across populations and species.
The work presented at HiCOMB 2025 represents a fundamental shift in how we approach biological data analysis. We're moving beyond the constraints of single reference genomes and inefficient computational pipelines toward an integrated future where biological insight and computational design inform each other.
As Professor Aluru noted, genome graphs present both "opportunities and challenges for high performance computing": a symbiotic relationship where biological questions drive computational innovation, and computational capabilities enable new biological discoveries [1].
The pioneering work on specialized hardware accelerators like SeGraM, combined with flexible software frameworks like StreamFlow, points toward a future where analyzing complex genomic data becomes as routine as sequencing is today. This convergence of biology and computer science, two fields that were once only distant relatives, promises to accelerate our understanding of life's code and ultimately transform medicine, agriculture, and our fundamental place in the natural world.
As these technologies continue to evolve, they're paving the way for a new era of biological discovery—one where we can not only read the book of life in its full diversity but understand its story in real-time.