Seminar: Graduate Seminar
MetaCache-FPGA: Hardware Acceleration for Genome Read Classification Based on Minhashing
The rapid decline in DNA sequencing costs has fueled massive growth in genomic data and intensified the need for fast, scalable metagenomic read classification tools. Traditional CPU-based solutions, such as Kraken2 and Centrifuge, struggle with the computational demands and irregular memory access patterns of building and querying large k-mer-based indices. MetaCache-FPGA addresses this by using FPGA acceleration to implement a pipelined, memory-efficient hash table with on-chip minhash fingerprinting and highly parallel insertion logic. Inspired by GPU-style dataflow but optimized for FPGA primitives, the architecture exploits the orthogonality of read data streams to maximize resource utilization, pipeline efficiency, and operating frequency. On modern FPGA platforms such as the Xilinx Alveo U280, MetaCache-FPGA can build reference indices of hundreds of gigabases in under a minute, far outperforming CPU solutions that require tens of minutes to hours. On AWS F1 (VU9P) hardware, we show that it achieves a 64× average query speedup over the multithreaded MetaCache CPU version and a 50× speedup over MetaCache-GPU running on a single NVIDIA A100 GPU. These advances enable interactive-scale indexing and classification, paving the way for real-time, dynamic reference database composition in clinical and environmental metagenomics.
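For readers unfamiliar with minhashing, the following is a minimal software sketch of the general idea behind minhash fingerprinting: hash every k-mer in a window of sequence and keep the s smallest hash values as the window's fingerprint. It is not the MetaCache-FPGA implementation. The function names, the choice of BLAKE2b as the hash, and the values of k and s are illustrative assumptions, and canonical (reverse-complement-aware) k-mers are omitted for brevity.

```python
import hashlib
import heapq


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(window, k=4, s=3):
    """Return the s smallest distinct k-mer hashes of a window, ascending.

    k and s are toy values chosen for readability, not tool defaults.
    """
    hashes = {kmer_hash(window[i:i + k]) for i in range(len(window) - k + 1)}
    return heapq.nsmallest(s, hashes)


def shared_features(sketch_a, sketch_b):
    """Count hash values present in both sketches."""
    return len(set(sketch_a) & set(sketch_b))


reference = minhash_sketch("ACGTACGTTGCA")
read = minhash_sketch("ACGTACGTTGCA")
print(shared_features(reference, read))
```

In an index built this way, each hash value in a reference sketch would serve as a hash table key pointing back to the windows it came from, so a read is classified by looking up its own sketch values and counting hits per reference.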
M.Sc. student under the supervision of Prof. Ran Ginosar and Dr. Leonid Yavits.