A readahead prefetching mechanism for GPU I/O
Recent work has demonstrated the performance gains of integrating GPUs with operating systems, enabling GPU threads to issue I/O requests without CPU intervention. However, when GPU threads access files sequentially, they suffer reduced I/O performance compared to CPU threads performing sequential I/O. This degradation stems from the inherent limitations of the CPU-GPU system organization and from the overhead of the infrastructure that allows GPU threads to perform I/O.
In this work, therefore, we propose a new system-level readahead prefetching mechanism for efficient GPU I/O, integrated with GPUfs, a filesystem layer that provides POSIX-like abstractions for GPUs. We perform an in-depth analysis of the CPU-GPU heterogeneous system limitations that hinder the efficiency of sequential GPU I/O accesses, and we explain how our proposed solution tackles them. In addition, we examine the case where the GPU page cache is smaller than the file being accessed. We show that prefetching alone cannot significantly boost performance in this case due to cache thrashing, and we propose a new cache replacement policy to overcome the page cache size limitation.
The proposed prefetcher significantly improves the performance of sequential GPU I/O accesses, achieving a geometric mean speedup of roughly 2.37x over the default case. Finally, we evaluate our mechanism on two real-world applications derived from the RODINIA benchmark suite. Our prefetcher improves execution time by 18.7% and I/O bandwidth by 81.6% (both geometric means) compared to CPU I/O.