Lynx: A Learning Linux Prefetching Mechanism For SSD Performance Model
Abstract
Traditional Linux prefetching algorithms were designed around the spatial locality of I/O workloads and the performance model of hard disk drives (HDDs). On the application side, the I/O workloads of current data-intensive applications are shifting towards more random access patterns, while on the storage side, flash-based storage devices exhibit a different performance model from HDDs. In this work, we present a new prefetching mechanism named Lynx. Lynx aims to adapt and/or complement the Linux read-ahead prefetching system to suit both the SSD performance model and the needs of new applications. Lynx uses a simple machine learning system based on Markov chains. The learning phase detects I/O workload patterns and computes the transition probabilities between file pages. The prediction phase prefetches the predicted file pages using the resulting Markov state machine. We have implemented our solution and integrated it into the Linux kernel. We evaluated our solution using the TPC-H benchmark. The results show that Lynx divides the number of page cache misses (major page faults) by 2 on average and thus reduces the execution time of TPC-H queries by 50% compared to traditional Linux read-ahead.
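The two phases described above can be illustrated with a minimal user-space sketch. This is not the Lynx kernel implementation: it assumes a first-order Markov chain over page numbers, and the class name `MarkovPrefetcher`, the `threshold` parameter, and the sample trace are hypothetical choices made only for illustration.

```python
from collections import defaultdict


class MarkovPrefetcher:
    """First-order Markov model over file page numbers (illustrative only)."""

    def __init__(self, threshold=0.5):
        # counts[a][b] = number of times an access to page b followed page a
        self.counts = defaultdict(lambda: defaultdict(int))
        # Hypothetical cut-off: minimum transition probability to prefetch
        self.threshold = threshold
        self.last_page = None

    def learn(self, page):
        """Learning phase: record the transition last_page -> page."""
        if self.last_page is not None:
            self.counts[self.last_page][page] += 1
        self.last_page = page

    def transition_probabilities(self, page):
        """Return {next_page: probability} estimated from observed counts."""
        successors = self.counts.get(page, {})
        total = sum(successors.values())
        if total == 0:
            return {}
        return {nxt: n / total for nxt, n in successors.items()}

    def predict(self, page):
        """Prediction phase: pages likely enough to be worth prefetching."""
        probs = self.transition_probabilities(page)
        return sorted(p for p, prob in probs.items() if prob >= self.threshold)


# Hypothetical access trace with a non-sequential but recurring pattern.
trace = [7, 42, 3, 7, 42, 19, 7, 42, 3]
model = MarkovPrefetcher(threshold=0.5)
for page in trace:
    model.learn(page)

print(model.predict(7))   # candidates to prefetch after an access to page 7
print(model.predict(42))  # candidates to prefetch after an access to page 42
```

In this trace, page 42 always follows page 7, so a sequential read-ahead window would not help, whereas the learned transitions identify page 42 as the prefetch candidate. The threshold trades prefetch accuracy against coverage; how Lynx actually selects pages is not specified in this abstract.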