Leverage Blue Waters I/O Traces to Size and Allocate Future Storage Systems

Research topic and goals

In recent years, storage tiers have multiplied within supercomputers to address the growing gap between computing power and I/O bandwidth. While the prevailing paradigm used to be a single shared global file system (Lustre, IBM Storage Scale, …), storage is now provided in a disaggregated form: a centralized storage system is still present, but it is complemented by several intermediate layers such as node-local disks, burst buffers, or network-attached storage (NVMe-oF, CXL), to name a few.

This profusion of technologies and architectures makes these resources complex to use and their sizing at machine design time risky. One approach to partially addressing these problems is to model supercomputers equipped with several levels of storage and to simulate, on top of this model, the scheduling of an execution history of large-scale, I/O-intensive applications. This type of simulation allows us to observe how storage tiers behave under real-world workloads. Recently, following the decommissioning of Blue Waters, several years of execution traces (including I/O traces) of applications that ran on the machine have been made public. This wealth of information is invaluable for feeding simulations and studying the architecture of modern storage systems.
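
To make the approach concrete, the minimal Python sketch below (not part of StorAlloc or Fives; the class names, placement policy, and numbers are illustrative assumptions) replays a tiny job history against a two-tier storage model with fixed capacities and bandwidths, deriving a rough per-job I/O time while ignoring contention.

    # Sketch: replaying a job history against a tiered storage model.
    # All names and numbers are illustrative assumptions, not Fives/StorAlloc code.
    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str
        capacity_gb: float
        bandwidth_gbps: float  # aggregate read/write bandwidth (GB/s)

    @dataclass
    class Job:
        job_id: str
        io_volume_gb: float    # total bytes moved, e.g. derived from Darshan

    def place(job: Job, tiers: list[Tier]) -> Tier:
        """Naive placement policy: first tier with enough remaining capacity."""
        for tier in tiers:
            if job.io_volume_gb <= tier.capacity_gb:
                tier.capacity_gb -= job.io_volume_gb
                return tier
        return tiers[-1]  # fall back to the last (largest) tier

    def replay(jobs: list[Job], tiers: list[Tier]) -> dict[str, float]:
        """Return an estimated I/O time (seconds) per job, ignoring contention."""
        return {j.job_id: j.io_volume_gb / place(j, tiers).bandwidth_gbps for j in jobs}

    if __name__ == "__main__":
        tiers = [Tier("burst-buffer", 500, 100.0), Tier("lustre", 50_000, 10.0)]
        jobs = [Job("job-1", 120.0), Job("job-2", 900.0)]
        print(replay(jobs, tiers))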

In this JLESC project, we propose to analyze Darshan traces and Lustre metrics from several years of Blue Waters production to feed StorAlloc, a simulator of a storage-aware job scheduler developed within the KerData team at Inria. The goal of this work is twofold: to provide a post-mortem study of the sizing of Blue Waters’ storage system and to explore the design of future, highly storage-disaggregated HPC systems.
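
As an illustration of the kind of trace analysis involved, the sketch below reads a single Darshan log with the pydarshan package and sums POSIX I/O volumes. The log path is a placeholder, a recent pydarshan release is assumed, and logs produced by older Darshan versions on Blue Waters may first require conversion to a supported format.

    # Sketch: extracting aggregate I/O volumes from one Darshan log with pydarshan.
    # "example.darshan" is a placeholder; counters assume a POSIX-instrumented run.
    import darshan

    report = darshan.DarshanReport("example.darshan", read_all=True)
    print(report.metadata["job"])             # job id, start/end time, nprocs, ...

    posix = report.records["POSIX"].to_df()   # dict of 'counters'/'fcounters' DataFrames
    counters = posix["counters"]
    bytes_read = counters["POSIX_BYTES_READ"].sum()
    bytes_written = counters["POSIX_BYTES_WRITTEN"].sum()
    print(f"read: {bytes_read} B, written: {bytes_written} B")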

Results for 2023/2024

We introduce Fives, a storage system simulator based on WRENCH and SimGrid, two established simulation frameworks in the field. Fives, currently under development, is capable of reproducing the behavior of a Lustre file system. Using Darshan execution traces to both calibrate and validate the simulator, Fives extracts a number of metrics and correlation indices that demonstrate a reasonable level of accuracy between real and simulated I/O times. The traces used so far come from machines for which only aggregated Darshan traces are publicly available. We are now working on feeding our simulator with Blue Waters traces.
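
For reference, the kind of agreement metrics mentioned above can be computed along the following lines. This is a generic pandas/SciPy sketch rather than the actual Fives evaluation code; the input file and column names are hypothetical.

    # Sketch: comparing real vs. simulated I/O times for a set of jobs.
    # The CSV file and its columns (job_id, real_s, simulated_s) are assumed.
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("io_times.csv")
    pearson_r, _ = stats.pearsonr(df["real_s"], df["simulated_s"])
    spearman_r, _ = stats.spearmanr(df["real_s"], df["simulated_s"])
    rel_err = ((df["simulated_s"] - df["real_s"]).abs() / df["real_s"]).median()

    print(f"Pearson r: {pearson_r:.3f}")
    print(f"Spearman rho: {spearman_r:.3f}")
    print(f"Median relative error: {rel_err:.2%}")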

A paper presenting our first results is in preparation and will be submitted in the first half of 2024.

As a second step, we will explore the new calibration opportunities offered by combining Blue Waters’ Darshan, Torque (resource manager), and Lustre traces. In particular, we expect this additional data to allow finer calibration and validation of the Lustre model inside our simulator.
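
A possible starting point for this cross-referencing is sketched below: per-job Darshan summaries are joined with Torque accounting records on the job identifier to single out I/O-dominated jobs. File names and column names are hypothetical and do not reflect the actual Blue Waters datasets.

    # Sketch: cross-referencing per-job Darshan summaries with Torque accounting logs.
    # File names and column names are hypothetical.
    import pandas as pd

    darshan_jobs = pd.read_csv("darshan_summary.csv")   # e.g. job_id, bytes_read, bytes_written, io_time_s
    torque_jobs = pd.read_csv("torque_accounting.csv")  # e.g. job_id, nodes, walltime_s, queue

    merged = darshan_jobs.merge(torque_jobs, on="job_id", how="inner")
    merged["io_fraction"] = merged["io_time_s"] / merged["walltime_s"]

    # Jobs whose runtime is dominated by I/O are prime candidates for calibrating the Lustre model.
    print(merged.sort_values("io_fraction", ascending=False).head(10))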

References

  1. Monniot, Julien, François Tessier, Matthieu Robert, and Gabriel Antoniu. 2022. “StorAlloc: A Simulator for Job Scheduling on Heterogeneous Storage Resources.” In HeteroPar 2022. Glasgow, United Kingdom. https://inria.hal.science/hal-03683568.