Making Control in High Performance Computing for Overload Avoidance Adaptive in Time and Job Size - CTRL-A : ConTRoL for safe Autonomic computing systems
Conference paper, Year: 2024

Abstract

The feedback control of High-Performance Computing (HPC) systems has been explored as an application area of Control Theory because of the high variability involved in their resource management. A regulation mechanism makes it possible to soundly automate the injection of small flexible jobs into a cluster. A trade-off is needed: filling up the cluster’s computing capacity while avoiding overload of, e.g., the file server. In this work, we describe new results in this context, where the overload avoidance controller is made adaptive to the jobs’ size, which is a time-varying unknown parameter. To do so, the original PI controller is enhanced with an online estimation algorithm that allows the controller to adapt to various working conditions and avoid performance degradation. Parallel and robust estimation algorithms are designed, tackling the challenges of bursting and noise in the system. The adaptive controller is validated and evaluated on a large-scale experimental HPC platform, showing higher robustness than the state of the art in highly varying conditions. Reproducible analyses are available at doi:10.5281/zenodo.11961696.
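To make the idea of a PI controller paired with an online estimator concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a hypothetical first-order load model `y[k+1] = a*y[k] + b*u[k]`, where `u` is the number of injected jobs, `y` is the file-server load, `a` is known, and `b` stands in for the unknown, time-varying job size. The estimator is a plain scalar recursive least squares with a forgetting factor, swapped in here for simplicity; the parallel and robust estimators mentioned in the abstract are not reproduced. All names, gains, and numerical values are assumptions chosen for illustration.

```python
class GainEstimator:
    """Scalar recursive least squares with forgetting (hypothetical helper)."""

    def __init__(self, b0=1.0, p0=100.0, forgetting=0.9):
        self.b_hat = b0              # current estimate of the unknown gain b
        self.p = p0                  # scalar covariance of the estimate
        self.forgetting = forgetting

    def update(self, u, residual):
        """Refine b_hat from one sample of residual = b * u (plus noise)."""
        if abs(u) < 1e-9:            # no excitation: nothing to learn from
            return self.b_hat
        gain = self.p * u / (self.forgetting + u * self.p * u)
        self.b_hat += gain * (residual - self.b_hat * u)
        self.p = (self.p - gain * u * self.p) / self.forgetting
        return self.b_hat


class AdaptivePI:
    """PI controller whose output is rescaled by the estimated gain."""

    def __init__(self, kp, ki, estimator, b_min=1e-3):
        self.kp, self.ki = kp, ki
        self.estimator = estimator
        self.b_min = b_min           # guards against division by ~0
        self.integral = 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error
        b = max(self.estimator.b_hat, self.b_min)
        return (self.kp * error + self.ki * self.integral) / b


def simulate(steps=400, a=0.8, reference=5.0):
    """Run the assumed plant; the job size b jumps halfway through."""
    estimator = GainEstimator()
    controller = AdaptivePI(kp=0.3, ki=0.1, estimator=estimator)
    y = 0.0
    for k in range(steps):
        b_true = 0.5 if k < steps // 2 else 2.0    # assumed job-size change
        u = max(controller.step(reference, y), 0.0)  # cannot inject < 0 jobs
        y_next = a * y + b_true * u
        estimator.update(u, y_next - a * y)         # residual isolates b * u
        y = y_next
    return y, estimator.b_hat


if __name__ == "__main__":
    load, b_hat = simulate()
    print(f"final load: {load:.3f}, estimated gain: {b_hat:.3f}")
```

In this sketch, dividing the PI output by the estimated gain keeps the effective loop gain roughly constant when the job size changes, which is one simple way to read "adaptive to the jobs' size". The forgetting factor trades tracking speed against noise sensitivity.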
Main file
2024 CCTA - Adaptive PI Cigri.pdf (10.99 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04669743 , version 1 (09-08-2024)

Identifiers

  • HAL Id : hal-04669743 , version 1

Cite

Rosa Pagano, Sophie Cerf, Bogdan Robu, Quentin Guilloteau, Raphaël Bleuse, et al. Making Control in High Performance Computing for Overload Avoidance Adaptive in Time and Job Size. CCTA 2024 - 8th IEEE Conference on Control Technology and Applications, Aug 2024, Newcastle Upon Tyne, United Kingdom. pp.1-8. ⟨hal-04669743⟩