Towards accurate network utilization forecasting using portable MPI-level monitoring

Research topic and goals

The goal of this project is to study how careful monitoring of MPI communications can help forecast an application's network usage, so that checkpoint writes can be scheduled to avoid congestion on the network. This work will be based on the low-level monitoring interface that has been implemented by Inria and UTK in Open MPI (Bosilca et al. 2017). We want to monitor an application's communications with this feature and, using time-series analysis and other techniques, predict the application's future usage of the network. With such predictions we will schedule the I/O accesses of VeloC (“Very Low Overhead transparent multilevel Checkpoint/restart”) to avoid interference between checkpoint writes to the storage system and the application's own use of the network.

Contributions:

  • A transparent application monitoring system within VeloC.
  • A tool that predicts the application's network usage.
  • Strategies to avoid network interference between the application and VeloC.
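
To make the forecast-then-schedule idea concrete, the following is a minimal sketch and not the project's actual method. It assumes that the monitoring layer yields one traffic sample (for example, bytes sent) per fixed time interval, that the application's communication pattern repeats with a known period, and that a checkpoint flush occupies a fixed number of intervals. It uses a seasonal-naive forecast, one of the simplest time-series techniques, as a stand-in for whatever predictor is eventually chosen. The function names `seasonal_naive_forecast` and `quietest_window` and all sample values are hypothetical; no Open MPI or VeloC API is called.

```python
from typing import List, Sequence


def seasonal_naive_forecast(
    samples: Sequence[float], period: int, horizon: int
) -> List[float]:
    """Predict the next `horizon` intervals by repeating the last full period.

    Assumption: the application is iterative, so its traffic pattern
    repeats every `period` intervals.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    if len(samples) < period:
        raise ValueError("need at least one full period of samples")
    last_cycle = list(samples[-period:])
    return [last_cycle[h % period] for h in range(horizon)]


def quietest_window(forecast: Sequence[float], width: int) -> int:
    """Return the start offset of the `width`-interval window with the
    lowest total predicted traffic (earliest window wins ties)."""
    if not 0 < width <= len(forecast):
        raise ValueError("width must be between 1 and len(forecast)")
    window_sum = sum(forecast[:width])
    best_start, best_sum = 0, window_sum
    # Slide the window one interval at a time, updating the sum incrementally.
    for start in range(1, len(forecast) - width + 1):
        window_sum += forecast[start + width - 1] - forecast[start - 1]
        if window_sum < best_sum:
            best_start, best_sum = start, window_sum
    return best_start


if __name__ == "__main__":
    # Hypothetical per-interval traffic: two iterations of a 4-interval cycle
    # (communication-heavy phase followed by a compute-heavy phase).
    observed = [900, 850, 100, 50, 920, 870, 120, 40]
    predicted = seasonal_naive_forecast(observed, period=4, horizon=8)
    start = quietest_window(predicted, width=2)
    print(f"schedule checkpoint flush {start} intervals from now")
```

In this sketch the checkpoint flush would be deferred to the predicted compute-heavy phase. A real predictor would also have to estimate the period from the monitored data and cope with irregular or drifting patterns.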

Results for 2018/2019

None yet.

Visits and meetings

Impact and publications

Future plans

References

    1. Bosilca, George, Clément Foyer, Emmanuel Jeannot, Guillaume Mercier, and Guillaume Papauré. 2017. “Online Dynamic Monitoring of MPI Communications.” Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, vol. 10417, pp. 49-62. Springer, Cham.
      @inproceedings{Bosilca17online,
        title = {{Online Dynamic Monitoring of MPI Communications}},
        author = {Bosilca, George and Foyer, Clément and Jeannot, Emmanuel and Mercier, Guillaume and Papauré, Guillaume},
        booktitle = {Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing},
        publisher = {Springer, Cham},
        volume = {10417},
        pages = {49--62},
        year = {2017}
      }