Energy Efficiency and Load Balancing

Research topic and goals

The power consumption of High Performance Computing (HPC) systems is an increasing concern as large-scale systems grow in size and, consequently, consume more energy. In response to this challenge, we propose new energy-aware load balancers that aim to reduce the energy consumption of parallel platforms running imbalanced scientific applications without degrading their performance. Our research explores dynamic load balancing, low-power manycore platforms, and DVFS techniques in order to reduce power consumption.

Results for 2015/2016

Load Balancing

In this work we improve the performance and scalability of parallel seismic wave models through dynamic load balancing. These models suffer from load imbalance for two reasons. First, they add a specific numerical condition at the borders of the domain in order to absorb the outgoing energy. The decomposition of the domain into a grid of subdomains, which are distributed among tasks, creates load differences between the tasks that simulate the borders and those responsible for the central subdomains. Second, the propagation of waves in the simulated area changes the workload of the subdomains from one time-step to the next, causing dynamic load imbalance. In order to evaluate the use of dynamic load balancing, we ported a seismic wave simulator to Adaptive MPI to benefit from its load balancing framework. Our experimental results show that dynamic load balancers can adapt to load variations during the application’s execution and improve performance by 36%. This work was presented at the PDP 2014 conference (Tesser et al. 2014). An extended version will be published in the International Journal of High Performance Computing Applications (Tesser et al. 2014). Laercio Pilla described most of the load balancers in his PhD thesis (Pilla 2014).
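To make the idea of rebalancing concrete, the following is a minimal sketch of a greedy load balancer in Python. It is not the balancer used in the work above or the Adaptive MPI API; the function names, the task names, and the load values are all hypothetical and chosen only to mimic heavier border subdomains and lighter central ones.

```python
import heapq


def greedy_balance(task_loads, num_cores):
    """Assign tasks to cores, heaviest first, always to the least-loaded core.

    task_loads: dict mapping a (hypothetical) task name to its measured load.
    Returns a dict mapping each task name to a core index.
    """
    # Min-heap of (accumulated load, core id): the least-loaded core is on top.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(task_loads.items(), key=lambda kv: kv[1], reverse=True)
    for task, load in ordered:
        core_load, core = heapq.heappop(heap)
        assignment[task] = core
        heapq.heappush(heap, (core_load + load, core))
    return assignment


def core_loads(task_loads, assignment, num_cores):
    """Sum the load placed on each core by a given assignment."""
    loads = [0.0] * num_cores
    for task, core in assignment.items():
        loads[core] += task_loads[task]
    return loads


# Hypothetical loads: border subdomains cost more than central ones.
loads = {"border0": 3.0, "border1": 3.0,
         "center0": 1.0, "center1": 1.0, "center2": 1.0, "center3": 1.0}
mapping = greedy_balance(loads, 2)
per_core = core_loads(loads, mapping, 2)
print(per_core)
```

A dynamic load balancer would repeat a step of this kind periodically, using loads measured during recent time-steps, so that the mapping follows the workload as the waves propagate.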

Power consumption

Power consumption is one of the main challenges to achieving Exascale performance. Current research trends aim at overcoming power consumption constraints using low-power processors. Although new processors feature sensors that enable precise power measurements, they provide different interfaces to collect data, making it difficult to correlate performance with energy consumption. To overcome this issue, we developed a platform-independent tool that collects power and energy data from homogeneous and heterogeneous systems. Using this tool, we provide a detailed comparison between a low-power processor (ARM big.LITTLE) and a high performance processor (Intel Sandy Bridge-EP) using all applications from the NAS parallel benchmarks and a real-world soil irrigation simulator. The results show that the average power demand of Intel Sandy Bridge-EP is between 12.6X and 152.4X higher than that of ARM big.LITTLE, whereas its average energy consumption is between 1.6X and 7.1X higher. Overall, ARM big.LITTLE presented a better performance/energy trade-off when it takes less than 9.2X the execution time of Intel Sandy Bridge-EP to solve the same problem. This work was published in (Padoin et al. 2015) and (Francesquini et al. 2015).

Large-scale simulation of seismic wave propagation is an active research topic. Its high demand for processing power makes it a good match for High Performance Computing (HPC). Although we have observed a steady increase in the processing capabilities of HPC platforms, their energy efficiency is still lagging behind. In this work, we analyze the use of a low-power manycore processor, the MPPA-256, for seismic wave propagation simulations. First, we look at its peculiar characteristics, such as the limited amount of on-chip memory, and describe the intricate solution we devised to deal with this processor’s idiosyncrasies. Next, we compare the performance and energy efficiency of seismic wave propagation on MPPA-256 to other commonplace platforms such as general-purpose processors and a GPU. Finally, we conclude that, even if MPPA-256 presents an increased software development complexity, it can indeed be used as an energy efficient alternative to current HPC platforms, resulting in up to 71% and 81% less energy than a GPU and a general-purpose processor, respectively. This work was presented at the SBAC-PAD conference in Paris (Castro et al. 2014).
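The energy comparisons in this section rest on the relation energy = average power × execution time: a low-power platform saves energy only as long as its slowdown stays below its power advantage. The sketch below illustrates that arithmetic in Python. The function names and all numeric values are hypothetical; they are not measurements from the cited papers, and the sketch does not reproduce the 9.2X threshold reported above, which refers to the performance/energy trade-off studied there.

```python
def energy_joules(avg_power_watts, exec_time_seconds):
    """Energy consumed when running at a given average power for a given time."""
    return avg_power_watts * exec_time_seconds


def break_even_slowdown(power_fast_watts, power_slow_watts):
    """Largest slowdown at which the low-power platform still uses less energy.

    Follows from P_slow * (s * T) < P_fast * T, i.e. s < P_fast / P_slow.
    """
    return power_fast_watts / power_slow_watts


# Hypothetical platforms (illustrative values only).
fast_power, fast_time = 100.0, 10.0   # watts, seconds
slow_power = 5.0                      # watts
slowdown = 8.0                        # low-power platform is 8x slower

fast_energy = energy_joules(fast_power, fast_time)
slow_energy = energy_joules(slow_power, fast_time * slowdown)
limit = break_even_slowdown(fast_power, slow_power)

print(fast_energy, slow_energy, limit)
```

With these illustrative values the low-power platform consumes less energy despite being 8 times slower, because its power demand is 20 times lower.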

Load Balancing and Power Saving

In this work, we focus on reducing the energy consumption of imbalanced applications through a combination of load balancing and Dynamic Voltage and Frequency Scaling (DVFS). Our strategy employs an Energy Daemon Tool to gather power information and a load balancing module that benefits from the load balancing framework available in the CHARM++ runtime system. We propose two variants of our energy-aware load balancer (ENERGYLB) to save energy on imbalanced workloads without considerably impacting the overall system performance. The first one, called Fine-Grained EnergyLB (FG-ENERGYLB), is suitable for platforms composed of a few tens of cores that allow per-core DVFS. The second one, called Coarse-Grained EnergyLB (CG-ENERGYLB), is suitable for current HPC platforms composed of several multi-core processors that feature per-chip DVFS. This work was presented at the HiPC conference (Padoin et al. 2014).
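The general idea of exploiting residual imbalance with per-core DVFS can be sketched as follows: cores with less work than the most loaded core can run at a lower frequency and still finish on time. The Python sketch below is not the ENERGYLB algorithm and does not use the CHARM++ API. It assumes a simplified model in which execution time is load divided by frequency, and the function name, loads, and frequency levels are hypothetical.

```python
def pick_frequencies(core_loads, available_freqs):
    """Pick, per core, the lowest frequency that does not delay the iteration.

    Simplifying assumption: time on a core = load / frequency.
    The most loaded core runs at the maximum frequency and sets the deadline.
    """
    f_max = max(available_freqs)
    critical_time = max(core_loads) / f_max
    chosen = []
    for load in core_loads:
        # Frequencies in ascending order that still meet the deadline;
        # f_max always qualifies, so the list is never empty.
        feasible = [f for f in sorted(available_freqs)
                    if load / f <= critical_time]
        chosen.append(feasible[0])
    return chosen


# Hypothetical per-core loads left after load balancing, and frequency levels.
residual_loads = [10.0, 5.0, 8.0]
freq_levels_ghz = [1.0, 1.5, 2.0]
freqs = pick_frequencies(residual_loads, freq_levels_ghz)
print(freqs)
```

Here only the second core can be slowed down: the third core would miss the deadline at 1.5 GHz, so it stays at the maximum frequency. A per-chip variant would apply the same reasoning to the most loaded core of each chip instead of to each core individually.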

Visits and meetings

Edson Padoin, November 2014, JLESC Workshop, Chicago
Jean-Francois Mehaut, November 2014, JLESC Workshop, Chicago
Brice Videau, June 2015, JLESC Workshop, Barcelona
Jean-Francois Mehaut, June 2015, JLESC Workshop, Barcelona

Impact and publications

Tesser, Rafael Keller, Laercio Lima Pilla, Fabrice Dupros, Philippe Olivier Alexandre Navaux, Jean-Francois Mehaut, and Celso L. Mendes. 2014. “Improving the Performance of Seismic Wave Simulations with Dynamic Load Balancing.” In 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2014, Torino, Italy, February 12-14, 2014, 196–203. https://doi.org/10.1109/PDP.2014.37.

@inproceedings{KellerTesserEtAl2014,
  author = {Tesser, Rafael Keller and Pilla, Laercio Lima and Dupros, Fabrice and Navaux, Philippe Olivier Alexandre and Mehaut, Jean-Francois and Mendes, Celso L.},
  bibsource = {dblp computer science bibliography, http://dblp.org},
  biburl = {http://dblp.uni-trier.de/rec/bib/conf/pdp/TesserPDNMM14},
  booktitle = {22nd Euromicro International Conference on Parallel, Distributed,
     and Network-Based Processing, {PDP} 2014, Torino, Italy, February 12-14, 2014},
  doi = {10.1109/PDP.2014.37},
  pages = {196--203},
  timestamp = {Tue, 03 Feb 2015 17:12:45 +0100},
  title = {Improving the Performance of Seismic Wave Simulations with Dynamic
     Load Balancing},
  url = {http://dx.doi.org/10.1109/PDP.2014.37},
  year = {2014}
}

———. 2014. “Dynamic Load Balancing for Seismic Wave Propagation Models.” International Journal of High Performance Computing Applications (Accepted).

@article{KellerTesserEtAl2014a,
  title = {Dynamic load balancing for seismic wave propagation models},
  journal = {International Journal of High Performance Computing Applications (accepted)},
  author = {Tesser, Rafael Keller and Pilla, Laercio Lima and Dupros, Fabrice and Navaux, Philippe Olivier Alexandre and Mehaut, Jean-Francois and Mendes, Celso L.},
  year = {2014},
  note = {accepted}
}

Future plans

Using simulations (SimGrid, BigSim, Dimemas) for the design and analysis of load balancers (Rafael Tesser, Philippe Navaux, Arnaud Legrand, Celso Mendes)
Load Balancing and heterogeneous platforms/processors (Victor Martinez, Fabrice Dupros/BRGM, Philippe Navaux, Jean-Francois Mehaut)

References

Francesquini, Emilio, Marcio Castro, Pedro H. Penna, Fabrice Dupros, Henrique C. Freitas, Philippe O.A. Navaux, and Jean-Francois Mehaut. 2015. “On the Energy Efficiency and Performance of Irregular Application Executions on Multicore, NUMA and Manycore Platforms.” J. Parallel Distrib. Comput. 76 (C): 32–48. https://doi.org/10.1016/j.jpdc.2014.11.002.

@article{FrancesquiniEtAl2015,
  acmid = {2780859},
  address = {Orlando, FL, USA},
  author = {Francesquini, Emilio and Castro, Marcio and Penna, Pedro H. and Dupros, Fabrice and Freitas, Henrique C. and Navaux, Philippe O.A. and Mehaut, Jean-Francois},
  doi = {10.1016/j.jpdc.2014.11.002},
  issn = {0743-7315},
  issue_date = {February 2015},
  journal = {J. Parallel Distrib. Comput.},
  keywords = {Energy efficiency, K-Means, Manycore, Multicore, NUMA, Performance, 
     Seismic wave propagation, TSP},
  month = feb,
  number = {C},
  pages = {32--48},
  publisher = {Academic Press, Inc.},
  title = {On the Energy Efficiency and Performance of Irregular Application Executions on Multicore, 
     NUMA and Manycore Platforms},
  url = {http://dx.doi.org/10.1016/j.jpdc.2014.11.002},
  volume = {76},
  year = {2015}
}

Padoin, Edson L., Laercio Lima Pilla, Marcio Bastos Castro, Francieli Zanon Boito, Philippe Olivier Alexandre Navaux, and Jean-Francois Mehaut. 2015. “Performance/Energy Trade-off in Scientific Computing: the Case Of ARM Big.LITTLE and Intel Sandy Bridge.” IET Computers and Digital Techniques 9 (1): 27–35. https://doi.org/10.1049/iet-cdt.2014.0074.

@article{PadoinEtAl2015,
  author = {Padoin, Edson L. and Pilla, Laercio Lima and Castro, Marcio Bastos and Boito, Francieli Zanon and Navaux, Philippe Olivier Alexandre and Mehaut, Jean-Francois},
  bibsource = {dblp computer science bibliography, http://dblp.org},
  biburl = {http://dblp.uni-trier.de/rec/bib/journals/iet-cdt/PadoinPCBNM15},
  doi = {10.1049/iet-cdt.2014.0074},
  journal = {IET Computers and Digital Techniques},
  number = {1},
  pages = {27--35},
  timestamp = {Fri, 10 Apr 2015 01:00:00 +0200},
  title = {Performance/energy trade-off in scientific computing: the case of
     ARM big.LITTLE and Intel Sandy Bridge},
  url = {http://dx.doi.org/10.1049/iet-cdt.2014.0074},
  volume = {9},
  year = {2015}
}

Castro, Marcio Bastos, Fabrice Dupros, Emilio Francesquini, Jean-Francois Mehaut, and Philippe Olivier Alexandre Navaux. 2014. “Energy Efficient Seismic Wave Propagation Simulation on a Low-Power Manycore Processor.” In 26th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2014, Paris, France, October 22-24, 2014, 57–64. https://doi.org/10.1109/SBAC-PAD.2014.28.

@inproceedings{CastroEtAl2014,
  author = {Castro, Marcio Bastos and Dupros, Fabrice and Francesquini, Emilio and Mehaut, Jean-Francois and Navaux, Philippe Olivier Alexandre},
  biburl = {http://dblp.uni-trier.de/rec/bib/conf/sbac-pad/CastroDFMN14},
  bibsource = {dblp computer science bibliography, http://dblp.org},
  booktitle = {26th {IEEE} International Symposium on Computer Architecture and High
     Performance Computing, {SBAC-PAD} 2014, Paris, France, October 22-24, 2014},
  doi = {10.1109/SBAC-PAD.2014.28},
  pages = {57--64},
  timestamp = {Wed, 22 Apr 2015 16:41:58 +0200},
  title = {Energy Efficient Seismic Wave Propagation Simulation on a Low-Power
                 Manycore Processor},
  url = {http://dx.doi.org/10.1109/SBAC-PAD.2014.28},
  year = {2014}
}

Padoin, Edson L., Marcio Bastos Castro, Laercio Lima Pilla, Philippe Olivier Alexandre Navaux, and Jean-Francois Mehaut. 2014. “Saving Energy by Exploiting Residual Imbalances on Iterative Applications.” In 21st International Conference on High Performance Computing, HiPC 2014, Goa, India, December 17-20, 2014, 1–10. https://doi.org/10.1109/HiPC.2014.7116895.

@inproceedings{PadoinEtAl2014,
  author = {Padoin, Edson L. and Castro, Marcio Bastos and Pilla, Laercio Lima and Navaux, Philippe Olivier Alexandre and Mehaut, Jean-Francois},
  bibsource = {dblp computer science bibliography, http://dblp.org},
  biburl = {http://dblp2.uni-trier.de/rec/bib/conf/hipc/PadoinCPNM14},
  booktitle = {21st International Conference on High Performance Computing, HiPC 2014, Goa, India, 
     December 17-20, 2014},
  doi = {10.1109/HiPC.2014.7116895},
  pages = {1--10},
  timestamp = {Wed, 10 Jun 2015 07:47:46 +0200},
  title = {Saving energy by exploiting residual imbalances on iterative applications},
  url = {http://dx.doi.org/10.1109/HiPC.2014.7116895},
  year = {2014}
}

Pilla, Laercio L. 2014. “Topology-Aware Load Balancing for Performance Portability over Parallel High Performance Systems.” Theses, Universite de Grenoble ; UFRGS. https://tel.archives-ouvertes.fr/tel-00981136.

@phdthesis{Pilla2014,
  author = {Pilla, Laercio L.},
  hal_id = {tel-00981136},
  hal_version = {v1},
  keywords = {Computer architecture ; Parallel programming ; Scheduling ; Programmation Parallele ; Profiling ; Ordonnancement ; Architecture des ordinateurs},
  month = apr,
  pdf = {https://tel.archives-ouvertes.fr/tel-00981136/file/ThA_se_LaA_rcio_LIMA_PILLA_2014-1.pdf},
  school = {Universite de Grenoble ; UFRGS},
  title = {Topology-Aware Load Balancing for Performance Portability over Parallel High Performance Systems},
  type = {Theses},
  url = {https://tel.archives-ouvertes.fr/tel-00981136},
  year = {2014}
}