Developer tools for porting and tuning parallel applications on extreme-scale parallel systems

Research topic and goals

Application developers targeting extreme-scale HPC systems such as JUQUEEN (BG/Q) and Kei (K computer) need effective tools to assist with porting and tuning for these unusual systems. The XcalableMP directive-based language and compilation system (Lee and Sato 2010) (Tsuji et al. 2013), the Scalasca/Score-P execution measurement and analysis tools (Geimer et al. 2010) (Knüpfer et al. 2012), which use SIONlib scalable file I/O (Frings, Wolf, and Petkov 2009), and the Paraver/Extrae/Dimemas measurement and analysis tools (“BSC Tools for Performance Analysis” 2017) are notable examples of tools developed by RIKEN, JSC and BSC for this purpose. This project proposes to extend their support for JLESC HPC systems and exploit their capabilities in an integrated workflow.
Existing training material will be adapted to collaborators' large-scale HPC systems, augmented with newly prepared material, and refined for better uptake based on participant evaluations and feedback. Travel and accommodation expenses will be supported for training presenters participating in joint training events (such as VI-HPS Tuning Workshops (“VI-HPS Tuning Workshop Series” 2016)). Collaborative work with application developers will assess the effectiveness of the current (and revised) tools, and help direct development of new tool capabilities.

Results for 2016/2017

  • Specification of XMPT generic tool interface for XcalableMP PGAS runtime (based on OMPT).
  • Initial prototype implementation of XMPT interface in Omni XMP compiler, used by Extrae/Paraver.
  • Definition of POP standard metrics for MPI and OpenMP applications (“POP Standard Metrics for Parallel Performance Analysis” 2016).
  • Documentation of how to obtain POP standard metrics in Paraver (“Paraver Efficiences Guide” 2016).
  • Calculation of POP standard metrics as derived metrics by CUBE.
  • Tools training for NERSC, DKRZ, IT4I, EPCC/Southampton and RWTH, covering tools from BSC (Paraver/Extrae/Dimemas) and JSC (Scalasca/Score-P/CUBE) using local HPC systems.
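
The POP standard metrics mentioned in the list above can be illustrated with a minimal sketch. It assumes the commonly used POP definitions, where load balance is the average useful computation time divided by the maximum, communication efficiency is the maximum useful computation time divided by the total runtime, and parallel efficiency is their product. The function name, the input layout (one useful-computation time per process) and the sample timings are hypothetical and chosen only for illustration; this is not how CUBE or Paraver compute the metrics internally.

```python
from statistics import mean


def pop_efficiencies(useful_times, runtime):
    """Return basic POP-style efficiencies for one execution.

    useful_times: useful computation time per process, in seconds
                  (hypothetical input layout for this sketch).
    runtime:      total wall-clock runtime in seconds.
    """
    if not useful_times:
        raise ValueError("useful_times must not be empty")
    if runtime <= 0:
        raise ValueError("runtime must be positive")

    avg_useful = mean(useful_times)
    max_useful = max(useful_times)

    # Load balance: how evenly useful work is spread over processes.
    load_balance = avg_useful / max_useful
    # Communication efficiency: share of runtime the busiest process
    # spends in useful computation.
    communication_efficiency = max_useful / runtime
    # Parallel efficiency: product of the two factors above.
    parallel_efficiency = load_balance * communication_efficiency

    return {
        "load_balance": load_balance,
        "communication_efficiency": communication_efficiency,
        "parallel_efficiency": parallel_efficiency,
    }


# Made-up timings for four processes and a 10 s run.
metrics = pop_efficiencies([8.0, 6.0, 7.0, 7.0], runtime=10.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because parallel efficiency is the product of the other two values, a low result can be traced back either to uneven work distribution or to time lost outside useful computation.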

Visits and meetings

Face-to-face meetings at ISC-HPC (Frankfurt am Main, 2016-06), SC (Salt Lake City, 2016-11) and 6th JLESC Workshop (Kobe, 2016-12). Meeting with MYX project (“Project MYX” 2016) members at ISC-HPC to discuss XMPT tools interface commonalities for correctness checking and performance analysis tools.

Visits planned for the next 12 months: none

Impact and publications

POP standard metrics applied in POP services performance analyses.

Future plans

The existing integration of XcalableMP and Scalasca will be updated to the latest community-developed Score-P instrumentation and measurement infrastructure.
The Extrae and Score-P support required for XcalableMP will be investigated. Scalasca/Score-P and Paraver/Extrae will be used to analyse the execution performance of RIKEN FIBER mini-apps (“FIBER Mini-App Suite” 2016). Terminology inconsistencies between JSC and BSC tools in their analyses and documentation will be addressed. Workshops and training will be organised under the auspices of VI-HPS (“Virtual Institute – High Productivity Supercomputing” 2016) or the POP Centre of Excellence (“Performance Optimisation and Productivity: EU Centre of Excellence” 2015).

References

    1. “BSC Tools for Performance Analysis.” 2017. http://tools.bsc.es/.
      @misc{BSCtools,
        title = {BSC Tools for Performance Analysis},
        url = {http://tools.bsc.es/},
        year = {2017}
      }
      
    2. “FIBER Mini-App Suite.” 2016. http://fiber-miniapp.github.io/.
      @misc{FIBER,
        title = {FIBER Mini-app Suite},
        url = {http://fiber-miniapp.github.io/},
        year = {2016}
      }
      
    3. “Project MYX.” 2016. http://doc.itc.rwth-aachen.de/display/CCP/Project+MYX.
      @misc{MYXproject,
        title = {Project MYX},
        url = {http://doc.itc.rwth-aachen.de/display/CCP/Project+MYX},
        year = {2016}
      }
      
    4. “NEST Neural Simulation Tool.” 2016. http://www.nest-simulator.org/.
      @misc{NEST,
        title = {NEST Neural Simulation Tool},
        url = {http://www.nest-simulator.org/},
        year = {2016}
      }
      
    5. “POP Standard Metrics for Parallel Performance Analysis.” 2016. https://pop-coe.eu/node/69.
      @misc{POPmetrics2016,
        title = {POP Standard Metrics for Parallel Performance Analysis},
        url = {https://pop-coe.eu/node/69},
        year = {2016}
      }
      
    6. “Paraver Efficiences Guide.” 2016. https://pop-coe.eu/sites/default/files/pop_files/paraverefficenciesguide.pdf.
      @misc{POPmetParaver2016,
        title = {Paraver Efficiences Guide},
        url = {https://pop-coe.eu/sites/default/files/pop_files/paraverefficenciesguide.pdf},
        year = {2016}
      }
      
    7. “Virtual Institute – High Productivity Supercomputing.” 2016. http://www.vi-hps.org/.
      @misc{VIHPS,
        title = {Virtual Institute -- High Productivity Supercomputing},
        url = {http://www.vi-hps.org/},
        year = {2016}
      }
      
    8. “VI-HPS Tuning Workshop Series.” 2016. http://www.vi-hps.org/training/tws/.
      @misc{VIHPSTWS,
        title = {VI-HPS Tuning Workshop Series},
        url = {http://www.vi-hps.org/training/tws/},
        year = {2016}
      }
      
    9. Kitayama, Itaru, Brian J. N. Wylie, and Toshiyuki Maeda. 2015. “Execution Performance Analysis of the ABySS Genome Sequence Assembler Using Scalasca on the K Computer.” In Proc. Int’l Conf. on Parallel Computing (ParCo, Edinburgh, Scotland). IOS Press. https://juser.fz-juelich.de/record/279895.
      @inproceedings{KitayamaEtAl2015,
        author = {Kitayama, Itaru and Wylie, Brian J. N. and Maeda, Toshiyuki},
        booktitle = {Proc. Int'l Conf. on Parallel Computing (ParCo, Edinburgh, Scotland)},
        month = sep,
        publisher = {IOS Press},
        title = {Execution Performance Analysis of the {ABySS} Genome Sequence Assembler using {Scalasca} on the {K} computer},
        url = {https://juser.fz-juelich.de/record/279895},
        year = {2015}
      }
      
    10. “Performance Optimisation and Productivity: EU Centre of Excellence.” 2015. https://www.pop-coe.eu/.
      @misc{POP,
        title = {Performance Optimisation and Productivity: EU Centre of Excellence},
        url = {https://www.pop-coe.eu/},
        year = {2015}
      }
      
    11. “VI-HPS Tools Guide.” 2015. http://www.vi-hps.org/upload/material/general/ToolsGuide.pdf.
      @misc{VIHPS2015,
        title = {VI-HPS Tools Guide},
        url = {http://www.vi-hps.org/upload/material/general/ToolsGuide.pdf},
        month = oct,
        year = {2015}
      }
      
    12. Tsuji, Miwako, Mitsuhisa Sato, Maxime R. Hugues, and Serge G. Petiton. 2013. “Multiple-SPMD Programming Environment Based on PGAS and Workflow toward Post-Petascale Computing.” In 42nd International Conference on Parallel Processing, ICPP 2013, Lyon, France, October 1-4, 2013, 480–85. doi:10.1109/ICPP.2013.58.
      @inproceedings{TsujiEtAl2013,
        author = {Tsuji, Miwako and Sato, Mitsuhisa and Hugues, Maxime R. and Petiton, Serge G.},
        bibsource = {dblp computer science bibliography, http://dblp.org},
        biburl = {http://dblp.uni-trier.de/rec/bib/conf/icpp/TsujiSHP13},
        booktitle = {42nd International Conference on Parallel Processing, {ICPP} 2013,
            Lyon, France, October 1-4, 2013},
        doi = {10.1109/ICPP.2013.58},
        timestamp = {Tue, 02 Dec 2014 17:13:28 +0100},
        title = {Multiple-SPMD Programming Environment Based on {PGAS} and Workflow
            toward Post-petascale Computing},
        pages = {480--485},
        url = {http://dx.doi.org/10.1109/ICPP.2013.58},
        year = {2013}
      }
      
    13. Knüpfer, A., C. Rössel, D. an Mey, S. Biersdorff, K. Diethelm, D. Eschweiler, M. Geimer, et al. 2012. “Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir.” In Tools for High Performance Computing 2011, Proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing (Dresden, September 2011). doi:10.1007/978-3-642-31476-6_7.
      @inproceedings{KnuepferEtAl2012,
        author = {Kn{\"{u}}pfer, A. and R{\"{o}}ssel, C. and an Mey, D. and Biersdorff, S. and Diethelm, K. and Eschweiler, D. and Geimer, M. and Gerndt, M. and Lorenz, D. and Malony, A.D. and Nagel, W.E. and Oleynik, Y. and Philippen, P. and Saviankou, P. and Schmidl, D. and Shende, S.S. and Tsch{\"{u}}ter, R. and Wagner, M. and Wesarg, B. and Wolf, F.},
        booktitle = {Tools for High Performance Computing 2011, Proceedings of the 5th
            International Workshop on Parallel Tools for High Performance Computing (Dresden, September 2011)},
        cin = {JSC},
        cid = {I:(DE-Juel1)JSC-20090406},
        comment = {Tools for High Performance Computing 2011, Proceedings of the 5th International 
            Workshop on Parallel Tools for High Performance Computing, September 2011, Dresden},
        doi = {10.1007/978-3-642-31476-6_7},
        note = {Record converted from VDB: 12.11.2012},
        pid = {G:(DE-Juel1)FUEK411 / G:(DE-HGF)POF2-411},
        pnm = {Scientific Computing / 411 - Computational Science and Mathematical Methods 
            (POF2-411)},
        title = {Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope,
            Scalasca, TAU, and Vampir},
        typ = {PUB:(DE-HGF)8 / PUB:(DE-HGF)7},
        url = {http://juser.fz-juelich.de/record/23267},
        year = {2012}
      }
      
    14. Geimer, Markus, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, and Bernd Mohr. 2010. “The Scalasca Performance Toolset Architecture.” Concurr. Comput. : Pract. Exper. 22 (6). John Wiley and Sons Ltd.: 702–19. doi:10.1002/cpe.1556.
      @article{GeimerEtAl2010,
        author = {Geimer, Markus and Wolf, Felix and Wylie, Brian J. N. and {\'{A}}brah{\'{a}}m, Erika and Becker, Daniel and Mohr, Bernd},
        acmid = {1753234},
        doi = {10.1002/cpe.1556},
        issn = {1532-0626},
        issue_date = {April 2010},
        journal = {Concurr. Comput. : Pract. Exper.},
        keywords = {parallel computing, performance analysis, scalability},
        month = apr,
        number = {6},
        numpages = {18},
        pages = {702--719},
        publisher = {John Wiley and Sons Ltd.},
        title = {The Scalasca Performance Toolset Architecture},
        url = {http://dx.doi.org/10.1002/cpe.1556},
        volume = {22},
        year = {2010}
      }
      
    15. Lee, Jinpil, and Mitsuhisa Sato. 2010. “Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems.” In 39th International Conference on Parallel Processing, ICPP Workshops 2010, San Diego, California, USA, 13-16 September 2010, 413–20. doi:10.1109/ICPPW.2010.62.
      @inproceedings{LeeSato2010,
        author = {Lee, Jinpil and Sato, Mitsuhisa},
        booktitle = {39th International Conference on Parallel Processing, {ICPP} Workshops
             2010, San Diego, California, USA, 13-16 September 2010},
        bibsource = {dblp computer science bibliography, http://dblp.org},
        biburl = {http://dblp.uni-trier.de/rec/bib/conf/icppw/LeeS10},
        doi = {10.1109/ICPPW.2010.62},
        pages = {413--420},
        timestamp = {Fri, 25 Jul 2014 14:09:13 +0200},
        title = {Implementation and Performance Evaluation of XcalableMP: {A} Parallel
            Programming Language for Distributed Memory Systems},
        url = {http://dx.doi.org/10.1109/ICPPW.2010.62},
        year = {2010}
      }
      
    16. Frings, Wolfgang, Felix Wolf, and Ventsislav Petkov. 2009. “Scalable Massively Parallel I/O to Task-Local Files.” In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 17:1–17:11. SC ’09. ACM. doi:10.1145/1654059.1654077.
      @inproceedings{FringsEtAl2009,
        acmid = {1654077},
        articleno = {17},
        author = {Frings, Wolfgang and Wolf, Felix and Petkov, Ventsislav},
        booktitle = {Proceedings of the Conference on High Performance Computing Networking, Storage and 
            Analysis},
        doi = {10.1145/1654059.1654077},
        isbn = {978-1-60558-744-8},
        location = {Portland, Oregon},
        numpages = {11},
        pages = {17:1--17:11},
        publisher = {ACM},
        series = {SC '09},
        title = {Scalable Massively Parallel I/O to Task-local Files},
        url = {http://doi.acm.org/10.1145/1654059.1654077},
        year = {2009}
      }