layout | title | comments |
---|---|---|
template1 |
Correctness 2018 |
false |
Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis
Ensuring correctness in high-performance computing (HPC) applications is one of the fundamental challenges that the HPC community faces today. While significant advances in verification, testing, and debugging have been made to isolate software errors (or defects) in the context of non-HPC software, several factors make achieving correctness in HPC applications and systems much more challenging than in general systems software—growing heterogeneity (architectures with CPUs, GPUs, and special purpose accelerators), massive scale computations (very high degree of concurrency), use of combined parallel programing models (e.g., MPI+X), new scalable numerical algorithms (e.g., to leverage reduced precision in floating-point arithmetic), and aggressive compiler optimizations/transformations are some of the challenges that make correctness harder in HPC. The following report lays out the key challenges and research areas of HPC correctness: DOE Report of the HPC Correctness Summit.
As the complexity of future architectures, algorithms, and applications in HPC increases, the ability to fully exploit exascale systems will be limited without correctness. With the continuous use of HPC software to advance scientific and technological capabilities, novel techniques and practical tools for software correctness in HPC are invaluable.
The goal of the Correctness Workshop is to bring together researchers and developers to present and discuss novel ideas to address the problem of correctness in HPC. The workshop will feature contributed papers and invited talks in this area.
Topics of interest include, but are not limited to:
- Formal methods and rigorous mathematical techniques for correctness in HPC applications
- Frameworks to address the challenges of testing complex HPC applications (e.g., multiphysics applications)
- Approaches for the specification of numerical algorithms with the goal of correctness checking
- Error identification in the design and implementation of numerical algorithms using finite-precision floating point numbers
- Tools to control the effect of non-determinism when debugging and testing HPC software
- Scalable debugging solutions for large-scale HPC applications
- Scalable tools for model checking, verification, certification, or symbolic execution
- Static and dynamic analysis to test and check correctness in the entire HPC software ecosystem
- Predictive debugging and testing approaches to forecast the occurrence of errors in specific conditions
- Machine learning and anomaly detection for bug detection and localization
- Correctness in emerging HPC programing models
- Analysis of software error propagation and error handling in HPC runtime systems and libraries
- Metrics to measure the degree of correctness of HPC software
- Specifications to check the correctness of runtime systems
- Large databases of bug reports and/or reproducible test cases of HPC software
- Benchmarks to test the effectiveness of HPC correctness tools
Authors are invited to submit manuscripts in English structured as technical or experience papers not exceeding 8 pages of content, including everything. Submissions must use the IEEE format.
Submitted papers must represent original unpublished research that is not currently under review for any other venue. Papers not following these guidelines will be rejected without review. Submissions received after the due date, exceeding length limit, or not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information. Papers should be submitted electronically at: https://submissions.supercomputing.org/?page=Submit&id=SC18WorkshopCorrectness2018Submission&site=sc18.
We encourage authors to submit an optional artifact description (AD) appendix along with their paper, describing the details of their software environments and computational experiments to the extent that an independent person could replicate their results. The AD appendix is not included in the 8-page limit of the paper and should not exceed 2 pages of content. For more details of the SC Reproducibility Initiative please see: https://sc18.supercomputing.org/submit/sc-reproducibility-initiative/.
The proceedings will be archived in IEEE Xplore via TCHPC.
Due to several requests, we have extended the submission deadline to Aug/20.
- Paper submissions due:
August 10, 2018Extended: August 17, 2018Last extendend deadline: August 20, 2018 - Notification of acceptance: September 21, 2018
- E-copyright registration completed by authors (firm):
October 3, 2018October 15, 2018October 19, 2018 - Camera-ready papers due (firm):
October 8, 2018October 15, 2018October 19, 2018
All time zones are AOE.
Ignacio Laguna, LLNL
Cindy Rubio-González, UC Davis
Eva Darulova, MPI-SWS, Germany
Ganesh Gopalakrishnan, University of Utah, USA
Paul Hovland, Argonne National Laboratory, USA
Geoff Hulette, Sandia National Laboratories, USA
Costin Iancu, Lawrence Berkeley National Laboratory, USA
Sriram Krishnamoorthy, Pacific Northwest National Laboratory, USA
Richard Lethin, Reservoir Labs, Yale University, USA
Francesco Logozzo, Facebook Research, USA
Jackson Mayo, Sandia National Laboratories, USA
John Mellor-Crummey, Rice University, USA
Matthias Müller, RWTH Aachen University, Germany
Tristan Ravitch, Galois, Inc, USA
Nathalie Revol, INRIA - ENS de Lyon, France
Markus Schordan, Lawrence Livermore National Laboratory, USA
Koushik Sen, UC Berkeley, USA
Stephen Siegel, University of Delaware, USA
Kay Bailey Hutchison Convention Center
650 S Griffin St, Dallas, TX 75202
Room: D171 (Level 1)
Keynote Speaker 1 (Morning Session)
Ganesh L. Gopalakrishnan, Professor, School of Computing, University of Utah
Bio: Ganesh L. Gopalakrishnan (Senior Member of IEEE and ACM Distinguished Scientist) earned his B.Sc.(EE) degree from NIT Calicut in 1978, M.Tech (EE) from IIT Kanpur in 1980, and PhD in Computer Science from Stony Brook University in 1986, when he joined the University of Utah.
His external engagements include Visiting Assistant Professorship at the University of Calgary (1988-89) and sabbatical visits at Stanford University (1995-96), Intel (2002-03), and sabbatical projects with Microsoft on developing parallel computing curriculum (2009-10). He has authored a 2006 Springer book "Computation Engineering: Applied Automata Theory and Logic," and is finishing a 2018 textbook using Jupyter notebooks in undergraduate Discrete Math and Automata Theory classes.
He has published over 180 refereed papers, and has graduated 21 PhD students. He is serving as the Director of the Center for Parallel Computing at Utah ("CPU"). He was awarded one of the six "Beacons of Excellence" Awards for 2012 by the University of Utah for his work on mentoring undergraduates.
His currently active projects are: Verification Methods and Tool Frameworks for Parallel and Concurrent Systems, Formal Techniques for System Resilience, Floating-point Precision Tuning, and Data Race Checking for OpenMP and GPUs. His current research grants and contracts are from NSF (CISE) and DOE (in collaboration with LLNL and PNNL).
#### *Making Formal Methods for HPC Disappear*
Abstract: Formal methods include rigorous specification methods that can render language standards reliable and unambiguous. They also include rigorous testing methods that target well-specified coverage criteria, and formal concepts that help guide debugging tool implementations. Those who say formal methods don't apply to HPC probably misunderstand formal methods to be some esoteric diversion, and not as a software productivity booster in the sense we describe.
Undoubtedly, HPC correctness is far too complex: there are the accidentally flipping bits, unpredictable floating point rounding, threads that incur a data race, and capricious compilers whose optimizations change results. All these can severely impact overall productivity. Formal approaches are possible for many of these pursuits, while for others they may emerge if one persists, and if there is a community invested in developing them in the long run. A worthwhile direction is to invest in formal methods based pedagogy: not only does this help buy us some time to develop useful formal methods for HPC, but it also gives some hope to save future generations from today's debugging drudgery. Today's parallel computing education still only gives lip service to correctness - let alone formal.
My talk will try and present examples of all of this. We will present examples where formal ideas did transition to production-level data race checking tools. I will also present examples where we finished production-level tools for floating-point non-reproducibility in the field, and hope to backfill the formalism eventually.
Eventually, as Rushby says, Formal Methods must "disappear" - be incorporated into standard practice and we don't see them.
Keynote Speaker 2 (Afternoon Session)
James Demmel, Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics, University of California at Berkeley
Bio: James Demmel is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics at the University of California at Berkeley, and Chair of the EECS Dept. His research is in numerical linear algebra, HPC, and communication avoiding algorithms. He is known for his work on the LAPACK and ScaLAPACK linear algebra libraries. He is a member of the NAS, NAE, and American Academy of Arts and Sciences; a Fellow of the AAAS, ACM, AMS, IEEE and SIAM; and winner of the IPDPS Charles Babbage Award, IEEE Computer Society Sidney Fernbach Award, the ACM Paris Kanellakis Award, and numerous best paper prizes.
#### *Correctness of Floating Point Programs - Exception Handling and Reproducibility*
Abstract: We consider two related aspects of analyzing and guaranteeing correctness
of floating point programs: exception handling and reproducibility.
Exception handling refers to reliable and consistent propagation of errors due
to overflow, invalid operations (like sqrt(-1)
), convergence failures, etc.
Reproducibility refers to getting bitwise reproducible results from multiple
runs of the same program, e.g., despite parallelism causing floating point
sums to be evaluated in different order with different roundoff errors.
We describe the efforts of two standards committees, the Basic Linear
Algebra Subprograms (BLAS) Standard, and the IEEE 754 Floating Point
Standard, to address these issues, and how these efforts should make it
easier to accomplish these goals for higher level applications, such as
linear algebra libraries.
09:00am - 09:05am: Opening remarks | |
09:05am - 10:00am: Keynote Speaker 1: "Making Formal Methods for HPC Disappear", Ganesh L. Gopalakrishnan (slides) |
10:00am - 10:30am: Break (coffee provided by SC18) |
10:30am - 10:50am: "Hybrid Theorem Proving as a Lightweight Method for Verifying Numerical Software", Alper Altuntas, John Baugh (slides) | |
10:50am - 11:10am: "HPC Software Verification in Action: A Case Study with Tensor Transposition", Erdal Mutlu, Ajay Panyala, Sriram Krishnamoorthy (slides) |
11:10am - 11:30am: "Correctness of Dynamic Dependence Analysis for Implicitly Parallel Tasking Systems", Wonchan Lee, George Stelle, Patrick McCormick, Alex Aiken (slides) | |
11:30am - 11:50am: "Verifying Qthreads: Is Model Checking Viable for User Level Tasking Runtimes?", Noah Evans (slides) |
11:50am - 12:10pm: "Incremental Static Race Detection in OpenMP Programs", Bradley Swain, Jeff Huang (slides) | |
12:10pm - 12:30pm: "Using Polyhedral Analysis to Verify OpenMP Applications are Data Race Free", Fangke Ye, Markus Schordan, Chunhua Liao, Pei-Hung Lin, Ian Karlin, Vivek Sarkar (slides) |
12:30pm - 02:00pm: Lunch (on your own) |
02:00pm - 03:00pm: Keynote Speaker 2: "Correctness of Floating Point Programs - Exception Handling and Reproducibility", James Demmel (slides) |
03:00pm - 03:30pm: Break (coffee provided by SC18) |
03:30pm - 03:50pm: "Compiler-Aided Type Tracking for Correctness Checking of MPI Applications", Alexander Hück, Jan-Patrick Lehr, Sebastian Kreutzer, Joachim Protze, Christian Terboven, Christian Bischof, Matthias S. Müller (slides) | |
03:50pm - 04:10pm: "Towards Deductive Verification of Message-Passing Parallel Programs", Ziqing Luo, Stephen F. Siegel (slides) | |
04:10pm - 04:30pm: "PARCOACH Extension for a Full-Interprocedural Collectives Verification", Pierre Huchant, Emmanuelle Saillard, Denis Barthou, Hugo Brunie, Patrick Carribault (slides) |
04:30pm - 05:30pm: Panel on: Facilitating the Adoption of Correctness Tools in HPC Applications | |
Panelist 1: Alper Altuntas, National Center for Atmospheric Research | |
Panelist 2: Ganesh Gopalakrishnan, University of Utah | |
Panelist 3: Mike Lam, James Madison University | |
Panelist 4: Markus Schordan, LLNL |
Please address workshop questions to Ignacio Laguna ([email protected]) and/or Cindy Rubio-González ([email protected]).