DOE SciDAC Review Office of Science
News
Fault Tolerance Research
PNNL to Lead Three-Year Research Project

Current and future extreme-scale computers comprise millions of components, which makes hardware faults likely while applications are running. To address these faults, the Office of Advanced Scientific Computing Research (ASCR) has funded the Scalable Fault Tolerance for Petascale Computers (SFT-2) research project under the FASTOS-2 program. The three-year project, led by Pacific Northwest National Laboratory (PNNL) in collaboration with Ohio State University and Oak Ridge National Laboratory, will develop fault-tolerant technologies for global address space programming models, using virtualization and a fault-resilient runtime for malleable applications. This work will complement other efforts that focus on MPI applications. The project follows the earlier SFT-1 project, which also focused on fault tolerance.