HKHLR Advances Scientific Progress With TotalView Multi-Site Licenses
Hesse is home to two of the world’s 500 fastest high performance computing clusters, which are available to researchers from five major Hessian universities across various disciplines. These researchers are experts in their own fields but are typically not experts in computer science, so they need support and training to effectively leverage the power of HPC.
The Hessian Competence Center for High Performance Computing (HKHLR) supports university researchers in the state who strive for the efficient and sustainable use of modern HPC systems. HKHLR was founded by the universities of Darmstadt, Frankfurt, Giessen, Kassel, and Marburg, and is funded by the Hessen State Ministry of Higher Education, Research and the Arts. HKHLR offers a variety of services to aid researchers, including courses and workshops for beginning and advanced HPC users.
TotalView Helps HKHLR…
Simplify debugging across distributed applications.
Establish a higher quality of support and education.
Reduce time to solution and minimize installation issues.
Bugs Throttle Scientific Progress; Clean Code Advances It
Accuracy is critical to any research project. For researchers using high performance computing, clean code is essential to achieving that accuracy, which means adopting the right dynamic analysis tool to identify the bugs that can throttle scientific progress.
“Debuggers are like fire extinguishers. As long as there is no fire, you don’t need them. But if there is a fire, you want the best.”
Dr. Iwainsky, HKHLR HPC Expert
Whichever debugging solution HKHLR adopted, it had to support a diverse user base. Hessian researchers run the gamut of programming expertise. Some are just starting out, while others are advanced users with years of experience. Researchers also span an array of academic fields and use a variety of programming languages, including C++ and Python.
“Many of our users are used to debugging with printfs, which can be time-consuming. We wanted to use something more sophisticated and powerful.”
HKHLR found its “fire extinguisher” in TotalView. TotalView enables faster fault isolation, improved memory optimization, and dynamic visualization for high-scale HPC applications. Hesse’s researchers can debug many processes and threads simultaneously in a single window and gain complete control over program execution: running, stepping, and halting line by line through code within a single thread or within arbitrary groups of processes or threads. Users can also work backward from a failure through reverse debugging, which isolates the root cause faster by eliminating repeated restarts of the application.
TotalView supports debugging in many programming languages, including C++ and Python. Whether researchers are experienced or novice programmers, TotalView helps them find errors quickly, validate prototypes, verify calculations, and certify code correctness.
Shared Licenses Make Support and Training Easy
HKHLR considered other debugging products, but TotalView offered several strengths over the competition, including reverse debugging and a single license agreement model for all Hessian universities. Two universities initially adopted TotalView, but HKHLR wanted to deploy the solution to more universities with larger HPC systems. TotalView is now the only officially supported debugger at the central computing installations at all universities participating in HKHLR.
“Having shared licenses enables us to share this technology across sites, forming a uniform platform, which is easier to support than locally-changing technologies.”
Dr. Sternel, General Manager of HKHLR
Using just one industry-leading debugger across all sites makes it easier to train users, as HKHLR’s trainers do not have to be experts on many different debugging platforms. HKHLR’s trainers are themselves trained on the TotalView platform by TotalView experts in a “train the trainer” program. Coupling a leading debugging solution with effective education has significantly expedited researchers’ ability to find and resolve coding issues.
Multi-site licensing offers several other benefits over traditional team licensing. Unlike the team license, the shared license is installed and configured on the HPC cluster itself, reducing installation time. Licensing and maintenance costs are also lower than under the team model. If HKHLR wants to expand the solution to more universities, the licensing model allows the user base to grow accordingly. The license also gives HKHLR prompt access to all new releases and software improvements.
High performance computing is a major driver of scientific innovation. Hessian researchers are now seeing that clean code plays a major role as well. “By combining a leadership-grade commercial debugger with high-quality training workshops, Hesse’s scientists have become more aware of the importance of stable code in academic research,” says Dr. Sternel. For HKHLR, this education is the top outcome of the TotalView project.
“Learning how to debug is an investment in the future health of your code. With the floating pool of licenses usable by any university, plus the training program, HPC experts aren’t the only ones who can use this tool. All of our researchers can use it in their daily work to find problems in their codes.”
Dr. Iwainsky, HKHLR HPC Expert
Quickly Solve Your Debugging Challenges
TotalView helps organizations like HKHLR easily scale and deploy a debugging solution for intuitively diagnosing and understanding complex code. See for yourself how TotalView can help your organization do the same.