August 21, 2023

MPMD Debugging with TotalView

Debugging Best Practices

This post shows how to debug a simple MPMD application with TotalView. The application is an MPI job built from two executables, exec1 and exec2: exec1 sends a message to exec2, which prints it. Each executable runs as two MPI processes.

What is MPMD?

MPMD (Multiple Program Multiple Data) is a high-level parallel programming model in which tasks execute different programs simultaneously, and each task may operate on different data. MPMD programs are less common than SPMD (Single Program Multiple Data) applications, but they can be a better fit for certain types of problems.

About TotalView

TotalView is a high-performance debugging tool with support for multi-threaded and multi-process applications. It can be driven from its graphical user interface or from its command line interface, and it supports both local and remote debugging.

Requirements

This blog post uses TotalView 2023.2, Ubuntu Linux 20.04, and Open MPI.

Steps 

  1. Start with two example MPI source files, exec1.cpp and exec2.cpp:

exec1.cpp

[Image: exec1.cpp source listing]

exec2.cpp

[Image: exec2.cpp source listing]


  2. Compile exec1.cpp and exec2.cpp with the mpiCC compiler wrapper, using -g to include debugging information:

$ mpiCC -g exec1.cpp -o exec1

$ mpiCC -g exec2.cpp -o exec2

  3. Run the examples with mpirun to check that they work correctly. This example uses 4 MPI ranks in total: 2 ranks for exec1 and 2 ranks for exec2:

$ mpirun -np 2 exec1 : -np 2 exec2

  4. Start the MPMD program under TotalView’s control:

$ totalview -args mpirun -np 2 exec1 : -np 2 exec2

  5. Press GO to start debugging.

  6. TotalView asks if you want to stop the job now. Select Yes.
  7. Add a breakpoint at line 15 in the exec1.cpp file.

  8. Select process rank 2 or rank 3 in the Processes and Threads window. TotalView displays the exec2.cpp file. Add a breakpoint at line 15 in the exec2.cpp file.

  9. Press GO to run all 4 processes. The Processes and Threads window shows that MPI ranks 0 and 1 have stopped at line 15 in exec1.cpp and that MPI ranks 2 and 3 have stopped at line 15 in exec2.cpp.

  10. Change the focus to exec1 by double-clicking process rank 0 or 1 in the Processes and Threads window. From the drop-down list, select Group (Share). With this setting, TotalView controls only the MPI ranks in the current share group (exec1).

  11. Press GO to run MPI ranks 0 and 1. The Processes and Threads window updates to show that MPI ranks 0 and 1 are running and that MPI ranks 2 and 3 are still at a breakpoint.

  12. Change the focus to MPI rank 2 or 3 by double-clicking the process in the Processes and Threads window. Press GO to run MPI ranks 2 and 3. The program runs and the TotalView Input/Output window displays: “Executable 2 received the following message: Hello from Executable 1 !”

The TotalView User Guide has more information about how TotalView uses share groups and what happens to other share groups when you step a specific share group.

Next Steps & Additional Resources

This post showed how to run and debug an MPMD program under TotalView’s control. You can try TotalView for free and test its debugging capabilities for yourself.

For additional reading, check out our resources page.
