GEM Overview


GEM is a graphical front end for In-situ Partial Order (ISP), a dynamic formal verification tool for MPI C/C++ programs, developed at the University of Utah. Here we describe ISP in detail to provide a better understanding of how it works.

ISP can be used by anyone who can write simple MPI C/C++ programs, and requires no special training. ISP allows you to formally verify your MPI C/C++ programs automatically without any extra efforts on your part (apart from compiling and making your examples) and flags the following errors:

In addition, it helps you understand as well as step through all relevant process interleavings (schedules). Notice our use of the word relevant: even a short MPI program may have too many (an "exponential number") of interleavings. For example, an MPI program with five processes, each containing five MPI calls, can have well in excess of 1000 interleavings. However, ISP generates a new interleaving only when it represents a truly new (as yet unexamined) behavior of your program.

As examples:

How ISP Works:

ISP works by intercepting the MPI calls made by the target program and making decisions on when to send these MPI calls to the MPI library. This is accomplished by the two main components of ISP: the Profiler and the Scheduler. An overview of ISP’s components and their interaction with the program as well as the MPI library is provided in Figure 1 below.


Figure 1

Profiler:

The interception of MPI calls is accomplished by compiling the ISP Profiler together with the target program’s source code. The Profiler makes use of MPI’s profiling mechanism (PMPI). Then the Profiler provides its own version of MPI f (replacing the original) for each corresponding MPI function f. Within each of these MPI f , the profiler communicates with the scheduler using TCP sockets to send information about the MPI call the process wants to make. It will then wait for the scheduler to make a decision on whether to send the MPI call to the MPI library or to postpone it until later. When the permission to fire f is given from the scheduler, the corresponding PMPI f will be issued to the MPI run-time. Since all MPI libraries come with functions such as PMPI f for every MPI function f, this approach provides a portable and light-weight instrumentation mechanism for MPI programs being verified [6].

Scheduler:

The ISP Scheduler helps carry out the POE algorithm. As it turns out, the POE algorithm is based on exploiting MPI’s out-of-order completion semantics. In other words,

Description of POE Algorithm:

POE stands for “POR with out-of-order and elusive interleavings"

MPI programs suffer from “elusive" interleavings in the presence of MPI wildcard Receive R(*) . Issuing the MPI sends in different orders does not help as the MPI runtime provides no guarantees based on when the Send is actually issued. POE employs dynamic source re-writing in place of wildcard so that every interleaving explored by POE algorithm is deterministic. Also, the presence of collectives such as MPI\_Barrier can cause traditional runtime verification techniques such as “Dynamic Partial Order Reduction" to fail since it is not always possible to issue instructions in different orders in the presence of collectives. POE also addresses this issue by creating “match sets".

POE algorithm works by creating a graph structure with the MPI operations of processes as nodes and by adding Intra-Completes-Before (IntraCB) edges between these nodes. As the name suggests, IntraCB edges are only added between the MPI operations within the MPI processes. The IntraCB edges are added between the MPI operations based on the MPI semantics of the operations. MPI functions like MPI_Barriers, MPI_Wait etc have the semantics, that no later MPI operations can be issued until these MPI operations complete. We call such operations as fence operations. Note that instructions issued before the fence operations can linger even after the fence operation is complete.

The IntraCB edges are added based on the following rules: Let i and j be the MPI operations of a process P such that i < j (i.e., i comes before j in the program order. There is an IntraCB edge between i and j is one of the following is true:

If there is a path from i to j then i is called the ancestor of j. POE algorithm uses these IntraCB edges to form match-sets to be issued into the MPI runtime. The following is the (informal) algorithm:

An MPI operation is “matched" if it has been issued to the MPI runtime by the POE algorithm.

  1. Run the MPI processes until they all reach fence operations.
  2. Let e denote the MPI operations of all processes such that for each ie, all ancestors of i have been matched.
  3. Let m denote the match-sets formed out of e as follows:
  4. Issue the match sets to MPI runtime in the following priority order: C-Match > W-Match > SR-Match > SR*-Match.
  5. If any of the processes issued a fence operation then goto step 1. Else, goto step 4.
  6. If there are no match sets and all processes have issued MPI_Finalize, check if any wildcard match sets are yet to be explored. If yes, then restart the processes and goto step 1, otherwise, exit.
  7. If there are no match sets and there is atleast one process that has not issued MPI_Finalize, then there is a deadlock. Flag a deadlock and kill all the MPI processes.

Obtaining ISP and Examples

ISP can be downloaded from the Gauss Group’s ISP Release Page.

What you will receive are the full sources as well as the following examples in the directory isp-tests:

Source Code Analyzer

This is a unique user interface which visually displays the output that ISP generated by highlighting lines in the source file. It shows both the current MPI call, and any matching point-to-point or collective operation. It also allows the user to examine MPI calls for a particular rank with an easy to use Rank Locking feature.

The source code analyzer also allows examination of flagged errors with one click. The Examine Error feature will highlight the problem line in the source code within the Eclipse editor. In addition, it offers features to browse calls by rank and by interleaving as well as listing any resource leaks found in the source code. One click on the flagged leak in the Resource Leak Browser takes the user to the suspect line of code in the Eclipse editor.

Why the Trident?

Because It is a universal symbol for "slaying the evil bug"!

 


References

 


 

Back to Top | Back to Table of Contents

 


School of Computing * 50 S. Central Campus Dr. Rm. 3190 * Salt Lake City, UT 84112 * isp-dev@cs.utah.edu
License