Presenting and Evaluating Performance Data: A Few Caveats

The previous sections of this chapter discussed performance modeling for parallel programs. Here we briefly consider a few important points to keep in mind when presenting performance data for a parallel algorithm or when evaluating performance data reported in the literature. [Pg.86]

A number of ways to report misleading parallel performance data have been discussed elsewhere, including how to boost performance data by comparing with code that is nonoptimal in a number of ways. Performance data are most often presented in the form of speedup curves, and it is therefore important to ascertain that the presented speedups are, in fact, representative of the typical parallel performance of the algorithm. Below we will discuss a couple of commonly encountered practices for presenting speedups that can lead to misrepresentation of performance data. [Pg.87]

We have used expressions involving the latency, α, and inverse bandwidth, β, to model the communication time. An alternative model, the Hockney model, is sometimes used for the communication time in a parallel algorithm. The Hockney model expresses the time required to send a message between two processes in terms of the parameters r_∞ and n_1/2, which represent the asymptotic bandwidth and the message length for which half of the asymptotic bandwidth is attained, respectively. Metrics other than the speedup and efficiency are used in parallel computing. One such metric is the Karp-Flatt metric, also referred to as the experimentally determined serial fraction. This metric is intended to be used in addition to the speedup and efficiency, and it is easily computed. The Karp-Flatt metric can provide information on parallel performance characteristics that cannot be obtained from the speedup and efficiency, for instance, whether degrading parallel performance is caused by incomplete parallelization or by other factors such as load imbalance and communication overhead. [Pg.90]
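The two quantities mentioned above can be illustrated with a minimal Python sketch. It assumes the usual textbook forms, which the excerpt itself does not spell out: the latency/bandwidth model t(n) = α + βn, the Hockney model t(n) = (n + n_1/2) / r_∞, and the Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p) for a speedup S measured on p processes. The function and parameter names are hypothetical and chosen only for this illustration.

```python
def hockney_time(n, r_inf, n_half):
    """Message time in the Hockney model (assumed form): (n + n_half) / r_inf.

    n      -- message length (e.g., bytes)
    r_inf  -- asymptotic bandwidth (e.g., bytes per second)
    n_half -- message length at which half of r_inf is attained
    """
    return (n + n_half) / r_inf


def hockney_to_alpha_beta(r_inf, n_half):
    """Convert Hockney parameters to (alpha, beta) of t(n) = alpha + beta * n.

    Matching the two assumed models term by term gives
    alpha = n_half / r_inf and beta = 1 / r_inf.
    """
    return n_half / r_inf, 1.0 / r_inf


def karp_flatt(speedup, p):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    speedup -- measured speedup S on p processes
    p       -- number of processes (must be at least 2)
    """
    if p < 2:
        raise ValueError("the Karp-Flatt metric requires p >= 2")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    r_inf, n_half = 1.0e9, 4096.0
    alpha, beta = hockney_to_alpha_beta(r_inf, n_half)
    n = 65536.0
    print("Hockney time:   ", hockney_time(n, r_inf, n_half))
    print("alpha + beta*n: ", alpha + beta * n)

    # A speedup generated from Amdahl's law with serial fraction 0.1
    # on 8 processes should give back e close to 0.1.
    s = 1.0 / (0.1 + 0.9 / 8)
    print("Karp-Flatt e:   ", karp_flatt(s, 8))
```

In practice one would compute e for a series of process counts. An e that stays roughly constant as p grows points toward incomplete parallelization, whereas an e that increases with p suggests that overheads such as load imbalance or communication are responsible.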

A Linux cluster consisting of nodes with two single-core 3.06 GHz Intel Xeon processors (each with 512 KiB of L2 cache) connected via a 4x Single Data Rate InfiniBand network with a full fat tree topology. [Pg.91]
