
Parallel programs performance modeling

When developing, parallelizing, or porting an MD program, performance models can be of great help in understanding how a particular algorithm or implementation behaves and where the major performance bottlenecks are located in the code [23,39]. [Pg.241]

To cope with the bewildering variety of software approaches and hardware platforms, it is beneficial to make use of models for these different aspects. Thus, we need to detail programming models, parallel computer models, and performance models. The design of such models is a large subject in itself [12,22,23]. For the case of MD, we need to select the models that best fit our needs and put them to work. [Pg.237]

In this chapter we will consider issues pertaining to parallel performance modeling. We first introduce some network performance characteristics for parallel computers that must be considered when modeling parallel performance. We then present several performance measures for parallel programs, and we discuss how to develop a performance model for a parallel algorithm. Finally, we will discuss how to evaluate performance data and illustrate how reported performance data can be misleading. [Pg.71]

The network performance characteristics for a parallel computer may greatly influence the performance that can be obtained with a parallel application. The latency and bandwidth are among the most important performance characteristics because their values determine the communication overhead for a parallel program. Let us consider how to determine these parameters and how to use them in performance modeling. To model the communication time required for a parallel program, one first needs a model for the time required to send a message between two processes. For most purposes, this time can... [Pg.71]
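
A common form of such a model is the linear latency-bandwidth model T(n) = α + βn, where α is the per-message latency, β the time per byte (inverse bandwidth), and n the message size. Below is a minimal sketch of a ping-pong test from which α and β can be read off as the intercept and slope of the measured times; it assumes an MPI installation and exactly two processes, and the message sizes and repetition count are illustrative.

```c
/* pingpong.c: estimate latency (alpha) and inverse bandwidth (beta)
 * from the linear model  T(n) = alpha + beta*n.
 * Build: mpicc -O2 pingpong.c -o pingpong ; run: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    const int sizes[] = {0, 1 << 10, 1 << 20};   /* 0 B, 1 KiB, 1 MiB */
    char *buf = malloc(1 << 20);

    for (int s = 0; s < 3; s++) {
        int n = sizes[s];
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {        /* send, then wait for the echo */
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* echo the message back */
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* one-way time = round-trip / 2; alpha is T(0), beta the slope */
        double t = (MPI_Wtime() - t0) / (2.0 * reps);
        if (rank == 0)
            printf("n = %7d bytes  T(n) = %.3e s\n", n, t);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```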

The values for these machine-specific parameters can be somewhat dependent on the application. For example, the flop rate can vary significantly depending on the type of operations performed. The accuracy of a performance model may be improved by using values for the machine-specific parameters that are obtained for the type of application in question, and the use of such empirical data can also simplify performance modeling. Thus, if specific, well-defined types of operations are to be performed in a parallel program (for instance, certain collective communication operations or specific computational tasks), simple test programs using these types of operations can be written to provide the appropriate values for the pertinent performance parameters. We will show examples of the determination of application-specific values for α, β, and γ in section 5.3.2. [Pg.81]
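
As an illustration of such a test program, the sketch below times a simple DAXPY-style loop to extract a value of γ (the time per floating-point operation); the kernel and problem size are illustrative stand-ins for the actual computational tasks of the application in question.

```c
/* floprate.c: estimate the time per floating-point operation (gamma)
 * from a DAXPY-style kernel; in practice one would time the actual
 * operations used by the parallel program instead. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const long n = 10 * 1000 * 1000;
    double *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
    for (long i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        y[i] = y[i] + 3.14 * x[i];            /* 2 flops per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    double gamma = secs / (2.0 * n);          /* seconds per flop */
    printf("gamma = %.3e s/flop (%.2f Mflop/s)\n", gamma, 2.0 * n / secs / 1e6);

    int ok = y[0] > 0.0;  /* use the result so the loop is not optimized away */
    free(x);
    free(y);
    return ok ? 0 : 1;
}
```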

The presence of nonuniform computational tasks whose sizes are not known in advance is often the cause of load imbalance, and quantitative modeling of load imbalance can therefore be difficult to do. However, simulations that involve distribution of nonuniform tasks can sometimes provide empirical data that can be used to model load imbalance. In chapter 7 we will illustrate the use of empirical data to model load imbalance in the computation of the two-electron integrals, which is a required step in many quantum chemical methods. Parallel programs involving uniform computational tasks will experience load imbalance whenever the number of tasks is not a multiple of the number of processes. This kind of load imbalance is easier to include in a performance model because the amount of work assigned to the process... [Pg.82]
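
For uniform tasks the effect is easy to quantify: with T tasks distributed among p processes, the busiest process receives ⌈T/p⌉ tasks, so the parallel time scales with ⌈T/p⌉ rather than the ideal T/p, and the predicted load-balance efficiency is T/(p⌈T/p⌉). A small sketch of this calculation follows; the task and process counts are illustrative.

```c
/* imbalance.c: predicted load-balance efficiency for T uniform tasks
 * on p processes -- the busiest process gets ceil(T/p) tasks, so the
 * parallel time is proportional to ceil(T/p) rather than T/p. */
#include <stdio.h>

int main(void) {
    const int tasks = 100;
    for (int p = 1; p <= 16; p *= 2) {
        int max_tasks = (tasks + p - 1) / p;              /* ceil(tasks/p) */
        double efficiency = (double)tasks / ((double)p * max_tasks);
        printf("p = %2d  busiest process: %3d tasks  efficiency = %.3f\n",
               p, max_tasks, efficiency);
    }
    return 0;
}
```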

In the previous sections of this chapter we discussed how to do performance modeling for parallel programs, and we will here briefly consider a few important points to keep in mind when presenting performance data for a parallel algorithm or evaluating performance data reported in the literature. [Pg.86]
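
When evaluating such data, the standard measures are the speedup and the parallel efficiency, defined in terms of the execution time T_1 on one process and T_p on p processes:

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
```

One common way in which reported speedups are inflated is to take T_1 from the parallel code run on a single process rather than from the best available sequential implementation.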

An important, basic step performed in most quantum chemistry programs is the computation of the two-electron integrals. Schemes for parallel computation of these integrals and detailed performance models incorporating load imbalance are discussed. [Pg.225]

These steps are rarely performed in such a rigid lock-step manner, but all of the actions described must be accomplished before a real parallel problem can be executed successfully on a cluster. The effectiveness achieved in programming a cluster is difficult to measure (although some metrics have been devised to this end). Nonetheless, the ease of parallel programming is strongly influenced by the execution model assumed and the tools available to assist in the process. [Pg.9]

The use of enhanced parallel programming models, such as the one proposed for Ada, will allow the compiler (with optional parameters/annotations provided by the programmer) to generate task graphs (similar to the Parallel Control Flow Graphs [14]) that can be used to perform the required schedulability analysis [15]. [Pg.205]

Thus, even a small fraction of serial code may dramatically reduce the maximum attainable speedup (e.g., a serial fraction f = 1/20 limits the maximum speedup to 20). The serial fraction of code can be made essentially zero for quantum chemistry programs developed specifically for parallel machines, even for modestly sized molecules. Once the sequential sections of code have been practically eliminated, load imbalance and communication overhead become the obstacles preventing perfect speedup. In this case, a more sophisticated performance model is needed to predict the actual speedup. [Pg.1992]
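
The bound quoted here is Amdahl's law: if a fraction f of the execution time is inherently serial, the speedup on p processes is

```latex
S(p) = \frac{1}{f + (1-f)/p} \;\le\; \frac{1}{f}
```

so f = 1/20 caps the speedup at 20 regardless of how many processes are used.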

Here we have briefly discussed performance characteristics of parallel computers and presented performance models for a few of the classic quantum chemistry algorithms as implemented on these machines. It is our hope that this will both elucidate the programming of parallel computers and serve as a guide to understanding the performance of new algorithms on parallel machines. [Pg.1999]

Artificial neural networks (ANNs) are computer programs designed to model the relationships between independent and dependent variables. They are based on an attempt to model the neural networks of the brain [50]. Functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned. [Pg.1016]

Furthermore, hardware such as multiprocessor workstations, which provide near-supercomputer performance within the UPSM programming model, is becoming available from several vendors (see chapter appendix). These machines are capable of exploiting the shared-memory parallelism that is already represented in code libraries such as LAPACK. Another important positive sign is that issues of scalable library construction have become more visible, for example through an IEEE-sponsored workshop. Such efforts, combined with the availability of software like ScaLAPACK as seed code, may well serve to crystallize the development of common data layout and program structure conventions. [Pg.235]
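
As a minimal illustration of the shared-memory programming style such machines support, the OpenMP sketch below parallelizes a loop over a shared array; it is illustrative only and is not drawn from LAPACK or ScaLAPACK (which typically exploit shared-memory parallelism through threaded BLAS implementations).

```c
/* shmem.c: shared-memory loop parallelism in the style supported by
 * multiprocessor workstations.  Build: gcc -O2 -fopenmp shmem.c */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    static double x[1000000];   /* shared among all threads */
    double sum = 0.0;

    /* each thread works on a slice of the shared array; the partial
     * sums are combined by the reduction clause */
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        x[i] = (double)i;
        sum += x[i];
    }
    printf("max threads = %d, sum = %.0f\n", omp_get_max_threads(), sum);
    return 0;
}
```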

