Shared memory

As noted above, one of the goals of NAMD 2 is to take advantage of clusters of symmetric multiprocessor workstations and other non-uniform memory access platforms. This can be achieved in the current design by allowing multiple compute objects to run concurrently on different processors via kernel-level threads. Because compute objects interact in a controlled manner with patches, access controls need only be applied to a small number of structures such as force and energy accumulators. A shared memory environment will therefore contribute almost no parallel overhead and generate communication equal to that of a single-processor node. [Pg.480]
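
As a concrete illustration of that access-control pattern (not NAMD source code), the sketch below uses POSIX threads: each worker computes privately and takes a lock only to fold its partial result into the shared force and energy accumulators, so the critical section stays small. All names and the dummy partial results are hypothetical.

```c
/* Minimal sketch of lock-protected force/energy accumulators shared by
 * concurrently running compute threads.  Build: gcc accum.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4

typedef struct {
    double force[3];
    double energy;
    pthread_mutex_t lock;      /* access control applied only to the accumulator */
} accumulator_t;

static accumulator_t acc = { {0.0, 0.0, 0.0}, 0.0, PTHREAD_MUTEX_INITIALIZER };

static void *compute_object(void *arg)
{
    long id = (long)arg;
    /* ... lock-free local pair computations would go here ... */
    double f[3] = { 0.1 * id, 0.2 * id, 0.3 * id };   /* dummy partial force */
    double e = 1.0 * id;                              /* dummy partial energy */

    pthread_mutex_lock(&acc.lock);                    /* brief critical section */
    for (int k = 0; k < 3; k++)
        acc.force[k] += f[k];
    acc.energy += e;
    pthread_mutex_unlock(&acc.lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    for (long i = 0; i < N_THREADS; i++)
        pthread_create(&tid[i], NULL, compute_object, (void *)i);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    printf("E = %g, F = (%g, %g, %g)\n",
           acc.energy, acc.force[0], acc.force[1], acc.force[2]);
    return 0;
}
```
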

Supercomputers from vendors such as Cray, NEC, and Fujitsu typically consist of between one and eight processors in a shared memory architecture. Peak vector speeds of over 1 GFLOPS (1000 MFLOPS) per processor are now available. Main memories of 1 gigabyte (1000 megabytes) and more are also available. If multiple processors can be tied together to work simultaneously on one problem, substantially greater peak speeds are available. This situation is examined further in the section on parallel computers. [Pg.91]

The most commercially successful of these systems has been the Convex series of computers. Ironically, these are traditional vector machines, with one to four processors and shared memory. Their Cray-like characteristics were always a strong selling point. Interestingly, SCS, which marketed a minisupercomputer that was fully binary compatible with Cray, went out of business. Marketing appears to have played as much a role here as the inherent merits of the underlying architecture. [Pg.94]

MIMD Multicomputers. Probably the most widely available parallel computers are the shared-memory multiprocessor MIMD machines. Examples include the multiprocessor vector supercomputers, IBM mainframes, VAX minicomputers, Convex and Alliant minisupercomputers, and Silicon... [Pg.95]

This decreasing efficiency is a general characteristic of shared-memory, shared-bus computers. This example shows unusually high efficiency compared with many other programs; this may be because LINPACK is such a common benchmark that much effort has been devoted to optimising it for both vector and parallel computers. [Pg.96]

Transputers. At higher levels of connectedness there is a wide variety of parallel computers. A great many parallel computers have been built using INMOS Transputer chips. Individual Transputer chips run at 2 MFLOPS or greater. Transputer chips have four communication channels, so the chips can readily be interconnected into a two-dimensional mesh network or into any other interconnection scheme where individual nodes are four-connected. Most Transputer systems have been built as additions to existing host computers and are SIMD type. Each Transputer has a relatively small local memory as well as access to the host's memory through the interconnection network. Not surprisingly, problems that best utilize local memory tend to achieve better performance than those that make more frequent accesses to host memory. Systems that access fast local memory and slower shared memory are often referred to as NUMA (nonuniform memory access) architectures. [Pg.96]
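
The local-versus-remote distinction that defines NUMA can be made concrete with a small timing experiment. The following is a minimal sketch, assuming a Linux system with libnuma and at least two memory nodes; the buffer size and node choices are illustrative, not taken from the text.

```c
/* Touch a buffer allocated on the local node and one on a remote node.
 * Build: gcc numa_touch.c -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <numa.h>

static double touch(long *buf, size_t n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)
        buf[i] += 1;                              /* force real memory traffic */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }
    const size_t n = 1 << 25;                     /* ~256 MB of longs */
    int local = 0, remote = numa_max_node();      /* assumes remote != local */

    numa_run_on_node(local);                      /* pin execution to node 0 */
    long *near = numa_alloc_onnode(n * sizeof(long), local);
    long *far  = numa_alloc_onnode(n * sizeof(long), remote);

    printf("node %d (local) : %.3f s\n", local,  touch(near, n));
    printf("node %d (remote): %.3f s\n", remote, touch(far, n));

    numa_free(near, n * sizeof(long));
    numa_free(far,  n * sizeof(long));
    return 0;
}
```
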

To address the shortcoming of the simple MPI model, Medvedev and coworkers [40] developed a hybrid OpenMP/MPI method that takes advantage of both distributed and shared memory features of these clusters of multiprocessor nodes. The features of this model are ... [Pg.30]

The shared memory OpenMP library is used for parallelization within each node. The evaluation of the action of the potential energy, rotational kinetic energy, and T2 kinetic energy is local to each node. These local calculations are performed with the help of a task farm: each thread dynamically obtains a triple (i2, il, ir) of radial indices and evaluates first the kinetic energy and then the potential energy contribution to hps_local( , i2, il, ir) for all rotational indices. [Pg.32]
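
A hedged sketch of that two-level scheme follows: MPI provides the distributed-memory layer, while inside each rank an OpenMP loop with dynamic scheduling plays the role of the task farm handing out (i2, il, ir) triples. The array and routine names (hps_local, add_kinetic, add_potential), the index ranges, and the storage layout are placeholders, not the authors' code.

```c
/* Hybrid MPI + OpenMP task-farm sketch.  Build: mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N2   8
#define NL   16
#define NR   16
#define NROT 32                      /* rotational index range (placeholder) */

static void add_kinetic(double *slice, int i2, int il, int ir)
{ (void)slice; (void)i2; (void)il; (void)ir; /* kinetic-energy work here */ }

static void add_potential(double *slice, int i2, int il, int ir)
{ (void)slice; (void)i2; (void)il; (void)ir; /* potential-energy work here */ }

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    /* hps_local stored as one contiguous block per rank, rotational index
     * fastest-varying within each (i2, il, ir) slice */
    double *hps_local = calloc((size_t)N2 * NL * NR * NROT, sizeof(double));

    /* collapse the three radial loops into one pool of triples; dynamic
     * scheduling lets each thread grab the next available triple */
    #pragma omp parallel for collapse(3) schedule(dynamic)
    for (int i2 = 0; i2 < N2; i2++)
        for (int il = 0; il < NL; il++)
            for (int ir = 0; ir < NR; ir++) {
                double *slice = hps_local +
                    (((size_t)i2 * NL + il) * NR + ir) * NROT;
                add_kinetic(slice, i2, il, ir);    /* all rotational indices */
                add_potential(slice, i2, il, ir);
            }

    /* ... MPI communication between nodes would follow here ... */
    free(hps_local);
    MPI_Finalize();
    return 0;
}
```
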

LES/FDF-approach. An In situ Adaptive Tabulation (ISAT) technique (due to Pope) was used to greatly reduce (by a factor of 5) the CPU time needed to solve the set of stiff differential equations describing the fast LDPE kinetics. Fig. 17 shows some of the results of interest: the occurrence of hot spots in the tubular LDPE reactor, which is provided with a feed pipe through which the initiator (peroxide) is supplied. The 2004 simulations were carried out on 34 CPUs (3 GHz) with 34 GB shared memory, but still required 34 h per macroflow time scale; they served as a demonstration of the method. The 2006 simulations then demonstrated the impact of installing mixing promoters and of varying the inlet temperature of the added initiator. [Pg.215]

In working through process control examples, we found that many calculations, data checks, rate checks, and other computationally intensive tasks are done at the first level of inference. Considerations of computational efficiency led to a design utilizing two parallel processors with a shared memory (Figure 1). One of the processors is a 68010 programmed in C. This processor performs computationally intensive, low-level tasks that are directed by the expert system on the LISP processor. [Pg.71]

Figure 1. Design for the LMI system for process control using two parallel processors with a shared memory.
Communication with the simulation tool should be done through bit- or word-oriented shared memory areas. Complex technological systems for real operations consist of several physical components, controllers, sensors, and actuators, which define their behaviour. Information flows between these parts via links and bus systems. Different types of sensors, actuators, and other micro machines are available on the market, corresponding to real sensors such as light bars, distance accelerators, and other instruments. The positioning can be done graphically or textually (Figure 7). [Pg.389]
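
On a POSIX system, such a word-oriented shared memory area could look like the minimal sketch below; the segment name, the io_area_t layout, and the meaning of the words are invented for illustration rather than taken from the text.

```c
/* Word-oriented shared memory area for exchanging sensor and actuator
 * words between a simulation tool and a controller process.
 * Build: gcc io_area.c -lrt (the -lrt is needed on older glibc) */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    uint16_t sensor_word[32];    /* inputs published by the simulation */
    uint16_t actuator_word[32];  /* commands written back by the controller */
    uint32_t cycle_counter;      /* lets each side detect a fresh update */
} io_area_t;

int main(void)
{
    /* both programs open the same named segment */
    int fd = shm_open("/sim_io_area", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(io_area_t)) < 0) { perror("ftruncate"); return 1; }

    io_area_t *io = mmap(NULL, sizeof(io_area_t),
                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (io == MAP_FAILED) { perror("mmap"); return 1; }

    /* simulation side: publish a sensor word and bump the counter */
    io->sensor_word[0] = 0x0001;          /* e.g. a light bar triggered */
    io->cycle_counter += 1;

    /* the controller process would read sensor_word, compute, and
     * write actuator_word through the same mapping */
    printf("cycle %u, sensor[0] = 0x%04x\n",
           io->cycle_counter, io->sensor_word[0]);

    munmap(io, sizeof(io_area_t));
    close(fd);
    return 0;
}
```
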

J. Nieplocha, R.J. Harrison and R.J. Littlefield, Global Arrays: A portable "shared-memory" programming model for distributed memory computers, in Supercomputing '94 (Washington, D.C., 1994). [Pg.113]

Due to the nature of the FFT, and of spectral transforms in general, parallelization of the solver is better suited to a shared memory architecture than to the Message Passing Interface approach. Further, since the most time-consuming portion of the code deals with calculating the RHS of the equation, we choose to use a single-threaded ODE solver library (GSL in our case) and to parallelize only the calculation of the derivatives needed by the ODE solver. [Pg.262]
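
A minimal sketch of that division of labour, assuming GSL's gsl_odeiv2 interface and OpenMP: the driver itself stays single-threaded, while the right-hand-side routine, which dominates the cost, is parallelized. The RHS here is a simple linear-damping placeholder rather than the actual spectral-transform derivative code, and the dimension and tolerances are illustrative.

```c
/* Single-threaded GSL ODE driver with an OpenMP-parallel RHS.
 * Build: gcc -fopenmp ode_omp.c -lgsl -lgslcblas -lm */
#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_odeiv2.h>

#define DIM 4096

static int rhs(double t, const double y[], double dydt[], void *params)
{
    (void)t; (void)params;
    /* only the derivative evaluation is threaded, as described above */
    #pragma omp parallel for
    for (int i = 0; i < DIM; i++)
        dydt[i] = -0.5 * y[i];            /* placeholder for the real RHS */
    return GSL_SUCCESS;
}

int main(void)
{
    gsl_odeiv2_system sys = { rhs, NULL, DIM, NULL };
    gsl_odeiv2_driver *d = gsl_odeiv2_driver_alloc_y_new(
        &sys, gsl_odeiv2_step_rkf45, 1e-6, 1e-8, 1e-8);

    static double y[DIM];
    for (int i = 0; i < DIM; i++) y[i] = 1.0;

    double t = 0.0;
    int status = gsl_odeiv2_driver_apply(d, &t, 1.0, y);   /* advance to t = 1 */
    if (status != GSL_SUCCESS)
        fprintf(stderr, "driver failed: %d\n", status);

    printf("y[0](t=1) = %.6f (exact exp(-0.5) = 0.606531)\n", y[0]);
    gsl_odeiv2_driver_free(d);
    return 0;
}
```
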

System Interconnect Reliability. From the standpoint of reliability, the shared memory system and the global bus system both have problems in the area of single-point failures: if the bus or the central memory fails, the entire system is incapacitated. A ring system, when bypass hardware is employed, demonstrates very good fault-tolerance characteristics. [Pg.250]

System Interconnect Expandability. From the standpoint of expansion limitations, the shared memory system has problems in that the number of ports is fixed. Expanders can be used to alleviate this problem to some degree, but physical construction problems are ultimately encountered. Also, the memory bandwidth of the shared memory system is fixed and relatively low, thus limiting the degree of practical expansion. [Pg.250]

The shared memory system is the most expensive of the four generalized architectures, with the global bus system coming in a close second. The fully interconnected system is about five times more cost-effective than a global bus approach for a 30-processor system; however, the ring system is superior to all of them when the process is partitioned to take advantage of the unique bandwidth characteristics that a ring-connected architecture provides. [Pg.252]

Lee KW, Nakamura T, Ono T, Yamada Y, Mizukusa T, Hashimoto H, Park KT, Kurino H, Koyanagi M. Three-dimensional shared memory fabricated using wafer stacking technology. Digest of the International Electron Devices Meeting 2000. p 165-168. [Pg.460]

