
Shared memory architecture

Supercomputers from vendors such as Cray, NEC, and Fujitsu typically consist of between one and eight processors in a shared memory architecture. Peak vector speeds of over 1 GFLOPS (1000 MFLOPS) per processor are now available. Main memories of 1 gigabyte (1000 megabytes) and more are also available. If multiple processors can be tied together to work simultaneously on one problem, substantially greater peak speeds are available. This situation will be examined further in the section on parallel computers. [Pg.91]

Due to the nature of the FFT, or spectral transforms in general, parallelization of the solver is better suited to a shared memory architecture than to the Message Passing Interface approach. Further, since the most time-consuming portion of the code deals with calculating the RHS of the equation, we choose to apply a single-threaded ODE solver library (GSL in our case) and parallelize only the calculation of the derivatives needed by the ODE solver. [Pg.262]
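As an illustration of this division of labor, here is a minimal C sketch: GSL's gsl_odeiv2 driver steps the ODE serially, while the RHS callback parallelizes the per-mode derivative loop with OpenMP. The function compute_mode_derivative and the arithmetic inside it are hypothetical stand-ins for the actual spectral (FFT-based) derivative evaluation.

```c
/* Sketch: serial GSL ODE driver, OpenMP-parallel RHS evaluation.
 * compute_mode_derivative() is a hypothetical placeholder for the
 * per-mode spectral derivative work. */
#include <gsl/gsl_errno.h>
#include <gsl/gsl_odeiv2.h>
#include <omp.h>
#include <stddef.h>

#define N_MODES 1024

/* Placeholder for the real spectral-transform derivative of mode i. */
static double compute_mode_derivative(double t, const double y[], size_t i)
{
    return -y[i] * (double)(i + 1) * t * 1e-3;   /* placeholder physics */
}

/* RHS callback: the only parallel region; the ODE stepper itself stays serial. */
static int rhs(double t, const double y[], double dydt[], void *params)
{
    (void)params;
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < N_MODES; i++)
        dydt[i] = compute_mode_derivative(t, y, i);
    return GSL_SUCCESS;
}

int main(void)
{
    double y[N_MODES];
    for (size_t i = 0; i < N_MODES; i++) y[i] = 1.0;

    gsl_odeiv2_system sys = { rhs, NULL, N_MODES, NULL };
    gsl_odeiv2_driver *d =
        gsl_odeiv2_driver_alloc_y_new(&sys, gsl_odeiv2_step_rkf45,
                                      1e-6, 1e-8, 1e-8);
    double t = 0.0;
    int status = gsl_odeiv2_driver_apply(d, &t, 1.0, y);  /* serial driver */
    gsl_odeiv2_driver_free(d);
    return status == GSL_SUCCESS ? 0 : 1;
}
```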

In Figure 2.15, a single HCA is connected, which provides connectivity to the other nodes on the network. The node shown has a shared memory architecture in which all sixteen processors have direct access to all the memory. In a shared memory architecture, data from the same memory location can be read by several processors and hence may be stored in multiple caches. If one of the processors then changes this data, the system may have two inconsistent copies of the data resident in different cache entries. This situation is handled by cache-coherency support in hardware, which invalidates the data in other caches whenever the data in the local cache changes. Such invalidation generates additional traffic on the network connecting the processors within the node, and this can impact the performance of the computer, especially when the number of processors in the node is large. [Pg.33]
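One way to make this invalidation traffic visible from user code is false sharing. The C/OpenMP sketch below (assuming 64-byte cache lines and at most 64 threads) increments per-thread counters first packed into shared cache lines and then padded to one line each; the timing difference is largely coherence traffic.

```c
/* Sketch: coherence (invalidation) traffic made visible through false sharing.
 * Both loops do the same arithmetic; the packed counters share cache lines,
 * so every increment invalidates copies of that line in other cores' caches. */
#include <omp.h>
#include <stdio.h>

#define ITERS 20000000L

struct padded { long value; char pad[64 - sizeof(long)]; }; /* one line each,
                                                               assuming 64-byte lines */

int main(void)
{
    int nthreads = omp_get_max_threads();
    long packed[64] = {0};                 /* adjacent: lines are shared     */
    struct padded padded_ctr[64] = {{0}};  /* padded: one cache line apiece  */

    double t0 = omp_get_wtime();
    #pragma omp parallel
    { int id = omp_get_thread_num() % 64;
      for (long i = 0; i < ITERS; i++) packed[id]++; }
    double t_packed = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    #pragma omp parallel
    { int id = omp_get_thread_num() % 64;
      for (long i = 0; i < ITERS; i++) padded_ctr[id].value++; }
    double t_padded = omp_get_wtime() - t0;

    printf("%d threads: packed %.2fs, padded %.2fs\n",
           nthreads, t_packed, t_padded);
    return 0;
}
```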

Shared-memory architecture The processors and disks have access to a common memory, typically via a bus or through an interconnection network. [Pg.238]

FIGURE 12 A shared-memory architecture illustrated for a shuffle-exchange switching network. [Pg.89]

The most commercially successful of these systems has been the Convex series of computers. Ironically, these are traditional vector machines, with one to four processors and shared memory. Their Cray-like characteristics were always a strong selling point. Interestingly, SCS, which marketed a minisupercomputer that was fully binary compatible with Cray, went out of business. Marketing appears to have played as much of a role here as the inherent merits of the underlying architecture. [Pg.94]

Transputers. At higher levels of connectedness there is a wide variety of parallel computers. A great many parallel computers have been built using INMOS Transputer chips. Individual Transputer chips run at 2 MFLOPS or greater. Transputer chips have four communication channels, so the chips can readily be interconnected into a two-dimensional mesh network or into any other interconnection scheme in which individual nodes are four-connected. Most Transputer systems have been built as additions to existing host computers and are of the SIMD type. Each Transputer has a relatively small local memory as well as access to the host's memory through the interconnection network. Not surprisingly, problems that best utilize local memory tend to achieve better performance than those that make more frequent accesses to host memory. Systems that access fast local memory and slower shared memory are often referred to as having a NUMA (nonuniform memory access) architecture. [Pg.96]

The shared memory system is the most expensive of the four generalized architectures, with the global bus system coming in a close second. The fully interconnected system is about five times more cost-effective than a global bus approach for a 30-processor system; however, the ring system is superior to all of them when the process is partitioned to take advantage of the unique bandwidth characteristics that a ring-connected architecture provides. [Pg.252]

Shared-memory parallel processing was certainly more successful for QC in earlier applications, and it continues to play a significant role in high-performance computational chemistry. A coarse-grained parallel implementation scheme for the direct SCF method by Lüthi et al. allowed for a near-asymptotic speed-up with very low parallelization overhead, without compromising the vector performance of vector-parallel architectures. [Pg.247]

Nowadays, immersive multi-screen displays like CAVEs are driven by off-the-shelf PC clusters with consumer graphics cards instead of multipipe, shared memory graphics computers. This reduces the cost of IVR infrastructure dramatically. VR toolkits supporting PC clusters must inherently have a distributed software architecture, and data synchronization is an issue in such frameworks. Besides a client-server approach, where the scenegraph is distributed over the cluster nodes, a master-slave approach is most often... [Pg.286]

Shared memory computers in which all processors have equal access to all memory in the system are referred to as symmetric multiprocessors (SMP), and may also be called uniform memory access (UMA) computers. In the node shown in Figure 2.15, references to memory may need to pass through one, two, or three crossbar switches, depending on where the referenced memory is located. Thus, this node technically has a nonuniform memory access (NUMA) architecture, and, since the node is cache-coherent, this architecture is called ccNUMA. However, since the crossbar switches in the quad-core AMD Opteron implementation of ccNUMA exhibit high performance, this particular node would typically be considered to be an SMP. [Pg.33]
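Data placement matters on such a ccNUMA node: a memory page typically ends up in the NUMA domain of the processor that first touches it. The C/OpenMP sketch below relies on this first-touch behaviour (an assumption about the operating system's default policy, e.g. Linux) by initializing arrays with the same static schedule as the later compute loop, so most references stay local rather than crossing the crossbar switches.

```c
/* Sketch: first-touch placement on a ccNUMA node. Initializing in parallel
 * with the same schedule as the compute loop keeps most memory references
 * in the local NUMA domain. First-touch is an assumed OS policy. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = (size_t)1 << 24;          /* 16M doubles per array */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    if (!a || !b) return 1;

    /* Parallel first touch: each thread faults in the pages it will later use. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; }

    /* Compute loop with the same static schedule: mostly local accesses. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) a[i] += 2.0 * b[i];

    printf("a[0] = %f\n", a[0]);
    free(a); free(b);
    return 0;
}
```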

Teodosiu, D., J. Baxter, K. Govil, J. Chapin, M. Rosenblum, and M. Horowitz. 1997. Hardware fault containment in scalable shared-memory multiprocessors. Pp. 73-84 in Proceedings of the 24th International Symposium on Computer Architecture, Denver, Colo., June 2-4, 1997. Washington, D.C.: IEEE Computer Society Press. [Pg.18]

Shared-disk architecture All processors can access all disks directly via an interconnection network, but the processors have private memories. Shared-disk systems are also called clusters. [Pg.238]

Shared-nothing architecture The processors share neither a common memory nor a common disk. Each node is independent and acts like a server for the disk it owns. [Pg.238]

Many companies and parallel processing architectures have come and gone since 1958, but the most popular parallel computer in the twenty-first century consists of multiple nodes connected by a high-speed bus or network, where each node contains many processors connected by shared memory or a high-speed bus or network, and each processor is either pipelined or multi-core. [Pg.1409]

The processing node of a cluster incorporates all of the facilities and functionality necessary to perform a complete computation. Nodes are most often structured either as uniprocessor systems or as SMPs, although some clusters, especially constellations, have incorporated nodes that were distributed shared memory (DSM) systems. Nodes are distinguished by the architecture of... [Pg.5]

Shared memory A memory that is directly accessed by more than one node of a concurrent processor. This is an important architectural feature of many supercomputer designs. [Pg.79]

We recognize that it is difficult, if not impossible, to scale shared memory to hundreds or thousands of cores. One possibility is to address scalability through clustering, where the cores are divided into shared memory clusters (current architectures use 4, 8, or 16 cores per cluster), with the clusters communicating through a network on chip. Scalability then depends on data placement: if the problem is such that the compiler can partition the data into the local memory of each core, the number of cores can be scaled up. The communication can be explicit or transparent, depending on design decisions and on the availability of tools to partition the data set and create the required communication patterns. [Pg.205]
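A rough sketch of this clustered organization, translated loosely to commodity hardware rather than to an on-chip network: MPI-3's MPI_Comm_split_type with MPI_COMM_TYPE_SHARED groups the ranks that share memory into one "cluster" communicator, partial results are reduced inside each cluster, and only one leader rank per cluster communicates across the network.

```c
/* Sketch: partition ranks into shared-memory clusters and make
 * inter-cluster communication explicit via cluster leaders. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group ranks that can share memory: one cluster per shared-memory domain. */
    MPI_Comm cluster;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &cluster);
    int cluster_rank;
    MPI_Comm_rank(cluster, &cluster_rank);

    /* Intra-cluster reduction stays inside the shared-memory domain. */
    double local = (double)world_rank, cluster_sum = 0.0;
    MPI_Reduce(&local, &cluster_sum, 1, MPI_DOUBLE, MPI_SUM, 0, cluster);

    /* Only cluster leaders (cluster_rank == 0) communicate between clusters. */
    MPI_Comm leaders;
    MPI_Comm_split(MPI_COMM_WORLD, cluster_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leaders);
    if (cluster_rank == 0) {
        double total = 0.0;
        MPI_Reduce(&cluster_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, leaders);
        int lrank;
        MPI_Comm_rank(leaders, &lrank);
        if (lrank == 0)
            printf("total over all clusters: %f\n", total);
        MPI_Comm_free(&leaders);
    }
    MPI_Comm_free(&cluster);
    MPI_Finalize();
    return 0;
}
```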

We close this subsection with a discussion of the parallelization of the PT2 approach. The data communications with respect to the wave functions can be avoided in PT2 because the set of unperturbed wave functions is not updated throughout the simulation, provided that the molecular geometry of the solute is fixed. The QM/MM simulation based on PT2 is, therefore, quite amenable to parallelization on both shared- and distributed-memory architectures. For shared-memory machines, the double loops associated with the calculations of eP and Fp, defined by Eqs. (6.38) and (6.42), respectively, are the major targets for thread parallelism, which can be readily implemented using standard OpenMP directives. In the MPI parallelization, we first divide the zeroth-order wave functions either in the orbital space or in the real space and distribute the portions to the computational nodes that constitute the parallel machine. Once the wave functions have been assigned to the nodes before the simulation, one only needs to reduce, at each MD step, the scalar values evaluated individually on the nodes to the master node in order to construct the complete values of eP and Fp. It is expected that high parallel efficiency can be achieved with modest additional programming effort. [Pg.187]
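A minimal C sketch of this pattern combines OpenMP threading of a double loop with a single MPI_Reduce of scalars to the master node at each step. Here term() and N_LOCAL are hypothetical placeholders for the actual PT2 quantities (eP and Fp of Eqs. (6.38) and (6.42)) and for the locally assigned portion of the zeroth-order wave functions.

```c
/* Sketch of the PT2 parallel pattern: an OpenMP-threaded double loop
 * produces a per-node partial scalar, and one MPI_Reduce per MD step
 * assembles the complete value on the master node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_LOCAL 2000   /* orbitals (or grid points) assigned to this node */

static double term(int i, int j) { return 1.0 / (1.0 + i + j); } /* placeholder */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Thread-parallel double loop over the locally stored zeroth-order
       wave functions; only a scalar partial sum leaves this node. */
    double partial = 0.0;
    #pragma omp parallel for reduction(+ : partial) collapse(2)
    for (int i = 0; i < N_LOCAL; i++)
        for (int j = 0; j < N_LOCAL; j++)
            partial += term(i, j);

    /* One scalar reduction per MD step to the master node. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("complete PT2-type sum: %f\n", total);

    MPI_Finalize();
    return 0;
}
```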

