Big Chemical Encyclopedia


Nonuniform Memory Access (NUMA)

NUMA is one of the fundamental concepts needed to understand the design of a parallel software application. Every modern computer has several levels of memory, and parallel computers tend to have more levels than uniprocessors. Typical memory levels in a parallel computer include the processor registers, local cache memory, local main memory, and remote memory. If the parallel computer supports virtual memory, local and remote disk are added to this hierarchy. These levels vary in size, speed, and method of access. In this chapter, we will lump all these differences under the general term nonuniform memory access (NUMA). Note that this is a broader use of the term than is often found in computer science literature, where NUMA often refers only to differences in the speed with which given memory items can be accessed using the same method. In our use, memory access is often synonymous with data transfer.  [Pg.213]

In designing a complex parallel program, it is useful to think of any parallel computer in terms of a NUMA model in which all memory, anywhere in the system, can be accessed by any processor as needed. One can then focus independently on the questions of memory access methods (which determine coding style for an application) and memory access costs (which ultimately determine program efficiency). [Pg.213]

The choice of memory access methods, which is determined by the programming model and tools, is subject to great variation. This topic is discussed in a later section that deals with models and tools. [Pg.213]

Memory access costs are determined by the interaction between program structure and the performance characteristics of the computer system. Understanding these interactions in detail is typically complicated. However, a useful rough approximation of the memory access cost for a program can be obtained by modeling each transfer between memory levels as a fixed start-up cost (S) plus an incremental transfer cost per data unit (X). For example, the cost for a processor to fetch L data units at once from a remote memory into its local memory, cache, and registers can be modeled as

Cost = S + L × X  [Pg.214]
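The start-up-plus-incremental cost model described above can be sketched directly in Python. The numeric values of S and X here are illustrative assumptions chosen for the example, not measured figures:

```python
def transfer_cost(n_units, startup, per_unit):
    """Model the cost of fetching n_units in one transfer:
    a fixed start-up cost plus an incremental cost per data unit."""
    return startup + n_units * per_unit

# Illustrative (made-up) costs for a remote-memory fetch,
# in arbitrary time units: start-up S and per-unit cost X.
S = 100.0
X = 1.0

# Cost of fetching 1000 data units in a single transfer:
cost_bulk = transfer_cost(1000, S, X)   # 100 + 1000 * 1 = 1100.0
```

The model deliberately ignores contention and caching effects; it captures only the two dominant terms the text identifies.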

Usually the start-up cost is large compared to the transfer cost for a single data unit, and both costs increase sharply (by roughly an order of magnitude) at each level of the memory hierarchy. Because parallel computers have more levels than sequential ones, these transfer costs are significantly more intrusive on parallel computers and are evident in the performance of parallel applications.
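Because the start-up cost dominates a single-unit transfer, batching transfers pays off. A quick comparison, using the same illustrative S and X values as assumptions:

```python
def transfer_cost(n_units, startup, per_unit):
    """Fixed start-up cost plus incremental per-unit transfer cost."""
    return startup + n_units * per_unit

S, X = 100.0, 1.0   # illustrative start-up and per-unit costs

# One bulk transfer of 1000 units vs. 1000 single-unit transfers:
bulk = transfer_cost(1000, S, X)            # 100 + 1000 = 1100.0
piecewise = 1000 * transfer_cost(1, S, X)   # 1000 * 101 = 101000.0

# The bulk transfer amortizes the start-up cost over all units,
# so here it is nearly 100x cheaper than unit-at-a-time access.
ratio = piecewise / bulk
```

This is the quantitative reason why programs that move data between memory levels in large blocks outperform those that make many fine-grained remote accesses.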


Shared memory computers in which all processors have equal access to all memory in the system are referred to as symmetric multiprocessors (SMPs) and may also be called uniform memory access (UMA) computers. In the node shown in Figure 2.15, references to memory may need to pass through one, two, or three crossbar switches, depending on where the referenced memory is located. Thus, this node technically has a nonuniform memory access (NUMA) architecture; since the node is also cache-coherent, the architecture is called ccNUMA. However, because the crossbar switches in the quad-core AMD Opteron implementation of ccNUMA perform well, this particular node would typically still be considered an SMP. [Pg.33]
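The location-dependent access cost described above is often summarized as a node-distance table, in the spirit of the distance matrix reported by tools such as `numactl --hardware` on Linux (which normalizes local access to 10). The four-node matrix below is hypothetical, not a measurement of the Opteron node in Figure 2.15:

```python
# Hypothetical relative access costs between four NUMA nodes:
# entry [i][j] is the cost for a core on node i to reach memory
# on node j, with local access normalized to 10. Larger values
# correspond to references crossing more crossbar switches.
distance = [
    [10, 16, 16, 22],
    [16, 10, 22, 16],
    [16, 22, 10, 16],
    [22, 16, 16, 10],
]

def access_cost(src_node, mem_node):
    """Relative cost for src_node to access memory on mem_node."""
    return distance[src_node][mem_node]

local = access_cost(0, 0)    # 10: no crossbar hop
remote = access_cost(0, 3)   # 22: multiple hops
```

When the off-diagonal entries stay close to the diagonal, as they do for the well-performing crossbars in the text, the machine behaves almost like an SMP despite being ccNUMA underneath.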

Transputers. At higher levels of connectedness there is a wide variety of parallel computers. A great many parallel computers have been built using INMOS Transputer chips. Individual Transputer chips run at 2 MFLOPS or greater. Transputer chips have four communication channels, so the chips can readily be interconnected into a two-dimensional mesh network or into any other interconnection scheme where individual nodes are four-connected. Most Transputer systems have been built as additions to existing host computers and are MIMD type. Each Transputer has a relatively small local memory as well as access to the host's memory through the interconnection network. Not surprisingly, problems that best utilize local memory tend to achieve better performance than those that make more frequent accesses to host memory. Systems that access fast local memory and slower shared memory are often referred to as having a NUMA (nonuniform memory access) architecture. [Pg.96]

NUMA (nonuniform memory access): a configuration in which locations in memory may vary in the cost and mechanism of access. [Pg.286]

