Memory latency

Keywords 2.5-D integration, crossbar, Rambus DRAM, reconfigurable data-path, microprocessor, memory, latency. [Pg.42]

Compared with bandwidth, memory latency is improved at an even slower pace. The classical way to hide the excessive latency is to place frequently used... [Pg.56]

The DRAM is also assumed to be built in a 0.13 pm process. The memory bus between L2 cache and DRAM is placed in the middle of the L2 cache, which is marked as red rectangles in Fig. 3.10. The DRAM will be placed on the top of the microprocessor in a way illustrated by Fig. 3.11. As for the main memory, we selected a high-end, Rambus DRAM with a clock of 1 GHz. Since the CPU clock has a frequency of 4 GHz, the access cycle time of the main memory is 4 CPU cycles. In the remaining part of this section, the word cycle always refers to a CPU cycle. On the other hand, the memory latency value is usually determined by the specific machine configuration and typically values are in the range of 100 cycles to 500 cycles (e.g., [20]). Because our target microprocessor is very aggressively clocked, we assume a memory latency of 400 cycles. [Pg.59]

With the stacked DRAM, the off-chip delay can be almost completely removed since no off-chip wires and I/O pads are necessary. Of course, it still too early to use an exact latency value for the stacked DRAM at the present due to the inviolability of technology parameters. Assuming a 400-cycle off-chip memory latency, we set the upper bound latency (pessimistic estimation) of stacked DRAM is 300 cycles, while a lower bound latency (optimistic estimation) is 200 cycles. [Pg.61]

With the above model, we could derive the distribution of CPI metric with regard to the L2 cache miss rate and the main memory latency. The memory latency is measured in the number of CPU clock cycles. [Pg.65]

Figure 3.14 CPI with regard to mam memory latency and L2 cache miss rate see colour plate)...

Detailed Performance Simulation for Reduced Memory Latency... [Pg.66]

We then performed detailed instruction simulation on the benchmarks listed in Table 3.3. We differentiate three configurations (1) a baseline 2-D microprocessor with off-chip main memory (Memory latency = 400 cycles), (2) a 2.5-D integrated... [Pg.66]

IXBT.com. Two methods for measuring memory latency on Intel Pentium 4 platform in RightMark memory analyzer—how to choose the right one http //www.digit-hfe.com/ articles2/cpu/rmma-p4-latency.html. [Pg.73]

We presented a novel software-only, lightweight, adaptive preexecution algorithm, which—when applied to a data parallel, data-intensive application—masks, memory latency by bringing data into the cache of the CPU at the right time. The model we advocate attacks memory latency at a high level and does not require detailed knowledge and profiling of the applications. [Pg.37]

Luk C-K (2001) Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. ACM SIGARCH Comput Architect News 29(2) 40-51... [Pg.37]

Memory latency is a major problem for high-performance architectures. The disparity in on-chip to off-chip access latencies has prompted the suggestion (Hwang, 1993) that there are four complementary approaches to latency hiding ... [Pg.2012]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...