back to memory. The memory system is constructed of basic semiconductor DRAM units called
modules or banks.
There are several properties of memory, including speed, capacity, and cost, that play an important
role in the overall system performance. The speed of a memory system is the key performance parameter
in the design of the microprocessor system. The latency (L) of the memory is defined as the time delay
from when the processor first requests data from memory until the processor receives the data. Bandwidth
is defined as the rate at which information can be transferred from the memory system. Memory bandwidth
and latency are related to the number of outstanding requests (R) that the memory system can service:
R = L × Bandwidth (11.4)
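For illustration, the relationship in Eq. 11.4 can be evaluated with a few lines of C; the latency and bandwidth values below are assumed purely for the example and do not describe any particular system.

#include <stdio.h>

/* Sketch of Eq. 11.4: R = L x Bandwidth.  The numbers are assumed,
 * illustrative values, not measurements of a real memory system. */
int main(void)
{
    double latency_ns      = 100.0;   /* assumed memory latency L, in ns       */
    double requests_per_ns = 0.08;    /* assumed bandwidth, in requests per ns */

    /* Number of requests that must be in flight to sustain this bandwidth. */
    double outstanding = latency_ns * requests_per_ns;

    printf("outstanding requests R = %.1f\n", outstanding);   /* prints 8.0 */
    return 0;
}

In other words, to keep such a memory system fully utilized, roughly eight requests would have to be outstanding at any time.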
Bandwidth plays an important role in keeping the processor busy with work. However, technology
trade-offs to optimize latency and improve bandwidth often conflict with the need to increase the
capacity and reduce the cost of the memory system.
Cache Memory
Cache memory, or simply cache, is a small, fast memory constructed using semiconductor SRAM. In
modern computer systems, there is usually a hierarchy of cache memories. The top-level cache is
closest to the processor and the bottom level is closest to the main memory. Each higher level cache is
about 5 to 10 times faster than the next level. The purpose of a cache hierarchy is to satisfy most of the
processor memory accesses in one or a small number of clock cycles. The top-level cache is often split
into an instruction cache and a data cache to allow the processor to perform simultaneous accesses for
instructions and data. Cache memories were first used in the IBM mainframe computers in the 1960s.
Since 1985, cache memories have become a standard feature for virtually all microprocessors.
Cache memories exploit the principle of locality of reference. This principle dictates that some
memory locations are referenced more frequently than others, based on two program properties. Spatial
locality is the property that an access to a memory location increases the probability that nearby
memory locations will also be accessed. Spatial locality is predominantly based on sequential access to
program code and structured data. Temporal locality is the property that access to a memory location greatly
increases the probability that the same location will be accessed in the near future. Together, the two
properties ensure that most memory references will be satisfied by the cache memory.
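The two properties can be made concrete with a short, self-contained C loop; the array size is arbitrary and the example is purely illustrative.

#include <stdio.h>

#define N 1024

int main(void)
{
    static int a[N];    /* contiguous, zero-initialized array            */
    int sum = 0;        /* reused on every iteration: temporal locality  */

    for (int i = 0; i < N; i++) {
        /* a[i] and a[i + 1] are adjacent in memory, so accessing a[i]
         * makes the next iteration's access likely to fall in the same
         * cache block: spatial locality. */
        sum += a[i];
    }

    printf("sum = %d\n", sum);
    return 0;
}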
There are several different cache memory designs: direct-mapped, fully associative, and set-associative.
Figure 11.6 illustrates the two basic schemes of cache memory: direct-mapped and set-associative.
Direct-mapped cache, shown in Fig. 11.6(a), allows each memory block to have one place to reside
within a cache. Fully associative cache allows a block to be placed anywhere in the cache. Set-associative
cache, shown in Fig. 11.6(b), restricts a block to a limited set of places in the cache.
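The placement rule of a direct-mapped cache can be sketched by splitting an address into tag, index, and offset fields, as in the following C fragment; the block size and number of sets are assumed for illustration rather than taken from a specific design.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 32u     /* assumed bytes per cache block                   */
#define NUM_SETS   256u    /* assumed number of sets (one block per set here) */

int main(void)
{
    uint32_t addr = 0x0001A2C4u;             /* example address                */

    uint32_t offset = addr % BLOCK_SIZE;     /* byte within the block          */
    uint32_t block  = addr / BLOCK_SIZE;     /* memory block number            */
    uint32_t index  = block % NUM_SETS;      /* the one set the block may use  */
    uint32_t tag    = block / NUM_SETS;      /* identifies which block is held */

    printf("tag=%u index=%u offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}

In a two-way set-associative design, the same index selects a set of two blocks and the tag is compared against both; in a fully associative design there is no index field at all, and the tag is compared against every block in the cache.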
Cache misses are said to occur when the data requested does not reside in any of the possible cache
locations. Misses in caches can be classified into three categories: conflict, compulsory, and capacity.
Conflict misses are misses that would not occur for fully associative caches with least recently used
(LRU) replacement. Compulsory misses are those incurred the first time a memory location is referenced.
Capacity misses occur when the cache is not large enough to hold the data between
references. Complete cache miss definitions are provided in Ref. 4.
Unlike memory system properties, the latency in cache memories is not fixed and depends on the
delay and frequency of cache misses. A performance metric that accounts for the penalty of cache
misses is effective latency. Effective latency depends on the two possible latencies: hit latency (LHIT),
the latency experienced for accessing data residing in the cache, and miss latency (LMISS), the
latency experienced when accessing data not residing in the cache. Effective latency also depends
on the hit rate (H), the percentage of memory accesses that are hits in the cache, and the miss rate (M
or 1–H), the percentage of memory accesses that miss in the cache. Effective latency in a cache system
is calculated as:
Effective latency = H × LHIT + (1 − H) × LMISS (11.5)
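A brief sketch of Eq. 11.5 in C, using assumed values for the hit rate and the two latencies:

#include <stdio.h>

/* Sketch of Eq. 11.5.  The hit rate and latencies are assumed values
 * chosen only to make the arithmetic concrete. */
int main(void)
{
    double hit_rate = 0.95;    /* H                           */
    double l_hit    = 2.0;     /* LHIT, in processor cycles   */
    double l_miss   = 50.0;    /* LMISS, in processor cycles  */

    double effective = hit_rate * l_hit + (1.0 - hit_rate) * l_miss;

    printf("effective latency = %.2f cycles\n", effective);   /* 4.40 cycles */
    return 0;
}

Even with the assumed 95% hit rate, misses more than double the effective latency relative to the hit latency, which is why reducing the miss rate matters so much.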
In addition to the base cache design and size issues, there are several other cache parameters that affect
the overall cache performance and miss rate in a system. The main memory update method indicates
when the main memory will be updated by store operations. In write-through cache, each write is
immediately reflected to the main memory. In write-back cache, the writes are reflected to the main
memory only when the respective cache block is replaced. Cache block allocation is another parameter
and designates whether the cache block is allocated on writes or reads. Last, block replacement
algorithms for associative structures can be designed in various ways to extract additional cache
performance. These include least recently used (LRU), least frequently used (LFU), random, and first-in,
first-out (FIFO). These cache management strategies attempt to exploit the properties of locality.
Spatial locality is exploited by deciding which memory block is placed in cache, and temporal locality
is exploited by deciding which cache block is replaced. Traditionally, when a cache services a miss, it
blocks all new requests. However, a non-blocking cache can be designed to service multiple miss
requests simultaneously, thus alleviating delay in accessing memory data.
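As one concrete example of a replacement algorithm, the following C sketch implements LRU replacement for a single set of a set-associative cache; the associativity, data structure, and access trace are assumptions made only for illustration.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4   /* assumed associativity of the set */

/* One cache set.  'age' counts accesses since a block was last used;
 * the oldest valid block is the LRU victim. */
struct set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
    unsigned age[WAYS];
};

/* Returns true on a hit; on a miss the least recently used way is replaced. */
static bool access_set(struct set *s, uint32_t tag)
{
    int victim = 0;

    for (int w = 0; w < WAYS; w++)
        s->age[w]++;                              /* every block grows older  */

    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->age[w] = 0;                        /* hit: most recently used  */
            return true;
        }
        if (!s->valid[w])                         /* prefer an empty way...   */
            victim = w;
        else if (s->valid[victim] && s->age[w] > s->age[victim])
            victim = w;                           /* ...otherwise the oldest  */
    }

    s->tag[victim]   = tag;                       /* miss: fill the victim    */
    s->valid[victim] = true;
    s->age[victim]   = 0;
    return false;
}

int main(void)
{
    struct set s = {0};
    uint32_t trace[] = {1, 2, 3, 4, 1, 5, 1};     /* tag 5 evicts LRU tag 2   */

    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        printf("tag %u: %s\n", (unsigned)trace[i],
               access_set(&s, trace[i]) ? "hit" : "miss");
    return 0;
}

An LFU policy would instead keep a use count per block, FIFO would keep insertion order, and a random policy needs no bookkeeping at all, trading some hit rate for simpler hardware.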
In addition to the multiple levels of cache hierarchy, additional memory buffers can be used to
improve cache performance. Two such buffers are a streaming/prefetch buffer and a victim cache (Ref. 2).
Figure 11.7 illustrates the relation of the streaming buffer and victim cache to the primary cache of a
memory system. A streaming buffer is used as a prefetching mechanism for cache misses. When a cache
miss occurs, the streaming buffer begins prefetching successive lines starting at the miss target. A victim
cache is typically a small, fully associative cache loaded only with cache lines that are removed from the
primary cache. In the case of a miss in the primary cache, the victim cache may still hold the requested data.
The use of a victim cache can improve performance by reducing the number of conflict misses. Figure
11.7 illustrates how cache accesses are processed through the streaming buffer into the primary cache
on cache requests, and from the primary cache through the victim cache to the secondary level of
memory on cache misses.
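The access flow of Fig. 11.7 can be summarized with the C sketch below; the three lookup routines are stand-in stubs (assumptions for the example), so the code conveys only the order in which the structures are consulted, not a real cache model.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in predicates: these arbitrary bit tests merely simulate hits and
 * misses so that the control flow below can be exercised. */
static bool in_primary_cache(uint32_t addr) { return (addr & 1u) == 0; }
static bool in_victim_cache(uint32_t addr)  { return (addr & 3u) == 1; }

static void start_stream_buffer(uint32_t addr)
{
    printf("  prefetching successive lines starting at 0x%x\n", (unsigned)addr);
}

/* Consult the primary cache first, then the victim cache, and only then
 * the next level of memory, starting the streaming buffer on a miss. */
static void access_memory(uint32_t addr)
{
    if (in_primary_cache(addr)) {
        printf("0x%x: primary cache hit\n", (unsigned)addr);
    } else if (in_victim_cache(addr)) {
        printf("0x%x: found in victim cache (conflict miss avoided)\n", (unsigned)addr);
    } else {
        printf("0x%x: miss, fetched from the next level\n", (unsigned)addr);
        start_stream_buffer(addr);
    }
}

int main(void)
{
    access_memory(0x1000);
    access_memory(0x1001);
    access_memory(0x1003);
    return 0;
}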
Overall, cache memory is constructed to hold the most important portions of memory. Techniques
using either hardware or software can be used to select which portions of main memory to store in
cache. However, cache performance is strongly influenced by program behavior and numerous hardware
design alternatives.
FIGURE 11.6 Cache memory: (a) direct-mapped design, (b) two-way set-associative design.
Virtual Memory
Cache memory illustrated the principle that the memory address of data can be separate from a particular
storage location. Similar address abstractions exist in the two-level memory hierarchy of main memory and
disk storage. An address generated by a program is called a virtual address, which needs to be translated into
a physical address or location in main memory. Virtual memory management is a mechanism which provides
the programmers with a simple, uniform method to access both main and secondary memories. With
virtual memory management, the programmers are given a virtual space to hold all the instructions and
data. The virtual space is organized as a linear array of locations. Each location has an address for convenient access. Instructions and data have to be stored somewhere in the real system; these virtual space
locations must correspond to some physical locations in the main and secondary memory. Virtual memory
management assigns (or maps) the virtual space locations into the main and secondary memory locations.
The programmers are not concerned with this mapping.
The most popular memory management scheme today is demand paging virtual memory management,
where each virtual space is divided into pages indexed by the page number (PN). Each page consists
of several consecutive locations in the virtual space indexed by the page index (PI). The number of
locations in each page is an important system design parameter called page size. Page size is usually
defined as a power of two so that the virtual space can be divided into an integer number of pages.
Pages are the basic unit of virtual memory management. If any location in a page is assigned to the main
memory, the other locations in that page are also assigned to the main memory. This reduces the size of
the mapping information.
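In code, the page number and page index are obtained from a virtual address with a division and a remainder, which reduce to a shift and a mask when the page size is a power of two; the 4-KB page size below is an assumed value.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u    /* assumed page size (a power of two) */

int main(void)
{
    uint32_t va = 0x0003A7C4u;          /* example virtual address             */

    uint32_t pn = va / PAGE_SIZE;       /* page number: which page             */
    uint32_t pi = va % PAGE_SIZE;       /* page index: location within a page  */

    printf("PN = %u, PI = 0x%x\n", (unsigned)pn, (unsigned)pi);
    return 0;
}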
The part of the secondary memory to accommodate pages of the virtual space is called the swap
space. Both the main memory and the swap space are divided into page frames. Each page frame can
host a page of the virtual space. If a page is mapped into the main memory, it is also hosted by a page
frame in the main memory. The mapping record in the virtual memory management keeps track of the
association between pages and page frames.
When a virtual space location is requested, the virtual memory management looks up the mapping
record. If the mapping record shows that the page containing the requested virtual space location is in
main memory, the management performs the access without any further complication. Otherwise, a
secondary memory access has to be performed. Accessing the secondary memory is usually a complicated
task and is usually performed as an operating system service. In order to access a piece of information
stored in the secondary memory, an operating system service usually has to be requested to transfer the
information into the main memory. This also applies to virtual memory management. When a page is
mapped into the secondary memory, the virtual memory management has to request an operating system
service to transfer the requested virtual space location into the main memory, update its mapping
record, and then perform the access. The operating system service thus performed is called the page
fault handler.
FIGURE 11.7 Advanced cache memory system.
The core process of virtual memory management is a memory access algorithm. A one-level virtual
address translation algorithm is illustrated in Fig. 11.8. At the start of the translation, the memory access
algorithm receives a virtual address in a memory address register (MAR), looks up the mapping record,
requests an operating system service to transfer the required page if necessary, and performs the main
memory access. The mapping is recorded in a data structure called the Page Table located in main
memory at a designated location marked by the page table base register (PTBR).
The page number, used as an index into the page table, is combined with the PTBR to form the physical
address (PAPTE) of the respective page table entry (PTE). Each PTE keeps track of the mapping of a page
in the virtual space. It includes two fields:
a hit/miss bit and a page frame number. If the hit/miss (H/M) bit is set (hit), the corresponding page
is in main memory. In this case, the page frame hosting the requested page is pointed to by the page
frame number (PFN). The final physical address (PAD) of the requested data is then formed using the
PFN and PI. The data is returned and placed in the memory buffer register (MBR) and the processor
is informed of the completed memory access. Otherwise (miss), a secondary memory access has to be
performed. In this case, the page frame number should be ignored. The fault handler has to be invoked
to access the secondary memory. The hardware component that performs the address translation
algorithm is called the memory management unit (MMU).
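The translation algorithm can be sketched in a few lines of C; the page table layout, page size, and fault handler below are simplified assumptions, since a real MMU performs these steps in hardware and the operating system manages the page frames.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u    /* assumed page size                      */
#define NUM_PAGES 16u      /* assumed (tiny) virtual space, in pages */

/* One page table entry: the H/M bit and the page frame number. */
struct pte {
    unsigned hit : 1;      /* set when the page is in main memory    */
    unsigned pfn : 20;     /* page frame number, valid only on a hit */
};

static struct pte page_table[NUM_PAGES];   /* located via the PTBR in a real system */

/* Stand-in for the page fault handler: bring the page into some free
 * frame and update its page table entry. */
static void page_fault_handler(uint32_t pn)
{
    page_table[pn].pfn = pn + 100;          /* arbitrary frame for the sketch */
    page_table[pn].hit = 1;
}

static uint32_t translate(uint32_t va)
{
    uint32_t pn = va / PAGE_SIZE;           /* page number indexes the page table     */
    uint32_t pi = va % PAGE_SIZE;           /* page index is unchanged by translation */

    if (!page_table[pn].hit)                /* miss: invoke the fault handler */
        page_fault_handler(pn);

    return page_table[pn].pfn * PAGE_SIZE + pi;   /* PAD formed from PFN and PI */
}

int main(void)
{
    uint32_t va = 0x2ABCu;                  /* example virtual address */
    printf("VA 0x%x -> PA 0x%x\n", (unsigned)va, (unsigned)translate(va));
    return 0;
}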
The complexity of the algorithm depends on the mapping structure. A very simple mapping structure
is used in this section to focus on the basic principles of the memory access algorithms. However, more
complex two-level schemes are often used due to the size of the virtual address space. The page table
for a large virtual space may itself be quite large relative to typical main memory sizes. As such, it becomes
necessary to map portions of the page table using a second page table. In such designs, only the second-level
page table is stored in a reserved region of main memory, while the first page table is mapped just like
the data in the virtual spaces. Such designs are also needed in a multiprogramming
system, where multiple processes are active at the same time. Each process has its own virtual
space and therefore its own page table. As a result, these systems need to keep multiple page tables at
the same time. It usually takes too much main memory to accommodate all the active page tables. Again,
the natural solution to this problem is to provide other levels of mapping.
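A short C sketch of the index arithmetic implied by such a two-level organization; the page size and page table entry size are assumed values, chosen only to make the computation concrete.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE     4096u                      /* assumed page size           */
#define PTE_SIZE      4u                         /* assumed bytes per entry     */
#define PTES_PER_PAGE (PAGE_SIZE / PTE_SIZE)     /* entries per page table page */

int main(void)
{
    uint32_t va = 0xC0A81F40u;                   /* example virtual address      */
    uint32_t pn = va / PAGE_SIZE;                /* page number of the data page */

    /* The PTE for pn lives somewhere in the (paged) first page table.  The
     * second-level page table, kept in a reserved region of main memory,
     * locates the page of the first page table that holds this PTE. */
    uint32_t second_level_index = pn / PTES_PER_PAGE;   /* which page of the first page table */
    uint32_t entry_within_page  = pn % PTES_PER_PAGE;   /* which PTE within that page         */

    printf("PN=%u -> second-level index %u, entry %u within that page\n",
           (unsigned)pn, (unsigned)second_level_index, (unsigned)entry_within_page);
    return 0;
}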
FIGURE 11.8 Virtual memory translation.