back to memory. The memory system is constructed of basic semiconductor DRAM units called
modules or banks.
There are several properties of memory, including speed, capacity, and cost, that play an important
role in the overall system performance. The speed of a memory system is the key performance parameter
in the design of the microprocessor system. The latency (L) of the memory is defined as the time delay
from when the processor first requests data from memory until the processor receives the data. Bandwidth
is defined as the rate at which information can be transferred from the memory system. Memory bandwidth
and latency are related to the number of outstanding requests (R) that the memory system can service:
R = L × Bandwidth (11.4)
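For illustration, the relationship in Eq. 11.4 can be evaluated with a few lines of C; the latency and bandwidth values below are assumed purely for the example and do not describe any particular system.

#include <stdio.h>

/* Sketch of Eq. 11.4: R = L x Bandwidth.  The numbers are assumed,
 * illustrative values, not measurements of a real memory system. */
int main(void)
{
    double latency_ns      = 100.0;   /* assumed memory latency L, in ns       */
    double requests_per_ns = 0.08;    /* assumed bandwidth, in requests per ns */

    /* Number of requests that must be in flight to sustain this bandwidth. */
    double outstanding = latency_ns * requests_per_ns;

    printf("outstanding requests R = %.1f\n", outstanding);   /* prints 8.0 */
    return 0;
}

In other words, to keep such a memory system fully utilized, roughly eight requests would have to be outstanding at any time.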
Bandwidth plays an important role in keeping the processor busy with work. However, technology
trade-offs to optimize latency and improve bandwidth often conflict with the need to increase the
capacity and reduce the cost of the memory system.
Cache Memory
Cache memory, or simply cache, is a small, fast memory constructed using semiconductor SRAM. In
modern computer systems, there is usually a hierarchy of cache memories. The top-level cache is
closest to the processor and the bottom level is closest to the main memory. Each higher level cache is
about 5 to 10 times faster than the next level. The purpose of a cache hierarchy is to satisfy most of the
processor memory accesses in one or a small number of clock cycles. The top-level cache is often split
into an instruction cache and a data cache to allow the processor to perform simultaneous accesses for
instructions and data. Cache memories were first used in the IBM mainframe computers in the 1960s.
Since 1985, cache memories have become a standard feature for virtually all microprocessors.
Cache memories exploit the principle of locality of reference. This principle dictates that some
memory locations are referenced more frequently than others, based on two program properties. Spatial
locality is the property that an access to a memory location increases the probability that nearby
memory locations will also be accessed. Spatial locality is predominantly based on sequential access to
program code and structured data. Temporal locality is the property that access to a memory location greatly
increases the probability that the same location will be accessed in the near future. Together, the two
properties ensure that most memory references will be satisfied by the cache memory.
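The two properties can be made concrete with a short, self-contained C loop; the array size is arbitrary and the example is purely illustrative.

#include <stdio.h>

#define N 1024

int main(void)
{
    static int a[N];    /* contiguous, zero-initialized array            */
    int sum = 0;        /* reused on every iteration: temporal locality  */

    for (int i = 0; i < N; i++) {
        /* a[i] and a[i + 1] are adjacent in memory, so accessing a[i]
         * makes the next iteration's access likely to fall in the same
         * cache block: spatial locality. */
        sum += a[i];
    }

    printf("sum = %d\n", sum);
    return 0;
}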
There are several different cache memory designs: direct-mapped, fully associative, and set-associative.
Figure 11.6 illustrates the two basic schemes of cache memory: direct-mapped and set-associative.
Direct-mapped cache, shown in Fig. 11.6(a), allows each memory block to have one place to reside
within a cache. Fully associative cache allows a block to be placed anywhere in the cache. Set-associative
cache, shown in Fig. 11.6(b), restricts a block to a limited set of places in the cache.
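The placement rule of a direct-mapped cache can be sketched by splitting an address into tag, index, and offset fields, as in the following C fragment; the block size and number of sets are assumed for illustration rather than taken from a specific design.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 32u     /* assumed bytes per cache block                   */
#define NUM_SETS   256u    /* assumed number of sets (one block per set here) */

int main(void)
{
    uint32_t addr = 0x0001A2C4u;             /* example address                */

    uint32_t offset = addr % BLOCK_SIZE;     /* byte within the block          */
    uint32_t block  = addr / BLOCK_SIZE;     /* memory block number            */
    uint32_t index  = block % NUM_SETS;      /* the one set the block may use  */
    uint32_t tag    = block / NUM_SETS;      /* identifies which block is held */

    printf("tag=%u index=%u offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}

In a two-way set-associative design, the same index selects a set of two blocks and the tag is compared against both; in a fully associative design there is no index field at all, and the tag is compared against every block in the cache.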
Cache misses are said to occur when the data requested does not reside in any of the possible cache
locations. Misses in caches can be classified into three categories: conflict, compulsory, and capacity.
Conflict misses are misses that would not occur for fully associative caches with least recently used
(LRU) replacement. Compulsory misses are those incurred the first time a memory location is referenced.
Capacity misses occur when the cache is not large enough to hold the data between
references. Complete cache miss definitions are provided in Ref. 4.
Unlike memory system properties, the latency in cache memories is not fixed and depends on the
delay and frequency of cache misses. A performance metric that accounts for the penalty of cache
misses is effective latency. Effective latency depends on the two possible latencies: hit latency (LHIT),
the latency experienced for accessing data residing in the cache, and miss latency (LMISS), the
latency experienced when accessing data not residing in the cache. Effective latency also depends
on the hit rate (H), the percentage of memory accesses that are hits in the cache, and the miss rate (M
or 1–H), the percentage of memory accesses that miss in the cache. Effective latency in a cache system
is calculated as:
Effective latency = H × LHIT + (1 − H) × LMISS (11.5)
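A brief sketch of Eq. 11.5 in C, using assumed values for the hit rate and the two latencies:

#include <stdio.h>

/* Sketch of Eq. 11.5.  The hit rate and latencies are assumed values
 * chosen only to make the arithmetic concrete. */
int main(void)
{
    double hit_rate = 0.95;    /* H                           */
    double l_hit    = 2.0;     /* LHIT, in processor cycles   */
    double l_miss   = 50.0;    /* LMISS, in processor cycles  */

    double effective = hit_rate * l_hit + (1.0 - hit_rate) * l_miss;

    printf("effective latency = %.2f cycles\n", effective);   /* 4.40 cycles */
    return 0;
}

Even with the assumed 95% hit rate, misses more than double the effective latency relative to the hit latency, which is why reducing the miss rate matters so much.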
In addition to the base cache design and size issues, there are several other cache parameters that affect
the overall cache performance and miss rate in a system. The main memory update method indicates
when the main memory will be updated by store operations. In write-through cache, each write is
immediately reflected to the main memory. In write-back cache, the writes are reflected to the main
memory only when the respective cache block is replaced. Cache block allocation is another parameter
and designates whether the cache block is allocated on writes or reads. Last, block replacement
algorithms for associative structures can be designed in various ways to extract additional cache
performance. These include least recently used (LRU), least frequently used (LFU), random, and first-in,
first-out (FIFO). These cache management strategies attempt to exploit the properties of locality.
Spatial locality is exploited by deciding which memory block is placed in cache, and temporal locality
is exploited by deciding which cache block is replaced. Traditionally, when a cache services a miss, it
blocks all new requests. However, a non-blocking cache can be designed to service multiple miss
requests simultaneously, thus alleviating delay in accessing memory data.
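As one concrete example of a replacement algorithm, the following C sketch implements LRU replacement for a single set of a set-associative cache; the associativity, data structure, and access trace are assumptions made only for illustration.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4   /* assumed associativity of the set */

/* One cache set.  'age' counts accesses since a block was last used;
 * the oldest valid block is the LRU victim. */
struct set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
    unsigned age[WAYS];
};

/* Returns true on a hit; on a miss the least recently used way is replaced. */
static bool access_set(struct set *s, uint32_t tag)
{
    int victim = 0;

    for (int w = 0; w < WAYS; w++)
        s->age[w]++;                              /* every block grows older  */

    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->age[w] = 0;                        /* hit: most recently used  */
            return true;
        }
        if (!s->valid[w])                         /* prefer an empty way...   */
            victim = w;
        else if (s->valid[victim] && s->age[w] > s->age[victim])
            victim = w;                           /* ...otherwise the oldest  */
    }

    s->tag[victim]   = tag;                       /* miss: fill the victim    */
    s->valid[victim] = true;
    s->age[victim]   = 0;
    return false;
}

int main(void)
{
    struct set s = {0};
    uint32_t trace[] = {1, 2, 3, 4, 1, 5, 1};     /* tag 5 evicts LRU tag 2   */

    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        printf("tag %u: %s\n", (unsigned)trace[i],
               access_set(&s, trace[i]) ? "hit" : "miss");
    return 0;
}

An LFU policy would instead keep a use count per block, FIFO would keep insertion order, and a random policy needs no bookkeeping at all, trading some hit rate for simpler hardware.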
In addition to the multiple levels of cache hierarchy, additional memory buffers can be used to
improve cache performance. Two such buffers are a streaming/prefetch buffer and a victim cache (Ref. 2).
Figure 11.7 illustrates the relation of the streaming buffer and victim cache to the primary cache of a
memory system. A streaming buffer is used as a prefetching mechanism for cache misses. When a cache
miss occurs, the streaming buffer begins prefetching successive lines starting at the miss target. A victim
cache is typically a small, fully associative cache loaded only with cache lines that are removed from the
primary cache. In the case of a miss in the primary cache, the victim cache may still hold the requested data.
The use of a victim cache can improve performance by reducing the number of conflict misses. Figure
11.7 illustrates how cache accesses are processed through the streaming buffer into the primary cache
on cache requests, and from the primary cache through the victim cache to the secondary level of
memory on cache misses.
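The access flow of Fig. 11.7 can be summarized with the C sketch below; the three lookup routines are stand-in stubs (assumptions for the example), so the code conveys only the order in which the structures are consulted, not a real cache model.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in predicates: these arbitrary bit tests merely simulate hits and
 * misses so that the control flow below can be exercised. */
static bool in_primary_cache(uint32_t addr) { return (addr & 1u) == 0; }
static bool in_victim_cache(uint32_t addr)  { return (addr & 3u) == 1; }

static void start_stream_buffer(uint32_t addr)
{
    printf("  prefetching successive lines starting at 0x%x\n", (unsigned)addr);
}

/* Consult the primary cache first, then the victim cache, and only then
 * the next level of memory, starting the streaming buffer on a miss. */
static void access_memory(uint32_t addr)
{
    if (in_primary_cache(addr)) {
        printf("0x%x: primary cache hit\n", (unsigned)addr);
    } else if (in_victim_cache(addr)) {
        printf("0x%x: found in victim cache (conflict miss avoided)\n", (unsigned)addr);
    } else {
        printf("0x%x: miss, fetched from the next level\n", (unsigned)addr);
        start_stream_buffer(addr);
    }
}

int main(void)
{
    access_memory(0x1000);
    access_memory(0x1001);
    access_memory(0x1003);
    return 0;
}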
Overall, cache memory is constructed to hold the most important portions of memory. Techniques
using either hardware or software can be used to select which portions of main memory to store in
cache. However, cache performance is strongly influenced by program behavior and numerous hardware
design alternatives.
FIGURE 11.6 Cache memory: (a) direct-mapped design, (b) two-way set-associative design.
Virtual Memory
Cache memory illustrated the principle that the memory address of data can be separate from a particular
storage location. Similar address abstractions exist in the two-level memory hierarchy of main memory and
disk storage. An address generated by a program is called a virtual address, which needs to be translated into
a physical address or location in main memory. Virtual memory management is a mechanism which provides
the programmers with a simple, uniform method to access both main and secondary memories. With
virtual memory management, the programmers are given a virtual space to hold all the instructions and
data. The virtual space is organized as a linear array of locations. Each location has an address for convenient access. Instructions and data have to be stored somewhere in the real system; these virtual space
locations must correspond to some physical locations in the main and secondary memory. Virtual memory
management assigns (or maps) the virtual space locations into the main and secondary memory locations.
The programmers are not concerned with this mapping.
The most popular memory management scheme today is demand paging virtual memory management,
where each virtual space is divided into pages indexed by the page number (PN). Each page consists
of several consecutive locations in the virtual space indexed by the page index (PI). The number of
locations in each page is an important system design parameter called page size. Page size is usually
defined as a power of two so that the virtual space can be divided into an integer number of pages.
Pages are the basic unit of virtual memory management. If any location in a page is assigned to the main
memory, the other locations in that page are also assigned to the main memory. This reduces the size of
the mapping information.
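In code, the page number and page index are obtained from a virtual address with a division and a remainder, which reduce to a shift and a mask when the page size is a power of two; the 4-KB page size below is an assumed value.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u    /* assumed page size (a power of two) */

int main(void)
{
    uint32_t va = 0x0003A7C4u;          /* example virtual address             */

    uint32_t pn = va / PAGE_SIZE;       /* page number: which page             */
    uint32_t pi = va % PAGE_SIZE;       /* page index: location within a page  */

    printf("PN = %u, PI = 0x%x\n", (unsigned)pn, (unsigned)pi);
    return 0;
}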
The part of the secondary memory to accommodate pages of the virtual space is called the swap
space. Both the main memory and the swap space are divided into page frames. Each page frame can
host a page of the virtual space. If a page is mapped into the main memory, it is also hosted by a page
frame in the main memory. The mapping record in the virtual memory management keeps track of the
association between pages and page frames.
When a virtual space location is requested, the virtual memory management looks up the mapping
record. If the mapping record shows that the page containing the requested virtual space location is in
main memory, the management performs the access without any further complication. Otherwise, a
secondary memory access has to be performed. Accessing the secondary memory is usually a complicated
task and is usually performed as an operating system service. In order to access a piece of information
stored in the secondary memory, an operating system service usually has to be requested to transfer the
information into the main memory. This also applies to virtual memory management. When a page is
mapped into the secondary memory, the virtual memory management has to request an operating system
service to transfer the requested virtual space location into the main memory, update its mapping
record, and then perform the access. The operating system service thus performed is called the page
fault handler.
FIGURE 11.7 Advanced cache memory system.
The core process of virtual memory management is a memory access algorithm. A one-level virtual
address translation algorithm is illustrated in Fig. 11.8. At the start of the translation, the memory access
algorithm receives a virtual address in a memory address register (MAR), looks up the mapping record,
requests an operating system service to transfer the required page if necessary, and performs the main
memory access. The mapping is recorded in a data structure called the Page Table located in main
memory at a designated location marked by the page table base register (PTBR).
The page number, used as an index into the page table, is combined with the PTBR to form the physical
address (PAPTE) of the respective page table entry (PTE). Each PTE keeps track of the mapping of a page
in the virtual space. It includes two fields:
a hit/miss bit and a page frame number. If the hit/miss (H/M) bit is set (hit), the corresponding page
is in main memory. In this case, the page frame hosting the requested page is pointed to by the page
frame number (PFN). The final physical address (PAD) of the requested data is then formed using the
PFN and PI. The data is returned and placed in the memory buffer register (MBR) and the processor
is informed of the completed memory access. Otherwise (miss), a secondary memory access has to be
performed. In this case, the page frame number should be ignored. The fault handler has to be invoked
to access the secondary memory. The hardware component that performs the address translation
algorithm is called the memory management unit (MMU).
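The translation algorithm can be sketched in a few lines of C; the page table layout, page size, and fault handler below are simplified assumptions, since a real MMU performs these steps in hardware and the operating system manages the page frames.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u    /* assumed page size                      */
#define NUM_PAGES 16u      /* assumed (tiny) virtual space, in pages */

/* One page table entry: the H/M bit and the page frame number. */
struct pte {
    unsigned hit : 1;      /* set when the page is in main memory    */
    unsigned pfn : 20;     /* page frame number, valid only on a hit */
};

static struct pte page_table[NUM_PAGES];   /* located via the PTBR in a real system */

/* Stand-in for the page fault handler: bring the page into some free
 * frame and update its page table entry. */
static void page_fault_handler(uint32_t pn)
{
    page_table[pn].pfn = pn + 100;          /* arbitrary frame for the sketch */
    page_table[pn].hit = 1;
}

static uint32_t translate(uint32_t va)
{
    uint32_t pn = va / PAGE_SIZE;           /* page number indexes the page table     */
    uint32_t pi = va % PAGE_SIZE;           /* page index is unchanged by translation */

    if (!page_table[pn].hit)                /* miss: invoke the fault handler */
        page_fault_handler(pn);

    return page_table[pn].pfn * PAGE_SIZE + pi;   /* PAD formed from PFN and PI */
}

int main(void)
{
    uint32_t va = 0x2ABCu;                  /* example virtual address */
    printf("VA 0x%x -> PA 0x%x\n", (unsigned)va, (unsigned)translate(va));
    return 0;
}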
The complexity of the algorithm depends on the mapping structure. A very simple mapping structure
is used in this section to focus on the basic principles of the memory access algorithms. However, more
complex two-level schemes are often used due to the size of the virtual address space. The page table
for a large virtual space may itself be quite large relative to typical main memory sizes. As such, it becomes
necessary to map portions of the page table using a second page table. In such designs, only the second-level
page table is stored in a reserved region of main memory, while the first page table is mapped just like
the data in the virtual spaces. Such designs are also needed in a multiprogramming
system, where multiple processes are active at the same time. Each process has its own virtual
space and therefore its own page table. As a result, these systems need to keep multiple page tables at
the same time. It usually takes too much main memory to accommodate all the active page tables. Again,
the natural solution to this problem is to provide other levels of mapping.
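A short C sketch of the index arithmetic implied by such a two-level organization; the page size and page table entry size are assumed values, chosen only to make the computation concrete.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE     4096u                      /* assumed page size           */
#define PTE_SIZE      4u                         /* assumed bytes per entry     */
#define PTES_PER_PAGE (PAGE_SIZE / PTE_SIZE)     /* entries per page table page */

int main(void)
{
    uint32_t va = 0xC0A81F40u;                   /* example virtual address      */
    uint32_t pn = va / PAGE_SIZE;                /* page number of the data page */

    /* The PTE for pn lives somewhere in the (paged) first page table.  The
     * second-level page table, kept in a reserved region of main memory,
     * locates the page of the first page table that holds this PTE. */
    uint32_t second_level_index = pn / PTES_PER_PAGE;   /* which page of the first page table */
    uint32_t entry_within_page  = pn % PTES_PER_PAGE;   /* which PTE within that page         */

    printf("PN=%u -> second-level index %u, entry %u within that page\n",
           (unsigned)pn, (unsigned)second_level_index, (unsigned)entry_within_page);
    return 0;
}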
FIGURE 11.8 Virtual memory translation.