Memory, Microprocessor, and ASIC, Part 9
13 Logic Synthesis for Field Programmable Gate Array (FPGA) Technology
13.1 Introduction
13.2 FPGA Structures
Look-up Table (LUT)-Based CLB • PLA-Based CLB • Multiplexer-Based CLB • Interconnect
13.3 Logic Synthesis
Technology Independent Optimization • Technology Mapping
13.4 Look-up Table (LUT) Synthesis
Library-Based Mapping • Direct Approaches
13.5 Chortle
Tree Mapping Algorithm • Example • Chortle-crf • Chortle-d
13.6 Two-Step Approaches
First Step: Decomposition • Second Step: Node Elimination • MIS-pga 2: A Framework for TLU-Logic Optimization
13.7 Conclusion
13.1 Introduction
Field Programmable Gate Arrays (FPGAs) enable rapid development and implementation of complex
digital circuits. FPGA devices can be reprogrammed and reused, allowing the same hardware to be
employed for entirely new designs or for new iterations of the same design. While many traditional
IC logic synthesis methods apply, FPGA circuits have special requirements that affect synthesis.
The FPGA device consists of a number of configurable logic blocks (CLBs) interconnected by a
routing matrix. Pass transistors are used in the routing matrix to connect segments of metal lines. There
are three major types of CLBs: those based on PLAs, those based on multiplexers, and those based on
table lookup (TLU) functions.
Automated logic synthesis tools are used to optimize the mapping of the Boolean network to the
FPGA device. FPGA synthesis is an extension to the general problem of multi-level logic synthesis.
FPGA logic synthesis is usually solved in two phases. The technology-independent phase uses a general
multi-level logic optimization tool (such as Berkeley’s MIS) to reduce the complexity of the Boolean
network. Next, a technology-dependent optimization phase is used to optimize the logic for the particular
type of device. In the case of the TLU-based FPGA, each CLB can implement an arbitrary logic
function of a limited number of variables. FPGA optimization algorithms aim to minimize the number
of CLBs used, the logic depth, and the routing density.
John W. Lockwood
Washington University
0–8493–1737–1/03/$0.00+$1.50
© 2003 by CRC Press LLC
The Chortle algorithm is a direct method that uses dynamic programming to map the logic into
TLU-based CLBs. It converts the Boolean network into a forest of directed acyclic graphs (DAGs);
then it evaluates and records the optimal subsolutions to the logic mapping problem as it traverses the
DAG. The two-step algorithms operate by first decomposing the nodes, and then performing a node
elimination. Later sections of this chapter discuss in detail the Xmap, Hydra, and MIS-pga algorithms.
FPGA devices are fabricated using the same sub-micron geometries as other silicon devices. As
such, the devices benefit from the rapid advances in device technology. The overhead of the programming
bits, general function generators, and general routing structures, however, reduces the total amount of
logic available to the end user.
13.2 FPGA Structures
An FPGA consists of reconfigurable logic elements, flip-flops, and a reprogrammable interconnect
structure. The logic elements are typically arranged in a matrix. The interconnect is arranged as a mesh
of variable-length metal wires and pass transistors to interconnect the logic elements. The logic elements
are programmed by downloading binary control information from an external ROM, a built-in EPROM,
or a host processor. After download, the control information is stored on the device and used to
determine the function of the logic elements and the state of the pass transistors. Unlike a PLA, the
FPGA can be used for multi-level logic functions.
The granularity of an FPGA refers to the complexity of the individual logic elements. A fine-grain
logic block appears to the user to be much like a standard mask-programmable gate array. Each logic
block consists of only a few transistors, and is limited to implementing only simple functions of a few
variables. A coarse-grain logic block (such as those from Xilinx, Actel, QuickLogic, and Altera) provides
more general functions of a larger number of variables. Each Xilinx 4000-series logic block, for example,
can implement any Boolean function of five variables, or two Boolean functions of four variables.
Coarse-grain logic blocks generally provide better performance than fine-grain logic blocks, because
coarse-grained devices combine multiple logic functions into one logic block and therefore require less
space for interconnect and routing. In particular, it has been shown that a four-input logic block uses
the minimal chip area for a large variety of benchmark circuits.1 The cost of a few underutilized logic
blocks is outweighed by the area that a larger number of fine-grained logic blocks, with their larger
interconnect matrix and additional pass transistors, would require. This chapter focuses on logic
synthesis for coarse-grained logic elements.
A coarse-grained configurable logic block (CLB) can be implemented using PLA-based AND/
OR elements, multiplexers, or SRAM-based look-up table (LUT) elements. These configurations are
described in detail below.
13.2.1 Look-up Table (LUT)-Based CLB
The basic unit of look-up table (LUT)-based FPGAs is the configurable logic block (CLB), implemented
as an SRAM of size 2^n × 1. Each CLB can implement any arbitrary logic function of n variables, for a
total of 2^(2^n) possible functions.
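The following is a minimal sketch of the idea, not a model of any particular device: an n-input LUT is treated as a 2^n × 1 bit memory that is addressed by its inputs. The class name `LUT`, the helper `count_functions`, and the choice of input 0 as the least significant address bit are assumptions made for illustration.

```python
class LUT:
    """An n-input look-up table modeled as a 2**n x 1 bit memory."""

    def __init__(self, n, bits):
        if len(bits) != 2 ** n:
            raise ValueError("an n-input LUT needs exactly 2**n configuration bits")
        self.n = n
        self.bits = list(bits)

    @classmethod
    def from_function(cls, n, func):
        # Fill the memory by evaluating func on every input combination.
        # Assumption: input 0 is the least significant address bit.
        bits = []
        for address in range(2 ** n):
            inputs = [(address >> i) & 1 for i in range(n)]
            bits.append(int(bool(func(*inputs))))
        return cls(n, bits)

    def __call__(self, *inputs):
        # Evaluating the logic function is a single memory read.
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]


def count_functions(n):
    # Each of the 2**n memory bits can be 0 or 1 independently.
    return 2 ** (2 ** n)


majority = LUT.from_function(3, lambda a, b, c: (a + b + c) >= 2)
xor4 = LUT.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because the stored bits are arbitrary, the same structure covers every function of n variables; only the configuration contents change, which is why `count_functions(4)` yields 65,536 distinct four-input functions.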
An example of an LUT-based FPGA is the Xilinx 4000-series FPGA, as illustrated in Fig. 13.1. Each
CLB has three LUT generators and two flip-flops.2 The first two LUTs implement any function of four
variables, while the third LUT implements any function of three variables. Separately, each CLB can
implement two functions of four variables. Combined, each CLB can implement any one function of
five variables, or some restricted functions of nine variables (such as AND, OR, XOR).
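One way to see why two four-input LUTs and one three-input LUT suffice for any five-variable function is Shannon expansion on the fifth variable. The sketch below assumes that the third LUT receives the two four-input LUT outputs plus the fifth variable; the function names (`make_lut`, `read_lut`, `map_five_input`, `evaluate_clb`) and the example function are hypothetical and are not taken from the Xilinx documentation.

```python
from itertools import product


def make_lut(n, func):
    """Return the 2**n configuration bits of an n-input LUT for func.

    The first argument of func is the most significant address bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n)]


def read_lut(table, *inputs):
    # Rebuild the address with the same bit order used by make_lut.
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def map_five_input(func):
    # Shannon expansion: func = e'.func(.., e=0) + e.func(.., e=1)
    f_lut = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 0))
    g_lut = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 1))
    # The three-input LUT acts as a 2:1 selector controlled by e.
    h_lut = make_lut(3, lambda f, g, e: g if e else f)
    return f_lut, g_lut, h_lut


def evaluate_clb(luts, a, b, c, d, e):
    f_lut, g_lut, h_lut = luts
    f = read_lut(f_lut, a, b, c, d)
    g = read_lut(g_lut, a, b, c, d)
    return read_lut(h_lut, f, g, e)


def example(a, b, c, d, e):
    # Arbitrary five-variable function used only to exercise the mapping.
    return (a & b) ^ (c | (d & e))


clb = map_five_input(example)
matches = all(
    evaluate_clb(clb, *v) == example(*v) for v in product((0, 1), repeat=5)
)
```

The same arrangement explains the restricted nine-variable case: if the two four-input LUTs use disjoint inputs, the third LUT can only combine their outputs with one more signal, so only decomposable functions such as wide AND, OR, or XOR fit.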
13.2.2 PLA-Based CLB
PLA-based FPGA devices evolved from the traditional PLDs. Each basic logic block is an AND-OR
block consisting of wide fan-in AND gates feeding a few-input OR gate. The advantage of this
structure is that many logic functions can be implemented using only a few levels of logic, due to the
large number of literals that can be used at each block. It is, however, difficult to make efficient use of
all inputs to all gates. Even so, the amount of wasted area is minimized by the high packing density of
the wired-AND gates.
To further improve the density, another type of logic block, called the logic expander, has been
introduced. It is a wide-input NAND gate whose output can be connected to the input of the
AND-OR block. While its delay is similar, the NAND block uses less area than the AND-OR block,
and thus increases the effective number of product terms available to a logic block.
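As a rough behavioral sketch of the two block types described above, the code below models an AND-OR block as a sum of product terms and a logic expander as a wide NAND whose output is fed back as one more literal. The representation of a product term as a dictionary of required signal values, and all signal names, are assumptions chosen for illustration.

```python
def wide_and(literals, signals):
    """AND of many literals; literals maps signal name -> required value."""
    return int(all(signals[name] == value for name, value in literals.items()))


def and_or_block(product_terms, signals):
    """Wide fan-in AND gates feeding one OR gate (sum of products)."""
    return int(any(wide_and(term, signals) for term in product_terms))


def logic_expander(literals, signals):
    """Wide-input NAND whose output can feed an AND-OR block."""
    return 1 - wide_and(literals, signals)


signals = {"a": 1, "b": 0, "c": 1, "d": 1}

# The expander computes NAND(a, c, d) and publishes it as signal "x".
signals["x"] = logic_expander({"a": 1, "c": 1, "d": 1}, signals)

# out = a.b + x'.b' ; the second term reuses the expander output.
out = and_or_block([{"a": 1, "b": 1}, {"x": 0, "b": 0}], signals)
```

Here the expander contributes the three-literal product a·c·d to the block without consuming one of its own AND gates, which is the sense in which it raises the effective number of product terms.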
13.2.3 Multiplexer-Based CLB
Multiplexer-based FPGAs utilize a multiplexer to implement different logic functions by connecting
each input to a constant or a signal.3 The ACT-1 logic block, for example, has three multiplexers and
one logic gate. Each block has eight inputs and one output, implementing:
Multiplexer-based FPGAs can provide a large degree of functionality for a relatively small number of
transistors. Multiplexer-based CLBs, however, place high demands on routing resources due to the large
number of inputs.
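To illustrate how tying inputs to constants or signals selects a function, here is a small sketch of a block with three 2:1 multiplexers and one OR gate. The arrangement (two multiplexers feeding a third whose select is the OR of two inputs) and the pin names are assumptions made for this example and should be checked against the vendor data sheet before being relied on.

```python
def mux2(select, d0, d1):
    """2:1 multiplexer: returns d1 when select is 1, otherwise d0."""
    return d1 if select else d0


def mux_block(a0, a1, sa, b0, b1, sb, s0, s1):
    """Eight-input, one-output block built from three muxes and an OR gate."""
    upper = mux2(sa, a0, a1)
    lower = mux2(sb, b0, b1)
    return mux2(s0 | s1, upper, lower)


def and2(x, y):
    # Tie a0 to 0 and the output select to 0: result is y when x is 1, else 0.
    return mux_block(a0=0, a1=y, sa=x, b0=0, b1=0, sb=0, s0=0, s1=0)


def xor2(x, y):
    # x selects between y (upper path) and not-y (lower path).
    return mux_block(a0=y, a1=y, sa=0, b0=1, b1=0, sb=y, s0=x, s1=0)


and_table = [and2(x, y) for x in (0, 1) for y in (0, 1)]
xor_table = [xor2(x, y) for x in (0, 1) for y in (0, 1)]
```

Even a two-input function occupies several of the eight pins with constants or repeated signals, which hints at why such blocks place high demands on routing resources.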
13.2.4 Interconnect
In all structures, a reprogrammable routing matrix interconnects
the configurable logic blocks. A portion of the routing matrix
for the Xilinx 4000-series FPGA, for example, is illustrated in
Fig. 13.2. Local interconnects are used to join adjacent CLBs.
Global routing modules are used to route signals across the chip.
The routing and placement issues for the FPGAs are
somewhat different from those of custom logic. For a large
fan-out node, for example, an optimal placement for the
elements for the fan-out would be along a single row or column,
where the routing could be done using a long line. For custom
FIGURE 13.1 Xilinx 4000-series CLB.
FIGURE 13.2 Xilinx routing matrix.