MEMORY, MICROPROCESSOR, and ASIC, Part 9

Logic Synthesis for Field Programmable Gate Array (FPGA) Technology

John W. Lockwood
Washington University

0–8493–1737–1/03/$0.00+$ 1.50

13.1 Introduction
13.2 FPGA Structures
Look-up Table (LUT)-Based CLB • PLA-Based CLB • Multiplexer-Based CLB • Interconnect
13.3 Logic Synthesis
Technology Independent Optimization • Technology Mapping
13.4 Look-up Table (LUT) Synthesis
Library-Based Mapping • Direct Approaches
13.5 Chortle
Tree Mapping Algorithm • Example • Chortle-crf • Chortle-d
13.6 Two-Step Approaches
First Step: Decomposition • Second Step: Node Elimination • MIS-pga 2: A Framework for TLU-Logic Optimization
13.7 Conclusion

13.1 Introduction

Field Programmable Gate Arrays (FPGAs) enable rapid development and implementation of complex digital circuits. FPGA devices can be reprogrammed and reused, allowing the same hardware to be employed for entirely new designs or for new iterations of the same design. While many traditional IC logic synthesis methods apply, FPGA circuits have special requirements that affect synthesis.

The FPGA device consists of a number of configurable logic blocks (CLBs) interconnected by a routing matrix. Pass transistors are used in the routing matrix to connect segments of metal lines. There are three major types of CLBs: those based on PLAs, those based on multiplexers, and those based on table look-up (TLU) functions.

Automated logic synthesis tools are used to optimize the mapping of the Boolean network to the FPGA device. FPGA synthesis is an extension of the general problem of multi-level logic synthesis, and it is usually performed in two phases. The technology-independent phase uses a general multi-level logic optimization tool (such as Berkeley's MIS) to reduce the complexity of the Boolean network. Next, a technology-dependent optimization phase optimizes the logic for the particular type of device. In the case of the TLU-based FPGA, each CLB can implement an arbitrary logic function of a limited number of variables. FPGA optimization algorithms aim to minimize the number of CLBs used, the logic depth, and the routing density.

The Chortle algorithm is a direct method that uses dynamic programming to map the logic into TLU-based CLBs. It converts the Boolean network into a forest of directed acyclic graphs (DAGs), then evaluates and records the optimal subsolutions to the logic-mapping problem as it traverses each DAG. The two-step algorithms operate by first decomposing the nodes and then performing node elimination. Later sections of this chapter discuss the Xmap, Hydra, and MIS-pga algorithms in detail.
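Chortle's tree-mapping step works on fan-out-free pieces of the network. A minimal sketch of that partitioning, assuming the usual convention of cutting the network at every gate that drives a primary output or more than one gate, is shown below. The dictionary-of-fan-ins representation and the name `partition_into_forest` are hypothetical choices for illustration; the dynamic-programming mapping that Chortle then applies to each tree is not shown.

```python
from collections import defaultdict


def partition_into_forest(fanins: dict[str, list[str]],
                          outputs: set[str]) -> dict[str, list[str]]:
    """Split a Boolean network into fan-out-free trees (illustrative sketch).

    fanins maps each gate name to the names of its fan-in signals; any name
    that is not a key of fanins is treated as a primary input. A gate becomes
    the root of its own tree if it is a primary output or if it does not feed
    exactly one other gate. Returns a mapping root -> sorted gates in its tree.
    """
    fanout_count: dict[str, int] = defaultdict(int)
    for ins in fanins.values():
        for signal in ins:
            fanout_count[signal] += 1

    roots = {g for g in fanins if g in outputs or fanout_count[g] != 1}

    forest: dict[str, list[str]] = {}
    for root in sorted(roots):
        members, stack = [], [root]
        while stack:
            gate = stack.pop()
            members.append(gate)
            for signal in fanins[gate]:
                # Descend only into internal gates that are not roots;
                # a root's output enters this tree as a leaf input.
                if signal in fanins and signal not in roots:
                    stack.append(signal)
        forest[root] = sorted(members)
    return forest


# n1 feeds both n2 and n3, so it is cut off as a separate tree.
network = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n1", "d"],
    "f": ["n2", "n3"],
}
forest = partition_into_forest(network, outputs={"f"})
```

Here `forest` maps `"f"` to `["f", "n2", "n3"]` and `"n1"` to `["n1"]`: the shared gate n1 becomes its own tree, so each piece can be mapped independently.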

FPGA devices are fabricated using the same sub-micron geometries as other silicon devices. As such, the devices benefit from rapid advances in device technology. The overhead of the programming bits, general function generators, and general routing structures, however, reduces the total amount of logic available to the end user.

13.2 FPGA Structures

An FPGA consists of reconfigurable logic elements, flip-flops, and a reprogrammable interconnect structure. The logic elements are typically arranged in a matrix. The interconnect is arranged as a mesh of variable-length metal wires and pass transistors that connect the logic elements. The logic elements are programmed by downloading binary control information from an external ROM, a built-in EPROM, or a host processor. After the download, the control information is stored on the device and determines the function of the logic elements and the state of the pass transistors. Unlike a PLA, an FPGA can implement multi-level logic functions.

The granularity of an FPGA refers to the complexity of its individual logic elements. A fine-grain logic block appears to the user much like a standard mask-programmable gate array: each block consists of only a few transistors and can implement only simple functions of a few variables. A coarse-grain logic block (such as those from Xilinx, Actel, QuickLogic, and Altera) provides more general functions of a larger number of variables. Each Xilinx 4000-series logic block, for example, can implement any Boolean function of five variables, or two Boolean functions of four variables.

Coarse-grain logic blocks have generally been found to provide better performance than fine-grain logic blocks, because combining multiple logic functions into one block reduces the space required for interconnect and routing. In particular, it has been shown that a four-input logic block uses the minimal chip area for a large variety of benchmark circuits.1 The expense of a few extra underutilized logic blocks is outweighed by the area required for the larger number of fine-grained logic blocks and their associated larger interconnect matrix and pass transistors. This chapter focuses on logic synthesis for coarse-grained logic elements.

A coarse-grained configurable logic block (CLB) can be implemented using PLA-based AND/OR elements, multiplexers, or SRAM-based look-up table (LUT) elements. These configurations are described in detail below.

13.2.1 Look-up Table (LUT)-Based CLB

The basic unit of look-up table (LUT)-based FPGAs is the configurable logic block (CLB), implemented as an SRAM of size $2^n \times 1$. Each CLB can implement any logic function of $n$ variables, for a total of $2^{2^n}$ functions.
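To make the $2^n \times 1$ memory view concrete, the sketch below models an $n$-input LUT as a list of $2^n$ configuration bits addressed by the input values. The `LUT` class and the `num_functions` helper are hypothetical names for illustration, not part of any vendor tool flow.

```python
from itertools import product
from typing import Callable


class LUT:
    """Illustrative model of an n-input LUT as a 2**n x 1 memory."""

    def __init__(self, n: int, truth_table: list[int]):
        if len(truth_table) != 2 ** n:
            raise ValueError("truth table must have 2**n entries")
        self.n = n
        self.bits = [b & 1 for b in truth_table]

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        # The inputs form the memory address; the first input is the MSB.
        addr = 0
        for x in inputs:
            addr = (addr << 1) | (x & 1)
        return self.bits[addr]

    @classmethod
    def from_function(cls, n: int, fn: Callable[..., int]) -> "LUT":
        # "Programming" the LUT: store fn's value for every input combination,
        # in the same address order used by __call__.
        return cls(n, [fn(*xs) & 1 for xs in product((0, 1), repeat=n)])


def num_functions(n: int) -> int:
    """Number of distinct Boolean functions an n-input LUT can hold."""
    return 2 ** (2 ** n)


maj = LUT.from_function(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
```

Programming the LUT amounts to writing its truth table: `maj` stores eight bits, and `num_functions(4)` gives 65,536, the number of distinct functions a four-input LUT can realize.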

An example of an LUT-based FPGA is the Xilinx 4000-series FPGA, illustrated in Fig. 13.1. Each CLB has three LUT generators and two flip-flops.2 The first two LUTs each implement any function of four variables, while the third LUT implements any function of three variables. Used separately, the two four-input LUTs allow each CLB to implement two functions of four variables. Combined, the three LUTs allow each CLB to implement any single function of five variables, or some restricted functions of nine variables (such as AND, OR, and XOR).
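The five-input and nine-input cases can be sketched with a simplified CLB model: two four-input LUTs (F and G) feeding a three-input LUT (H) whose third input, H1, comes directly from outside the block. The naming loosely follows the 4000-series function generators, but the wiring here is an assumption and the flip-flops are ignored. Any five-input function follows from the Shannon expansion $f = \bar{e}\,f|_{e=0} + e\,f|_{e=1}$, with F and G holding the two cofactors and H acting as a multiplexer.

```python
from itertools import product
from typing import Callable


def make_lut(n: int, fn: Callable[..., int]) -> Callable[..., int]:
    """Return an evaluator for an n-input LUT programmed with fn."""
    table = [fn(*xs) & 1 for xs in product((0, 1), repeat=n)]

    def lut(*xs: int) -> int:
        addr = 0
        for x in xs:
            addr = (addr << 1) | (x & 1)
        return table[addr]

    return lut


def clb_five_input(f5: Callable[..., int]) -> Callable[..., int]:
    """Configure the simplified CLB to compute an arbitrary 5-input f5."""
    F = make_lut(4, lambda a, b, c, d: f5(a, b, c, d, 0))  # cofactor e = 0
    G = make_lut(4, lambda a, b, c, d: f5(a, b, c, d, 1))  # cofactor e = 1
    H = make_lut(3, lambda f, g, h1: g if h1 else f)       # 2:1 multiplexer
    return lambda a, b, c, d, e: H(F(a, b, c, d), G(a, b, c, d), e)


def clb_and9() -> Callable[..., int]:
    """A restricted 9-input function: AND of all nine inputs."""
    F = make_lut(4, lambda a, b, c, d: a & b & c & d)
    G = make_lut(4, lambda a, b, c, d: a & b & c & d)
    H = make_lut(3, lambda f, g, h1: f & g & h1)
    return lambda *x: H(F(*x[0:4]), G(*x[4:8]), x[8])


def example(a: int, b: int, c: int, d: int, e: int) -> int:
    return ((a & b) | (c ^ d)) ^ (e & a)


clb = clb_five_input(example)
and9 = clb_and9()
```

The nine-input case works only because AND (like OR and XOR) can be split into two four-input pieces whose results H combines with the ninth input; a general nine-input function cannot be decomposed this way.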


13.2.2 PLA-Based CLB

PLA-based FPGA devices evolved from traditional PLDs. Each basic logic block is an AND-OR block consisting of wide fan-in AND gates feeding a few-input OR gate. The advantage of this structure is that many logic functions can be implemented using only a few levels of logic, due to the large number of literals that can be used in each block. It is, however, difficult to make efficient use of all inputs to all gates. Even so, the amount of wasted area is minimized by the high packing density of the wired-AND gates.
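An AND-OR block can be sketched by writing each AND gate as a cube over the block's inputs: '1' for a true literal, '0' for a complemented literal, and '-' for an input that the gate does not use. This cube notation and the helper names are illustrative assumptions; the `utilization` helper shows how unused AND-gate inputs appear as wasted capacity.

```python
def and_or_block(cubes: list[str], inputs: list[int]) -> int:
    """Evaluate an AND-OR block: the OR of the AND terms given as cubes."""

    def and_term(cube: str) -> bool:
        if len(cube) != len(inputs):
            raise ValueError("cube width must match the number of inputs")
        # '-' means this input is not connected to the AND gate.
        return all(c == '-' or int(c) == x for c, x in zip(cube, inputs))

    return int(any(and_term(cube) for cube in cubes))


def utilization(cubes: list[str]) -> float:
    """Fraction of AND-gate inputs that actually carry a literal."""
    used = sum(ch != '-' for cube in cubes for ch in cube)
    return used / sum(len(cube) for cube in cubes)


# f = a·b'·c + a'·d over the inputs (a, b, c, d)
f_cubes = ["101-", "0--1"]
```

For `f_cubes`, only five of the eight AND-gate inputs carry literals, so `utilization(f_cubes)` is 0.625; the remaining inputs are the kind of unused capacity the paragraph describes.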

To further improve density, another type of logic block, called the logic expander, has been introduced. It is a wide-input NAND gate whose output can be connected to an input of the AND-OR block. While its delay is similar, the NAND block uses less area than the AND-OR block, and thus increases the effective number of product terms available to a logic block.
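The gain in effective product terms follows from De Morgan's law: feeding a NAND expander output into one AND term gives $x \cdot \overline{abc} = x\bar{a} + x\bar{b} + x\bar{c}$, so one product term plus one expander does the work of three product terms. The exhaustive check below is a sketch of that identity, assuming the expander output drives a single AND-term input; the function names are hypothetical.

```python
from itertools import product


def nand(*xs: int) -> int:
    """Wide-input NAND gate, as used by the logic expander."""
    return int(not all(xs))


def with_expander(x: int, a: int, b: int, c: int) -> int:
    # One AND term whose inputs are x and the expander output NAND(a, b, c).
    return x & nand(a, b, c)


def without_expander(x: int, a: int, b: int, c: int) -> int:
    # The same function written as a sum of three ordinary product terms.
    return int((x and not a) or (x and not b) or (x and not c))


equivalent = all(
    with_expander(*v) == without_expander(*v) for v in product((0, 1), repeat=4)
)
```

`equivalent` evaluates to `True`: over all 16 input combinations, the single expander-assisted term matches the three-term sum of products.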

13.2.3 Multiplexer-Based CLB

Multiplexer-based FPGAs use multiplexers to implement different logic functions by connecting each input to a constant or a signal.3 The ACT-1 logic block, for example, has three multiplexers and one logic gate; each block has eight inputs and one output.

Multiplexer-based FPGAs can provide a large degree of functionality for a relatively small number of transistors. Multiplexer-based CLBs, however, place high demands on routing resources due to their large number of inputs.
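The sketch below assumes an ACT-1-style arrangement in which two 2:1 multiplexers feed a third multiplexer whose select is driven by an OR gate, which matches the three multiplexers, one gate, and eight inputs described above. The input names `w, x, y, z, sa, sb, s0, s1` and the configuration helpers are hypothetical; the point is that AND, OR, and XOR all come from tying inputs to constants or signals.

```python
from itertools import product


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def act1_like(w: int, x: int, y: int, z: int,
              sa: int, sb: int, s0: int, s1: int) -> int:
    """Assumed ACT-1-style module: eight inputs, one output."""
    top = mux2(sa, w, x)
    bottom = mux2(sb, y, z)
    return mux2(s0 | s1, top, bottom)  # OR gate drives the final select


# Each configuration ties every module input to a constant or to a or b.
def and2(a: int, b: int) -> int:
    return act1_like(w=0, x=b, y=0, z=0, sa=a, sb=0, s0=0, s1=0)


def or2(a: int, b: int) -> int:
    return act1_like(w=0, x=0, y=1, z=1, sa=0, sb=0, s0=a, s1=b)


def xor2(a: int, b: int) -> int:
    # top = b, bottom = NOT b, final select = a
    return act1_like(w=0, x=1, y=1, z=0, sa=b, sb=b, s0=a, s1=0)
```

Because no extra gates are needed to obtain these functions, the module illustrates the functionality-per-transistor argument; the cost shows up instead in the eight inputs that must each be routed to a constant or a signal.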

13.2.4 Interconnect

In all structures, a reprogrammable routing matrix interconnects the configurable logic blocks. A portion of the routing matrix for the Xilinx 4000-series FPGA, for example, is illustrated in Fig. 13.2. Local interconnects join adjacent CLBs, while global routing modules route signals across the chip.

The routing and placement issues for FPGAs are somewhat different from those of custom logic. For a large fan-out node, for example, an optimal placement of the fan-out elements would be along a single row or column, where the routing could be done using a long line. For custom

FIGURE 13.1 Xilinx 4000-series CLB.

FIGURE 13.2 Xilinx routing matrix.
