Hardware Acceleration of EDA Algorithms: Custom ICs, FPGAs and GPUs

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Hardware Acceleration of EDA Algorithms

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Kanupriya Gulati · Sunil P. Khatri

Hardware Acceleration

of EDA Algorithms

Custom ICs, FPGAs and GPUs

123

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Kanupriya Gulati

109 Branchwood Trl

Coppell TX 75019

USA

[email protected]

Sunil P. Khatri

Department of Electrical & Computer

Engineering

Texas A & M University

College Station TX

77843-3128

214 Zachry Engineering Center

USA

[email protected]

ISBN 978-1-4419-0943-5 e-ISBN 978-1-4419-0944-2

DOI 10.1007/978-1-4419-0944-2

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010920238

c Springer Science+Business Media, LLC 2010

permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,

NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in

connection with any form of information storage and retrieval, electronic adaptation, computer

software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if

they are not identified as such, is not to be taken as an expression of opinion as to whether or not

they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

To our parents and our teachers

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Foreword

Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale

integration (VLSI) technology. This is a significant problem for electronic design

automation (EDA) applications, since the design complexity of VLSI integrated

circuits (ICs) is continuously growing. In this research monograph, we evaluate

custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as

platforms for accelerating EDA algorithms, instead of the general-purpose singlethreaded CPU. We study applications which are used in key time-consuming steps

of the VLSI design flow. Further, these applications also have different degrees of

inherent parallelism in them. We study both control-dominated EDA applications

and control plus data parallel EDA applications. We accelerate these applications

on these different hardware platforms. We also present an automated approach for

accelerating certain uniprocessor applications on a graphics processor.

This monograph compares custom ICs, FPGAs, and graphics processing units

(GPUs) as potential platforms to accelerate EDA algorithms. It also provides details

of the programming model used for interfacing with the GPUs. As an example of a

control-dominated EDA problem, Boolean satisfiability (SAT) is accelerated using

the following hardware implementations: (i) a custom IC-based hardware approach

in which the traversal of the implication graph and conflict clause generation are

performed in hardware, in parallel, (ii) an FPGA-based hardware approach to accelerate SAT in which the entire SAT search algorithm is implemented in the FPGA,

and (iii) a complete SAT approach which employs a new GPU-enhanced variable

ordering heuristic.

In this monograph, several EDA problems with varying degrees of control and

data parallelisms are accelerated using a general-purpose graphics processor. In particular we accelerate Monte Carlo based statistical static timing analysis, device

model evaluation (for accelerating circuit simulation), fault simulation, and fault

table generation on a graphics processor, with speedups of up to 800×. Additionally, an automated approach is presented that accelerates (on a graphics processor) uniprocessor code that is executed multiple times on independent data sets

in an application. The key idea here is to partition the software into kernels in an

automated fashion, such that multiple independent instances of these kernels, when

vii

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

viii Foreword

executed in parallel on the GPU, can maximally benefit from the GPU’s hardware

resources.

We hope that this monograph can serve as a valuable reference to individuals

interested in exploring alternative hardware platforms and to those interested in

accelerating various EDA applications by harnessing the parallelism in these platforms.

College Station, TX Kanupriya Gulati

College Station, TX Sunil P. Khatri

October 2009

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Preface

In recent times, serial software applications have no longer enjoyed significant

gains in performance with process scaling, since microprocessor performance gains

have been hampered due to increases in power and manufacturability issues, which

accompany scaling. With the continuous growth of IC design complexities, this

problem is particularly significant for EDA applications. In this research monograph, we evaluate the feasibility of hardware platforms such as custom ICs, FPGAs,

and graphics processors, for accelerating EDA algorithms. We choose applications

which contribute significantly to the total runtime of the VLSI design flow and

which have varied degrees of inherent parallelism in them. We study the acceleration of such algorithms on these alternative platforms. We also present an automated approach to accelerate certain specific types of uniprocessor subroutines on

the GPU.

This research monograph consists of four parts. The alternative hardware platforms, along with the details of the programming model used for interfacing with

the graphics processing units, are discussed in the first part of this monograph.

The second part of this monograph studies the acceleration of an algorithm in

the control-dominated category, namely Boolean satisfiability (SAT). The third part

studies the acceleration of some algorithms in the control plus data parallel category, namely Monte Carlo based statistical static timing analysis, circuit simulation,

fault simulation and fault table generation. In the fourth part of the monograph, we

present the automated approach to generate GPU code to accelerate certain software

subroutines.

Book Outline

This research monograph is organized into four parts. In Part I of this research

monograph, we discuss alternative hardware platforms. We also provide details of

the programming model used for interfacing with the graphics processor. In Chapter 2, we compare and contrast the hardware platforms that are considered in this

monograph. In particular, we discuss custom-designed ICs, reconfigurable architectures such as FPGAs, and streaming processors such as graphics processing units

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

x Preface

(GPUs). This comparison is performed over various criteria such as architecture,

expected performance, programming model and environment, scalability, time to

market, security, and cost of hardware. In Chapter 3, we describe the programming

environment used for interfacing with the GPUs.

In Part II of this monograph we present hardware implementations of a controldominated EDA problem, namely Boolean satisfiability (SAT). We present

approaches to accelerate SAT using each of the three hardware platforms under

consideration. In Chapter 4, we present a custom IC-based hardware approach to

accelerate SAT. In this approach, the traversal of the implication graph and conflict clause generation are performed in hardware, in parallel. Further, we propose a

hardware approach to extract the minimum unsatisfiable core for any unsatisfiable

formula. In Chapter 5, we discuss an FPGA-based hardware approach to accelerate

SAT. In this approach, we store the clauses in the FPGA slices. In order to solve

large SAT instances, we partition the instance into ‘bins,’ each of which can fit in

the FPGA. The solution of SAT clauses of any bin is performed in parallel. Our

approach also handles (in hardware) the fact that the original SAT instance is partitioned into bins. In Chapter 6, we present a SAT approach which employs a new

GPU-enhanced variable ordering heuristic. In this approach, we augment a CPUbased complete procedure (MiniSAT), with a GPU-based approximate procedure

(survey propagation). In this manner, the complete procedure benefits from the high

parallelism of the GPU.

In Part III of this book, we study the acceleration of several EDA problems,

with varying amounts of control and data parallelism, on a GPU. In Chapter 7, we

exploit the parallelism in Monte Carlo based statistical static timing analysis and

accelerate it on a graphics processor. In this approach, we map the Monte Carlo

based SSTA computations to the large number of threads that can be computed in

parallel on a GPU. Our approach performs multiple delay simulations of a single

gate in parallel and further benefits from a parallel implementation of the Mersenne

Twister pseudo-random number generator on the GPU, followed by Box–Muller

transformations (also implemented on the GPU). In Chapter 8, we study the acceleration of fault simulation on a GPU. Fault simulation is inherently parallelizable

and requires a large number of gate evaluations to be performed for each gate in

a design. The large number of threads that can be computed in parallel on a GPU

can be employed to perform a large number of these gate evaluations in parallel. We

implement a pattern and fault parallel fault simulator, which fault-simulates a circuit

in a levelized fashion. We ensure that all threads of the GPU compute identical

instructions, but on different data. We study the generation of a fault table using a

GPU in Chapter 9. We employ a pattern parallel approach, which utilizes both bit

parallelism and thread-level parallelism. In Chapter 10, we explore the GPU-based

acceleration of the model card evaluation of a circuit simulator. Our resulting code

is integrated into a commercial fast SPICE tool, and the overall speedup obtained

is measured. With careful engineering, we maximally harness the GPU’s immense

memory bandwidth and high computational power.

In Part IV of this book, we present an automated approach to accelerate uniprocessor subroutines which are required to be executed multiple times within an

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Preface xi

application, on independent data sets. The target hardware platform is a generalpurpose graphics platform. The key idea here is to partition the subroutine into

kernels in an automated fashion, such that multiple instances of these kernels, when

executed in parallel on the GPU, can maximally benefit from the GPU’s hardware

resources. This approach is detailed in Chapter 11.

The approaches presented in this monograph collectively aim to contribute

toward enabling the VLSI CAD community to accelerate EDA algorithms on different hardware platforms.

College Station, TX Kanupriya Gulati

College Station, TX Sunil P. Khatri

October 2009

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Acknowledgments

The work presented in this research monograph would not have been possible without the tremendous amount of help and encouragement we have received from our

families, friends, and colleagues.

In particular, we are grateful to Mandar Waghmode, who contributed toward the

custom IC-based engine for accelerating Boolean satisfiability; Dr. Srinivas Patil,

Dr. Abhijit Jas, and Suganth Paul, for their assistance on the FPGA-based approach

for accelerating Boolean satisfiability; and Dr. John Croix and Rahm Shastry, who

helped in integrating our GPU-based accelerated code for model card evaluation

into a commercial fast SPICE tool.

We acknowledge the insightful comments of Dr. Peng Li, Dr. Hank Walker,

Dr. Desmond Kirkpatrick, and Dr. Jim Ji. We would also like to thank Intel Corporation, Nascentric Inc., Accelicon Technologies Inc., and NVIDIA Corporation,

for supporting this research through research grants and an NVIDIA fellowship,

respectively.

xiii

Simpo PDF Merge and Split Unregistered Version - http://www.simpopdf.com

Thư viện tri thức trực tuyến

Hardware Acceleration of EDA Algorithms: Custom ICs, FPGAs and GPUs

Nội dung xem thử

Mô tả chi tiết

Tài liệu tương tự (6)

Hardware Acceleration of EDA Algorithms- P9 docx

Accelerating the BWA MEM algorithm on a hardware platform

Hardware Reference Guide

{Hardware} cau truc may tinh chuong1

Hardware of the 3D Surface Magnetic Field Distribution Testing Apparatus (Hardware Instruction)

Hardware description NE40E x8a