MANNING

Matthew Scarpino

How to accelerate graphics and computation

OpenCL in Action

Download from Wow! eBook <www.wowebook.com>

OpenCL in Action

HOW TO ACCELERATE GRAPHICS AND COMPUTATION

MATTHEW SCARPINO

MANNING

SHELTER ISLAND

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity.

For more information, please contact

Special Sales Department

Manning Publications Co.

20 Baldwin Road

PO Box 261

Shelter Island, NY 11964

Email: [email protected]

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end.

Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Maria Townsley
Copyeditor: Andy Carroll
Proofreader: Maureen Spencer
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617290176

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11

brief contents

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING...........................1

1 ■ Introducing OpenCL 3

2 ■ Host programming: fundamental data structures 16

3 ■ Host programming: data transfer and partitioning 43

4 ■ Kernel programming: data types and device memory 68

5 ■ Kernel programming: operators and functions 94

6 ■ Image processing 123

7 ■ Events, profiling, and synchronization 140

8 ■ Development with C++ 167

9 ■ Development with Java and Python 196

10 ■ General coding principles 221

PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL...................235

11 ■ Reduction and sorting 237

12 ■ Matrices and QR decomposition 258

13 ■ Sparse matrices 278

14 ■ Signal processing and the fast Fourier transform 295

PART 3 ACCELERATING OPENGL WITH OPENCL .........................319

15 ■ Combining OpenCL and OpenGL 321

16 ■ Textures and renderbuffers 340

contents

preface xv

acknowledgments xvii

about this book xix

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING ..............1

1 Introducing OpenCL 3

1.1 The dawn of OpenCL 4

1.2 Why OpenCL? 5

Portability 6 ■ Standardized vector processing 6 ■ Parallel programming 7

1.3 Analogy: OpenCL processing and a game of cards 8

1.4 A first look at an OpenCL application 10

1.5 The OpenCL standard and extensions 13

1.6 Frameworks and software development kits (SDKs) 14

1.7 Summary 14

2 Host programming: fundamental data structures 16

2.1 Primitive data types 17

2.2 Accessing platforms 18

Creating platform structures 18 ■ Obtaining platform information 19 ■ Code example: testing platform extensions 20

2.3 Accessing installed devices 22

Creating device structures 22 ■ Obtaining device information 23 ■ Code example: testing device extensions 24

2.4 Managing devices with contexts 25

Creating contexts 26 ■ Obtaining context information 28 ■ Contexts and the reference count 28 ■ Code example: checking a context’s reference count 29

2.5 Storing device code in programs 30

Creating programs 30 ■ Building programs 31 ■ Obtaining program information 33 ■ Code example: building a program from multiple source files 35

2.6 Packaging functions in kernels 36

Creating kernels 36 ■ Obtaining kernel information 37 ■ Code example: obtaining kernel information 38

2.7 Collecting kernels in a command queue 39

Creating command queues 40 ■ Enqueuing kernel execution commands 40

2.8 Summary 41

3 Host programming: data transfer and partitioning 43

3.1 Setting kernel arguments 44

3.2 Buffer objects 45

Allocating buffer objects 45 ■ Creating subbuffer objects 47

3.3 Image objects 48

Creating image objects 48 ■ Obtaining information about image objects 51

3.4 Obtaining information about buffer objects 52

3.5 Memory object transfer commands 54

Read/write data transfer 54 ■ Mapping memory objects 58 ■ Copying data between memory objects 59

3.6 Data partitioning 62

Loops and work-items 63 ■ Work sizes and offsets 64 ■ A simple one-dimensional example 65 ■ Work-groups and compute units 65

3.7 Summary 67

4 Kernel programming: data types and device memory 68

4.1 Introducing kernel coding 69

4.2 Scalar data types 70

Accessing the double data type 71 ■ Byte order 72

4.3 Floating-point computing 73

The float data type 73 ■ The double data type 74 ■ The half data type 75 ■ Checking IEEE-754 compliance 76

4.4 Vector data types 77

Preferred vector widths 79 ■ Initializing vectors 80 ■ Reading and modifying vector components 80 ■ Endianness and memory access 84

4.5 The OpenCL device model 85

Device model analogy part 1: math students in school 85 ■ Device model analogy part 2: work-items in a device 87 ■ Address spaces in code 88 ■ Memory alignment 90

4.6 Local and private kernel arguments 90

Local arguments 91 ■ Private arguments 91

4.7 Summary 93

5 Kernel programming: operators and functions 94

5.1 Operators 95

5.2 Work-item and work-group functions 97

Dimensions and work-items 98 ■ Work-groups 99 ■ An example application 100

5.3 Data transfer operations 101

Loading and storing data of the same type 101 ■ Loading vectors from a scalar array 101 ■ Storing vectors to a scalar array 102

5.4 Floating-point functions 103

Arithmetic and rounding functions 103 ■ Comparison functions 105 ■ Exponential and logarithmic functions 106 ■ Trigonometric functions 106 ■ Miscellaneous floating-point functions 108

5.5 Integer functions 109

Adding and subtracting integers 110 ■ Multiplication 111 ■ Miscellaneous integer functions 112

5.6 Shuffle and select functions 114

Shuffle functions 114 ■ Select functions 116

5.7 Vector test functions 118

5.8 Geometric functions 120

5.9 Summary 122

6 Image processing 123

6.1 Image objects and samplers 124

Image objects on the host: cl_mem 124 ■ Samplers on the host: cl_sampler 125 ■ Image objects on the device: image2d_t and image3d_t 128 ■ Samplers on the device: sampler_t 129

6.2 Image processing functions 130

Image read functions 130 ■ Image write functions 132 ■ Image information functions 133 ■ A simple example 133

6.3 Image scaling and interpolation 135

Nearest-neighbor interpolation 135 ■ Bilinear interpolation 136 ■ Image enlargement in OpenCL 138

6.4 Summary 139

7 Events, profiling, and synchronization 140

7.1 Host notification events 141

Associating an event with a command 141 ■ Associating an event with a callback function 142 ■ A host notification example 143

7.2 Command synchronization events 145

Wait lists and command events 145 ■ Wait lists and user events 146 ■ Additional command synchronization functions 148 ■ Obtaining data associated with events 150

7.3 Profiling events 153

Configuring command profiling 153 ■ Profiling data transfer 155 ■ Profiling data partitioning 157

7.4 Work-item synchronization 158

Barriers and fences 159 ■ Atomic operations 160 ■ Atomic commands and mutexes 163 ■ Asynchronous data transfer 164

7.5 Summary 166

8 Development with C++ 167

8.1 Preliminary concerns 168

Vectors and strings 168 ■ Exceptions 169

8.2 Creating kernels 170

Platforms, devices, and contexts 170 ■ Programs and kernels 173

8.3 Kernel arguments and memory objects 176

Memory objects 177 ■ General data arguments 181 ■ Local space arguments 182

8.4 Command queues 183

Creating CommandQueue objects 183 ■ Enqueuing kernel-execution commands 183 ■ Read/write commands 185 ■ Memory mapping and copy commands 187

8.5 Event processing 189

Host notification 189 ■ Command synchronization 191 ■ Profiling events 192 ■ Additional event functions 193

8.6 Summary 194

9 Development with Java and Python 196

9.1 Aparapi 197

Aparapi installation 198 ■ The Kernel class 198 ■ Work-items and work-groups 200

9.2 JavaCL 201

JavaCL installation 202 ■ Overview of JavaCL development 202 ■ Creating kernels with JavaCL 203 ■ Setting arguments and enqueuing commands 206

9.3 PyOpenCL 210

PyOpenCL installation and licensing 210 ■ Overview of PyOpenCL development 211 ■ Creating kernels with PyOpenCL 212 ■ Setting arguments and executing kernels 215

9.4 Summary 219

10 General coding principles 221

10.1 Global size and local size 222

Finding the maximum work-group size 223 ■ Testing kernels and devices 224

10.2 Numerical reduction 225

OpenCL reduction 226 ■ Improving reduction speed with vectors 228

10.3 Synchronizing work-groups 230

10.4 Ten tips for high-performance kernels 231

10.5 Summary 233

PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL.......235

11 Reduction and sorting 237

11.1 MapReduce 238

Introduction to MapReduce 238 ■ MapReduce and OpenCL 240 ■ MapReduce example: searching for text 242

11.2 The bitonic sort 244

Understanding the bitonic sort 244 ■ Implementing the bitonic sort in OpenCL 247

11.3 The radix sort 254

Understanding the radix sort 254 ■ Implementing the radix sort with vectors 254

11.4 Summary 256

12 Matrices and QR decomposition 258

12.1 Matrix transposition 259

Introduction to matrices 259 ■ Theory and implementation of matrix transposition 260

12.2 Matrix multiplication 262

The theory of matrix multiplication 262 ■ Implementing matrix multiplication in OpenCL 263

12.3 The Householder transformation 265

Vector projection 265 ■ Vector reflection 266 ■ Outer products and Householder matrices 267 ■ Vector reflection in OpenCL 269

12.4 The QR decomposition 269

Finding the Householder vectors and R 270 ■ Finding the Householder matrices and Q 272 ■ Implementing QR decomposition in OpenCL 273

12.5 Summary 276

13 Sparse matrices 278

13.1 Differential equations and sparse matrices 279

13.2 Sparse matrix storage and the Harwell-Boeing collection 280

Introducing the Harwell-Boeing collection 281 ■ Accessing data in Matrix Market files 281

13.3 The method of steepest descent 285

Positive-definite matrices 285 ■ Theory of the method of steepest descent 286 ■ Implementing SD in OpenCL 288

13.4 The conjugate gradient method 289

Orthogonalization and conjugacy 289 ■ The conjugate gradient method 291

13.5 Summary 293

14 Signal processing and the fast Fourier transform 295

14.1 Introducing frequency analysis 296

14.2 The discrete Fourier transform 298

Theory behind the DFT 298 ■ OpenCL and the DFT 305

14.3 The fast Fourier transform 306

Three properties of the DFT 306 ■ Constructing the fast Fourier transform 309 ■ Implementing the FFT with OpenCL 312

14.4 Summary 317

PART 3 ACCELERATING OPENGL WITH OPENCL.............319

15 Combining OpenCL and OpenGL 321

15.1 Sharing data between OpenGL and OpenCL 322

Creating the OpenCL context 323 ■ Sharing data between OpenGL and OpenCL 325 ■ Synchronizing access to shared data 328

15.2 Obtaining information 329

Obtaining OpenGL object and texture information 329 ■ Obtaining information about the OpenGL context 330

15.3 Basic interoperability example 331

Initializing OpenGL operation 331 ■ Initializing OpenCL operation 331 ■ Creating data objects 332 ■ Executing the kernel 333 ■ Rendering graphics 334

15.4 Interoperability and animation 334

Specifying vertex data 335 ■ Animation and display 336 ■ Executing the kernel 337

15.5 Summary 338

16 Textures and renderbuffers 340

16.1 Image filtering 341

The Gaussian blur 343 ■ Image sharpening 344 ■ Image embossing 344

16.2 Filtering textures with OpenCL 345

The init_gl function 345 ■ The init_cl function 345 ■ The configure_shared_data function 346 ■ The execute_kernel function 347 ■ The display function 348

16.3 Summary 349

appendix A Installing and using a software development kit 351

appendix B Real-time rendering with OpenGL 364

appendix C The minimalist GNU for Windows and OpenCL 398

appendix D OpenCL on mobile devices 412

index 415
