MANNING
Matthew Scarpino
How to accelerate graphics and computation
OpenCL in Action
Download from Wow! eBook <www.wowebook.com>
OpenCL in Action
HOW TO ACCELERATE GRAPHICS AND COMPUTATION
MATTHEW SCARPINO
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: [email protected]
©2012 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without the use of elemental
chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Development editor: Maria Townsley
Copyeditor: Andy Carroll
Proofreader: Maureen Spencer
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor
ISBN 9781617290176
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11
brief contents
PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
1 ■ Introducing OpenCL
2 ■ Host programming: fundamental data structures
3 ■ Host programming: data transfer and partitioning
4 ■ Kernel programming: data types and device memory
5 ■ Kernel programming: operators and functions
6 ■ Image processing
7 ■ Events, profiling, and synchronization
8 ■ Development with C++
9 ■ Development with Java and Python
10 ■ General coding principles
PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
11 ■ Reduction and sorting
12 ■ Matrices and QR decomposition
13 ■ Sparse matrices
14 ■ Signal processing and the fast Fourier transform
PART 3 ACCELERATING OPENGL WITH OPENCL
15 ■ Combining OpenCL and OpenGL
16 ■ Textures and renderbuffers
contents
preface
acknowledgments
about this book
PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
1 Introducing OpenCL
1.1 The dawn of OpenCL
1.2 Why OpenCL?
Portability ■ Standardized vector processing ■ Parallel programming
1.3 Analogy: OpenCL processing and a game of cards
1.4 A first look at an OpenCL application
1.5 The OpenCL standard and extensions
1.6 Frameworks and software development kits (SDKs)
1.7 Summary
2 Host programming: fundamental data structures
2.1 Primitive data types
2.2 Accessing platforms
Creating platform structures ■ Obtaining platform information ■ Code example: testing platform extensions
2.3 Accessing installed devices
Creating device structures ■ Obtaining device information ■ Code example: testing device extensions
2.4 Managing devices with contexts
Creating contexts ■ Obtaining context information ■ Contexts and the reference count ■ Code example: checking a context’s reference count
2.5 Storing device code in programs
Creating programs ■ Building programs ■ Obtaining program information ■ Code example: building a program from multiple source files
2.6 Packaging functions in kernels
Creating kernels ■ Obtaining kernel information ■ Code example: obtaining kernel information
2.7 Collecting kernels in a command queue
Creating command queues ■ Enqueuing kernel execution commands
2.8 Summary
3 Host programming: data transfer and partitioning
3.1 Setting kernel arguments
3.2 Buffer objects
Allocating buffer objects ■ Creating subbuffer objects
3.3 Image objects
Creating image objects ■ Obtaining information about image objects
3.4 Obtaining information about buffer objects
3.5 Memory object transfer commands
Read/write data transfer ■ Mapping memory objects ■ Copying data between memory objects
3.6 Data partitioning
Loops and work-items ■ Work sizes and offsets ■ A simple one-dimensional example ■ Work-groups and compute units
3.7 Summary
4 Kernel programming: data types and device memory
4.1 Introducing kernel coding
4.2 Scalar data types
Accessing the double data type ■ Byte order
4.3 Floating-point computing
The float data type ■ The double data type ■ The half data type ■ Checking IEEE-754 compliance
4.4 Vector data types
Preferred vector widths ■ Initializing vectors ■ Reading and modifying vector components ■ Endianness and memory access
4.5 The OpenCL device model
Device model analogy part 1: math students in school ■ Device model analogy part 2: work-items in a device ■ Address spaces in code ■ Memory alignment
4.6 Local and private kernel arguments
Local arguments ■ Private arguments
4.7 Summary
5 Kernel programming: operators and functions
5.1 Operators
5.2 Work-item and work-group functions
Dimensions and work-items ■ Work-groups ■ An example application
5.3 Data transfer operations
Loading and storing data of the same type ■ Loading vectors from a scalar array ■ Storing vectors to a scalar array
5.4 Floating-point functions
Arithmetic and rounding functions ■ Comparison functions ■ Exponential and logarithmic functions ■ Trigonometric functions ■ Miscellaneous floating-point functions
5.5 Integer functions
Adding and subtracting integers ■ Multiplication ■ Miscellaneous integer functions
5.6 Shuffle and select functions
Shuffle functions ■ Select functions
5.7 Vector test functions
5.8 Geometric functions
5.9 Summary
6 Image processing
6.1 Image objects and samplers
Image objects on the host: cl_mem ■ Samplers on the host: cl_sampler ■ Image objects on the device: image2d_t and image3d_t ■ Samplers on the device: sampler_t
6.2 Image processing functions
Image read functions ■ Image write functions ■ Image information functions ■ A simple example
6.3 Image scaling and interpolation
Nearest-neighbor interpolation ■ Bilinear interpolation ■ Image enlargement in OpenCL
6.4 Summary
7 Events, profiling, and synchronization
7.1 Host notification events
Associating an event with a command ■ Associating an event with a callback function ■ A host notification example
7.2 Command synchronization events
Wait lists and command events ■ Wait lists and user events ■ Additional command synchronization functions ■ Obtaining data associated with events
7.3 Profiling events
Configuring command profiling ■ Profiling data transfer ■ Profiling data partitioning
7.4 Work-item synchronization
Barriers and fences ■ Atomic operations ■ Atomic commands and mutexes ■ Asynchronous data transfer
7.5 Summary
8 Development with C++
8.1 Preliminary concerns
Vectors and strings ■ Exceptions
8.2 Creating kernels
Platforms, devices, and contexts ■ Programs and kernels
8.3 Kernel arguments and memory objects
Memory objects ■ General data arguments ■ Local space arguments
8.4 Command queues
Creating CommandQueue objects ■ Enqueuing kernel execution commands ■ Read/write commands ■ Memory mapping and copy commands
8.5 Event processing
Host notification ■ Command synchronization ■ Profiling events ■ Additional event functions
8.6 Summary
9 Development with Java and Python
9.1 Aparapi
Aparapi installation ■ The Kernel class ■ Work-items and work-groups
9.2 JavaCL
JavaCL installation ■ Overview of JavaCL development ■ Creating kernels with JavaCL ■ Setting arguments and enqueuing commands
9.3 PyOpenCL
PyOpenCL installation and licensing ■ Overview of PyOpenCL development ■ Creating kernels with PyOpenCL ■ Setting arguments and executing kernels
9.4 Summary
10 General coding principles
10.1 Global size and local size
Finding the maximum work-group size ■ Testing kernels and devices
10.2 Numerical reduction
OpenCL reduction ■ Improving reduction speed with vectors
10.3 Synchronizing work-groups
10.4 Ten tips for high-performance kernels
10.5 Summary
PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
11 Reduction and sorting
11.1 MapReduce
Introduction to MapReduce ■ MapReduce and OpenCL ■ MapReduce example: searching for text
11.2 The bitonic sort
Understanding the bitonic sort ■ Implementing the bitonic sort in OpenCL
11.3 The radix sort
Understanding the radix sort ■ Implementing the radix sort with vectors
11.4 Summary
12 Matrices and QR decomposition
12.1 Matrix transposition
Introduction to matrices ■ Theory and implementation of matrix transposition
12.2 Matrix multiplication
The theory of matrix multiplication ■ Implementing matrix multiplication in OpenCL
12.3 The Householder transformation
Vector projection ■ Vector reflection ■ Outer products and Householder matrices ■ Vector reflection in OpenCL
12.4 The QR decomposition
Finding the Householder vectors and R ■ Finding the Householder matrices and Q ■ Implementing QR decomposition in OpenCL
12.5 Summary
13 Sparse matrices
13.1 Differential equations and sparse matrices
13.2 Sparse matrix storage and the Harwell-Boeing collection
Introducing the Harwell-Boeing collection ■ Accessing data in Matrix Market files
13.3 The method of steepest descent
Positive-definite matrices ■ Theory of the method of steepest descent ■ Implementing SD in OpenCL
13.4 The conjugate gradient method
Orthogonalization and conjugacy ■ The conjugate gradient method
13.5 Summary
14 Signal processing and the fast Fourier transform
14.1 Introducing frequency analysis
14.2 The discrete Fourier transform
Theory behind the DFT ■ OpenCL and the DFT
14.3 The fast Fourier transform
Three properties of the DFT ■ Constructing the fast Fourier transform ■ Implementing the FFT with OpenCL
14.4 Summary
PART 3 ACCELERATING OPENGL WITH OPENCL
15 Combining OpenCL and OpenGL
15.1 Sharing data between OpenGL and OpenCL
Creating the OpenCL context ■ Sharing data between OpenGL and OpenCL ■ Synchronizing access to shared data
15.2 Obtaining information
Obtaining OpenGL object and texture information ■ Obtaining information about the OpenGL context
15.3 Basic interoperability example
Initializing OpenGL operation ■ Initializing OpenCL operation ■ Creating data objects ■ Executing the kernel ■ Rendering graphics
15.4 Interoperability and animation
Specifying vertex data ■ Animation and display ■ Executing the kernel
15.5 Summary
16 Textures and renderbuffers
16.1 Image filtering
The Gaussian blur ■ Image sharpening ■ Image embossing
16.2 Filtering textures with OpenCL
The init_gl function ■ The init_cl function ■ The configure_shared_data function ■ The execute_kernel function ■ The display function
16.3 Summary
appendix A Installing and using a software development kit
appendix B Real-time rendering with OpenGL
appendix C The minimalist GNU for Windows and OpenCL
appendix D OpenCL on mobile devices
index