Multimedia Communication Technology
Springer-Verlag Berlin Heidelberg GmbH
Engineering ONLINE LIBRARY
springeronline.com
Jens-Rainer Ohm
Multimedia Communication Technology
Representation, Transmission and Identification of Multimedia Signals
With 441 Figures
Springer
Professor Jens-Rainer Ohm
RWTH Aachen University
Chair and Institute of Communications Engineering
Melatener Str. 23
52074 Aachen
Germany
Cataloging-in-Publication Data applied for
ISBN 978-3-642-62277-9 ISBN 978-3-642-18750-6 (eBook)
DOI 10.1007/978-3-642-18750-6
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of
this publication or parts thereof is permitted only under the provisions of the German Copyright Law
of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under German Copyright Law.
© Springer-Verlag Berlin Heidelberg 2004
Originally published by Springer-Verlag Berlin Heidelberg New York in 2004
Softcover reprint of the hardcover 1st edition 2004
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Typesetting: Digital data supplied by author
Cover-Design: Design & Production, Heidelberg
Printed on acid-free paper 62/3020 Rw 5 4 3 2 1 0
Preface
Information technology provides plenty of new ways to process, store, distribute
and access audiovisual information. Beyond traditional broadcast and telephone
channels and analog storage media like film or tape, the emerging Internet, mobile
networks and digital storage are going to revolutionize the terms of distribution
and access. This development is driven by the convergence of audiovisual media
technology, information technology and telecommunications technology. Through the
capabilities of digital processing, established media like photography, movies,
television and radio are changing their roles and are becoming subsumed by new
integrated services which are mobile, interactive, pervasive, usable from anywhere,
free to play with, and penetrating everyday life. Multimedia communication
establishes new forms of communication between people and between people and
machines, and also allows communication between machines using audiovisual
information or related feature parameters. Intelligent media interfaces are becoming
increasingly important, and machine assistance in accessing media, in acquiring,
organizing, distributing, manipulating and consuming audiovisual information
will become inevitable in the future.
This book aims to provide deep insight into important enabling technologies of
multimedia communication systems: methods of multimedia signal processing,
analysis, identification and recognition, and schemes for multimedia signal
representation, compression and expression by features or other properties. All of
these are currently lively and highly innovative areas. The book reviews
state-of-the-art technology and its scientific foundations, but is primarily meant
to support a systematic understanding of the underlying methods, algorithms and
their theoretical foundations. I strongly believe that this is the best way to
contribute to future improvements in the field.
In part, the book is a substantially upgraded translation of my German-language
textbook on digital image and video coding, which was published in the mid '90s.
Since then, the progress made in compression of audiovisual data has been
breathtaking, and consequently the newest developments are reflected, including
the Advanced Video Coding standard and motion-compensated wavelet coding.
The second basis for this book is my lectures on topics of multimedia
communications, held regularly at RWTH Aachen University. These treat all aspects of
image, video and audio compression, including networking interfaces, and also
include multimedia signal identification and recognition. These latter aspects,
topically related to the MPEG-7 multimedia content description standard, establish
a profound basis for intelligent multimedia systems.
Most chapters are supplemented by homework problems, for which solutions are
available from http://www.ient.rwth-aachen.de.
The book would not have been possible without the contributions of numerous students
and many other people who have worked with me on topics of image, video
and audio processing, encoding and recognition over more than 15 years. These
are (in alphabetical order) Sven Bauer, Michael Becker, Markus Beermann, Sven
Brandau, Nicole Brandenburg, Michael Brünig, Ferry Bunjamin, Kai Clüver,
Emmanuelle Come, Holger Crysandt, Sila Ekmekci, Christoph Fehn, Ingo Feldmann,
Oliver Fromm, Karsten Grüneberg, Karsten Grünheit, Jens Guther, Hafez Hadinejad,
Konstantin Hanke, Guido Heising, Hans Dieter Holme, Michael Höynck,
Laetitia Hue, Ebroul Izquierdo, Peter Kauff, Jorg Kramer, Silko Kruse, Patrick
Laurent, Thomas Ledworuski, Wolfram Liebsch, Oliver Lietz, Phuong Ma, Bela
Makai, Claudia Mayer, Bernd Menser, Domingo Mery, Karsten Müller, Patrick
Ndjiki-Nya, Bernhard Pasewaldt, Andreas Praatz, Lars Prokop, Oliver Rockinger,
Katrin Rümmler, Thomas Rusert, Mihaela van der Schaar, Ansgar Schiffler, Oliver
Schreer, Holger Schulz, Aljoscha Smolic, Frank Sperling, Peter Stammnitz, Jens
Wellhausen, Mathias Wien and Detlef Zier. Please forgive me if I forgot anybody.
Very special thanks are also directed to my scientific mentors Peter Noll, Hans
Dieter Lüke and Irmfried Hartmann, to all people of IENT, and to my family.
Aachen, August 15, 2003
Jens-Rainer Ohm
Table of Contents
1 Introduction
1.1 Concepts and Terminology
1.1.1 Signal Representation by Source Coding
1.1.2 Optimization of Transmission
1.1.3 Content Identification
1.2 Signal Sources and Acquisition
1.3 Digital Representation of Multimedia Signals
1.3.1 Image and Video Signals
1.3.2 Speech and Audio Signals
1.4 Problems
Part A: Multimedia Signal Processing and Analysis
2 Signals and Sampling
2.1 Signals and Fourier Spectra
2.1.1 Spatial Signals and Two-dimensional Spectra
2.1.2 Spatio-temporal Signals
2.2 Sampling of Multimedia Signals
2.2.1 The Sampling Theorem
2.2.2 Separable Two-dimensional Sampling
2.2.3 Non-separable Two-dimensional Sampling
2.2.4 Sampling of Video Signals
2.3 Problems
3 Statistical Analysis of Multimedia Signals
3.1 Properties Related to Sample Statistics
3.2 Joint Statistical Properties
3.3 Spectral Properties
3.4 Statistical Modeling and Tests
3.5 Statistical Foundations of Information Theory
3.6 Problems
4 Linear Systems and Transforms
4.1 Two- and Multi-dimensional Linear Systems
4.1.1 Properties of Two-dimensional Filters
4.1.2 Frequency Transfer Functions of Multi-dimensional Filters
4.1.3 Image filtering by Matrix Operations
4.1.4 Realization of Two-dimensional Filters
4.2 Linear Prediction
4.2.1 One- and Two-dimensional Autoregressive Models
4.2.2 Linear Prediction
4.3 Linear Block Transforms
4.3.1 Orthogonal Basis Functions
4.3.2 Basis Functions of Orthogonal Transforms
4.3.3 Efficiency of Transforms
4.3.4 Fast Transform Algorithms
4.3.5 Transforms with Block Overlap
4.4 Filterbank Transforms
4.4.1 Decimation and Interpolation
4.4.2 Properties of Subband Filters
4.4.3 Implementation of Filterbank Structures
4.4.4 Wavelet Transform
4.4.5 Two- and Multi-dimensional Filter Banks
4.4.6 Pyramid Decomposition
4.5 Problems
5 Pre- and Postprocessing
5.1 Nonlinear Filters
5.1.1 Median Filters and Rank Order Filters
5.1.2 Morphological Filters
5.1.3 Polynomial Filters
5.2 Signal Enhancement
5.3 Amplitude-value transformations
5.3.1 Amplitude Mapping Functions
5.3.2 Probability Distribution Modification and Equalization
5.4 Interpolation
5.4.1 Zero- and First-order Interpolators
5.4.2 Interpolation using linear Filters
5.4.3 Interpolation based on Frequency Extension
5.4.4 Spline and Lagrangian Interpolation
5.4.5 Interpolation on Irregular 2D Grids
5.5 Problems
Part B: Content-related Multimedia Signal Analysis
6 Perceptual Properties of Vision and Hearing
6.1 Properties of Vision
6.1.1 Physiology of the Eye
6.1.2 Sensitivity Functions
6.1.3 Color Vision
6.2 Properties of Hearing
6.2.1 Physiology of the Ear
6.2.2 Sensitivity Functions
7 Features of Multimedia Signals
7.1 Color
7.1.1 Color Space Transformations
7.1.2 Representation of Color Features
7.2 Texture
7.2.1 Statistical Texture Analysis
7.2.2 Spectral Features of Texture
7.3 Edge Analysis
7.3.1 Edge Detection by Gradient Operators
7.3.2 Edge Characterization by second Derivative
7.3.3 Edge Finding and Consistency Analysis
7.3.4 Edge Model Fitting
7.3.5 Description and Analysis of Edge Properties
7.4 Contour and Shape Analysis
7.4.1 Contour fitting
7.4.2 Contour Description by Orientation and Curvature
7.4.3 Geometric Features and Binary Shape Features
7.4.4 Projection and geometric mapping
7.4.5 Moment analysis
7.4.6 Shape Analysis by Basis Functions
7.4.7 Three-dimensional Shapes
7.5 Correspondence analysis
7.6 Motion Analysis
7.6.1 Mapping of motion into the image plane
7.6.2 Motion Estimation by the Optical Flow Principle
7.6.3 Motion Estimation by Matching
7.6.4 Estimation of Parameters for Warping Grids
7.6.5 Estimation of non-translational Motion Parameters
7.6.6 Estimation of Motion Vector Fields at Object Boundaries
7.6.7 Analysis of 3D Motion
7.7 Disparity and Depth Analysis
7.7.1 Central Projection in Stereoscopic and Multiple-camera Systems
7.7.2 Epipolar Geometry for arbitrary Camera Configurations
7.8 Mosaics
7.9 Face Detection and Description
7.10 Audio Signal Features
7.10.1 Basic Features
7.10.2 Speech Signal Analysis
7.10.3 Musical Signals, Instruments and Sounds
7.10.4 Room Properties
7.11 Problems
8 Signal and Parameter Estimation
8.1 Observation and Degradation Models
8.2 Estimation based on linear filters
8.2.1 Inverse Filtering
8.2.2 Wiener Filtering
8.3 Least Squares Estimation
8.4 Singular Value Decomposition
8.5 ML and MAP Estimation
8.6 Kalman Estimation
8.7 Outlier rejection in estimation
8.8 Problems
9 Feature Transforms and Classification
9.1 Feature Transforms
9.1.1 Eigenvector Analysis of Feature Value Sets
9.1.2 Independent Component Analysis
9.1.3 Generalized Hough Transform
9.2 Feature Value Normalization and Weighting
9.2.1 Normalization of Feature Values
9.2.2 Simple Distance Metrics
9.2.3 Distance Metrics related to Statistical Distributions
9.2.4 Distance Metrics based on Class Features
9.2.5 Reliability measures
9.3 Feature-based Comparison
9.4 Feature-based Classification
9.4.1 Linear Classification of two Classes
9.4.2 Generalization of Linear Classification
9.4.3 Nearest-neighbor and Cluster-based Methods
9.4.4 Maximum a Posteriori (Bayes) Classification
9.4.5 Artificial Neural Networks
9.4.6 Hidden Markov Models
9.5 Problems
10 Signal Decomposition
10.1 Segmentation of Image Signals
10.1.1 Pixel-based Segmentation
10.1.2 Region-based Methods
10.1.3 Texture Elimination
10.1.4 Relaxation Methods
10.1.5 Image Region Labeling
10.2 Segmentation of Video Signals
10.2.1 Temporal Segmentation for Scene Changes
10.2.2 Combination of Spatial and Temporal Segmentation
10.2.3 Segmentation of Objects based on Motion Information
10.3 Segmentation and Decomposition of Audio Signals
10.4 Problems
Part C: Coding of Multimedia Signals
11 Quantization and Coding
11.1 Scalar Quantization
11.2 Coding Theory
11.2.1 Source Coding Theorem and Rate Distortion Function
11.2.2 Rate-Distortion Function for Correlated Signals
11.2.3 Rate Distortion Function for Multi-dimensional Signals
11.3 Rate-Distortion Optimization of Quantizers
11.4 Entropy Coding
11.4.1 Properties of Variable-length Codes
11.4.2 Huffman Codes
11.4.3 Systematic Variable-length Codes
11.4.4 Arithmetic Coding
11.4.5 Context-dependent Entropy Coding
11.4.6 Adaptive Entropy Coding
11.4.7 Entropy Coding and Transmission Errors
11.4.8 Run-length Coding
11.4.9 Lempel-Ziv Coding
11.5 Vector Quantization
11.5.1 Basic Principles of Vector Quantization
11.5.2 Vector Quantization with Uniform Codebooks
11.5.3 Vector Quantization with Non-uniform Codebooks
11.5.4 Structured Codebooks
11.5.5 Rate-constrained Vector Quantization
11.6 Sliding Block Coding
11.6.1 Trellis Coding
11.6.2 Tree Coding
11.7 Problems
12 Still Image Coding
12.1 Compression of Binary Images
12.2 Vector Quantization of Images
12.3 Predictive Coding
12.3.1 DPCM Systems
12.3.2 Predictor filters in 2D DPCM
12.3.3 Quantization and Encoding of Prediction Errors
12.3.4 Error propagation in DPCM
12.4 Transform Coding
12.4.1 Block Transform Coding
12.4.2 Subband and Wavelet Transform Coding
12.4.3 Vector Quantization of Transform Coefficients
12.4.4 Adaptation of transform bases to signal properties
12.4.5 Transform coding and transmission losses
12.5 Fractal Coding
12.5.1 Principles of Fractal Transforms
12.5.2 Collage Theorem
12.5.3 Fractal Decoding
12.6 Region-based coding
12.6.1 Binary Shape Coding
12.6.2 Contour shape coding
12.6.3 Coding within arbitrary-shaped Regions
12.7 Problems
13 Video Coding
13.1 Methods without Motion Compensation
13.1.1 Frame Replenishment
13.1.2 3D Transform and Subband coding
13.2 Hybrid Video Coding
13.2.1 Motion-compensated Hybrid Coders
13.2.2 Characteristics of Interframe Prediction Error Signals
13.2.3 Quantization error feedback and error propagation
13.2.4 Forward, Backward and Multiframe Prediction
13.2.5 Bi-directional Prediction
13.2.6 Improved Methods of motion compensation
13.2.7 Hybrid Coding of Interlaced Video Signals
13.2.8 Scalable Hybrid Coding
13.2.9 Multiple-description Video Coding
13.2.10 Optimization of Hybrid Encoders
13.3 MC Prediction Coding using the Wavelet Transform
13.3.1 Wavelet Transform in the Prediction Loop
13.3.2 Frequency Coding with In-band Motion Compensation
13.4 Spatio-temporal Frequency Coding with MC
13.4.1 Temporal-axis Haar Filters with MC
13.4.2 Temporal-axis Lifting Filters for arbitrary MC
13.4.3 Improvements on Motion Compensation
13.4.4 Quantization and Encoding of 3D Wavelet Coefficients
13.4.5 Delay and Complexity of 3D Wavelet Coders
13.5 Encoding of Motion Parameters
13.5.1 Spatial Contexts in Motion Coding
13.5.2 Temporal Contexts in Motion Coding
13.5.3 Fractal Video Coding
13.6 Problems
14 Audio Coding
14.1 Coding of Speech Signals
14.2 Waveform Coding of Audio signals
14.3 Parametric Coding of Audio and Sound Signals
Part D: Applications and Standards
15 Transmission and Storage
15.1 Convergence of Digital Multimedia Services
15.2 Adaptation to Channel Characteristics
15.2.1 Rate and Transmission Control
15.2.2 Error Control
15.3 Digital Broadcast
15.4 Media Streaming
15.5 Content-based Media Access
15.6 Content Protection
16 Signal Composition, Rendering and Presentation
16.1 Composition and Mixing of Visual Signals
16.2 Warping and Morphing
16.3 Viewpoint Adaptation
16.4 Frame Rate Conversion
16.5 Rendering of Image and Video Signals
16.6 Composition and Rendering of Audio Signals
17 Multimedia Representation Standards
17.1 Interoperability and Compatibility
17.2 Definitions at Systems Level
17.3 Still Image Coding
17.3.1 The JBIG Standards
17.3.2 The JPEG Standards
17.3.3 MPEG-4 Still Texture Coding
17.4 Video Coding
17.4.1 ITU-T Recommendations H.261 and H.263
17.4.2 MPEG-1 and MPEG-2
17.4.3 MPEG-4 Visual
17.4.4 H.264/MPEG-4 Part 10 Advanced Video Coding (AVC)
17.5 Audio Coding
17.5.1 Speech Coding
17.5.2 Music and Sound Coding
17.6 Multimedia Content Description Standard MPEG-7
17.6.1 Elements of MPEG-7 Descriptions
17.6.2 Generic Multimedia Description Concepts
17.6.3 Visual Descriptors
17.6.4 Audio Descriptors
17.7 Multimedia Framework MPEG-21
Appendices
A Quality Measurement
A.1 Signal Quality
A.1.1 Objective Signal Quality Measurements
A.1.2 Subjective Assessment
A.2 Classification Quality
B Vector and Matrix Algebra
C Symbols and Variables
D Acronyms
References
Index
Part A: Multimedia Signal Processing and Analysis