Thư viện tri thức trực tuyến
Kho tài liệu với 50,000+ tài liệu học thuật
© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Mastering algorithms with Perl
Nội dung xem thử
Mô tả chi tiết
Page iii
Mastering Algorithms with Perl
Jon Orwant, Jarkko Hietaniemi,
and John Macdonald
Page iv
Mastering Algorithms with Perl
by Jon Orwant, Jarkko Hietaniemi. and John Macdonald
Copyright © 1999 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.
Cover illustration by Lorrie LeJeune, Copyright © 1999 O'Reilly & Associates, Inc.
Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
Editors: Andy Oram and Jon Orwant
Production Editor: Melanie Wang
Printing History:
August 1999: First Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and
sellers to distinguish their products are claimed as trademarks. Where those designations
appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the
designations have been printed in caps or initial caps. The association between the image of a
wolf and the topic of Perl algorithms is a trademark of O'Reilly & Associates, Inc.
While every precaution has been taken in the preparation of this book, the publisher assumes no
responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
ISBN: 1-56592-398-7 [1/00]
[M]]break
Page v
Table of Contents
Preface xi
1. Introduction 1
What Is an Algorithm? 1
Efficiency 8
Recurrent Themes in Algorithms 20
2. Basic Data Structures 24
Perl's Built-in Data Structures 25
Build Your Own Data Structure 26
A Simple Example 27
Perl Arrays: Many Data Structures in One 37
3. Advanced Data Structures 46
Linked Lists 47
Circular Linked Lists 60
Garbage Collection in Perl 62
Doubly-Linked Lists 65
Doubly-Linked Lists 65
Infinite Lists 71
The Cost of Traversal 72
Binary Trees 73
Heaps 91
Binary Heaps 92
Janus Heap 99
Page vi
The Heaps Module 99
Future CPAN Modules 101
4. Sorting 102
An Introduction to Sorting 102
All Sorts of Sorts 119
Sorting Algorithms Summary 151
5. Searching 157
Hash Search and Other Non-Searches 158
Lookup Searches 159
Generative Searches 175
6. Sets 203
Venn Diagrams 204
Creating Sets 205
Set Union and Intersection 209
Set Differences 217
Set Differences 217
Counting Set Elements 222
Set Relations 223
The Set Modules of CPAN 227
Sets of Sets 233
Multivalued Sets 240
Sets Summary 242
7. Matrices 244
Creating Matrices 246
Manipulating Individual Elements 246
Finding the Dimensions of a Matrix 247
Displaying Matrices 247
Adding or Multiplying Constants 248
Transposing a Matrix 254
Multiplying Matrices 256
Extracting a Submatrix 259
Combining Matrices 260
Inverting a Matrix 261
Computing the Determinant 262
Gaussian Elimination 263
Eigenvalues and Eigenvectors 266
Page vii
The Matrix Chain Product 269
The Matrix Chain Product 269
Delving Deeper 272
8. Graphs 273
Vertices and Edges 276
Derived Graphs 281
Graph Attributes 286
Graph Representation in Computers 287
Graph Traversal 301
Paths and Bridges 310
Graph Biology: Trees, Forests, DAGS, Ancestors, and Descendants 312
Edge and Graph Classes 316
CPAN Graph Modules 351
9. Strings 353
Perl Builtins 354
String-Matching Algorithms 357
Phonetic Algorithms 388
Stemming and Inflection 389
Parsing 394
Compression 411
10. Geometric Algorithms 425
Distance 426
Area, Perimeter, and Volume 429
Direction 433
Intersection 435
Intersection 435
Inclusion 443
Boundaries 449
Closest Pair of Points 457
Geometric Algorithms Summary 464
CPAN Graphics Modules 464
11. Number Systems 469
Integers and Reals 469
Strange Systems 480
Trigonometry 491
Significant Series 492
Page viii
12. Number Theory 499
Basic Number Theory 499
Prime Numbers 504
Unsolved Problems 522
13. Cryptography 526
Legal Issues 527
Authorizing People with Passwords 528
Authorization of Data: Checksums and More 533
Obscuring Data: Encryption 538
Hiding Data: Steganography 555
Winnowing and Chaffing 558
Winnowing and Chaffing 558
Encrypted Perl Code 562
Other Issues 564
14. Probability 566
Random Numbers 567
Events 569
Permutations and Combinations 571
Probability Distributions 574
Rolling Dice: Uniform Distributions 576
Loaded Dice and Candy Colors: Nonuniform Discrete Distributions 582
If the Blue Jays Score Six Runs: Conditional Probability 589
Flipping Coins over and Over: Infinite Discrete Distributions 590
How Much Snow? Continuous Distributions 591
Many More Distributions 592
15. Statistics 599
Statistical Measures 600
Significance Tests 608
Correlation 620
16. Numerical Analysis 626
Computing Derivatives and Integrals 627
Solving Equations 634
Interpolation, Extrapolation, and Curve Fitting 642
Page ix
A. Further Reading 649
B. ASCII Character Set 652
Index 657
Page xi
Preface
Perl's popularity has soared in recent years. It owes its appeal first to its technical superiority:
Perl's unparalleled portability, speed, and expressiveness have made it the language of choice
for a million programmers worldwide.
Those programmers have extended Perl in ways unimaginable with languages controlled by
committees or companies. Of all languages, Perl has the largest base of free utilities, thanks to
the Comprehensive Perl Archive Network (abbreviated CPAN; see
http://www.perl.com/CPAN/). The modules and scripts you'll find there have made Perl the
most popular language for web; text, and database programming.
But Perl can do more than that. You can solve complex problems in Perl more quickly, and in
fewer lines, than in any other language.
This ease of use makes Perl an excellent tool for exploring algorithms. Computer science
embraces complexity; the essence of programming is the clean dissection of a seemingly
insurmountable problem into a series of simple, computable steps. Perl is ideal for tackling the
tougher nuggets of computer science because its liberal syntax lets the programmer express his
or her solution in the manner best suited to the task. (After all, Perl's motto is There's More
Than One Way To Do It.) Algorithms are complex enough; we don't need a computer language
making it any tougher.
Most books about computer algorithms don't include working programs. They express their
ideas in quasi-English pseudocode instead, which allows the discussion to focus on concepts
without getting bogged down in implementation details. But sometimes the details are what
matter—the inefficiencies of a bad implementation sometimes cancel the speedup that a good
algorithm provides. The devil is in the details.break
Page xii
And while converting ideas to programs is often a good exercise, it's also just plain
time-consuming. So, in this book we've supplied you with not just explanations, but
implementations as well. If you read this book carefully, you'll learn more about both
algorithms and Perl.
About This Book
This book is written for two kinds of people: those who want cut and paste solutions and those
who want to hone their programming skills. You'll see how we solve some of the classic
problems of computer science and why we solved them the way we did.
Theory or Practice?
Like the wolf featured on the cover, this book is sometimes fierce and sometimes playful. The
fierce part is the computer science: we'll often talk like computer scientists talk and discuss
problems that matter little to the practical Perl programmer. Other times, we'll playfully
explain the problem and simply tell you about ready-made solutions you can find on the Internet
(almost always on CPAN).
Deciding when to be fierce and when to be playful hasn't been easy for us. For instance, every
algorithms textbook has a chapter on all of the different ways to sort a collection of items. So
do we, even though Perl provides its own sort() function that might be all you ever need.
We do this for four reasons. First, we don't want you thinking you've Mastered Algorithms
without understanding the algorithms covered in every college course on the subject. Second,
the concepts, processes, and strategies underlying those algorithms will come in handy for
more than just sorting. Third, it helps to know how Perl's sort() works under the hood, why
its particular algorithm (quicksort) was used, and how to avoid some of the inefficiencies that
even experienced Perl programmers fall prey to. Finally, sort() isn't always the best
solution! Someday, you might need another of the techniques we provide.
When it comes to the inevitable tradeoffs between theory and practice, programmers' tastes
vary. We have chosen a middle course, swiftly pouncing from one to the other with feral
abandon. If your tastes are exclusively theoretical or practical, we hope you'll still appreciate
the balanced diet you'll find here.
Organization of This Book
The chapters in this book can be read in isolation; they typically don't require knowledge from
previous chapters. However, we do recommend that you read at least Chapter 1, Introduction,
and Chapter 2, Basic Data Structures, which provide the basic material necessary for
understanding the rest of the book.break
Page xiii
Chapter 1 describes the basics of Perl and algorithms, with an emphasis on speed and general
problem-solving techniques.
Chapter 2 explains how to use Perl to create simple and very general representations, like
queues and lists of lists.
Chapter 3, Advanced Data Structures, shows how to build the classic computer science data
structures.
Chapter 4, Sorting, looks at techniques for ordering data and compares the advantages of each
technique.
Chapter 5, Searching, investigates ways to extract individual pieces of information from a
larger collection.
Chapter 6, Sets, discusses the basics of set theory and Perl implementations of set operations.
Chapter 7, Matrices, examines techniques for manipulating large arrays of data and solving
problems in linear algebra.
Chapter 8, Graphs, describes tools for solving problems that are best represented as a graph:
a collection of nodes connected by edges.
Chapter 9, Strings, explains how to implement algorithms for searching, filtering, and parsing
strings of text.
Chapter 10, Geometric Algorithms, looks at techniques for computing with two-and
three-dimensional constructs.
Chapter 11, Number Systems, investigates methods for generating important constants,
functions, and number series, as well as manipulating numbers in alternate coordinate systems.
Chapter 12, Number Theory, examines algorithms for factoring numbers, modular arithmetic,
and other techniques for computing with integers.
Chapter 13, Cryptography, demonstrates Perl utilities to conceal your data from prying eyes.
Chapter 14, Probability, discusses how to use Perl for problems involving chance.
Chapter 15, Statistics, describes methods for analyzing the accuracy of hypotheses and
characterizing the distribution of data.
Chapter 16, Numerical Analysis, looks at a few of the more common problems in scientific
computing.
Appendix A, Further Reading, contains an annotated bibliography.break
Page xiv
Appendix B, ASCII Character Set, lists the seven-bit ASCII character set used by default when
Perl sorts strings.
Conventions Used in This Book
Italic
Used for filenames, directory names, URLs, and occasional emphasis.
Constant width
Used for elements of programming languages, text manipulated by programs, code
examples, and output.
Constant width bold
Used for user input and for emphasis in code.
Constant width italic
Used for replaceable values.
What You Should Know before Reading This Book
Algorithms are typically the subject of an entire upper-level undergraduate course in computer
science departments. Obviously, we cannot hope to provide all of the mathematical and
programming background you'll need to get the most out of this book. We believe that the best
way to teach is never to coddle, but to explain complex concepts in an entertaining fashion and
thoroughly ground them in applications whenever possible. You don't need to be a computer
scientist to read this book, but once you've read it you might feel justified calling yourself one.
That said, if you don't know Perl, you don't want to start here. We recommend you begin with
either of these books published by O'Reilly & Associates: Randal L. Schwartz and Tom
Christiansen's Learning Perl if you're new to programming, and Larry Wall, Tom Christiansen,
and Randal L. Schwartz's Programming Perl if you're not.
If you want more rigorous explanations of the algorithms discussed in this book, we
recommend either Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest's
Introduction to Algorithms, published by MIT Press, or Donald Knuth's The Art of Computer
Programming, Volume 1 (Fundamental Algorithms) in particular. See Appendix A for full
bibliographic information.
What You Should Have before Reading This Book
This book assumes you have Perl 5.004 or better. If you don't, you can download it for free
from http://www.perl.com/CPAN/src.
This book often refers to CPAN modules, which are packages of Perl code you can download
for free from http://www.perl.com/CPAN/modules/by-module/. In partic-soft
Page xv
ular, the CPAN.pm module (http://www.perl.com/CPAN/modules/by-module/CPAN) can
automatically download, build, and install CPAN modules for you.
Typically, the modules in CPAN are usually quite robust because they're tested and used by
large user populations. You can check the Modules List (reachable by a link from
http://www.perl.com/CPAN/CPAN.html) to see how authors rate their modules; as a module
rating moves through ''idea," "under construction," "alpha," "beta," and finally to "Released,"
there is an increasing likelihood that it will behave properly.
Online Information about This Book
All of the programs in this book are available online from ftp://ftp.oreilly.com/, in the
directory /pub/examples/perl/algorithms/examples.tar.gz. If we learn of any errors in this
book, you'll be able to find them at /pub/examples/perl/algorithms/errata.txt.
Acknowledgments
Jon Orwant: I would like to thank all of the biological and computational entities that have
made this book possible. At the Media Laboratory, Walter Bender has somehow managed to
look the other way for twelve years while my distractions got the better of me. Various past
and present Media Labbers helped shape this book, knowingly or not: Nathan Abramson, Amy
Bruckman, Bill Butera, Pascal Chesnais, Judith Donath, Klee Dienes, Roger Kermode, Doug
Koen, Michelle Mcdonald, Chris Metcalfe, Warren Sack, Sunil Vemuri, and Chris Verplaetse.
The Miracle Crew helped in ways intangible, so thanks to Alan Blount, Richard Christie,
Diego Garcia, Carolyn Grantham, and Kyle Pope.
When Media Lab research didn't steal time from algorithms, The Perl Journal did, and so I'd
like to thank the people who helped ease the burden of running the magazine: Graham Barr,
David Blank-Edelman, Alan Blount, Sean M. Burke, Mark-Jason Dominus, Brian D. Foy,
Jeffrey Friedl, Felix Gallo, Kevin Lenzo, Steve Lidie, Tuomas J. Lukka, Chris Nandor, Sara
Ontiveros, Tim O'Reilly, Randy Ray, John Redford, Chip Salzenberg, Gurusamy Sarathy,
Lincoln D. Stein, Mike Stok, and all of the other contributors. Fellow philologist Tom
Christiansen helped birth the magazine, fellow sushi-lover Sara Ontiveros helped make
operations bearable, and fellow propagandist Nathan Torkington soon became indispensable.
Sandy Aronson, Francesca Pardo, Kim Scearce, and my parents, Jack and Carol, have all
tolerated and occasionally even encouraged my addiction to the computational arts. Finally,
Alan Blount and Nathan Torkington remain strikingly kindred spirits, and Robin Lucas has been
a continuous source of comfort and joy.break
Page xvi
Jarkko, John, and I would like to thank our team of technical reviewers: Tom Christiansen,
Damian Conway, Mark-Jason Dominus, Daniel Dreilinger, Dan Gruhl, Andi Karrer, Mike
Stok, Jeff Sumler, Sekhar Tatikonda, Nathan Torkington, and the enigmatic Abigail. Their
boundless expertise made this book substantially better. Abigail, Mark-Jason, Nathan, Tom,
and Damian went above and beyond the call of duty.
We would also like to thank the talented staff at O'Reilly for making this book possible, and for
their support of Perl in general. Andy Oram prodded us just the right amount, and his acute
editorial eye helped the book in countless ways. Melanie Wang, our production editor, paid
unbelievably exquisite attention to the tiniest details; Rhon Porter and Rob Romano made our
illustrations crisp and clean; and Lenny Muellner coped with our SGML.
As an editor and publisher, I've learned (usually the hard way) about the difficulties of editing
and disseminating Perl content. Having written a Perl book with another publisher, I've learned
how badly some of the publishing roles can be performed. And I quite simply cannot envision a
better collection of talent than the folks at O'Reilly. So in addition to the people who worked
on our book, I'd personally like to thank Gina Blaber, Mark Brokering, Mark Jacobsen, Lisa
Mann, Linda Mui, Tim O'Reilly, Madeleine Schnapp, Ellen Silver, Lisa Sloan, Linda Walsh,
Frank Willison, and all the other people I've had the pleasure of working with at O'Reilly &
Associates. Keep up the good work. Finally, we would all like to thank Larry Wall and the rest
of the Perl community for making the language as fun as it is.
Jarkko Hietaniemi: I want to thank my parents for their guidance, which led me to become so
hopelessly interested in so many things, including algorithms and Perl. My little sister I want to
thank for being herself. Nokia Research Center I need to thank for allowing me to write this
book even though it took much longer than originally planned. My friends and colleagues I must
thank for goading me on by constantly asking how the book was doing.
John Macdonald: First and foremost, I want to thank my wife, Chris. Her love, support, and
assistance was unflagging, even when the "one year offline" to write the book continued to
extend through the entirety of her "one year offline" to pursue further studies at university. An
additional special mention goes to Ailsa for many weekends of child-sitting while both parents
were offline. Much thanks to Elegant Communications for providing access to significant
amounts of computer resources, many dead trees, and much general assistance. Thanks to Bill
Mustard for the two-year loan of a portion of his library and for acting as a sounding board on
numerous occasions. I've also received a great deal of support and encouragement from many
other family members, friends, and co-workers (these groups overlap).break
Page xvii
Comments and Questions
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
800-998-9938 (in the U.S. or Canada)
707-829-0515 (international/local)
707-829-0104 (FAX)
You can also send us messages electronically. To be put on our mailing list or to request a
catalog, send email to:
To ask technical questions or comment on the book, send email to:
Page 1
1—
Introduction
Computer Science is no more about computers than astronomy is about
telescopes.
—E. W. Dijkstra
In this chapter, we'll discuss how to "think algorithms"—how to design and analyze programs
that solve problems. We'll start with a gentle introduction to algorithms and a not-so-gentle
introduction to Perl, then consider some of the tradeoffs involved in choosing the right
implementation for your needs, and finally introduce some themes pervading the field:
recursion, divide-and-conquer, and dynamic programming.
What Is an Algorithm?
An algorithm is simply a technique—not necessarily computational—for solving a problem
step by step. Of course, all programs solve problems (except for the ones that create
problems). What elevates some techniques to the hallowed status of algorithm is that they
embody a general, reusable method that solves an entire class of problems. Programs are
created; algorithms are invented. Programs eventually become obsolete; algorithms are
permanent.
Of course, some algorithms are better than others. Consider the task of finding a word in a
dictionary. Whether it's a physical book or an online file containing one word per line, there
are different ways to locate the word you're looking for. You could look up a definition with a
linear search, by reading the dictionary from front to back until you happen across your word.
That's slow, unless your word happens to be at the very beginning of the alphabet. Or, you
could pick pages at random and scan them for your word. You might get lucky. Still, there's
obviously a better way. That better way is the binary search algorithm, which you'll
learncontinue
Page 2
about in Chapter 5, Searching. In fact, the binary search is provably the best algorithm for this
task.
A Sample Algorithm:
Binary Search
We'll use binary search to explore what an algorithm is, how we implement one in Perl, and
what it means for an algorithm to be general and efficient. In what follows, we'll assume that
we have an alphabetically ordered list of words, and we want to determine where our chosen
word appears in the list, if it even appears at all. In our program, each word is represented in
Perl as a scalar, which can be an integer, a floating-point number, or (as in this case) a string
of characters. Our list of words is stored in a Perl array: an ordered list of scalars. In Perl, all
scalars begin with an $ sign, and all arrays begin with an @ sign. The other common datatype in
Perl is the hash, denoted with a % sign. Hashes "map" one set of scalars (the "keys") to other
scalars (the "values").
Here's how our binary search works. At all times, there is a range of words, called a window,
that the algorithm is considering. If the word is in the list, it must be inside the window.
Initially, the window is the entire list: no surprise there. As the algorithm operates, it shrinks
the window. Sometimes it moves the top of the window down, and sometimes it moves the
bottom of the window up. Eventually, the window contains only the target word, or it contains
nothing at all and we know that the word must not be in the list.
The window is defined with two numbers: the lowest and highest locations (which we'll call
indices, since we're searching through an array) where the word might be found. Initially, the
window is the entire array, since the word could be anywhere. The lower bound of the window
is $low, and the higher bound is $high.
We then look at the word in the middle of the window; that is, the element with index ($low
+ $high) / 2. However, that expression might have a fractional value, so we wrap it in an
int() to ensure that we have an integer, yielding int(($low + $high) / 2). If that
word comes after our word alphabetically, we can decrease $high to this index. Likewise, if
the word is too low, we increase $low to this index.
Eventually, we'll end up with our word—or an empty window, in which case our subroutine
returns undef to signal that the word isn't present.
Before we show you the Perl program for binary search, let's first look at how this might be
written in other algorithm books. Here's a pseudocode "implementation" of binary search:break
BINARY-SEARCH(A, w)
1. low ¨ 0
2. high ¨ length[A]
Page 3
3. while low < high
4. do try ¨ int ((low + high) / 2)
5. if A[try] > w
6. then high ¨ try
7. else if A[try] < w
8. then low ¨ try + 1
9. else return try
10. end if
11. end if
12. end do
13. return NO_ELEMENT
And now the Perl program. Not only is it shorter, it's an honest-to-goodness working
subroutine.
# $index = binary_search( \@array, $word )
# @array is a list of lowercase strings in alphabetical order.
# $word is the target word that might be in the list.
# binary_search() returns the array index such that $array[$index]
# is $word.
sub binary_search {
my ($array, $word) = @_;
my ($low, $high) = ( 0, @$array - 1 );
while ( $low <= $high ) { # While the window is open
my $try = int( ($low+$high) /2 ); # Try the middle element
$low = $try+1, next if $array->[$try] lt $word; # Raise bottom
$high = $try-1, next if $array->[$try] gt $word; # Lower top
return $try; # We've found the word!
}
return; # The word isn't there.
}
Depending on how much Perl you know, this might seem crystal clear or hopelessly opaque. As
the preface said, if you don't know Perl, you probably don't want to learn it with this book.
Nevertheless, here's a brief description of the Perl syntax used in the binary_search()
subroutine.