Mastering algorithms with Perl

Page iii

Mastering Algorithms with Perl

Jon Orwant, Jarkko Hietaniemi,

and John Macdonald

Page iv

Mastering Algorithms with Perl

by Jon Orwant, Jarkko Hietaniemi. and John Macdonald

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

Editors: Andy Oram and Jon Orwant

Production Editor: Melanie Wang

Printing History:

August 1999: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered

trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and

sellers to distinguish their products are claimed as trademarks. Where those designations

appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the

designations have been printed in caps or initial caps. The association between the image of a

wolf and the topic of Perl algorithms is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher assumes no

responsibility for errors or omissions, or for damages resulting from the use of the information

contained herein.

ISBN: 1-56592-398-7 [1/00]

[M]]break

Page v

Table of Contents

Preface xi

1. Introduction 1

What Is an Algorithm? 1

Efficiency 8

Recurrent Themes in Algorithms 20

2. Basic Data Structures 24

Perl's Built-in Data Structures 25

Build Your Own Data Structure 26

A Simple Example 27

Perl Arrays: Many Data Structures in One 37

3. Advanced Data Structures 46

Linked Lists 47

Circular Linked Lists 60

Garbage Collection in Perl 62

Doubly-Linked Lists 65

Infinite Lists 71

The Cost of Traversal 72

Binary Trees 73

Heaps 91

Binary Heaps 92

Janus Heap 99

Page vi

The Heaps Module 99

Future CPAN Modules 101

4. Sorting 102

An Introduction to Sorting 102

All Sorts of Sorts 119

Sorting Algorithms Summary 151

5. Searching 157

Hash Search and Other Non-Searches 158

Lookup Searches 159

Generative Searches 175

6. Sets 203

Venn Diagrams 204

Creating Sets 205

Set Union and Intersection 209

Set Differences 217

Counting Set Elements 222

Set Relations 223

The Set Modules of CPAN 227

Sets of Sets 233

Multivalued Sets 240

Sets Summary 242

7. Matrices 244

Creating Matrices 246

Manipulating Individual Elements 246

Finding the Dimensions of a Matrix 247

Displaying Matrices 247

Adding or Multiplying Constants 248

Transposing a Matrix 254

Multiplying Matrices 256

Extracting a Submatrix 259

Combining Matrices 260

Inverting a Matrix 261

Computing the Determinant 262

Gaussian Elimination 263

Eigenvalues and Eigenvectors 266

Page vii

The Matrix Chain Product 269

Delving Deeper 272

8. Graphs 273

Vertices and Edges 276

Derived Graphs 281

Graph Attributes 286

Graph Representation in Computers 287

Graph Traversal 301

Paths and Bridges 310

Graph Biology: Trees, Forests, DAGS, Ancestors, and Descendants 312

Edge and Graph Classes 316

CPAN Graph Modules 351

9. Strings 353

Perl Builtins 354

String-Matching Algorithms 357

Phonetic Algorithms 388

Stemming and Inflection 389

Parsing 394

Compression 411

10. Geometric Algorithms 425

Distance 426

Area, Perimeter, and Volume 429

Direction 433

Intersection 435

Inclusion 443

Boundaries 449

Closest Pair of Points 457

Geometric Algorithms Summary 464

CPAN Graphics Modules 464

11. Number Systems 469

Integers and Reals 469

Strange Systems 480

Trigonometry 491

Significant Series 492

Page viii

12. Number Theory 499

Basic Number Theory 499

Prime Numbers 504

Unsolved Problems 522

13. Cryptography 526

Legal Issues 527

Authorizing People with Passwords 528

Authorization of Data: Checksums and More 533

Obscuring Data: Encryption 538

Hiding Data: Steganography 555

Winnowing and Chaffing 558

Encrypted Perl Code 562

Other Issues 564

14. Probability 566

Random Numbers 567

Events 569

Permutations and Combinations 571

Probability Distributions 574

Rolling Dice: Uniform Distributions 576

Loaded Dice and Candy Colors: Nonuniform Discrete Distributions 582

If the Blue Jays Score Six Runs: Conditional Probability 589

Flipping Coins over and Over: Infinite Discrete Distributions 590

How Much Snow? Continuous Distributions 591

Many More Distributions 592

15. Statistics 599

Statistical Measures 600

Significance Tests 608

Correlation 620

16. Numerical Analysis 626

Computing Derivatives and Integrals 627

Solving Equations 634

Interpolation, Extrapolation, and Curve Fitting 642

Page ix

A. Further Reading 649

B. ASCII Character Set 652

Index 657

Page xi

Preface

Perl's popularity has soared in recent years. It owes its appeal first to its technical superiority:

Perl's unparalleled portability, speed, and expressiveness have made it the language of choice

for a million programmers worldwide.

Those programmers have extended Perl in ways unimaginable with languages controlled by

committees or companies. Of all languages, Perl has the largest base of free utilities, thanks to

the Comprehensive Perl Archive Network (abbreviated CPAN; see

http://www.perl.com/CPAN/). The modules and scripts you'll find there have made Perl the

most popular language for web; text, and database programming.

But Perl can do more than that. You can solve complex problems in Perl more quickly, and in

fewer lines, than in any other language.

This ease of use makes Perl an excellent tool for exploring algorithms. Computer science

embraces complexity; the essence of programming is the clean dissection of a seemingly

insurmountable problem into a series of simple, computable steps. Perl is ideal for tackling the

tougher nuggets of computer science because its liberal syntax lets the programmer express his

or her solution in the manner best suited to the task. (After all, Perl's motto is There's More

Than One Way To Do It.) Algorithms are complex enough; we don't need a computer language

making it any tougher.

Most books about computer algorithms don't include working programs. They express their

ideas in quasi-English pseudocode instead, which allows the discussion to focus on concepts

without getting bogged down in implementation details. But sometimes the details are what

matter—the inefficiencies of a bad implementation sometimes cancel the speedup that a good

algorithm provides. The devil is in the details.break

Page xii

And while converting ideas to programs is often a good exercise, it's also just plain

time-consuming. So, in this book we've supplied you with not just explanations, but

implementations as well. If you read this book carefully, you'll learn more about both

algorithms and Perl.

About This Book

This book is written for two kinds of people: those who want cut and paste solutions and those

who want to hone their programming skills. You'll see how we solve some of the classic

problems of computer science and why we solved them the way we did.

Theory or Practice?

Like the wolf featured on the cover, this book is sometimes fierce and sometimes playful. The

fierce part is the computer science: we'll often talk like computer scientists talk and discuss

problems that matter little to the practical Perl programmer. Other times, we'll playfully

explain the problem and simply tell you about ready-made solutions you can find on the Internet

(almost always on CPAN).

Deciding when to be fierce and when to be playful hasn't been easy for us. For instance, every

algorithms textbook has a chapter on all of the different ways to sort a collection of items. So

do we, even though Perl provides its own sort() function that might be all you ever need.

We do this for four reasons. First, we don't want you thinking you've Mastered Algorithms

without understanding the algorithms covered in every college course on the subject. Second,

the concepts, processes, and strategies underlying those algorithms will come in handy for

more than just sorting. Third, it helps to know how Perl's sort() works under the hood, why

its particular algorithm (quicksort) was used, and how to avoid some of the inefficiencies that

even experienced Perl programmers fall prey to. Finally, sort() isn't always the best

solution! Someday, you might need another of the techniques we provide.

When it comes to the inevitable tradeoffs between theory and practice, programmers' tastes

vary. We have chosen a middle course, swiftly pouncing from one to the other with feral

abandon. If your tastes are exclusively theoretical or practical, we hope you'll still appreciate

the balanced diet you'll find here.

Organization of This Book

The chapters in this book can be read in isolation; they typically don't require knowledge from

previous chapters. However, we do recommend that you read at least Chapter 1, Introduction,

and Chapter 2, Basic Data Structures, which provide the basic material necessary for

understanding the rest of the book.break

Page xiii

Chapter 1 describes the basics of Perl and algorithms, with an emphasis on speed and general

problem-solving techniques.

Chapter 2 explains how to use Perl to create simple and very general representations, like

queues and lists of lists.

Chapter 3, Advanced Data Structures, shows how to build the classic computer science data

structures.

Chapter 4, Sorting, looks at techniques for ordering data and compares the advantages of each

technique.

Chapter 5, Searching, investigates ways to extract individual pieces of information from a

larger collection.

Chapter 6, Sets, discusses the basics of set theory and Perl implementations of set operations.

Chapter 7, Matrices, examines techniques for manipulating large arrays of data and solving

problems in linear algebra.

Chapter 8, Graphs, describes tools for solving problems that are best represented as a graph:

a collection of nodes connected by edges.

Chapter 9, Strings, explains how to implement algorithms for searching, filtering, and parsing

strings of text.

Chapter 10, Geometric Algorithms, looks at techniques for computing with two-and

three-dimensional constructs.

Chapter 11, Number Systems, investigates methods for generating important constants,

functions, and number series, as well as manipulating numbers in alternate coordinate systems.

Chapter 12, Number Theory, examines algorithms for factoring numbers, modular arithmetic,

and other techniques for computing with integers.

Chapter 13, Cryptography, demonstrates Perl utilities to conceal your data from prying eyes.

Chapter 14, Probability, discusses how to use Perl for problems involving chance.

Chapter 15, Statistics, describes methods for analyzing the accuracy of hypotheses and

characterizing the distribution of data.

Chapter 16, Numerical Analysis, looks at a few of the more common problems in scientific

computing.

Appendix A, Further Reading, contains an annotated bibliography.break

Page xiv

Appendix B, ASCII Character Set, lists the seven-bit ASCII character set used by default when

Perl sorts strings.

Conventions Used in This Book

Italic

Used for filenames, directory names, URLs, and occasional emphasis.

Constant width

Used for elements of programming languages, text manipulated by programs, code

examples, and output.

Constant width bold

Used for user input and for emphasis in code.

Constant width italic

Used for replaceable values.

What You Should Know before Reading This Book

Algorithms are typically the subject of an entire upper-level undergraduate course in computer

science departments. Obviously, we cannot hope to provide all of the mathematical and

programming background you'll need to get the most out of this book. We believe that the best

way to teach is never to coddle, but to explain complex concepts in an entertaining fashion and

thoroughly ground them in applications whenever possible. You don't need to be a computer

scientist to read this book, but once you've read it you might feel justified calling yourself one.

That said, if you don't know Perl, you don't want to start here. We recommend you begin with

either of these books published by O'Reilly & Associates: Randal L. Schwartz and Tom

Christiansen's Learning Perl if you're new to programming, and Larry Wall, Tom Christiansen,

and Randal L. Schwartz's Programming Perl if you're not.

If you want more rigorous explanations of the algorithms discussed in this book, we

recommend either Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest's

Introduction to Algorithms, published by MIT Press, or Donald Knuth's The Art of Computer

Programming, Volume 1 (Fundamental Algorithms) in particular. See Appendix A for full

bibliographic information.

What You Should Have before Reading This Book

This book assumes you have Perl 5.004 or better. If you don't, you can download it for free

from http://www.perl.com/CPAN/src.

This book often refers to CPAN modules, which are packages of Perl code you can download

for free from http://www.perl.com/CPAN/modules/by-module/. In partic-soft

Page xv

ular, the CPAN.pm module (http://www.perl.com/CPAN/modules/by-module/CPAN) can

automatically download, build, and install CPAN modules for you.

Typically, the modules in CPAN are usually quite robust because they're tested and used by

large user populations. You can check the Modules List (reachable by a link from

http://www.perl.com/CPAN/CPAN.html) to see how authors rate their modules; as a module

rating moves through ''idea," "under construction," "alpha," "beta," and finally to "Released,"

there is an increasing likelihood that it will behave properly.

Online Information about This Book

All of the programs in this book are available online from ftp://ftp.oreilly.com/, in the

directory /pub/examples/perl/algorithms/examples.tar.gz. If we learn of any errors in this

book, you'll be able to find them at /pub/examples/perl/algorithms/errata.txt.

Acknowledgments

Jon Orwant: I would like to thank all of the biological and computational entities that have

made this book possible. At the Media Laboratory, Walter Bender has somehow managed to

look the other way for twelve years while my distractions got the better of me. Various past

and present Media Labbers helped shape this book, knowingly or not: Nathan Abramson, Amy

Bruckman, Bill Butera, Pascal Chesnais, Judith Donath, Klee Dienes, Roger Kermode, Doug

Koen, Michelle Mcdonald, Chris Metcalfe, Warren Sack, Sunil Vemuri, and Chris Verplaetse.

The Miracle Crew helped in ways intangible, so thanks to Alan Blount, Richard Christie,

Diego Garcia, Carolyn Grantham, and Kyle Pope.

When Media Lab research didn't steal time from algorithms, The Perl Journal did, and so I'd

like to thank the people who helped ease the burden of running the magazine: Graham Barr,

David Blank-Edelman, Alan Blount, Sean M. Burke, Mark-Jason Dominus, Brian D. Foy,

Jeffrey Friedl, Felix Gallo, Kevin Lenzo, Steve Lidie, Tuomas J. Lukka, Chris Nandor, Sara

Ontiveros, Tim O'Reilly, Randy Ray, John Redford, Chip Salzenberg, Gurusamy Sarathy,

Lincoln D. Stein, Mike Stok, and all of the other contributors. Fellow philologist Tom

Christiansen helped birth the magazine, fellow sushi-lover Sara Ontiveros helped make

operations bearable, and fellow propagandist Nathan Torkington soon became indispensable.

Sandy Aronson, Francesca Pardo, Kim Scearce, and my parents, Jack and Carol, have all

tolerated and occasionally even encouraged my addiction to the computational arts. Finally,

Alan Blount and Nathan Torkington remain strikingly kindred spirits, and Robin Lucas has been

a continuous source of comfort and joy.break

Page xvi

Jarkko, John, and I would like to thank our team of technical reviewers: Tom Christiansen,

Damian Conway, Mark-Jason Dominus, Daniel Dreilinger, Dan Gruhl, Andi Karrer, Mike

Stok, Jeff Sumler, Sekhar Tatikonda, Nathan Torkington, and the enigmatic Abigail. Their

boundless expertise made this book substantially better. Abigail, Mark-Jason, Nathan, Tom,

and Damian went above and beyond the call of duty.

We would also like to thank the talented staff at O'Reilly for making this book possible, and for

their support of Perl in general. Andy Oram prodded us just the right amount, and his acute

editorial eye helped the book in countless ways. Melanie Wang, our production editor, paid

unbelievably exquisite attention to the tiniest details; Rhon Porter and Rob Romano made our

illustrations crisp and clean; and Lenny Muellner coped with our SGML.

As an editor and publisher, I've learned (usually the hard way) about the difficulties of editing

and disseminating Perl content. Having written a Perl book with another publisher, I've learned

how badly some of the publishing roles can be performed. And I quite simply cannot envision a

better collection of talent than the folks at O'Reilly. So in addition to the people who worked

on our book, I'd personally like to thank Gina Blaber, Mark Brokering, Mark Jacobsen, Lisa

Mann, Linda Mui, Tim O'Reilly, Madeleine Schnapp, Ellen Silver, Lisa Sloan, Linda Walsh,

Frank Willison, and all the other people I've had the pleasure of working with at O'Reilly &

Associates. Keep up the good work. Finally, we would all like to thank Larry Wall and the rest

of the Perl community for making the language as fun as it is.

Jarkko Hietaniemi: I want to thank my parents for their guidance, which led me to become so

hopelessly interested in so many things, including algorithms and Perl. My little sister I want to

thank for being herself. Nokia Research Center I need to thank for allowing me to write this

book even though it took much longer than originally planned. My friends and colleagues I must

thank for goading me on by constantly asking how the book was doing.

John Macdonald: First and foremost, I want to thank my wife, Chris. Her love, support, and

assistance was unflagging, even when the "one year offline" to write the book continued to

extend through the entirety of her "one year offline" to pursue further studies at university. An

additional special mention goes to Ailsa for many weekends of child-sitting while both parents

were offline. Much thanks to Elegant Communications for providing access to significant

amounts of computer resources, many dead trees, and much general assistance. Thanks to Bill

Mustard for the two-year loan of a portion of his library and for acting as a sounding board on

numerous occasions. I've also received a great deal of support and encouragement from many

other family members, friends, and co-workers (these groups overlap).break

Page xvii

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O'Reilly & Associates, Inc.

101 Morris Street

Sebastopol, CA 95472

800-998-9938 (in the U.S. or Canada)

707-829-0515 (international/local)

707-829-0104 (FAX)

You can also send us messages electronically. To be put on our mailing list or to request a

catalog, send email to:

[email protected]

To ask technical questions or comment on the book, send email to:

[email protected]

Page 1

1—

Introduction

Computer Science is no more about computers than astronomy is about

telescopes.

—E. W. Dijkstra

In this chapter, we'll discuss how to "think algorithms"—how to design and analyze programs

that solve problems. We'll start with a gentle introduction to algorithms and a not-so-gentle

introduction to Perl, then consider some of the tradeoffs involved in choosing the right

implementation for your needs, and finally introduce some themes pervading the field:

recursion, divide-and-conquer, and dynamic programming.

What Is an Algorithm?

An algorithm is simply a technique—not necessarily computational—for solving a problem

step by step. Of course, all programs solve problems (except for the ones that create

problems). What elevates some techniques to the hallowed status of algorithm is that they

embody a general, reusable method that solves an entire class of problems. Programs are

created; algorithms are invented. Programs eventually become obsolete; algorithms are

permanent.

Of course, some algorithms are better than others. Consider the task of finding a word in a

dictionary. Whether it's a physical book or an online file containing one word per line, there

are different ways to locate the word you're looking for. You could look up a definition with a

linear search, by reading the dictionary from front to back until you happen across your word.

That's slow, unless your word happens to be at the very beginning of the alphabet. Or, you

could pick pages at random and scan them for your word. You might get lucky. Still, there's

obviously a better way. That better way is the binary search algorithm, which you'll

learncontinue

Page 2

about in Chapter 5, Searching. In fact, the binary search is provably the best algorithm for this

task.

A Sample Algorithm:

Binary Search

We'll use binary search to explore what an algorithm is, how we implement one in Perl, and

what it means for an algorithm to be general and efficient. In what follows, we'll assume that

we have an alphabetically ordered list of words, and we want to determine where our chosen

word appears in the list, if it even appears at all. In our program, each word is represented in

Perl as a scalar, which can be an integer, a floating-point number, or (as in this case) a string

of characters. Our list of words is stored in a Perl array: an ordered list of scalars. In Perl, all

scalars begin with an $ sign, and all arrays begin with an @ sign. The other common datatype in

Perl is the hash, denoted with a % sign. Hashes "map" one set of scalars (the "keys") to other

scalars (the "values").

Here's how our binary search works. At all times, there is a range of words, called a window,

that the algorithm is considering. If the word is in the list, it must be inside the window.

Initially, the window is the entire list: no surprise there. As the algorithm operates, it shrinks

the window. Sometimes it moves the top of the window down, and sometimes it moves the

bottom of the window up. Eventually, the window contains only the target word, or it contains

nothing at all and we know that the word must not be in the list.

The window is defined with two numbers: the lowest and highest locations (which we'll call

indices, since we're searching through an array) where the word might be found. Initially, the

window is the entire array, since the word could be anywhere. The lower bound of the window

is $low, and the higher bound is $high.

We then look at the word in the middle of the window; that is, the element with index ($low

+ $high) / 2. However, that expression might have a fractional value, so we wrap it in an

int() to ensure that we have an integer, yielding int(($low + $high) / 2). If that

word comes after our word alphabetically, we can decrease $high to this index. Likewise, if

the word is too low, we increase $low to this index.

Eventually, we'll end up with our word—or an empty window, in which case our subroutine

returns undef to signal that the word isn't present.

Before we show you the Perl program for binary search, let's first look at how this might be

written in other algorithm books. Here's a pseudocode "implementation" of binary search:break

BINARY-SEARCH(A, w)

1. low ¨ 0

2. high ¨ length[A]

Page 3

3. while low < high

4. do try ¨ int ((low + high) / 2)

5. if A[try] > w

6. then high ¨ try

7. else if A[try] < w

8. then low ¨ try + 1

9. else return try

10. end if

11. end if

12. end do

13. return NO_ELEMENT

And now the Perl program. Not only is it shorter, it's an honest-to-goodness working

subroutine.

# $index = binary_search( \@array, $word )

# @array is a list of lowercase strings in alphabetical order.

# $word is the target word that might be in the list.

# binary_search() returns the array index such that $array[$index]

# is $word.

sub binary_search {

my ($array, $word) = @_;

my ($low, $high) = ( 0, @$array - 1 );

while ( $low <= $high ) { # While the window is open

my $try = int( ($low+$high) /2 ); # Try the middle element

$low = $try+1, next if $array->[$try] lt $word; # Raise bottom

$high = $try-1, next if $array->[$try] gt $word; # Lower top

return $try; # We've found the word!

}

return; # The word isn't there.

}

Depending on how much Perl you know, this might seem crystal clear or hopelessly opaque. As

the preface said, if you don't know Perl, you probably don't want to learn it with this book.

Nevertheless, here's a brief description of the Perl syntax used in the binary_search()

subroutine.

Thư viện tri thức trực tuyến

Mastering algorithms with Perl

Nội dung xem thử

Mô tả chi tiết

Tài liệu tương tự (6)

Mastering algorithms with C

Mastering Algorithms with Perl phần 1 pps

Mastering Algorithms with Perl phần 6 ppsx

Mastering Algorithms with Perl phần 7 ppsx

Mastering Algorithms with Perl phần 8 pot

Mastering Algorithms with Perl phần 5 doc