Google Hacks

Table of Contents

Credits

Foreword

Preface

Chapter 1. Searching Google

1. Setting Preferences

2. Language Tools

3. Anatomy of a Search Result

4. Specialized Vocabularies: Slang and Terminology

5. Getting Around the 10 Word Limit

6. Word Order Matters

7. Repetition Matters

8. Mixing Syntaxes

9. Hacking Google URLs

10. Hacking Google Search Forms

11. Date-Range Searching

12. Understanding and Using Julian Dates

13. Using Full-Word Wildcards

14. inurl: Versus site:

15. Checking Spelling

16. Consulting the Dictionary

17. Consulting the Phonebook

18. Tracking Stocks

19. Google Interface for Translators

20. Searching Article Archives

21. Finding Directories of Information

22. Finding Technical Definitions

23. Finding Weblog Commentary

24. The Google Toolbar

25. The Mozilla Google Toolbar

26. The Quick Search Toolbar

27. GAPIS

28. Googling with Bookmarklets

Chapter 2. Google Special Services and Collections

29. Google Directory

30. Google Groups

31. Google Images

32. Google News

33. Google Catalogs

34. Froogle

35. Google Labs

Chapter 3. Third-Party Google Services

36. XooMLe: The Google API in Plain Old XML

37. Google by Email

38. Simplifying Google Groups URLs

39. What Does Google Think Of...

40. GooglePeople

Chapter 4. Non-API Google Applications

41. Don't Try This at Home

42. Building a Custom Date-Range Search Form

43. Building Google Directory URLs

44. Scraping Google Results

45. Scraping Google AdWords

46. Scraping Google Groups

47. Scraping Google News

48. Scraping Google Catalogs

49. Scraping the Google Phonebook

Chapter 5. Introducing the Google Web API

50. Programming the Google Web API with Perl

51. Looping Around the 10-Result Limit

52. The SOAP::Lite Perl Module

53. Plain Old XML, a SOAP::Lite Alternative

54. NoXML, Another SOAP::Lite Alternative

55. Programming the Google Web API with PHP

56. Programming the Google Web API with Java

57. Programming the Google Web API with Python

58. Programming the Google Web API with C# and .NET

59. Programming the Google Web API with VB.NET

Chapter 6. Google Web API Applications

60. Date-Range Searching with a Client-Side Application

61. Adding a Little Google to Your Word

62. Permuting a Query

63. Tracking Result Counts over Time

64. Visualizing Google Results

65. Meandering Your Google Neighborhood

66. Running a Google Popularity Contest

67. Building a Google Box

68. Capturing a Moment in Time

69. Feeling Really Lucky

70. Gleaning Phonebook Stats

71. Performing Proximity Searches

72. Blending the Google and Amazon Web Services

73. Getting Random Results (On Purpose)

74. Restricting Searches to Top-Level Results

75. Searching for Special Characters

76. Digging Deeper into Sites

77. Summarizing Results by Domain

78. Scraping Yahoo! Buzz for a Google Search

79. Measuring Google Mindshare

80. Comparing Google Results with Those of Other Search Engines

81. SafeSearch Certifying URLs

82. Syndicating Google Search Results

83. Searching Google Topics

84. Finding the Largest Page

85. Instant Messaging Google

Chapter 7. Google Pranks and Games

86. The No-Result Search (Prank)

87. Google Whacking

88. GooPoetry

89. Creating Google Art

90. Google Bounce

91. Google Mirror

92. Finding Recipes

Chapter 8. The Webmaster Side of Google

93. A Webmaster's Introduction to Google

94. Generating Google AdWords

95. Inside the PageRank Algorithm

96. 26 Steps to 15K a Day

97. Being a Good Search Engine Citizen

98. Cleaning Up for a Google Visit

99. Getting the Most out of AdWords

100. Removing Your Materials from Google

Index

Foreword

When we started Google, it was hard to predict how big it would become. That our search engine

would someday serve as a catalyst for so many important web developments was a distant dream.

We are honored by the growing interest in Google and offer many thanks to those who created this

book, the largest and most comprehensive report on Google search technology published to

date.

Search is an amazing field of study, because it offers infinite possibilities for how we might find

and make information available to people. We join with the authors in encouraging readers to

approach this book with a view toward discovering and creating new ways to search. Google's

mission is to organize the world's information and make it universally accessible and useful, and

we welcome any contribution you make toward achieving this goal.

Hacking is the creativity that fuels the Web. As software developers ourselves, we applaud this

book for its adventurous spirit. We're adventurous, too, and were happy to discover that this book

highlights many of the same experiments we conduct in our free time here at Google.

Google is constantly adapting its search algorithms to match the dynamic growth and changing

nature of the Web. As you read, please keep in mind that the examples in this book are valid today

but, as Google innovates and grows over time, may become obsolete. We encourage you to follow

the latest developments and to participate in the ongoing discussions about search as facilitated by

books such as this one.

Virtually every engineer at Google has used an O'Reilly publication to help them with their jobs.

O'Reilly books are a staple of the Google engineering library, and we hope that Google Hacks will

be as useful to others as the O'Reilly publications have been to Google.

With the largest collection of web documents in the world, Google is a reflection of the Web. The

hacks in this book are not just about Google; they are also about unleashing the vast potential of

the Web today and in the years to come. Google Hacks is a great resource for search enthusiasts,

and we hope you enjoy it as much as we did.

Thanks,

The Google Engineering Team

December 11, 2002

Mountain View, California

Preface

Search engines for large collections of data preceded the World Wide Web by decades. There

were those massive library catalogs, hand-typed with painstaking precision on index cards and

eventually, to varying degrees, automated. There were the large data collections of professional

information companies such as Dialog and LexisNexis. Then there are the still-extant private,

expensive medical, real estate, and legal search services.

Those data collections were not always easy to search, but with a little finesse and a lot of patience,

it was always possible to search them thoroughly. Information was grouped according to

established ontologies, data preformatted according to particular guidelines.

Then came the Web.

Information on the Web, as anyone who's ever looked at half a dozen web pages knows,

is not all formatted the same way. Nor is it necessarily particularly accurate. Nor up to date. Nor

spellchecked. Nonetheless, search engines cropped up, trying to make sense of the rapidly

increasing index of information online. Eventually, special syntaxes were added for searching

common parts of the average web page (such as title or URL). Search engines evolved rapidly,

trying to encompass all the nuances of the billions of documents online, and they still continue to

evolve today.

Google™ threw its hat into the ring in 1998. The second incarnation of a search engine service

known as BackRub, the name "Google" was a play on the word "googol," a one followed by a

hundred zeros. From the beginning, Google was different from the other major search engines

online—AltaVista, Excite, HotBot, and others.

Was it the technology? Partially. The relevance of Google's search results was outstanding and

worthy of comment. But more than that, Google's focus and more human face made it stand out

online.

With its friendly presentation and its constantly expanding set of options, it's no surprise that

Google continues to get lots of fans. There are weblogs devoted to it. Search engine newsletters,

such as ResearchBuzz, spend a lot of time covering Google. Legions of devoted fans spend lots of

time uncovering undocumented features, creating games (like Google whacking), and even coining

new words (like "Googling," the practice of checking out a prospective date or hire via Google's

search engine).

In April 2002, Google reached out to its fan base by offering the Google API. The Google API

gives developers a legal way to access the Google search results with automated queries (any

other way of accessing Google's search results with automated software is against Google's Terms

of Service).

Why Google Hacks?

"Hacks" are generally considered to be "quick-n-dirty" solutions to programming problems or

interesting techniques for getting a task done. But what does this kind of hacking have to do with

Google?

Considering the size of the Google index, there are many times when you might want to do a

particular kind of search and you get too many results for the search to be useful. Or you may

want to do a search that the current Google interface does not support.

The idea of Google Hacks is not to give you some exhaustive manual of how every command in

the Google syntax works, but rather to show you some tricks for making the best use of a search

and show applications of the Google API that perform searches that you can't perform using the

regular Google interface. In other words, hacks.

Dozens of programs and interfaces have sprung up from the Google API. Both games and serious

applications using Google's database of web pages are available from everyone, from the serious

programmer to the devoted fan (like me).

How This Book Is Organized

The combination of Google's API and over 3 billion pages of constantly shifting data can do

strange things to your imagination and give you lots of new perspectives on how best to search.

This book goes beyond the instruction page to the idea of "hacks"—tips, tricks, and techniques

you can use to make your Google searching experience more fruitful, more fun, or (in a couple of

cases) just more weird. This book is divided into several chapters:

Chapter 1

This chapter describes the fundamentals of how Google's search properties work, with

some tips for making the most of Google's syntaxes and specialty search offerings.

Beyond the list of "this syntax means that," we'll take a look at how to eke every last bit

of searching power out of each syntax—and how to mix syntaxes for some truly monster

searches.

Chapter 2

Google goes beyond web searching into several different arenas, including images,

USENET, and news. Did you know that these collections have their own syntaxes? As

you'll learn in this section, Google's equally adroit at helping you holiday shop or search

for current events.

Chapter 3

Not all the hacks are ones that you want to install on your desktop or web server. In this

section, we'll take a look at third-party services that integrate the Google API with other

applications or act as handy web tools—or even check Google by email!

Chapter 4

Google's API doesn't search all Google properties, but sometimes it'd be real handy to

take that search for phone numbers or news stories and save it to a file. This collection of

scrapers shows you how.

Chapter 5

We'll take a look under the hood at Google's API, considering several different languages

and how Google works with each one. Hint: if you've always wanted to learn Perl but

never knew what to "do with it," this is your section.

Chapter 6

Once you've got an understanding of the Google API, you'll start thinking of all kinds of

ways you can use it. Take inspiration from this collection of useful applications that use

the Google API.

Chapter 7

All work and no play makes for a dull web surfer. This collection of pranks and games

turns Google into a poet, a mirror, and a master chef. Well, a chef anyway. Or at least

someone who throws ingredients together.

Chapter 8

If you're a web wrangler, you see Google from two sides—from the searcher side and

from the side of someone who wants to get the best search ranking for a web site. In this

section, you'll learn about Google's (in)famous PageRank, cleaning up for a Google visit,

and how to make sure your pages aren't indexed by Google if you don't want them there.

How to Use This Book

You can read this book from cover to cover if you like, but for the most part, each hack stands on

its own. So feel free to browse, flipping around to whatever sections interest you most. If you're a

Perl "newbie," you might want to try some of the easier hacks and then tackle the more extensive

ones as you get more confident.

Conventions Used in This Book

The following is a list of the typographical conventions used in this book:

Italic

Used to indicate new terms, URLs, filenames, file extensions, directories, commands and

options, program names, and to highlight comments in examples. For example, a path in

the filesystem will appear as /Developer/Applications.

Constant width

Used to show code examples, verbatim Google searches, the contents of files, or the

output from commands.

Constant width bold

Used in examples and tables to show commands or other text that should be typed

literally.

Constant width italic

Used in examples and tables to show text that should be replaced with user-supplied

values.

Color

The second color is used to indicate a cross-reference within the text.

You should pay special attention to notes set apart from the text with the following icons:

This is a tip, suggestion, or a general note. It contains useful

supplementary information about the topic at hand.

This is a warning or note of caution.

The thermometer icons, found next to each hack, indicate the relative complexity of the hack:

beginner, moderate, expert

How to Contact Us

We have tested and verified the information in this book to the best of our ability, but you may

find that features have changed (or even that we have made mistakes!). As a reader of this book,

you can help us to improve future editions by sending us your feedback. Please let us know about

any errors, inaccuracies, bugs, misleading or confusing statements, and typos that you find

anywhere in this book.

Please also let us know what we can do to make this book more useful to you. We take your

comments seriously and will try to incorporate reasonable suggestions into future editions. You

can write to us at:

O'Reilly & Associates, Inc.

1005 Gravenstein Hwy N.

Sebastopol, CA 95472

(800) 998-9938 (in the U.S. or Canada)

(707) 829-0515 (international/local)

(707) 829-0104 (fax)

To ask technical questions or to comment on the book, send email to:

[email protected]

The web site for Google Hacks lists examples, errata, and plans for future editions. You can find

this page at:

http://www.oreilly.com/catalog/googlehks/

For more information about this book and others, see the O'Reilly web site:

http://www.oreilly.com

Gotta Hack? To explore Hacks books online or to contribute a hack for future titles, visit:

http://hacks.oreilly.com

Chapter 1. Searching Google

Section 1.1. Hacks #1-28

Section 1.2. What Google Isn't

Section 1.3. What Google Is

Section 1.4. Google Basics

Section 1.5. The Special Syntaxes

Section 1.6. Advanced Search

Hack 1. Setting Preferences

Hack 2. Language Tools

Hack 3. Anatomy of a Search Result

Hack 4. Specialized Vocabularies: Slang and Terminology

Hack 5. Getting Around the 10 Word Limit

Hack 6. Word Order Matters

Hack 7. Repetition Matters

Hack 8. Mixing Syntaxes

Hack 9. Hacking Google URLs

Hack 10. Hacking Google Search Forms

Hack 11. Date-Range Searching

Hack 12. Understanding and Using Julian Dates

Hack 13. Using Full-Word Wildcards

Hack 14. inurl: Versus site:

Hack 15. Checking Spelling

Hack 16. Consulting the Dictionary

Hack 17. Consulting the Phonebook

Hack 18. Tracking Stocks
