Siêu thị PDFTải ngay đi em, trời tối mất

Thư viện tri thức trực tuyến

Kho tài liệu với 50,000+ tài liệu học thuật

© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Tài liệu SQL Antipatterns- P4 ppt
MIỄN PHÍ
Số trang
50
Kích thước
296.6 KB
Định dạng
PDF
Lượt xem
896

Tài liệu SQL Antipatterns- P4 ppt

Nội dung xem thử

Mô tả chi tiết

ANTIPATTERN: USING INDEXES WITHOUT A PLAN 151

Too Many Indexes

You benefit from an index only if you run queries that use that index.

There’s no benefit to creating indexes that you don’t use. Here are some

examples:

Download Index-Shotgun/anti/create-table.sql

CREATE TABLE Bugs (

bug_id SERIAL PRIMARY KEY,

date_reported DATE NOT NULL,

summary VARCHAR(80) NOT NULL,

status VARCHAR(10) NOT NULL,

hours NUMERIC(9,2),

➊ INDEX (bug_id),

➋ INDEX (summary),

➌ INDEX (hours),

➍ INDEX (bug_id, date_reported, status)

);

In the previous example, there are several useless indexes:

➊ bug_id: Most databases create an index automatically for a primary

key, so it’s redundant to define another index. There’s no benefit

to it, and it could just be extra overhead. Each database brand

has its own rules for when to create an index automatically. You

need to read the documentation for the database you use.

➋ summary: An indexing for a long string datatype like VARCHAR(80) is

larger than an index for a more compact data type. Also, you’re

not likely to run queries that search or sort by the full summary

column.

➌ hours: This is another example of a column that you’re probably not

going to search for specific values.

➍ bug_id, date_reported, status: There are good reasons to use com￾pound indexes, but many people create compound indexes that

are redundant or seldom used. Also, the order of columns in a

compound index is important; you should use the columns left￾to-right in search criteria, join criteria, or sorting order.

Hedging Your Bets

Bill Cosby told a story about his vacation in Las Vegas: He was so

frustrated by losing in the casinos that he decided he had to win

something—once—before he left. So he bought $200 in quarter chips,

went to the roulette table, and put chips on every square, red and black.

He covered the table. The dealer spun the ball. . . and it fell on the floor.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

ANTIPATTERN: USING INDEXES WITHOUT A PLAN 152

Some people create indexes on every column—and every combination

of columns—because they don’t know which indexes will benefit their

queries. If you cover a database table with indexes, you incur a lot of

overhead with no assurance of payoff.

When No Index Can Help

The next type of mistake is to run a query that can’t use any index.

Developers create more and more indexes, trying to find some magical

combination of columns or index options to make their query run faster.

We can think of a database index using an analogy to a telephone book.

If I ask you to look up everyone in the telephone book whose last name

is Charles, it’s an easy task. All the people with the same last name are

listed together, because that’s how the telephone book is ordered.

However, if I ask you to look up everyone in the telephone book whose

first name is Charles, this doesn’t benefit from the order of names in the

book. Anyone can have that first name, regardless of their last name,

so you have to search through the entire book line by line.

The telephone book is ordered by last name and then by first name,

just like a compound database index on last_name, first_name. This index

doesn’t help you search by first name.

Download Index-Shotgun/anti/create-index.sql

CREATE INDEX TelephoneBook ON Accounts(last_name, first_name);

Some examples of queries that can’t benefit from this index include the

following:

• SELECT * FROM Accounts ORDER BY first_name, last_name;

This query shows the telephone book scenario. If you create a com￾pound index for the columns last_name followed by first_name (as in

a telephone book), the index doesn’t help you sort primarily by

first_name.

• SELECT * FROM Bugs WHERE MONTH(date_reported) = 4;

Even if you create an index for the date_reported column, the order

of the index doesn’t help you search by month. The order of this

index is based on the entire date, starting with the year. But each

year has a fourth month, so the rows where the month is equal to

4 are scattered through the table.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

HOW TO RECOGNIZE THE ANTIPATTERN 153

Some databases support indexes on expressions, or indexes on

generated columns, as well as indexes on plain columns. But you

have to define the index prior to using it, and that index helps only

for the expression you specify in its definition.

• SELECT * FROM Bugs WHERE last_name = 'Charles' OR first_name = 'Charles';

We’re back to the problem that rows with that specific first name

are scattered unpredictably with respect to the order of the index

we defined. The result of the previous query is the same as the

result of the following:

SELECT * FROM Bugs WHERE last_name = 'Charles'

UNION

SELECT * FROM Bugs WHERE first_name = 'Charles';

The index in our example helps find that last name, but it doesn’t

help find that first name.

• SELECT * FROM Bugs WHERE description LIKE '%crash%';

Because the pattern in this search predicate could occur any￾where in the string, there’s no way the sorted index data structure

can help.

13.3 How to Recognize the Antipattern

The following are symptoms of the Index Shotgun antipattern:

• “Here’s my query; how can I make it faster?”

This is probably the single most common SQL question, but it’s

missing details about table description, indexes, data volume, and

measurements of performance and optimization. Without this

context, any answer is just guesswork.

• “I defined an index on every field; why isn’t it faster?”

This is the classic Index Shotgun antisolution. You’ve tried every

possible index—but you’re shooting in the dark.

• “I read that indexes make the database slow, so I don’t use them.”

Like many developers, you’re looking for a one-size-fits-all strategy

for performance improvement. No such blanket rule exists.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

LEGITIMATE USES OF THE ANTIPATTERN 154

Low-Selectivity Indexes

Selectivity is a statistic about a database index. It’s the ratio of

the number of distinct values in the index to the total number

of rows in the table:

SELECT COUNT(DISTINCT status) /

COUNT(status) AS selectivity FROM Bugs;

The lower the selectivity ratio, the less effective an index is. Why

is this? Let’s consider an analogy.

This book has an index of a different type: each entry in a

book’s index lists the pages where the entry’s words appear.

If a word appears frequently in the book, it may list many page

numbers. To find the part of the book you’re looking for, you

have to turn to each page in the list one by one.

Indexes don’t bother to list words that appear on too many

pages. If you have to flip back and forth from the index to the

pages of the book too much, then you might as well just read

the whole book cover to cover.

Likewise in a database index, if a given value appears on many

rows in the table, it’s more trouble to read the index than simply

to scan the entire table. In fact, in these cases it can actually

be more expensive to use that index.

Ideally your database tracks the selectivity of indexes and

shouldn’t use an index that gives no benefit.

13.4 Legitimate Uses of the Antipattern

If you need to design a database for general use, without knowing what

queries are important to optimize, you can’t be sure of which indexes

are best. You have to make an educated guess. It’s likely that you’ll

miss some indexes that could have given benefit. It’s also likely that

you’ll create some indexes that turn out to be unneeded. But you have

to make the best guess you can.

13.5 Solution: MENTOR Your Indexes

The Index Shotgun antipattern is about creating or dropping indexes

without reason, so let’s come up with ways to analyze a database and

find good reasons to include indexes or omit them.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

SOLUTION: MENTOR YOUR INDEXES 155

The Database Isn’t Always the Bottleneck

Common wisdom in software developer communities is that the

database is always the slowest part of your application and the

source of performance issues. However, this isn’t true.

For example, in one application I worked on, my manager

asked me to find out why it was so slow, and he insisted it was

the fault of the database. After I used a profiling tool to mea￾sure the application code, I found that it spent 80 percent of its

time parsing its own HTML output to find form fields so it could

populate values into forms. The performance issue had nothing

to do with the database queries.

Before making assumptions about where the performance

problem exists, use software diagnostic tools to measure. Oth￾erwise, you could be practicing premature optimization.

You can use the mnemonic MENTOR to describe a checklist for analyz￾ing your database for good index choices: Measure, Explain, Nominate,

Test, Optimize, and Rebuild.

Measure

You can’t make informed decisions without information. Most data￾bases provide some way to log the time to execute SQL queries so you

can identify the operations with the greatest cost. For example:

• Microsoft SQL Server and Oracle both have SQL Trace facilities

and tools to report and analyze trace results. Microsoft calls this

tool the SQL Server Profiler, and Oracle calls it TKProf.

• MySQL and PostgreSQL can log queries that take longer to exe￾cute than a specified threshold of time. MySQL calls this the slow

query log, and its long_query_time configuration parameter defaults

to 10 seconds. PostgreSQL has a similar configuration variable

log_min_duration_statement.

PostgreSQL also has a companion tool called pgFouine, which

helps you analyze the query log and identify queries that need

attention (http://pgfouine.projects.postgresql.org/).

Once you know which queries account for the most time in your appli￾cation, you know where you should focus your optimizing attention for

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

Tải ngay đi em, còn do dự, trời tối mất!