Thư viện tri thức trực tuyến
Kho tài liệu với 50,000+ tài liệu học thuật
© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Tài liệu SQL Antipatterns- P4 ppt
Nội dung xem thử
Mô tả chi tiết
ANTIPATTERN: USING INDEXES WITHOUT A PLAN 151
Too Many Indexes
You benefit from an index only if you run queries that use that index.
There’s no benefit to creating indexes that you don’t use. Here are some
examples:
Download Index-Shotgun/anti/create-table.sql
CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
date_reported DATE NOT NULL,
summary VARCHAR(80) NOT NULL,
status VARCHAR(10) NOT NULL,
hours NUMERIC(9,2),
➊ INDEX (bug_id),
➋ INDEX (summary),
➌ INDEX (hours),
➍ INDEX (bug_id, date_reported, status)
);
In the previous example, there are several useless indexes:
➊ bug_id: Most databases create an index automatically for a primary
key, so it’s redundant to define another index. There’s no benefit
to it, and it could just be extra overhead. Each database brand
has its own rules for when to create an index automatically. You
need to read the documentation for the database you use.
➋ summary: An indexing for a long string datatype like VARCHAR(80) is
larger than an index for a more compact data type. Also, you’re
not likely to run queries that search or sort by the full summary
column.
➌ hours: This is another example of a column that you’re probably not
going to search for specific values.
➍ bug_id, date_reported, status: There are good reasons to use compound indexes, but many people create compound indexes that
are redundant or seldom used. Also, the order of columns in a
compound index is important; you should use the columns leftto-right in search criteria, join criteria, or sorting order.
Hedging Your Bets
Bill Cosby told a story about his vacation in Las Vegas: He was so
frustrated by losing in the casinos that he decided he had to win
something—once—before he left. So he bought $200 in quarter chips,
went to the roulette table, and put chips on every square, red and black.
He covered the table. The dealer spun the ball. . . and it fell on the floor.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
ANTIPATTERN: USING INDEXES WITHOUT A PLAN 152
Some people create indexes on every column—and every combination
of columns—because they don’t know which indexes will benefit their
queries. If you cover a database table with indexes, you incur a lot of
overhead with no assurance of payoff.
When No Index Can Help
The next type of mistake is to run a query that can’t use any index.
Developers create more and more indexes, trying to find some magical
combination of columns or index options to make their query run faster.
We can think of a database index using an analogy to a telephone book.
If I ask you to look up everyone in the telephone book whose last name
is Charles, it’s an easy task. All the people with the same last name are
listed together, because that’s how the telephone book is ordered.
However, if I ask you to look up everyone in the telephone book whose
first name is Charles, this doesn’t benefit from the order of names in the
book. Anyone can have that first name, regardless of their last name,
so you have to search through the entire book line by line.
The telephone book is ordered by last name and then by first name,
just like a compound database index on last_name, first_name. This index
doesn’t help you search by first name.
Download Index-Shotgun/anti/create-index.sql
CREATE INDEX TelephoneBook ON Accounts(last_name, first_name);
Some examples of queries that can’t benefit from this index include the
following:
• SELECT * FROM Accounts ORDER BY first_name, last_name;
This query shows the telephone book scenario. If you create a compound index for the columns last_name followed by first_name (as in
a telephone book), the index doesn’t help you sort primarily by
first_name.
• SELECT * FROM Bugs WHERE MONTH(date_reported) = 4;
Even if you create an index for the date_reported column, the order
of the index doesn’t help you search by month. The order of this
index is based on the entire date, starting with the year. But each
year has a fourth month, so the rows where the month is equal to
4 are scattered through the table.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
HOW TO RECOGNIZE THE ANTIPATTERN 153
Some databases support indexes on expressions, or indexes on
generated columns, as well as indexes on plain columns. But you
have to define the index prior to using it, and that index helps only
for the expression you specify in its definition.
• SELECT * FROM Bugs WHERE last_name = 'Charles' OR first_name = 'Charles';
We’re back to the problem that rows with that specific first name
are scattered unpredictably with respect to the order of the index
we defined. The result of the previous query is the same as the
result of the following:
SELECT * FROM Bugs WHERE last_name = 'Charles'
UNION
SELECT * FROM Bugs WHERE first_name = 'Charles';
The index in our example helps find that last name, but it doesn’t
help find that first name.
• SELECT * FROM Bugs WHERE description LIKE '%crash%';
Because the pattern in this search predicate could occur anywhere in the string, there’s no way the sorted index data structure
can help.
13.3 How to Recognize the Antipattern
The following are symptoms of the Index Shotgun antipattern:
• “Here’s my query; how can I make it faster?”
This is probably the single most common SQL question, but it’s
missing details about table description, indexes, data volume, and
measurements of performance and optimization. Without this
context, any answer is just guesswork.
• “I defined an index on every field; why isn’t it faster?”
This is the classic Index Shotgun antisolution. You’ve tried every
possible index—but you’re shooting in the dark.
• “I read that indexes make the database slow, so I don’t use them.”
Like many developers, you’re looking for a one-size-fits-all strategy
for performance improvement. No such blanket rule exists.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
LEGITIMATE USES OF THE ANTIPATTERN 154
Low-Selectivity Indexes
Selectivity is a statistic about a database index. It’s the ratio of
the number of distinct values in the index to the total number
of rows in the table:
SELECT COUNT(DISTINCT status) /
COUNT(status) AS selectivity FROM Bugs;
The lower the selectivity ratio, the less effective an index is. Why
is this? Let’s consider an analogy.
This book has an index of a different type: each entry in a
book’s index lists the pages where the entry’s words appear.
If a word appears frequently in the book, it may list many page
numbers. To find the part of the book you’re looking for, you
have to turn to each page in the list one by one.
Indexes don’t bother to list words that appear on too many
pages. If you have to flip back and forth from the index to the
pages of the book too much, then you might as well just read
the whole book cover to cover.
Likewise in a database index, if a given value appears on many
rows in the table, it’s more trouble to read the index than simply
to scan the entire table. In fact, in these cases it can actually
be more expensive to use that index.
Ideally your database tracks the selectivity of indexes and
shouldn’t use an index that gives no benefit.
13.4 Legitimate Uses of the Antipattern
If you need to design a database for general use, without knowing what
queries are important to optimize, you can’t be sure of which indexes
are best. You have to make an educated guess. It’s likely that you’ll
miss some indexes that could have given benefit. It’s also likely that
you’ll create some indexes that turn out to be unneeded. But you have
to make the best guess you can.
13.5 Solution: MENTOR Your Indexes
The Index Shotgun antipattern is about creating or dropping indexes
without reason, so let’s come up with ways to analyze a database and
find good reasons to include indexes or omit them.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
SOLUTION: MENTOR YOUR INDEXES 155
The Database Isn’t Always the Bottleneck
Common wisdom in software developer communities is that the
database is always the slowest part of your application and the
source of performance issues. However, this isn’t true.
For example, in one application I worked on, my manager
asked me to find out why it was so slow, and he insisted it was
the fault of the database. After I used a profiling tool to measure the application code, I found that it spent 80 percent of its
time parsing its own HTML output to find form fields so it could
populate values into forms. The performance issue had nothing
to do with the database queries.
Before making assumptions about where the performance
problem exists, use software diagnostic tools to measure. Otherwise, you could be practicing premature optimization.
You can use the mnemonic MENTOR to describe a checklist for analyzing your database for good index choices: Measure, Explain, Nominate,
Test, Optimize, and Rebuild.
Measure
You can’t make informed decisions without information. Most databases provide some way to log the time to execute SQL queries so you
can identify the operations with the greatest cost. For example:
• Microsoft SQL Server and Oracle both have SQL Trace facilities
and tools to report and analyze trace results. Microsoft calls this
tool the SQL Server Profiler, and Oracle calls it TKProf.
• MySQL and PostgreSQL can log queries that take longer to execute than a specified threshold of time. MySQL calls this the slow
query log, and its long_query_time configuration parameter defaults
to 10 seconds. PostgreSQL has a similar configuration variable
log_min_duration_statement.
PostgreSQL also has a companion tool called pgFouine, which
helps you analyze the query log and identify queries that need
attention (http://pgfouine.projects.postgresql.org/).
Once you know which queries account for the most time in your application, you know where you should focus your optimizing attention for
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.