Tài liệu SQL Antipatterns- P5 ppt

SOLUTION: USE THE RIGHT TOOL FOR THE JOB 201

inverted index is a list of all words one might search for. In a manyto-many relationship, the index associates these words with the text

entries that contain the respective word. That is, a word like crash can

appear in many bugs, and each bug may match many other keywords.

This section shows how to design an inverted index.

First, define a table Keywords to list keywords for which users search,

and define an intersection table BugsKeywords to establish a many-tomany relationship:

Download Search/soln/inverted-index/create-table.sql

CREATE TABLE Keywords (

keyword_id SERIAL PRIMARY KEY,

keyword VARCHAR(40) NOT NULL,

UNIQUE KEY (keyword)

);

CREATE TABLE BugsKeywords (

keyword_id BIGINT UNSIGNED NOT NULL,

bug_id BIGINT UNSIGNED NOT NULL,

PRIMARY KEY (keyword_id, bug_id),

FOREIGN KEY (keyword_id) REFERENCES Keywords(keyword_id),

FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id)

);

Next, add a row to BugsKeywords for every keyword that matches the

description text for a given bug. We can use substring-match query

to determine these matches using LIKE or regular expressions. This is

nothing more costly than the naive searching method described in the

“Antipattern” section, but we gain efficiency because we only need to

perform the search once. If we save the result in the intersection table,

all subsequent searches for the same keyword are much faster.

Next, we write a stored procedure to make it easier to search for a

given keyword.3

If the word has already been searched, the query is

faster because the rows in BugsKeywords are a list of the documents

that contain the keyword. If no one has searched for the given keyword

before, we need to search the collection of text entries the hard way.

Download Search/soln/inverted-index/search-proc.sql

CREATE PROCEDURE BugsSearch(keyword VARCHAR(40))

BEGIN

SET @keyword = keyword;

3. This example stored procedure uses MySQL syntax.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

SOLUTION: USE THE RIGHT TOOL FOR THE JOB 202

➊ PREPARE s1 FROM 'SELECT MAX(keyword_id) INTO @k FROM Keywords

WHERE keyword = ?';

EXECUTE s1 USING @keyword;

DEALLOCATE PREPARE s1;

IF (@k IS NULL) THEN

➋ PREPARE s2 FROM 'INSERT INTO Keywords (keyword) VALUES (?)';

EXECUTE s2 USING @keyword;

DEALLOCATE PREPARE s2;

➌ SELECT LAST_INSERT_ID() INTO @k;

➍ PREPARE s3 FROM 'INSERT INTO BugsKeywords (bug_id, keyword_id)

SELECT bug_id, ? FROM Bugs

WHERE summary REGEXP CONCAT(''[[:<:]]'', ?, ''[[:>:]]'')

OR description REGEXP CONCAT(''[[:<:]]'', ?, ''[[:>]]'')';

EXECUTE s3 USING @k, @keyword, @keyword;

DEALLOCATE PREPARE s3;

END IF;

➎ PREPARE s4 FROM 'SELECT b.* FROM Bugs b

JOIN BugsKeywords k USING (bug_id)

WHERE k.keyword_id = ?';

EXECUTE s4 USING @k;

DEALLOCATE PREPARE s4;

END

➊ Search for the user-specified keyword. Return either the integer

primary key from Keywords.keyword_id or null if the word has not

been seen previously.

➋ If the word was not found, insert it as a new word.

➌ Query for the primary key value generated in Keywords.

➍ Populate the intersection table by searching Bugs for rows containing the new keyword.

➎ Finally, query the full rows from Bugs that match the keyword_id,

whether the keyword was found or had to be inserted as a new

entry.

Now we can call this stored procedure and pass the desired keyword.

The procedure returns the set of matching bugs, whether it has to

calculate the matching bugs and populate the intersection table for a

new keyword or whether it simply benefits from the result of an earlier

search.

Download Search/soln/inverted-index/search-proc.sql

CALL BugsSearch('crash');

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

SOLUTION: USE THE RIGHT TOOL FOR THE JOB 203

There’s another piece to this solution: we need to define a trigger to

populate the intersection table as each new bug is inserted. If you need

to support edits to bug descriptions, you may also have to write a trigger

to reanalyze text and add or delete rows in the BugsKeywords table.

Download Search/soln/inverted-index/trigger.sql

CREATE TRIGGER Bugs_Insert AFTER INSERT ON Bugs

FOR EACH ROW

BEGIN

INSERT INTO BugsKeywords (bug_id, keyword_id)

SELECT NEW.bug_id, k.keyword_id FROM Keywords k

WHERE NEW.description REGEXP CONCAT('[[:<:]]', k.keyword, '[[:>:]]')

OR NEW.summary REGEXP CONCAT('[[:<:]]', k.keyword, '[[:>:]]');

END

The keyword list is populated naturally as users perform searches, so

we don’t need to fill the keyword list with every word found in the

knowledge-base articles. On the other hand, if we can anticipate likely

keywords, we can easily run a search for them, thus bearing the initial

cost of being the first to search for each keyword so that doesn’t fall on

our users.

I used an inverted index for my knowledge-base application that I described at the start of this chapter. I also enhanced the Keywords table

with an additional column num_searches. I incremented this column

each time a user searched for a given keyword so I could track which

searches were most in demand.

You don’t have to use SQL to solve every problem.

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

Enita non sunt multiplicanda praeter necessitatem

(Latin, “entities are not to be multiplied beyond necessity”).

William of Ockham

Chapter 18

Spaghetti Query

Your boss is on the phone with his boss, and he waves to you to come

over. He covers his phone receiver with his hand and whispers to you,

“The executives are in a budget meeting, and we’re going to have our

staff cut unless we can feed my VP some statistics to prove that our

department keeps a lot of people busy. I need to know how many products we work on, how many developers fixed bugs, the average bugs

fixed per developer, and how many of our fixed bugs were reported by

customers. Right now!”

You leap to your SQL tool and start writing. You want all the answers at

once, so you make one complex query, hoping to do the least amount

of duplicate work and therefore produce the results faster.

Download Spaghetti-Query/intro/report.sql

SELECT COUNT(bp.product_id) AS how_many_products,

COUNT(dev.account_id) AS how_many_developers,

COUNT(b.bug_id)/COUNT(dev.account_id) AS avg_bugs_per_developer,

COUNT(cust.account_id) AS how_many_customers

FROM Bugs b JOIN BugsProducts bp ON (b.bug_id = bp.bug_id)

JOIN Accounts dev ON (b.assigned_to = dev.account_id)

JOIN Accounts cust ON (b.reported_by = cust.account_id)

WHERE cust.email NOT LIKE '%@example.com'

GROUP BY bp.product_id;

The numbers come back, but they seem wrong. How did we get dozens

of products? How can the average bugs fixed be exactly 1.0? And it

wasn’t the number of customers; it was the number of bugs reported

by customers that your boss needs. How can all the numbers be so far

off? This query will be a lot more complex than you thought.

Your boss hangs up the phone. “Never mind,” he sighs. “It’s too late.

Let’s clean out our desks.”

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

OBJECTIVE: DECREASE SQL QUERIES 205

18.1 Objective: Decrease SQL Queries

One of the most common places where SQL programmers get stuck is

when they ask, “How can I do this with a single query?” This question is

asked for virtually any task. Programmers have been trained that one

SQL query is difficult, complex, and expensive, so they reason that two

SQL queries must be twice as bad. More than two SQL queries to solve

a problem is generally out of the question.

Programmers can’t reduce the complexity of their tasks, but they want

to simplify the solution. They state their goal with terms like “elegant”

or “efficient,” and they think they’ve achieved those goals by solving the

task with a single query.

18.2 Antipattern: Solve a Complex Problem in One Step

SQL is a very expressive language—you can accomplish a lot in a single

query or statement. But that doesn’t mean it’s mandatory or even a

good idea to approach every task with the assumption it has to be done

in one line of code. Do you have this habit with any other programming

language you use? Probably not.

Unintended Products

One common consequence of producing all your results in one query

is a Cartesian product. This happens when two of the tables in the

query have no condition restricting their relationship. Without such a

restriction, the join of two tables pairs each row in the first table to

every row in the other table. Each such pairing becomes a row of the

result set, and you end up with many more rows than you expect.

Let’s see an example. Suppose we want to query our bugs database to

count the number of bugs fixed, and the number of bugs open, for a

given product. Many programmers would try to use a query like the

following to calculate these counts:

Download Spaghetti-Query/anti/cartesian.sql

SELECT p.product_id,

COUNT(f.bug_id) AS count_fixed,

COUNT(o.bug_id) AS count_open

FROM BugsProducts p

LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED')

LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN')

WHERE p.product_id = 1

GROUP BY p.product_id;

Report erratum

this copy is (P1.0 printing, May 2010)

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

Thư viện tri thức trực tuyến

Tài liệu SQL Antipatterns- P5 ppt

Nội dung xem thử

Mô tả chi tiết

Tài liệu tương tự (6)

Tài liệu SQL Server

Tài liệu SQL Antipatterns- P3 ppt

Tài liệu SQL Antipatterns- P4 ppt

Tài liệu SQL Anywhere Studio 9- P6 docx

Tài liệu SQL Anywhere Studio 9- P7 docx

Tài liệu SQL Anywhere Studio 9- P9 pdf