Thư viện tri thức trực tuyến
Kho tài liệu với 50,000+ tài liệu học thuật
© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Tài liệu SQL Antipatterns- P3 ppt
Nội dung xem thử
Mô tả chi tiết
SOLUTION: SIMPLIFY THE RELATIONSHIP 101
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
issue_id BIGINT UNSIGNED NOT NULL,
author BIGINT UNSIGNED NOT NULL,
comment_date DATETIME,
comment TEXT,
FOREIGN KEY (issue_id) REFERENCES Issues(issue_id),
FOREIGN KEY (author) REFERENCES Accounts(account_id),
);
Note that the primary keys of Bugs and FeatureRequests are also foreign
keys. They reference the surrogate key value generated in the Issues
table, instead of generating a new value for themselves.
Given a specific comment, you can retrieve the referenced bug or feature request using a relatively simple query. You don’t have to include
the Issues table in that query at all, unless you defined attribute columns
in that table. Also, since the primary key value of the Bugs table and its
ancestor Issues table are the same, you can join Bugs directly to Comments. You can join two tables even if there is no foreign key constraint
linking them directly, as long as you use columns that represent comparable information in your database.
Download Polymorphic/soln/super-join.sql
SELECT *
FROM Comments AS c
LEFT OUTER JOIN Bugs AS b USING (issue_id)
LEFT OUTER JOIN FeatureRequests AS f USING (issue_id)
WHERE c.comment_id = 9876;
Given a specific bug, you can retrieve its comments just as easily.
Download Polymorphic/soln/super-join.sql
SELECT *
FROM Bugs AS b
JOIN Comments AS c USING (issue_id)
WHERE b.issue_id = 1234;
The point is that if you use an ancestor table like Issues, you can rely on
the enforcement of your database’s data integrity by foreign keys.
In every table relationship, there is one referencing table
and one referenced table.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The sublime and the ridiculous are often so nearly related
that it is difficult to class them separately.
Thomas Paine
Chapter 8
Multicolumn Attributes
I can’t count the number of times I have created a table to store people’s contact information. Always this kind of table has commonplace
columns such as the person’s name, salutation, address, and probably
company name.
Phone numbers are a little trickier. People use multiple numbers: a
home number, a work number, a fax number, and a mobile number are
common. In the contact information table, it’s easy to store these in
four columns.
But what about additional numbers? The person’s assistant, second
mobile phone, or field office have distinct phone numbers, and there
could be other unforeseen categories. I could create more columns for
the less common cases, but that seems clumsy because it adds seldomused fields to data entry forms. How many columns is enough?
8.1 Objective: Store Multivalue Attributes
This is the same objective as in Chapter 2, Jaywalking, on page 25:
an attribute seems to belong in one table, but the attribute has multiple values. Previously, we saw that combining multiple values into
a comma-separated string makes it hard to validate the values, hard
to read or change individual values, and hard to compute aggregate
expressions such as counting the number of distinct values.
We’ll use a new example to illustrate this antipattern. We want the bugs
database to allow tags so we can categorize bugs. Some bugs may be
categorized by the software subsystem that they affect, for instance
printing, reports, or email. Other bugs may be categorized by the nature
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
ANTIPATTERN: CREATE MULTIPLE COLUMNS 103
of the defect; for instance, a crash bug could be tagged crash, while you
could tag a report of slowness with performance, and you could tag a
bad color choice in the user interface with cosmetic.
The bug-tagging feature must support multiple tags, because tags are
not necessarily mutually exclusive. A defect could affect multiple systems or could affect the performance of printing.
8.2 Antipattern: Create Multiple Columns
We still have to account for multiple values in the attribute, but we
know the new solution must store only a single value in each column.
It might seem natural to create multiple columns in this table, each
containing a single tag.
Download Multi-Column/anti/create-table.sql
CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
description VARCHAR(1000),
tag1 VARCHAR(20),
tag2 VARCHAR(20),
tag3 VARCHAR(20)
);
As you assign tags to a given bug, you’d put values in one of these three
columns. Unused columns remain null.
Download Multi-Column/anti/update.sql
UPDATE Bugs SET tag2 = 'performance' WHERE bug_id = 3456;
bug_id description tag1 tag2 tag3
1234 Crashes while saving crash NULL NULL
3456 Increase performance printing performance NULL
5678 Support XML NULL NULL NULL
Most tasks you could do easily with a conventional attribute now become more complex.
Searching for Values
When searching for bugs with a given tag, you must search all three
columns, because the tag string could occupy any of these columns.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
ANTIPATTERN: CREATE MULTIPLE COLUMNS 104
For example, to retrieve bugs that reference performance, use a query
like the following:
Download Multi-Column/anti/search.sql
SELECT * FROM Bugs
WHERE tag1 = 'performance'
OR tag2 = 'performance'
OR tag3 = 'performance';
You might need to search for bugs that reference both tags, performance
and printing. To do this, use a query like the following one. Remember
to use parentheses correctly, because OR has lower precedence than
AND.
Download Multi-Column/anti/search-two-tags.sql
SELECT * FROM Bugs
WHERE (tag1 = 'performance' OR tag2 = 'performance' OR tag3 = 'performance')
AND (tag1 = 'printing' OR tag2 = 'printing' OR tag3 = 'printing');
The syntax required to search for a single value over multiple columns
is lengthy and tedious to write. You can make it more compact by using
an IN predicate in a slightly untraditional manner:
Download Multi-Column/anti/search-two-tags.sql
SELECT * FROM Bugs
WHERE 'performance' IN (tag1, tag2, tag3)
AND 'printing' IN (tag1, tag2, tag3);
Adding and Removing Values
Adding and removing a value from the set of columns presents its own
issues. Simply using UPDATE to change one of the columns isn’t safe,
since you can’t be sure which column is unoccupied, if any. You might
have to retrieve the row into your application to see.
Download Multi-Column/anti/add-tag-two-step.sql
SELECT * FROM Bugs WHERE bug_id = 3456;
In this case, for instance, the result shows you that tag2 is null. Then
you can form the UPDATE statement.
Download Multi-Column/anti/add-tag-two-step.sql
UPDATE Bugs SET tag2 = 'performance' WHERE bug_id = 3456;
You face the risk that in the moment after you query the table and
before you update it, another client has gone through the same steps
of reading the row and updating it. Depending on who applied their
update first, either you or he risks getting an update conflict error or
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
ANTIPATTERN: CREATE MULTIPLE COLUMNS 105
having his changes overwritten by the other. You can avoid this twostep query by using complex SQL expressions.
The following statement uses the NULLIF( ) function to make each column null if it equals a specific value. NULLIF( ) returns null if its two
arguments are equal.1
Download Multi-Column/anti/remove-tag.sql
UPDATE Bugs
SET tag1 = NULLIF(tag1, 'performance'),
tag2 = NULLIF(tag2, 'performance'),
tag3 = NULLIF(tag3, 'performance')
WHERE bug_id = 3456;
The following statement adds the new tag performance to the first column that is currently null. However, if none of the three columns is
null, then the statement makes no change to the row, and the new tag
value is not recorded at all. Also, constructing this statement is laborious. Notice you must repeat the string performance six times.
Download Multi-Column/anti/add-tag.sql
UPDATE Bugs
SET tag1 = CASE
WHEN 'performance' IN (tag2, tag3) THEN tag1
ELSE COALESCE(tag1, 'performance') END,
tag2 = CASE
WHEN 'performance' IN (tag1, tag3) THEN tag2
ELSE COALESCE(tag2, 'performance') END,
tag3 = CASE
WHEN 'performance' IN (tag1, tag2) THEN tag3
ELSE COALESCE(tag3, 'performance') END
WHERE bug_id = 3456;
Ensuring Uniqueness
You probably don’t want the same value to appear in multiple columns,
but when you use the Multicolumn Attributes antipattern, the database
can’t prevent this. In other words, it’s hard to prevent the following
statement:
Download Multi-Column/anti/insert-duplicate.sql
INSERT INTO Bugs (description, tag1, tag2, tag3)
VALUES ('printing is slow', 'printing', 'performance', 'performance');
1. The NULLIF( ) is a standard function in SQL; it’s supported by all brands except Informix
and Ingres.
Report erratum
this copy is (P1.0 printing, May 2010)
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.