Thư viện tri thức trực tuyến
Kho tài liệu với 50,000+ tài liệu học thuật
© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Tài liệu SAS 9.1 SQL Procedure- P2 ppt
Nội dung xem thử
Mô tả chi tiết
46 Grouping by One Column Chapter 2
Grouping by One Column
The following example sums the populations of all countries to find the total
population of each continent:
proc sql;
title ’Total Populations of World Continents’;
select Continent, sum(Population) format=comma14. as TotalPopulation
from sql.countries
where Continent is not missing
group by Continent;
Note: Countries for which a continent is not listed are excluded by the WHERE
clause.
Output 2.42 Grouping by One Column
Total Populations of World Continents
Total
Continent Population
---------------------------------------------------
Africa 710,529,592
Asia 3,381,858,879
Australia 18,255,944
Central America and Caribbean 66,815,930
Europe 872,192,202
North America 384,801,818
Oceania 5,342,368
South America 317,568,801
Grouping without Summarizing
When you use a GROUP BY clause without an aggregate function, PROC SQL treats
the GROUP BY clause as if it were an ORDER BY clause and displays a message in the
log that informs you that this has happened. The following example attempts to group
high and low temperature information for each city in the SQL.WORLDTEMPS table
by country:
proc sql outobs=12;
title ’High and Low Temperatures’;
select City, Country, AvgHigh, AvgLow
from sql.worldtemps
group by Country;
The output and log show that PROC SQL transforms the GROUP BY clause into an
ORDER BY clause.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermar
Retrieving Data from a Single Table Grouping by Multiple Columns 47
Output 2.43 Grouping without Aggregate Functions
High and Low Temperatures
City Country AvgHigh AvgLow
-------------------------------------------------------
Algiers Algeria 90 45
Buenos Aires Argentina 87 48
Sydney Australia 79 44
Vienna Austria 76 28
Nassau Bahamas 88 65
Hamilton Bermuda 85 59
Sao Paulo Brazil 81 53
Rio de Janeiro Brazil 85 64
Quebec Canada 76 5
Montreal Canada 77 8
Toronto Canada 80 17
Beijing China 86 17
Output 2.44 Grouping without Aggregate Functions (Partial Log)
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because
neither the SELECT clause nor the optional HAVING clause of the
associated table-expression referenced a summary function.
Grouping by Multiple Columns
To group by multiple columns, separate the column names with commas within the
GROUP BY clause. You can use aggregate functions with any of the columns that you
select. The following example groups by both Location and Type, producing total square
miles for the deserts and lakes in each location in the SQL.FEATURES table:
proc sql;
title ’Total Square Miles of Deserts and Lakes’;
select Location, Type, sum(Area) as TotalArea format=comma16.
from sql.features
where type in (’Desert’, ’Lake’)
group by Location, Type;
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark
48 Grouping and Sorting Data Chapter 2
Output 2.45 Grouping by Multiple Columns
Total Square Miles of Deserts and Lakes
Location Type TotalArea
--------------------------------------------------
Africa Desert 3,725,000
Africa Lake 50,958
Asia Lake 25,300
Australia Desert 300,000
Canada Lake 12,275
China Desert 500,000
Europe - Asia Lake 143,550
North America Desert 140,000
North America Lake 77,200
Russia Lake 11,780
Saudi Arabia Desert 250,000
Grouping and Sorting Data
You can order grouped results with an ORDER BY clause. The following example
takes the previous example and adds an ORDER BY clause to change the order of the
Location column from ascending order to descending order:
proc sql;
title ’Total Square Miles of Deserts and Lakes’;
select Location, Type, sum(Area) as TotalArea format=comma16.
from sql.features
where type in (’Desert’, ’Lake’)
group by Location, Type
order by Location desc;
Output 2.46 Grouping with an ORDER BY Clause
Total Square Miles of Deserts and Lakes
Location Type TotalArea
--------------------------------------------------
Saudi Arabia Desert 250,000
Russia Lake 11,780
North America Lake 77,200
North America Desert 140,000
Europe - Asia Lake 143,550
China Desert 500,000
Canada Lake 12,275
Australia Desert 300,000
Asia Lake 25,300
Africa Desert 3,725,000
Africa Lake 50,958
Grouping with Missing Values
When a column contains missing values, PROC SQL treats the missing values as a
single group. This can sometimes provide unexpected results.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark
Retrieving Data from a Single Table Grouping with Missing Values 49
Finding Grouping Errors Caused by Missing Values
In this example, because the SQL.COUNTRIES table contains some missing values
in the Continent column, the missing values combine to form a single group that has
the total area of the countries that have a missing value in the Continent column:
/* incorrect output */
proc sql outobs=12;
title ’Areas of World Continents’;
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
group by Continent
order by Continent, Name;
The output is incorrect because Bermuda, Iceland, and Kalaallit Nunaat are not
actually part of the same continent; however, PROC SQL treats them that way because
they all have a missing character value in the Continent column.
Output 2.47 Finding Grouping Errors Caused by Missing Values (Incorrect Output)
Areas of World Continents
Name Continent TotalArea
-----------------------------------------------------------------------
Bermuda 876,800
Iceland 876,800
Kalaallit Nunaat 876,800
Algeria Africa 11,299,595
Angola Africa 11,299,595
Benin Africa 11,299,595
Botswana Africa 11,299,595
Burkina Faso Africa 11,299,595
Burundi Africa 11,299,595
Cameroon Africa 11,299,595
Cape Verde Africa 11,299,595
Central African Republic Africa 11,299,595
To correct the query from the previous example, you can write a WHERE clause to
exclude the missing values from the results:
/* corrected output */
proc sql outobs=12;
title ’Areas of World Continents’;
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
where Continent is not missing
group by Continent
order by Continent, Name;
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark
50 Filtering Grouped Data Chapter 2
Output 2.48 Adjusting the Query to Avoid Errors Due to Missing Values (Corrected Output)
Areas of World Continents
Name Continent TotalArea
-----------------------------------------------------------------------
Algeria Africa 11,299,595
Angola Africa 11,299,595
Benin Africa 11,299,595
Botswana Africa 11,299,595
Burkina Faso Africa 11,299,595
Burundi Africa 11,299,595
Cameroon Africa 11,299,595
Cape Verde Africa 11,299,595
Central African Republic Africa 11,299,595
Chad Africa 11,299,595
Comoros Africa 11,299,595
Congo Africa 11,299,595
Note: Aggregate functions, such as the SUM function, can cause the same
calculation to repeat for every row. This occurs whenever PROC SQL remerges data.
See “Remerging Summary Statistics” on page 41 for more information about
remerging.
Filtering Grouped Data
You can use a HAVING clause with a GROUP BY clause to filter grouped data. The
HAVING clause affects groups in a way that is similar to the way in which a WHERE
clause affects individual rows. When you use a HAVING clause, PROC SQL displays
only the groups that satisfy the HAVING expression.
Using a Simple HAVING Clause
The following example groups the features in the SQL.FEATURES table by type and
then displays only the numbers of islands, oceans, and seas:
proc sql;
title ’Numbers of Islands, Oceans, and Seas’;
select Type, count(*) as Number
from sql.features
group by Type
having Type in (’Island’, ’Ocean’, ’Sea’)
order by Type;
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermar