Wiley - Data Mining with Microsoft SQL Server 2008 (2009)
Introduction to Data Mining in SQL Server 2008
Figure 1-1 Student table
In contrast, the data mining approach for this problem is almost the reverse
of the query-and-explore method. Instead of guessing a hypothesis and trying
it out in different ways, you ask the question in terms of the data that can
support many hypotheses, and allow your data mining system to explore them
for you.
In this case, you indicate that the columns IQ, Gender, ParentIncome,
and ParentEncouragement are to be used as hypotheses in determining
CollegePlans. As the data mining system passes over the data, it analyzes the
influence of each input column on the target column.
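As a rough illustration of what "analyzing the influence of each input column on the target column" can mean, the sketch below ranks the four input columns by information gain with respect to CollegePlans. Information gain is only one possible measure, chosen here for simplicity; the text does not say which measure the data mining system uses. The rows in STUDENTS, the discretized values such as "high" and "low", and the function names are all invented for this example.

```python
from collections import Counter
from math import log2

# Hypothetical rows: every value below is invented for illustration only.
STUDENTS = [
    {"IQ": "high", "Gender": "F", "ParentIncome": "high", "ParentEncouragement": "yes", "CollegePlans": "yes"},
    {"IQ": "high", "Gender": "M", "ParentIncome": "low", "ParentEncouragement": "yes", "CollegePlans": "yes"},
    {"IQ": "high", "Gender": "F", "ParentIncome": "low", "ParentEncouragement": "no", "CollegePlans": "no"},
    {"IQ": "low", "Gender": "M", "ParentIncome": "high", "ParentEncouragement": "yes", "CollegePlans": "yes"},
    {"IQ": "low", "Gender": "F", "ParentIncome": "high", "ParentEncouragement": "no", "CollegePlans": "no"},
    {"IQ": "low", "Gender": "M", "ParentIncome": "low", "ParentEncouragement": "no", "CollegePlans": "no"},
    {"IQ": "high", "Gender": "M", "ParentIncome": "high", "ParentEncouragement": "yes", "CollegePlans": "yes"},
    {"IQ": "low", "Gender": "F", "ParentIncome": "low", "ParentEncouragement": "no", "CollegePlans": "no"},
]

INPUTS = ["IQ", "Gender", "ParentIncome", "ParentEncouragement"]
TARGET = "CollegePlans"


def entropy(rows, target):
    """Shannon entropy (in bits) of the target column over the given rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, column, target):
    """Reduction in target entropy obtained by splitting the rows on one column."""
    total = len(rows)
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row for row in rows if row[column] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def rank_inputs(rows, inputs, target):
    """Return (column, gain) pairs, most influential column first."""
    gains = {column: information_gain(rows, column, target) for column in inputs}
    return sorted(gains.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for column, gain in rank_inputs(STUDENTS, INPUTS, TARGET):
        print(f"{column:20s} {gain:.3f}")
```

The point of the sketch is that no single hypothesis is tested by hand: every input column is scored against the target in one pass, and the ranking suggests which columns deserve attention.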
Figure 1-2 shows the hypothetical result of a decision tree algorithm operating
on this data set. Each path from the root node to a leaf node forms a rule
about the data. Looking at this tree, you see that students with IQs greater
than 100 who are encouraged by their parents are highly likely to attend
college. In this way, you have extracted knowledge from the data.
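To make the idea of "each path forms a rule" concrete, the following sketch walks a small hand-written tree and prints one IF ... THEN rule per root-to-leaf path. The tree is a stand-in, not the tree in Figure 1-2: only the branch "IQ > 100 and encouraged by parents" comes from the text, while the other branches, the leaf labels, and the names TREE, extract_rules, and format_rule are assumptions made for illustration.

```python
# Hypothetical tree. Only the "IQ > 100" + "encouraged" path echoes the text;
# the remaining branches and all leaf labels are invented.
TREE = {
    "test": "IQ > 100",
    "yes": {
        "test": "ParentEncouragement = Encouraged",
        "yes": {"leaf": "CollegePlans likely"},
        "no": {"leaf": "CollegePlans uncertain"},
    },
    "no": {"leaf": "CollegePlans unlikely"},
}


def extract_rules(node, conditions=()):
    """Collect (conditions, outcome) pairs, one per root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    rules = []
    # Follow the branch where the test holds, then the branch where it fails.
    rules += extract_rules(node["yes"], conditions + (node["test"],))
    rules += extract_rules(node["no"], conditions + (f"NOT ({node['test']})",))
    return rules


def format_rule(conditions, outcome):
    """Render one path as a readable IF ... THEN rule."""
    return "IF " + " AND ".join(conditions) + " THEN " + outcome


if __name__ == "__main__":
    for conditions, outcome in extract_rules(TREE):
        print(format_rule(conditions, outcome))
```

A tree with three leaves yields three rules; the first one corresponds to the observation above about students with IQs greater than 100 who are encouraged by their parents.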
As shown here, data mining applies algorithms such as decision trees,
clustering, association, time series, and so on to a data set, and then analyzes
its contents. This analysis produces patterns, which can be explored for
valuable information. Depending on the underlying algorithm, these patterns
can be in the form of trees, rules, clusters, or simply a set of mathematical
formulas. The information found in the patterns can be used for reporting (to